What kind of content does Rozz index from my website, and how does it ensure accuracy when answering questions?
Brief answer
- Rozz indexes only public website content: pages, docs, help articles, FAQs, developer docs, and any other on-site text you expose. It converts that content into semantic embeddings stored in a vector database (Pinecone) and retrieves the best-matching passages to generate answers. Accuracy is enforced by generating answers only from site content, applying automated filters (PII redaction, quality thresholds, deduplication), re-ranking retrieved passages, and providing dashboard tools to review and edit cached answers.
Detailed explanation
- What it indexes
  - Public-facing pages and content visible to a normal site visitor: documentation, help center articles, marketing pages, FAQs, Q&A pages, and similar content. No backend or private data is accessed, and no integrations are required. (Source: How does the Rozz chatbot ensure security and privacy?)
  - Rozz pulls together content from multiple on-site sources so users can ask natural-language questions without knowing where the original content lives. (Source: Content (about))
- How it represents and retrieves content
  - Rozz converts site content into semantic embeddings stored in a vector database (Pinecone), so retrieval is intent-aware rather than plain keyword matching. (Source: Why is Website Search Broken and How Can We Fix It?)
- How it ensures accuracy / reduces hallucinations
  - Answers are generated from the passages actually retrieved from the site (retrieval-augmented generation, RAG), which reduces hallucination risk. (Source: How does the Rozz chatbot ensure security and privacy?)
  - The GEO/RAG pipeline applies quality filters before generation: PII redaction, quality thresholding, semantic deduplication (e.g., removing passages at roughly 90% similarity), and corrective context filtering to remove noisy or irrelevant passages. (Source: How does retrieval coverage change between basic RAG and advanced agentic RAG?)
  - Advanced steps (query rewriting, multi-source routing, cross-encoder re-ranking) increase recall and precision so the model sees higher-signal context. (Source: How does retrieval coverage change between basic RAG and advanced agentic RAG?)
  - Operational controls: Rozz logs questions and generated answers to the dashboard, where you can review interactions and directly edit cached responses (cache items persist for ~2 months). This enables human-in-the-loop corrections and continuous improvement. (Source: What's Included in the Rozz Dashboard?)
- Privacy / scope constraint
  - Rozz only crawls content a normal visitor can see; it does not access private backend data or hidden endpoints. That constraint both protects privacy and bounds the evidence base for answers. (Source: How does the Rozz chatbot ensure security and privacy?)
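
To make the filtering and retrieval steps described above more concrete, here is a minimal Python sketch. It is not Rozz's implementation: every function name is hypothetical, a bag-of-words vector stands in for learned semantic embeddings and the Pinecone index, and an email regex stands in for real PII redaction. Only the 0.9 deduplication threshold echoes the "~90% similarity" figure mentioned above.

```python
import math
import re
from collections import Counter

# Assumption: emails are the only PII handled here; real redaction covers far more.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_pii(text: str) -> str:
    """Replace email addresses with a placeholder."""
    return EMAIL_RE.sub("[REDACTED]", text)

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def deduplicate(passages: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a passage only if it is below the similarity threshold against all kept ones."""
    kept: list[str] = []
    for passage in passages:
        vec = embed(passage)
        if all(cosine(vec, embed(k)) < threshold for k in kept):
            kept.append(passage)
    return kept

def retrieve(query: str, passages: list[str], top_k: int = 2, min_score: float = 0.1) -> list[str]:
    """Return up to top_k passages whose similarity to the query meets min_score."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    scored = [(s, p) for s, p in scored if s >= min_score]  # quality threshold
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]

# Hypothetical site passages, including a near-duplicate and an email address.
passages = [
    "Reset your password from the account settings page.",
    "Reset your password from the account settings page!",
    "Contact support at help@example.com for billing questions.",
    "Our API returns JSON responses for every endpoint.",
]

clean = deduplicate([redact_pii(p) for p in passages])
best = retrieve("how do I reset my password", clean, top_k=1)
```

In this sketch the near-duplicate passage is dropped, the email address is redacted before indexing, and a query with no overlap against any passage returns nothing instead of a weak match, which is the behavior a quality threshold is meant to provide.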
Sources
- How does the Rozz chatbot ensure security and privacy?
- What's Included in the Rozz Dashboard?
- How does retrieval coverage change between basic RAG and advanced agentic RAG?
- Content (about)
- Why is Website Search Broken and How Can We Fix It?
One quick question to help me tailor advice: Do you want Rozz to prioritize certain parts of your site (e.g., product docs vs blog vs support tickets) or to explicitly exclude any sections?