Question

What kind of content does Rozz index from my website, and how does it ensure accuracy when answering questions?

Answer (medium confidence, 76%)

Brief answer

- Rozz indexes only public website content: pages, docs, help articles, FAQs, developer docs, and any other on-site text you expose. It converts that content into semantic vectors stored in Pinecone and retrieves the best-matching passages to generate answers. Accuracy comes from four measures: generating answers only from site content, automated filtering (PII redaction, quality thresholds, deduplication), re-ranking of retrieved passages, and dashboard tools for reviewing and editing cached answers.

Detailed explanation

- What it indexes

- Public-facing pages and content visible to a normal site visitor: documentation, help center articles, marketing pages, FAQs, Q&A pages, and similar content. Rozz neither requires nor accesses backend data, private data, or integrations. [How does the Rozz chatbot ensure security and privacy?](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)

- Rozz pulls together content from multiple on-site sources, so users can ask natural-language questions without knowing where the original content lives. [Content (about)](https://rozz.site/about.html)

- How it represents and retrieves content

- Rozz converts site content into semantic embeddings stored in a vector database (Pinecone), so retrieval matches the intent of a question rather than just its keywords. [Why is Website Search Broken and How Can We Fix It?](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)
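
To make intent-aware retrieval concrete, here is a minimal sketch of ranking passages by cosine similarity between embedding vectors. It is an illustration only, not Rozz's implementation: the function names, the hand-made 3-dimensional vectors, and the passage titles are all assumptions, and a real system would get its vectors from an embedding model and query a vector database such as Pinecone instead of a Python dict.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k passages whose vectors are most similar to the query."""
    scored = [(cosine(query_vec, vec), passage) for passage, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]

# Hypothetical toy "embeddings" (hand-made, not produced by a real model).
INDEX = {
    "How to reset your password": [0.9, 0.1, 0.0],
    "Pricing plans overview": [0.0, 0.2, 0.9],
    "Recovering a locked account": [0.8, 0.3, 0.1],
}

# Pretend embedding of the question "I can't log in".
query = [0.85, 0.2, 0.05]
results = top_k(query, INDEX, k=2)
print(results)
```

The question shares no keywords with either account-related passage, yet both rank above the pricing page because their vectors point in a similar direction. That is the property the answer above calls intent-aware retrieval.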

- How it ensures accuracy / reduces hallucinations

- Answers are generated from passages actually retrieved from the site (retrieval-augmented generation, RAG) rather than invented by the model; grounding answers in site content reduces hallucination risk. [How does the Rozz chatbot ensure security and privacy?](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)

- The GEO/RAG pipeline applies quality filters before generation: PII redaction, quality thresholding, semantic deduplication (e.g., dropping passages that are ~90% similar to one already kept), and corrective context filtering to remove noisy or irrelevant passages. [How does retrieval coverage change between basic RAG and advanced agentic RAG?](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)

- Advanced steps (query rewriting, multi-source routing, cross-encoder re-ranking) increase recall and precision, so the model sees higher-signal context. [How does retrieval coverage change between basic RAG and advanced agentic RAG?](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)

- Operational controls: Rozz logs questions and generated answers to the dashboard where you can review interactions and directly edit cached responses (cache items persist for ~2 months), enabling human-in-the-loop corrections and continuous improvement. [What's Included in the Rozz Dashboard?](https://rozz.site/qna/introducing-the-rozz-dashboard.html)
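
The pre-generation filters described above (PII redaction, quality thresholding, and semantic deduplication at roughly 90% similarity) can be sketched as follows. This is a simplified illustration under stated assumptions, not Rozz's actual pipeline: the `Passage` type, the `score` field, the 0.5 quality cutoff, the email-only redaction rule, and the sample passages are all hypothetical. Only the ~0.9 similarity threshold comes from the answer above.

```python
import math
import re
from dataclasses import dataclass

# Assumption: PII redaction is reduced here to masking email addresses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

@dataclass
class Passage:
    text: str
    score: float  # hypothetical retrieval relevance score in [0, 1]
    vec: list     # hypothetical embedding vector

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def redact_pii(text):
    """Mask email addresses in the passage text."""
    return EMAIL_RE.sub("[REDACTED]", text)

def filter_passages(passages, min_score=0.5, dup_threshold=0.9):
    """Keep high-scoring, non-duplicate passages, with PII masked."""
    kept = []
    # Visit the best-scoring passages first so duplicates lose to better copies.
    for p in sorted(passages, key=lambda item: item.score, reverse=True):
        if p.score < min_score:
            continue  # quality threshold
        if any(cosine(p.vec, q.vec) >= dup_threshold for q in kept):
            continue  # near-duplicate of a passage already kept
        kept.append(Passage(redact_pii(p.text), p.score, p.vec))
    return kept

# Made-up sample data for illustration only.
passages = [
    Passage("Contact support at help@example.com for refunds.", 0.92, [1.0, 0.0]),
    Passage("For refunds, contact our support team.", 0.88, [0.99, 0.05]),
    Passage("Our office dog is named Biscuit.", 0.20, [0.0, 1.0]),
    Passage("Refunds are processed within 5 business days.", 0.75, [0.6, 0.8]),
]
clean = filter_passages(passages)
for p in clean:
    print(p.score, p.text)
```

In this toy run the second passage is dropped as a near-duplicate of the first, the third falls below the quality cutoff, and the email address in the first is masked before the text would reach the model.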

- Privacy / scope constraint

- Rozz only crawls content a normal visitor can see; it does not access private backend data or hidden endpoints. That constraint both protects privacy and bounds the evidence base for answers. [How does the Rozz chatbot ensure security and privacy?](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)

Sources

- [How does the Rozz chatbot ensure security and privacy?](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)

- [What's Included in the Rozz Dashboard?](https://rozz.site/qna/introducing-the-rozz-dashboard.html)

- [How does retrieval coverage change between basic RAG and advanced agentic RAG?](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)

- [Content (about)](https://rozz.site/about.html)

- [Why is Website Search Broken and How Can We Fix It?](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)
