Question

What kind of content does Rozz index from my website, and how does it ensure accuracy when answering questions?

Answer (Medium Confidence: 77%)

Short answer

- Rozz indexes only your public website content (pages, docs, help articles, FAQs, Q&A pages and schema-marked content) and stores it as semantic vectors for retrieval. It grounds each answer in the retrieved passages. It also applies several automated curation and validation steps to reduce hallucinations and improve precision. [Content](https://rozz.site/about.html) • [Why website search is broken](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)

How Rozz decides what to index

- Sources: public site pages, documentation, help-center articles, marketing pages, FAQ/Q&A pages, and any JSON-LD/Schema.org structured data the site exposes. [Content](https://rozz.site/about.html)

- Crawl method: Rozz crawls your site from a visitor's point of view and needs no backend integration. It collects only publicly accessible content. [Security & Privacy](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)

- Storage: indexed content is converted into semantic embeddings (vectors) and stored in a vector store. The site names Pinecone as an example. [Why website search is broken](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)
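The Python sketch below shows the indexing and similarity-search step in miniature. It is not Rozz's implementation.

- `embed`, `VectorIndex` and `DIM` are hypothetical names.
- The embedding is a hashed bag-of-words stand-in. It captures word overlap, not meaning, unlike a real semantic embedding model.
- The in-memory list stands in for a hosted vector store such as the Pinecone example the site mentions.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # hypothetical vector size for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for a semantic embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * DIM
    for token, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(token.encode()) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorIndex:
    """Hypothetical in-memory index standing in for a hosted vector store."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, list[float]]] = []  # (page URL, passage, vector)

    def add(self, url: str, passage: str) -> None:
        self._rows.append((url, passage, embed(passage)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        q = embed(query)
        # Vectors are unit-length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), url, text) for url, text, v in self._rows]
        return sorted(scored, key=lambda row: row[0], reverse=True)[:k]
```

A real system keeps the same shape: vectorize each passage at index time, vectorize the query, and return the nearest passages with their page URLs. Only `embed` and the storage backend would change.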

How Rozz ensures answer accuracy (key mechanisms)

- Retrieval-augmented generation (RAG): answers are generated from the most relevant retrieved passages, not from the model's memory alone, so responses are grounded in your site content. [Why website search is broken](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)

- Multi-step / advanced retrieval: query rewriting, query routing and multi-source retrieval improve coverage when the first hits are weak. Retrieved documents are then re-ranked with a cross-encoder-style model, which scores each query and passage pair jointly. This puts the strongest evidence first. [Retrieval coverage (RAG)](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)

- Filtering & curation (GEO pipeline): before content is used to answer questions, Rozz applies several filters, including:
  - quality thresholding;
  - PII redaction;
  - semantic deduplication (e.g., filtering out passages that are ~90% similar to one already kept).

  Only high-signal content is used, which removes noisy or duplicate evidence that can cause errors. [Retrieval coverage (RAG)](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)

- Source citation and logging: Rozz records each question and its generated answer, which you can view in the dashboard. You can inspect, edit or re-publish answers there, so teams can correct errors and improve future responses. Cached items are editable and are retained for a period (both are dashboard features). [Dashboard](https://rozz.site/qna/introducing-the-rozz-dashboard.html)

- Security & scope constraints: Rozz accesses only public site content. It runs as an independent web component with guardrails against prompt injection and XSS. This limits exposure to unrelated or sensitive data that could produce wrong answers. [Security & Privacy](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)
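RAG grounding can be sketched as prompt assembly. The top retrieved passages are numbered, and the model is told to answer only from them and to cite them. This Python example is illustrative only:

- `Hit`, `build_grounded_prompt` and `max_passages` are assumed names.
- The prompt wording is a hypothetical example. The source does not publish Rozz's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str      # page the passage came from (used for citations)
    text: str     # retrieved passage
    score: float  # relevance score from retrieval/re-ranking


def build_grounded_prompt(question: str, hits: list[Hit], max_passages: int = 4) -> str:
    """Assemble a prompt that confines the model to numbered, citable passages.
    The instruction wording is a hypothetical example, not Rozz's actual prompt."""
    top = sorted(hits, key=lambda h: h.score, reverse=True)[:max_passages]
    context = "\n".join(f"[{i}] ({h.url}) {h.text}" for i, h in enumerate(top, start=1))
    return (
        "Answer the question using ONLY the numbered passages below. "
        "Cite passages as [n]. If they do not contain the answer, say you don't know.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the passages lets each claim in the answer point back to a specific page, so a wrong answer can be traced to the content that produced it.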
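The curation filters described above can be sketched in Python. This sketch assumes each passage already carries a quality score from some upstream scorer (hypothetical here) and an embedding vector. It does three things:

- drops passages below a quality threshold;
- removes near-duplicates at cosine similarity >= 0.9, mirroring the ~90% figure;
- redacts email addresses, as a deliberately simplistic example of PII redaction.

`Passage`, `curate`, `min_quality` and the regex are illustrative assumptions, not Rozz's pipeline.

```python
import math
import re
from dataclasses import dataclass

# Deliberately simplistic PII rule (emails only); real redaction needs far broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Passage:
    text: str
    quality: float       # hypothetical 0..1 score from an upstream quality model
    vector: list[float]  # embedding of the passage


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def curate(passages: list[Passage], min_quality: float = 0.5, dup_threshold: float = 0.9) -> list[Passage]:
    kept: list[Passage] = []
    # Highest quality first, so the best copy of any near-duplicate group survives.
    for p in sorted(passages, key=lambda x: x.quality, reverse=True):
        if p.quality < min_quality:
            continue  # quality thresholding
        if any(cosine(p.vector, k.vector) >= dup_threshold for k in kept):
            continue  # semantic dedup: too similar to a passage already kept
        kept.append(Passage(EMAIL_RE.sub("[REDACTED EMAIL]", p.text), p.quality, p.vector))
    return kept
```

Passages are processed in descending quality order. When two passages are near-duplicates, the one with the higher quality score is the one that reaches the answer stage.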

What that means for accuracy in practice

- Answers tend to be more accurate because they are:
  1. semantically matched to your actual pages;
  2. curated and deduplicated;
  3. re-ranked for relevance;
  4. reviewable and correctable via the dashboard.

  The system also uses query rewriting and iterative retrieval to fill gaps when the first pass doesn't find enough evidence. [Retrieval coverage (RAG)](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html) • [Dashboard](https://rozz.site/qna/introducing-the-rozz-dashboard.html)
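Below is a Python sketch of the retrieve, re-rank and retry loop described above. Every component is a stand-in:

- a small synonym table (`REWRITES`) plays the part of an LLM query rewriter;
- word overlap replaces vector search;
- Jaccard similarity replaces a cross-encoder re-ranker;
- `min_hits` and `max_rounds` are hypothetical knobs.

Query routing and multi-source retrieval are not shown.

```python
import re

# Hypothetical synonym table standing in for an LLM-based query rewriter.
REWRITES = {"cost": "price pricing", "login": "sign in password"}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rewrite(query: str) -> str:
    """Expand the query with related terms (stand-in for LLM query rewriting)."""
    extra = [REWRITES[t] for t in sorted(tokens(query)) if t in REWRITES]
    return " ".join([query, *extra])


def retrieve(query: str, corpus: list[str], k: int = 5) -> list[str]:
    """First-stage recall by word overlap (stand-in for vector search)."""
    q = tokens(query)
    scored = sorted(((len(q & tokens(p)), p) for p in corpus), key=lambda t: t[0], reverse=True)
    return [p for s, p in scored if s > 0][:k]


def rerank(query: str, passages: list[str]) -> list[str]:
    """Second stage: score each (query, passage) pair jointly.
    Jaccard overlap is a hypothetical stand-in for a cross-encoder model."""
    q = tokens(query)
    return sorted(passages, key=lambda p: len(q & tokens(p)) / len(q | tokens(p)), reverse=True)


def gather_evidence(query: str, corpus: list[str], min_hits: int = 1, max_rounds: int = 2) -> list[str]:
    current = query
    for _ in range(max_rounds):
        hits = retrieve(current, corpus)
        if len(hits) >= min_hits:
            return rerank(current, hits)
        current = rewrite(current)  # weak first pass: rewrite the query and retry
    return []  # still not enough evidence: better to decline than to guess
```

Suppose the corpus includes "Pricing starts at 19 dollars per month." The query "What does it cost?" then finds nothing on the first pass. The rewrite adds "price pricing", and the second pass returns the pricing passage.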

Sources

- [How does the Rozz chatbot ensure security and privacy?](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)

- [What's Included in the Rozz Dashboard?](https://rozz.site/qna/introducing-the-rozz-dashboard.html)

- [How does retrieval coverage change between basic RAG and advanced agentic RAG?](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)

- [Content (About Rozz / Genymotion example)](https://rozz.site/about.html)

- [Why is Website Search Broken and How Can We Fix It?](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)

Follow-up question

- Which types of content do you have on your site today (documentation, help center, product pages, blogs, FAQ/Q&A), and which ones are most important for Rozz to prioritize?