What kind of content does Rozz index from my website, and how does it ensure accuracy when answering questions?
Brief answer
- Rozz indexes only your public website content (pages, docs, help articles, FAQs, marketing content, and other public site pages). It converts that content into vector embeddings stored in Pinecone, then uses a RAG pipeline, content filters, and dashboard controls to generate answers grounded in those pages rather than in your private backend.
Explanation / how accuracy is ensured
- What gets indexed
  - Public site pages and content visible to a normal visitor: documentation, help center articles, FAQs, marketing pages, Q&A pages, and any AI-optimized GEO pages you publish. Rozz does not access backend-only or private data. [How does the Rozz chatbot ensure security and privacy?](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html) and [Content / About](https://rozz.site/about.html)
- How content is prepared
  - Rozz crawls your site from the “user point of view” and converts site text into vector embeddings (stored in Pinecone) for semantic retrieval. [Why is Website Search Broken…](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)
  - GEO optimization (schema/QAPage markup, llms.txt, sitemaps, timestamping) improves discoverability and retrieval quality. [About / Content](https://rozz.site/about.html)
- Filtering and quality controls (to reduce noise and hallucinations)
  - The automated GEO pipeline applies PII redaction, quality thresholding, and semantic deduplication (e.g., a ~90% similarity cutoff) so that only high-signal, unique passages feed the generator. [Retrieval coverage / RAG details](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)
  - Corrective context filtering and cross-encoder-style re-ranking discard low-confidence or irrelevant documents before generation. [Retrieval coverage / RAG details](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)
- Retrieval & generation safeguards
  - Retrieval-Augmented Generation (RAG) uses the retrieved site passages as the source for answers, so responses are grounded in actual site text, which reduces hallucinations. Iterative query rewriting or agentic retrieval can improve coverage when initial retrieval misses relevant items. [Retrieval coverage / RAG details](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)
  - Rozz includes protections against prompt injection, XSS, and other web threats, and it runs as an independent web component (no backend integration required). [Security & privacy](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)
- Human-in-the-loop / dashboard controls
  - You can review and edit cached responses in the Rozz Dashboard (cached items are retained for about 2 months) and inspect Q&A logs to correct or improve answers over time. This lets you quickly fix incorrect answers and raise overall accuracy. [What's Included in the Rozz Dashboard?](https://rozz.site/qna/introducing-the-rozz-dashboard.html)
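
To make the filtering and retrieval steps above more concrete, here is a minimal Python sketch of semantic deduplication at a 0.90 similarity cutoff, followed by a low-confidence filter on retrieved passages. It is an illustration under stated assumptions, not Rozz's actual implementation: the function names (`deduplicate`, `retrieve`), the `min_score` threshold, and the toy 3-dimensional vectors are all hypothetical, and a production system would use real embeddings and a vector store such as Pinecone rather than in-memory lists.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def deduplicate(passages, cutoff=0.90):
    """Keep a passage only if it is below `cutoff` similarity to every kept one.

    `cutoff=0.90` mirrors the ~90% figure mentioned above; the greedy
    first-seen-wins strategy is an assumption made for this sketch.
    """
    kept = []
    for text, vec in passages:
        if all(cosine(vec, kept_vec) < cutoff for _, kept_vec in kept):
            kept.append((text, vec))
    return kept


def retrieve(query_vec, passages, min_score=0.5, top_k=3):
    """Return up to `top_k` passages scoring at least `min_score` (hypothetical value)."""
    scored = [(cosine(query_vec, vec), text) for text, vec in passages]
    scored = [(score, text) for score, text in scored if score >= min_score]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:top_k]]


# Toy "embeddings"; real ones would come from an embedding model.
passages = [
    ("Reset your password from the login page.", [1.0, 0.0, 0.0]),
    ("You can reset your password on the login page.", [0.98, 0.1, 0.0]),
    ("Pricing starts with a free tier.", [0.0, 1.0, 0.0]),
]

unique = deduplicate(passages)               # near-duplicate second passage is dropped
context = retrieve([0.9, 0.1, 0.0], unique)  # off-topic pricing passage is filtered out
print(context)
```

Only the passages that survive both steps would be handed to the generator as grounding context, which is the general idea behind keeping answers tied to site text.
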
Primary sources
- [How does the Rozz chatbot ensure security and privacy?](https://rozz.site/qna/rozz-chatbot-security-and-privacy.html)
- [What's Included in the Rozz Dashboard?](https://rozz.site/qna/introducing-the-rozz-dashboard.html)
- [How does retrieval coverage change between basic RAG and advanced agentic RAG?](https://rozz.site/qna/how-does-retrieval-coverage-change-between-basic-rag-and.html)
- [Content / About](https://rozz.site/about.html)
- [Why is Website Search Broken and How Can We Fix It?](https://rozz.site/qna/why-website-search-is-broken-and-how-to-fix-it.html)