What kind of content does Rozz index from my website, and how does it ensure accuracy when answering questions?

Question

What kind of content does Rozz index from my website, and how does it ensure accuracy when answering questions?

Topic: General

Accepted Answer

Brief answer

Rozz indexes only your public website content: pages, docs, help articles, FAQs, marketing content and other public site pages. It converts that content into vector embeddings (stored in Pinecone) and uses a RAG pipeline, content filters and dashboard controls to generate answers grounded in those pages, not in your private backend.

Explanation / how accuracy is ensured

What gets indexed
Public site pages and content visible to a normal visitor: documentation, help center articles, FAQs, marketing pages, Q&A pages and any AI-optimized GEO pages you publish. Rozz does not access backend-only or private data. (Sources: How does the Rozz chatbot ensure security and privacy?; Content / About)
How content is prepared
Rozz crawls the site as a visitor would (the “user point of view”) and converts the page text into vector embeddings, stored in Pinecone, for semantic retrieval. (Source: Why is Website Search Broken…)
GEO measures (schema/QAPage markup, llms.txt, sitemaps, timestamping) improve discoverability and retrieval quality. (Source: About / Content)
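To make "semantic retrieval" concrete, here is a minimal sketch of ranking pages by vector similarity. It is an illustration only, not Rozz's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory dictionary stands in for a vector store such as Pinecone, and the page URLs and texts are made up.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, pages: dict, top_k: int = 2) -> list:
    """Return the top_k (score, url) pairs, most similar first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), url) for url, text in pages.items()]
    scored.sort(reverse=True)
    return scored[:top_k]

# Hypothetical public pages, as a crawler would see them.
pages = {
    "/pricing": "pricing plans and billing for teams",
    "/docs/install": "how to install the widget on your site",
    "/faq": "frequently asked questions about billing and refunds",
}

results = retrieve("how do I install the widget", pages, top_k=1)
print(results)
```

A real embedding model would also match paraphrases that share no words with the page, which this word-overlap toy cannot do; the ranking logic is the same either way.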
Filtering and quality controls (to reduce noise and hallucinations)
An automated GEO pipeline applies PII redaction, quality thresholding, and semantic deduplication (e.g., a ~90% similarity cutoff) so that only high-signal, unique passages feed the generator. (Source: Retrieval coverage / RAG details)
Corrective context filtering and cross-encoder-style re-ranking discard low-confidence or irrelevant documents before generation. (Source: Retrieval coverage / RAG details)
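The two filtering steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not Rozz's pipeline: `similarity` is a toy word-overlap cosine standing in for embedding similarity and for a cross-encoder relevance score, the 0.9 cutoff mirrors the ~90% figure mentioned above, and `min_score` and the sample passages are hypothetical.

```python
import math
from collections import Counter

def similarity(a: str, b: str) -> float:
    """Toy cosine similarity over word counts (stand-in for a learned model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0

def deduplicate(passages: list, cutoff: float = 0.9) -> list:
    """Keep a passage only if it is below the cutoff against every kept one."""
    kept = []
    for passage in passages:
        if all(similarity(passage, k) < cutoff for k in kept):
            kept.append(passage)
    return kept

def rerank(query: str, passages: list, min_score: float = 0.2) -> list:
    """Score passages against the query, drop weak ones, sort best first."""
    scored = [(similarity(query, p), p) for p in passages]
    strong = [(s, p) for s, p in scored if s >= min_score]
    strong.sort(reverse=True)
    return [p for _, p in strong]

passages = [
    "rozz indexes public pages only",
    "rozz indexes public pages only today",   # near-duplicate of the first
    "refunds are issued within 30 days",
]

unique = deduplicate(passages)
ranked = rerank("which pages does rozz index", unique)
print(unique)
print(ranked)
```

Deduplicating before re-ranking keeps the generator's context from being filled with several copies of the same passage; the relevance threshold then removes passages that survive deduplication but do not address the question.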
Retrieval & generation safeguards
Retrieval-Augmented Generation (RAG) uses the retrieved site passages as the source for answers, so responses are grounded in actual site text, which reduces hallucinations. Iterative query rewriting or agentic retrieval can improve coverage when the initial retrieval misses relevant items. (Source: Retrieval coverage / RAG details)
Rozz includes protections against prompt injection, XSS and other web threats, and runs as an independent web component, so no backend integration is required. (Source: Security & privacy)
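As a rough illustration of how grounding works in a RAG setup, the sketch below assembles a prompt that restricts the model to retrieved site passages and declines when nothing was retrieved. The `Passage` type, the `build_grounded_prompt` function, the fallback wording, and the sample data are all hypothetical; the source does not describe Rozz's actual prompt format, and no model call is made here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Passage:
    url: str   # public page the text came from
    text: str  # passage text as crawled

FALLBACK = "I could not find this on the website."

def build_grounded_prompt(question: str, passages: List[Passage]) -> Optional[str]:
    """Build a prompt that limits the answer to the retrieved passages.

    Returns None when nothing was retrieved, so the caller can reply with
    FALLBACK instead of letting a model answer without sources.
    """
    if not passages:
        return None
    sources = "\n".join(
        f"[{i}] ({p.url}) {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the numbered sources below and cite them by number. "
        f'If they do not contain the answer, reply: "{FALLBACK}"\n\n'
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "What does Rozz index?",
    [Passage("/about", "Rozz indexes public pages.")],
)
print(prompt)
```

Numbering the sources lets the generated answer point back to specific pages, which makes wrong answers easier to trace during review.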
Human-in-the-loop / dashboard controls
You can review and edit cached responses in the Rozz Dashboard (cache items are retained for about 2 months) and inspect Q&A logs to correct or improve answers over time. This lets you quickly fix wrong answers and raise overall accuracy. (Source: What's Included in the Rozz Dashboard?)

One quick question to help me tailor advice: Do you primarily want Rozz to index public documentation/marketing pages, or do you also have internal/private content you’re hoping to make searchable?