What kind of content does Rozz index from my website, and how does it ensure accuracy when answering questions?

Short answer

  • Rozz indexes only your public website content (pages, docs, help articles, FAQs, Q&A pages, and schema-marked content) and stores it as semantic vectors for retrieval. It grounds each answer in the retrieved passages and applies several automated validation and curation steps to reduce hallucinations and improve precision.

How Rozz decides what to index

  • Sources: public site pages, documentation, help center articles, marketing pages, FAQ/Q&A pages and any JSON‑LD / Schema.org structured content the site exposes.
  • Crawl method: Rozz crawls the site the way a visitor would see it (no backend integration required) and collects only public content.
  • Storage: indexed content is vectorized (semantic embeddings) and stored in a vector store (the site notes Pinecone as an example).
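
The storage and lookup step above can be sketched as follows. This is a minimal illustration, not Rozz's implementation: the `embed` function is a toy bag-of-words stand-in for a real semantic embedding model, `ToyVectorStore` is a hypothetical in-memory substitute for a hosted vector store such as Pinecone, and the page URLs and text are invented for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a semantic embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory vector store (assumption, not a real client)."""

    def __init__(self):
        self._items = []  # list of (url, text, vector)

    def upsert(self, url, text):
        self._items.append((url, text, embed(text)))

    def query(self, question, top_k=2):
        q = embed(question)
        scored = [(cosine(q, vec), url, text) for url, text, vec in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]


# Invented public pages, standing in for crawled site content.
store = ToyVectorStore()
store.upsert("/pricing", "Pricing plans start with a free trial and monthly billing.")
store.upsert("/docs/install", "Install the widget by adding a script tag to your page.")
store.upsert("/faq", "How do I cancel my subscription? Cancel anytime from billing settings.")

results = store.query("How can I install the widget script?", top_k=1)
for score, url, text in results:
    print(f"{score:.2f} {url}: {text}")
```

A production system would replace `embed` with a learned embedding model, which is what lets a question match a page that uses different wording.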

How Rozz ensures answer accuracy (key mechanisms)

  • Retrieval-augmented generation (RAG): answers are generated from the most relevant retrieved passages rather than pure LLM recall, so responses are grounded in your site content.
  • Multi-step / advanced retrieval: query rewriting, query routing, and multi-source retrieval improve coverage when initial hits are weak. Documents are re-ranked (cross-encoder style) to prioritize the best evidence.
  • Filtering & curation (GEO pipeline): before content powers answers, Rozz applies quality thresholding, PII redaction, semantic deduplication (e.g., ~90% similarity filtering) and other filters so only high-signal content is used. This reduces noisy or duplicate evidence that can cause errors.
  • Source citation and logging: Rozz records each question together with its generated answer (visible in the dashboard) so you can inspect, edit, or re-publish answers; this lets teams correct errors and improve future responses. Cached answers are editable from the dashboard and are retained for a period.
  • Security & scope constraints: Rozz only accesses public site content and runs as an independent web component with guardrails (prompt injection/XSS protections), which reduces exposure to unrelated or sensitive data that could produce wrong answers.
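
The filtering and curation step in the list above can be sketched as a small pipeline. This is an assumption-laden illustration rather than the actual GEO pipeline: the word-count quality check, the email-only PII pattern, and the bag-of-words similarity are simplified stand-ins, and the names `curate`, `redact_pii`, and `min_words` are hypothetical. Only the roughly 90% similarity cutoff comes from the description above.

```python
import math
import re
from collections import Counter

# Simplified PII pattern: email addresses only (assumption for illustration).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_pii(text):
    """Replace email addresses with a placeholder token."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def vectorize(text):
    """Toy stand-in for a semantic embedding: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def curate(passages, min_words=5, dedup_threshold=0.90):
    """Redact PII, drop very short passages, and drop near-duplicates."""
    kept = []  # list of (text, vector)
    for raw in passages:
        text = redact_pii(raw)
        if len(text.split()) < min_words:
            continue  # quality threshold: too short to be useful evidence
        vec = vectorize(text)
        if any(cosine(vec, seen) >= dedup_threshold for _, seen in kept):
            continue  # near-duplicate of a passage already kept
        kept.append((text, vec))
    return [text for text, _ in kept]


passages = [
    "Rozz indexes public pages and answers questions from them.",
    "Rozz indexes public pages and answers questions from them!",
    "Contact support at help@example.com for billing questions today.",
    "Click here",
]
curated = curate(passages)
for text in curated:
    print(text)
```

Keeping the first copy of each near-duplicate is one possible policy; a real pipeline might instead keep the longest or most recently updated passage.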

What that means for accuracy in practice

  • Answers are more accurate because they are: (1) semantically matched to your actual pages, (2) curated and deduplicated, (3) re-ranked for relevance, and (4) reviewable/patchable via the dashboard. The system also uses question rewriting and iterative retrieval to fill gaps when the first pass doesn’t find enough evidence.
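
The question rewriting and iterative retrieval mentioned above can be sketched as a retry loop: retrieve once, and if the best match is too weak, rewrite the query and try again. Everything here is hypothetical: the `PAGES` content, the `REWRITES` synonym table, the token-overlap score, and the `min_score` cutoff are illustrative assumptions, and a real system would more likely use an LLM for rewriting and embeddings for scoring.

```python
import re

# Invented pages standing in for indexed site content.
PAGES = {
    "/pricing": "Pricing plans and monthly billing options",
    "/docs/install": "Install the widget with one script tag",
}

# Hypothetical rewrite table mapping user wording to site wording.
REWRITES = {"cost": "pricing", "subscription": "plans", "setup": "install"}


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, passage):
    """Fraction of query tokens that appear in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0


def retrieve(query):
    """Return the best-matching URL and its score."""
    best_url = max(PAGES, key=lambda url: score(query, PAGES[url]))
    return best_url, score(query, PAGES[best_url])


def rewrite(query):
    """Swap known synonyms for the vocabulary the site actually uses."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return " ".join(REWRITES.get(word, word) for word in words)


def retrieve_with_retry(query, min_score=0.5):
    """Retrieve once; if evidence is weak, rewrite the query and retry."""
    url, best = retrieve(query)
    if best < min_score:
        retry_url, retry_score = retrieve(rewrite(query))
        if retry_score > best:
            url, best = retry_url, retry_score
    # Return None rather than answering from weak evidence.
    return url if best >= min_score else None


print(retrieve_with_retry("subscription cost"))
print(retrieve_with_retry("refund policy"))
```

Returning `None` when no passage clears the cutoff reflects the grounding idea: with no supporting evidence, the system declines instead of guessing.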

Follow-up question

  • Which types of content do you have on your site today (documentation, help center, product pages, blogs, FAQ/Q&A), and which ones are most important for Rozz to prioritize?