What AI bots read, what they ignore, and what an AI site is actually for. Part 1: Citation bots

We built rozz.genymotion.com to market to AI agents, the way genymotion.com markets to humans.

This article examines what bots actually fetch, what they do not fetch, and what we learned from the observed behavior.

Teaser: Q&As are popular (61% of the Q&A corpus fetched in 21 days). llms.txt is not (0 ClaudeBot fetches). sitemap.xml got 209 ClaudeBot fetches.

What’s an AI site?

rozz.genymotion.com exists alongside genymotion.com. The marketing website serves humans and is organized in a standard way: Products, Resources, Pricing, etc.

The AI site serves AI agents: ChatGPT, Claude, Perplexity, and the crawlers that feed them. It presents content in a different way, and that way is being researched live in this Insights series.

This blog series reports weekly findings from the AI-site logs. This article zooms out and uses the past three weeks of clean bot-log data, covering Apr 29 – May 19, 2026.

Quick scope note

The bot logs cover rozz.genymotion.com only. The claims in this article concern what bots do on the AI site, not on the human site.

What we built

The rozz.genymotion.com AI site structure includes the following components.

545 content pages (/pages/) imported and rewritten from existing Genymotion documentation, support articles, and blog tutorials.
262 Q&A pages (/qna/) generated from real chatbot conversations. Each Q&A page corresponds to a question a user actually asked.
16 topic listings (/topics/) grouping content by canonical topic.
2 runbooks (/runbooks/) written for AI agents with terminal access: gmsaas for cloud, gmtool for desktop.
1 homepage with featured Q&As embedded in FAQPage JSON-LD.
AI-native discovery files: llms.txt, llms-full.txt, per-topic sitemaps.
JSON APIs: /api/qna.json, /api/pages.json, /api/topics.json, /api/.json.
Markdown alternatives: /index.md for the homepage.
Browse-all indexes: /pages/index.html, /qna/index.html, etc.
Traditional SEO infrastructure: robots.txt and sitemap.xml.
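
For readers unfamiliar with the FAQPage JSON-LD mentioned in the homepage item, here is a minimal sketch of how such a block could be generated. The schema.org types (FAQPage, Question, Answer) are standard; the helper name `build_faq_jsonld`, the question wording, and the answer text are placeholders, not the markup actually served by rozz.genymotion.com.

```python
import json


def build_faq_jsonld(qas):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qas
        ],
    }


# Placeholder content: both strings are hypothetical, not real site copy.
featured = [
    ("What pricing plans are available for Genymotion?", "PLACEHOLDER ANSWER TEXT"),
]

faq = build_faq_jsonld(featured)

# JSON-LD is embedded in the page inside a script tag of this type.
script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(faq, ensure_ascii=False)
    + "</script>"
)
print(script_tag)
```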

Some of the content supports an AI agent making a query on behalf of a human user. Some of the content supports coding agents directly: Genymotion is a tool for developers, and the runbooks are written for those agents.

Some of the content supports crawler bots, which build the indexes that feed AI agents.

The point of an AI site is that it has multiple consumers: AI agents, coding agents, and crawler bots. The AI site caters to each consumer and observes what sticks.

In this article, the focus is on citation bots. Citation bots provide the most direct business value because their fetches occur during actual conversations with users.

What citation bots fetched

In the 21-day window, ChatGPT-User and Claude-User made 1,517 content fetches between them. The distribution has the shape below.

| Slice | Share of fetches | Share of corpus |
| --- | --- | --- |
| Top 1 URI | ~8% | 0.1% |
| Top 10 | ~37% | 1.2% |
| Top 50 | ~78% | 6.2% |
| Top 100 | ~93% | 12.4% |

A small number of URIs do most of the work.
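
A concentration table like the one above can be derived from per-URI fetch counts. The sketch below shows one way to do it; the function name `top_n_shares` and the toy counts are illustrative assumptions, not the real log data. The corpus-share column in the table is consistent with a corpus of 807 URIs (262 Q&As plus 545 pages).

```python
from collections import Counter


def top_n_shares(fetch_counts, corpus_size, ns=(1, 10, 50, 100)):
    """Return {n: (share_of_fetches, share_of_corpus)} for the top-n URIs."""
    total = sum(fetch_counts.values())
    ranked = sorted(fetch_counts.values(), reverse=True)
    shares = {}
    for n in ns:
        head = sum(ranked[:n])  # fetches captured by the n most-fetched URIs
        shares[n] = (head / total, n / corpus_size)
    return shares


# Hypothetical counts for a 10-URI corpus, of which 4 URIs were fetched.
toy_counts = Counter({"/qna/a": 50, "/qna/b": 30, "/pages/c": 15, "/pages/d": 5})

shares = top_n_shares(toy_counts, corpus_size=10, ns=(1, 2))
for n, (fetch_share, corpus_share) in shares.items():
    print(f"top {n}: {fetch_share:.0%} of fetches from {corpus_share:.0%} of corpus")
```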

What sits in the head

The most-fetched Q&A is the pricing Q&A, with 123 fetches.

After the pricing Q&A come the rooted-device Q&A, the system-requirements Q&As, the SaaS-vs-Desktop comparison, and the macOS compatibility Q&A.

On the pages side, the Burp Suite security-testing tutorial has 130 fetches. Among the topic listings, /topics/android-version-selection has 251 fetches.

This is what the head of the AI site does: pricing, requirements, compatibility, and a security-testing tutorial.

The pro-enterprise skew introduced in article 13 is paying off.

Two surfaces, two patterns

The AI site has two main types of content pages: Q&A pages at /qna/ and cleaned website pages at /pages/.

Separating Q&As from pages reveals two different shapes.

| Metric | /qna/ | /pages/ |
| --- | --- | --- |
| Corpus | 262 | 545 |
| Fetched URIs | 160 | 132 |
| Coverage | 61% | 24% |
| Total fetches | 930 | 587 |
| Top 1 share | 13% | 22% |
| Top 10 share | 41% | 53% |
| 1-fetch share (of fetched) | 35% | 51% |
| Dark inventory (never fetched) | 39% | 76% |

Q&A coverage is about 2.5× page coverage (61% vs. 24%). The Q&A distribution spreads fetches across more items, while the pages Pareto is much steeper.

One tutorial accounts for 22% of all page fetches. Three-quarters of the page corpus went unread in three weeks.
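
The coverage and dark-inventory figures can be rechecked from the corpus and fetched-URI counts in the table. The snippet below is only that arithmetic, using the table's own numbers; the variable names are illustrative.

```python
# Figures taken from the /qna/ vs. /pages/ table.
corpus = {"qna": 262, "pages": 545}
fetched = {"qna": 160, "pages": 132}

# Coverage: share of the corpus fetched at least once in the window.
coverage = {k: fetched[k] / corpus[k] for k in corpus}

# Dark inventory: URIs never fetched in the window.
dark = {k: corpus[k] - fetched[k] for k in corpus}

# How much higher Q&A coverage is than page coverage.
ratio = coverage["qna"] / coverage["pages"]

for k in corpus:
    print(f"{k}: coverage {coverage[k]:.0%}, dark {dark[k]} ({dark[k] / corpus[k]:.0%})")
print(f"coverage ratio: {ratio:.1f}x")
```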

The next two sections attempt to explain the difference.

Q&As are the answers

Each Q&A in the corpus was generated from a real chatbot conversation. Every Q&A title is a question someone actually asked, in the user’s own words. The idea is to match the content to the queries users actually type into AI engines.

Top 10 Q&As (last 21 days)

| Fetches | Q&A |
| --- | --- |
| 123 | what-pricing-plans-are-available-for-genymotion |
| 66 | does-genymotion-provide-rooted-device... |
| 50 | how-much-memory-do-i-need-to-have-20-virtual-devices |
| 26 | what-are-genymotion-desktop-s-system-requirements |
| 24 | what-are-the-system-requirements |
| 23 | what-are-genymotion-s-pricing-options-for-saas-and-desktop |
| 19 | what-are-the-costs-for-using-genymotion-saas |
| 19 | does-the-emulator-work-with-the-latest-mac-os |
| 18 | how-do-i-install-genymotion-desktop-on-windows-macos-or-linux |
| 17 | how-to-run-the-emulator-in-the-cloud |

This is what people ask AI about Genymotion when considering whether to use it.

There is a tail underneath that head. 100 Q&As were fetched once or twice each in the window. Examples of single-fetch Q&As include:

“I will be having 100 instances within a year — need to discuss”
“I am a reseller who purchased Genymotion Desktop business licenses...”
“I’m using Genymotion SaaS, can I extend the trial beyond the 6 days?”
“Does pay-as-you-go pricing plan allow root access, Google Play...”
“How do I create and configure arm64 cloud SaaS devices...”

Each example reads as one specific person evaluating Genymotion for one specific commercial use case. The Q&A is fetched once, by one AI session, on behalf of one buyer.

This looks like a long-tail content pattern. The top 10 Q&As account for 41% of fetches. The remaining 150 fetched Q&As account for the other 59%.

The tail is bigger than the head.
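
The head-versus-tail split can be checked from the top-10 table and the /qna/ totals reported earlier. The snippet uses only those published figures; the per-URI average for the tail is a derived number, not one reported from the logs.

```python
# Fetch counts from the "Top 10 Q&As" table.
top10 = [123, 66, 50, 26, 24, 23, 19, 19, 18, 17]

total_qna_fetches = 930  # total /qna/ fetches in the window
fetched_qna_uris = 160   # distinct Q&As fetched at least once

head = sum(top10)                            # 385 fetches
tail = total_qna_fetches - head              # 545 fetches
tail_uris = fetched_qna_uris - len(top10)    # 150 Q&As

head_share = head / total_qna_fetches
tail_share = tail / total_qna_fetches

print(f"head: {head} fetches ({head_share:.0%})")
print(f"tail: {tail} fetches ({tail_share:.0%}) across {tail_uris} Q&As")
print(f"average fetches per tail Q&A: {tail / tail_uris:.1f}")
```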

Pages are the sources

The /pages/ corpus behaves differently. The Burp Suite tutorial is the most accessed page, with 130 fetches, or 22% of all page fetches.

After that, the numbers drop fast. The Genymotion documentation hub has 48 fetches and the Linux install guide has 27. A handful of requirements, install, and root-access pages follow, most of them in the 12–17 range.

122 pages got 10 or fewer fetches, and more than half of those got exactly one fetch each. 413 pages (76%) got nothing.

Role of a corpus where most items do not get fetched directly

What is the role of a corpus where most items do not get fetched directly?

One stat offers an answer: 79% of sessions that fetched a Q&A also fetched at least one source page cited in that Q&A’s “Based on these sources” sidebar.

One interpretation is that the model verifies the Q&A answer by pulling the cited source page in the same session. The /pages/ content gives the Q&A pages their credibility.
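
A co-fetch rate of this kind could be computed roughly as follows. This is a sketch under assumptions: sessions are already grouped into sets of fetched URI paths, and `cited_sources` is a hypothetical mapping from each Q&A URI to the page URIs in its sources sidebar. Neither structure, nor the example paths, comes from the actual analysis pipeline.

```python
def cofetch_rate(sessions, cited_sources):
    """Share of Q&A-fetching sessions that also fetched a cited source page.

    sessions: iterable of collections of URI paths, one per session.
    cited_sources: dict mapping a Q&A URI to the set of page URIs it cites.
    """
    qna_sessions = 0
    verified = 0
    for uris in sessions:
        uris = set(uris)
        qnas = [u for u in uris if u.startswith("/qna/")]
        if not qnas:
            continue  # session never touched a Q&A, so it is out of scope
        qna_sessions += 1
        # Count the session once if any fetched Q&A had a cited page fetched too.
        if any(cited_sources.get(q, set()) & uris for q in qnas):
            verified += 1
    return verified / qna_sessions if qna_sessions else 0.0


# Hypothetical data for illustration only.
cited = {
    "/qna/pricing": {"/pages/pricing-plans"},
    "/qna/root": {"/pages/root-access"},
}
toy_sessions = [
    {"/qna/pricing", "/pages/pricing-plans"},            # Q&A plus its source
    {"/qna/pricing"},                                    # Q&A only
    {"/pages/linux-install"},                            # no Q&A, ignored
    {"/qna/root", "/pages/root-access", "/pages/other"}, # Q&A plus its source
]

rate = cofetch_rate(toy_sessions, cited)
print(f"{rate:.0%} of Q&A sessions also fetched a cited source")
```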

The silent 76% is not dead weight. Each page is a potential source citation for a Q&A.

When the Q&A gets asked, the page gets fetched. When the Q&A is not asked, the page sits.

Pages that are not cited in any sources sidebar could be trimmed in the future, if such pages exist.

Page single-fetch tail samples

Page single-fetch tail samples include:

react-native-ui-testing-with-detox-genymotion-saas (CI/CD integration)
how-to-install-magisk-on-genymotion (advanced rooting)
how-many-genymotion-virtual-devices-can-i-launch-per-aws-ec2 (cloud capacity)

A simple way to understand the /pages/ content: it is the supply of product and marketing information available to the Q&As. The advice is not to cut it, because the market decides which pages to keep.

Summary

Q&As are demand-driven. Q&As are fetched as primary answers. Q&As are distributed in a real long tail.

Pages are supply-driven. Pages are fetched as source citations. Pages are distributed with a concentrated head and a large dormant body.

Q&As and pages play different roles, and they are complementary.

What citation bots didn’t fetch

Citation bots rarely fetch the discovery layer, meaning the reference files: robots.txt, sitemaps, llms.txt, and the various JSON APIs we published.

Citation bots do not browse around the website. Citation bots fetch URLs that the model already knows.

The discovery work is done by crawler bots upstream. Crawler bots include ClaudeBot, GPTBot, OAI-SearchBot, and PerplexityBot. These crawler bots build the indexes that citation bots draw from.
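
Separating citation bots from crawler bots in a log usually comes down to matching tokens in the User-Agent header. The sketch below uses the bot names listed in this article; the function name and the sample User-Agent strings are simplified illustrations, not the exact strings these bots send.

```python
# Bot names as listed in this article.
CITATION_BOTS = ("ChatGPT-User", "Claude-User")
CRAWLER_BOTS = ("ClaudeBot", "GPTBot", "OAI-SearchBot", "PerplexityBot")


def classify_user_agent(user_agent):
    """Return (role, token), where role is 'citation', 'crawler', or 'other'."""
    ua = user_agent.lower()
    for role, tokens in (("citation", CITATION_BOTS), ("crawler", CRAWLER_BOTS)):
        for token in tokens:
            if token.lower() in ua:
                return role, token
    return "other", None


# Simplified, illustrative User-Agent strings.
samples = [
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]
for ua in samples:
    print(classify_user_agent(ua), "<-", ua)
```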

Session entry behavior

39% of ChatGPT-User sessions start with a direct content-URL fetch: the bot knew the URL it wanted before the session began.
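
A session-entry share of this kind depends on how sessions are defined, which this article does not spell out. The sketch below assumes one plausible definition: requests from the same client belong to one session until a 30-minute gap occurs, and a "direct content-URL fetch" means the first request targets /qna/ or /pages/ rather than an index or discovery file. Both the gap and the function names are assumptions.

```python
CONTENT_PREFIXES = ("/qna/", "/pages/")
SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout, not from the article


def sessionize(requests, gap=SESSION_GAP_SECONDS):
    """Group (timestamp, client, uri) tuples into per-client lists of URIs."""
    sessions = []
    last_seen = {}  # client -> (last timestamp, that client's current session)
    for ts, client, uri in sorted(requests):
        prev = last_seen.get(client)
        if prev is None or ts - prev[0] > gap:
            current = []  # first request, or the gap was exceeded: new session
            sessions.append(current)
        else:
            current = prev[1]
        current.append(uri)
        last_seen[client] = (ts, current)
    return sessions


def is_content_url(uri):
    """True for content pages, excluding the browse-all index pages."""
    return uri.startswith(CONTENT_PREFIXES) and not uri.endswith("/index.html")


def direct_entry_share(sessions):
    """Share of sessions whose first request is a content URL."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if is_content_url(s[0])) / len(sessions)


# Hypothetical requests: (seconds since start, client key, URI path).
toy_requests = [
    (0, "a", "/qna/example-question"),
    (10, "a", "/pages/example-page"),
    (0, "b", "/robots.txt"),
    (5, "b", "/qna/example-question"),
    (4000, "a", "/"),  # more than 30 minutes later: a new session for client a
]

toy_sessions = sessionize(toy_requests)
share = direct_entry_share(toy_sessions)
print(f"{len(toy_sessions)} sessions, {share:.0%} start on a content URL")
```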

Teaser for next articles

The AI crawler bots seem to heavily prefer traditional SEO infrastructure over the AI-native discovery standard.

In the same 21-day window, ClaudeBot fetched sitemap.xml 209 times and llms.txt zero times.

OAI-SearchBot and GPTBot fetched sitemap.xml 16 times combined and llms.txt twice each.

We published the AI-native files because the standard exists. The observations show that they are not really used yet.

What the fetched content looks like

In the 21-day sample, some consistent patterns emerged.

Decomposition by product line

The popular Q&As split the answer by product line, under labels such as “Genymotion Desktop:” and “Genymotion SaaS (Cloud):”.

The popular Q&As also split by persona, with labels such as “Ideal for: Occasional users, pilots” and “Ideal for: Enterprises needing bespoke setups”.

The assumption is that the LLM can then easily pick the content it needs to answer the question.

Tables for comparisons

The SaaS-vs-Desktop Q&A is an 8-row × 3-column table. The table covers Hosting, Scalability, Collaboration, Automation, Maintenance, Use Cases, Cost Model, and Security & Compliance.

The rooted-device Q&A is a 3×3 task / need / how table.

LLMs seem to be good at reading tables.

User-voice question titles

Some Q&A titles preserve typos, missing words, run-ons, and broken grammar.

The titles are real chatbot input, and this seems to register with the LLMs.

Rewriting rules for the pages wording

The wording of the pages was rewritten according to fluency rules.

The fluency rules include: bullets-not-paragraphs, tables-not-prose, every-sentence-standalone, no anaphora (“this”, “the above”), no cross-paragraph dependencies.
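
Some of these rules can be checked mechanically. As an illustration of the "no anaphora" rule only, the sketch below flags sentences that open with a back-pointing word or contain "the above". The word list, the sentence splitter, and the function name are assumptions; this is a crude lint, not the rewriting pipeline used for the AI site.

```python
import re

# Sentence openers that usually point back at earlier text, plus "the above".
# The word list is an assumption and will produce false positives.
ANAPHORA = re.compile(
    r"^(this|that|these|those|it|they)\b|\bthe above\b",
    re.IGNORECASE,
)


def flag_anaphora(text):
    """Return the sentences that appear to depend on earlier context."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if ANAPHORA.search(s)]


# Illustrative sample text.
sample = (
    "gmsaas is for cloud. It is covered by a runbook. "
    "The above applies to gmtool as well."
)
flagged = flag_anaphora(sample)
for sentence in flagged:
    print("rewrite as standalone:", sentence)
```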

The compressed form is less pleasant for humans to read. Human readability is not the point.

The target reader is an AI agent. The AI agent extracts content and writes its own prose when chatting with a human.

Human-readable prose still exists. It lives on genymotion.com.

The AI site is generated from the human-readable material via the rewriting pipeline. Canonical URLs on every AI-site page point back to genymotion.com.

The AI site is not a replacement for the human site. It is the same content for a different kind of user.

What this means for a B2B company building an AI site

Three takeaways hold up against the data.

1. Q&As do more work than pages, and the gap is large

Q&As do more work than pages: coverage is 2.5× higher, the Pareto is gentler, and the distribution has a real long tail.

Q&As generated from observed user questions earn more bot traffic per page than rewritten existing content. The Q&As represent user demand, that is, what users are actually asking about. Pages are used for verification.

Both Q&As and pages are needed, but they are not equal contributors to citations.

2. The compressed form is the content; the marketing prose lives elsewhere

An AI site is not a human website with structured data added. It is a parallel surface that rewrites content for extraction.

The human site keeps doing the human job. The AI site is a different deliverable with a different consumer.

3. Don’t bet on AI-native discovery as your primary channel yet

Publishing llms.txt and exposing JSON APIs is fine.

But in mid-2026 the work is still being done by robots.txt and sitemap.xml, the same as for the last 25 years of web SEO.

Use traditional SEO infrastructure to reference the AI site, because bots still use it.
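
In practice, referencing the AI site through traditional infrastructure can be as simple as a Sitemap line in robots.txt. The sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` to confirm the sitemap is discoverable. The file content is an assumption for illustration, not the robots.txt actually served by rozz.genymotion.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow all bots and advertise the sitemap.
ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://rozz.genymotion.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the Sitemap URLs declared in the file (Python 3.8+).
sitemaps = parser.site_maps()
print("sitemaps:", sitemaps)

# can_fetch() reports whether a given user agent may request a URL.
print("ClaudeBot allowed:", parser.can_fetch("ClaudeBot", "https://rozz.genymotion.com/qna/"))
```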

Simpler version

An AI site is a separate property with a different design target than the marketing website.

The shape of the content, the navigation, the discovery layer, and the audience all differ.

Treating the AI site as “the same content with JSON-LD” misses what it actually is. It also misses what bots actually use.

Data source and metadata

Data source: CloudFront access logs for rozz.genymotion.com, April 29 – May 19, 2026 (21 days).

ChatGPT-User and Claude-User content fetches only.

The page responses are 200-OK only.

Corpus inventory is reconciled against the live AI-site URI registry.
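
The filters listed above (citation bots only, content URLs only, 200-OK only) could be applied to CloudFront logs along these lines. The sketch assumes the standard CloudFront log layout: tab-separated rows, a `#Fields:` header naming the columns, and a URL-encoded user agent. The sample log uses a reduced, hypothetical field set, and the function name is illustrative.

```python
from urllib.parse import unquote

CITATION_TOKENS = ("ChatGPT-User", "Claude-User")
CONTENT_PREFIXES = ("/qna/", "/pages/")


def iter_citation_fetches(log_text):
    """Yield (bot_token, uri) for 200-OK content fetches by citation bots."""
    fields = None
    for line in log_text.splitlines():
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()  # column names for data rows
            continue
        if line.startswith("#") or not line.strip() or fields is None:
            continue
        row = dict(zip(fields, line.split("\t")))
        if row.get("sc-status") != "200":
            continue  # keep 200-OK responses only
        uri = row.get("cs-uri-stem", "")
        if not uri.startswith(CONTENT_PREFIXES):
            continue  # keep /qna/ and /pages/ content only
        agent = unquote(row.get("cs(User-Agent)", ""))
        for token in CITATION_TOKENS:
            if token in agent:
                yield token, uri
                break


# Hypothetical log excerpt with a reduced field set.
rows = [
    ("2026-05-01", "10:00:00", "/qna/example-question", "200", "Mozilla/5.0%20(compatible;%20ChatGPT-User/1.0)"),
    ("2026-05-01", "10:00:05", "/llms.txt", "200", "Mozilla/5.0%20(compatible;%20ClaudeBot/1.0)"),
    ("2026-05-01", "10:01:00", "/pages/missing-page", "404", "Mozilla/5.0%20(compatible;%20Claude-User/1.0)"),
    ("2026-05-01", "10:02:00", "/pages/example-page", "200", "Mozilla/5.0%20(compatible;%20Claude-User/1.0)"),
]
sample_log = "#Version: 1.0\n#Fields: date time cs-uri-stem sc-status cs(User-Agent)\n"
sample_log += "\n".join("\t".join(r) for r in rows)

fetches = list(iter_citation_fetches(sample_log))
print(fetches)
```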

Author

Author: Adrien Schmidt, CEO, ROZZ.

Serial tech entrepreneur with 10+ years of experience building AI systems. Previously founded Squid Solutions and built AI products such as Aristotle, a conversational big data analytics chatbot. Also built products for eBay and Cartier, including an AR jewelry try-on device for Cartier.

Entry date and data period

May 19, 2026. Data period: Apr 29 – May 19, 2026 (21 days).