Entry #10 · Apr 8, 2026
Three citation bots. Ten articles. Here’s what we learned about an AI Site.
All three major citation bots are now active on the site. Perplexity-User appeared on April 5; it is a single data point so far. ChatGPT-User, Claude-User, and Perplexity-User are three AI platforms retrieving content from the same AI site during live user sessions. Together they made 9,250 citation requests over 90 days. Three months ago there were none.
This is the tenth article in a weekly series documenting what happens when you build an AI site for a real product. The product is Genymotion, an Android emulator used by developers worldwide. The AI site, rozz.genymotion.com, is built by Rozz. Every data point in this series comes from CloudFront logs on rozz.genymotion.com.
Ten articles is a good moment to step back and lay out what we actually learned. Every week we measured traffic, made changes to address issues, and learned from the results. This article summarizes those learnings. For those of you working on GEO, here are actual data points and observations on what works.
The numbers
90-day totals (Jan 8 – Apr 8, 2026)
| Category | Requests | What it means |
| --- | --- | --- |
| Citation bots | 9,250 | Real users getting answers from the AI site |
| Training bots | 11,590 | Content collected for model training |
| Index bots | 1,839 | Building retrieval indexes |
| Total LLM bot requests | 22,679 | — |
Three citation pipelines, all active
| Platform | Crawl bot | Citation bot | Total citations | First citation | Current weekly |
| --- | --- | --- | --- | --- | --- |
| OpenAI | GPTBot (2,324) | ChatGPT-User (9,225) | 9,225 | Late January | ~1,100 |
| Anthropic | ClaudeBot (3,320) | Claude-User (24) | 24 | March 25 | ~10 |
| Perplexity | PerplexityBot (721) | Perplexity-User (1) | 1 | April 5 | Just started |
ChatGPT citation rate
Using the same tool before and after, we measured a ChatGPT citation rate of 14% before the AI site and 83% after. However, we do not fully trust these numbers: chat histories differ, locations differ, and multiple other factors affect citation rates. We prefer to focus on actual bot visits to the AI site, which can be measured precisely.
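Counting bot visits from access logs can be sketched as follows. This is a minimal illustration, not the pipeline used for this series. It assumes tab-separated log lines preceded by a `#Fields:` header that names a `cs(User-Agent)` column, with the user agent percent-encoded. The sample lines are made up and use a reduced set of columns, and the `citation` / `crawl` grouping simply mirrors the bot names in the table above.

```python
from collections import Counter
from typing import Optional, Tuple
from urllib.parse import unquote

# Bot names taken from this article; the two-way grouping is a simplification.
CITATION_BOTS = ("ChatGPT-User", "Claude-User", "Perplexity-User")
CRAWL_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def classify(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (category, bot name) for a known bot, or None for other traffic."""
    ua = user_agent.lower()
    for name in CITATION_BOTS:
        if name.lower() in ua:
            return ("citation", name)
    for name in CRAWL_BOTS:
        if name.lower() in ua:
            return ("crawl", name)
    return None


def count_bots(log_lines):
    """Count requests per (category, bot) from tab-separated log lines."""
    counts = Counter()
    ua_index = None
    for line in log_lines:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            # The header lists column names separated by whitespace.
            ua_index = line.split(":", 1)[1].split().index("cs(User-Agent)")
            continue
        if not line or line.startswith("#") or ua_index is None:
            continue
        fields = line.split("\t")
        if len(fields) <= ua_index:
            continue
        hit = classify(unquote(fields[ua_index]))
        if hit:
            counts[hit] += 1
    return counts


# Made-up sample with a reduced column set, for illustration only.
sample_log = [
    "#Version: 1.0",
    "#Fields: date time cs-uri-stem cs(User-Agent)",
    "2026-04-05\t10:00:00\t/qa/pricing\tMozilla/5.0%20(compatible;%20ChatGPT-User/1.0)",
    "2026-04-05\t10:00:05\t/sitemap.xml\tMozilla/5.0%20(compatible;%20ClaudeBot/1.0)",
    "2026-04-05\t10:00:09\t/qa/pricing\tMozilla/5.0%20(Macintosh)%20Safari/605.1.15",
]
bot_counts = count_bots(sample_log)
for (category, bot), n in sorted(bot_counts.items()):
    print(category, bot, n)
```

Citation bots are matched before crawl bots so that a user agent is never counted twice. Anything that matches neither list is treated as non-bot traffic and ignored.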
1. Our GEO was a feedback loop, not a launch.
As is often the case in tech, building an AI site was an iterative cycle. We observed crawler behavior in the logs, diagnosed what was blocking deeper engagement, deployed a structural fix, measured the crawl response, and repeated the cycle.
Every major breakthrough in this series came from that loop.
| What the logs showed | What we diagnosed | What we fixed | What happened |
| --- | --- | --- | --- |
| 28% of ChatGPT sessions dead-ended on the index page (article 6) | Index was an infrastructure page with no product context | Redesigned index with product description and topic directory | PerplexityBot activated within 24 hours (article 7) |
| ClaudeBot checked the sitemap 15x/week without crawling (articles 7–8) | Monolithic sitemap gave no topic structure or freshness signal | Deployed per-topic sitemapindex with per-cluster lastmod | ClaudeBot mass crawl within 6 hours (article 8) |
| 93% of Q&As in one mega-topic; ChatGPT-User fetching 4 near-identical pricing pages per session; PerplexityBot stuck on the same 4 pages for 6 weeks (articles 6–7) | Brand keyword “Genymotion” on 56% of pages collapsed clustering | Filtered high-prevalence brand keywords, dynamic topic count | PerplexityBot went from 42 to 511 requests; GPTBot re-indexed 148 pages after topic renaming |
| Claude Code fetched pricing but had no implementation path (article 9) | No CLI content linked from Q&A pages | Built runbooks, linked them from Q&A pages | Claude-User: pricing → CLI runbook in 10 seconds |

None of these fixes were planned on day one. Each one came from reading the logs, seeing a problem, and fixing it. The AI site we have today is the result of dozens of iterations (376 commits, to be precise), not of the original architecture.
Those learnings are now part of the product. You do not have to go through them again.
2. Structural signals trigger crawl behavior.
We consistently observed that each major crawl event was preceded by a structural change to the AI site.
| Date | What we changed | What happened |
| --- | --- | --- |
| Mar 9 | Index page redesigned with product description and topic directory | PerplexityBot activated within 24 hours (42 → 511 requests) |
| Mar 20 (15:57 UTC) | Monolithic sitemap replaced with per-topic sitemapindex | ClaudeBot mass crawl within 6 hours (123 → 577 requests) |
| Apr 3 | Topic names changed from feature labels to user-intent phrases | GPTBot re-indexed 148 pages within 2 days |
Different bots respond to different structural cues. PerplexityBot responded to a richer index page. ClaudeBot responded to a sitemapindex organized by topic. GPTBot responded to updated sitemap content after topic renaming.
The structural layer of an AI site determines whether crawlers commit to indexing your content or keep monitoring without acting.
3. Q&A pages drive the majority of citations.
66–75% of ChatGPT-User retrievals hit Q&A pages rather than the GEO pages adapted from existing website content. The Q&A pages are generated from real chatbot conversations, and they are the pages AI platforms fetch most often during live user sessions.
This matches common industry knowledge: users ask AI platforms questions, and Q&A pages are structured as questions with answers. The format matches the query pattern.
Schema.org QAPage markup makes extraction clean. Each page holds one question and one answer, so an agent can extract both in a single fetch.
Our AI site with Genymotion is automatically fed and updated with actual questions from users interacting with our chatbot on their website. Thousands of questions every month are filtered, deduplicated, clustered, and augmented for the AI site.
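For readers who have not seen QAPage markup, here is a minimal sketch of what one question and one answer look like as JSON-LD. The function name and the placeholder question and answer are ours for illustration. The property names (`mainEntity`, `acceptedAnswer`, `answerCount`) come from Schema.org, and the exact markup on the AI site may differ.

```python
import json


def qa_page_script_tag(question: str, answer: str) -> str:
    """Build a JSON-LD <script> tag describing one question and one answer."""
    data = {
        "@context": "https://schema.org",
        "@type": "QAPage",
        "mainEntity": {
            "@type": "Question",
            "name": question,
            "answerCount": 1,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        },
    }
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so the payload cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return '<script type="application/ld+json">' + payload + "</script>"


# Placeholder content, not taken from the real site.
example_tag = qa_page_script_tag(
    "How do I start an emulator from the command line?",
    "Placeholder answer text.",
)
print(example_tag)
```

Because the whole question-and-answer pair sits in one self-describing object, an agent does not need to infer which block of HTML is the answer.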
4. Content designed for AI coding tools creates a new sales channel.
Article 9 documented a session in which a developer asked Claude Code about Genymotion’s pricing. Ten seconds later, the developer was reading the CLI implementation runbook inside their terminal, without opening a browser.
This happened because the AI site includes CLI runbooks linked from Q&A pages. The runbooks are not pages from Genymotion’s main website. They were built specifically for developer tooling sessions: step-by-step command references that an AI coding tool can fetch and present inside the developer’s working environment.
The Q&A page answers “how much does it cost?”. The linked runbook answers “how do I set it up?”. An AI agent like Claude Code navigates from one to the other in one session. The developer goes from evaluation to implementation without leaving their terminal.
While this remains nascent, we believe this will be huge.
For software companies selling to developers, this is a shift. The product evaluation and the implementation are collapsing into a single AI-mediated session within a coding environment. Your content is not marketing fluff read by users on a webpage. Your content is specifications and instructions consumed by AI tools inside workflows.
5. The structural details matter more than we expected.
Over 90 days, we made dozens of infrastructure changes. Here are the ones that turned out to have direct, observable effects on crawler behavior.
Sitemaps
A monolithic sitemap listing 700 URLs gave ClaudeBot no way to assess which content was fresh or which topics to prioritize. ClaudeBot read the sitemap 15 times per week for months without committing to a content crawl. We replaced that monolithic sitemap with per-topic child sitemaps, each with its own lastmod date and a topic name in the URL. After the change, ClaudeBot ran a 577-request content crawl within 6 hours.
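A per-topic sitemap index of the kind described above can be sketched like this. The element names and namespace follow the sitemaps.org protocol. The base URL, the `/sitemaps/<slug>.xml` path pattern, the second topic slug, and the dates are hypothetical and only illustrate the shape.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, topics):
    """Build a sitemap index with one child sitemap per topic.

    `topics` maps a topic slug to the date its cluster last changed.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for slug, last_modified in sorted(topics.items()):
        entry = ET.SubElement(root, "sitemap")
        # Hypothetical URL pattern: the topic name is part of the path.
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemaps/{slug}.xml"
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative topics and dates only.
index_xml = build_sitemap_index(
    "https://ai.example.com",
    {
        "test-android-apps-at-scale": date(2026, 3, 20),
        "pricing-and-licensing": date(2026, 4, 3),
    },
)
print(index_xml)
```

The point of the structure is that a crawler can read one small file and see both which topics exist and which of them changed recently, without parsing every URL.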
Robots.txt
Our initial robots.txt had ~140 lines with 15 individual bot sections, all saying Allow: /. We found that some crawlers stop parsing after their first matching User-agent block, so they never reached the Sitemap: directives at the bottom. We collapsed the file to 12 lines with a single User-agent: * rule, which fixed the issue.
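The collapsed layout can be sketched as a single wildcard block followed by the sitemap directives. The snippet below builds such a file and checks it with Python's standard `urllib.robotparser`. The URLs are placeholders, and a passing check only shows that this one parser reads the file as intended. It says nothing about how any particular crawler parses it.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_urls):
    """Return a robots.txt with one wildcard block and the sitemap directives."""
    lines = ["User-agent: *", "Allow: /", ""]
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


# Placeholder URL, not the real sitemap location.
robots_txt = build_robots_txt(["https://ai.example.com/sitemap_index.xml"])

robots_parser = RobotFileParser()
robots_parser.parse(robots_txt.splitlines())

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    allowed = robots_parser.can_fetch(bot, "https://ai.example.com/qa/pricing")
    print(bot, "allowed:", allowed)
print("sitemaps:", robots_parser.site_maps())
```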
llms.txt
AI crawlers do not reliably follow cross-domain links. When llms.txt pointed to content hosted elsewhere, crawlers did not follow. Inlining the key content directly in llms.txt produced better results.
Page size
Inline CSS was adding ~9KB per page and provided no value to AI agents consuming the content. Externalizing stylesheets moved actual content earlier in the page, which matters for agents that process HTML with token budgets.
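To see the effect on a single page, a rough sketch like the one below replaces inline `<style>` blocks with one stylesheet link and reports the bytes saved. The regular expression is a simplification that assumes well-formed `<style>` elements. The sample page and the `/site.css` path are made up.

```python
import re

# Simplified matcher for inline <style> blocks; assumes well-formed markup.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>", re.IGNORECASE | re.DOTALL)


def externalize_css(html, stylesheet_href):
    """Swap inline <style> blocks for one <link>; return (new_html, bytes_saved)."""
    stripped = STYLE_BLOCK.sub("", html)
    link = f'<link rel="stylesheet" href="{stylesheet_href}">'
    new_html = stripped.replace("</head>", link + "</head>", 1)
    saved = len(html.encode("utf-8")) - len(new_html.encode("utf-8"))
    return new_html, saved


# Made-up page with roughly 9 KB of inline CSS ahead of the content.
sample_html = (
    "<html><head><style>" + "a{color:red}" * 750 + "</style></head>"
    "<body><main>Answer text</main></body></html>"
)
slim_html, bytes_saved = externalize_css(sample_html, "/site.css")
print("bytes saved:", bytes_saved)
print("content offset before:", sample_html.index("<main>"))
print("content offset after:", slim_html.index("<main>"))
```

The offset of the first content element is the more telling number: an agent that truncates its input reaches the answer sooner.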
Featured content ranking
The AI site’s index page originally showed featured Q&As in arbitrary database order. We reranked them by actual retrieval count, so the Q&As that ChatGPT-User fetches most often appear first. This aligned the index with what users actually ask about.
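Reranking by retrieval count amounts to a counted sort. A minimal sketch, assuming you already have the list of Q&A paths and a list of paths fetched by the citation bot (the paths below are invented):

```python
from collections import Counter


def rank_featured(qa_paths, retrieved_paths, limit=10):
    """Order Q&A paths by how often they were retrieved, most fetched first.

    Ties keep their original order because sorted() is stable.
    """
    known = set(qa_paths)
    counts = Counter(path for path in retrieved_paths if path in known)
    ranked = sorted(qa_paths, key=lambda path: -counts[path])
    return ranked[:limit]


# Invented paths for illustration.
qa_paths = ["/qa/install", "/qa/pricing", "/qa/ci-setup"]
retrieved = ["/qa/pricing", "/qa/ci-setup", "/qa/pricing", "/index", "/qa/pricing"]
featured = rank_featured(qa_paths, retrieved, limit=2)
print(featured)
```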
Topic taxonomy
The brand keyword “Genymotion” appeared on 56% of all pages and caused the clustering algorithm to put 93% of Q&As into a single mega-topic. Filtering high-prevalence brand keywords and switching to a dynamic topic count broke the mega-topic into 25 specific topics. We later renamed those topics from feature labels (“Android Testing”) to user-intent phrases (“Test Android Apps at Scale”). GPTBot re-indexed 148 pages within two days.
None of these changes involved writing new content. They were structural fixes that affected how the content is organized, presented, and discovered. Each change had a measurable effect on how crawlers behaved.
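The keyword filter can be sketched as a document-frequency cutoff applied before clustering. The 50% threshold and the sample keywords are assumptions for illustration. The article only states that the brand keyword sat on 56% of pages, not which cutoff or clustering algorithm was used.

```python
from collections import Counter


def filter_prevalent_keywords(page_keywords, max_share=0.5):
    """Drop keywords that appear on more than `max_share` of pages.

    `page_keywords` maps a page path to its list of keywords.
    Returns (filtered mapping, set of dropped keywords).
    """
    total = len(page_keywords)
    if total == 0:
        return {}, set()
    doc_freq = Counter()
    for keywords in page_keywords.values():
        doc_freq.update(set(keywords))  # count each keyword once per page
    too_common = {kw for kw, n in doc_freq.items() if n / total > max_share}
    filtered = {
        page: [kw for kw in keywords if kw not in too_common]
        for page, keywords in page_keywords.items()
    }
    return filtered, too_common


# Invented keywords for illustration.
pages = {
    "/qa/pricing": ["genymotion", "pricing", "license"],
    "/qa/ci-setup": ["genymotion", "ci", "automation"],
    "/qa/install": ["genymotion", "install"],
    "/qa/adb": ["adb", "debugging"],
}
filtered_pages, dropped = filter_prevalent_keywords(pages)
print("dropped:", dropped)
print(filtered_pages["/qa/pricing"])
```

With the dominant keyword removed, the remaining keywords are the ones that actually distinguish one page from another, which is what a clustering step needs.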
6. AI coding tools are a distinct retrieval channel.
Of the 24 Claude-User requests we recorded, 22 came from Claude Code (user-agent: claude-code/2.1.83-84). Two came from claude.ai web (Claude-User/1.0). These are distinct channels with distinct behavior.
Claude Code sessions are developer-specific. We saw pricing questions followed by CLI runbooks, 70-minute product evaluations, and index-to-pricing-to-cloud-marketplace navigation. We are focusing on increasing these sessions, which are particularly valuable for software companies. Giving a developer a seamless, high-quality experience right in their terminal can influence a purchase decision.
7. Human visitors can be routed back to the source with attribution.
The AI site serves structured content to bots, but humans occasionally land on it too, by following a link from an AI response or clicking through from a result. In April, we deployed a routing system that redirects human visitors to the corresponding page on Genymotion’s main website. UTM source attribution (chatgpt, claude, or perplexity) indicates which AI platform sent them.
Bots continue to receive the GEO-optimized content. Humans get sent to the product’s actual website, where they can sign up, purchase, or contact sales. The AI site becomes a measurable attribution channel: you can track how many website visitors arrived via AI-mediated discovery.
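The routing decision can be sketched as a small function. This is an illustration under stated assumptions, not the deployed system: it assumes bots are recognized by user-agent substring, that the sending platform is inferred from the Referer host, and that the main-site page lives at the same path. The referrer-to-source mapping and the `MAIN_SITE` placeholder are ours.

```python
from urllib.parse import urlencode, urlsplit

# User-agent substrings for the bots named in this article (lowercased).
BOT_MARKERS = (
    "chatgpt-user", "claude-user", "perplexity-user", "claude-code",
    "gptbot", "claudebot", "perplexitybot",
)

# Assumed mapping from referrer host to utm_source value.
REFERRER_SOURCES = {
    "chatgpt.com": "chatgpt",
    "claude.ai": "claude",
    "perplexity.ai": "perplexity",
    "www.perplexity.ai": "perplexity",
}

MAIN_SITE = "https://www.example.com"  # placeholder for the product website


def route(user_agent, referer, path):
    """Return None to serve AI-site content, or a redirect URL for a human."""
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_MARKERS):
        return None  # bots keep receiving the structured content
    target = MAIN_SITE + path  # simplification: same path on the main site
    host = urlsplit(referer).hostname if referer else None
    source = REFERRER_SOURCES.get(host)
    if source:
        target += "?" + urlencode({"utm_source": source})
    return target


print(route("Mozilla/5.0 (compatible; ChatGPT-User/1.0)", "", "/qa/pricing"))
print(route("Mozilla/5.0 (Macintosh) Safari/605.1.15", "https://chatgpt.com/", "/qa/pricing"))
```

Visitors with no recognizable referrer are still redirected, just without a `utm_source`, so unattributed traffic does not get mislabeled.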
Where each platform stands
| Platform | Crawl status | Citation status | Weekly citation volume | Trend |
| --- | --- | --- | --- | --- |
| ChatGPT | Mature | 83% citation rate | ~1,100/week | Steady state |
| Claude | Indexed, monitoring | Live retrieval (Claude Code) | ~10/week | Early, growing |
| Perplexity | Indexed, maintenance | First retrieval observed | 1 request | Just started |
| Gemini | Not crawling AI site | Answers from training data + Google Search | — | Separate strategy |
What comes next
Three things we are watching:
Perplexity-User growth. PerplexityBot made 511 requests in March, and Perplexity-User has just appeared. If the ChatGPT pattern holds, deep indexing leads to exponential citation growth over 3–4 weeks, so Perplexity citation volume should increase through April.
Claude-User trajectory. We recorded 24 requests in two weeks. ChatGPT-User went from 42 in January to 1,200/week by March. Will Claude-User follow the same curve, or does the Claude Code developer audience stay smaller but higher intent?
Human redirect attribution. The browser redirect system went live in April. The first data on which AI platforms drive actual website visits, and which pages those visitors land on, will tell us whether AI-mediated discovery converts to product signups.
Three AI platforms went from zero to active citation on the same content, with the same markup and the same topic taxonomy. We are iterating. Stay tuned.
Get this for your site
These are the learnings from one client, over 90 days and dozens of infrastructure iterations. Rozz builds this for every client.
Structured Q&A pages from your chatbot. Per-topic sitemaps. Schema.org markup. CLI runbooks for developer tools. Featured content ranking. The infrastructure that turns AI crawlers into citation channels.
$997/month | ChatGPT at 83%. Three platforms citing. The data is in the articles.
Data source and period
Data source: CloudFront access logs for rozz.genymotion.com, January 8 – April 8, 2026 (90 days). Bot classification based on User-Agent strings. Cumulative totals across all 10 weekly entries in The Crawler Logs series.
Author
Author: Adrien Schmidt, CEO, ROZZ
Serial tech entrepreneur with 10+ years of experience building AI systems. Previously founded Squid Solutions and built AI products including Aristotle, a conversational big data analytics chatbot, as well as products for eBay and an AR jewelry try-on device for Cartier.
April 8, 2026 | Data period: Jan 8 – Apr 8, 2026 (90 days)
rozz@rozz.site © 2026 ROZZ.