547 Requests in One Day: What Happens When GPTBot Discovers Your Mirror Site
Entry #1 · Feb 3, 2026
On January 7, 2026, GPTBot made 547 requests to rozz.genymotion.com, 47% of all training bot activity we recorded over 30 days. The mirror site, a dedicated AI publishing layer that ROZZ builds automatically for clients, had been live for weeks with minimal crawler attention. Then GPTBot found it. About three weeks later, ChatGPT users were receiving Genymotion content in their conversations. This is the first documented case study of the complete generative engine optimization (GEO) pipeline: from mirror site deployment to training crawl to live citation.
Key Findings
- GPTBot made 547 requests on January 7, 2026—47% of 30-day training activity in one day
- Total training bot requests (GPTBot + ClaudeBot): 1,172 over 30 days
- OAI-SearchBot made 66 requests building retrieval indexes (separate from training)
- GPTBot prioritized GEO pages (493 requests) over Q&A pages (322 requests)
- Citation events (ChatGPT-User) began appearing ~3 weeks after the major crawl
- 42 citation events recorded in 30 days, concentrated on 4 high-intent pages
The Data
Daily GPTBot Activity (Jan 3 – Feb 2, 2026)
| Date | GPTBot Requests | Notable Activity |
| --- | --- | --- |
| Jan 3–6 | 0–8/day | Baseline; ClaudeBot discovers site |
| Jan 7 | 547 | Major crawl spike |
| Jan 8–17 | 1–2/day | Low activity period |
| Jan 18–19 | 124 total | Secondary wave |
| Jan 25–26 | 409 total | Tertiary wave |
| Jan 27 | 40 | Q&A deep dive (40+ Q&As in rapid succession) |
| Jan 28+ | 2–4/day | Maintenance crawling; citations begin |
Bot Category Breakdown (30 Days)
| Category | Bot(s) | Requests | Purpose |
| --- | --- | --- | --- |
| Training | GPTBot, ClaudeBot | 1,172 | Content collection for model training |
| Index | OAI-SearchBot | 66 | Building retrieval indexes |
| Citation | ChatGPT-User | 42 | Real users receiving content in responses |
| Total LLM Bot Requests | - | 1,280 | - |
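The totals and the headline share can be checked directly from the reported counts. The following is a minimal Python sketch that uses only figures stated in this article; the variable names are illustrative and not part of any ROZZ tooling.

```python
# Request counts per bot category over the 30-day window, from the table above.
requests_by_category = {"Training": 1172, "Index": 66, "Citation": 42}

# GPTBot requests recorded on January 7, 2026.
jan7_gptbot_requests = 547

total_llm_requests = sum(requests_by_category.values())

# Share of each category in all LLM bot requests, as a percentage.
category_share_pct = {
    name: 100 * count / total_llm_requests
    for name, count in requests_by_category.items()
}

# Share of 30-day training activity that happened on January 7 alone.
jan7_share_pct = 100 * jan7_gptbot_requests / requests_by_category["Training"]

print(f"Total LLM bot requests: {total_llm_requests}")
for name, pct in category_share_pct.items():
    print(f"{name}: {pct:.1f}%")
print(f"Jan 7 share of training requests: {jan7_share_pct:.1f}%")
```

The January 7 share comes out at 46.7%, which rounds to the 47% quoted in the key findings.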
Content Type Distribution (GPTBot Only)
| Content Type | Requests | Percentage |
| --- | --- | --- |
| GEO Pages | 493 | 57% |
| Q&A Pages | 322 | 37% |
| Sitemap | 27 | 3% |
| Other (APIs, llms.txt, homepage) | 16 | 2% |
What GPTBot Prioritized
The January 7 crawl was not random. GPTBot followed a clear pattern.
1. Discovery via sitemap. GPTBot hit the sitemap first, then systematically worked through the content pages.
2. GEO pages over Q&As. The mirror site had 177 Q&A pages and 450 GEO pages. GPTBot made more requests to GEO pages than to Q&A pages in total (493 vs 322). GEO pages are AI-optimized versions of Genymotion's help center and documentation, rich in structured content.
3. Burst patterns for Q&As. On January 27, GPTBot returned specifically for Q&A pages, crawling 40+ Q&As in rapid succession at roughly one per second. This suggests GPTBot uses different crawl strategies for different content types.
4. Schema.org matters. Every page on the mirror site includes full Schema.org JSON-LD markup. Q&A pages use QAPage. Content pages use WebPage. Topic pages use CollectionPage. This structured data makes content trivially extractable.
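To make the Schema.org point concrete, here is a minimal Python sketch that builds QAPage JSON-LD for a single question. It is not ROZZ's actual generator, which this article does not show. The function names, the example question and answer text, and the example.com URL are all placeholders; only the Schema.org types and properties (QAPage, Question, Answer, mainEntity, acceptedAnswer) come from the public vocabulary.

```python
import json


def qa_page_jsonld(question, answer, url):
    """Build a Schema.org QAPage object for one question/answer pair."""
    return {
        "@context": "https://schema.org",
        "@type": "QAPage",
        "url": url,
        "mainEntity": {
            "@type": "Question",
            "name": question,
            "answerCount": 1,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        },
    }


def as_script_tag(data):
    """Serialize the object as the <script> element embedded in a page's HTML."""
    # Escape "</" so text inside the JSON cannot close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder content, for illustration only.
qa_jsonld = qa_page_jsonld(
    question="Which operating systems are supported?",
    answer="Placeholder answer text.",
    url="https://example.com/pages/example-question.html",
)
script_tag = as_script_tag(qa_jsonld)
print(script_tag)
```

A crawler that parses this block gets the question and its answer as labeled fields, without having to infer them from page layout.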
The Three-Phase Pipeline
Our data shows a clear progression from crawl to citation.
Phase 1: Training (Jan 7 + follow-up waves)
GPTBot mass-crawls the mirror site: 547 requests on January 7 alone, followed by waves on Jan 18–19 (124 requests) and Jan 25–26 (409 requests), and a targeted Q&A crawl on Jan 27. At this point the content has been collected by OpenAI's training crawler.
Phase 2: Indexing (ongoing)
OAI-SearchBot operates separately from GPTBot. It builds the retrieval index that powers ChatGPT's web search feature. We recorded 66 OAI-SearchBot requests, most of which (38 of 66) were robots.txt checks that verify permission to index. This bot works quietly in the background.
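A robots.txt check of this kind can be reproduced locally with Python's standard `urllib.robotparser` module. The sketch below is illustrative only: the robots.txt content, the `ExampleBlockedBot` agent, and the example.com URL are assumptions, not the actual rules served by rozz.genymotion.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file for the mirror site is not shown here.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

User-agent: ExampleBlockedBot
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://example.com/pages/example-question.html"
searchbot_allowed = allowed(ROBOTS_TXT, "OAI-SearchBot", page)
blocked_bot_allowed = allowed(ROBOTS_TXT, "ExampleBlockedBot", page)
print("OAI-SearchBot allowed:", searchbot_allowed)
print("ExampleBlockedBot allowed:", blocked_bot_allowed)
```

Running a check like this before launch is a cheap way to confirm that no rule accidentally blocks the crawlers you want to reach.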
Phase 3: Citations Begin (Jan 28+)
ChatGPT-User requests appear. Real users asking ChatGPT questions are now receiving Genymotion content from the mirror site.
> Timeline: ~3 weeks from major crawl to first citations.
Citation Events: What Users Are Asking
The 42 ChatGPT-User requests were not distributed evenly; they concentrated on specific pages.
| Page | Citations | What Users Are Asking |
| --- | --- | --- |
| /pages/what-are-genymotion-desktop-requirements.html | 7 | System requirements for Genymotion |
| /pages/which-android-versions-are-available.html | 5 | Android version support |
| Homepage | 5 | General discovery |
| /pages/how-to-enable-the-virtual-keyboard.html | 2 | Specific troubleshooting |
| /pages/genymotion-desktop-release-notes.html | 1 | Version information |
These are high-intent queries. Users asking ChatGPT about system requirements or Android version support are evaluating whether to use Genymotion. The mirror site is now part of that conversation.
What ROZZ Built
The mirror site at rozz.genymotion.com is infrastructure that ROZZ builds automatically for every client. The mirror site includes:
- 450 GEO pages: AI-optimized versions of help center articles, documentation, and blog posts
- 177 Q&A pages: Generated from questions users ask the ROZZ chatbot on genymotion.com
- 15 topic categories: Semantic organization for both humans and machines
- Schema.org markup on every page: QAPage, WebPage, CollectionPage with full JSON-LD
- llms.txt discovery files: Two formats—index with links and complete content inline
- JSON APIs: Programmatic access for AI systems
This is not on-page optimization. This is a dedicated publishing layer designed specifically for how LLMs retrieve and cite content.
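For readers unfamiliar with llms.txt, the index-with-links format can be sketched in a few lines of Python. This follows the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of markdown links) as we understand it; it is not the generator ROZZ uses, and the site name, section heading, URLs, and descriptions below are placeholders.

```python
def build_llms_txt(site_name, summary, sections):
    """Render an llms.txt index: title, summary, then one link list per section.

    `sections` maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content, for illustration only.
llms_txt = build_llms_txt(
    site_name="Example Docs",
    summary="AI-readable index of the example help center.",
    sections={
        "Help center": [
            (
                "System requirements",
                "https://example.com/pages/system-requirements.html",
                "Hardware and OS prerequisites",
            ),
        ],
    },
)
print(llms_txt)
```

The second format mentioned above, with complete content inline, would replace each link with the page text itself.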
> Genymotion is one client. ROZZ builds this infrastructure automatically for every domain.
Implications for GEO Strategy
1. Dedicated infrastructure beats on-page tweaks
You cannot effectively optimize a marketing page for both human conversion and machine extraction. The mirror site solves this by providing a separate, purpose-built layer for AI discovery.
2. Structured data accelerates discovery
Every page on the mirror site includes Schema.org JSON-LD. GPTBot's systematic crawl pattern suggests it prioritizes structured, extractable content.
3. The timeline is weeks, not months
From major crawl (Jan 7) to first citations (late Jan): approximately 3 weeks. In this case, GEO results appeared faster than is typical for traditional SEO, provided the infrastructure is in place.
4. Citation events reveal user intent
The pages being cited are not random. They answer high-intent queries about requirements, compatibility, and features. Purchase decisions happen in this context.
Get This for Your Site
ROZZ builds this infrastructure automatically. Mirror site. Q&A pages from your chatbot. Schema.org markup on every page. llms.txt discovery files. JSON APIs. The complete AI publishing layer.
$997/month | Results like Genymotion's
→ See how it works | rozz@rozz.site
Data source and methodology
Data source: CloudFront access logs for rozz.genymotion.com, January 3 – February 2, 2026. Bot classification is based on User-Agent strings.
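User-Agent classification of this kind can be approximated with a short script. The sketch below is not the pipeline used for this analysis: it assumes tab-separated CloudFront standard log lines with a `#Fields:` header and URL-encoded User-Agent values, and the sample lines (including the bot version numbers) are synthetic. The substring-to-category mapping mirrors the categories used in this article.

```python
from collections import Counter
from urllib.parse import unquote

# User-Agent substring -> category, following the categories in this article.
BOT_CATEGORIES = {
    "GPTBot": "Training",
    "ClaudeBot": "Training",
    "OAI-SearchBot": "Index",
    "ChatGPT-User": "Citation",
}


def classify(user_agent):
    """Return the bot category for a User-Agent string, or None for other traffic."""
    lowered = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        if token.lower() in lowered:
            return category
    return None


def count_by_category(log_lines):
    """Count requests per bot category, reading column names from the #Fields header."""
    fields, counts = [], Counter()
    for line in log_lines:
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#") or not line.strip() or not fields:
            continue
        row = dict(zip(fields, line.rstrip("\n").split("\t")))
        category = classify(unquote(row.get("cs(User-Agent)", "")))
        if category:
            counts[category] += 1
    return counts


# Synthetic sample with a shortened field list; real logs carry many more columns.
SAMPLE_LOG = [
    "#Version: 1.0",
    "#Fields: date time cs-uri-stem cs(User-Agent)",
    "2026-01-07\t10:00:00\t/sitemap.xml\tMozilla/5.0%20(compatible;%20GPTBot/1.0)",
    "2026-01-07\t10:00:01\t/robots.txt\tMozilla/5.0%20(compatible;%20OAI-SearchBot/1.0)",
    "2026-01-28\t09:30:00\t/index.html\tMozilla/5.0%20(compatible;%20ChatGPT-User/1.0)",
    "2026-01-28\t09:31:00\t/index.html\tMozilla/5.0%20(X11;%20Linux%20x86_64)",
]

category_counts = count_by_category(SAMPLE_LOG)
print(dict(category_counts))
```

Because User-Agent strings can be spoofed, counts produced this way are best read as an upper bound unless requests are also verified against the IP ranges the bot operators publish.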
Author
Author: Adrien Schmidt, CEO, ROZZ
Serial tech entrepreneur with 10+ years of experience building AI systems, including Aristotle (conversational AI analytics) and products for eBay and Cartier.
February 2, 2026 | Data period: Jan 3 – Feb 2, 2026