AI Site Structure Matters More than We Thought

Every week we look into the data and share insights on this blog. This week is about the importance of “topics” or the semantic structure we build to organize content for AI agents on the AI site.

The Genymotion AI site, our case study in this weekly series, has 16 topic pages. In the logs, we found out that AI platforms and engines are asking for 61 more that don’t exist any more. That’s 1,001 requests in 7 days to topic pages that were removed when we improved the taxonomy. Why were so many topic index pages being queried repeatedly?

A design choice: structure for machines

So this is what we’ll talk about in this twelfth article in the series: organization by semantic topic, and the stability of this structure. Turns out it matters more than we realized when we made it.

An AI site is a website for AI agents. A regular website for humans helps humans browse. Humans click through menus that are usually the same for each B2B site. Humans judge a site by what it looks like.

None of that applies when the reader is an AI agent from ChatGPT, Claude, or Perplexity. We’re on a mission to find out what AI agents truly want.

When Rozz builds an AI site, Rozz builds around content taxonomy, not around human navigation. The primary organizational layer is a set of topic hubs. Each topic hub is named after a set, or cluster, of related content specific to each site. For Genymotion, the topic hubs are: CLI Tooling, Cloud Deployment, Virtual Device Management, Licensing…

This is the opposite of how traditional sites are built. Traditional sites are organized around how users browse through them and where they should convert. AI sites are organized around concepts to facilitate content retrieval.

The topic hubs work

We found confirmation of the importance of that design in the logs. Every major AI platform queries the topic listing pages.

In the past week, ChatGPT-User fetched /topics/cli-shell-tooling.html 130 times during live user sessions. PerplexityBot hit /topics/android-os-versions.html six times across a monitoring schedule. ClaudeBot visited /topics/mobile-test-automation.html twice while sampling the site. These pulls happen week over week. The topic pages help AI systems understand what content exists before they navigate to specific answers. The topic pages also help them filter what they collect before they pull the data.

This isn’t obvious, because RAG systems can also query a huge index and find pages using semantic search. Filtering content first increases RAG efficiency, which may be what’s going on here.

The point we’re making here is about the expectation, or even the demand, that LLM systems appear to have for structural consistency over time. Whereas individual pages come and go without much afterthought, the topical layer was retrieved time and time again over multiple days. The repeated retrieval happened as if there was an expectation of durability.

This finding reinforces the attention we pay to curating the topics on each AI site. The curation is done both algorithmically and by providing tools for human oversight.

What we didn’t anticipate

In the original design, the topic taxonomy on an AI site was generated every week. The goal of the weekly generation was to reflect the site’s new content and the new Q&As coming from the chatbot. Clustering algorithms decide which pieces of content belong together and assign names to groups. Clusters get split, merged, or renamed.

Over 90 days, the iteration loop ran several times. Each iteration improved the taxonomy. Topics became more specific, less overlapping, and more aligned with how users actually query the content.

The original assumption was that this was OK because LLMs like fresh content. But each iteration also changed URLs.

/topics/android-os-versions.html became /topics/android-version-selection.html. /topics/mobile-testing-security.html split into /topics/mobile-test-automation.html and /topics/network-security-config.html. A dozen other topic slugs shifted as clustering tightened.

The ghost problem

Here is what the changes turned into in one week of logs:

| Topic URL | Requests (7 days) | Status |
| --- | --- | --- |
| /topics/android-os-versions.html | 344 | Retired |
| /topics/mobile-testing-security.html | 127 | Retired |
| /topics/root-access-and-tools.html | 69 | Retired |
| /topics/ci-cd-tooling.html | 13 | Retired |
| /topics/arm-apple-silicon.html | 9 | Retired |
| …56 more retired topic URLs | 439 | Retired |
| Total ghost topic requests | 1,001 | - |

61 topic URLs that no longer existed on the site received 1,001 requests in seven days. ChatGPT-User alone contributed 387 of those requests. Perplexity, Claude, and other retrieval systems added the rest.
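As a quick sanity check, the figures above can be added up in a few lines of Python. This is only arithmetic on the numbers already quoted in this article; the variable names are ours.

```python
# Request counts copied from the table above (7-day window).
ghost_requests = {
    "/topics/android-os-versions.html": 344,
    "/topics/mobile-testing-security.html": 127,
    "/topics/root-access-and-tools.html": 69,
    "/topics/ci-cd-tooling.html": 13,
    "/topics/arm-apple-silicon.html": 9,
}
other_retired_urls = 56        # the "…56 more retired topic URLs" row
other_retired_requests = 439   # requests attributed to that row

total_urls = len(ghost_requests) + other_retired_urls
total_requests = sum(ghost_requests.values()) + other_retired_requests

chatgpt_user_requests = 387    # ChatGPT-User figure quoted in the text
chatgpt_share = chatgpt_user_requests / total_requests
top_url_share = ghost_requests["/topics/android-os-versions.html"] / total_requests

print(total_urls, total_requests)                       # 61 1001
print(f"ChatGPT-User share: {chatgpt_share:.1%}")       # 38.7%
print(f"Top retired URL share: {top_url_share:.1%}")    # 34.4%
```

The rows reconcile: 61 retired URLs and 1,001 requests. ChatGPT-User accounts for a bit more than a third of them, and a single retired URL accounts for roughly another third.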

The pattern across requesters is clean. Each system learned the old taxonomy at some point, cached the topic URLs, and kept fetching them long after they were retired. ChatGPT remembers topic URLs from when it last indexed the site. Bing remembers sitemap filenames we replaced months ago. Six retired sitemap shards are still being polled every 27 minutes. Every retrieval system that ever read the site carries a version of the structure that is out of date.

Stability patterns

Once the pattern was understood, the fix was straightforward. The principle is that algorithm-generated structure needs URL-level stability, because the algorithm’s optimization objective does not provide that stability on its own. We made three changes.

Canonical topics.

Topics are still proposed by the algorithm.

Topics can easily be manually curated.

Topics include detailed descriptions.

Topics are kept stable over time.

Topics remain stable unless they become so outdated that they must be retired.

A topic URL registry.

Every topic URL that the site has ever had is tracked.

The tracking includes retired topic URLs.

When the clustering algorithm proposes renaming a topic, the rename is recorded in the registry rather than silently replacing the old URL.

This helps to redirect some retired topics to active ones.
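To make the registry idea concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not the production implementation: the `TopicRegistry` class and its method names are hypothetical, and it only models renames (splits and merges would need a richer mapping).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicRegistry:
    """Hypothetical registry of every topic URL the site has ever had."""
    live: set = field(default_factory=set)
    # retired URL -> the URL that replaced it
    replaced_by: dict = field(default_factory=dict)

    def add(self, url: str) -> None:
        self.live.add(url)

    def rename(self, old: str, new: str) -> None:
        """Record a rename instead of silently replacing the old URL."""
        if old not in self.live:
            raise KeyError(f"{old} is not a live topic URL")
        self.live.discard(old)
        self.live.add(new)
        self.replaced_by.pop(new, None)  # a revived URL is no longer retired
        self.replaced_by[old] = new

    def resolve(self, url: str) -> Optional[str]:
        """Follow recorded renames until a live URL is reached, if any."""
        seen = set()
        while url not in self.live and url in self.replaced_by and url not in seen:
            seen.add(url)
            url = self.replaced_by[url]
        return url if url in self.live else None


registry = TopicRegistry()
registry.add("/topics/android-os-versions.html")
registry.rename(
    "/topics/android-os-versions.html",
    "/topics/android-version-selection.html",
)
print(registry.resolve("/topics/android-os-versions.html"))
# /topics/android-version-selection.html
```

Because `resolve` follows the chain of recorded renames, a URL that was renamed twice still lands on the current topic rather than on an intermediate retired one.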

301 redirects from retired topics to their closest current equivalents.

When AI platforms request /topics/android-os-versions.html, the platforms now get a 301 redirect to /topics/android-version-selection.html.

/topics/android-version-selection.html is the current topic that covers the same content.

That way, the LLM still gets the content it needs.

The LLM does not learn that the topic name changed for next time.

The LLM still gets the proper data.
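The redirect decision itself is small. The sketch below shows the logic in Python, assuming a plain lookup table; the names `REDIRECTS`, `LIVE_TOPICS`, and `respond` are ours, and the real site may well implement this at the CDN or web-server level instead.

```python
from http import HTTPStatus

# Hypothetical redirect table: retired topic path -> closest current topic path.
# The single entry is the example from the text; a full table would hold every mapping.
REDIRECTS = {
    "/topics/android-os-versions.html": "/topics/android-version-selection.html",
}
LIVE_TOPICS = {"/topics/android-version-selection.html"}


def respond(path: str):
    """Return (status code, headers) for a request to a topic path."""
    if path in LIVE_TOPICS:
        return HTTPStatus.OK.value, {}
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently, with the current topic in the Location header
        return HTTPStatus.MOVED_PERMANENTLY.value, {"Location": target}
    return HTTPStatus.NOT_FOUND.value, {}


print(respond("/topics/android-os-versions.html"))
# (301, {'Location': '/topics/android-version-selection.html'})
```

A client that follows the `Location` header ends up on the live topic page in one extra hop.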

The first pass shipped this week with 61 redirect mappings. The 61 redirect mappings covered every retired topic URL that could be identified in the logs. More mappings will surface as traffic continues to be watched.
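Watching traffic for new candidates can be sketched as a simple tally. The Python below assumes request paths have already been extracted from the access logs; the function name, the sample paths, and the `example-retired-topic` slug are made up for illustration.

```python
from collections import Counter


def unmapped_topic_requests(paths, live, redirects):
    """Count requests to /topics/ paths that are neither live nor redirected."""
    counts = Counter()
    for path in paths:
        if not path.startswith("/topics/"):
            continue  # only the topic layer is of interest here
        if path in live or path in redirects:
            continue  # already served, or already mapped to a live topic
        counts[path] += 1
    return counts


# Hypothetical sample of request paths, standing in for parsed log lines.
sample_paths = [
    "/topics/android-version-selection.html",  # live
    "/topics/android-os-versions.html",        # retired, already mapped
    "/topics/example-retired-topic.html",      # made-up slug, not yet mapped
    "/topics/example-retired-topic.html",
    "/index.html",                             # not a topic page
]
live = {"/topics/android-version-selection.html"}
redirects = {
    "/topics/android-os-versions.html": "/topics/android-version-selection.html",
}

candidates = unmapped_topic_requests(sample_paths, live, redirects)
print(candidates.most_common())
# [('/topics/example-retired-topic.html', 2)]
```

Sorting by request count puts the most frequently requested unmapped URLs at the top of the list for review.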

Why this matters

This separates a high-performance AI site from a one-shot prototype.

A prototype is easy to build. A taxonomy is generated once. Topic URLs are published. The build stops. The site works on the day it ships. The site also goes stale the day the first better clustering algorithm comes out, because the only options are to keep the stale structure or to break every external cache that learned it.

A high-performance AI site needs the opposite discipline. The taxonomy is iterated on because iteration makes the site get better over time. Every iteration produces a set of changes that need to be governed. Governance covers which URLs move, which merge, which retire, and which redirect to which. The clustering algorithm needs a stability layer.

Most discussions of AI SEO focus on content: writing answer-first, using Q&A schema, keeping sentences short. That work matters. Below the content layer is the structural layer. Below the structural layer is the stability layer. The structural layer tells a machine reader what the site is about. The stability layer keeps that information useful as the structure improves.

An AI site without a stability layer performs well at launch, then decays over time. Every algorithm upgrade creates another set of ghost URLs. Every ghost URL represents real users asking real AI platforms real questions. The questions arrive at a dead page. The dead page returns a stub instead of an answer.

A site was built for machines. Machines read the site. The structure changed. Machines kept reading the old map. Stability patterns close that gap.

Get this for your company

Rozz builds AI sites for B2B companies. The sites are structured for machines. The sites are iterated to stay current. The iteration is governed so iteration does not break what retrieval systems have already learned.

$997/month | AI site + chatbot + analytics

→ Book a call → See how it works → rozz@rozz.site

Data source and period

Data source: CloudFront access logs for rozz.genymotion.com, April 15 – April 22, 2026 (7 days). Retired topic URL inventory reconciled against the current site’s 16 live topic hubs. Requester breakdown from User-Agent classification.

Author

Author: Adrien Schmidt, CEO, ROZZ

Serial tech entrepreneur with 10+ years of experience building AI systems and products for eBay and Cartier. Previously founded Squid Solutions and built AI products including Aristotle, a conversational big data analytics chatbot, and an AR jewelry try-on device for Cartier.

April 22, 2026 | Data period: Apr 15 – Apr 22, 2026 (7 days)