Generative models do not consume the web the way a search engine does.
They do not index content wholesale. They access targeted sources, interpret them and reassemble them, under constraints of time and computation. In this context, the question is no longer solely one of content relevance; it is one of content exploitability.
A page can rank perfectly on Google and remain invisible to an AI if it is not accessible under conditions compatible with its technical constraints. This is a blind spot that few organisations have yet integrated into their digital strategy.
How AI bots actually access your content
This is the point most articles on the subject sidestep. Understanding how AI bots operate reveals why certain architectural choices create a structural risk of invisibility.
They have no browser. Unlike Googlebot, which has had a full JavaScript rendering engine (based on Chromium) since 2019, most bots used by LLMs (OpenAI’s GPTBot, Anthropic’s ClaudeBot, PerplexityBot) operate primarily via simple HTTP fetch. They retrieve the raw HTML returned by the server, without executing JavaScript. What the bot sees is what your server sends in the initial response, nothing more.
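This fetch-only behaviour can be sketched in a few lines of Python. The two HTML samples below are illustrative assumptions, not taken from any real site; the point is simply that a plain HTTP fetch sees only what the server ships in the initial response:

```python
# Sketch: an AI bot "sees" only the raw HTML of the initial response.
# No JavaScript is executed, so client-rendered content is invisible to it.

def content_in_initial_html(html: str, critical_text: str) -> bool:
    """Return True if the critical content is present in the raw HTML
    a plain HTTP fetch would return, i.e. without executing JavaScript."""
    return critical_text in html

# Server-rendered page: the product description ships in the HTML itself.
ssr_page = "<html><body><h1>Wool coat</h1><p>Hand-finished, 100% merino.</p></body></html>"

# Client-rendered page: the bot only ever sees an empty application shell.
csr_page = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

print(content_in_initial_html(ssr_page, "merino"))  # True: exploitable by a bot
print(content_in_initial_html(csr_page, "merino"))  # False: invisible to a bot
```

A browser user sees the same final page in both cases; the bot does not.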
They operate within tight time budgets. Traditional crawlers can afford to wait. AI bots, by contrast, operate within constrained time windows consistent with the latencies acceptable in a near-real-time system. A high TTFB or an unstable server response translates directly into an abandoned request, with no systematic retry.
They do not follow classical crawl logic. Googlebot follows sitemaps, respects crawl priorities, and returns regularly. AI bots access content on demand, often driven by popularity signals or existing citations. Freshness of access is not guaranteed: recently updated content is not necessarily re-fetched promptly.
Payload size matters. Beyond a certain volume of HTML, bots truncate or abandon retrieval altogether. The SALT study observed a significant increase in abandonment beyond 1 MB of HTML. For e-commerce pages with long product listings, dynamic recommendations and stacked CMS blocks, this threshold is reached more often than teams realise.
HTTP headers are read and interpreted. X-Robots-Tag directives, noindex tags, or a misconfigured robots.txt can block certain bots without the team being aware. This is a frequent blind spot during migrations or rapid production deployments.
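A minimal sketch of this audit, assuming the response headers are already available as a dict. The header values are illustrative; a real parser would also handle bot-scoped forms such as `X-Robots-Tag: gptbot: noindex`:

```python
# Sketch: an X-Robots-Tag header can silently block bots from using a page.
# Only the global (non bot-scoped) directive form is handled here.

def blocked_by_robots_header(headers: dict) -> bool:
    """Return True if the response headers carry a noindex/none directive."""
    value = headers.get("X-Robots-Tag", "").lower()
    directives = {d.strip() for d in value.split(",")}
    return bool({"noindex", "none"} & directives)

headers_ok = {"Content-Type": "text/html"}
headers_blocked = {"X-Robots-Tag": "noindex, nofollow"}  # common post-migration leftover

print(blocked_by_robots_header(headers_ok))       # False
print(blocked_by_robots_header(headers_blocked))  # True
```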
The direct consequence: the architecture that suits a human user is not necessarily the one that suits an AI bot. Both coexist, and technical decisions must now account for both.
Web performance and LLMs: converging signals
Analyses from Prerender.io and BlogSEO.io confirm this on-the-ground reality: models rely predominantly on content accessible as HTML, with limited JavaScript execution. In practice, a product page whose content is rendered client-side is structurally disadvantaged, even if it performs well in classical SEO.
What the SALT study shows, and what it means
By analysing over 2,000 domains cited in AI-generated responses, SALT Agency identified clear correlations between technical performance and the likelihood of appearing in AI answers.
The most frequently cited sites share the same characteristics:
- CLS ≤ 0.1: visual stability on load, a signal of structural reliability
- LCP ≤ 2.5s: primary content available quickly
- TTFB < 200ms: short server response time, the signal most directly correlated with citation density
- HTML < 1MB: beyond this, abandonment rates during retrieval increase significantly
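The four thresholds above can be turned into a simple audit check. The threshold values come from the SALT figures cited here; the measured metrics in the example are placeholders, which in practice would come from CrUX, Lighthouse or your RUM tooling:

```python
# Sketch: check measured page metrics against the thresholds reported
# by the SALT study. Metric values below are illustrative placeholders.

THRESHOLDS = {
    "cls": 0.1,       # max Cumulative Layout Shift
    "lcp_s": 2.5,     # max Largest Contentful Paint, seconds
    "ttfb_ms": 200,   # max Time To First Byte, milliseconds
    "html_kb": 1024,  # max HTML payload, kilobytes (~1 MB)
}

def failing_metrics(measured: dict) -> list[str]:
    """Return the names of metrics that exceed the citation-friendly thresholds."""
    return [name for name, limit in THRESHOLDS.items() if measured[name] > limit]

page = {"cls": 0.05, "lcp_s": 3.1, "ttfb_ms": 180, "html_kb": 1400}
print(failing_metrics(page))  # ['lcp_s', 'html_kb']
```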
These thresholds do not define a ranking algorithm. They describe an operational reality: content that is quickly accessible and structured in an exploitable way is more likely to be used by models.
This is a strategic signal: technical performance is no longer purely a matter of UX or conversion; it now conditions presence in AI-generated responses.
From optimisation to a condition of existence
In classical SEO, performance improves visibility. In an LLM context, it determines it.
Below certain thresholds, content is simply not used. This is not a question of editorial quality; it is a question of access.
A concrete example: a brand that invests in rich editorial content about its collections, served via a React framework without SSR, may find that content systematically ignored by models, not because it lacks relevance, but because the bot sees nothing but an empty HTML shell at the point of fetch.
The e-commerce blind spot
E-commerce architectures concentrate the main risk factors: client-side rendering, heavy JavaScript dependency, third-party scripts, complex structures.
These choices often respond to legitimate needs: personalisation, A/B testing, third-party integrations. But they introduce a structural gap between what a user sees and what an AI can exploit.
A category page whose products, prices and descriptions are injected dynamically via client-side API calls may be perfectly indexed by Google, and remain entirely opaque to a generative model. The bot retrieves the application shell, not the content.
This is a new dimension of visibility risk, one that does not yet appear on most digital roadmaps.
What this means for architecture
Critical content available at initial render. SSR or static pre-rendering on strategic pages is no longer merely a performance optimisation; it becomes a condition of accessibility for LLMs. Modern frameworks (Next.js, Nuxt, Astro) make it possible to combine server-side rendering for critical content with client-side interactivity where genuinely needed, without an all-or-nothing trade-off.
Managing JavaScript debt. Third-party scripts, tag managers, hydration mechanisms: every dependency must now answer an additional question: does it delay access to essential content in the initial HTML? What was once a performance optimisation has become an AI compatibility criterion.
Stability and predictability of response times. An average TTFB of 180ms that spikes to 800ms under load is as problematic as a structurally high TTFB. Edge architectures and aggressive caching strategies on low-variability content (category pages, product pages without personalisation) ensure fast, stable access regardless of request context.
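This is why a mean TTFB alone is misleading: tail latency has to be monitored alongside it. A minimal sketch of such a summary, using illustrative sample values (in practice they would come from RUM or synthetic probes):

```python
# Sketch: a healthy mean TTFB can hide spikes under load.
# Summarise samples with both the mean and a tail percentile.
import math
import statistics

def ttfb_report(samples_ms: list[float]) -> dict:
    """Return mean and p95 TTFB; the tail reveals instability the mean hides."""
    ordered = sorted(samples_ms)
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]  # nearest-rank percentile
    return {"mean_ms": round(statistics.mean(ordered), 1), "p95_ms": p95}

# Nine fast responses and one spike under load.
samples = [150, 160, 170, 155, 165, 180, 175, 160, 158, 790]
print(ttfb_report(samples))  # {'mean_ms': 226.3, 'p95_ms': 790}
```

The mean alone would suggest a borderline-acceptable server; the p95 shows the spike an AI bot may actually hit.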
Crawl directive hygiene. Verifying that known AI bots (GPTBot, ClaudeBot, PerplexityBot) are not blocked by your robots.txt or restrictive HTTP headers is a prerequisite that is frequently overlooked, particularly following a migration or infrastructure change.
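This verification can be automated with the standard library's robots.txt parser. The robots.txt content below is an illustrative example of an accidental blanket block, such as a staging rule pushed to production:

```python
# Sketch: verify that known AI bots are not excluded by robots.txt.
from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

def blocked_bots(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI bots this robots.txt would keep away from the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_BOTS if not parser.can_fetch(bot, url)]

# Illustrative misconfiguration: only Googlebot is allowed through.
robots = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_bots(robots))  # ['GPTBot', 'ClaudeBot', 'PerplexityBot']
```

In production, the same check would fetch `https://yourdomain.com/robots.txt` and run after every migration or infrastructure change.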
A long-term governance challenge
Reaching a given performance level is a starting point, not a guarantee.
In an e-commerce environment, performance degrades naturally: scripts are added, features evolve, backend response times fluctuate, traffic spikes, caches are frequently purged. Without dedicated governance, hard-won gains erode within weeks.
This is where the challenge becomes organisational. Sustaining performance over time requires treating it as an operational quality criterion, with defined metrics, alert thresholds and clearly assigned responsibilities, on a par with availability and security.
Quick check: is your site exploitable by an AI?
- Is the primary content present in the initial HTML, without JavaScript execution?
- Is your TTFB at or below 200ms, including under load?
- Do your key pages remain under 1MB of HTML?
- Are elements visible on load stable (CLS ≤ 0.1)?
- Are GPTBot, ClaudeBot and PerplexityBot permitted in your robots.txt?
- Is your critical content served via SSR or pre-rendered, without reliance on client-side API calls?
If several of these conditions are not met, your content is likely under-represented in AI-generated responses, regardless of its editorial quality.
To benchmark your technical performance against your competitors, test your site directly via our web performance simulator: