AI agents now generate more web traffic than humans. If your website isn’t set up to be found by them, you’re already invisible to a growing slice of the internet.
Here’s something worth sitting with. According to Cloudflare, automated bots and AI agents now account for roughly 57.5% of all web requests, with humans responsible for the remaining 42.5%. It’s the first time in internet history that machines have crossed this line, and Cloudflare CEO Matthew Prince says it happened earlier than he expected, having originally predicted the crossover wouldn’t arrive until 2027.
At the same time, Cisco’s 2026 research found that a single AI agent generates up to 450% more network traffic than a human doing the exact same task. These aren’t crawlers passively indexing a page. They’re agents researching, comparing, and making or influencing decisions at machine speed.
What this really means is that there’s a new discovery problem. Not just for big brands but for anyone with a website, whether you’re a solo consultant, a SaaS startup, or a small business. When someone asks ChatGPT, Perplexity, or Claude a question relevant to your product or service, does your website even show up in what the agent surfaces? Does the agent understand what you do?
This is SEO all over again, but with different rules. And right now, most websites are completely unprepared for it.
The good news: there are three concrete things you can do this week to fix that. No coding background required for two of them. Let’s get into it.
The New Discoverability Problem
Your website was designed for humans. Since the early days of the web, the entire game was getting a person to click, read, and stay. And that still matters. But AI agents don’t browse the way people do.
An agent doesn’t appreciate your hero image or your animated product demo. It’s looking for structured, readable text that answers a question clearly. And if your website isn’t set up to give it that, the agent will find someone else’s site that is.
Think of it this way:
- A few years ago: humans searched Google, Google ranked your page, humans clicked.
- Now: humans ask an AI, the AI agent browses the web on their behalf, the AI synthesizes an answer, the human acts on it.
- Soon: AI agents will make or initiate decisions directly on behalf of users without a human clicking at all.
Each stage requires a different kind of optimization. You’ve probably already spent time on SEO for stage one. Stages two and three need something different, and the window to get ahead of this is right now.
Fix 1: Control Which Pages AI Agents Can Access (robots.txt)
You’ve probably heard of robots.txt. It’s been around since the 1990s. It’s a plain text file that sits at the root of your website and tells web crawlers which pages they’re allowed to crawl. Googlebot has read it for decades.
Now AI crawlers read it too. GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot, and others all check your robots.txt file before deciding what to crawl. The problem is that many websites either have no robots.txt at all, or they have one that was written to manage search engines and accidentally blocks AI crawlers as well.
Here’s what the main options look like:
Option 1: Block all AI crawlers
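    # Prevent the major AI crawlers from accessing your site
    User-agent: GPTBot
    User-agent: ClaudeBot
    User-agent: PerplexityBot
    Disallow: /

Use this if you’re concerned about your content being used to train AI models, or if you simply don’t want to be surfaced by AI assistants. Name each AI crawler you want to block: a blanket User-agent: * with Disallow: / would shut out every crawler, including Googlebot and other search engines. Just know that blocking AI crawlers makes you invisible to a growing slice of how people find things online.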
Option 2: Allow specific AI crawlers only
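    # Allow specific AI crawlers to access your site
    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    # Block a crawler you don't want (CCBot, Common Crawl's crawler, is only an example)
    User-agent: CCBot
    Disallow: /

This gives you granular control. You can choose which AI systems are allowed to access your content and which ones aren’t. Keep in mind that crawlers are allowed by default, so the Allow rules mostly document your intent. It’s the Disallow group for each crawler you name that actually keeps it out. Useful if you have strong opinions about which platforms you want to appear on.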
Option 3: Allow all crawlers
    # Allow any crawler, AI or otherwise, to access your site
    User-agent: *
    Allow: /
The most open option. If your goal is maximum discoverability across all AI platforms, this is the one to use.
How to check and update your robots.txt right now
- Type your domain into a browser followed by /robots.txt (for example, yourwebsite.com/robots.txt). You’ll either see the file contents or a 404 error.
- If the file doesn’t exist, you need to create one. In WordPress, you can manage it via Yoast SEO under SEO > Tools > File Editor, or via Rank Math under General Settings > Edit Robots.txt.
- If the file exists, look for any blanket Disallow: / rule under User-agent: *. That blocks everything, including AI crawlers. Decide whether that’s intentional.
- Add the specific AI agent rules you want, using the code examples above.
- Save and verify it’s live by checking your domain/robots.txt again.
One thing to keep in mind: robots.txt is a signal, not a lock. Well-behaved crawlers like the major AI platforms honor it. Bad actors don’t. But for legitimate AI agents and assistants, this is the right layer to start with.
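If you’d rather not reason about the rules by eye, Python’s standard library can tell you how a given robots.txt treats each crawler. The following is a minimal sketch, not a definitive tool: the ROBOTS_TXT content, the crawler_access helper, and the yourwebsite.com URLs are all made up for illustration, so paste in your own file’s text before trusting the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; replace with your own file's text.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, per user agent, whether the rules permit fetching the URL."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"]
    for agent, allowed in crawler_access(
        ROBOTS_TXT, agents, "https://yourwebsite.com/pricing/"
    ).items():
        print(f"{agent}: {'allowed' in str(allowed) or ('allowed' if allowed else 'blocked')}")
```

This only shows how Python’s parser reads the rules. Individual crawlers may interpret edge cases differently, so treat it as a sanity check rather than a guarantee.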
Fix 2: Give AI Agents a Curated Front Door (llms.txt)
This one is newer and most website owners haven’t heard of it yet. That’s exactly why it’s worth doing now.
llms.txt is a proposed standard for websites that want to be understood by AI systems. Think of it as the AI-readable version of your About page. While robots.txt tells crawlers where they can go, llms.txt tells AI agents what you actually do and what’s worth reading.
You create a plain-text file, formatted in Markdown, and put it at the root of your site at yourwebsite.com/llms.txt. Inside it, you give AI agents a clear, structured summary of your business, your offerings, and the key pages they should know about.
Here’s a basic example structure:
    # Your Business Name

    > One-sentence description of what you do and who you serve.

    ## About

    - [About Us](https://yourwebsite.com/about/): Brief description of what this page covers.

    ## Products and Services

    - [Product Name](https://yourwebsite.com/product/): What it does and who it's for.
    - [Service Name](https://yourwebsite.com/services/): Brief description.

    ## Key Resources

    - [Blog](https://yourwebsite.com/blog/): Topics covered.
    - [FAQ](https://yourwebsite.com/faq/): Common questions answered.
Keep it concise. The goal isn’t to dump everything onto one page. It’s to give an AI agent the 30-second version of your business so it surfaces the right content when a user asks something relevant.
There’s also a more detailed version called llms-full.txt. This is the same idea but with full content from your most important pages included, useful for AI systems that need deeper context to give accurate answers. Think of llms.txt as your business card and llms-full.txt as your full portfolio.
How to create your llms.txt file
- Open any plain text editor (Notepad, TextEdit, VS Code, it doesn’t matter).
- Write a one-line description of your business at the top under a # heading with your site name.
- Add a brief one-sentence summary under a > blockquote symbol.
- Add sections for your key pages: About, Products/Services, Blog, Contact, and anything else you’d want an AI to surface about you.
- For each entry, add a title in square brackets, the URL in parentheses, and a brief one-line description after the colon.
- Save the file as llms.txt and upload it to your site’s root directory (the same folder where your homepage lives).
- In WordPress, you can do this via FTP using a tool like FileZilla, or via Plugins > File Manager if you have that plugin installed.
- Verify it’s live by visiting yourwebsite.com/llms.txt in your browser.
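If you’re comfortable running a script, the steps above can be sketched in a few lines of Python instead of typing the file by hand. This is only one possible approach: the build_llms_txt function, its parameters, and the sample entries are hypothetical names chosen for this example, and the output simply follows the structure shown earlier.

```python
def build_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Assemble llms.txt text: a # heading, a > summary, then ## sections of links."""
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, entries in sections.items():
        lines += ["", f"## {heading}", ""]
        for title, url, note in entries:
            # Each entry: title in square brackets, URL in parentheses, note after the colon.
            lines.append(f"- [{title}]({url}): {note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder content; swap in your own pages and descriptions.
    text = build_llms_txt(
        "Your Business Name",
        "One-sentence description of what you do and who you serve.",
        {
            "About": [
                ("About Us", "https://yourwebsite.com/about/", "Who we are."),
            ],
            "Key Resources": [
                ("Blog", "https://yourwebsite.com/blog/", "Topics covered."),
                ("FAQ", "https://yourwebsite.com/faq/", "Common questions answered."),
            ],
        },
    )
    with open("llms.txt", "w", encoding="utf-8") as handle:
        handle.write(text)
```

Keeping the page list in one place like this makes later updates a matter of editing the entries and re-running the script, then uploading the new file.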
You can see real examples of how others have structured theirs by visiting sites you respect and checking their /llms.txt path directly.
One maintenance note: this file doesn’t update itself. If you add new products, change your services, or publish important content, update the llms.txt file to reflect it. Quarterly is a sensible minimum.
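One way to make that review less manual is to compare the links in your llms.txt against the pages you currently consider important. The sketch below assumes the entry format shown earlier (a dash, the title in square brackets, the URL in parentheses). The function names and sample URLs are invented for illustration, and the list of current pages is something you would supply yourself, for example from your sitemap.

```python
import re

# Matches list entries of the form "- [Title](https://example.com/page/): note".
ENTRY_PATTERN = re.compile(r"^-\s*\[[^\]]+\]\(([^)\s]+)\)")


def listed_urls(llms_txt: str) -> list[str]:
    """Return every URL that appears in a link entry, in file order."""
    urls = []
    for line in llms_txt.splitlines():
        match = ENTRY_PATTERN.match(line.strip())
        if match:
            urls.append(match.group(1))
    return urls


def review_llms_txt(llms_txt: str, current_urls: set[str]) -> dict[str, list[str]]:
    """Flag links that are no longer current and current pages that are not listed."""
    listed = listed_urls(llms_txt)
    return {
        "stale": [url for url in listed if url not in current_urls],
        "missing": sorted(current_urls - set(listed)),
    }


if __name__ == "__main__":
    sample = (
        "# Your Business Name\n\n> Summary.\n\n## Products\n\n"
        "- [Old Product](https://yourwebsite.com/old/): Retired.\n"
        "- [Blog](https://yourwebsite.com/blog/): Topics covered.\n"
    )
    current = {"https://yourwebsite.com/blog/", "https://yourwebsite.com/new/"}
    print(review_llms_txt(sample, current))
```

Anything under "stale" is a candidate for removal, and anything under "missing" is a page you may want to add.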
Fix 3: Help AI Agents Understand What Your Data Means (Schema Markup)
When an AI agent lands on one of your pages, it reads the text. But structured data, specifically Schema.org markup, tells the agent exactly what type of information it’s looking at, not just what the words say.
For example, instead of an AI guessing that the text “Andreas Welsch” on a page refers to a person and an author, schema markup explicitly labels it as a Person entity, tells the agent it’s the author, and links it to other attributes like job title and organization. The agent doesn’t have to guess. It just reads the label.
This is done using a format called JSON-LD, which gets added to your site’s HTML. Here’s a simple example of what a Website schema looks like:
    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "WebSite",
      "url": "https://yourwebsite.com/",
      "name": "Your Business Name",
      "description": "A clear one-sentence description of what you do."
    }
    </script>
You can get much more specific depending on what your site is about. A product page might use Product schema with fields for price, availability, and SKU. A local business page might use LocalBusiness schema with your address and phone number. A blog post uses Article schema with author, date, and headline. You can browse the full library of schema types at schema.org/docs/full.html.
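To make the Article case concrete, here is a rough sketch of how you might generate that markup with Python rather than writing the JSON by hand. The article_json_ld function and all the sample values are placeholders invented for this example. The property names (headline, author, datePublished, url) are standard Schema.org Article properties, but which ones you include is your call.

```python
import json


def article_json_ld(headline: str, author_name: str, date_published: str, url: str) -> str:
    """Build a JSON-LD script tag describing a blog post as a Schema.org Article."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published,  # ISO 8601 date, e.g. "2026-01-15"
        "url": url,
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a value can never close the script tag early; "\/" is valid JSON.
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Placeholder values; replace with your post's real details.
    print(
        article_json_ld(
            "Your Post Title",
            "Your Name",
            "2026-01-15",
            "https://yourwebsite.com/blog/your-post/",
        )
    )
```

Building the data as a dictionary and letting json.dumps serialize it avoids the missing commas and stray quotes that creep into hand-written JSON.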
How to add schema markup to your WordPress site
- The easiest way is through Yoast SEO or Rank Math. Both automatically add basic schema (Article, WebPage, Organization) to your posts and pages. Check your existing plugin settings first because you may already have some schema in place.
- To add or customize schema beyond what your SEO plugin does, install a dedicated plugin like Schema Pro or WP Schema.
- For custom schema, use Technical SEO’s free Schema Markup Generator to build the JSON-LD code without writing it by hand. Select the schema type, fill in the fields, and it generates the code for you.
- Paste the generated code into the <head> section of your page. In WordPress, you can do this through your theme’s header.php, a header injection plugin, or directly via Rank Math’s Schema section on individual posts.
- Once it’s live, test it using the Schema.org validator or Google’s Rich Results Test. Paste your URL and see what structured data is detected and whether any errors come up.
Start with these three for most websites: WebSite (for your homepage), Organization or Person (for your About page), and Article (for your blog posts). Those cover the most ground with the least effort.
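For a quick local spot-check alongside the validators mentioned above, you can pull the JSON-LD blocks out of a page’s HTML and list the schema types they declare. This is a minimal sketch using only Python’s standard library. The JsonLdExtractor class and schema_types function are hypothetical names for this example, and the script expects you to pass in HTML you have already saved or fetched.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self._buffer: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            # json.loads raises an error on malformed JSON, which is itself a useful signal.
            self.items.append(json.loads("".join(self._buffer)))


def schema_types(html: str) -> list:
    """Return the @type of each top-level JSON-LD object found in the HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return [item.get("@type") for item in extractor.items if isinstance(item, dict)]


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "WebSite", "name": "Your Business Name"}'
        "</script></head><body></body></html>"
    )
    print(schema_types(page))
```

It won’t catch everything the official tools do, such as missing recommended fields, but it confirms that your markup is present and parses as valid JSON.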
Three Questions Worth Asking Before You Start
These three fixes involve real choices, not just technical setup. Before you dive in:
Which pages actually represent you well? Not every page on your site is worth surfacing to an AI. Think about which ones are clear, current, and genuinely useful. Those are the ones to prioritize in your llms.txt and schema setup.
Are you okay with AI platforms potentially training on your content? This is the robots.txt decision. If your content is a core business asset and you’re not comfortable with it being scraped for training, a more restrictive robots.txt is worth considering. If discoverability matters more, open it up.
What’s the bigger risk: being found or not being found? For most businesses, invisibility to AI agents is the greater danger right now. Being discoverable is increasingly how new audiences find products and services, especially in B2B. But only you can weigh that against your specific situation.
Keep It in Sync
Here’s the thing about all three of these fixes: they require maintenance. Your robots.txt, llms.txt, and schema markup don’t update automatically when your business evolves. If you launch a new product, restructure your site, or change what you do, those files need to be updated too.
A practical approach is to put a quarterly reminder on your calendar to review all three. Twenty minutes, once a quarter. Check that llms.txt reflects your current offerings, that your schema is accurate, and that your robots.txt rules still match your intentions.
The websites that stay discoverable will be the ones that treat these as living documents, not a one-time setup.
This Is Just the Foundation
Robots.txt, llms.txt, and schema are the structural layer. They tell AI agents that you exist, what you do, and what your content means. But they’re just the starting point.
The next layer is content optimization: making sure the actual writing on your pages is clear and direct enough for an AI to extract a useful answer. That means leading with your point, using specific facts over vague claims, and structuring pages so the key information is near the top. That’s a longer conversation, but it starts with making sure AI agents can even find and read your site in the first place.
Get the foundation right first. The rest builds on it.
If you’ve already read my previous posts on why the internet is being rebuilt for bots or how ChatGPT became a brand discovery channel, this post is the hands-on sequel. Those were the why. This is the how.
Most websites are still optimized for a visitor who scrolls and clicks. But the visitor that matters most right now might never be a person at all.
Sources
- Cloudflare Radar: Bot vs. human traffic data, June 2026. radar.cloudflare.com
- NBC News: “Bot web traffic has overtaken human web traffic, data shows” (June 2026). nbcnews.com
- Cisco: “AI Impact on Wide Area Networks 2026” report. networkworld.com
- BizTech Magazine: “Cisco Live 2026: Cisco Pushes AgenticOps Vision” (June 2026). biztechmagazine.com
- Google: Introduction to robots.txt. developers.google.com
- Schema.org: Full schema type library. schema.org
- Andreas Welsch, The AI MEMO: “If AI Agents Can’t Find Your Business, You Don’t Exist” (June 9, 2026). Substack.
