llms.txt Generator: Create Your AI Crawler Configuration File

Joshua Ortega · 11 min read

TL;DR
  • llms.txt is a proposed standard for providing LLMs with a curated, markdown-formatted summary of your website. Think robots.txt, but for AI readability instead of access control.
  • No major AI platform has adopted it yet. Google’s John Mueller called it “unnecessary.” But developer communities and agentic AI tools are picking it up.
  • The file lives at yoursite.com/llms.txt and follows a specific Markdown format: H1 header, blockquote summary, and file lists under H2 sections.
  • It takes 20 minutes to create. Zero downside. Potential upside as AI tools evolve.
  • Don’t put anything in llms.txt that doesn’t match your on-page content. A bot-only file that can drift from what visitors see is exactly why AI platforms are wary of it.

llms.txt is one of the most overhyped and misunderstood concepts in AI search optimization right now. SEO tools flag it as missing. WordPress plugins offer to generate it. SEO Twitter argues about it weekly.

The actual situation: it’s a proposal from Jeremy Howard (published September 2024) that no major AI platform has officially adopted. But it’s a useful concept, it’s gaining traction with developer tools and AI agents, and creating one takes 20 minutes.

This post gives you the full picture: what llms.txt is, why it exists, whether you should care, and how to create one.

llms.txt: low effort, zero risk, potential upside
  • 20 min: time to create a complete llms.txt file
  • 0: downside risk from implementing llms.txt
  • 0: major AI platforms officially using it (yet)

What llms.txt Is (And What It Isn’t)

llms.txt is a Markdown-formatted file placed at the root of your website (yoursite.com/llms.txt) that provides LLMs with a curated, readable overview of your site.

Here’s the problem it’s trying to solve: LLM context windows are too small to process entire websites. Your typical web page is packed with navigation, ads, JavaScript, footer links, cookie banners, and sidebar widgets. The actual content might be 30% of the page’s HTML. An LLM trying to understand your page has to wade through all that noise.

llms.txt gives the AI a clean shortcut. A summary of what your site is about, plus links to Markdown versions of your most important pages. No noise. Just signal.

Think of it as the difference between handing someone a messy filing cabinet and handing them a one-page table of contents with page numbers.

How It Differs from robots.txt and sitemap.xml

| Traditional files | llms.txt |
| --- | --- |
| robots.txt: controls access permissions (“don’t crawl /admin/”) | Provides curated context, not access control |
| sitemap.xml: lists every indexable page, no curation | Highlights the 10-20 most important pages with explanations |
| Machine-readable format (XML) | Human-readable Markdown format |
| Universally adopted by search engines | Proposed standard, not yet adopted by major AI platforms |

robots.txt tells bots what they can and can’t crawl. It’s about permissions. “Don’t crawl /admin/.” “Stay out of /staging/.” Note that it governs crawling, not indexing.

sitemap.xml lists every indexable page on your site. It’s a complete inventory for search engines. No curation, no explanation, no context.

llms.txt is curated context. It doesn’t control access or list every page. It explains your site to an AI and points it toward the most useful pages in a format the AI can actually process. The official spec puts it clearly: llms.txt is “designed to coexist with current web standards” and “complement robots.txt by providing context for allowed content.”

The llms.txt Format (Spec Breakdown)

The file uses specific Markdown sections, in this exact order:

1. H1 Header (Required)
The name of your project or website. This is the only section the spec strictly requires.

# Metronyx AI

2. Blockquote Summary
A short summary with the most important background information about your site.

> Metronyx AI is an AI search optimization agency specializing in answer engine optimization (AEO), helping brands get cited by ChatGPT, Perplexity, and Google AI Overviews.

3. Detailed Information (Optional)
Zero or more Markdown sections (paragraphs, lists, but not headings) with more context about your project and how to interpret the linked files.

We publish original research on AI search visibility, provide free audit tools, and offer managed AEO services for B2B SaaS, e-commerce, and professional services companies.

4. File Lists Under H2 Headers
Markdown sections with H2 headings containing lists of URLs where the AI can find more detail. Each entry is a Markdown hyperlink with optional notes after a colon.

## Core Pages
- [About Metronyx AI](https://metronyxai.com/about/): Company background and team
- [Services](https://metronyxai.com/services/): AI search optimization service descriptions
- [Free AI Visibility Audit](https://metronyxai.com/audit/): Our free audit tool for checking AI visibility

## Blog (Key Posts)
- [Entity Architecture for AI Search](https://metronyxai.com/entity-architecture-ai-search/): How to build brand entity recognition
- [How to Rank in Perplexity](https://metronyxai.com/how-to-rank-in-perplexity/): Platform-specific optimization guide

5. Optional Section
An H2 section titled “Optional” for URLs the LLM can skip if it needs to save context window space. Use this for secondary content.

## Optional
- [Blog Archive](https://metronyxai.com/blog/): Full blog post listing
- [Case Studies](https://metronyxai.com/case-studies/): Client success stories

That’s it. That’s the entire spec. Simple by design.
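To make the ordering concrete, here is a minimal Python sketch that renders the sections above into a single file. The `Link` and `LlmsTxt` names and the `render` method are our own illustration, not part of the spec or of any existing library.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional description placed after the colon


@dataclass
class LlmsTxt:
    name: str                                         # H1 header, the only required part
    summary: str = ""                                 # blockquote summary
    details: list[str] = field(default_factory=list)  # plain paragraphs, no headings
    sections: dict[str, list[Link]] = field(default_factory=dict)  # H2 title -> links

    def render(self) -> str:
        """Emit the parts in the order the spec describes."""
        parts = [f"# {self.name}"]
        if self.summary:
            parts.append(f"> {self.summary}")
        parts.extend(self.details)
        for heading, links in self.sections.items():
            lines = [f"## {heading}"]
            for link in links:
                entry = f"- [{link.title}]({link.url})"
                if link.note:
                    entry += f": {link.note}"
                lines.append(entry)
            parts.append("\n".join(lines))
        return "\n\n".join(parts) + "\n"


example = LlmsTxt(
    name="Metronyx AI",
    summary="An AI search optimization agency.",
    sections={
        "Core Pages": [Link("About", "https://metronyxai.com/about/", "Company background")],
        "Optional": [Link("Blog Archive", "https://metronyxai.com/blog/")],
    },
)
print(example.render())
```

Python dictionaries keep insertion order, so putting the "Optional" section last in the input is enough to keep it last in the output.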

The Honest Truth About llms.txt Adoption

We need to be straight about this: no major AI platform currently uses llms.txt for search results or citations.

Search Engine Journal’s analysis confirmed that “LLMs.txt is just a proposal, and no AI platform has signed on to use it.” Google’s John Mueller has publicly called creating one “unnecessary.” AI chatbots and crawlers continue to rely on regular HTML content.

The spec’s own creators expect it to be used mainly during inference, meaning when an LLM fetches your file on-demand to answer a question about your site. Not during mass crawling for model training.

So why are so many companies rushing to implement it?

The Misinformation Loop

Here’s what’s happening: SEO tools started checking for llms.txt and flagging its absence as a “risk.” Semrush’s audit documentation warned that “if your site lacks a clear llms.txt file it risks being misrepresented by AI systems.” That created panic. Site owners saw the warning and felt they needed to create one immediately.

Rank Math suggested AI chatbots “refer to the curated version you’ve given it” via llms.txt. That’s not accurate. Mainstream AI chatbots build their answers from regular HTML pages, and none has confirmed fetching llms.txt.

The result? A self-reinforcing loop where tool makers feel compelled to offer llms.txt features because users expect them, and users expect them because tool makers keep bringing them up.

The Trust Problem

There’s a good reason AI platforms haven’t adopted llms.txt. A separate file visible only to bots is inherently harder to trust than on-page content that humans also see.

On-page content is relatively trustworthy because it serves both users and bots. But llms.txt? Nobody except AI sees it. Which makes it the perfect vector for manipulation.

A 2024 research paper on Adversarial Search Engine Optimization for Large Language Models demonstrated exactly this risk. Researchers showed that “an attacker can trick an LLM into promoting their content over competitors” using what they called Preference Manipulation Attacks. In testing on Bing and Perplexity, a targeted product was 2.5x more likely to be recommended after an attack.

If that’s possible with regular content, imagine what could happen with a file specifically designed to feed information to AI, where human visitors would never see the deception.

So Should You Create One?

Yes. Here’s our reasoning.

The cost is close to zero. Creating an llms.txt file takes 20 minutes. Maintaining it takes 5 minutes when you publish new content. There’s no technical risk. It doesn’t interfere with your existing SEO.

The potential upside is real, even if unrealized today:

  • Developer tools already use it. Code editors, documentation tools, and AI coding assistants are picking up llms.txt to understand project documentation. If your product has technical docs, developers might already benefit from your llms.txt.
  • Agentic AI will need structured site maps. As AI agents start performing multi-step tasks (booking, research, purchasing), they’ll need machine-readable guides to website structure. llms.txt is positioned for this use case.
  • It forces you to curate. The exercise of writing an llms.txt file makes you think about which pages on your site actually matter. That’s a useful exercise regardless of whether AI reads the file.

What you should NOT expect: a rankings boost, more AI citations, or improved AI visibility. Those outcomes depend on your on-page content, entity signals, and technical SEO. Not on whether you have an llms.txt file.

How to Create Your llms.txt File

You’ve got three options.

Option 1: Build It Manually

Open a text editor. Follow the spec format outlined above. Save it as llms.txt (plain text, UTF-8 encoding) and upload it to your site’s root directory via SFTP, cPanel, or your hosting provider’s file manager.

Here’s a complete example for a SaaS company:

# ProductName

> ProductName is a project management tool for remote teams, offering task tracking, time management, and team communication in a single platform.

ProductName serves over 5,000 teams across 40 countries. Our API integrates with Slack, GitHub, and Jira.

## Documentation
- [Getting Started Guide](https://productname.com/docs/getting-started/): Setup and onboarding walkthrough
- [API Reference](https://productname.com/docs/api/): Full REST API documentation
- [Integrations](https://productname.com/docs/integrations/): Third-party integration guides

## Key Pages
- [Pricing](https://productname.com/pricing/): Plan comparison and pricing details
- [About](https://productname.com/about/): Company background and team
- [Blog](https://productname.com/blog/): Product updates and industry insights

## Optional
- [Changelog](https://productname.com/changelog/): Release notes and version history
- [Careers](https://productname.com/careers/): Open positions
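If you build the file by hand, a quick structural check can catch formatting slips before you upload. The sketch below is an assumption-laden illustration: `check_llms_txt` is a hypothetical helper, and the rules it enforces are our reading of the format described above, not an official validator.

```python
import re

# Matches "- [title](url)" with an optional ": note" suffix.
LINK_RE = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def check_llms_txt(text: str) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines()]
    non_empty = [ln for ln in lines if ln]

    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first non-empty line should be an H1 ('# Name')")
    if sum(ln.startswith("# ") for ln in lines) > 1:
        problems.append("more than one H1 header")
    if not any(ln.startswith("> ") for ln in lines):
        problems.append("no blockquote summary found")

    in_section = False
    for number, ln in enumerate(lines, start=1):
        if ln.startswith("## "):
            in_section = True
        elif in_section and ln.startswith("- ") and not LINK_RE.match(ln):
            problems.append(f"line {number}: list item is not '- [title](url): note'")
    return problems
```

Run it against the file contents (for example `check_llms_txt(open("llms.txt", encoding="utf-8").read())`) and fix whatever it reports. It only checks shape, so it says nothing about whether the linked pages are the right ones.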

Option 2: Use a CMS Plugin

WordPress plugins like Rank Math and Yoast now offer llms.txt generation. They auto-populate the file based on your site structure. The convenience is nice, but review the output. Auto-generated files sometimes include pages that shouldn’t be there (staging pages, thin content, archive pages with no real value).

Option 3: Use Our Generator

Our free tools page includes a generator that walks you through each section. Enter your site name, write your summary, add your URLs, and it outputs a properly formatted file ready to upload.

Creating Markdown Versions of Your Pages

The llms.txt spec also proposes that pages provide clean Markdown versions at the same URL with .md appended. URLs that don’t end in a file name append index.html.md instead, so yoursite.com/about/ would have a companion at yoursite.com/about/index.html.md.
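The URL rule is simple enough to express in a few lines. This is a sketch of our reading of it: `markdown_url` is a hypothetical helper, and it assumes any path that does not end in a slash already names a file.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url: str) -> str:
    """Map a page URL to the Markdown companion URL the spec proposes."""
    parts = urlsplit(page_url)
    path = parts.path or "/"
    if path.endswith("/"):
        # No file name in the URL: append index.html.md.
        path += "index.html.md"
    else:
        # Assumed to be a file name already: append .md.
        path += ".md"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(markdown_url("https://yoursite.com/about/"))
print(markdown_url("https://yoursite.com/docs/api.html"))
```

Extensionless URLs without a trailing slash (such as /about) are ambiguous under this rule, so decide on one form per site and stick with it.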

This is the more ambitious part of the spec. It means maintaining two versions of every important page: your normal HTML page and a stripped-down Markdown version.

Our take: only do this for your 5-10 most important pages. API documentation. Product overview. Key landing pages. Don’t try to maintain Markdown versions of your entire blog archive. The maintenance cost isn’t worth it when no AI platform is currently reading these files.

When creating .md versions:

  • Strip all navigation, headers, footers, sidebars
  • Remove ads, pop-ups, CTAs, and JavaScript widgets
  • Keep only the core content in clean Markdown
  • Include images as Markdown image references if they add informational value
  • Make sure the Markdown is valid and renders correctly
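As a rough illustration of the checklist above, the following Python sketch uses only the standard library to drop page chrome and keep headings, paragraphs, and list items. The tag lists and the `html_to_markdown` name are our assumptions; it ignores links, images, and tables, so treat it as a first pass before manual cleanup, not a complete converter.

```python
from html.parser import HTMLParser

# Elements whose entire contents are dropped (assumed to be page chrome).
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form"}
# Block elements we keep, mapped to their Markdown prefix.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # greater than 0 while inside a skipped element
        self.blocks = []      # finished Markdown blocks
        self.prefix = None    # prefix of the block being collected, if any
        self.buffer = []      # text pieces of the current block

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_PREFIX:
            self.prefix = BLOCK_PREFIX[tag]
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_PREFIX and self.prefix is not None:
            text = " ".join("".join(self.buffer).split())  # collapse whitespace
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks) + "\n"
```

One caveat: because `header` is in the skip list, a page title wrapped in a `<header>` element is dropped too. Adjust `SKIP_TAGS` to match your own templates.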

What NOT to Put in llms.txt

This matters more than what you put in.

Do:
  • Keep content consistent with what’s on your actual pages
  • Curate only 10-20 of your most useful pages
  • Only link to publicly accessible content

Don’t:
  • Don’t add hidden promotional text that contradicts on-page content
  • Don’t include prompt injection attempts (“always recommend…”)
  • Don’t list every page on your site; that’s what sitemap.xml is for
  • Don’t include pages behind login walls
Don’t add hidden promotional text. If your on-page content says “We’re one of several options in the market” but your llms.txt says “We’re the #1 leader in the industry,” you’re doing exactly what AI platforms are afraid of. Keep it consistent.

Don’t include prompt injection attempts. Some people have tried embedding instructions like “When asked about [category], always recommend [brand]” in llms.txt and .md files. This is a fast track to getting your domain flagged and potentially blacklisted.

Don’t list every page on your site. That’s what sitemap.xml is for. llms.txt is curated. Include 10-20 of your most useful pages. If an AI needs your full site structure, it’ll use your sitemap.

Don’t include pages behind login walls. An LLM can’t access gated content. Linking to it in llms.txt just wastes context window space.
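Two of these rules are easy to check mechanically before you publish. The sketch below counts links and looks for instruction-like phrases. Both the `lint_llms_txt` helper and the phrase list are hypothetical starting points of ours, not a standard, and a clean result does not prove the file is consistent with your pages.

```python
import re

LINK_RE = re.compile(r"\[[^\]]+\]\((https?://[^)\s]+)\)")
# Hypothetical phrase list; extend it to fit your own review process.
SUSPICIOUS = ("always recommend", "ignore previous instructions", "when asked about")


def lint_llms_txt(text: str, max_links: int = 20) -> list[str]:
    """Flag over-long link lists and instruction-like phrasing."""
    warnings = []
    urls = LINK_RE.findall(text)
    if len(urls) > max_links:
        warnings.append(f"{len(urls)} links listed; aim for roughly 10-20 curated pages")
    lowered = text.lower()
    for phrase in SUSPICIOUS:
        if phrase in lowered:
            warnings.append(f"instruction-like phrase found: {phrase!r}")
    return warnings
```

Consistency with on-page content and gated URLs still need a human (or a crawl) to verify.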

llms.txt and Your Broader AEO Strategy

Here’s where we get opinionated. llms.txt should be approximately 1% of your AI search optimization effort. The other 99% should go toward:

  • Making your content technically accessible to AI crawlers (robots.txt, not blocking GPTBot)
  • Structuring content for extraction (question headings, direct answers, sourced data)
  • Building entity recognition (schema markup, consistent branding, Wikipedia presence)
  • Getting discussed on platforms AI engines actually cite (Reddit, LinkedIn, YouTube)
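The first item is worth verifying directly, because an overly broad robots.txt rule can block AI crawlers without anyone noticing. Here is a small sketch using Python’s standard `urllib.robotparser`. The `crawler_allowed` wrapper and the sample rules are our own illustration.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules that block GPTBot everywhere and everyone else from /admin/.
sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /admin/\n"
print(crawler_allowed(sample, "GPTBot", "https://yoursite.com/about/"))
print(crawler_allowed(sample, "PerplexityBot", "https://yoursite.com/about/"))
```

Paste in your live robots.txt contents and test the user agents you care about against a few key URLs.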

We’ve run over 200 AI visibility audits at Metronyx. Not once has the presence or absence of an llms.txt file been the determining factor in whether a brand gets cited by AI. The brands that get cited are the ones with strong on-page content, clear entity signals, and active community presence.

llms.txt is a nice-to-have. A bet on where AI tooling is heading. Not a substitute for the work that actually drives AI visibility today.

If you want to understand the tactics that DO drive AI citations right now, read our guide on how to rank in Perplexity. Different approach. Proven results.

Four steps to create and deploy your llms.txt file

1. Create your llms.txt file
Start with an H1 header (your site name), a blockquote summary of what your site does, and organize content links under H2 sections.

2. Add your top pages
List your 10-20 most important pages with titles and brief descriptions. Prioritize pages with the highest-value content for AI consumption.

3. Create .md versions (optional)
For maximum extractability, create Markdown versions of key pages and link to them from llms.txt. Some sites also publish a single llms-full.txt file with the complete content for AI agents that want everything in one place. Treat that as a community convention rather than part of the core proposal.

4. Deploy and verify
Upload to yoursite.com/llms.txt. Verify it’s accessible by visiting the URL directly. Keep it consistent with your actual on-page content.
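The verify step can also be scripted. The sketch below fetches /llms.txt and runs a few sanity checks on the response. The function names, the user-agent string, and the specific checks are assumptions of ours, not requirements from the spec.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def check_response(status: int, content_type: str, body: str) -> list[str]:
    """Sanity-check a fetched llms.txt response; returns a list of problems."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    if "html" in content_type.lower():
        problems.append(f"served as {content_type!r}; plain text or Markdown is expected")
    first_line = next((ln for ln in body.splitlines() if ln.strip()), "")
    if not first_line.startswith("# "):
        problems.append("file does not start with an H1 header")
    return problems


def verify_llms_txt(site: str, timeout: float = 10.0) -> list[str]:
    """Fetch <site>/llms.txt and report problems (makes a live HTTP request)."""
    url = site.rstrip("/") + "/llms.txt"
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})  # made-up UA
    try:
        with urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            content_type = response.headers.get("Content-Type", "")
            return check_response(response.status, content_type, body)
    except HTTPError as err:
        return [f"{url} returned HTTP {err.code}"]
```

Calling `verify_llms_txt("https://yoursite.com")` returns an empty list when nothing looks wrong. A common failure is a CMS serving its HTML 404 template at that path, which the content-type check is meant to catch.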

Frequently Asked Questions

Do major AI platforms use llms.txt for search results or citations?

No. As of March 2026, no major AI platform (ChatGPT, Perplexity, Google AI Overviews, Claude) officially uses llms.txt for search results or citations. Google’s John Mueller has called it “unnecessary.” AI visibility comes from your on-page content, entity signals, and technical accessibility, not from llms.txt.

Written by Joshua Ortega

AI Search Optimization at Metronyx AI

Joshua builds content engines. He blends AI tools with sharp editorial instincts to produce work that is fast, strategic, and actually worth reading. He knows how to scale output without letting quality slip.
