Phil Kurth

How AI Search Engines Decide Which Websites to Cite

Last updated • 8 April 2026

Something has quietly changed about how people find information online.

If you searched Google last week, there’s a good chance the answer appeared at the top of the page before you clicked anything. Google wrote it for you, drawing on several websites, with small citation links underneath. ChatGPT can browse the web now and does something similar. Perplexity has been doing it from day one.

These AI search systems don’t just rank pages and let you choose. They read your content, decide whether it’s worth quoting, and then rewrite the answer in their own words. Your website either gets cited as a source or it doesn’t appear at all.

I’ve been digging into this for my own site over the past few months, and what I’ve found is that the rules are genuinely different from traditional SEO. Not completely different, but different enough that you can have a technically excellent website that AI search engines mostly ignore.

Three systems, three different approaches

The big three AI search systems each work differently, and understanding those differences matters if you want your content to show up.

Google AI Overviews

Google AI Overviews (the AI-generated summaries that appear above regular search results) have a significant advantage: they sit on top of Google’s existing search index. Google already knows which pages rank well for a query. The AI Overview pulls from those top-ranking pages, synthesises an answer, and links back to the sources it drew from.

This means traditional SEO still matters for AI Overviews. If you’re not ranking on page one, you’re probably not getting cited in the overview either. But ranking alone isn’t enough. Google’s AI needs to be able to extract a clear, direct answer from your content. Pages that bury the answer under marketing fluff or require the reader to piece together information from multiple sections tend to get skipped in favour of pages that state things plainly.

ChatGPT

ChatGPT’s web search works differently. When you ask it something that needs current information, it runs a web search in the background (using Bing’s index), reads several pages, and synthesises an answer with inline citations.

ChatGPT tends to favour content that reads like a knowledgeable explanation rather than marketing copy. It’s looking for factual density: specific numbers, named examples, clear cause-and-effect statements. Content that says “we offer industry-leading solutions” gets ignored. Content that says “a typical WordPress site costs between $3,000 and $15,000 depending on complexity, with ongoing hosting running $50 to $200 per month” gets quoted.

Perplexity

Perplexity is the most citation-heavy of the three. Every answer it generates includes numbered references, and it’s transparent about where each claim comes from. It uses its own web crawler (PerplexityBot) alongside search APIs to find sources.

Perplexity appears to weight recency and specificity quite heavily. It loves pages with dates, version numbers, and concrete data points. If two pages cover the same topic but one includes specific stats and the other speaks in generalities, Perplexity will almost always cite the specific one.

What makes content “citable”

Across all three systems, certain patterns make content more likely to be cited. This is where AI search optimisation diverges from traditional SEO.

Traditional SEO optimises for ranking. AI search optimisation is about being quotable.

A page can rank well in Google’s organic results because it has strong backlinks, good domain authority, and targets the right keywords. But if the actual content on that page is thin, vague, or structured in a way that’s hard for an AI to extract a clean answer from, it won’t get cited in AI-generated responses.

Here’s what makes the difference:

Structured, chunkable content

AI systems extract information in chunks. They don’t read your entire page and summarise it. They identify the specific section that answers the query, pull that chunk, and use it. Content that’s organised with clear headings, short paragraphs, and self-contained sections is far easier for AI to work with than a single long narrative.

Think of it like a reference book versus a novel. A reference book lets you flip to the exact section you need. AI search engines work the same way.

Factual density

AI systems prioritise content with specific, verifiable claims. “We’ve been in business for years” is weak. “I’ve been building WordPress websites for nearly 20 years, with over 200 projects delivered across ecommerce, SaaS, and service businesses” gives the AI something concrete to work with.

Numbers, timeframes, named technologies, price ranges, and measurable outcomes all count as factual density. The more specific you are, the more quotable your content becomes.

Direct answer patterns

When someone asks “how much does a website cost?”, AI search engines look for content that directly answers that question. A page that starts with “There are many factors to consider when budgeting for a website…” will lose out to one that says “A custom WordPress website typically costs between $5,000 and $15,000, while a template-based site starts from $1,500.”

This doesn’t mean every page needs to be a list of blunt answers. But your content should include clear, extractable statements that an AI can quote without needing to paraphrase heavily.
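
To make this concrete, here’s a minimal HTML sketch of a self-contained, extractable section. The heading and figures simply reuse the pricing example above:

    <section id="wordpress-cost">
      <h2>How much does a WordPress website cost?</h2>
      <!-- The opening sentence answers the question directly,
           so an AI system can quote it without surrounding context. -->
      <p>A custom WordPress website typically costs between $5,000
      and $15,000, while a template-based site starts from $1,500.</p>
    </section>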

E-E-A-T signals

Google’s E-E-A-T framework (Experience, Expertise, Authoritativeness, Trustworthiness) matters even more for AI citations than for traditional rankings. AI systems are trying to avoid generating incorrect information, so they lean towards sources that demonstrate real expertise.

This includes things like:

  • Named authors with visible credentials or experience
  • Publication dates and “last updated” timestamps
  • External citations to authoritative sources
  • First-person experience (“I’ve built” or “in my experience” rather than generic third-person advice)
  • Specificity about the subject matter rather than surface-level overviews

Structured data

JSON-LD schema markup helps AI systems understand what your page is about without having to guess. FAQPage schema is particularly valuable because it explicitly marks up question-and-answer pairs, which is exactly the format AI search engines are looking for. Article schema with author and datePublished properties reinforces E-E-A-T signals.
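
As a rough sketch, FAQPage markup looks like this in JSON-LD (the question and answer text here are placeholders):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "How much does a WordPress website cost?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "A custom WordPress website typically costs between $5,000 and $15,000, depending on complexity."
        }
      }]
    }
    </script>

Article schema follows the same pattern, with an "Article" type carrying "author" and "datePublished" properties.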

My own site has six different schema types on the homepage alone: WebSite, Organization, Person, ProfessionalService, WebPage, and FAQPage. That’s not overkill. It’s giving AI systems a structured way to understand who I am, what I do, and what questions I answer.

What I learned auditing my own site

I recently ran a full audit of my own site with a specific focus on AI visibility, and the results were instructive.

My site scored 15 out of 15 for AI crawler accessibility. Every AI crawler is explicitly allowed in my robots.txt. I have an llms.txt file (a relatively new standard that provides AI systems with a structured summary of your site). My sitemap is clean, my content is static HTML (no JavaScript rendering required), and I don’t use any meta directives that restrict AI access.
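
For reference, explicitly allowing AI crawlers in robots.txt looks something like this. The user-agent tokens below are the ones the vendors currently publish, so check their documentation before copying:

    # Allow the major AI crawlers
    User-agent: GPTBot
    Allow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: Google-Extended
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /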

But my AI citability score was only 12 out of 20. That gap tells a clear story: the AI systems can access my content perfectly, but the content itself isn’t structured to be easily quoted.

The main issues? My homepage content was too thin and too focused on marketing rather than substance. Process descriptions were single sentences when they needed to be two or three. There were no specific performance metrics or project counts. The content between sections read as brochure copy rather than something an AI would extract and cite.

This is a pattern I see on a lot of business websites. The technical foundations are solid, but the content is written to persuade humans, not to inform AI systems. And increasingly, you need to do both.

Seven things you can check on your own site

If you want to know where your site stands with AI search engines, here are specific things to look at:

  1. Check your robots.txt for AI crawlers. Open yoursite.com/robots.txt and look for mentions of GPTBot, Google-Extended, PerplexityBot, ClaudeBot (Anthropic’s crawler), and ChatGPT-User. If any of these are blocked with Disallow, AI search engines can’t access your content. Some WordPress security plugins block these by default.

  2. Search for your brand on Perplexity and ChatGPT. Ask each one “What does [your business] do?” and see what comes back. If they can describe your business accurately and cite your website, you’re in reasonable shape. If they make things up or cite your competitors instead, your content isn’t getting through.

  3. Look at your content through an AI’s eyes. Pick your most important page and ask: could someone extract a clear, factual answer from any section of this page without reading the rest? If every section depends on context from other sections, it’s not structured well for AI extraction.

  4. Count your facts. On your key pages, how many specific, verifiable claims do you make? Numbers, dates, named technologies, price ranges, case study results. If the answer is fewer than five per page, your factual density is probably too low.

  5. Check your structured data. Use Google’s Rich Results Test (search.google.com/test/rich-results) to see what schema markup your pages have. At minimum, you want Organization or LocalBusiness schema, FAQPage schema on any page with frequently asked questions, and Article schema on blog posts with author and date properties.

  6. Look for an llms.txt file. Visit yoursite.com/llms.txt. If it returns a 404, you don’t have one. This file is a newer standard that gives AI systems a structured overview of your site, your services, and your content. It’s not essential yet, but it’s becoming more important as AI search grows. There’s a minimal example after this list.

  7. Review your meta descriptions and headings. AI systems use these as signals for what a page covers. Vague headings like “Our Approach” or “Why Choose Us” tell an AI nothing. Specific headings like “WordPress Website Costs in Australia” or “How Our Development Process Works” help AI systems match your content to relevant queries.
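
For item 6, here’s a minimal llms.txt sketch following the proposed format: an H1 title, a one-line blockquote summary, then sections of links. The URLs are placeholders:

    # Phil Kurth
    > Web designer and developer in Geelong, Australia. Nearly twenty
    > years building WordPress websites, plugins, and web applications.

    ## Services
    - [Website design and SEO](https://example.com/services)
    - [Custom WordPress plugins](https://example.com/plugins)

    ## Articles
    - [How AI Search Engines Decide Which Websites to Cite](https://example.com/blog/ai-search-citations)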

This is still early

I want to be honest about where things stand. AI search optimisation is a young field. The systems are changing rapidly. What works for getting cited by Perplexity today might be different in six months. Google is still actively experimenting with how AI Overviews appear and which sources they cite.

But the fundamentals are clear: structured content, factual density, direct answers, and demonstrated expertise. These are the signals that all three major AI search systems reward. And none of them conflict with good traditional SEO either. Making your content more citable for AI also makes it more useful for human readers.

The businesses that start paying attention to this now will have an advantage as AI search becomes a larger share of how people find information. It’s not about gaming a new algorithm. It’s about making your content genuinely useful to both humans and machines.

If you’re curious about where your site stands across both traditional SEO and AI visibility, that’s exactly what my SEO & AI Visibility Audit covers. It scores your site across technical SEO, structured data, AI citability, E-E-A-T signals, and AI crawler accessibility, with specific recommendations for each.

Frequently asked questions

Is AI search replacing traditional Google search?

Not replacing, but layering on top. Google AI Overviews appear above the traditional results, and they're showing up for an increasing number of queries. Traditional rankings still matter because that's partly how Google decides which sites to cite in the overview. But the click-through behaviour is changing. If someone gets a complete answer from the AI overview, they may never scroll down to the organic results. Your content needs to work for both.

Do I need to optimise separately for Google AI Overviews, ChatGPT, and Perplexity?

Not really. The core principles overlap: structured content, factual specificity, clear answers, and demonstrated expertise. If you get those right, you're covered across all three. The main technical difference is making sure you're not accidentally blocking any AI crawlers in your robots.txt, and having an llms.txt file helps with tools like Perplexity and ChatGPT that look for it.

What is an llms.txt file?

An llms.txt file is like robots.txt but designed for AI systems. It sits at the root of your website and provides a structured, plain-text summary of who you are, what you do, and what content is on your site. It helps AI crawlers understand your site quickly without needing to parse every page. It's not essential right now, but adoption is growing and it takes about 30 minutes to create. If you're serious about AI visibility, it's worth doing.

Can I block AI crawlers from my website?

Yes, you can. Adding Disallow rules for GPTBot, PerplexityBot, and other AI crawlers in your robots.txt will prevent them from crawling your content. Some businesses choose this to protect proprietary content. But for most businesses, especially service businesses that rely on being found online, blocking AI crawlers means cutting yourself off from a growing share of how people search. I'd only recommend blocking if you have a specific reason to, like protecting premium content behind a paywall.
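
If you do decide to block, the rules are the inverse of the allow example earlier, for instance:

    # Block AI crawlers site-wide
    User-agent: GPTBot
    Disallow: /

    User-agent: PerplexityBot
    Disallow: /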
