Lesson 01 · GEO Foundations · ~8 min · why → tactics
How a generative engine chooses what to cite
Before you change a single word on a page to "win at AI search," you have to see the page the way the engine does. It does not see a website. It sees a pile of passages competing to answer a question.
In ~8 minutes you'll be able to
- Explain the 4 moves a generative engine makes before it cites anything.
- See why engines quote passages, not pages — and what that changes.
- Tell an un-citable paragraph from a citable one at a glance.
Here's the whole lesson in one block — read it, then we'll unpack it:
A generative engine answers you in four moves: it expands your question into many sub-questions, retrieves a handful of candidate passages for each, reranks and filters them through quality gates, then synthesizes an answer and attaches citations to the specific passages it used. The decisive consequence: engines cite passages, not pages. A page can be fetched and still never quoted — because no paragraph in it could stand on its own.
Notice what just happened. The block above is a canonical answer block: a 40–80 word, front-loaded, self-contained answer. It's the single most "citable" shape a passage can take, and we'll use it constantly. This lesson is written in the form it's teaching.
01 The pipeline, one stage at a time
Modern AI search runs on Retrieval-Augmented Generation (RAG): the model doesn't answer from memory, it answers from documents it fetches at query time. Citations can only come from what was retrieved. So the entire game is getting into — and through — this pipeline.
Query fan-out
Your one question becomes 8–10 parallel sub-queries. "Best merit aid for engineering majors" silently spawns "average engineering merit scholarship 2026," "which universities stack merit + need aid," "Barrett Honors scholarship amount," and so on. AI-generated sub-queries run longer than human searches — averaging ~5.5 words on ChatGPT and ~9.1 on Gemini, versus ~3.4 for a classic Google query.[2]
Hybrid retrieval
For each sub-query the engine pulls a small candidate set (Perplexity: ~5–10 pages) using hybrid retrieval — lexical matching (BM25, the actual words) plus dense vector embeddings (semantic meaning).[1] Pages are chunked and embedded so the engine can grab the right section, not the whole document.
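To make "lexical plus semantic" concrete, here is a minimal sketch of hybrid retrieval over page chunks. It is an illustration under stated assumptions, not any engine's actual implementation: the lexical side is a textbook BM25, the "dense" side is a character-trigram cosine standing in for a real embedding model, and the two rankings are merged with reciprocal rank fusion, which is one common fusion method chosen here for simplicity. The function names, the sample chunks, and the `rrf_k` value are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Lexical side: textbook BM25 over the chunk set (exact word matches only)."""
    docs = [tokenize(c) for c in chunks]
    avg_len = sum(len(d) for d in docs) / len(docs)
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(d) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def trigram_vector(text):
    """Assumption: character-trigram counts stand in for a real embedding model."""
    s = " ".join(tokenize(text))
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_rank(query, chunks, rrf_k=60):
    """Fuse the lexical and similarity rankings with reciprocal rank fusion.

    Returns chunk indices, best first.
    """
    lexical = bm25_scores(query, chunks)
    q_vec = trigram_vector(query)
    semantic = [cosine(q_vec, trigram_vector(c)) for c in chunks]
    fused = [0.0] * len(chunks)
    for scores in (lexical, semantic):
        order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
        for rank, idx in enumerate(order, start=1):
            fused[idx] += 1.0 / (rrf_k + rank)
    return sorted(range(len(chunks)), key=lambda i: fused[i], reverse=True)


# Hypothetical chunks from one page; the engine scores sections, not the whole document.
chunks = [
    "Barrett Honors College awards merit scholarships of up to $15,000 per year.",
    "Our platform makes it easy to compare your options and save money on college.",
    "Campus dining halls are open daily from 7 am to 9 pm during the semester.",
]
ranking = hybrid_rank("Barrett Honors scholarship amount", chunks)
print(chunks[ranking[0]])
```

Note that the query word "scholarship" never matches "scholarships" on the lexical side, while the similarity side still picks up the overlap. That gap is the reason engines combine the two signals rather than relying on either one.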
Rerank & quality gate
Candidates are re-scored for relevance, quality, and authority. Perplexity runs three reranking layers plus an XGBoost quality gate that checks entity clarity and authoritativeness — content passes through roughly six stages before earning a citation.[1] There's also a strong recency bias: fresher content wins.
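A toy version of this stage shows how a gate and a recency bias interact. This is a sketch under loud assumptions: the real gate described above is a trained model, whereas the code below uses a single hand-set threshold; the `Candidate` fields, the 180-day half-life, and the `min_authority` cutoff are all hypothetical values picked for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    passage_id: str
    relevance: float  # 0..1, assumed to come from the retrieval stage
    authority: float  # 0..1, hypothetical source-quality score
    age_days: int     # days since the content was last updated


def recency_weight(age_days, half_life_days=180):
    """Exponential decay: a passage loses half its weight every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def rerank(candidates, min_authority=0.3, top_k=3):
    """Drop candidates that fail the gate, then order survivors by decayed relevance."""
    survivors = [c for c in candidates if c.authority >= min_authority]
    survivors.sort(key=lambda c: c.relevance * recency_weight(c.age_days), reverse=True)
    return [c.passage_id for c in survivors[:top_k]]


candidates = [
    Candidate("old-but-relevant", relevance=0.90, authority=0.8, age_days=720),
    Candidate("fresh-and-solid", relevance=0.70, authority=0.6, age_days=0),
    Candidate("fresh-but-thin", relevance=0.95, authority=0.1, age_days=0),
]
print(rerank(candidates))
```

In this toy setup the most relevant passage never reaches the ranking because it fails the gate, and the two-year-old passage falls behind a fresher, slightly less relevant one. The exact numbers are invented; the ordering effect is the point.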
Synthesize & cite passages
Finally the model writes an answer constrained by the surviving evidence and attaches citations to the specific passages it quoted — evaluating each passage independently for whether it stands alone and can be attributed cleanly. Only part of a page may get cited even when the page was retrieved.[1]
02 The one idea to keep
If you remember nothing else
Engines cite passages, not pages.
Classic SEO got a page to rank. GEO gets a passage quoted. The shift is from "is my page relevant and authoritative overall?" to "can this specific paragraph be lifted out, understood without its neighbours, and attributed in one clean line?" That property has a name — extractability — and it's driven by self-containedness, ~40–80 word length, a front-loaded answer, named entities, and concrete facts.
To be precise: the engine still retrieves whole pages first — so your page has to be reachable and rank well enough to be pulled in — but what it actually quotes is the passage. Optimise the passage; don't neglect the page.
03 See it: the same fact, un-citable vs. citable
Same claim, two shapes. Read each as if you're the engine deciding whether you can quote it in one line. (Figures below are illustrative.)
✕ Hard to cite
"We help families find the best merit aid out there. Our platform makes it easy to compare your options and save serious money on college, so you can focus on what matters."
No entity (who is "we"?), no number, no source, no standalone claim. Lift it into an answer and it says nothing checkable. Reranker drops it.
✓ Easy to cite
"Arizona State University's Barrett Honors College awards merit scholarships of up to $15,000/year; in 2026 the median merit package for out-of-state admits was about $9,400, according to MeritPlaybook's 2026 aid dataset."
Named entity, specific numbers, a date, an attributed source — and it stands alone. The engine can quote it verbatim and footnote you.
The citable version isn't "better writing" in a literary sense; it's more extractable. That distinction is the whole discipline. And it lines up with the only rigorous study we have: in Aggarwal et al.'s "GEO: Generative Engine Optimization", adding attributed quotations lifted visibility by ~41%, while old-school keyword stuffing made things ~10% worse than doing nothing.[3] We'll mine that paper for the full tactic leaderboard in Lesson 2.
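The contrast between the two passages can be roughed out as a checklist in code. This is a crude heuristic sketch, not how any engine scores passages: the four signals and their regexes are assumptions chosen to mirror the traits named in this lesson (a concrete number, a named entity, an attributed source, no first-person opener), and the function name is hypothetical.

```python
import re

FIRST_PERSON_OPENERS = {"we", "our", "i", "my"}


def extractability_signals(passage):
    """Return rough yes/no signals for whether a passage can stand alone."""
    words = passage.split()
    first_word = re.sub(r"\W", "", words[0]).lower() if words else ""
    signals = {
        # A concrete figure the engine could quote.
        "has_number": bool(re.search(r"\d", passage)),
        # Crude entity check: two or more capitalized words in a row.
        "has_named_entity": bool(re.search(r"(?:[A-Z][\w']*\s+)+[A-Z][\w']*", passage)),
        # An explicit attribution phrase.
        "has_attribution": "according to" in passage.lower(),
        # Does not open with an unresolved "we" or "our".
        "no_first_person_opener": first_word not in FIRST_PERSON_OPENERS,
    }
    return {"word_count": len(words), "score": sum(signals.values()), **signals}


hard_to_cite = (
    "We help families find the best merit aid out there. Our platform makes it "
    "easy to compare your options and save serious money on college, so you can "
    "focus on what matters."
)
easy_to_cite = (
    "Arizona State University's Barrett Honors College awards merit scholarships "
    "of up to $15,000/year; in 2026 the median merit package for out-of-state "
    "admits was about $9,400, according to MeritPlaybook's 2026 aid dataset."
)

for label, passage in [("hard", hard_to_cite), ("easy", easy_to_cite)]:
    print(label, extractability_signals(passage))
```

A checklist like this is only useful as a first pass when auditing many paragraphs. It cannot tell whether a number is correct or a source is credible, so a passage that scores well still needs a human read.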
04 Check your understanding
4 quick scenarios
Q1. Your page ranks #2 on Google for a query but never appears in Perplexity's answer. What's the most likely GEO explanation?
Q2. Which change most directly increases a paragraph's extractability?
Q3. "Query fan-out" implies which strategy?
Q4. A client's React site renders its key content only in the browser via JavaScript. Why is that a GEO risk?
05 The honest caveat (so you can advise credibly)
Hold two views at once
A whole industry will sell you a 12-point "GEO checklist." But in May 2026, Google Search Central's official guidance pushed back hard: "optimizing for generative AI search is optimizing for the search experience, and thus still SEO." Google states it does not read llms.txt, does not need you to manually chunk content, and that schema is for rich results — not a magic AI-citation lever.[5]
Both things are true. The pipeline mechanics in this lesson are real (and especially visible on Perplexity/ChatGPT). And most of what makes a passage citable — clarity, authority, structure, facts — is just good SEO done well. The mature position you can defend to a client: GEO is mostly excellent SEO, re-pointed from "rank the page" to "make the passage quotable," with a few genuinely new moves (crawler access for AI bots, off-site consensus signals, AI-visibility measurement). We'll separate myth from method as we go.
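Of the new moves, crawler access is the easiest to check mechanically. The sketch below uses Python's standard `urllib.robotparser` to test whether a robots.txt body would block a given AI crawler from a URL. The user-agent tokens listed are assumptions to verify against each vendor's current documentation, and the robots.txt content and URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; confirm against each vendor's published docs.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "PerplexityBot"]


def blocked_ai_crawlers(robots_txt, url, agents=AI_CRAWLERS):
    """Return the crawlers from `agents` that this robots.txt disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that singles out one AI crawler.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_ai_crawlers(robots_txt, "https://example.com/merit-aid"))
```

This only covers the robots.txt layer. A page can pass this check and still be invisible to a crawler if its key content exists only after client-side JavaScript runs, so treat the result as one item in an access audit rather than a clean bill of health.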
Sources
[1] Authority Tech, "How Perplexity Selects Sources" (2026): hybrid retrieval, 3 reranking layers + XGBoost gate, ~6 stages, passage-level extraction, recency bias.
[2] 85SIXTY, "How AI Query Fan-Out Is Reshaping SEO in 2026": fan-out into 8–10 sub-queries; ChatGPT ~5.5 / Gemini ~9.1 / classic ~3.4 words.
[3] Aggarwal, Murahari, et al., "GEO: Generative Engine Optimization", KDD 2024: quotation addition ~+41%; keyword stuffing ~−10% vs baseline.
[4] Search Engine Journal, "Google AI Overview Citations From Top-Ranking Pages Drop Sharply" (2026): 38% top-10, 31.2% positions 11–100, 31% beyond 100; 54.5% from organically ranking pages.
[5] Google Search Central / blog.google (May 2026), summarized with quotes at We The Flywheel: "optimizing for generative AI search is … still SEO"; Google does not read llms.txt.
[6] Momentic, "AI Search Crawlers & Bots" (2026) & OpenAI bots docs: crawlers, JS-rendering limits, ~40% of sites accidentally block an AI crawler.