You can't optimise what you can't measure. If you've shipped schema fixes, llms.txt, FAQ rewrites — and you have no idea whether ChatGPT actually cites you now — you're flying blind. This post is the manual tracking method I use on every AEO engagement before I touch a paid tool.
It costs nothing to run, and it works for at least the first six months. It's the same approach we use in the citelift.app audit engine we're productising. Go manual first, then automate once the manual phase has taught you what to automate.
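The 30-second answer
Pick 20 fixed prompts, run them every week in ChatGPT, Claude, Gemini and Perplexity, log each result in a spreadsheet, and track citation rate and share of voice alongside brand search and direct traffic. Stay manual until the patterns are clear, then automate.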
Step one: pick a fixed prompt set of 20 queries
The prompt set is the most critical decision. The wrong prompt set produces beautiful charts of irrelevant signal. The right prompt set captures what your buyers actually ask.
Twenty queries strikes the right balance — enough signal to detect meaningful movement, manageable enough for manual logging each week. Mix three types:
- Head terms — your industry plus AEO, e.g. "best AI automation agency."
- Comparison queries — "X vs Y," e.g. "n8n vs Zapier for service businesses."
- Long-tail intent — "how to Z for X," e.g. "how to automate lead follow-up for dental practice."
Don't include vanity queries (your own brand name verbatim — engines will obviously cite you). Don't include queries your buyer would never search. Spend an afternoon listing 50 candidates, cut to 20.
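If you keep the candidate list in a file rather than in your head, a small check can catch the obvious problems before you lock the set. The sketch below is one way to do it, not a required part of the method. The brand name `ExampleCo`, the three type labels and the function name are all placeholders I've made up for illustration.

```python
from collections import Counter

# Hypothetical brand name and type labels; swap in your own.
BRAND = "ExampleCo"
QUERY_TYPES = ("head", "comparison", "long_tail")


def validate_prompt_set(prompts, brand=BRAND, expected_size=20):
    """Return a list of problems found in a prompt set.

    `prompts` is a list of (query_type, query_text) tuples.
    An empty list means the set passed every check.
    """
    problems = []
    if len(prompts) != expected_size:
        problems.append(f"expected {expected_size} prompts, got {len(prompts)}")

    # Every one of the three query types should be represented.
    counts = Counter(qtype for qtype, _ in prompts)
    for qtype in QUERY_TYPES:
        if counts[qtype] == 0:
            problems.append(f"no '{qtype}' queries in the set")
    for qtype in counts:
        if qtype not in QUERY_TYPES:
            problems.append(f"unknown query type '{qtype}'")

    # Flag vanity queries (brand name in the text) and duplicates.
    seen = set()
    for _, text in prompts:
        normalized = " ".join(text.lower().split())
        if brand.lower() in normalized:
            problems.append(f"vanity query (contains brand): {text!r}")
        if normalized in seen:
            problems.append(f"duplicate query: {text!r}")
        seen.add(normalized)
    return problems
```

Run it on the 20 you've shortlisted and fix whatever it reports. It can't tell you whether a query is something your buyer would actually type; that part stays a judgment call.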
Step two: run the prompts manually for the first month
Tools like Otterly, Athena HQ and Profound automate this. Skip them for the first month. The manual phase teaches you what the engines actually return, which the automated dashboards hide behind aggregate metrics.
Open four browser tabs:
- ChatGPT (chatgpt.com, web-search mode on)
- Claude.ai (claude.ai)
- Gemini (gemini.google.com)
- Perplexity (perplexity.ai)
Paste each query into each engine. Capture the response. Note whether your brand was named. Document which competitors appeared. Repeat for all 20 queries. First time takes 90 minutes. By week three you're down to 30 minutes.
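Twenty queries across four engines is 80 checks per week, and it's easy to lose your place. A minimal sketch of a checklist generator follows; it only builds the list of things to paste, and the pasting itself stays manual. The function name is my own, and the grouping by engine is just an assumption that you'd rather finish one tab before moving to the next.

```python
from itertools import product

ENGINES = ("ChatGPT", "Claude", "Gemini", "Perplexity")


def build_run_sheet(prompts, engines=ENGINES):
    """Return one (prompt, engine) pair per manual check, grouped by engine."""
    # product() iterates the first argument in the outer loop, so every
    # prompt for the first engine comes before any for the second.
    return [(prompt, engine) for engine, prompt in product(engines, prompts)]
```

Print the result or paste it into the first two columns of the week's sheet, then work down the list.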
Step three: log the results in a spreadsheet
Columns I use:
- Prompt (one of the 20)
- Engine (ChatGPT / Claude / Gemini / Perplexity)
- Week (YYYY-MM-DD of the Monday)
- Brand cited (yes / no)
- Competitors cited (comma-separated)
- Position in answer (first mentioned / middle / last / image-card)
- Notes (any unusual context — e.g. engine flagged uncertainty)
Rows accumulate. After four to six weeks, patterns emerge. You'll see which engines cite you, which prompts win, which competitors dominate, and which weeks moved.
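If you'd rather keep the log as a CSV file than in a spreadsheet app, the columns above map onto a row like this. Treat it as a sketch under my own assumptions: the snake_case column names, the helper names and the yes/no encoding are illustrative choices, not a fixed format.

```python
import csv
import io
from datetime import date, timedelta

# Column names mirror the spreadsheet columns listed above.
COLUMNS = [
    "prompt", "engine", "week", "brand_cited",
    "competitors_cited", "position", "notes",
]


def monday_of(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def make_row(prompt, engine, day, brand_cited,
             competitors=(), position="", notes=""):
    """Build one log row for a single prompt run on a single engine."""
    return {
        "prompt": prompt,
        "engine": engine,
        "week": monday_of(day).isoformat(),  # YYYY-MM-DD of the Monday
        "brand_cited": "yes" if brand_cited else "no",
        "competitors_cited": ", ".join(competitors),
        "position": position,
        "notes": notes,
    }


def write_log(rows, handle):
    """Write rows, with a header line, to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends. Keying every row to the Monday means a run done on Wednesday still lands in the right week.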
Step four: calculate citation rate and share of voice
Two metrics, both useful:
Citation rate = the percentage of prompts in your set where your brand was named. If 5 out of 20 prompts mention you, citation rate is 25%. Calculate it per engine as well as overall, since the engines diverge. This is the absolute measure.
Share of voice = your citations divided by total brand citations, yours plus every competitor's. If you got 5 citations and competitors collectively got 15, your share of voice is 5 / (5 + 15) = 25%. This captures relative shifts when total citation volume moves.
Track both weekly. They diverge in useful ways — citation rate can stay flat while share of voice climbs (competitors getting filtered out), or share of voice can stay flat while citation rate climbs (whole category getting more citations).
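Both metrics are simple enough to compute by hand, but here is a sketch in case you export the log. It assumes each logged answer is a dict with a `brand_cited` value of "yes" or "no" and a comma-separated `competitors_cited` string, matching the spreadsheet columns; those key names are my assumption, so rename them to fit your export.

```python
def citation_rate(rows):
    """Fraction (0..1) of logged answers that named the brand."""
    if not rows:
        return 0.0
    cited = sum(1 for r in rows if r["brand_cited"] == "yes")
    return cited / len(rows)


def share_of_voice(rows):
    """Brand citations divided by all citations, ours plus competitors'."""
    ours = sum(1 for r in rows if r["brand_cited"] == "yes")
    theirs = sum(
        len([c for c in r["competitors_cited"].split(",") if c.strip()])
        for r in rows
    )
    total = ours + theirs
    return ours / total if total else 0.0
```

Pass in one week's rows for a single engine to get the per-engine figures, or all of the week's rows for the blended number. Whichever you choose, compute it the same way every week.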
Step five: correlate with brand search and direct traffic
Wire weekly citation rate into a simple dashboard alongside:
- Brand search impressions from Google Search Console
- Direct traffic from your analytics tool
- Branded organic clicks
The correlation builds over 60-90 days. The pattern I see most often: citation rate moves in weeks 1-2, brand search lifts in weeks 3-5, direct traffic lifts in weeks 6-8, and conversion improves in weeks 8-12. Anyone promising faster is selling vibes.
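Because the downstream metrics trail citation rate by a few weeks, a same-week comparison can understate the relationship. One way to look at it, sketched below, is to pair citation rate in week t with the other metric a few weeks later. The function names and the choice of a plain Pearson correlation are my assumptions; with only a dozen weekly points, read the result as a rough directional hint rather than proof.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(citation_rates, metric, lag_weeks):
    """Correlate citation rate in week t with `metric` in week t + lag_weeks."""
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be zero or positive")
    if lag_weeks == 0:
        return pearson(citation_rates, metric)
    # Drop the last `lag_weeks` citation points and the first `lag_weeks`
    # metric points so each pair is offset by exactly the lag.
    return pearson(citation_rates[:-lag_weeks], metric[lag_weeks:])
```

For example, `lagged_correlation(weekly_citation_rate, weekly_brand_impressions, 3)` compares each week's citation rate with brand search impressions three weeks later. Try a few lags and see which one lines up with the pattern in your own data.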
Step six: automate after the manual phase
Once you understand patterns manually, automation makes sense. Tools I've used or evaluated:
| Tool | Best for | Price band |
|---|---|---|
| Otterly | Solo founders, indie SaaS | $$ (low) |
| Athena HQ | Mid-market, marketing teams | $$$ (mid) |
| Profound | Enterprise, multi-brand | $$$$ (high) |
| citelift.app (preview) | Self-serve scan model | $ (free tier planned) |
Pick the tool that matches your budget and granularity requirements. None of them eliminate the need to understand the prompt set — they automate the logging, not the strategy.
Common tracking mistakes to avoid
- Changing the prompt set mid-quarter destroys comparability. Lock it for at least 90 days.
- Tracking only ChatGPT misses signal. The four engines diverge enough to matter.
- Measuring citation rate without correlation to brand search and traffic obscures funnel impact.
- Running daily tracking creates noise that obscures weekly signal. Weekly is the right cadence.
- Including your own brand name as a query inflates citation rate with vanity matches.
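The first mistake on that list is the easiest to make by accident, usually through a "small" wording tweak. If the prompt set lives in a file, one way to notice is to record a fingerprint of it alongside each week's run. This is a sketch with a made-up function name; the 12-character truncation is an arbitrary choice for readability.

```python
import hashlib


def prompt_set_fingerprint(prompts):
    """Return a short fingerprint that ignores order, case and extra spaces."""
    normalized = sorted(" ".join(p.lower().split()) for p in prompts)
    digest = hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()
    return digest[:12]
```

If this week's fingerprint differs from week 0, a prompt was added, removed or reworded, and the weeks on either side of the change are no longer directly comparable.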
What I actually do for SkynetLabs clients
Inside our AEO engagements, the tracking workflow is:
- Discovery call — define the 20-prompt set together based on real buyer queries.
- Week 0 baseline — manual run against all four engines, log to shared Google Sheet.
- Weeks 1-12 — weekly manual run, audit calls every two weeks to interpret the movement.
- Week 12 review — decide whether to automate via Otterly / Athena, or keep manual.
The clients who keep tracking manually for longer than three months tend to learn faster than the clients who jump to automation in week 1. The patterns are too easy to hide behind a dashboard.
Frequently asked questions
How often should I run the prompt set?
Weekly is the right cadence for most teams. Daily creates noise. Monthly misses fast movement. Lock to weekly and only change cadence if you have a specific reason.
How many queries should be in the prompt set?
Twenty is the sweet spot. Ten is too few for noise reduction. Fifty is too many to log manually. Twenty fits in 30 minutes a week and produces a stable signal after four runs.
Should I track Bing Copilot separately?
Bing Copilot (now Microsoft Copilot) has been built largely on OpenAI models, so its answers tend to overlap with ChatGPT's. Most brands skip it and accept the indirect coverage from ChatGPT tracking. Add it as a fifth engine only if Bing is a strategic traffic channel for you.
Can I track AI citations through Google Analytics?
Partially. Referral traffic from Perplexity and Bing Copilot shows up in GA when someone clicks through. Pure citations without a click-through don't. You need the prompt-set method for that signal.
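For the part GA can see, you can bucket exported referrer URLs by engine. The sketch below shows the idea. The hostname list is an assumption on my part, so check it against the referrers that actually appear in your own reports before relying on it.

```python
from urllib.parse import urlparse

# Assumed referrer hostnames; verify against your own referral report.
AI_REFERRER_HOSTS = {
    "perplexity.ai": "Perplexity",
    "copilot.microsoft.com": "Copilot",
    "chatgpt.com": "ChatGPT",
}


def classify_referrer(referrer_url):
    """Return the engine name for a referrer URL, or None if it isn't one."""
    host = (urlparse(referrer_url).hostname or "").lower()
    for known, engine in AI_REFERRER_HOSTS.items():
        # Match the bare domain and any subdomain such as "www.".
        if host == known or host.endswith("." + known):
            return engine
    return None
```

This only counts visits that arrived with a referrer. Answers that name you without a click, and clicks where the referrer was stripped, still won't show up.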
What's a good citation rate to target?
Highly variable by industry. Mid-market SaaS often targets 25% of prompts citing the brand. Healthcare and legal typically lower (15-20%). Ecommerce sometimes higher depending on category. The baseline matters more than the absolute — your week 12 vs your week 0.
Do I need expensive tools to track this?
No. The manual spreadsheet method works for the first six months. Tooling becomes useful when scaling past 20 prompts, tracking multiple brands, or wanting continuous (not weekly) tracking.
Bottom line
Tracking AI citations isn't complicated. Twenty prompts, four engines, one spreadsheet, weekly cadence. The discipline is doing it every week without changing the prompt set. The tooling can wait. Start manual, learn what the engines actually return, automate only what you understand.