# AI Indexing

> Check whether AI platform retrieval surfaces can verify a client's exact pages.

AI Indexing checks whether AI platform retrieval and search surfaces can verify exact pages from a client project. It answers a narrower question than the main dashboard: when OpenLens asks a platform to look for this page, does the platform return or cite the same canonical page URL?

Use it when you want to audit whether important pages are available to the retrieval layers behind AI answers, not whether the brand is visible in normal buyer prompts.

## How it differs from Site Readiness

Site Readiness checks whether a website is technically accessible and understandable to agents and crawlers. It looks at things like robots rules, structured data, page discoverability, snippet controls, accessibility, and agent-facing content quality.

AI Indexing runs separately because it is heavier. It sends page-level probes to external AI or search surfaces and waits for platform responses. A Site Readiness run can complete quickly from crawl and page analysis; an AI Indexing run may take longer because each platform has its own retrieval behavior, rate limits, and response time.

## How we measure this

AI Indexing currently checks:

| Platform | How it is measured                                                                              |
| -------- | ----------------------------------------------------------------------------------------------- |
| ChatGPT  | Title/domain-scoped web-search probe that checks whether ChatGPT cites the same canonical page. |
| Claude   | Brave Search proxy for Claude's web retrieval surface, using host-scoped slug queries.          |
| Gemini   | Grounded Gemini search probe that reads the result URLs from Gemini's search response.          |

OpenLens compares returned URLs by canonical page identity, not by literal string equality. Scheme, `www.` or mobile subdomain variants, query strings, fragments, trailing slashes, and percent encoding should not turn the same page into a false miss.
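
As an illustration of comparing by canonical identity, here is a minimal sketch in Python. It is not OpenLens's implementation: the function names, the list of subdomain prefixes, and the choice to ignore the query string entirely are assumptions made for the example.

```python
from urllib.parse import urlsplit, unquote

# Assumption: subdomain prefixes treated as variants of the same host.
VARIANT_PREFIXES = ("www.", "m.", "mobile.")


def canonical_key(url: str) -> tuple[str, str]:
    """Reduce a URL to a (host, path) pair used for identity comparison."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    for prefix in VARIANT_PREFIXES:
        rest = host[len(prefix):]
        # Only strip the prefix when a real domain remains (keeps "m.com" intact).
        if host.startswith(prefix) and "." in rest:
            host = rest
            break
    # Decode percent escapes and drop trailing slashes.
    # Scheme, query string, and fragment are deliberately ignored.
    path = unquote(parts.path).rstrip("/")
    return host, path


def same_page(a: str, b: str) -> bool:
    """True when two URLs point at the same canonical page in this sketch."""
    return canonical_key(a) == canonical_key(b)
```

With this sketch, `same_page("http://www.example.com/blog/post/?utm=1#top", "https://example.com/blog/post")` is true, while two URLs with different paths remain distinct.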

## Why Claude uses Brave Search

Anthropic does not expose a first-party per-page Claude index API. For Claude, OpenLens uses Brave Search as a retrieval proxy for two reasons. First, strong public evidence points to Brave as the search backend behind Claude's web search. Second, OpenLens ran its own empirical validation, comparing Claude web-search results with Brave Search results across sampled exact-page probes. That validation found the similarity strong enough to use Brave as the closest practical public proxy for Claude retrieval coverage.

Anthropic's [web search tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-search-tool) describe Claude web search as a server-side tool that runs searches and returns cited sources. TechCrunch reported that Anthropic added Brave Search to its subprocessor list, that Simon Willison observed matching Claude and Brave citations for the same query, and that Claude's internal web search schema exposed a `BraveSearchParams` name. See [TechCrunch's report](https://techcrunch.com/2025/03/21/anthropic-appears-to-be-using-brave-to-power-web-searches-for-its-claude-chatbot/) and [Brave's Search API page](https://brave.com/search/api/).

This is still a proxy. A pass means the exact page is available through Brave Search for the query OpenLens ran. That is the closest practical public signal for Claude retrieval coverage, not a guarantee that every Claude answer will cite the page.

## Scores and statuses

Each platform-page probe resolves to one of three product-facing outcomes.

* **Indexed** means the platform returned or cited the exact canonical page URL.
* **Not indexed** means the platform completed a grounded probe but did not return or cite that exact canonical page.
* **Unknown** means OpenLens could not get enough grounded provider evidence to decide. It is not the same as **Not indexed**.

The run's **Score** is the percent of completed checks that passed, weighted through the same readiness scoring system used elsewhere in OpenLens.
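
The outcomes and the score can be sketched roughly as follows. This is an unweighted simplification: the readiness weighting is not described here, so it is left out. It also assumes that "completed checks" excludes **Unknown** results. The names `Status`, `resolve_status`, and `simple_score` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    INDEXED = "indexed"
    NOT_INDEXED = "not_indexed"
    UNKNOWN = "unknown"


def resolve_status(grounded: bool, returned_pages: set[str], target_page: str) -> Status:
    """Map one platform-page probe to an outcome.

    `grounded` is False when the provider gave no usable evidence;
    page identifiers are assumed to be canonicalized already.
    """
    if not grounded:
        return Status.UNKNOWN
    return Status.INDEXED if target_page in returned_pages else Status.NOT_INDEXED


def simple_score(statuses: list[Status]) -> Optional[float]:
    """Unweighted percent of completed checks that passed.

    Assumption: Unknown results do not count as completed checks.
    Returns None when nothing completed.
    """
    completed = [s for s in statuses if s is not Status.UNKNOWN]
    if not completed:
        return None
    passed = sum(1 for s in completed if s is Status.INDEXED)
    return 100.0 * passed / len(completed)
```

Under these assumptions, a run with two indexed pages, one not-indexed page, and one unknown scores two out of three completed checks. The unknown result does not drag the score down.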

The score is useful for comparing runs over time. The individual page rows are more useful for seeing where to investigate first.

## How accurate are these verdicts

We measured verdict accuracy against a labelled dataset, and the verdicts are highly accurate.

To check whether an assistant can find a page, we write our own prompts about that page that reliably get the assistant to surface it when the page is indexed. We then tested those prompts against a labelled dataset of both popular and unindexed pages to see how well they work.

* When we mark a page **Indexed** on ChatGPT or Claude, we are right about 100% of the time. For ChatGPT we only count pages the assistant actually cited, and for Claude we can query the Brave Search index, our proxy for Claude retrieval, directly.
* Gemini is the exception. Gemini does not reliably tell us which URLs it retrieved, so we read them from its response, and it can sometimes give a page URL it reconstructed from the site address rather than one it actually found. Treat a Gemini **Indexed** verdict as slightly lower confidence. We are actively improving this.
* When we mark a page **Not indexed**, we are right more than 99% of the time based on our labelled dataset. That figure comes from the dataset, so per-site results can vary.

If you find a page where our verdict looks wrong, email [contact@aibread.com](mailto:contact@aibread.com) and we will look into it.

## Page scope

Use **Pages to crawl** to choose how many discovered pages the run should inspect. Use **Path prefix** to restrict the run to one part of a site, such as `/blog` or `/docs`.

The run uses the discovered page list for the project and applies the selected scope before sending platform probes. Scanned pages count toward AI Indexing usage limits even when page details are hidden by plan gating.
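
A rough sketch of how scope selection could work is shown below. It assumes the path prefix is applied before the page count, and that a prefix matches whole path segments, so `/blog` would not match `/blogger`. Neither detail is confirmed by OpenLens, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def in_scope(url: str, path_prefix: str) -> bool:
    """True when the URL path falls under the prefix (segment-aware match)."""
    trimmed = path_prefix.strip("/")
    if not trimmed:
        return True  # an empty prefix or "/" means the whole site
    prefix = "/" + trimmed
    path = urlsplit(url).path or "/"
    return path == prefix or path.startswith(prefix + "/")


def select_pages(discovered: list[str], pages_to_crawl: int, path_prefix: str = "") -> list[str]:
    """Filter the discovered pages by prefix, then cap at the page count."""
    scoped = [url for url in discovered if in_scope(url, path_prefix)]
    return scoped[:pages_to_crawl]
```

For example, with a prefix of `/blog` and a page count of 2, only the first two discovered URLs under `/blog` would be probed.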

Monthly AI Indexing page allowances are separate from Site Readiness: Free includes 150 pages a month, Starter includes 1,000 pages per seat each month, and Agency includes 10,000 pages per seat each month. The allowance resets on the 1st of each month, UTC.
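
The allowance arithmetic can be sketched as below. The plan figures come from the paragraph above. The plan keys, the function names, and the assumption that the reset happens at midnight UTC are illustrative only.

```python
from datetime import datetime, timezone

# Monthly page allowances from the text; Starter and Agency are per seat.
PAGES_PER_MONTH = {"free": 150, "starter": 1_000, "agency": 10_000}


def monthly_allowance(plan: str, seats: int = 1) -> int:
    """Total AI Indexing pages available per month for a plan."""
    base = PAGES_PER_MONTH[plan]
    # Free is a flat allowance; paid plans scale with seat count.
    return base if plan == "free" else base * seats


def next_reset(now: datetime) -> datetime:
    """Next reset: the 1st of the following month, UTC.

    Assumption: the reset happens at 00:00 UTC. Pass a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    return datetime(year, month, 1, tzinfo=timezone.utc)
```

Under these assumptions, a three-seat Starter workspace would have 3,000 pages a month, and a run in mid-December would see its allowance reset on 1 January.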

## Free and paid views

Free users can preview the first few page results and see summary counts for the full run. Paid plans reveal more per-page details, evidence, and the basic recommendation attached to each not-indexed or unknown result.

The summary still counts every scanned page. If a run says 50 pages were checked, hidden rows are included in the score and platform totals.

## Scheduling

Scheduled AI Indexing checks re-run on the configured cadence for the selected project URL. They use the current URL and schedule settings from the AI Indexing page.

Scheduling is separate from Site Readiness scheduling. You can run Site Readiness and AI Indexing independently, and one does not block the other.

## Cancellation

Cancelling stops an in-flight AI Indexing run from processing any further work. Any results already written remain visible as partial output, and the run is marked cancelled.

Use cancellation when a run is taking too long, when the wrong project or scope was selected, or when you want to preserve usage for a narrower follow-up run.

## How to use it

* Start with the default page count to verify the flow for a project.
* Use a path prefix when you care about one content section, such as blog posts or documentation.
* Compare platform totals first, then expand individual platform rows to inspect the pages that need attention.
* Treat a single not-indexed result as a signal to investigate, not proof that the platform can never retrieve the page.
* Treat unknown results as provider or runtime uncertainty. Re-run before making content changes based on them.
* Re-run after publishing content, changing crawl directives, or improving page metadata.

## Caveats

AI platform retrieval is not perfectly stable. Results can vary by provider, timing, query interpretation, and upstream search behavior. Some checks use platform proxies because providers do not expose a first-party per-page index API.

AI Indexing should be read as an operational visibility audit: it shows what the platform retrieval surface returned for this run, with enough detail to find gaps and track whether changes improve coverage over time. It does not guarantee that every future AI answer will cite the page.