
LLMs.txt Search Artifact Validation Capability Pack Skill

Expert validation skill for reviewing /llms.txt, search discovery artifacts, canonical signals, structured data, and LLM-ready documentation links.

by oktofeesh1 · added 2026-06-03
Harness: Claude Code, Codex, Windsurf, Gemini, Cursor, CLI
Level: expert · Type: capability-pack · Verified: validated
Review first: review before installing

Open the source and read safety notes before installing.

Safety notes

  • Installing `llms-txt` adds Python dependencies in the selected environment; pin the reviewed version and prefer project-scoped tooling.
  • The `llms_txt2ctx` helper can retrieve linked HTTPS resources while expanding context; review target URLs before running it against private staging content.
  • Search artifacts influence crawler and model-facing discovery signals; treat canonical, robots, sitemap, and structured data changes as publish-impacting.
  • The source archive is external and version-pinned for reference; package trust should remain a maintainer decision.

Privacy notes

  • `/llms.txt`, expanded context, sitemap, and structured data artifacts can expose URL paths, page titles, product terms, doc structure, and internal naming.
  • Context expansion can collect linked markdown content into a single artifact, which may reveal more detail than the compact index.
  • Keep public review notes focused on artifact shape, URL classes, stale links, and source evidence; avoid exposing private staging paths or unreleased content.

Prerequisites

  • Published or staged documentation site with a known canonical origin
  • `/llms.txt` draft, `llms-full.txt` or expanded context artifact when available
  • `sitemap.xml`, `robots.txt`, canonical URL policy, and structured data inventory
  • Python 3.10 or newer when using the pinned `llms-txt` parser and context helpers
  • Owner decision on which linked pages are essential, optional, stale, or out of scope

Schema details

Install type: package
Reading time: 9 min
Troubleshooting: Yes
Source repository stats
Scope
Source repo
Skill and platform metadata
Skill type: capability-pack
Skill level: expert
Verification: validated
Verified at: 2026-06-03
Retrieval sources
  • https://llmstxt.org/
  • https://github.com/AnswerDotAI/llms-txt/releases/tag/0.0.6
  • https://pypi.org/pypi/llms-txt/0.0.6/json
  • https://raw.githubusercontent.com/AnswerDotAI/llms-txt/0.0.6/pyproject.toml
  • https://raw.githubusercontent.com/AnswerDotAI/llms-txt/0.0.6/llms_txt/core.py
  • https://raw.githubusercontent.com/AnswerDotAI/llms-txt/0.0.6/llms_txt/txt2html.py
  • https://raw.githubusercontent.com/AnswerDotAI/llms-txt/0.0.6/nbs/index.qmd
  • https://developers.google.com/search/docs/crawling-indexing/robots/intro
  • https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview
  • https://developers.google.com/search/docs/crawling-indexing/consolidate-duplicate-urls
  • https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data
Tested platforms
Claude, Codex, Windsurf, Gemini, Cursor, Generic AGENTS

Platform     Support  Install path
claude-code  Native   .claude/skills/<skill-name>/SKILL.md
codex        Native   .agents/skills/<skill-name>/SKILL.md
windsurf     Native   .windsurf/skills/<skill-name>/SKILL.md
gemini       Native   .gemini/skills/<skill-name>/SKILL.md or .agents/skills/<skill-name>/SKILL.md
cursor       Adapter  .cursor/rules/<skill-name>.mdc
cli          Manual   AGENTS.md or tool-specific context file
Full copyable content
# Trigger
"Apply the LLMs.txt search artifact validation capability pack to this published documentation site."

# Required output
1) Source versions, artifact inventory, and duplicate/overlap scope
2) /llms.txt parse, section, link, and context expansion findings
3) Sitemap, robots, canonical, and structured data consistency review
4) Validation plan and publish, hold, or follow-up recommendation

About this resource

Knowledge Freshness

This capability pack is pinned to llms-txt 0.0.6 (source tag 0.0.6). It was verified on 2026-06-03 against the PyPI metadata, the Answer.AI documentation, and the Google Search Central documentation. The reviewed Python package requires Python >=3.10 and declares fastcore, httpx, and mistletoe as dependencies.

Retrieval Sources

Prefer the pinned package metadata, source files, llms.txt proposal, and current Google Search Central docs over model memory for format, parser behavior, canonical signals, sitemap expectations, robots semantics, and structured data placement.

Scope Note

This is not a generic content-cluster strategy skill, PageSpeed optimization skill, IndexNow submission workflow, or crawler product listing. Use it for human-in-the-loop review of published LLM-facing and search discovery artifacts: /llms.txt, expanded context outputs, sitemap membership, robots consistency, canonical URL signals, and structured data alignment.

Core Workflow

  1. Confirm the site origin, reviewed URL set, llms-txt version, source tag, Python runtime, and whether the artifacts are generated or hand-authored.
  2. Inventory artifacts: /llms.txt, optional expanded context output, markdown page variants, sitemap.xml, robots.txt, canonical link elements, structured data, and redirect policy.
  3. Parse /llms.txt against the proposal shape: H1 title, optional summary blockquote, explanatory content, H2-delimited sections, markdown link lists, descriptions, and an Optional section when shorter context is useful.
  4. Review linked resources for relevance, HTTPS URLs, stale destinations, duplicate targets, overly broad pages, and whether descriptions help a model choose the right source.
  5. Expand context with the pinned helper when appropriate, then inspect output size, section balance, missing linked content, and whether optional resources should stay optional.
  6. Compare /llms.txt and expanded context against sitemap coverage. Essential documentation should be discoverable, canonical, and represented consistently.
  7. Review robots policy for crawler intent without treating it as a substitute for /llms.txt; note conflicts that could confuse discovery or indexability.
  8. Review canonical URL signals for duplicate paths, redirect variants, trailing slash policy, .md page variants, and sitemap URL agreement.
  9. Review structured data placement and type alignment so machine-readable markup supports the same entity, product, guide, or documentation intent as the linked pages.
  10. Produce a recommendation with blockers, non-blocking improvements, artifact refresh steps, owner decisions, and a publish, hold, or follow-up outcome.

Capability Scope

  • /llms.txt source and format review
  • llms_txt2ctx context expansion review
  • LLM-ready markdown variant triage
  • Sitemap membership and URL consistency checks
  • Robots policy consistency review
  • Canonical URL and duplicate-path review
  • Structured data alignment review
  • Reviewer-ready artifact evidence and publish recommendation

Compatibility

Native

  • Claude Code / Claude: use as a reusable Agent Skill for documentation artifact review and LLM discovery readiness.
  • Codex/OpenAI workflows: use as SKILL.md-style instructions for search artifact validation and release review.

Manual Adaptation

  • Windsurf and Gemini: adapt the workflow and output contract into their skill formats.
  • Cursor and Generic AGENTS files: convert the production rules and validation checklist into repository-level documentation release guidance.

Required Inputs

  • Site origin and canonical URL policy
  • /llms.txt content and generation source
  • sitemap.xml, robots.txt, and representative page HTML
  • Structured data snippets or rendered page metadata
  • Expanded context artifact when available
  • Owner expectations for essential, optional, stale, and excluded resources

Production Rules

  • Do not approve /llms.txt until the reviewed artifact matches the source site, title, summary, and essential documentation boundaries.
  • Do not use /llms.txt as a crawl policy document; compare it with robots and sitemap behavior instead.
  • Keep human SEO goals separate from model context goals. A good search page can still be too broad or noisy for LLM context.
  • Treat broken links, conflicting canonical URLs, stale sitemap entries, and missing essential docs as release blockers until the owner accepts a documented exception.
  • Keep optional resources optional when they are secondary, long, or only needed for rare questions.
  • Prefer concise descriptions that explain why each linked resource matters.
  • Review structured data for alignment with page intent rather than adding markup only to satisfy a checklist.
  • Record enough source evidence that another reviewer can reproduce the artifact decision.

Output Contract

  1. Source evidence: llms-txt version, source tag, package metadata, docs, URL set, and artifacts reviewed.
  2. Artifact inventory: /llms.txt, expanded context, markdown variants, sitemap, robots, canonical signals, structured data, and generation source.
  3. Findings: parse issues, stale links, missing sections, noisy resources, sitemap gaps, robots conflicts, canonical conflicts, and structured data mismatches.
  4. Impact classification: release blocker, non-blocking polish, accepted exception, or owner decision needed.
  5. Validation plan: exact parser/context command, URL checks, artifact diffs, sitemap/canonical review, and structured data review.
  6. Recommendation: publish, hold, or publish with tracked follow-up.
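
One way to keep items 3, 4, and 6 of the contract consistent is to record each finding with its impact class and derive the outcome from the set of impacts. The sketch below is illustrative only: the `Impact`, `Finding`, and `recommend` names are hypothetical, and the mapping from impact classes to outcomes (in particular, holding on an open owner decision) is an assumption rather than a rule stated by this pack.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    BLOCKER = "release blocker"
    POLISH = "non-blocking polish"
    ACCEPTED = "accepted exception"
    OWNER = "owner decision needed"


@dataclass(frozen=True)
class Finding:
    artifact: str  # e.g. "/llms.txt", "sitemap.xml"
    detail: str
    impact: Impact


def recommend(findings: list[Finding]) -> str:
    """Assumed policy: blockers or open owner decisions hold the release."""
    impacts = {f.impact for f in findings}
    if Impact.BLOCKER in impacts or Impact.OWNER in impacts:
        return "hold"
    if Impact.POLISH in impacts:
        return "publish with tracked follow-up"
    # Only accepted exceptions, or no findings at all.
    return "publish"


# Invented findings, for illustration only.
findings = [
    Finding("/llms.txt", "description missing on one Optional link", Impact.POLISH),
    Finding("sitemap.xml", "stale entry accepted by owner", Impact.ACCEPTED),
]
print(recommend(findings))
```

With the two sample findings, the sketch yields "publish with tracked follow-up"; adding any `Impact.BLOCKER` finding changes the outcome to "hold".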

Troubleshooting

Issue: /llms.txt parses but the expanded context is too large

Fix: Move secondary links into the Optional section, split broad docs into smaller markdown pages, and keep descriptions specific.

Issue: Sitemap and /llms.txt disagree about essential docs

Fix: Decide whether the page is search-discoverable, LLM-context-only, or stale, then align sitemap membership, canonical target, and link description.
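
To find the disagreements in the first place, a reviewer might diff the links in /llms.txt against the sitemap. The following is a rough sketch under stated assumptions: the `sitemap_urls`, `html_counterpart`, and `missing_from_sitemap` helpers are hypothetical, the sample data is invented, and the rule that a markdown variant lives at the page URL plus `.md` is assumed to hold for the site under review.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Collect every <loc> value from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def html_counterpart(url: str) -> str:
    """Assumption: the .md variant is the page URL with '.md' appended."""
    return url[:-3] if url.endswith(".md") else url


def missing_from_sitemap(llms_links: list[str], sitemap_xml: str) -> list[str]:
    """Return llms.txt links whose page URL is absent from the sitemap."""
    known = sitemap_urls(sitemap_xml)
    return sorted(u for u in llms_links if html_counterpart(u) not in known)


# Invented sample data, for illustration only.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/quickstart</loc></url>
  <url><loc>https://example.com/api</loc></url>
</urlset>"""

LINKS = [
    "https://example.com/quickstart.md",
    "https://example.com/changelog.md",
]

print(missing_from_sitemap(LINKS, SITEMAP))
```

Each URL the sketch reports still needs the owner decision described above: the page may be intentionally LLM-context-only, or it may be a stale link.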

Issue: Canonical URL signals conflict with .md variants

Fix: Check redirect policy, canonical link elements, sitemap URLs, and markdown variant naming before recommending publication.
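
A quick way to gather evidence for this check is to extract the canonical link elements from representative page HTML and compare them with the sitemap. This is a minimal standard-library sketch; the `CanonicalFinder` class, the `canonical_findings` function, the finding messages, and the sample page are hypothetical, and redirect behavior is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> elements."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonicals.append(attr.get("href"))


def canonical_findings(page_url: str, html: str, sitemap: set[str]) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    targets = {c for c in finder.canonicals if c}
    findings = []
    if not targets:
        findings.append("no canonical link element")
    elif len(targets) > 1:
        findings.append("conflicting canonical link elements")
    for target in sorted(targets):
        if urlsplit(target).path.endswith(".md"):
            findings.append(f"canonical points at markdown variant: {target}")
        if target not in sitemap:
            findings.append(f"canonical target not in sitemap: {target}")
        if target != page_url and target.rstrip("/") == page_url.rstrip("/"):
            findings.append(f"trailing slash differs from page URL: {target}")
    return findings


# Invented sample page, for illustration only.
PAGE = '<html><head><link rel="canonical" href="https://example.com/guide/"></head></html>'
for finding in canonical_findings(
    "https://example.com/guide", PAGE, {"https://example.com/guide"}
):
    print(finding)
```

Whether a trailing slash difference is a real conflict depends on the site's redirect policy, so the sketch only surfaces candidates for review.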

Issue: Robots policy and LLM guidance are being mixed

Fix: Keep crawler access policy in robots and model-facing orientation in /llms.txt; report conflicts without treating one artifact as a replacement for the other.
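
To report such conflicts, a reviewer could test each /llms.txt link against the robots rules using Python's standard `urllib.robotparser`. The sketch below assumes the robots.txt text has already been fetched; the `blocked_links` helper and the sample rules are hypothetical, and the standard-library matcher may not interpret every directive the way a given crawler does.

```python
from urllib.robotparser import RobotFileParser


def blocked_links(robots_txt: str, urls: list[str], user_agent: str = "*") -> list[str]:
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Invented sample rules and links, for illustration only.
ROBOTS = """User-agent: *
Disallow: /internal/
"""

LLMS_LINKS = [
    "https://example.com/docs/start.md",
    "https://example.com/internal/notes.md",
]

print(blocked_links(ROBOTS, LLMS_LINKS))
```

A link that /llms.txt promotes but robots disallows is worth recording as a conflict; which artifact should change remains an owner decision.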

Issue: Structured data exists but does not match page intent

Fix: Compare the page purpose, primary entity, visible content, and markup type, then recommend narrower or corrected markup.
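
For the markup side of that comparison, the declared types can be pulled out of a page's JSON-LD blocks with the standard library. This is a sketch only: the `JsonLdCollector` class, the `jsonld_types` function, and the sample page are hypothetical, nested `@graph` structures are not handled, and other markup formats such as Microdata or RDFa are out of scope.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._buffer))
            self._inside = False


def jsonld_types(html: str) -> list[str]:
    """Return top-level @type values, or a marker for unparseable blocks."""
    collector = JsonLdCollector()
    collector.feed(html)
    types = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            types.append("<invalid JSON-LD>")
            continue
        for item in data if isinstance(data, list) else [data]:
            declared = item.get("@type") if isinstance(item, dict) else None
            if isinstance(declared, list):
                types.extend(declared)
            elif declared:
                types.append(declared)
    return types


# Invented sample page, for illustration only.
PAGE = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "TechArticle", "headline": "Guide"}'
    "</script></head></html>"
)
print(jsonld_types(PAGE))
```

The extracted types can then be set beside the page purpose and primary entity; deciding whether `TechArticle`, for example, fits a given page is still a reviewer judgment.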

Validation Checklist

  • llms-txt version, source tag, package metadata, and docs verified.
  • Python runtime requirement checked.
  • /llms.txt title, summary, sections, links, descriptions, and Optional section reviewed.
  • Linked markdown resources checked for relevance and stale destinations.
  • Expanded context output reviewed when available.
  • Sitemap membership compared with essential docs.
  • Robots policy checked for discovery conflicts.
  • Canonical URL targets and redirect variants reviewed.
  • Structured data alignment checked.
  • Publish, hold, or follow-up recommendation documented.
#llms-txt #technical-seo #search-artifacts #structured-data #capability-pack
