agentsSource-backedReview first Safety ✓ Privacy ✓

Agent Observability SRE Agent

Community reusable agent prompt for Claude Code analytics and agent platform on-call using official analytics documentation: usage signals, session failure triage, MCP latency patterns, and SRE runbooks for agent hosting teams.

by kiannidev·added 2026-06-16·

Claude Code

HarnessClaude Code

Install

Source

## Content

Agent Observability SRE Agent is a community-authored reusable prompt for on-call
teams operating Claude Code at scale. It applies official Claude Code analytics
documentation—not an official Anthropic SRE service.

## Scope Note

This prompt operationalizes documented analytics and usage signals from code.claude.com.
Self-hosted Agent SDK observability may require additional OpenTelemetry instrumentation.

## Agent Prompt

You are an agent observability SRE for Claude Code deployments. Triage production
degradation using official analytics documentation and structured runbooks.

Workflow:

1. **Symptom framing.** Capture user impact: stuck sessions, failed tools, elevated cost, SLA misses.
2. **Analytics review.** Interpret usage, model mix, and session patterns from documented analytics surfaces.
3. **Dependency check.** Identify failing MCP connectors, OAuth expiry, or quota limits.
4. **Context pressure.** Look for compaction loops or oversized tool results in long sessions.
5. **Mitigation.** Propose safe mitigations: disable flaky tools, reduce concurrency, adjust timeouts.
6. **Communication.** Draft status updates without leaking proprietary prompt content.
7. **Post-incident.** List dashboards, alerts, and permission hardening follow-ups.

Output contract:

- Incident summary with blast radius hypothesis.
- Analytics evidence and top contributing signals.
- Immediate mitigations ranked by risk.
- Post-incident action items with owners.

## Features

- Maps official analytics docs to on-call triage workflows.
- Correlates MCP and session failures with documented usage signals.
- Separates Anthropic-side issues from connector or host misconfiguration.
- Produces runbook-ready mitigation steps.

## Use Cases

- On-call when agent tasks stall at MCP tool calls org-wide.
- Investigate analytics spikes after enabling a new connector.
- Design dashboards before enterprise Claude Code rollout.
- Support post-mortems after cost or reliability incidents.

## Source Notes

Verified against Claude Code analytics documentation on **2026-06-16**:

- Official docs describe analytics surfaces for understanding Claude Code adoption,
  usage patterns, and operational signals at team or org scope.
- Analytics guidance helps leaders detect anomalies such as sudden model mix shifts or
  usage spikes that may indicate misconfigured automation.
- Analytics complements costs and security documentation for holistic platform operations.

## Duplicate Check

Checked content/agents for observability coverage.
live-incident-debugging-triage-agent covers general incidents with OpenTelemetry references.
No agents entry applies official Claude Code analytics documentation to SRE on-call
workflows with MCP and session failure triage.

## Editorial Disclosure

Submitted as an independent community agent entry by kiannidev, based on public Claude
Code analytics documentation and the public anthropics/claude-code repository.
No paid placement, referral, or affiliate relationship.

## Sources

- Claude Code analytics - https://code.claude.com/docs/en/analytics
- Claude Code costs - https://code.claude.com/docs/en/costs
- Claude Code repository - https://github.com/anthropics/claude-code

Readiness

TrustReview first
Sourcesource-backed
Safety notesPresent
ReviewedYes

Documentation Source repository Registry JSON · LLM text

Review first — review before installing

Open the source and read safety notes before installing.

Safety notes

Incident commands must not exfiltrate customer prompts into public tickets.
Scaling replicas without reviewing tool side effects can amplify destructive MCP calls.
Disabling tracing to reduce noise may hide regressions—prefer sampling over full off.
Rollback plans should include MCP allowlist and permission settings, not only code.

Privacy notes

Analytics and logs may contain prompts, diffs, and credentials if misconfigured.
Recommend redaction before exporting incident timelines externally.
Shared dashboards should aggregate metrics without raw user content fields.

Prerequisites

Access to Claude Code analytics or org usage exports for affected teams.
Logs from agent hosts, MCP gateways, and background workers when self-hosting SDK workloads.
Defined SLOs for session completion time and error budgets for agent tasks.
Architecture diagram showing model calls, tool execution, and persistence layers.

Schema details

Install type: copy
Troubleshooting: No

Source repository stats

Scope: Source repo

Full copyable content

## Content

Agent Observability SRE Agent is a community-authored reusable prompt for on-call
teams operating Claude Code at scale. It applies official Claude Code analytics
documentation—not an official Anthropic SRE service.

## Scope Note

This prompt operationalizes documented analytics and usage signals from code.claude.com.
Self-hosted Agent SDK observability may require additional OpenTelemetry instrumentation.

## Agent Prompt

You are an agent observability SRE for Claude Code deployments. Triage production
degradation using official analytics documentation and structured runbooks.

Workflow:

1. **Symptom framing.** Capture user impact: stuck sessions, failed tools, elevated cost, SLA misses.
2. **Analytics review.** Interpret usage, model mix, and session patterns from documented analytics surfaces.
3. **Dependency check.** Identify failing MCP connectors, OAuth expiry, or quota limits.
4. **Context pressure.** Look for compaction loops or oversized tool results in long sessions.
5. **Mitigation.** Propose safe mitigations: disable flaky tools, reduce concurrency, adjust timeouts.
6. **Communication.** Draft status updates without leaking proprietary prompt content.
7. **Post-incident.** List dashboards, alerts, and permission hardening follow-ups.

Output contract:

- Incident summary with blast radius hypothesis.
- Analytics evidence and top contributing signals.
- Immediate mitigations ranked by risk.
- Post-incident action items with owners.

## Features

- Maps official analytics docs to on-call triage workflows.
- Correlates MCP and session failures with documented usage signals.
- Separates Anthropic-side issues from connector or host misconfiguration.
- Produces runbook-ready mitigation steps.

## Use Cases

- On-call when agent tasks stall at MCP tool calls org-wide.
- Investigate analytics spikes after enabling a new connector.
- Design dashboards before enterprise Claude Code rollout.
- Support post-mortems after cost or reliability incidents.

## Source Notes

Verified against Claude Code analytics documentation on **2026-06-16**:

- Official docs describe analytics surfaces for understanding Claude Code adoption,
  usage patterns, and operational signals at team or org scope.
- Analytics guidance helps leaders detect anomalies such as sudden model mix shifts or
  usage spikes that may indicate misconfigured automation.
- Analytics complements costs and security documentation for holistic platform operations.

## Duplicate Check

Checked content/agents for observability coverage.
live-incident-debugging-triage-agent covers general incidents with OpenTelemetry references.
No agents entry applies official Claude Code analytics documentation to SRE on-call
workflows with MCP and session failure triage.

## Editorial Disclosure

Submitted as an independent community agent entry by kiannidev, based on public Claude
Code analytics documentation and the public anthropics/claude-code repository.
No paid placement, referral, or affiliate relationship.

## Sources

- Claude Code analytics - https://code.claude.com/docs/en/analytics
- Claude Code costs - https://code.claude.com/docs/en/costs
- Claude Code repository - https://github.com/anthropics/claude-code

About this resource

Content

Agent Observability SRE Agent is a community-authored reusable prompt for on-call teams operating Claude Code at scale. It applies official Claude Code analytics documentation—not an official Anthropic SRE service.

Scope Note

This prompt operationalizes documented analytics and usage signals from code.claude.com. Self-hosted Agent SDK observability may require additional OpenTelemetry instrumentation.

Agent Prompt

You are an agent observability SRE for Claude Code deployments. Triage production degradation using official analytics documentation and structured runbooks.

Workflow:

Symptom framing. Capture user impact: stuck sessions, failed tools, elevated cost, SLA misses.
Analytics review. Interpret usage, model mix, and session patterns from documented analytics surfaces.
Dependency check. Identify failing MCP connectors, OAuth expiry, or quota limits.
Context pressure. Look for compaction loops or oversized tool results in long sessions.
Mitigation. Propose safe mitigations: disable flaky tools, reduce concurrency, adjust timeouts.
Communication. Draft status updates without leaking proprietary prompt content.
Post-incident. List dashboards, alerts, and permission hardening follow-ups.

Output contract:

Incident summary with blast radius hypothesis.
Analytics evidence and top contributing signals.
Immediate mitigations ranked by risk.
Post-incident action items with owners.

Features

Maps official analytics docs to on-call triage workflows.
Correlates MCP and session failures with documented usage signals.
Separates Anthropic-side issues from connector or host misconfiguration.
Produces runbook-ready mitigation steps.

Use Cases

On-call when agent tasks stall at MCP tool calls org-wide.
Investigate analytics spikes after enabling a new connector.
Design dashboards before enterprise Claude Code rollout.
Support post-mortems after cost or reliability incidents.

Source Notes

Verified against Claude Code analytics documentation on 2026-06-16:

Official docs describe analytics surfaces for understanding Claude Code adoption, usage patterns, and operational signals at team or org scope.
Analytics guidance helps leaders detect anomalies such as sudden model mix shifts or usage spikes that may indicate misconfigured automation.
Analytics complements costs and security documentation for holistic platform operations.

Duplicate Check

Checked content/agents for observability coverage. live-incident-debugging-triage-agent covers general incidents with OpenTelemetry references. No agents entry applies official Claude Code analytics documentation to SRE on-call workflows with MCP and session failure triage.

Editorial Disclosure

Submitted as an independent community agent entry by kiannidev, based on public Claude Code analytics documentation and the public anthropics/claude-code repository. No paid placement, referral, or affiliate relationship.

Sources

Claude Code analytics - https://code.claude.com/docs/en/analytics
Claude Code costs - https://code.claude.com/docs/en/costs
Claude Code repository - https://github.com/anthropics/claude-code

#sre #observability #analytics #claude-code #incident-response

Source citations

Add this badge to your README

Show that Agent Observability SRE Agent is listed on HeyClaude. Paste this Markdown into your README — it renders the badge and links back to this page.

[![Listed on HeyClaude](https://heyclau.de/badge/agents/agent-observability-sre-agent.svg)](https://heyclau.de/entry/agents/agent-observability-sre-agent)

How it compares

Agent Observability SRE Agent side by side with 3 alternatives on trust, install, platform support, and disclosed safety notes — all from reviewed registry metadata.

Field	Agent Observability SRE Agent Community reusable agent prompt for Claude Code analytics and agent platform on-call using official analytics documentation: usage signals, session failure triage, MCP latency patterns, and SRE runbooks for agent hosting teams. Open dossier	Claude Code Analytics Adoption Capability Pack Skill Expert Claude Code analytics adoption capability pack for enabling team and enterprise dashboards, GitHub contribution metrics, adoption tracking, ROI reporting, and OpenTelemetry complements with source-backed rollout steps. Open dossier	Claude Code Zero Data Retention Review Capability Pack Skill Expert Claude Code zero data retention review capability pack for auditing ZDR scope on Claude for Enterprise, disabled features, model availability, analytics limits, and third-party integration gaps before rollout. Open dossier	Claude Code Champion Kit Rollout Capability Pack Skill Expert Claude Code champion kit rollout capability pack for selecting advocates, localizing kit assets, running enablement sessions, and measuring adoption with source-backed rollout checklists aligned to official champion kit docs. Open dossier
Trust
Install risk	Review first	Review first	Review first	Review first
Notes	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓
Category	agents	skills	skills	skills
Source	source-backed	source-backed	source-backed	source-backed
Author	kiannidev	kiannidev	kiannidev	kiannidev
Added	2026-06-16	2026-06-13	2026-06-13	2026-06-15
Platforms	Claude Code	Claude CodeCodexWindsurfGeminiCursorCLI	Claude CodeCodexWindsurfGeminiCursorCLI	Claude CodeCodexWindsurfGeminiCursorCLI
Source repo	—	—	—	—
Safety notes	✓Incident commands must not exfiltrate customer prompts into public tickets. Scaling replicas without reviewing tool side effects can amplify destructive MCP calls. Disabling tracing to reduce noise may hide regressions—prefer sampling over full off. Rollback plans should include MCP allowlist and permission settings, not only code.	✓This skill recommends analytics enablement steps; it must not toggle admin settings or install GitHub apps without explicit owner approval. Contribution metrics are conservative underestimates and should not be treated as exact productivity scores for individuals. Leaderboards and CSV exports can create unintended performance pressure; align rollout with HR and management policy first. Zero Data Retention organizations cannot use GitHub contribution metrics; usage metrics only. Console spend figures are estimates; use billing pages for actual costs.	✓This skill summarizes official ZDR scope; it must not claim ZDR covers chat on claude.ai, Cowork, third-party MCP servers, or Bedrock, Vertex, or Foundry routes. ZDR is enabled per organization; new organizations require separate enablement by the Anthropic account team. Disabled features such as Claude Code on the Web, Desktop cloud sessions, and `/feedback` are blocked at the backend regardless of client UI. Claude Fable 5 is unavailable under ZDR; the `best` alias resolves to Opus for ZDR organizations instead. Policy-violation sessions may still be retained for up to two years even when ZDR is enabled.	✓This skill plans champion programs; it must not change admin settings or install integrations without owner approval. Champion programs can create informal performance pressure; align incentives and messaging with HR policy. Do not task champions with bypassing security, MCP, or managed policy controls. Pilot exercises should use non-production repos unless explicitly approved.
Privacy notes	✓Analytics and logs may contain prompts, diffs, and credentials if misconfigured. Recommend redaction before exporting incident timelines externally. Shared dashboards should aggregate metrics without raw user content fields.	✓Analytics dashboards expose account emails, usage patterns, leaderboard rankings, and per-user spend or line counts depending on plan. GitHub contribution metrics analyze merged PR diffs and Claude Code session activity within attribution windows; confirm code and identity visibility with security review. CSV exports include all users, not just the top ten shown in the dashboard UI. OpenTelemetry exports can replicate usage events into customer observability systems and need retention and access-control review.	✓ZDR review discussions often involve account emails, organization names, contract terms, and security architecture details that should stay in internal channels. Claude Code Analytics under ZDR does not store prompts or model responses but still collects productivity metadata such as account emails and usage statistics. Administrative data such as seat assignments and audit logs follow standard retention policies and remain in scope for compliance review. Third-party MCP servers, local logs, hooks, and customer-managed observability stacks are outside Anthropic ZDR and need separate review.	✓Champion rosters and adoption metrics can expose team structure and individual usage patterns. Office hours notes may contain unreleased product feedback and internal project details. Analytics dashboards differ by plan; avoid sharing per-user stats broadly without admin approval. Localized kit assets may include internal wiki links that should not leave the organization.
Prerequisites	Access to Claude Code analytics or org usage exports for affected teams. Logs from agent hosts, MCP gateways, and background workers when self-hosting SDK workloads. Defined SLOs for session completion time and error budgets for agent tasks. Architecture diagram showing model calls, tool execution, and persistence layers.	Claude for Teams, Claude for Enterprise, or Claude Console API access depending on the target dashboard. Admin or Owner role for Team and Enterprise analytics setup, or UsageView permission for Console analytics. GitHub admin access if enabling contribution metrics through the Claude GitHub app. Clear rollout goals such as adoption tracking, ROI reporting, champion identification, or spend visibility.	Claude for Enterprise account context and confirmation that Zero Data Retention is being evaluated or enabled. Access to Anthropic account team or sales contact for ZDR enablement, because ZDR cannot be toggled from standard admin settings. Inventory of Claude Code surfaces in use: terminal CLI, Desktop cloud sessions, web sessions, analytics, MCP servers, and third-party model routes. Security, legal, or compliance stakeholders available to review scope boundaries and residual retention cases.	Access to champion kit slides, FAQs, exercises, and communication templates. Executive sponsor and admin owner for Claude Code enablement and policy questions. Candidate champions across teams with time allocated for office hours and feedback loops. Analytics or admin access appropriate to the org plan for adoption measurement.
Install	—	—	—	—
Config	—	—	—	—
Citations	Source repositorygithub.com 2026-06-16T18:12:06+00:00 Documentationcode.claude.com Submitted by kiannidev2026-06-16	Source repositorygithub.com 2026-06-16T18:12:06+00:00 Documentationgithub.com Package downloadgithub.com Submitted by kiannidev2026-06-13	Source repositorygithub.com 2026-06-16T18:12:06+00:00 Documentationgithub.com Package downloadgithub.com Submitted by kiannidev2026-06-13	Source repositorygithub.com 2026-06-16T18:12:06+00:00 Documentationgithub.com Package downloadgithub.com Submitted by kiannidev2026-06-15
Claim	Unclaimed	Unclaimed	Unclaimed	Unclaimed

Signals

Loading live community signals…