
Prompt Injection Defense For Tool Connected Agents

Defend tool-connected agents against prompt injection using documented Claude Code security practices: MCP trust verification, approval gates, least-privilege tools, untrusted content handling, and human review before side-effect tool calls.

by kiannidev · added 2026-06-16
Harness: Claude Code
Review first: review before installing.

Open the source and read safety notes before installing.

Safety notes

  • MCP servers that fetch external content can carry prompt injection—official security docs warn operators to verify trust before use.
  • Project-scoped .mcp.json servers require trust verification and approval prompts in Claude Code before first use.
  • Auto-approve or bypass permission modes increase injection blast radius—avoid for repos with untrusted inputs.
  • Side-effect tools (write, bash, network) need explicit human gates when content origin is untrusted.

Privacy notes

  • Injected prompts may exfiltrate data through tool arguments—scope OAuth tokens and filesystem paths narrowly.
  • Logs of tool calls may contain injected instructions—restrict log sharing externally.
  • Revoke MCP OAuth and remove servers promptly when injection is suspected.

Prerequisites

  • Inventory of MCP servers, plugins, and tools connected to Claude Code.
  • Team policy for permission modes and auto-approve settings.
  • Ability to test workflows in an isolated profile before production use.
  • Maintainer or security reviewer for side-effect tool approvals.

Schema details

  • Install type: copy
  • Reading time: 8 min
  • Difficulty score: 58
  • Troubleshooting: Yes
  • Breaking changes: No

Full copyable content
Before connecting MCP servers or enabling autonomous tools, verify trust, require approvals for side effects, scope credentials narrowly, treat external content as untrusted input, and keep a human reviewer on destructive actions.

About this resource

TL;DR

Tool-connected agents can be steered by untrusted text in files, web pages, or MCP resources. Apply documented Claude Code security practices: verify MCP trust, keep approval prompts enabled, scope credentials and paths, treat external content as hostile input, and require human review before destructive tool use.

Prerequisites & Requirements

  • Tool inventory: List MCP servers and high-risk tools in use.
  • Permission policy: Document allowed permission modes per repo.
  • Test profile: Isolated Claude Code profile for injection drills.
  • Review owner: Human approves side-effect tools on untrusted content.

Core Concepts Explained

Untrusted content reaches the model

When agents read issues, web pages, PDFs, or MCP resources, attacker-controlled text can attempt to override instructions. Official Claude Code security documentation treats third-party MCP servers and external fetches as something the operator must verify, not as automatically safe.
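
A drill with a benign canary string, as suggested in the implementation steps, can show whether untrusted text actually reaches tool arguments. The sketch below is illustrative only: `make_canary`, `calls_containing`, and the tool-call record shape (`tool`, `arguments`) are hypothetical names, not part of Claude Code. Adapt them to whatever log format your setup produces.

```python
import json
import secrets


def make_canary(prefix: str = "CANARY") -> str:
    """Create a unique, harmless marker to plant in a staging issue or mock resource."""
    return f"{prefix}-{secrets.token_hex(8)}"


def calls_containing(canary: str, tool_calls: list[dict]) -> list[dict]:
    """Return recorded tool calls whose serialized arguments contain the canary.

    The record shape {"tool": ..., "arguments": {...}} is an assumption of this sketch.
    """
    return [
        call for call in tool_calls
        if canary in json.dumps(call.get("arguments", {}))
    ]


canary = make_canary()
# Hypothetical log: the second call shows untrusted text flowing into a tool argument.
recorded_calls = [
    {"tool": "Read", "arguments": {"file_path": "README.md"}},
    {"tool": "Bash", "arguments": {"command": f"echo {canary}"}},
]
for call in calls_containing(canary, recorded_calls):
    print(f"canary reached tool: {call['tool']}")
```

A hit on a side-effect tool during a drill suggests that an approval prompt or review gate should have fired for that call.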

Tools amplify successful injection

An injected instruction that reaches an agent with bash, write, or network tools can cause real changes. Least-privilege tool allowlists and permission prompts, both documented in the security and MCP guides, reduce the impact.
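
The gating rule can be written down as a small policy check. This is a minimal sketch, not a Claude Code API: `ToolProfile`, `needs_human_gate`, and the capability labels are hypothetical names chosen to mirror the guide's "write, bash, network" grouping.

```python
from dataclasses import dataclass

# Capability labels follow the guide's "write, bash, network" grouping (an assumption).
SIDE_EFFECT_CAPABILITIES = frozenset({"write", "bash", "network"})


@dataclass(frozen=True)
class ToolProfile:
    name: str
    capabilities: frozenset


def needs_human_gate(tool: ToolProfile, content_is_trusted: bool) -> bool:
    """Require human review when a side-effect tool would act on untrusted content."""
    has_side_effects = bool(tool.capabilities & SIDE_EFFECT_CAPABILITIES)
    return has_side_effects and not content_is_trusted


# Hypothetical inventory entries.
reader = ToolProfile("read_file", frozenset({"read"}))
shell = ToolProfile("shell", frozenset({"bash", "network"}))

print(needs_human_gate(reader, content_is_trusted=False))  # False
print(needs_human_gate(shell, content_is_trusted=False))   # True
print(needs_human_gate(shell, content_is_trusted=True))    # False
```

Keeping the rule this explicit makes it easier to review alongside the team's permission policy.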

Trust verification is mandatory for project MCP

Claude Code prompts for trust verification before using project-scoped servers from .mcp.json. Skipping review or auto-approving undermines injection defenses.
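
Before approving a trust prompt, it can help to see exactly what each project server would run or contact. The sketch below assumes the common `.mcp.json` layout with a top-level `mcpServers` object whose entries carry either a `url` or a `command` with `args` and `env`. Check that layout against the current MCP documentation. The server names and the helper `summarize_project_servers` are hypothetical.

```python
import json


def summarize_project_servers(mcp_json_text: str) -> list[dict]:
    """List each server in a .mcp.json document with what it would run or contact."""
    config = json.loads(mcp_json_text)
    summaries = []
    for name, spec in sorted(config.get("mcpServers", {}).items()):
        if "url" in spec:
            target = spec["url"]
        else:
            target = " ".join([spec.get("command", ""), *spec.get("args", [])]).strip()
        summaries.append({
            "name": name,
            "target": target,
            # Only variable names are reported, never values, to keep secrets out of notes.
            "env_keys": sorted(spec.get("env", {})),
        })
    return summaries


# Hypothetical file contents for illustration.
example = """
{
  "mcpServers": {
    "docs-search": {"type": "http", "url": "https://mcp.example.com/mcp"},
    "local-files": {"command": "npx", "args": ["-y", "example-files-server"],
                    "env": {"FILES_ROOT": "./docs"}}
  }
}
"""
for row in summarize_project_servers(example):
    print(row["name"], "->", row["target"], row["env_keys"])
```

The resulting list can be pasted into team security notes as the record of reviewed servers.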

Step-by-Step Implementation Guide

  1. Inventory connected tools. List MCP servers, plugins, and default tools with read vs write vs network capability.

  2. Enable trust verification. Keep Claude Code prompts for new project MCP servers; document approved servers in team security notes.

  3. Tighten permission modes. Prefer explicit prompts over bypass modes when repositories ingest external content (issues, comments, web fetches).

  4. Scope credentials and paths. Use oauth.scopes pins, sandbox settings, and narrow filesystem allowlists per security documentation.

  5. Isolate untrusted inputs. Process external content in read-only passes first; defer side-effect tools until a human validates the summary.

  6. Add hooks or review gates. Use PreToolUse hooks (see secure hooks guide) to block destructive patterns on sensitive branches.

  7. Run injection drills. Test with benign canary strings in staging issues or mock MCP resources; confirm approvals fire.

  8. Document incident response. Remove suspect MCP servers, rotate tokens, and revert commits if injection succeeds.
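
The review gate in step 6 could look roughly like the following PreToolUse hook logic. This sketch assumes the hook receives a JSON event on stdin with `tool_name` and `tool_input` fields and that exit code 2 blocks the call, which matches the hooks documentation as understood here; verify both before relying on it. The branch names, patterns, and function names are hypothetical policy choices, and looking up the current branch (for example with `git rev-parse --abbrev-ref HEAD`) is left to the caller.

```python
import json
import re
import sys

# Hypothetical policy: which branches are sensitive and which commands count as destructive.
SENSITIVE_BRANCHES = {"main", "release"}
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-(rf|fr)\b"),
    re.compile(r"\bgit\s+push\b.*--force"),
    re.compile(r"\bgit\s+reset\s+--hard\b"),
]


def decide(event: dict, branch: str) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 is assumed to block the tool call."""
    if event.get("tool_name") != "Bash" or branch not in SENSITIVE_BRANCHES:
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return 2, f"Blocked on '{branch}': command matches {pattern.pattern!r}"
    return 0, ""


def main(stdin_text: str, branch: str) -> int:
    """Entry point a hook script could call with sys.stdin.read() and the current branch."""
    code, message = decide(json.loads(stdin_text), branch)
    if message:
        print(message, file=sys.stderr)  # stderr carries the reason for the block
    return code


# Simulated hook payload for a quick local check.
sample = json.dumps(
    {"tool_name": "Bash", "tool_input": {"command": "git push --force origin main"}}
)
print(main(sample, branch="main"))       # 2
print(main(sample, branch="feature/x"))  # 0
```

Pattern matching of this kind is easy to evade, so it works best as one layer alongside permission prompts and human review, not as a replacement for them.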

Troubleshooting

Agent executed tool after malicious issue comment

Revoke MCP tokens, disable auto-approve, and add CODEOWNERS review on tool-heavy paths.

MCP server fetches untrusted web content

Restrict the server's network allowlist, or disable the server until its maintainer patches the fetch policy.

Permission prompts disabled team-wide

Re-enable managed settings defaults; audit settings.json for bypass flags.
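
A settings audit might start from a script like this one. The key names checked here (`permissions.defaultMode` with values such as `bypassPermissions` or `acceptEdits`, and `enableAllProjectMcpServers`) are assumptions based on the settings reference as understood at the time of writing, so confirm them against current documentation. `find_bypass_flags` is a hypothetical helper.

```python
import json

# Assumed key names and values; confirm against the current settings reference.
RISKY_DEFAULT_MODES = {"bypassPermissions", "acceptEdits"}


def find_bypass_flags(settings: dict) -> list[str]:
    """Return human-readable findings for settings that skip or weaken prompts."""
    findings = []
    mode = settings.get("permissions", {}).get("defaultMode")
    if mode in RISKY_DEFAULT_MODES:
        findings.append(f"permissions.defaultMode is '{mode}'")
    if settings.get("enableAllProjectMcpServers") is True:
        findings.append("enableAllProjectMcpServers auto-approves project MCP servers")
    return findings


# Hypothetical settings.json contents for illustration.
example_settings = json.loads("""
{
  "permissions": {"defaultMode": "bypassPermissions", "allow": ["Read"]},
  "enableAllProjectMcpServers": true
}
""")
for finding in find_bypass_flags(example_settings):
    print("REVIEW:", finding)
```

Running the same check over user, project, and managed settings files shows where a bypass was introduced.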

Source Verification Notes

Verified against Claude Code security, MCP, and MCP security best practices documentation on 2026-06-16:

  • Security docs state operators must verify third-party MCP servers; Anthropic does not security-audit arbitrary servers.
  • Project-scoped MCP from .mcp.json requires trust verification and approval before use.
  • Servers fetching external content can expose workflows to prompt injection.
  • MCP security best practices recommend least privilege, explicit authorization, and careful handling of tool-visible data.

Duplicate Check

Complements threat-model-mcp-servers-before-installation (pre-install review) and secret-handling-for-mcp-servers-and-agent-tools (credential hygiene). No existing guide focuses on prompt injection defense patterns for already-connected tool workflows.



