Kreuzberg MCP Server
Document intelligence MCP server for extracting text, metadata, OCR output, structured data, embeddings, chunks, cache state, and supported-format information from PDFs, Office files, images, code, and many other formats.
Open the source and read safety notes before installing.
Safety notes
- Kreuzberg MCP can read local files supplied to extraction tools and can process batches of files when paths are provided.
- OCR, structured extraction, embeddings, and VLM features may invoke local or provider-hosted models depending on configuration.
- Cache tools can warm, inspect, or clear model/cache state; review cache directories and mounted volumes in shared environments.
- Docker deployments should mount only the directories the agent is allowed to read.
- Treat extracted text, metadata, structured fields, embeddings, and chunks as sensitive outputs when source documents are private.
Privacy notes
- Documents may contain PII, contracts, invoices, source code, screenshots, medical data, financial data, credentials, or proprietary design information.
- Extracted metadata can reveal filenames, authors, timestamps, document structure, attachments, image details, and software fingerprints.
- OCR and structured extraction can expose text that was not previously copyable from scanned PDFs or images.
- Embedding and VLM/LLM configuration may send document-derived content to external model providers if configured that way.
- Cache directories, logs, model downloads, and MCP transcripts may retain document-derived context.
Prerequisites
- Python environment suitable for installing `kreuzberg[all]`, or Docker if using the container path.
- MCP-capable client such as Claude Desktop, Cursor, or a custom stdio MCP client.
- Local file paths or mounted volumes limited to documents you are authorized to process.
- Optional OCR, embeddings, VLM OCR, and LLM provider dependencies configured only when those features are needed.
- Understanding of document sensitivity before extracting, OCRing, embedding, or chunking private files.
Schema details
- Install type: cli
- Troubleshooting: No
- Scope: Source repo
- Estimated setup: 10 minutes
- Difficulty: intermediate
Full copyable content
pip install "kreuzberg[all]"
kreuzberg mcp
About this resource
Content
Kreuzberg MCP Server exposes Kreuzberg's document intelligence engine through the Model Context Protocol. It lets Claude, Cursor, and custom MCP clients extract content from files, generate embeddings, chunk text, manage cache state, detect formats, and inspect supported extraction capabilities without writing custom extraction code.
The upstream MCP guide documents stdio mode with `kreuzberg mcp` as the default local setup. Docker can also run the same MCP mode when file access and cache volumes need to be controlled more explicitly.
Source Review
- https://github.com/kreuzberg-dev/kreuzberg
- https://github.com/kreuzberg-dev/kreuzberg/blob/main/README.md
- https://docs.kreuzberg.dev/guides/mcp-integration/
- https://docs.kreuzberg.dev/cli/usage/
- https://docs.kreuzberg.dev/guides/docker/
- https://github.com/kreuzberg-dev/kreuzberg/pkgs/container/kreuzberg
- https://pypi.org/pypi/kreuzberg/json
These sources were reviewed on 2026-06-06. Prefer the live MCP integration guide, CLI usage docs, Docker guide, repository README, container package, and PyPI metadata for current install commands, server modes, feature flags, and runtime dependencies.
Features
- Run an MCP server over stdio with `kreuzberg mcp`.
- Extract file content and metadata from PDFs, Office documents, images, HTML, XML, email, archives, academic formats, text files, and code.
- Detect MIME types and list supported formats.
- Batch-extract several files.
- Generate embeddings and chunk text when configured.
- Extract structured data when the required LLM feature is available.
- Inspect and clear cache state.
- Use config files to apply extraction defaults.
- Run the MCP server through Docker with controlled file and cache mounts.
Installation
Install Kreuzberg with its optional feature bundle:
pip install "kreuzberg[all]"
Start the stdio MCP server:
kreuzberg mcp
Configure an MCP client with the documented command:
{
"mcpServers": {
"kreuzberg": {
"command": "kreuzberg",
"args": ["mcp"]
}
}
}
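If the client already has other servers configured, the entry above can be merged in rather than pasted over the whole file. The following is a minimal sketch, assuming the client stores its servers under a top-level `mcpServers` key as in the snippet above. The helper names `merge_kreuzberg_server` and `update_config_file` are hypothetical, and the config file location depends on the client, so it is left as a parameter.

```python
import json
from pathlib import Path


def merge_kreuzberg_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the kreuzberg entry added.

    Other entries under "mcpServers" are preserved; the input is not mutated.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same command and args as the documented client configuration.
    servers["kreuzberg"] = {"command": "kreuzberg", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Hypothetical helper: read, merge, and rewrite a client config file."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    updated = merge_kreuzberg_server(config)
    path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Back up the existing config before rewriting it, and check the client's own documentation for where that file lives.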
For containerized setups, mount only the document directories and config files the agent should access.
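One way to keep mounts narrow is to build the container command from an explicit list of document directories and mount each one read-only. The sketch below only assembles an argument list and does not run anything. Several details are assumptions, not taken from the Kreuzberg docs: the image reference is inferred from the container package URL in Source Review, the `mcp` argument assumes the image's entrypoint is the `kreuzberg` CLI, and the `/data/docsN` container paths and the `docker_mcp_args` name are hypothetical. Confirm the real image name and invocation in the Docker guide.

```python
from pathlib import Path

# Assumption: inferred from the GitHub container package listing, not verified.
IMAGE = "ghcr.io/kreuzberg-dev/kreuzberg"


def docker_mcp_args(doc_dirs, image=IMAGE):
    """Build a `docker run` argument list with one read-only mount per directory."""
    # --rm removes the container on exit; -i keeps stdin open for stdio transport.
    args = ["docker", "run", "--rm", "-i"]
    for index, host_dir in enumerate(doc_dirs):
        host = Path(host_dir)
        if not host.is_absolute():
            # Bind mounts need absolute host paths; refuse anything ambiguous.
            raise ValueError(f"mount path must be absolute: {host_dir}")
        # ":ro" makes the mount read-only inside the container.
        args += ["-v", f"{host}:/data/docs{index}:ro"]
    # Assumption: the image entrypoint accepts "mcp" like the local CLI does.
    args += [image, "mcp"]
    return args
```

The resulting list can be split into the `command` and `args` fields of an MCP client configuration once the image and entrypoint have been verified.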
Use Cases
- Extract text and metadata from PDFs, Office files, images, and email for analysis in Claude.
- OCR scanned documents or screenshots before summarization.
- Chunk long documents for RAG and agent workflows.
- Generate embeddings for document-derived text when configured.
- Detect file types before choosing an extraction path.
- Build a local document-processing MCP workflow without exposing documents to a hosted extraction service.
Safety and Privacy
Kreuzberg can turn private files into model-visible text. That is useful, but it also means access boundaries matter. Mount or expose only approved directories, avoid broad workspace paths, and start with a small file before batch processing.
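For custom MCP clients or wrapper scripts, the "approved directories" rule can be enforced before a path is ever handed to an extraction tool. This is a minimal sketch using only the Python standard library. The function name `is_within_allowed` is hypothetical and the check is not part of Kreuzberg; it shows one way a caller might screen paths.

```python
from pathlib import Path


def is_within_allowed(path, allowed_dirs):
    """Return True if path resolves to a location inside one of allowed_dirs.

    Resolving first collapses ".." segments and follows symlinks, so a path
    cannot escape an approved directory by traversal.
    """
    target = Path(path).resolve()
    for allowed in allowed_dirs:
        root = Path(allowed).resolve()
        # Compare whole path components, so "/srv/docs-private" does not
        # match an allowed root of "/srv/docs".
        if target == root or root in target.parents:
            return True
    return False
```

A caller would reject any path for which this returns False, and try a single small file from an approved directory before moving on to batch extraction.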
Review OCR, embedding, VLM, and LLM provider configuration before enabling features that can send document-derived content outside the local machine. Keep cache directories, logs, and extracted outputs out of shared repos unless they are approved for publication.
Duplicate Check
No kreuzberg-dev/kreuzberg entry, Kreuzberg MCP entry, or matching Kreuzberg
source URL was found in content/mcp.