PDF Reader MCP

PDF-focused MCP server that lets Claude read one or more local or remote PDFs and extract full text, page ranges, metadata, page counts, embedded images, and table-like structures.

by Sylphx · submitted by oktofeesh1 · added 2026-06-06

Platforms and harnesses: Claude Code, Codex, Cursor, Claude Desktop

Review first

Review safety and privacy notes before installing or copying commands.

Install & copy

npx -y @sylphx/pdf-reader-mcp

Trust & readiness

Trust: Review first
Source: Source-backed
Safety notes: Present
Reviewed: No

Citation facts

Source-backed facts for citing this resource, derived directly from the registry — also available as plain text for AI assistants.

Canonical URL: https://heyclau.de/entry/mcp/pdf-reader-mcp
Source URLs: https://github.com/SylphxAI/pdf-reader-mcp/blob/main/README.md, https://github.com/SylphxAI/pdf-reader-mcp
Brand: PDF Reader MCP
Brand domain: github.com
Safety notes: PDF Reader MCP can read local PDF paths and fetch remote PDF URLs unless runtime security settings restrict those sources., The source supports directory allowlists, host allowlists, disabling URL sources, and SSRF checks for private IPs., Large or malformed PDFs can still be slow, memory-intensive, partially parsed, or return extraction errors despite size and timeout controls., Extracted text, tables, images, and metadata may be incomplete or out of order for complex scanned, encrypted, or layout-heavy PDFs., If HTTP transport is enabled, bind and authenticate it carefully before exposing it beyond trusted local clients.
Privacy notes: PDFs can contain confidential text, embedded images, hidden metadata, author fields, comments, form data, signatures, and document history., Local file paths, remote URLs, page selections, extracted text, base64 image data, table content, and metadata may be visible to the MCP client and model provider., Remote PDF fetching reveals requested URLs and request metadata to upstream hosts., Configure allowed directories and hosts before using the server with private documents, customer files, contracts, invoices, medical records, or regulated material.
Author: Sylphx
Submitted by: oktofeesh1
Claim status: unclaimed
Last verified: 2026-06-06

Decision playbook

Review trust signals before you adopt

Signals are present but mixed. Use the checklist below to confirm the source and operational safety for your environment.

Source and provenance checks

Needs review. Confirm ownership and provenance before trusting install instructions.

Source link available (required): Open the canonical repository and verify ownership. Status: Done.
Source provenance status (required): Marked as source-backed. Status: Done.
Metadata reviewed: No reviewed flag detected in metadata. Status: Pending.

Safety and privacy checks

Complete. Validate risk disclosures before installation or API wiring.

Safety notes present (required): Review the listed safety guidance before running commands. Status: Done.
Privacy notes present (required): Review data handling notes before connecting accounts or secrets. Status: Done.
Trust level risk gate (required): Trust level does not block evaluation. Status: Done.

Package and install checks

Needs review. Check package metadata and artifact integrity signals.

Install payload available: Install or copy payload is available for review. Status: Done.
Package verification flag: No package verification flag provided. Status: Pending.
Checksum metadata: No checksum provided for downloaded artifact. Status: Pending.

Setup at a glance

CLI install. Copy-ready: paste the snippet to get started. Estimated setup: 10 minutes.

Install command: Provided
Config snippet: Provided
Copy snippet: Provided
Prerequisites: 3 to clear
Platforms: 4 listed
Install type: CLI install

Adoption plan

Balanced adoption plan. Current risk score: 24/100. Use staged verification before broader rollout.

Pre-adoption checks: validate source and review signals before any execution.

Confirm source provenance (required): Source URL/provenance metadata is present. Status: Done.
Confirm metadata review state: No review metadata found; increase manual validation. Status: Pending.
Verify install payload: Install/config payload exists and can be inspected. Status: Done.

Security checks: confirm safety, privacy, and package integrity signals.

Review safety notes (required): Safety notes are present. Status: Done.
Review privacy notes (required): Privacy notes are present. Status: Done.
Verify package integrity metadata: No package verification/checksum metadata. Status: Pending.

Rollout: adopt in controlled steps based on the selected plan.

Run in isolated sandbox first (required): Use a constrained sandbox and observe behavior across multiple tasks. Status: Pending.
Roll out gradually (required): Roll out to a small cohort before wider usage. Status: Pending.
Set monitoring and fallback: Define rollback path and monitor errors after adoption. Status: Pending.

Evidence readiness

Evidence readiness matrix (balanced). Missing required evidence: Metadata review. Risk score: 31.

Source provenance: Present. Source repository/provenance is listed. Required in this preset.
Metadata review: Missing. Review metadata is missing. Required in this preset.
Safety notes: Present. Safety notes are present. Required in this preset.
Privacy notes: Present. Privacy notes are present. Optional in this preset.
Package integrity: Missing. Package integrity metadata is missing. Optional in this preset.
Install payload: Present. Install payload is available. Required in this preset.

Required gaps: Metadata review

Decision timeline

Decision timeline (balanced). Blocking gaps: Check metadata review status. Risk score: 28.

Triage: Confirm source provenance (required). Source/provenance metadata is available. Status: Done.
Triage: Check metadata review status (required). Review metadata is missing. Status: Pending.
Verify: Review safety notes (required). Safety notes are available. Status: Done.
Verify: Review privacy notes. Privacy notes are available. Status: Done.
Verify: Validate package integrity metadata. Package integrity metadata is missing. Status: Pending.
Rollout: Verify install payload and commands (required). Install payload is available. Status: Done.

Blockers: Check metadata review status

Prerequisite readiness

3 prerequisites to line up before setup, including a review or approval gate. Ready: 0/3.

Install & runtime: 1 · Review & approval: 2 · Estimated setup: 10 minutes

Safety & privacy surface

5 safety and 4 privacy notes across 4 risk areas. Review closely: network access.

Safety · Network access: PDF Reader MCP can read local PDF paths and fetch remote PDF URLs unless runtime security settings restrict those sources.
Safety · Local files: The source supports directory allowlists, host allowlists, disabling URL sources, and SSRF checks for private IPs.
Safety · General: Large or malformed PDFs can still be slow, memory-intensive, partially parsed, or return extraction errors despite size and timeout controls.
Safety · General: Extracted text, tables, images, and metadata may be incomplete or out of order for complex scanned, encrypted, or layout-heavy PDFs.
Safety · Network access: If HTTP transport is enabled, bind and authenticate it carefully before exposing it beyond trusted local clients.
Privacy · Data retention: PDFs can contain confidential text, embedded images, hidden metadata, author fields, comments, form data, signatures, and document history.
Privacy · Network access: Local file paths, remote URLs, page selections, extracted text, base64 image data, table content, and metadata may be visible to the MCP client and model provider.
Privacy · Network access: Remote PDF fetching reveals requested URLs and request metadata to upstream hosts.
Privacy · Local files: Configure allowed directories and hosts before using the server with private documents, customer files, contracts, invoices, medical records, or regulated material.

Prerequisites

Node.js 22.13 or newer and npx available to the MCP client runtime.
Approved local PDF directories or approved remote PDF hosts when processing sensitive material.
Review of copyright, document handling, and data retention policy before extracting PDF contents.
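The Node.js floor can be checked before wiring the server into a client. The function below is a minimal sketch, not part of the package: `meetsMinimumNodeVersion` is a hypothetical helper that compares a `vMAJOR.MINOR.PATCH` string against the 22.13 minimum stated above.

```typescript
// Hypothetical helper (not part of @sylphx/pdf-reader-mcp): checks a Node.js
// version string such as "v22.13.1" against a minimum major.minor floor.
function meetsMinimumNodeVersion(
  version: string,
  minMajor: number = 22,
  minMinor: number = 13,
): boolean {
  // Accept an optional leading "v", then read the first two numeric fields.
  const match = /^v?(\d+)\.(\d+)/.exec(version.trim());
  if (match === null) {
    return false; // Unparseable versions are treated as not meeting the floor.
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  // A newer major always passes; the same major needs a high-enough minor.
  return major > minMajor || (major === minMajor && minor >= minMinor);
}
```

Inside a Node.js script, pass `process.version`; from a shell, `node --version` prints a string in the same format.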

Schema details

Install type: cli
Troubleshooting: No

Collection metadata

Estimated setup: 10 minutes
Difficulty: beginner

Full copyable content

{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["-y", "@sylphx/pdf-reader-mcp"]
    }
  }
}

About this resource

Content

PDF Reader MCP is a dedicated Model Context Protocol server for extracting content from PDFs. It exposes a read_pdf tool that can process multiple local files or remote URLs, select page ranges, return metadata and page counts, extract full text, include embedded images, and detect table-like structures.

The project is focused on PDF reading rather than broad document conversion. It is useful when Claude needs page-aware PDF evidence while keeping the extraction surface limited to PDF files.
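At the protocol level, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and its `arguments`. The sketch below only builds that message for `read_pdf`. The argument field names (`sources`, `path`, `url`, `pages`, `include_full_text`, `include_metadata`, `include_page_count`, `include_images`) and the "exactly one of path or url" rule are assumptions, not confirmed by this listing; check them against the input schema the server reports from `tools/list` before relying on them.

```typescript
// ASSUMED shape of one read_pdf source; verify against the server's tools/list schema.
interface PdfSource {
  path?: string;              // local file path (assumed field name)
  url?: string;               // remote PDF URL (assumed field name)
  pages?: string | number[];  // e.g. "1-3" or [1, 2, 3] (assumed)
}

// ASSUMED argument names for read_pdf.
interface ReadPdfArgs {
  sources: PdfSource[];
  include_full_text?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
  include_images?: boolean;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ReadPdfArgs };
}

// Builds a JSON-RPC tools/call message for the read_pdf tool.
function buildReadPdfRequest(id: number, args: ReadPdfArgs): ToolCallRequest {
  if (args.sources.length === 0) {
    throw new Error("read_pdf needs at least one source");
  }
  for (const source of args.sources) {
    // Assumed rule: each source names exactly one of a local path or a remote URL.
    if ((source.path === undefined) === (source.url === undefined)) {
      throw new Error("each source needs exactly one of path or url");
    }
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf", arguments: args },
  };
}

// Ask for metadata and page count only, before committing to full-text extraction.
const request = buildReadPdfRequest(1, {
  sources: [{ path: "/Users/you/Documents/approved-pdfs/report.pdf", pages: "1-3" }],
  include_metadata: true,
  include_page_count: true,
  include_full_text: false,
});
console.log(JSON.stringify(request));
```

In normal use the MCP client (Claude Code, Cursor, and so on) sends this message for you; building it by hand is mainly useful for testing the server directly.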

Source Review

These sources were reviewed on 2026-06-06. Prefer the live repository, README, npm registry metadata, package metadata, server entrypoint, read_pdf handler, PDF loader, security configuration source, and license file for current install commands, source restrictions, extraction behavior, and licensing.

Features

npm package @sylphx/pdf-reader-mcp.
Stdio MCP server launched with npx -y @sylphx/pdf-reader-mcp.
Single read_pdf tool for one or more sources.
Local path and remote URL sources.
Page-range extraction with numbers or range strings.
Optional full text, metadata, page count, image, and table extraction.
Batched source and page processing to limit memory pressure.
Directory allowlists, host allowlists, URL disable flag, and private-IP SSRF guard.
MIT license.
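Page ranges can be given as numbers or as range strings. This listing does not spell out the exact grammar the server accepts, so the parser below is a hypothetical sketch that assumes comma-separated entries such as `"1-3,5"`. It expands them into a sorted, de-duplicated list of 1-based page numbers and rejects pages beyond the document's page count. A client could run this kind of validation before sending a request.

```typescript
// Hypothetical parser for range strings like "1-3,5,10-12" (1-based pages).
// The server's own accepted syntax may differ; check the upstream README.
function parsePageRanges(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    if (part === "") continue; // tolerate stray commas such as "1,,2"
    const match = /^(\d+)(?:-(\d+))?$/.exec(part);
    if (match === null) {
      throw new Error(`invalid page range entry: "${part}"`);
    }
    const start = Number(match[1]);
    // An unmatched optional group is undefined, meaning a single page.
    const end = match[2] === undefined ? start : Number(match[2]);
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`page range "${part}" is outside 1-${pageCount}`);
    }
    for (let page = start; page <= end; page++) {
      pages.add(page);
    }
  }
  // Sort numerically; the default sort would compare values as strings.
  return Array.from(pages).sort((a, b) => a - b);
}
```

For example, `parsePageRanges("1-3,5", 10)` returns `[1, 2, 3, 5]`, and `parsePageRanges("12", 10)` throws because page 12 does not exist.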

Installation

Configure the stdio server in your MCP client:

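{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["-y", "@sylphx/pdf-reader-mcp"],
      "env": {
        "MCP_TRANSPORT": "stdio",
        "MCP_PDF_ALLOWED_DIRS": "/Users/you/Documents/approved-pdfs",
        "MCP_PDF_ALLOWED_HOSTS": "trusted.example.com",
        "MCP_PDF_ALLOW_HTTP": "false"
      }
    }
  }
}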

Replace the example directory and host with locations you trust before use. Keep MCP_PDF_ALLOW_HTTP set to false unless you have a specific, trusted plaintext HTTP source and understand the tampering risk. After restarting the MCP client, ask Claude to read only approved PDF paths or URLs and specify whether you need full text, metadata, page counts, images, tables, or particular page ranges.

Use Cases

Extract text from a specific page range in a PDF.
Read metadata and page count before deciding whether to process a document.
Pull embedded images from selected PDF pages.
Extract table-like content for review.
Batch several approved PDFs in one tool call.
Summarize a report, contract, manual, or invoice from extracted text.
Restrict the server to approved directories or remote hosts for document workflows.
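The server already batches sources and pages internally to limit memory pressure. A client with a long list of approved PDFs may still want to cap how many sources go into one tool call. The helper below is an illustrative sketch. `chunkSources` and the batch size are hypothetical and are not server settings.

```typescript
// Hypothetical client-side helper: split a list of approved PDF sources
// into fixed-size groups so each read_pdf call stays small.
function chunkSources<T>(items: readonly T[], batchSize: number): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    // slice() copies at most batchSize items; the last batch may be shorter.
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each resulting batch can then become the source list of a separate request, which keeps a failure in one document from hiding results for the whole set.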

Safety and Privacy

PDF Reader MCP can expose everything a PDF contains, including hidden metadata, embedded images, form fields, author information, comments, and text that was not obvious from a quick visual scan. Use directory and host allowlists for private workflows, and verify extraction quality before relying on important tables, page ranges, or scanned-document text.
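A directory allowlist comes down to one question: after normalization, does the requested file sit inside an approved directory? The sketch below shows that check for absolute POSIX paths. It illustrates the technique and is not the server's code. `isPathAllowed` is a hypothetical name, and the sketch does not resolve symlinks, which a real implementation should do (for example, with a realpath step) before comparing.

```typescript
// Hypothetical sketch: normalize an absolute POSIX path, collapsing "." and "..".
// Symlinks are NOT resolved here; a real check should resolve them first.
function normalizePosix(absPath: string): string {
  if (!absPath.startsWith("/")) {
    throw new Error(`expected an absolute path, got "${absPath}"`);
  }
  const parts: string[] = [];
  for (const segment of absPath.split("/")) {
    if (segment === "" || segment === ".") continue; // skip empty and "." segments
    if (segment === "..") {
      parts.pop(); // ".." climbs one level; at the root it stays at "/"
    } else {
      parts.push(segment);
    }
  }
  return "/" + parts.join("/");
}

// Hypothetical sketch: is the requested path inside one of the allowed directories?
function isPathAllowed(requested: string, allowedDirs: readonly string[]): boolean {
  const target = normalizePosix(requested);
  return allowedDirs.some((dir) => {
    const root = normalizePosix(dir);
    // Compare on a trailing "/" so "/data/pdfs-old" does not match "/data/pdfs".
    return target === root || target.startsWith(root === "/" ? "/" : root + "/");
  });
}
```

The trailing-slash comparison matters: a plain prefix test would let a sibling directory such as `approved-pdfs-old` slip past an allowlist entry for `approved-pdfs`.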

Remote URL fetching is enabled by default in the reviewed source, with configuration available to disable it or restrict hosts. Treat remote PDF URLs, local paths, extracted text, image data, tables, and metadata as sensitive unless the document is approved for the model session.
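The documented controls for remote sources include a host allowlist, the `MCP_PDF_ALLOW_HTTP` switch, and a private-IP SSRF guard. The sketch below approximates those three checks from a URL's scheme and hostname. `isRemoteSourceAllowed` and `RemotePolicy` are hypothetical, not the server's implementation. The sketch only recognizes IPv4 literals and `localhost`, so it cannot catch a public-looking hostname that resolves to a private address. A DNS-aware check is still needed in practice.

```typescript
// Hypothetical sketch: is the host an IPv4 literal in a private/loopback range?
// Handles IPv4 literals only; hostnames that RESOLVE to private IPs are not caught.
function isPrivateIPv4(host: string): boolean {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return false; // not an IPv4 literal
  const [a, b] = [Number(m[1]), Number(m[2])];
  return (
    a === 10 ||                          // 10.0.0.0/8
    a === 127 ||                         // loopback 127.0.0.0/8
    a === 0 ||                           // 0.0.0.0/8
    (a === 169 && b === 254) ||          // link-local 169.254.0.0/16
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)             // 192.168.0.0/16
  );
}

interface RemotePolicy {
  allowedHosts: readonly string[]; // e.g. ["trusted.example.com"]
  allowHttp: boolean;              // assumed to match the intent of MCP_PDF_ALLOW_HTTP
}

// Hypothetical sketch combining scheme, SSRF, and host-allowlist checks.
function isRemoteSourceAllowed(
  target: { protocol: string; hostname: string },
  policy: RemotePolicy,
): boolean {
  const host = target.hostname.toLowerCase();
  // Only HTTPS, unless plaintext HTTP was explicitly allowed.
  if (target.protocol !== "https:" && !(policy.allowHttp && target.protocol === "http:")) {
    return false;
  }
  // SSRF guard for obvious private/loopback targets.
  if (isPrivateIPv4(host) || host === "localhost") {
    return false;
  }
  return policy.allowedHosts.some((allowed) => allowed.toLowerCase() === host);
}
```

A parsed `URL` object has `protocol` (for example `"https:"`) and `hostname` properties, so `isRemoteSourceAllowed(new URL(href), policy)` type-checks as written.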

Duplicate Check

Existing content includes MarkItDown, Kreuzberg, Markdownify, and research servers that can process PDFs among many other formats. This entry is distinct because it covers SylphxAI/pdf-reader-mcp, a dedicated PDF-only MCP server with page ranges, images, tables, metadata, local/URL sources, and configurable file/URL restrictions. No matching source URL or dedicated PDF Reader MCP entry was found in content/mcp.

#pdf #document-processing #extraction #tables #metadata

Add this badge to your README

Show that PDF Reader MCP is listed on HeyClaude. Paste this Markdown into your README — it renders the badge and links back to this page.

[![Listed on HeyClaude](https://heyclau.de/badge/mcp/pdf-reader-mcp.svg)](https://heyclau.de/entry/mcp/pdf-reader-mcp)

How it compares

PDF Reader MCP side by side with 3 alternatives on trust, install, platform support, and disclosed safety notes, all drawn from registry metadata.

Field	PDF Reader MCP	AgentQL MCP Server	designlang MCP Server	MeiGen AI Design MCP Server
Description	PDF-focused MCP server that lets Claude read one or more local or remote PDFs, extract full text, page ranges, metadata, page counts, embedded images, and table-like structures.	AgentQL MCP server for extracting structured JSON from public webpages using a URL and natural-language extraction prompt.	MCP server for extracting design systems from live websites, including design tokens, regions, components, contrast data, Tailwind themes, Figma variables, and prompt packs.	MCP server and CLI for AI image and video generation with gallery search, prompt enhancement, model listing, local preferences, ComfyUI workflows, MeiGen Cloud, and OpenAI-compatible provider support.
Trust
Review status	Not reviewed	Not reviewed	Not reviewed	Not reviewed
Package trust	Package not verified	Package not verified	Package not verified	Package not verified
Source provenance	Source-backed	Source-backed	Source-backed	Source-backed
Submitter	oktofeesh1	oktofeesh1	oktofeesh1	oktofeesh1
Install risk	Review first	Review first	Review first	Review first
Notes	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓
Brand	—	AgentQL	—	MeiGen AI Design MCP
Category	mcp	mcp	mcp	mcp
Source	Source-backed	Source-backed	Source-backed	Source-backed
Author	Sylphx	AgentQL	Manavarya09	MeiGen
Added	2026-06-06	2026-06-06	2026-06-05	2026-06-06
Platforms	Claude Code Codex Cursor Claude Desktop	Claude Code Claude Desktop	Claude Code Claude Desktop	Claude Code Claude Desktop
Harness	Claude Code Codex Cursor Claude Desktop	Claude Code Claude Desktop	Claude Code Claude Desktop	Claude Code Claude Desktop
Source repo	—	—	—	—
Safety notes	✓PDF Reader MCP can read local PDF paths and fetch remote PDF URLs unless runtime security settings restrict those sources. The source supports directory allowlists, host allowlists, disabling URL sources, and SSRF checks for private IPs. Large or malformed PDFs can still be slow, memory-intensive, partially parsed, or return extraction errors despite size and timeout controls. Extracted text, tables, images, and metadata may be incomplete or out of order for complex scanned, encrypted, or layout-heavy PDFs. If HTTP transport is enabled, bind and authenticate it carefully before exposing it beyond trusted local clients.	✓AgentQL MCP exposes one tool, `extract-web-data`, that sends a target URL and natural-language extraction prompt to the AgentQL API. The tool is intended for public webpages; do not use it to bypass access controls, scrape private pages, evade paywalls, or extract data where automated collection is prohibited. Web extraction can still trigger target-site rate limits, legal restrictions, robots guidance, or terms-of-service concerns. The source implementation uses AgentQL's query-data endpoint with fast mode, no screenshot capture, no scroll-to-bottom behavior, and no local browser cookies. Treat extracted output as untrusted web data that may include errors, stale content, ads, tracking text, or prompt-injection attempts.	✓designlang uses Playwright to crawl live pages and can capture DOM-derived styles, responsive behavior, interaction states, screenshots, and accessibility findings. Authenticated extraction options can use cookies, cookie files, headers, and custom user agents, so never pass production session cookies or credentials unless approved. Generated outputs may include design tokens, Tailwind config, shadcn variables, Figma variables, component anatomy, prompts, screenshots, reports, and cloned starter code. Commands such as apply, clone, sync, drift, visual-diff, and MCP extraction can read live websites and write files in the configured output or project directory. Review extracted prompt packs and generated code before using them in another agent workflow or committing them.	✓MeiGen can submit image and video generation jobs to external providers or a local ComfyUI backend. Image and video generation may spend credits, use paid APIs, or consume local GPU resources depending on the configured provider. Local reference images can be compressed and uploaded through the configured upload gateway before being sent to API providers. Video generation can take minutes, may time out at the MCP client, and should not be retried blindly because jobs or credits may already be in progress. ComfyUI workflow import and modification can run local workflows; review workflow JSON and custom nodes before use.
Privacy notes	✓PDFs can contain confidential text, embedded images, hidden metadata, author fields, comments, form data, signatures, and document history. Local file paths, remote URLs, page selections, extracted text, base64 image data, table content, and metadata may be visible to the MCP client and model provider. Remote PDF fetching reveals requested URLs and request metadata to upstream hosts. Configure allowed directories and hosts before using the server with private documents, customer files, contracts, invoices, medical records, or regulated material.	✓Target URLs, extraction prompts, API key-authenticated requests, and extracted structured data are sent to AgentQL's API. Extracted data can include personal data, copyrighted content, customer information, job postings, prices, social content, or other third-party material. AGENTQL_API_KEY should stay out of prompts, issues, logs, screenshots, and committed configuration files. Claude transcripts and downstream reports may retain extracted data, so avoid collecting information that is not approved for the model session.	✓URLs, page content, DOM text, CSS, screenshots, fonts, images, design tokens, cookies, headers, prompts, tool arguments, reports, and generated files may be visible to the MCP client and model provider. Authenticated or internal sites can expose product plans, unreleased UI, customer data, analytics identifiers, private brand assets, and implementation details. Output directories can retain extracted website data after the MCP session ends. Avoid running the server against private, paid, internal, or authenticated properties without legal and security approval.	✓API tokens, OpenAI-compatible credentials, provider base URLs, upload gateway URLs, ComfyUI URLs, preferences, recent generations, prompts, model IDs, and output paths can reveal private creative workflows. Uploaded reference images may become public URLs through the configured upload gateway before generation. Prompts, reference images, generated image URLs, generated video URLs, saved outputs, and favorite prompts can contain client work, brand plans, product images, personal likenesses, or unreleased campaigns. Keep secrets in MCP client environment configuration, review generated media before sharing, and remove local outputs when retention is no longer needed.
Prerequisites	Node.js 22.13 or newer and npx available to the MCP client runtime. Approved local PDF directories or approved remote PDF hosts when processing sensitive material. Review of copyright, document handling, and data retention policy before extracting PDF contents.	Node.js and npx available to the MCP client runtime. AgentQL API key from the AgentQL developer portal. Approved list of public webpages or domains Claude may query. Review of target site terms, robots guidance, rate limits, and data-use rules before extraction.	Node.js 20 or newer available to the MCP client runtime. Network access to the websites you plan to extract. Permission to crawl and analyze the target sites. Playwright or Chromium installation behavior reviewed for the local environment.	Node.js 18 or newer. MCP client that can launch a local Node stdio server. Optional MeiGen API token, OpenAI-compatible API key, or local ComfyUI workflow configuration for generation tools. Provider pricing, credits, model limits, and content policy reviewed before image or video generation.
Install	`npx -y @sylphx/pdf-reader-mcp`	`npx -y agentql-mcp`	`npx -y designlang mcp`	Run `npx -y meigen` from an MCP client, or use the upstream plugin/init commands after reviewing provider credentials and generated-media costs.
Config	`{ "mcpServers": { "pdf-reader": { "command": "npx", "args": ["-y", "@sylphx/pdf-reader-mcp"], "env": { "MCP_TRANSPORT": "stdio", "MCP_PDF_ALLOWED_DIRS": "/Users/you/Documents/approved-pdfs", "MCP_PDF_ALLOWED_HOSTS": "trusted.example.com", "MCP_PDF_ALLOW_HTTP": "false" } } } }`	`{ "mcpServers": { "agentql": { "command": "npx", "args": ["-y", "agentql-mcp"], "env": { "AGENTQL_API_KEY": "<your-agentql-api-key>" } } } }`	`{ "mcpServers": { "designlang": { "command": "npx", "args": ["-y", "designlang", "mcp", "--output-dir", "./design-extract-output"] } } }`	`{ "mcpServers": { "meigen": { "command": "npx", "args": ["-y", "meigen"], "env": { "MEIGEN_API_TOKEN": "{meigen-api-token}" } } } }`
Citations	Source repository: github.com (2026-07-21T07:17:01+00:00); Documentation: github.com; Submitted by oktofeesh1, 2026-06-06	Source repository: github.com (2026-07-21T07:17:01+00:00); Documentation: github.com; Submitted by oktofeesh1, 2026-06-06	Source repository: github.com (2026-07-21T07:17:01+00:00); Documentation: github.com; Submitted by oktofeesh1, 2026-06-05	Source repository: github.com (2026-07-21T07:17:01+00:00); Documentation: raw.githubusercontent.com; Submitted by oktofeesh1, 2026-06-06
Claim	Unclaimed	Unclaimed	Unclaimed	Unclaimed

Related resources

AgentQL MCP Server

designlang MCP Server

MeiGen AI Design MCP Server

Omnisearch MCP Server
