RAGFlow
Open-source RAG and agentic retrieval platform with DeepDoc document understanding, visual chunking, grounded citations, heterogeneous data-source ingestion, agent workflows, MCP support, code executor support, and Docker self-hosting.
Open the source and read safety notes before installing.
Safety notes
- RAGFlow is a multi-service RAG platform, not a small CLI. Review Docker services, exposed ports, persistent volumes, model-provider keys, parser settings, and update strategy before production use.
- The README notes x86 Docker image availability and separate guidance for ARM64 builds; verify architecture before deploying on ARM hosts.
- Deep document parsing, OCR, chunking, embeddings, reranking, agent workflows, MCP, and code executor features can process sensitive files and produce misleading outputs if retrieval quality is not tested.
- The code executor feature requires sandbox review. Use gVisor or another isolation plan before running generated or user-provided code.
- MCP support should be configured with localhost binding, API-key hygiene, dataset-level scoping, and read-only retrieval defaults unless a broader tool surface has been reviewed.
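The MCP guidance above can be checked mechanically before a server is started. This is a minimal Python sketch. It assumes you collect the MCP-related settings of your deployment into a plain dictionary. The keys `host`, `api_key`, `allowed_dataset_ids`, and `read_only` are hypothetical names for this illustration, not RAGFlow configuration keys, and the helper names are hypothetical too.

```python
import ipaddress


def is_loopback(host: str) -> bool:
    """True for 'localhost' or a loopback IP literal such as 127.0.0.1 or ::1."""
    if host.strip().lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host.strip()).is_loopback
    except ValueError:
        # Other hostnames are not resolved here, so they are treated as exposed.
        return False


def review_mcp_settings(settings: dict) -> list[str]:
    """Return human-readable problems; an empty list means every check passed.

    The dictionary keys are hypothetical; map them from your own configuration.
    """
    problems = []
    host = str(settings.get("host", ""))
    if not is_loopback(host):
        problems.append(f"host {host!r} is not a loopback address")
    if not settings.get("api_key"):
        problems.append("api_key is empty")
    if not settings.get("allowed_dataset_ids"):
        problems.append("no dataset allowlist; scope access to specific dataset IDs")
    if not settings.get("read_only", False):
        problems.append("read_only is off; review the broader tool surface first")
    return problems
```

For example, a `0.0.0.0` host with an empty key, an empty allowlist, and `read_only` off reports all four problems. A loopback host with a key, one allowed dataset, and `read_only` set returns an empty list.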
Privacy notes
- Uploaded documents, parsed chunks, OCR text, embeddings, dataset metadata, chat history, citations, agent workflow state, code executor inputs, MCP payloads, logs, and model responses may contain private or regulated data.
- Model providers, embedding providers, rerankers, synchronized data sources, object storage, databases, and MCP clients may receive data depending on deployment settings.
- Keep RAGFlow API keys, provider keys, service configuration, dataset IDs, document IDs, logs, backups, and generated citations out of prompts, public issues, screenshots, and committed examples.
- Define retention, deletion, access review, and export rules before ingesting customer, financial, legal, healthcare, source-code, or credential-bearing documents.
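A scripted redaction pass can catch the most common secret shapes before a log excerpt or configuration snippet goes into an issue, prompt, or screenshot. This is a minimal sketch. It assumes secrets appear in one of three forms: bearer tokens, `key=value` pairs with secret-sounding names, or OpenAI-style `sk-` keys. The patterns are illustrative and will miss other formats. They also do not remove dataset or document IDs, so a manual review is still needed.

```python
import re

# Illustrative patterns only; extend them for the key formats your providers use.
_PATTERNS = [
    # "Authorization: Bearer <token>" headers
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # key=value or key: value pairs whose key name suggests a secret
    (
        re.compile(r"(?i)\b(api[_-]?key|token|secret|password)(\s*[=:]\s*)(\"?)[^\s\"',]+"),
        r"\1\2\3[REDACTED]",
    ),
    # OpenAI-style "sk-..." keys
    (re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"), "[REDACTED]"),
]


def redact(text: str) -> str:
    """Apply each pattern in order and return the masked text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```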
Prerequisites
- CPU with at least 4 cores, 16 GB RAM, 50 GB disk, Docker 24.0.0 or newer, and Docker Compose v2.26.1 or newer for the documented self-hosted path.
- Python 3.13 for source/development workflows.
- gVisor if the code executor sandbox feature will be used.
- Configured model-provider and embedding-provider keys in the documented service configuration.
- Dataset access policy for documents, images, scanned files, spreadsheets, web pages, structured data, and synchronized external sources before ingestion.
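The hardware and version minimums above can be checked before running `docker compose`. This is a minimal Python sketch with these assumptions:

- The host is Linux or another POSIX system that exposes `SC_PHYS_PAGES`.
- The `docker` CLI is on `PATH`.
- Sizes are measured in decimal gigabytes.
- Disk is measured as free space on one filesystem.

These may not match exactly how the README measures its requirements. The gVisor check only matters if you plan to enable the code executor. The helper names are hypothetical.

```python
import json
import os
import re
import shutil
import subprocess

# Minimums taken from the Prerequisites list above.
MIN_CORES = 4
MIN_RAM_GB = 16
MIN_DISK_GB = 50
MIN_DOCKER = (24, 0, 0)
MIN_COMPOSE = (2, 26, 1)
GB = 1000 ** 3  # decimal gigabytes


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number, e.g. 'v2.26.1-desktop.1' -> (2, 26, 1)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare versions after zero-padding, so (24,) equals (24, 0, 0)."""
    width = max(len(found), len(minimum))
    found = found + (0,) * (width - len(found))
    minimum = minimum + (0,) * (width - len(minimum))
    return found >= minimum


def has_runtime(runtimes_json: str, name: str = "runsc") -> bool:
    """Check `docker info --format '{{json .Runtimes}}'` output for a runtime.

    runsc is the runtime name gVisor registers with Docker.
    """
    return name in json.loads(runtimes_json)


def _run(args: list[str]) -> str:
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout.strip()


def check_host(data_dir: str = "/") -> dict[str, bool]:
    """Compare this host with the documented minimums; needs the docker CLI.

    Point data_dir at the filesystem that will hold Docker's data and volumes.
    """
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return {
        "cpu_cores": (os.cpu_count() or 0) >= MIN_CORES,
        "ram": ram_bytes / GB >= MIN_RAM_GB,
        "disk_free": shutil.disk_usage(data_dir).free / GB >= MIN_DISK_GB,
        "docker": meets_minimum(
            parse_version(_run(["docker", "version", "--format", "{{.Server.Version}}"])),
            MIN_DOCKER,
        ),
        "compose": meets_minimum(
            parse_version(_run(["docker", "compose", "version", "--short"])), MIN_COMPOSE
        ),
        "gvisor": has_runtime(_run(["docker", "info", "--format", "{{json .Runtimes}}"])),
    }
```

Calling `check_host()` returns a dictionary of pass/fail flags. `parse_version` and `meets_minimum` also work on their own. For example, `meets_minimum(parse_version("v2.26.0"), MIN_COMPOSE)` is `False`.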
Schema details
- Install type: cli
- Troubleshooting: No
- Scope: Source repo
- Estimated setup: 60 minutes
- Difficulty: advanced
- Website: https://ragflow.io/
- Pricing: freemium
- Disclosure: editorial
- Application category: DeveloperApplication
- Operating system: Web
Full copyable content
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
docker compose -f docker-compose.yml up -d
About this resource
Overview
RAGFlow is an open-source Retrieval-Augmented Generation engine and agentic context layer for LLM applications. It combines DeepDoc document understanding, visual and template-based chunking, grounded citations, model and embedding configuration, heterogeneous data-source ingestion, APIs, agent workflow features, MCP support, and Docker self-hosting.
This tools entry is distinct from the existing RAGFlow MCP server entry. The MCP entry focuses on connecting Claude and other MCP clients to a running RAGFlow deployment for retrieval. This entry covers the main RAGFlow platform that creates and operates the datasets, parsing pipeline, retrieval stack, agents, and deployment environment that the MCP server depends on.
Install
RAGFlow's documented self-host path uses Docker Compose from the repository's
docker directory:
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
docker compose -f docker-compose.yml up -d
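`docker compose up -d` returns before the services have finished starting. Scripted installs can poll an HTTP endpoint until it answers. This is a minimal sketch of that approach, and the helper name is hypothetical. The URL you pass is an assumption: use whatever host and port your Compose file publishes for the web UI. This does not replace the status checks the README documents.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout_s: float = 300.0, interval_s: float = 5.0) -> bool:
    """Poll url until it answers below HTTP 500 or timeout_s elapses."""
    # An empty ProxyHandler bypasses proxy settings, since the target is usually local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval_s) as response:
                if response.status < 500:
                    return True
        except urllib.error.HTTPError as err:
            # A 4xx response (for example 401 or 404) still proves the server is listening.
            if err.code < 500:
                return True
        except OSError:
            pass  # connection refused, reset, or timed out: not ready yet
        time.sleep(interval_s)
    return False
```

For example, call `wait_for_http("http://localhost:80")` after `docker compose up -d`. This assumes the web UI is published on port 80 of the same host.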
The README also documents host requirements, vm.max_map_count, x86 Docker
image constraints, GPU mode for DeepDoc tasks, and model-provider configuration
in the service configuration template.
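The `vm.max_map_count` requirement can be checked on a Linux host before starting the stack. This is a minimal sketch with hypothetical helper names. The 262144 threshold is an assumption: it is the value we understand the upstream README to require, and it is also Elasticsearch's documented minimum. Confirm it for the release you deploy.

```python
from pathlib import Path

# Assumed threshold: confirm against the README for the release you deploy.
REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the current kernel value (Linux only)."""
    return int(Path(path).read_text().strip())


def max_map_count_advice(current: int, required: int = REQUIRED_MAX_MAP_COUNT) -> str:
    """Explain whether the value is high enough and how to raise it if not."""
    if current >= required:
        return f"ok: vm.max_map_count={current}"
    return (
        f"too low: vm.max_map_count={current} (need >= {required}); "
        f"run 'sudo sysctl -w vm.max_map_count={required}' for the current boot "
        "and add the same setting to /etc/sysctl.conf to keep it after reboot"
    )
```

On the target host, `print(max_map_count_advice(read_max_map_count()))` reports the current state.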
Core Capabilities
| Area | RAGFlow Coverage |
|---|---|
| Document understanding | DeepDoc-powered extraction for complicated unstructured formats |
| Chunking | Template-based chunking, visual chunking, and human intervention over chunks |
| Grounding | Traceable citations and reference views for grounded answers |
| Data sources | Word, slides, spreadsheets, text, images, scanned documents, structured data, web pages, and synchronized external sources |
| Retrieval | Multiple recall, fused reranking, embedding model configuration, and RAG APIs |
| Agents | Agentic workflow, agent capabilities, memory support, and code executor component according to README updates |
| MCP | Built-in MCP support and a separate MCP server path for retrieval from selected datasets |
| Deployment | Cloud service, Docker self-hosting, source development, Docker image builds, x86 images, and ARM64 build guidance |
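The "multiple recall, fused reranking" row describes merging candidates from more than one retrieval path. This is a minimal sketch of one common fusion step: a weighted sum of a keyword score and a vector score, followed by a threshold and a top-k cut. It assumes both scores are already normalized to [0, 1]. This is a generic illustration, not RAGFlow's internal implementation, and the defaults for `vector_weight`, `threshold`, and `top_k` are arbitrary. A reranking model, if configured, would reorder the fused list afterwards.

```python
def fuse(
    keyword_hits: dict[str, float],
    vector_hits: dict[str, float],
    vector_weight: float = 0.3,
    threshold: float = 0.2,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Weighted-sum fusion of two recall paths, then a score threshold and top-k cut.

    Both inputs map chunk IDs to scores already normalized to [0, 1].
    """
    fused = []
    for chunk_id in keyword_hits.keys() | vector_hits.keys():
        # A chunk recalled by only one path scores 0 on the other path.
        score = vector_weight * vector_hits.get(chunk_id, 0.0) + (
            1 - vector_weight
        ) * keyword_hits.get(chunk_id, 0.0)
        if score >= threshold:
            fused.append((chunk_id, score))
    # Highest score first; chunk ID breaks ties so the order is deterministic.
    fused.sort(key=lambda item: (-item[1], item[0]))
    return fused[:top_k]
```

Take `keyword_hits={"a": 0.9, "b": 0.4}` and `vector_hits={"b": 0.8, "c": 0.7}`. The default weight ranks `a` (about 0.63) above `b` (about 0.52) and `c` (about 0.21). Raising `vector_weight` to 0.7 shifts the order toward semantic matches: `b`, then `c`, then `a`.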
MCP and OpenClaw Fit
RAGFlow is relevant to MCP searches in two ways. First, the main platform can be used as the knowledge and retrieval backend behind the existing RAGFlow MCP server entry. Second, the README records MCP and agentic workflow support in the main platform roadmap/updates.
The README also links a RAGFlow Skill on OpenClaw for accessing RAGFlow datasets. That makes RAGFlow a useful bridge across the MCP, agent, skill, and RAG clusters. The operational boundary is still the RAGFlow dataset: agents should only retrieve from dataset IDs and document IDs they are meant to see.
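The dataset boundary described above can be enforced in whatever code assembles retrieval requests for an agent. This is a minimal sketch that fails closed when an agent asks for a dataset or document outside its allowlist. The field names `question`, `dataset_ids`, and `document_ids` are an assumption modeled on RAGFlow's HTTP retrieval API; check them against the API reference for your version. The function name is hypothetical, and it only builds the request body rather than sending it.

```python
from typing import Optional


def scoped_retrieval_body(
    question: str,
    dataset_ids: list[str],
    allowed_dataset_ids: set[str],
    document_ids: Optional[list[str]] = None,
    allowed_document_ids: Optional[set[str]] = None,
) -> dict:
    """Build a retrieval request body, refusing any ID outside the agent's allowlist.

    Field names are assumed; check them against the RAGFlow API reference.
    """
    if not dataset_ids:
        raise PermissionError("at least one dataset ID is required")
    denied = sorted(set(dataset_ids) - allowed_dataset_ids)
    if denied:
        raise PermissionError(f"dataset IDs outside the allowlist: {denied}")
    body = {"question": question, "dataset_ids": list(dataset_ids)}
    if document_ids:
        # Without a document allowlist, document IDs pass through with no local check.
        if allowed_document_ids is not None:
            denied_docs = sorted(set(document_ids) - allowed_document_ids)
            if denied_docs:
                raise PermissionError(f"document IDs outside the allowlist: {denied_docs}")
        body["document_ids"] = list(document_ids)
    return body
```

Failing closed rather than silently dropping unknown IDs makes a misconfigured agent visible instead of letting it return answers from a narrower or wrong scope.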
Use Cases
- Build a self-hosted RAG system over PDFs, spreadsheets, scanned files, web pages, and structured documents.
- Inspect and correct chunking before exposing retrieval to agents or users.
- Provide grounded citations and reference views for answer review.
- Connect Claude or other MCP clients to selected RAGFlow datasets through the existing RAGFlow MCP server.
- Evaluate DeepDoc parsing, OCR, reranking, and multi-recall retrieval quality.
- Build an internal context layer for agentic workflows, code execution, and enterprise LLM applications.
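Evaluating retrieval quality usually starts with a small hand-labeled query set. This is a minimal sketch of two simple metrics, recall@k and hit rate@k, with hypothetical function names. For each test question, it assumes you have already exported two things: the ranked chunk IDs your RAGFlow dataset returned, and the chunk IDs a reviewer marked as relevant. How you export them is left open.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top k results."""
    if not relevant:
        raise ValueError("each labeled query needs at least one relevant chunk ID")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(runs: dict[str, list[str]], labels: dict[str, set[str]], k: int = 5) -> dict[str, float]:
    """Mean recall@k and hit rate@k (share of queries with any relevant hit)."""
    if not labels:
        raise ValueError("no labeled queries")
    recalls = [recall_at_k(runs.get(query, []), relevant, k) for query, relevant in labels.items()]
    hits = [1.0 if value > 0 else 0.0 for value in recalls]
    return {
        f"recall@{k}": sum(recalls) / len(recalls),
        f"hit_rate@{k}": sum(hits) / len(hits),
    }
```

Run the same labeled set after each change to parser settings, chunking templates, embedding models, or reranking. That way regressions show up before agents or users see them.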
Source Review
Verified on 2026-06-18:
- GitHub reports infiniflow/ragflow as an Apache-2.0 repository with active development, 83,000+ stars, and latest release v0.26.1.
- The README describes RAGFlow as an open-source RAG engine that fuses RAG with agent capabilities to create a context layer for LLMs.
- The README documents DeepDoc, template-based chunking, grounded citations, heterogeneous data-source support, automated RAG workflows, configurable LLMs and embedding models, multiple recall, fused reranking, APIs, and Docker self-hosting.
- The README's recent update list includes agentic workflow and MCP support, Python/JavaScript code executor support, memory for AI agents, multiple chat channels, and an official RAGFlow Skill on OpenClaw.
- The self-hosting section documents CPU/RAM/disk/Docker/Python prerequisites, gVisor for code executor sandboxing, vm.max_map_count, x86 image caveats, Docker Compose startup, status checks, and model API key setup.
- The existing HeyClaude content already has a RAGFlow MCP server entry; no dedicated tools entry for the main RAGFlow platform was found.
Safety and Privacy
RAGFlow's output quality depends on ingestion quality, parser configuration, chunking, embedding choice, reranking, dataset permissions, and model-provider behavior. Treat every dataset as sensitive by default until access scope, retention, and retrieval evaluation are documented.
The platform can parse files, store embeddings, synchronize external sources, run agent workflows, expose MCP retrieval, and execute code when enabled. Keep provider keys and API keys scoped, isolate code execution, bind local services carefully, and test retrieval before allowing agents to answer from private datasets.
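One way to review the "bind local services carefully" point is to list every published port and flag the ones that publish on anything other than a loopback address. This is a minimal sketch with hypothetical helper names. It assumes you copy the short-syntax `ports:` entries out of the Compose files; running `docker compose config` first prints the configuration with variables resolved. It does not parse YAML, long-syntax entries, or firewall rules.

```python
import ipaddress


def _is_loopback(host: str) -> bool:
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Empty strings, hostnames, and unresolved ${VARIABLES} count as exposed.
        return False


def published_host_ip(port_spec: str) -> str:
    """Host IP of a short-syntax Compose port mapping; '' means all interfaces."""
    spec = port_spec.split("/")[0]  # drop a protocol suffix such as "/udp"
    if spec.startswith("["):  # bracketed IPv6 literal, e.g. "[::1]:8080:80"
        return spec[1:spec.index("]")]
    parts = spec.split(":")
    # "IP:HOST:CONTAINER" names an address; "HOST:CONTAINER" and "CONTAINER" do not.
    return parts[0] if len(parts) == 3 else ""


def exposed_ports(port_specs: list[str]) -> list[str]:
    """Return the mappings that publish on a non-loopback address."""
    return [spec for spec in port_specs if not _is_loopback(published_host_ip(spec))]
```

For example, `exposed_ports(["127.0.0.1:9380:9380", "80:80"])` flags only `"80:80"`, which Docker publishes on every host interface.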
Duplicate Check
Checked current content/tools/, content/mcp/, content/agents/,
content/skills/, guides, collections, open pull requests, and repository-wide
content for infiniflow/ragflow, RAGFlow, RAGFlow RAG engine, RAGFlow MCP,
RAGFlow agents, DeepDoc, agentic retrieval, context engine, self-hosted RAG
platform, and RAGFlow OpenClaw skill. Existing content includes
content/mcp/ragflow-mcp-server.mdx, which covers the built-in MCP retrieval
server for a running RAGFlow deployment. No dedicated RAGFlow tools entry for
the main platform, no exact source-URL duplicate in tools, no existing target
file, and no open duplicate PR were found.
How it compares
RAGFlow side by side with 3 alternatives on trust, install, platform support, and disclosed safety notes, all from reviewed registry metadata.
| Field | RAGFlow: Open-source RAG and agentic retrieval platform with DeepDoc document understanding, visual chunking, grounded citations, heterogeneous data-source ingestion, agent workflows, MCP support, code executor support, and Docker self-hosting. | AnythingLLM: Local-first AI application for private chat, document RAG, workspace agents, MCP-compatible tools, model routing, memories, scheduled tasks, multimodal workflows, multi-user Docker deployments, and self-hosted agent automation. | Open WebUI: Self-hosted AI platform and web UI for Ollama, OpenAI-compatible APIs, RAG, Python function tools, model builder workflows, artifacts, web search, vector databases, enterprise auth, observability, plugins, and MCP-adjacent OpenAPI integrations. | Dify: Production-ready LLM app and agentic workflow platform with visual workflows, RAG pipelines, agent capabilities, model management, observability, prompt IDE, APIs, Dify Cloud, and self-hosted Docker Compose deployment. |
|---|---|---|---|---|
| Install risk | Review first | Review first | Review first | Review first |
| Notes | Safety ✓ Privacy ✓ | Safety ✓ Privacy ✓ | Safety ✓ Privacy ✓ | Safety ✓ Privacy ✓ |
| Category | tools | tools | tools | tools |
| Source | source-backed | source-backed | source-backed | source-backed |
| Author | InfinityFlow | Mintplex Labs | Open WebUI | LangGenius |
| Added | 2026-06-18 | 2026-06-18 | 2026-06-18 | 2026-06-18 |
| Platforms | CLI | CLI | CLI | CLI |
| Source repo | — | — | — | — |
| Safety notes | RAGFlow is a multi-service RAG platform, not a small CLI. Review Docker services, exposed ports, persistent volumes, model-provider keys, parser settings, and update strategy before production use. The README notes x86 Docker image availability and separate guidance for ARM64 builds; verify architecture before deploying on ARM hosts. Deep document parsing, OCR, chunking, embeddings, reranking, agent workflows, MCP, and code executor features can process sensitive files and produce misleading outputs if retrieval quality is not tested. The code executor feature requires sandbox review. Use gVisor or another isolation plan before running generated or user-provided code. MCP support should be configured with localhost binding, API-key hygiene, dataset-level scoping, and read-only retrieval defaults unless a broader tool surface has been reviewed. | AnythingLLM can run agents, scheduled tasks, MCP-compatible tools, browser-like workspace actions, developer APIs, and external model calls; scope tools and credentials before enabling them for users. The upstream Docker guide includes examples that add the SYS_ADMIN capability to the container. Review whether that capability is acceptable for the host before copying production run commands. Multi-user Docker deployments need normal production controls: authentication, TLS, network isolation, secret management, persistent-volume ownership, backups, and upgrade planning. Agent tools, custom agents, model routing, memories, and scheduled tasks can change behavior over time; use least privilege, logging, review gates, and rollback plans for write-capable workflows. Localhost services such as Ollama, Chroma, LocalAI, or LM Studio may need Docker host routing adjustments; avoid exposing local provider ports wider than intended. | Open WebUI can run Python function-calling tools, RAG ingestion, web search, web browsing, image generation, plugins, and model/provider integrations; review each capability before enabling it for untrusted users. Docker examples expose web ports and persistent volumes. Mount persistent data, set admin/auth controls, and avoid treating demo defaults as production hardening. Python function tools and plugin pipelines can execute application logic and access configured services. Restrict tool creation and plugin installation to trusted administrators. RAG and web browsing can ingest local documents, URLs, cloud files, and extracted text; test indexing quality and permissions before exposing private corpora to users. Open WebUI uses a custom Open WebUI License with branding restrictions and enterprise-license exceptions. Verify license terms before redistribution, white-labeling, or commercial deployment. | Dify can orchestrate workflows, RAG pipelines, agents, tools, APIs, model providers, and production application endpoints; review tool permissions and user-triggered actions before exposing apps. Self-hosted deployments need normal production controls: authentication, TLS, network isolation, secret management, backups, database maintenance, object storage policy, and upgrade planning. Agent and workflow nodes can call external tools, model providers, HTTP APIs, search tools, and custom integrations; apply least privilege and approval gates for write actions. Enterprise, marketplace, cloud, and modified-license terms should be reviewed before using Dify as a multi-tenant service or white-labeled frontend. Prompt IDE changes, workflow edits, model-provider changes, and dataset updates can alter production behavior; use versioning, staged releases, and rollback paths. |
| Privacy notes | Uploaded documents, parsed chunks, OCR text, embeddings, dataset metadata, chat history, citations, agent workflow state, code executor inputs, MCP payloads, logs, and model responses may contain private or regulated data. Model providers, embedding providers, rerankers, synchronized data sources, object storage, databases, and MCP clients may receive data depending on deployment settings. Keep RAGFlow API keys, provider keys, service configuration, dataset IDs, document IDs, logs, backups, and generated citations out of prompts, public issues, screenshots, and committed examples. Define retention, deletion, access review, and export rules before ingesting customer, financial, legal, healthcare, source-code, or credential-bearing documents. | Uploaded documents, parsed chunks, embeddings, workspace memories, prompts, chat history, agent state, scheduled task inputs, MCP payloads, provider responses, logs, and API calls may contain sensitive data. The README documents anonymous telemetry and an opt-out through DISABLE_TELEMETRY=true or the in-app privacy setting; review this before using regulated or confidential data. Even with telemetry disabled, outbound calls may still go to configured LLMs, embedding models, vector databases, external tools, cdn.anythingllm.com, GitHub, or GitHubusercontent depending on the deployment. Keep provider keys, JWT secrets, workspace invite links, storage paths, private documents, and generated citations out of public prompts, screenshots, issues, and examples. | Chats, prompts, uploaded files, document chunks, embeddings, vector metadata, web search results, browser-fetched pages, Python tool inputs, plugin outputs, voice/video data, logs, metrics, and traces may contain private data. Configured model providers, vector databases, document extraction engines, web search providers, image providers, object storage, Redis, auth providers, and observability backends may receive user data. Keep provider keys, OAuth/LDAP/SSO secrets, database URLs, object storage keys, plugin credentials, uploaded files, RAG indexes, and OpenTelemetry exports out of public repos and screenshots. Define retention, deletion, tenant separation, group permissions, export policy, and audit review before using Open WebUI as a shared internal workspace. | Prompts, uploaded documents, knowledge-base chunks, embeddings, workflow variables, tool arguments, tool results, API requests, model responses, logs, annotations, and observability data may contain sensitive user or business data. Do not store API keys, database credentials, private documents, customer records, regulated data, or internal URLs in examples, public apps, logs, screenshots, or shared prompts. Review data paths for every model provider, embedding provider, reranker, tool, observability integration, storage backend, and Dify Cloud or self-hosted deployment component. RAG and knowledge-base features need deletion, retention, access control, source freshness, and permission filtering policies before ingesting private corpora. |
| Config | — | — | — | — |