RAGFlow

Open-source RAG and agentic retrieval platform with DeepDoc document understanding, visual chunking, grounded citations, heterogeneous data-source ingestion, agent workflows, MCP support, code executor support, and Docker self-hosting.

by InfinityFlow · added 2026-06-18
Harness: CLI

Review first: open the source and read the safety notes before installing.

Safety notes

  • RAGFlow is a multi-service RAG platform, not a small CLI. Review Docker services, exposed ports, persistent volumes, model-provider keys, parser settings, and update strategy before production use.
  • The README notes x86 Docker image availability and separate guidance for ARM64 builds; verify architecture before deploying on ARM hosts.
  • Deep document parsing, OCR, chunking, embeddings, reranking, agent workflows, MCP, and code executor features can process sensitive files and produce misleading outputs if retrieval quality is not tested.
  • The code executor feature requires sandbox review. Use gVisor or another isolation plan before running generated or user-provided code.
  • MCP support should be configured with localhost binding, API-key hygiene, dataset-level scoping, and read-only retrieval defaults unless a broader tool surface has been reviewed.
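
The localhost-binding advice in the last note can be checked mechanically. The following is a minimal sketch, assuming Docker Compose short-syntax port strings such as "127.0.0.1:8080:8080"; the function names and the example port are hypothetical and not part of RAGFlow.

```python
import ipaddress


def is_loopback_bind(host):
    """True when host is 'localhost' or a loopback IP such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # not an IP literal, so treat it as non-local


def port_mapping_is_local(mapping):
    """True only for 'IP:HOST_PORT:CONTAINER_PORT' mappings with a loopback IP.

    Mappings without an explicit IP (for example '8080:8080') publish on all
    interfaces, so they are reported as not local.
    """
    parts = mapping.rsplit(":", 2)
    if len(parts) != 3:
        return False
    return is_loopback_bind(parts[0].strip("[]"))
```

Running a check like this over the published ports in a Compose file is one way to spot services that are reachable from outside the host before an MCP endpoint is enabled.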

Privacy notes

  • Uploaded documents, parsed chunks, OCR text, embeddings, dataset metadata, chat history, citations, agent workflow state, code executor inputs, MCP payloads, logs, and model responses may contain private or regulated data.
  • Model providers, embedding providers, rerankers, synchronized data sources, object storage, databases, and MCP clients may receive data depending on deployment settings.
  • Keep RAGFlow API keys, provider keys, service configuration, dataset IDs, document IDs, logs, backups, and generated citations out of prompts, public issues, screenshots, and committed examples.
  • Define retention, deletion, access review, and export rules before ingesting customer, financial, legal, healthcare, source-code, or credential-bearing documents.
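
As an illustration of keeping keys out of shared logs and issues, here is a minimal redaction sketch. The patterns are assumptions about common key=value and bearer-token shapes, not RAGFlow's actual key formats, and they are not exhaustive (quoted JSON keys, for example, are not covered).

```python
import re

# Hypothetical patterns; adjust to the secrets your deployment really emits.
BEARER_RE = re.compile(r"(bearer\s+)\S+", re.IGNORECASE)
KEY_RE = re.compile(r"(api[_-]?key|token|secret|password)(\s*[:=]\s*)\S+", re.IGNORECASE)


def redact(text):
    """Replace likely secret values with a placeholder before text is shared."""
    # Bearer tokens first, so "token: Bearer abc" does not leave "abc" behind.
    text = BEARER_RE.sub(r"\1[REDACTED]", text)
    return KEY_RE.sub(r"\1\2[REDACTED]", text)
```

A filter like this is a last line of defence; it does not replace reviewing logs and screenshots by hand before posting them publicly.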

Prerequisites

  • CPU with at least 4 cores, 16 GB RAM, 50 GB disk, Docker 24.0.0 or newer, and Docker Compose v2.26.1 or newer for the documented self-hosted path.
  • Python 3.13 for source/development workflows.
  • gVisor if the code executor sandbox feature will be used.
  • Configured model-provider and embedding-provider keys in the documented service configuration.
  • Dataset access policy for documents, images, scanned files, spreadsheets, web pages, structured data, and synchronized external sources before ingestion.
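
The numeric minimums above can be checked before starting the stack. This is a minimal preflight sketch, not an official RAGFlow tool: the function names are hypothetical, the thresholds are copied from the list above, and the RAM check is left out because the Python standard library has no portable way to read it.

```python
import os
import re
import shutil

# Minimums quoted from the Prerequisites list.
MIN_CORES = 4
MIN_DISK_GB = 50
MIN_DOCKER = "24.0.0"
MIN_COMPOSE = "2.26.1"


def parse_version(text):
    """Return the first dotted version in text as a tuple, e.g. 'v2.26.1' -> (2, 26, 1)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def version_at_least(found, minimum):
    """Compare two version strings numerically rather than as plain text."""
    return parse_version(found) >= parse_version(minimum)


def host_report(path="."):
    """Report whether core count and free disk space at path meet the minimums."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "cores_ok": (os.cpu_count() or 0) >= MIN_CORES,
        "disk_ok": free_gb >= MIN_DISK_GB,
    }
```

The version strings can come from `docker version --format '{{.Server.Version}}'` and `docker compose version --short`. Comparing tuples matters here because a plain string comparison would rank "2.9.0" above "2.26.1".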

Schema details

Install type: cli
Troubleshooting: No

Source repository stats
Scope: Source repo

Collection metadata
Estimated setup: 60 minutes
Difficulty: advanced

Tool listing metadata
Pricing: freemium
Disclosure: editorial
Application category: DeveloperApplication
Operating system: Web
About this resource

Overview

RAGFlow is an open-source Retrieval-Augmented Generation engine and agentic context layer for LLM applications. It combines DeepDoc document understanding, visual and template-based chunking, grounded citations, model and embedding configuration, heterogeneous data-source ingestion, APIs, agent workflow features, MCP support, and Docker self-hosting.

This tools entry is distinct from the existing RAGFlow MCP server entry. The MCP entry focuses on connecting Claude and other MCP clients to a running RAGFlow deployment for retrieval. This entry covers the main RAGFlow platform that creates and operates the datasets, parsing pipeline, retrieval stack, agents, and deployment environment that the MCP server depends on.

Install

RAGFlow's documented self-host path uses Docker Compose from the repository's docker directory:

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
docker compose -f docker-compose.yml up -d

The README also documents host requirements, vm.max_map_count, x86 Docker image constraints, GPU mode for DeepDoc tasks, and model-provider configuration in the service configuration template.
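
The vm.max_map_count requirement is easy to verify on a Linux host. The sketch below assumes a threshold of 262144, which is the value commonly required by the Elasticsearch-based stack; confirm the current number in the RAGFlow README before relying on it. The function names are hypothetical.

```python
from pathlib import Path

# Assumed threshold; check the README for the value your release expects.
REQUIRED_MAX_MAP_COUNT = 262144
PROC_FILE = Path("/proc/sys/vm/max_map_count")  # present on Linux only


def max_map_count_ok(raw_value, required=REQUIRED_MAX_MAP_COUNT):
    """True when the kernel setting, given as text, meets the required minimum."""
    return int(raw_value.strip()) >= required


def check_host(proc_file=PROC_FILE):
    """Return True or False on Linux, or None when the setting cannot be read."""
    if not proc_file.exists():
        return None
    return max_map_count_ok(proc_file.read_text())
```

If the check fails, the setting can be raised for the running kernel with `sudo sysctl -w vm.max_map_count=262144`; making the change persistent depends on the distribution.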

Core Capabilities

Area | RAGFlow coverage
Document understanding | DeepDoc-powered extraction for complicated unstructured formats
Chunking | Template-based chunking, visual chunking, and human intervention over chunks
Grounding | Traceable citations and reference views for grounded answers
Data sources | Word, slides, spreadsheets, text, images, scanned documents, structured data, web pages, and synchronized external sources
Retrieval | Multiple recall, fused reranking, embedding model configuration, and RAG APIs
Agents | Agentic workflow, agent capabilities, memory support, and code executor component according to README updates
MCP | Built-in MCP support and a separate MCP server path for retrieval from selected datasets
Deployment | Cloud service, Docker self-hosting, source development, Docker image builds, x86 images, and ARM64 build guidance

MCP and OpenClaw Fit

RAGFlow is relevant to MCP searches in two ways. First, the main platform can be used as the knowledge and retrieval backend behind the existing RAGFlow MCP server entry. Second, the README records MCP and agentic workflow support in the main platform roadmap/updates.

The README also links a RAGFlow Skill on OpenClaw for accessing RAGFlow datasets. That makes RAGFlow a useful bridge across the MCP, agent, skill, and RAG clusters. The operational boundary is still the RAGFlow dataset: agents should only retrieve from dataset IDs and document IDs they are meant to see.
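
The dataset boundary described above can be enforced in whatever layer sits between an agent and RAGFlow. The following is a minimal sketch of a per-agent allow-list; `RetrievalScope` and `filter_request` are hypothetical names for illustration and are not RAGFlow API types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalScope:
    """Hypothetical per-agent allow-list of dataset and document IDs."""
    dataset_ids: frozenset
    document_ids: frozenset = frozenset()  # empty: any document in an allowed dataset

    def allows(self, dataset_id, document_id=None):
        if dataset_id not in self.dataset_ids:
            return False
        if not self.document_ids:
            return True
        # A document allow-list is set, so dataset-wide requests are denied.
        return document_id in self.document_ids


def filter_request(scope, requested_dataset_ids):
    """Split requested dataset IDs into (allowed, denied), preserving order."""
    allowed = [d for d in requested_dataset_ids if scope.allows(d)]
    denied = [d for d in requested_dataset_ids if not scope.allows(d)]
    return allowed, denied
```

Denying by default, and logging the denied IDs rather than silently dropping them, makes it easier to notice when an agent asks for data outside its scope.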

Use Cases

  • Build a self-hosted RAG system over PDFs, spreadsheets, scanned files, web pages, and structured documents.
  • Inspect and correct chunking before exposing retrieval to agents or users.
  • Provide grounded citations and reference views for answer review.
  • Connect Claude or other MCP clients to selected RAGFlow datasets through the existing RAGFlow MCP server.
  • Evaluate DeepDoc parsing, OCR, reranking, and multi-recall retrieval quality.
  • Build an internal context layer for agentic workflows, code execution, and enterprise LLM applications.

Source Review

Verified on 2026-06-18:

  • GitHub reports infiniflow/ragflow as an Apache-2.0 repository with active development, 83,000+ stars, and latest release v0.26.1.
  • The README describes RAGFlow as an open-source RAG engine that fuses RAG with agent capabilities to create a context layer for LLMs.
  • The README documents DeepDoc, template-based chunking, grounded citations, heterogeneous data-source support, automated RAG workflows, configurable LLMs and embedding models, multiple recall, fused reranking, APIs, and Docker self-hosting.
  • The README's recent update list includes agentic workflow and MCP support, Python/JavaScript code executor support, memory for AI agents, multiple chat channels, and an official RAGFlow Skill on OpenClaw.
  • The self-hosting section documents CPU/RAM/disk/Docker/Python prerequisites, gVisor for code executor sandboxing, vm.max_map_count, x86 image caveats, Docker Compose startup, status checks, and model API key setup.
  • The existing HeyClaude content already has a RAGFlow MCP server entry; no dedicated tools entry for the main RAGFlow platform was found.

Safety and Privacy

RAGFlow's output quality depends on ingestion quality, parser configuration, chunking, embedding choice, reranking, dataset permissions, and model-provider behavior. Treat every dataset as sensitive by default until access scope, retention, and retrieval evaluation are documented.

The platform can parse files, store embeddings, synchronize external sources, run agent workflows, expose MCP retrieval, and execute code when enabled. Keep provider keys and API keys scoped, isolate code execution, bind local services carefully, and test retrieval before allowing agents to answer from private datasets.
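
One way to test retrieval before exposing a dataset is to score it against a small set of reviewer-labelled queries. The sketch below computes average recall@k; the `retrieve` callable is a placeholder for whatever client you use to query RAGFlow, and the case format is an assumption.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant chunk IDs that appear in the top-k retrieved IDs."""
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    relevant = set(relevant_ids)
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)


def evaluate(cases, retrieve, k=5):
    """Average recall@k over labelled cases.

    cases: list of (query, relevant_ids) pairs labelled by a reviewer.
    retrieve: any callable mapping a query string to a ranked list of chunk IDs.
    """
    scores = [recall_at_k(retrieve(query), relevant, k) for query, relevant in cases]
    return sum(scores) / len(scores)
```

Rerunning the same cases after changing the parser, chunking template, embedding model, or reranker shows whether the change helped or hurt before agents start answering from the dataset.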

Duplicate Check

Checked current content/tools/, content/mcp/, content/agents/, content/skills/, guides, collections, open pull requests, and repository-wide content for infiniflow/ragflow, RAGFlow, RAGFlow RAG engine, RAGFlow MCP, RAGFlow agents, DeepDoc, agentic retrieval, context engine, self-hosted RAG platform, and RAGFlow OpenClaw skill. Existing content includes content/mcp/ragflow-mcp-server.mdx, which covers the built-in MCP retrieval server for a running RAGFlow deployment. No dedicated RAGFlow tools entry for the main platform, exact source URL duplicate in tools, target file, or open duplicate PR was found.

How it compares

RAGFlow side by side with 3 alternatives on trust, install, platform support, and disclosed safety notes — all from reviewed registry metadata.

RAGFlow: Open-source RAG and agentic retrieval platform with DeepDoc document understanding, visual chunking, grounded citations, heterogeneous data-source ingestion, agent workflows, MCP support, code executor support, and Docker self-hosting.

AnythingLLM: Local-first AI application for private chat, document RAG, workspace agents, MCP-compatible tools, model routing, memories, scheduled tasks, multimodal workflows, multi-user Docker deployments, and self-hosted agent automation.

Open WebUI: Self-hosted AI platform and web UI for Ollama, OpenAI-compatible APIs, RAG, Python function tools, model builder workflows, artifacts, web search, vector databases, enterprise auth, observability, plugins, and MCP-adjacent OpenAPI integrations.

Dify: Production-ready LLM app and agentic workflow platform with visual workflows, RAG pipelines, agent capabilities, model management, observability, prompt IDE, APIs, Dify Cloud, and self-hosted Docker Compose deployment.

Trust

Field | RAGFlow | AnythingLLM | Open WebUI | Dify
Install risk | Review first | Review first | Review first | Review first
Notes | Safety, Privacy | Safety, Privacy | Safety, Privacy | Safety, Privacy
Category | tools | tools | tools | tools
Source | source-backed | source-backed | source-backed | source-backed
Author | InfinityFlow | Mintplex Labs | Open WebUI | LangGenius
Added | 2026-06-18 | 2026-06-18 | 2026-06-18 | 2026-06-18
Platforms | CLI | CLI | CLI | CLI

Safety notes

RAGFlow: the five notes listed under Safety notes at the top of this entry.

AnythingLLM:
  • AnythingLLM can run agents, scheduled tasks, MCP-compatible tools, browser-like workspace actions, developer APIs, and external model calls; scope tools and credentials before enabling them for users.
  • The upstream Docker guide includes examples that add the SYS_ADMIN capability to the container. Review whether that capability is acceptable for the host before copying production run commands.
  • Multi-user Docker deployments need normal production controls: authentication, TLS, network isolation, secret management, persistent-volume ownership, backups, and upgrade planning.
  • Agent tools, custom agents, model routing, memories, and scheduled tasks can change behavior over time; use least privilege, logging, review gates, and rollback plans for write-capable workflows.
  • Localhost services such as Ollama, Chroma, LocalAI, or LM Studio may need Docker host routing adjustments; avoid exposing local provider ports wider than intended.

Open WebUI:
  • Open WebUI can run Python function-calling tools, RAG ingestion, web search, web browsing, image generation, plugins, and model/provider integrations; review each capability before enabling it for untrusted users.
  • Docker examples expose web ports and persistent volumes. Mount persistent data, set admin/auth controls, and avoid treating demo defaults as production hardening.
  • Python function tools and plugin pipelines can execute application logic and access configured services. Restrict tool creation and plugin installation to trusted administrators.
  • RAG and web browsing can ingest local documents, URLs, cloud files, and extracted text; test indexing quality and permissions before exposing private corpora to users.
  • Open WebUI uses a custom Open WebUI License with branding restrictions and enterprise-license exceptions. Verify license terms before redistribution, white-labeling, or commercial deployment.

Dify:
  • Dify can orchestrate workflows, RAG pipelines, agents, tools, APIs, model providers, and production application endpoints; review tool permissions and user-triggered actions before exposing apps.
  • Self-hosted deployments need normal production controls: authentication, TLS, network isolation, secret management, backups, database maintenance, object storage policy, and upgrade planning.
  • Agent and workflow nodes can call external tools, model providers, HTTP APIs, search tools, and custom integrations; apply least privilege and approval gates for write actions.
  • Enterprise, marketplace, cloud, and modified-license terms should be reviewed before using Dify as a multi-tenant service or white-labeled frontend.
  • Prompt IDE changes, workflow edits, model-provider changes, and dataset updates can alter production behavior; use versioning, staged releases, and rollback paths.

Privacy notes

RAGFlow: the four notes listed under Privacy notes at the top of this entry.

AnythingLLM:
  • Uploaded documents, parsed chunks, embeddings, workspace memories, prompts, chat history, agent state, scheduled task inputs, MCP payloads, provider responses, logs, and API calls may contain sensitive data.
  • The README documents anonymous telemetry and an opt-out through DISABLE_TELEMETRY=true or the in-app privacy setting; review this before using regulated or confidential data.
  • Even with telemetry disabled, outbound calls may still go to configured LLMs, embedding models, vector databases, external tools, cdn.anythingllm.com, GitHub, or GitHubusercontent depending on the deployment.
  • Keep provider keys, JWT secrets, workspace invite links, storage paths, private documents, and generated citations out of public prompts, screenshots, issues, and examples.

Open WebUI:
  • Chats, prompts, uploaded files, document chunks, embeddings, vector metadata, web search results, browser-fetched pages, Python tool inputs, plugin outputs, voice/video data, logs, metrics, and traces may contain private data.
  • Configured model providers, vector databases, document extraction engines, web search providers, image providers, object storage, Redis, auth providers, and observability backends may receive user data.
  • Keep provider keys, OAuth/LDAP/SSO secrets, database URLs, object storage keys, plugin credentials, uploaded files, RAG indexes, and OpenTelemetry exports out of public repos and screenshots.
  • Define retention, deletion, tenant separation, group permissions, export policy, and audit review before using Open WebUI as a shared internal workspace.

Dify:
  • Prompts, uploaded documents, knowledge-base chunks, embeddings, workflow variables, tool arguments, tool results, API requests, model responses, logs, annotations, and observability data may contain sensitive user or business data.
  • Do not store API keys, database credentials, private documents, customer records, regulated data, or internal URLs in examples, public apps, logs, screenshots, or shared prompts.
  • Review data paths for every model provider, embedding provider, reranker, tool, observability integration, storage backend, and Dify Cloud or self-hosted deployment component.
  • RAG and knowledge-base features need deletion, retention, access control, source freshness, and permission filtering policies before ingesting private corpora.

Prerequisites

RAGFlow:
  • CPU with at least 4 cores, 16 GB RAM, 50 GB disk, Docker 24.0.0 or newer, and Docker Compose v2.26.1 or newer for the documented self-hosted path.
  • Python 3.13 for source/development workflows.
  • gVisor if the code executor sandbox feature will be used.
  • Configured model-provider and embedding-provider keys in the documented service configuration.

AnythingLLM:
  • Docker for the documented self-hosted path, or the desktop application for a local workstation install.
  • At least the upstream minimum host resources, with disk sized for documents, embeddings, vector storage, models, logs, and backups.
  • A local or remote LLM provider, embedding provider, and optional speech or image models for the workflows the workspace will run.
  • A storage, backup, retention, and access-control plan before ingesting private documents or opening a multi-user Docker instance.

Open WebUI:
  • Python 3.11 or 3.12 for pip installation, or Docker/Kubernetes for container deployment.
  • Ollama, OpenAI-compatible endpoint, OpenAI API key, or another configured model provider.
  • Persistent storage for the application database and uploaded/RAG content; Docker users must mount `/app/backend/data` to avoid data loss.
  • Optional vector database, document extraction, web search, image generation, speech, enterprise auth, object storage, Redis, or observability services depending on enabled features.

Dify:
  • Docker and Docker Compose for the documented self-hosted quick start, or a Dify Cloud workspace.
  • At least the upstream minimum CPU and memory resources for local deployment.
  • Model provider credentials for the LLMs, embedding models, rerankers, or API-compatible routes the application will use.
  • Storage, database, vector, observability, network, and backup planning before hosting real user data.

Install

RAGFlow: docker compose -f docker-compose.yml up -d
AnythingLLM: docker pull mintplexlabs/anythingllm
Open WebUI: pip install open-webui
Dify: docker compose up -d

Claim: Unclaimed for all four entries.
