RAG frameworks · tools · 12 picks

Best RAG frameworks

Frameworks for building retrieval-augmented generation pipelines and data-aware LLM apps.

Curated by @heyclaude-editors · Updated 2026-06-19


Compared at a glance

The top 5 picks side by side on trust, install, platform support, and disclosed notes — full rationale for each below.

Field	LlamaIndex: Open-source framework for building agentic LLM applications over private data with ingestion, indexes, retrieval, RAG, tools, workflows, and evaluation.	LangChain4j: Idiomatic Java/JVM library for building LLM-powered applications with unified model APIs, tool calling, agentic workflows, RAG, chat memory, embedding stores, MCP client support, and Spring Boot, Quarkus, Helidon, and Micronaut integrations.	AG2 Agent Framework: Open-source Python AgentOS and multi-agent framework, evolved from AutoGen, for building conversable agents, group chats, swarms, human-in-the-loop workflows, tool use, RAG, code execution, and provider-backed agent systems.	CAMEL-AI CAMEL: Open-source Python multi-agent framework for building agent societies, role-playing agents, stateful ChatAgent workflows, RAG agents, synthetic data generation, MCP-enabled use cases, and research-scale agent experiments.	Chroma: Open-source AI data infrastructure for storing documents, embeddings, metadata, and retrieval indexes across local, self-hosted, and managed Chroma Cloud deployments.
Trust
Install risk	Review first	Review first	Review first	Review first	Review first
Notes	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓	Safety ✓ Privacy ✓
Category	tools	tools	tools	tools	tools
Source	source-backed	source-backed	source-backed	source-backed	source-backed
Author	LlamaIndex	LangChain4j	AG2	CAMEL-AI	Chroma
Added	2026-06-03	2026-06-18	2026-06-18	2026-06-18	2026-06-03
Platforms	CLI	CLI	CLI	CLI	CLI
Source repo	—	—	—	—	—
Safety notes	✓ LlamaIndex retrieval, RAG, structured extraction, and agent workflows improve access to private data, but they do not prove that generated answers, retrieved context, or tool calls are correct or safe. Data connectors, readers, parsers, indexes, tools, query engines, workflows, and MCP integrations can access private files, SaaS systems, databases, APIs, and vector stores; review permissions before connecting them. Retrieved documents, metadata, parsed tables, user uploads, tool descriptions, and external connector results become model-facing context and can contain stale, malicious, or prompt-injection-like instructions. Persistent indexes, vector stores, document stores, and local storage directories can outlive the original experiment; define cleanup, retention, migration, and access-control rules before indexing sensitive data. Optional LlamaParse, LlamaCloud, or hosted document-agent workflows can upload documents or extracted content to hosted services and should be reviewed separately from local open-source framework use. Evaluation and observability results are quality signals, not proof that a RAG pipeline, agent, extraction workflow, or document workflow is production-ready.	✓ LangChain4j can bind model calls to Java tools, MCP tools, RAG retrievers, and framework services. Treat each tool as application code with permissions, side effects, and audit requirements. The MCP tutorial supports stdio, Streamable HTTP, WebSocket, Docker stdio, and legacy HTTP/SSE transports. Review subprocess commands, Docker socket access, server URLs, and credentials before connecting agents. Use MCP tool filtering and tool-name mapping when a server exposes many tools or overlapping tool names; do not expose write-capable tools by default. RAG examples may read local directories, parse documents, and store embeddings in external vector stores. Scope ingestion paths and retention rules before indexing private data. The agentic module is documented as experimental, so teams should pin versions, test workflows, and avoid relying on unstable APIs for critical production behavior.	✓ AG2 agents can converse, call tools, execute code, use retrieval systems, run browser workflows, and coordinate group chats; require explicit permissions and approval gates for high-impact actions. The upstream install docs and examples commonly involve provider credentials; keep API keys, config files, notebooks, and `.env` files out of commits and support tickets. Code execution, Docker, Jupyter, browser-use, and RAG extras can touch local files, network services, notebooks, databases, and external websites; scope them tightly before granting agent access. Multi-agent conversations can continue through nested chats, swarms, group chats, and custom reply handlers; define termination, escalation, retry, and human takeover behavior. Track the release roadmap before upgrading because deprecations and the v1.0 transition can change which APIs should be used for new work.	✓ CAMEL agents can coordinate multi-step tasks, call tools, use web/search integrations, connect to MCP examples, and run with provider credentials; review tool permissions before giving agents write access or account access. Large-scale agent societies and role-playing workflows can generate high volumes of model calls, tool calls, logs, synthetic data, and intermediate artifacts; set budgets, rate limits, and stop conditions before long runs. RAG, document, media, browser, communication, and data-tool extras may access local files, third-party APIs, vector stores, notebooks, or generated datasets; isolate experiments from production systems. CAMEL examples include MCP-oriented use cases, but MCP does not make connected tools safe by default. Scope server permissions, credentials, filesystem access, and approval gates separately. Do not treat generated code, generated datasets, citations, research summaries, or multi-agent decisions as verified until they have been reviewed against source data and policy requirements.	✓ Chroma can make retrieval easier, but vector, hybrid, full-text, and regex search results still require evaluation for relevance, freshness, permission fit, and hallucination risk. Retrieved documents, metadata, and embeddings can influence agent actions; review chunking, filters, collection boundaries, and prompt assembly before using results in automated workflows. Duplicate IDs, mismatched embedding dimensions, stale records, partial updates, and deleted-source drift can produce confusing or incorrect retrieval behavior if ingestion is not controlled. Metadata filters are useful access boundaries only when the application enforces them consistently; do not rely on model instructions alone to prevent cross-tenant or cross-project retrieval. Local and self-hosted deployments still need normal database operations including authentication, network exposure review, backups, resource limits, monitoring, and recovery tests. Chroma Cloud, embedding providers, and connected AI applications may add account, billing, availability, and organization-policy dependencies beyond the open-source database package.
Privacy notes	✓ LlamaIndex workflows can process source documents, chunks, metadata, embeddings, prompts, retrieved context, generated answers, tool arguments, tool outputs, traces, evaluation datasets, and callback data. Model and embedding providers may receive document snippets, user questions, generated summaries, extracted fields, or metadata unless a local or approved private provider path is used. Connectors can ingest private repositories, tickets, PDFs, spreadsheets, databases, chats, notes, emails, or cloud files; verify that ingestion scope matches the user's authorization. Vector stores, persisted indexes, chat stores, document stores, and exported eval reports may retain data outside the source system's native permissions, deletion policy, and audit controls. Optional hosted parsing, OCR, extraction, indexing, or agent services should be assessed for upload scope, retention, residency, access controls, and incident response before processing confidential documents.	✓ Prompts, chat memory, tool arguments, tool outputs, retrieved document chunks, embeddings, vector-store metadata, model responses, logs, and MCP traffic may include private application or customer data. Model providers, embedding providers, vector stores, MCP servers, framework logs, tracing systems, and Java application logs may observe or retain LangChain4j workflow data. Do not commit provider keys, MCP server credentials, vector database secrets, local document paths, generated traces, or raw RAG datasets. If request/response or MCP transport logging is enabled for debugging, review logs before sharing them because they can include prompts, tool payloads, retrieved content, and credentials.	✓ Prompts, messages, tool arguments, tool outputs, code snippets, notebook state, retrieved documents, vector-store contents, provider responses, traces, and execution logs may contain sensitive user or workspace data. Do not expose secrets, API keys, private file paths, customer records, internal documents, database rows, or raw exceptions through agent messages, logs, notebooks, screenshots, or public examples. Provider extras and retrieval integrations can route data through OpenAI, Anthropic, Google, AWS, local model servers, databases, vector stores, browser automation, or other third-party services. If AG2 is used for code execution or browser automation, define which files, domains, credentials, downloads, screenshots, and logs can be read or retained.	✓ Prompts, model responses, agent messages, tool arguments, tool outputs, retrieved documents, search results, logs, generated datasets, traces, and errors may include user or workspace data. Model providers, search providers, MCP servers, vector stores, web tools, document parsers, browser tools, and observability integrations may receive data from CAMEL workflows. Keep provider API keys, OAuth tokens, MCP server credentials, vector database URLs, generated logs, and synthetic datasets out of committed examples, screenshots, public issues, and shared notebooks. If `CAMEL_MODEL_LOG_ENABLED` or other logging/tracing integrations are enabled, review request/response logs and model configuration logs before sharing or retaining them.	✓ Chroma collections may store source documents, document chunks, metadata, IDs, embeddings, multimodal references, query text, and retrieval results that can reveal sensitive project context. Embeddings can leak information about the original data and should be governed with the same retention, deletion, access-control, and backup policies as the documents they represent. Embedding providers, Chroma Cloud, hosted model routes, or application telemetry may receive document or query content depending on how ingestion and search are configured. Metadata can include user identifiers, source names, document provenance, internal labels, and permission fields; define redaction and minimization rules before ingestion. Retrieval logs, failed queries, evaluation traces, and agent transcripts can re-expose stored data outside Chroma, so downstream systems need their own retention and access policies.
Prerequisites	Python project and dependency manager for installing `llama-index`, `llama-index-core`, and the model, embedding, vector store, reader, or integration packages needed by the application. Approved data sources, file paths, SaaS connectors, databases, or document repositories to ingest, parse, index, and query. Model provider credentials, embedding provider credentials, local model configuration, or gateway configuration for generation, embeddings, reranking, and structured extraction. Reviewed storage backend for indexes, vector stores, document stores, chat stores, cache data, traces, and persisted retrieval artifacts.	Java/JVM project using Maven, Gradle, Spring Boot, Quarkus, Helidon, Micronaut, or a plain Java build. Selected LangChain4j modules for the model provider, embedding store, RAG pipeline, MCP transport, framework integration, or agentic workflow you need. Provider credentials, model endpoints, vector database credentials, MCP server URLs, or local stdio/Docker server commands stored outside source control. Version alignment between LangChain4j core modules, beta modules, framework integrations, and enterprise dependency constraints.	Python 3.10 or newer and a Python environment managed with pip, uv, or another package manager. Model provider credentials for the selected provider extra, such as OpenAI, Anthropic, Gemini, Bedrock, Mistral, Ollama, Groq, xAI, or another supported route. A secrets strategy for provider keys, AG2 config files, `.env` files, notebooks, and example `OAI_CONFIG_LIST`-style credentials. A reviewed execution boundary for code execution, Docker, Jupyter, browser-use, RAG, retrieval, database, and external tool extras.	Python 3.10 through 3.14 and an isolated Python environment managed with pip, uv, or another package manager. A configured model provider such as OpenAI or another provider supported by the selected CAMEL model route. Provider API keys, search credentials, vector database credentials, or tool-specific secrets stored outside source control. Optional extras for web tools, document tools, RAG, model platforms, storage backends, dev tools, or research tools only when those integrations are required.	Python, TypeScript, Rust, local server, self-hosted service, or Chroma Cloud path selected for the target AI application. Approved embedding model, embedding function, multimodal model, or precomputed embedding pipeline with known dimensionality and license terms. Collection design for document IDs, metadata schema, embedding dimensions, update behavior, deletion behavior, and retrieval filters before production ingestion. Storage, backup, retention, encryption, access-control, and deployment plan for local persistence, client-server mode, self-hosted services, or managed Chroma Cloud databases.
Install	—	Add the needed dev.langchain4j Maven or Gradle modules from the official docs.	`pip install 'ag2[openai]'`	`pip install camel-ai`	—
Config	—	—	—	—	—
Citations	Source repository: github.com (2026-06-18T13:46:08-07:00); Documentation: developers.llamaindex.ai; Submitted by oktofeesh1, 2026-06-03	Source repository: github.com (2026-06-18T13:46:08-07:00); Documentation: docs.langchain4j.dev	Source repository: github.com (2026-06-18T13:46:08-07:00); Documentation: docs.ag2.ai	Source repository: github.com (2026-06-18T13:46:08-07:00); Documentation: docs.camel-ai.org	Source repository: github.com (2026-06-18T13:46:08-07:00); Documentation: docs.trychroma.com; Submitted by oktofeesh1, 2026-06-03
Claim	Unclaimed	Unclaimed	Unclaimed	Unclaimed	Unclaimed

01
tools
LlamaIndex
Build agents, RAG pipelines, retrieval, indexing, and data-aware LLM apps.
Review first · Source-backed · Added 16d ago
Safety ✓ Privacy ✓
Why it made the cut
LlamaIndex is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
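The LlamaIndex notes in the comparison table ask readers to verify that ingestion scope matches the user's authorization before connecting readers to local files. One way to approach that check, as a minimal sketch using only the Python standard library: the function names and example paths below are hypothetical and are not part of LlamaIndex.

```python
from pathlib import Path
from typing import Iterable, List


def is_within_allowed_roots(candidate: str, allowed_roots: Iterable[str]) -> bool:
    """Return True only if `candidate` resolves to a location under an approved root."""
    # resolve() collapses ".." segments and follows symlinks, so a path such as
    # "/srv/docs/../secrets" cannot slip past a simple prefix comparison.
    resolved = Path(candidate).resolve()
    for root in allowed_roots:
        # Path.is_relative_to requires Python 3.9 or newer.
        if resolved.is_relative_to(Path(root).resolve()):
            return True
    return False


def filter_ingestion_paths(paths: Iterable[str], allowed_roots: Iterable[str]) -> List[str]:
    """Keep only the paths that fall inside the approved ingestion roots."""
    roots = list(allowed_roots)
    return [p for p in paths if is_within_allowed_roots(p, roots)]
```

The filtered list could then be handed to whichever reader or loader the pipeline uses, so that the allowlist is enforced in application code rather than by convention.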
02
tools
LangChain4j
Java/JVM LLM framework for agents, RAG, tool calling, MCP tool providers, vector stores, chat memory, and enterprise Java integrations.
Review first · Source-backed · Added 21h ago
Safety ✓ Privacy ✓
Why it made the cut
LangChain4j is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
03
tools
AG2 Agent Framework
Build Python multi-agent systems with AG2, the open-source AgentOS evolved from AutoGen, including conversable agents, group chats, swarms, tools, human review, RAG, and code execution.
Review first · Source-backed · Added 21h ago
Safety ✓ Privacy ✓
Why it made the cut
AG2 Agent Framework is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
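The AG2 notes in the comparison table warn against exposing API keys and other secrets through agent messages, logs, and support tickets. A rough sketch of a redaction pass that could run before a log excerpt is shared follows. The patterns and the `redact` helper are illustrative assumptions, not an AG2 feature, and they will miss secrets that do not follow a `NAME=value` or `Bearer <token>` shape, so manual review is still needed.

```python
import re

# Hypothetical patterns: variable names containing common secret keywords,
# followed by "=" or ":" and a value, plus HTTP bearer tokens.
_KEY_VALUE = re.compile(
    r"(?i)([A-Za-z0-9_]*(?:api[_-]?key|token|secret|password)[A-Za-z0-9_]*)"
    r"(\s*[:=]\s*)(\S+)"
)
_BEARER = re.compile(r"(?i)(bearer\s+)(\S+)")


def redact(text: str) -> str:
    """Replace likely secret values with a placeholder, keeping the key names."""
    text = _KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    text = _BEARER.sub(lambda m: f"{m.group(1)}[REDACTED]", text)
    return text


if __name__ == "__main__":
    print(redact("OPENAI_API_KEY=sk-abc123 request ok"))
```

Keeping the key name visible while hiding the value makes it easier to see which credential appeared in a log without leaking it.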
04
tools
CAMEL-AI CAMEL
Python multi-agent framework for agent societies, ChatAgent workflows, RAG, tool use, MCP examples, data generation, and large-scale agent research.
Review first · Source-backed · Added 21h ago
Safety ✓ Privacy ✓
Why it made the cut
CAMEL-AI CAMEL is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
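The CAMEL notes in the comparison table recommend setting budgets and stop conditions before long multi-agent runs. The sketch below shows one possible shape for such a guard in plain Python. `RunBudget` and `BudgetExceeded` are hypothetical names, not CAMEL APIs, and the caller is assumed to invoke `charge()` once per model or tool call.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run goes past its call or wall-clock budget."""


@dataclass
class RunBudget:
    max_calls: int
    max_seconds: float
    calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, n: int = 1) -> None:
        """Record n calls and stop the run if either limit has been crossed."""
        self.calls += n
        if self.calls > self.max_calls:
            raise BudgetExceeded(f"call budget exceeded: {self.calls} > {self.max_calls}")
        elapsed = time.monotonic() - self.started
        if elapsed > self.max_seconds:
            raise BudgetExceeded(f"time budget exceeded: {elapsed:.1f}s > {self.max_seconds}s")
```

A driver loop would call `budget.charge()` before each agent step and treat `BudgetExceeded` as a signal to stop, log the state, and hand control back to a person.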
05
tools
Chroma
Store embeddings and metadata for AI retrieval with local, self-hosted, or cloud Chroma.
Review first · Source-backed · Added 16d ago
Safety ✓ Privacy ✓
Why it made the cut
Chroma is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
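The Chroma notes in the comparison table flag two ingestion and query hazards: duplicate IDs or mismatched embedding dimensions, and metadata filters that only work as access boundaries when the application enforces them. The sketch below illustrates both checks in plain Python without importing `chromadb`. The helper names and the `tenant_id` field are hypothetical, and the `$and` filter shape is assumed to match the `where` syntax of the Chroma version in use, so confirm it against the official docs.

```python
from collections import Counter
from typing import List, Optional, Sequence


def validate_batch(
    ids: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    expected_dim: int,
) -> List[str]:
    """Return a list of problems; an empty list means the batch looks consistent."""
    problems: List[str] = []
    if len(ids) != len(embeddings):
        problems.append(f"{len(ids)} ids but {len(embeddings)} embeddings")
    duplicates = sorted(doc_id for doc_id, n in Counter(ids).items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")
    for doc_id, vector in zip(ids, embeddings):
        if len(vector) != expected_dim:
            problems.append(f"{doc_id}: dimension {len(vector)} != {expected_dim}")
    return problems


def scoped_filter(tenant_id: str, user_filter: Optional[dict] = None) -> dict:
    """Combine a caller-supplied filter with a mandatory tenant clause."""
    if not tenant_id:
        raise ValueError("tenant_id is required")
    tenant_clause = {"tenant_id": tenant_id}
    if not user_filter:
        return tenant_clause
    # The tenant clause is AND-ed in, so the caller's filter can narrow
    # results but cannot widen them to another tenant.
    return {"$and": [tenant_clause, user_filter]}
```

Running `validate_batch` before each write and building every query filter through `scoped_filter` keeps both rules in application code, where they do not depend on model instructions.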
06
tools
LangChain
Build LLM apps and agents with LangChain's model interface, tools, middleware, RAG, streaming, memory, MCP adapters, LangGraph execution, and LangSmith tracing.
Review first · Source-backed · Added 21h ago
Safety ✓ Privacy ✓
Why it made the cut
LangChain is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
07
tools
Milvus
Run scalable vector, sparse, and hybrid search for RAG and AI retrieval systems.
Review first · Source-backed · Added 16d ago
Safety ✓ Privacy ✓
Why it made the cut
Milvus is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
08
tools
VoltAgent
TypeScript agent framework with workflows, MCP tools, memory, RAG, guardrails, evals, voice, and VoltOps observability.
Review first · Source-backed · Added 21h ago
Safety ✓ Privacy ✓
Why it made the cut
VoltAgent is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
09
tools
Weaviate
Build AI retrieval systems with semantic search, hybrid search, RAG, and cloud-native deployment.
Review first · Source-backed · Added 16d ago
Safety ✓ Privacy ✓
Why it made the cut
Weaviate is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
10
tools
Haystack
Build agents, RAG pipelines, search, retrieval, and tool-using LLM apps.
Review first · Source-backed · Added 16d ago
Safety ✓ Privacy ✓
Why it made the cut
Haystack is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
11
tools
LanceDB
Store multimodal data and run vector, full-text, and SQL retrieval for AI applications.
Review first · Source-backed · Added 16d ago
Safety ✓ Privacy ✓
Why it made the cut
LanceDB is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.
12
tools
Ragas
Open-source evaluation framework for RAG and LLM application testing.
Review first · Source-backed · Added 16d ago
Safety ✓ Privacy ✓
Why it made the cut
Ragas is included because it has safety notes, privacy notes, and a source-backed posture.
Reach for instead
If this will touch credentials, local files, or production systems, inspect the upstream source first.

