# Build a Local-First AI Developer Stack

Run the parts of your AI dev workflow that touch your code and data — tools, memory, and auxiliary models — on infrastructure you control, while still using Claude as the orchestrator. A practical architecture for a self-hosted, privacy-first developer stack.

by dpdanpittman · added 2026-06-02 · Harness: Claude Code


## TL;DR

A **local-first AI developer stack** keeps the parts that touch your code and data — tools, memory, and auxiliary models — on hardware you control, instead of scattering them across SaaS. You can still use a frontier model like Claude as the orchestrator, but the MCP servers it calls, the memory it reads, and any delegated or offline model work all run on your own box.

**The payoff:** better privacy (your data stays on your network), no per-tool lock-in, and a stack that keeps working on your own infrastructure.

> **What you'll build:** a local model runtime (Ollama) + a self-hosted fleet of MCP servers reachable over HTTP + persistent local memory + Claude Code wired to all of it, optionally reachable from any dev machine over a private network.

## Prerequisites & Requirements

- [ ] **A host with enough RAM/VRAM for local models:** 16GB+ for small quantized models; a GPU helps for larger ones.
- [ ] **Node.js 18+ and Python 3.10+ (with uv):** runtimes for the common MCP servers.
- [ ] **Claude Code or another MCP client:** acts as the orchestrator that calls your local tools.
- [ ] **Optional: Docker/Kubernetes + a private network:** to host a server fleet and reach it from other machines (e.g., Tailscale).
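
The version requirements above can be checked with a short script. This is a minimal sketch, not an official check: `version_ge` and `check_tool` are helper names invented here, and it assumes each tool prints a dotted version number when called with `--version`.

```bash
#!/usr/bin/env bash
# Sketch: check the runtimes from the prerequisites list.
# version_ge and check_tool are illustrative helpers, not part of any tool.

# version_ge HAVE WANT -> succeeds when HAVE >= WANT (dotted numeric versions).
version_ge() {
  local IFS=.
  local -a have=($1) want=($2)   # split "3.10.4" into (3 10 4)
  local i h w
  for ((i = 0; i < ${#want[@]}; i++)); do
    h=${have[i]:-0}
    w=${want[i]:-0}
    if ((10#$h > 10#$w)); then return 0; fi
    if ((10#$h < 10#$w)); then return 1; fi
  done
  return 0
}

# check_tool NAME MIN_VERSION [VERSION_FLAG] -> prints one status line.
check_tool() {
  local name=$1 min=$2 flag=${3:---version} found
  if ! command -v "$name" >/dev/null 2>&1; then
    echo "MISSING  $name (need >= $min)"
    return 1
  fi
  # Take the first dotted number in the output, e.g. "v20.11.1" -> 20.11.1
  found=$("$name" "$flag" 2>&1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)
  if [ -n "$found" ] && version_ge "$found" "$min"; then
    echo "OK       $name $found"
  else
    echo "TOO OLD  $name ${found:-unknown} (need >= $min)"
    return 1
  fi
}

# Example, run on the host:
#   check_tool node 18
#   check_tool python3 3.10
#   command -v uv ollama claude   # presence only; no minimum version assumed
```

Comparing version components numerically matters here: a plain string comparison would rank Python 3.9 above 3.10.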

## Core Concepts Explained

### What "local-first" actually means

Local-first is not the same as fully air-gapped. In this architecture a frontier orchestrator (Claude Code) still calls a cloud model — what's *local* is everything around it: the tools, the memory, your data, and any auxiliary models. That gets you most of the privacy and durability benefits without giving up frontier reasoning. If you need true locality (offline / air-gapped), you swap the orchestrator for a **local model loop** — a local LLM via Ollama driving an MCP-capable client — and accept that smaller models are less capable.

### The four layers

1. **Model runtime** — a local inference server ([Ollama](https://ollama.com)) for delegation and offline work.
2. **Tool layer** — [MCP](https://modelcontextprotocol.io) servers, self-hosted and exposed over HTTP so any client or machine can reach them.
3. **Memory** — a memory/knowledge-graph MCP server persisting to local disk.
4. **Orchestrator** — Claude Code (or a local agent) that drives the loop, plus lifecycle hooks for automation.

## Step-by-Step Implementation Guide

1. **Install a local model runtime.** Install [Ollama](https://github.com/ollama/ollama), pull a small instruct model, and confirm the local API is up. Ollama serves on `localhost:11434` by default; use the model for cheap delegation (summarizing logs, first-pass drafts) and as your offline fallback.

   ```bash
   # Pull a model and run it interactively
   ollama pull <model>
   ollama run <model>

   # Inspect local state
   ollama list   # installed models
   ollama ps     # currently loaded models

   # Run the API server explicitly (it also starts on demand)
   ollama serve  # listens on localhost:11434
   ```

2. **Pick your MCP servers.** Start from the official [Model Context Protocol servers](https://github.com/modelcontextprotocol/servers) — `filesystem`, `git`, `memory`, `fetch`, and friends. These run as local stdio processes.

3. **Self-host the fleet over HTTP.** stdio is local-only, so bridge it: [supergateway](https://github.com/supercorp-ai/supergateway) wraps a stdio MCP server as a streamable HTTP endpoint. To run many at once on one box (one port per server), use a hub like [mcp-supergateway-hub](https://github.com/dpdanpittman/mcp-supergateway-hub).

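4. **Add persistent local memory.** Run the `memory` MCP server with its store pointed at local disk (the reference server takes the file location from the `MEMORY_FILE_PATH` environment variable). This is the knowledge graph your agent reads and writes across sessions. The store stays on your machine, although anything the agent reads from it becomes part of the prompts sent to the orchestrator's model.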

5. **Wire the orchestrator.** Point Claude Code at each server's endpoint with `claude mcp add --transport http`. The HTTP transport is the recommended way to connect remote/networked MCP servers. Add `--scope` to control where the config is stored (`local` is the default; `project` writes a shareable `.mcp.json`; `user` makes it available across all your projects), and pass auth as an HTTP header when your gateway requires it.

   ```bash
   # Add a self-hosted HTTP MCP server (default = local scope)
   claude mcp add --transport http memory <url>

   # Make it available across all your projects
   claude mcp add --transport http --scope user memory <url>

   # Front your gateway with auth and pass a bearer token
   claude mcp add --transport http secure-fs https://mcp.box.ts.net/mcp \
     --header "Authorization: Bearer your-token"

   # Verify, inspect, and manage
   claude mcp list
   claude mcp get memory
   claude mcp remove memory
   ```

   You can also add a server from JSON with `claude mcp add-json`. HTTP entries use `"type": "http"` (the spec name `streamable-http` is accepted as an alias) with a `url` field and an optional `headers` object; `.mcp.json` supports `${VAR}` / `${VAR:-default}` expansion in `url` and `headers`, so secrets stay out of version control. For servers that need OAuth, run `/mcp` inside Claude Code to complete the browser login.

6. **(Optional) Reach it from anywhere.** Put the host on a private network ([Tailscale](https://tailscale.com)) so your laptop, desktop, and any other machine connect to the same stack without exposing it to the internet.

7. **(Optional) Add lifecycle hooks.** Use [Claude Code](https://www.anthropic.com/claude-code) hooks (e.g., SessionStart, PreCompact, Stop) to automate the stack — restore context on start, snapshot state before compaction, etc.

8. **(Optional) Go fully local.** Drive a local agent loop with an Ollama model against the same MCP servers for offline or air-gapped work.
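
Steps 1 and 8 both rely on Ollama's local HTTP API rather than the interactive CLI. The sketch below shows one way to confirm the API is up and hand a log file to a local model for summarizing. It assumes `curl` and `python3` are installed; `OLLAMA_URL` and the function names are illustrative, while `/api/tags` and `/api/generate` are Ollama's REST endpoints. The `json_string` helper only escapes backslashes, quotes, newlines, tabs, and carriage returns, so treat it as adequate for plain-text logs and nothing more.

```bash
#!/usr/bin/env bash
# Sketch: talk to the local Ollama API over HTTP.
# OLLAMA_URL and all function names are illustrative assumptions.
OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}

# json_string TEXT -> TEXT as a quoted JSON string (basic escaping only).
json_string() {
  local s=$1
  s=${s//\\/\\\\}      # backslash -> \\
  s=${s//\"/\\\"}      # quote     -> \"
  s=${s//$'\n'/\\n}    # newline   -> \n
  s=${s//$'\t'/\\t}    # tab       -> \t
  s=${s//$'\r'/\\r}    # CR        -> \r
  printf '"%s"' "$s"
}

# generate_body MODEL PROMPT -> request body for POST /api/generate.
generate_body() {
  printf '{"model": %s, "prompt": %s, "stream": false}\n' \
    "$(json_string "$1")" "$(json_string "$2")"
}

# ollama_up -> succeeds when the API answers with the installed-model list.
ollama_up() {
  curl -fsS "$OLLAMA_URL/api/tags" >/dev/null
}

# summarize_log MODEL FILE -> asks the local model to summarize the log tail.
summarize_log() {
  local model=$1 file=$2 prompt
  prompt="Summarize this log in three bullets: $(tail -n 50 "$file")"
  generate_body "$model" "$prompt" |
    curl -fsS "$OLLAMA_URL/api/generate" \
      -H 'Content-Type: application/json' -d @- |
    python3 -c 'import json, sys; print(json.load(sys.stdin)["response"])'
}
```

For example, `ollama_up && summarize_log <model> build.log` prints the model's summary, where `<model>` is whatever you pulled in step 1. Setting `"stream": false` keeps the reply as a single JSON object, which is simpler to parse than the default streamed chunks.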

## Honest Limitations

- **Local models trail frontier models.** Ollama models are great for delegation and offline use, but they are not always a drop-in replacement for a frontier orchestrator.
- **Not air-gapped by default.** With Claude Code as the orchestrator, prompts still reach Anthropic. Full locality requires the local model loop in step 8.
- **You own the ops.** Self-hosting means you patch, secure, and monitor the stack; exposing MCP over HTTP demands network discipline (private network or auth).
- **Hardware matters.** Plan on 16GB+ of RAM for small quantized models; large models need a GPU with enough VRAM to hold them.

## Command Reference

The commands that wire the stack together, grounded in the official Ollama CLI and Claude Code MCP docs.

| Command | Layer | What it does |
| --- | --- | --- |
| `ollama pull <model>` | Model runtime | Download a model to the local store |
| `ollama run <model>` | Model runtime | Run a model interactively (loads it if needed) |
| `ollama list` / `ollama ps` | Model runtime | List installed models / list currently loaded models |
| `ollama serve` | Model runtime | Start the API server on `localhost:11434` |
| `claude mcp add --transport http <name> <url>` | Orchestrator | Connect Claude Code to a networked MCP server over HTTP |
| `claude mcp add --transport http <name> <url> --header "Authorization: Bearer ..."` | Orchestrator | Add an HTTP server with an auth header |
| `claude mcp add ... --scope user` | Orchestrator | Store the server config for all projects (`local` is default; `project` writes `.mcp.json`) |
| `claude mcp add-json <name> '<json>'` | Orchestrator | Add a server from JSON (`"type": "http"`, with `url` + optional `headers`) |
| `claude mcp list` / `claude mcp get <name>` / `claude mcp remove <name>` | Orchestrator | List / inspect / remove configured servers |
| `/mcp` (inside Claude Code) | Orchestrator | Check server status and complete OAuth login for remote servers |

MCP scopes are stored as follows: `local` and `user` live in `~/.claude.json`; `project` lives in a checked-in `.mcp.json` at the project root. Adjust the MCP server startup timeout with the `MCP_TIMEOUT` environment variable (for example, `MCP_TIMEOUT=10000 claude` for a 10-second timeout).
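
To make the project scope concrete, here is a sketch of a `.mcp.json` with one HTTP server whose secrets come from the environment. The server name `secure-fs` and its URL reuse the example from step 5; the variable names `MCP_BASE_URL` and `MCP_TOKEN` and the function name `print_mcp_json` are placeholders invented for this example.

```bash
#!/usr/bin/env bash
# Sketch: print a project-scope .mcp.json for one HTTP MCP server.
# MCP_BASE_URL and MCP_TOKEN are hypothetical variable names.
# The quoted heredoc ('EOF') keeps ${...} literal, so the shell does not
# expand it and the placeholders land in the file unchanged.
print_mcp_json() {
  cat <<'EOF'
{
  "mcpServers": {
    "secure-fs": {
      "type": "http",
      "url": "${MCP_BASE_URL:-https://mcp.box.ts.net}/mcp",
      "headers": {
        "Authorization": "Bearer ${MCP_TOKEN}"
      }
    }
  }
}
EOF
}

# Usage, from the project root:
#   print_mcp_json > .mcp.json
#   export MCP_TOKEN=...        # set outside the repo
#
# The same entry without a file, using add-json:
#   claude mcp add-json secure-fs \
#     '{"type":"http","url":"https://mcp.box.ts.net/mcp","headers":{"Authorization":"Bearer your-token"}}'
```

Because the token is referenced by name rather than pasted in, the file can be checked in while the secret stays in each developer's environment.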

## Troubleshooting

- **MCP client won't connect** → use `claude mcp add --transport http`; HTTP entries must carry `"type": "http"` (or its alias `streamable-http`) with a `url`.
- **A server reports OK then dies** → it likely needs credentials/permissions that fail after startup; check the server logs. If a remote server returns `401`/`403`, Claude Code flags it in `/mcp` for an OAuth login. If you set a static `Authorization` header that the server rejects, the connection is reported as failed rather than falling back to OAuth — verify the token or remove the header.
- **Server drops mid-session** → HTTP/SSE servers reconnect automatically with exponential backoff (up to five attempts); after that, retry from `/mcp`. Stdio (local process) servers are not reconnected automatically.
- **Local model too slow or weak** → use a smaller quantized model, or keep local inference for delegation and route hard tasks to the frontier orchestrator.
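
When a check fails only because a service is still starting, a retry with exponential backoff saves manual re-runs. The helper below is a generic sketch: `retry_backoff` is a name made up here, its delays are arbitrary, and it does not reproduce Claude Code's internal reconnect behavior.

```bash
#!/usr/bin/env bash
# Sketch: retry a command with exponential backoff before giving up.

# retry_backoff ATTEMPTS BASE_SECONDS COMMAND... -> succeeds as soon as
# COMMAND succeeds; fails after ATTEMPTS unsuccessful tries.
retry_backoff() {
  local attempts=$1 delay=$2 n
  shift 2
  for ((n = 1; n <= attempts; n++)); do
    if "$@"; then return 0; fi
    if ((n < attempts)); then
      echo "attempt $n failed; retrying in ${delay}s" >&2
      sleep "$delay"
      delay=$((delay * 2))   # double the wait each round
    fi
  done
  return 1
}

# Example: wait for the local Ollama API, then list configured MCP servers.
#   retry_backoff 5 1 curl -fsS -o /dev/null http://localhost:11434/api/tags \
#     && claude mcp list
```

With five attempts and a one-second base, the waits are 1, 2, 4, and 8 seconds, so the helper gives up after roughly 15 seconds of sleeping.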

## References

- Ollama CLI reference — https://docs.ollama.com/cli
- Ollama — https://ollama.com · https://github.com/ollama/ollama
- Claude Code — Connect to tools via MCP — https://code.claude.com/docs/en/mcp
- Model Context Protocol — https://modelcontextprotocol.io
- MCP servers — https://github.com/modelcontextprotocol/servers
- supergateway — https://github.com/supercorp-ai/supergateway
- mcp-supergateway-hub — https://github.com/dpdanpittman/mcp-supergateway-hub
- Claude Code — https://www.anthropic.com/claude-code
- Tailscale — https://tailscale.com

## Citation facts

Source-backed facts for citing this resource, derived from the registry.

- Canonical URL: https://heyclau.de/entry/guides/local-first-ai-dev-stack
- Source URLs: https://code.claude.com/docs/en/mcp, https://github.com/JSONbored/awesome-claude/blob/main/content/guides/local-first-ai-dev-stack.mdx
- Author: dpdanpittman
- Submitted by: dpdanpittman
- Claim status: unclaimed
- Last verified: 2026-06-02

## Safety notes

- Exposing MCP servers over HTTP makes their tools reachable on the network. Run them on a trusted/private network or behind authentication, never on a public interface.
- Self-hosted services are yours to patch and secure. A local model runtime executes code/tools on your machine, so only run models and servers you trust.

## Privacy notes

- The point of the stack is data locality: prompts, tool I/O, and memory stay on infrastructure you control instead of scattered across SaaS.
- Caveat: if Claude Code is the orchestrator, its prompts still go to Anthropic's model. For full data locality, run a local model loop (Ollama + an MCP-capable client) and accept the smaller-model tradeoffs.
- The memory knowledge graph persists on local disk. Secure it and back it up like any other sensitive store.

About this resource

TL;DR

A local-first AI developer stack keeps the parts that touch your code and data — tools, memory, and auxiliary models — on hardware you control, instead of scattering them across SaaS. You can still use a frontier model like Claude as the orchestrator, but the MCP servers it calls, the memory it reads, and any delegated or offline model work all run on your own box.

The payoff: better privacy (your data stays on your network), no per-tool lock-in, and a stack that keeps working on your own infrastructure.

What you'll build: a local model runtime (Ollama) + a self-hosted fleet of MCP servers reachable over HTTP + persistent local memory + Claude Code wired to all of it, optionally reachable from any dev machine over a private network.

Prerequisites & Requirements

{"task": "A host with enough RAM/VRAM for local models", "description": "16GB+ for small quantized models; a GPU helps for larger ones"}
{"task": "Node.js 18+ and Python 3.10+ (with uv)", "description": "Runtimes for the common MCP servers"}
{"task": "Claude Code or another MCP client", "description": "Acts as the orchestrator that calls your local tools"}
{"task": "Optional: Docker/Kubernetes + a private network", "description": "To host a server fleet and reach it from other machines (e.g., Tailscale)"}

Core Concepts Explained

What "local-first" actually means

Local-first is not the same as fully air-gapped. In this architecture a frontier orchestrator (Claude Code) still calls a cloud model — what's local is everything around it: the tools, the memory, your data, and any auxiliary models. That gets you most of the privacy and durability benefits without giving up frontier reasoning. If you need true locality (offline / air-gapped), you swap the orchestrator for a local model loop — a local LLM via Ollama driving an MCP-capable client — and accept that smaller models are less capable.

The four layers

Model runtime — a local inference server (Ollama) for delegation and offline work.
Tool layer — MCP servers, self-hosted and exposed over HTTP so any client or machine can reach them.
Memory — a memory/knowledge-graph MCP server persisting to local disk.
Orchestrator — Claude Code (or a local agent) that drives the loop, plus lifecycle hooks for automation.

Step-by-Step Implementation Guide

Install a local model runtime. Install Ollama, pull a small instruct model, and confirm the local API is up. Ollama serves on localhost:11434 by default; use the model for cheap delegation (summarizing logs, first-pass drafts) and as your offline fallback.

# Pull a model and run it interactively
ollama pull <model>
ollama run <model>

# Inspect local state
ollama list   # installed models
ollama ps     # currently loaded models

# Run the API server explicitly (it also starts on demand)
ollama serve  # listens on localhost:11434

Pick your MCP servers. Start from the official Model Context Protocol servers — filesystem, git, memory, fetch, and friends. These run as local stdio processes.
Self-host the fleet over HTTP. stdio is local-only, so bridge it: supergateway wraps a stdio MCP server as a streamable HTTP endpoint. To run many at once on one box (one port per server), use a hub like mcp-supergateway-hub.
Add persistent local memory. Run the memory MCP server with its store pointed at local disk. This is the knowledge graph your agent reads and writes across sessions — and it never leaves your machine.
Wire the orchestrator. Point Claude Code at each server's endpoint with claude mcp add --transport http. The HTTP transport is the recommended way to connect remote/networked MCP servers. Add --scope to control where the config is stored (local is the default; project writes a shareable .mcp.json; user makes it available across all your projects), and pass auth as an HTTP header when your gateway requires it.
```
# Add a self-hosted HTTP MCP server (default = local scope)
# Replace <url> with your gateway's endpoint for this server
claude mcp add --transport http memory <url>

# Make it available across all your projects
claude mcp add --transport http --scope user memory <url>

# Front your gateway with auth and pass a bearer token
claude mcp add --transport http secure-fs https://mcp.box.ts.net/mcp \
  --header "Authorization: Bearer your-token"

# Verify, inspect, and manage
claude mcp list
claude mcp get memory
claude mcp remove memory
```
You can also add a server from JSON with claude mcp add-json. HTTP entries use "type": "http" (the spec name streamable-http is accepted as an alias) with a url field and an optional headers object; .mcp.json supports ${VAR} / ${VAR:-default} expansion in url and headers, so secrets stay out of version control. For servers that need OAuth, run /mcp inside Claude Code to complete the browser login.
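As an illustration of the `.mcp.json` shape described above, the sketch below writes a project-scoped entry whose URL and token come from environment variables. `MCP_BASE_URL`, `MCP_TOKEN`, and the `write_mcp_config` helper are hypothetical names; the `secure-fs` server and its URL reuse the example from the previous block.

```shell
#!/usr/bin/env sh
# Write a project-scoped MCP config to the given path (default: ./.mcp.json).
# The quoted heredoc ('EOF') keeps ${...} literal, so the variables are
# expanded by Claude Code when it loads the file, not by the shell now.
write_mcp_config() {
  cat > "${1:-.mcp.json}" <<'EOF'
{
  "mcpServers": {
    "secure-fs": {
      "type": "http",
      "url": "${MCP_BASE_URL:-https://mcp.box.ts.net}/mcp",
      "headers": {
        "Authorization": "Bearer ${MCP_TOKEN}"
      }
    }
  }
}
EOF
}

# Usage:
#   write_mcp_config                 # writes ./.mcp.json
#   export MCP_TOKEN=...             # set the secret in your shell, not in the file
```

Because the file contains only variable names, it can be committed and shared while each developer supplies their own token.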
(Optional) Reach it from anywhere. Put the host on a private network (Tailscale) so your laptop, desktop, and any other machine connect to the same stack without exposing it to the internet.
(Optional) Add lifecycle hooks. Use Claude Code hooks (e.g., SessionStart, PreCompact, Stop) to automate the stack — restore context on start, snapshot state before compaction, etc.
(Optional) Go fully local. Drive a local agent loop with an Ollama model against the same MCP servers for offline or air-gapped work.
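For the lifecycle hooks step, one possible shape is a pair of shell helpers: one that prints saved notes at session start and one that appends a note before compaction or on stop. This is a sketch, not the author's setup: the notes file path and both function names are hypothetical. It relies on the documented behavior that stdout from a `SessionStart` command hook is added to the session context.

```shell
#!/usr/bin/env sh
# Hypothetical notes file kept on local disk.
NOTES_FILE="${NOTES_FILE:-$HOME/.local/share/mcp/session-notes.md}"

# Print saved notes, if any. Intended for a SessionStart hook, where
# the printed text becomes part of the new session's context.
restore_context() {
  if [ -s "$NOTES_FILE" ]; then
    echo "Notes restored from the previous session:"
    cat "$NOTES_FILE"
  fi
}

# Append a timestamped line to the notes file.
# Intended for a PreCompact or Stop hook.
snapshot_note() {
  mkdir -p "$(dirname "$NOTES_FILE")"
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" >> "$NOTES_FILE"
}

# To use as a hook, save this as a script that ends with a call to
# restore_context, then register it in .claude/settings.json, e.g.:
#   { "hooks": { "SessionStart": [ { "hooks": [
#       { "type": "command", "command": "sh /path/to/restore-context.sh" }
#   ] } ] } }
```

A plain file is the simplest store; the same hooks could instead call into the memory MCP server if you want the notes in the knowledge graph.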

Honest Limitations

Local models trail frontier models. Ollama models are great for delegation and offline use, but they are not always a drop-in replacement for a frontier orchestrator.
Not air-gapped by default. With Claude Code as the orchestrator, prompts still reach Anthropic. Full locality requires the local model loop from the final optional step, "Go fully local" (step 8).
You own the ops. Self-hosting means you patch, secure, and monitor the stack; exposing MCP over HTTP demands network discipline (private network or auth).
Hardware matters. Local models are bound by memory: plan on 16GB+ of RAM/VRAM for small quantized models, and a GPU for larger ones.

Command Reference

The commands that wire the stack together, grounded in the official Ollama CLI and Claude Code MCP docs.

Command	Layer	What it does
`ollama pull <model>`	Model runtime	Download a model to the local store
`ollama run <model>`	Model runtime	Run a model interactively (loads it if needed)
`ollama list` / `ollama ps`	Model runtime	List installed models / list currently loaded models
`ollama serve`	Model runtime	Start the API server on `localhost:11434`
`claude mcp add --transport http <name> <url>`	Orchestrator	Connect Claude Code to a networked MCP server over HTTP
`claude mcp add --transport http <name> <url> --header "Authorization: Bearer ..."`	Orchestrator	Add an HTTP server with an auth header
`claude mcp add ... --scope user`	Orchestrator	Store the server config for all projects (`local` is default; `project` writes `.mcp.json`)
`claude mcp add-json <name> '<json>'`	Orchestrator	Add a server from JSON (`"type": "http"`, with `url` + optional `headers`)
`claude mcp list` / `claude mcp get <name>` / `claude mcp remove <name>`	Orchestrator	List / inspect / remove configured servers
`/mcp` (inside Claude Code)	Orchestrator	Check server status and complete OAuth login for remote servers

MCP scopes are stored as follows: local and user live in ~/.claude.json; project lives in a checked-in .mcp.json at the project root. Adjust the MCP server startup timeout with the MCP_TIMEOUT environment variable (for example, MCP_TIMEOUT=10000 claude for a 10-second timeout).

Troubleshooting

MCP client won't connect → use claude mcp add --transport http; HTTP entries must carry "type": "http" (or its alias streamable-http) with a url.
A server reports OK then dies → it likely needs credentials/permissions that fail after startup; check the server logs. If a remote server returns 401/403, Claude Code flags it in /mcp for an OAuth login. If you set a static Authorization header that the server rejects, the connection is reported as failed rather than falling back to OAuth — verify the token or remove the header.
Server drops mid-session → HTTP/SSE servers reconnect automatically with exponential backoff (up to five attempts); after that, retry from /mcp. Stdio (local process) servers are not reconnected automatically.
Local model too slow or weak → use a smaller quantized model, or keep local inference for delegation and route hard tasks to the frontier orchestrator.
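When a connection fails, it can help to test the endpoint outside Claude Code. The sketch below sends the MCP `initialize` handshake to a streamable HTTP endpoint with `curl` and prints only the HTTP status code. The function names and the `probe` client name are hypothetical, and the protocol version string is an assumption that should match what your servers support.

```shell
#!/usr/bin/env sh
# JSON-RPC "initialize" request, the first message an MCP client sends.
mcp_initialize_payload() {
  cat <<'EOF'
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"probe","version":"0.0.1"}}}
EOF
}

# POST the handshake and print the HTTP status code.
# $1 = endpoint URL, $2 = optional bearer token.
mcp_probe() {
  url="$1"
  token="${2:-}"
  set -- --silent --output /dev/null --write-out '%{http_code}\n' --max-time 10 \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d "$(mcp_initialize_payload)"
  if [ -n "$token" ]; then
    set -- "$@" -H "Authorization: Bearer $token"
  fi
  curl "$@" "$url"
}

# Usage:
#   mcp_probe https://mcp.box.ts.net/mcp "your-token"
# A 200 suggests the endpoint and token are fine, 401/403 points at auth,
# and 000 means curl could not reach the host at all.
```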

References

Ollama CLI reference — https://docs.ollama.com/cli
Ollama — https://ollama.com · https://github.com/ollama/ollama
Claude Code — Connect to tools via MCP — https://code.claude.com/docs/en/mcp
Model Context Protocol — https://modelcontextprotocol.io
MCP servers — https://github.com/modelcontextprotocol/servers
supergateway — https://github.com/supercorp-ai/supergateway
mcp-supergateway-hub — https://github.com/dpdanpittman/mcp-supergateway-hub
Claude Code — https://www.anthropic.com/claude-code
Tailscale — https://tailscale.com

#local-first #self-hosted #mcp #ollama #privacy #homelab #claude-code


