Forge ships its own scheduler, sandbox, permission system, state machine, iterative tool-use executor, four-tier memory, and plugin ecosystem. You pick the model. You approve the actions. Everything is inspectable, replayable, and yours.
A TypeScript CLI runtime for local-first agentic software engineering. Every piece below lives in src/. Node 20+. Ships via npm and a multi-arch Docker image.
Classify → plan → approve → execute with iterative tool-use → validate → review → complete → learn. Failures escalate to diagnose — never a silent loop.
Tasks JSON, sessions JSONL, events JSONL. Conversations are JSONL with O_APPEND concurrency. Prompt hashes are deterministic.
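A sketch of both persistence primitives, assuming only Node's standard library (function names are illustrative, not Forge's actual API):

```typescript
import { createHash } from "node:crypto";
import { appendFileSync } from "node:fs";

// Deterministic prompt hash: canonicalize to a stable shape before hashing,
// so the same logical prompt always yields the same digest across runs.
function promptHash(prompt: { role: string; content: string }[]): string {
  const canonical = JSON.stringify(
    prompt.map((m) => ({ role: m.role, content: m.content })),
  );
  return createHash("sha256").update(canonical).digest("hex");
}

// Append-only JSONL event log: Node's "a" flag opens with O_APPEND, so
// concurrent writers (REPL, UI, subagents) interleave whole lines instead
// of clobbering each other.
function appendEvent(file: string, event: object): void {
  appendFileSync(file, JSON.stringify(event) + "\n", { flag: "a" });
}
```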
Every tool call classified by risk × side-effect × sensitivity. Paths realpath-confined. Shell risk-rated; critical hard-blocked. Credentials in OS keychain.
Auto-detects Ollama, LM Studio, vLLM, llama.cpp on default ports. 41 model families classified for role routing; auto-substitutes when your configured model isn't installed.
Every feature below is in src/. Grep from any claim to a file.
Auto-detects Ollama, LM Studio, vLLM, llama.cpp. Hosted Anthropic / OpenAI / Azure / Groq / LocalAI / Together / Fireworks are opt-in.
Model sees every tool result (stdout / stderr / exit) and adapts within a step. Mode-capped turn budgets.
Post-step typecheck / lint failures re-enter the loop as tool results — fixed before the next step runs.
Plans have step dependencies, risk annotations, explicit tool calls. Auto-fixer repairs common issues; cycles rejected.
Reviewer gates completion. On terminal failure, debugger agent diagnoses root cause before marking failed.
Hot (session) · warm (SQLite recent) · cold (lazy project index) · learning (patterns with decaying confidence).
Risk × side-effect × sensitivity classified at every call. --skip-permissions only waives routine prompts.
Every path resolved to realpath, confined to project root. Always-forbidden targets (SSH keys, AWS creds) hard-blocked.
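The confinement check can be sketched as follows — the real implementation resolves symlinks via realpath first; this sketch uses path.resolve only, so treat it as the shape of the check, not the full defense:

```typescript
import { resolve, sep } from "node:path";

// Sandbox confinement sketch: resolve the candidate against the project
// root and require the result to stay under it. Forge's real check also
// calls fs.realpathSync to defeat symlink escapes; omitted here.
function isInsideRoot(root: string, candidate: string): boolean {
  const absRoot = resolve(root);
  const abs = resolve(absRoot, candidate);
  return abs === absRoot || abs.startsWith(absRoot + sep);
}
```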
Commands rated before execution. rm -rf /, sudo, fork bombs, curl-to-shell hard-blocked.
macOS Security, libsecret, Windows DPAPI. AES-GCM encrypted fallback if unavailable.
REPL + UI + subagents edit the same conversation via POSIX O_APPEND + mkdir lockfile fallback.
Untrusted content (web / MCP) fenced as data, never instructions. Redactor scrubs secrets before logs.
Model Context Protocol: stdio + HTTP-stream. OAuth 2.0 + PKCE or API-key auth. Tokens in keychain.
Markdown + YAML frontmatter in ~/.forge/skills/. Per-project overrides.
HTTP + WebSocket UI. Vanilla JS, < 100 KB, zero CDN. Delta watchers ref-counted across tabs.
Per-provider rate limit, circuit breaker, prompt cache, USD cost ledger. 1.5 s provider probes.
Manifest signed with Ed25519. SHA-256 per artefact. npm publishes with provenance.
Single Dockerfile serves CLI + UI. Non-root, HEALTHCHECK, OCI labels, ~355 MB.
Screen captures of each Forge surface — the interactive REPL, the one-shot CLI, and the web dashboard — all driving the same runtime. The VS Code extension has its own showcase below.
Multi-turn prompts with slash-command autocomplete, status line, digit shortcuts for prompts, streamed markdown rendering, and live file-change tracking.
forge run "…" launches a full classify → plan → approve → execute → verify pipeline in the terminal with a progress rail and completion block.
Live WebSocket stream of plan approval, permission prompts, model deltas, and task results. Historical tasks replay from disk; follow-ups thread the conversation.
The same src/core/orchestrator.ts runtime drives all three surfaces. Any task you run in one surface is a real row in the SQLite index — pickable from another surface, visible in forge sessions, cancellable from the dashboard.
Deltas stream token-by-token from the provider (emitDelta → event bus → WebSocket / REPL progress rail). Markdown reflows in place so headings, fences, and lists form up live instead of dumping at the end.
- REPL: forge
- One-shot: forge run "summarize src/core/loop.ts"
- Dashboard: forge ui start  # http://127.0.0.1:7823
A first-class Forge surface that lives next to your code. Same runtime, same persisted state, same agents — surfaced through an activity-bar sidebar, integrated terminals, and an embedded dashboard webview.
The activity-bar webview reads straight from ~/.forge/global/index.db via the system sqlite3, so lifetime stats (tokens, calls, task counts) stay accurate even with no Forge process running. When the dashboard server is up, the sidebar layers in live provider state on top.
Click any task in the recent list and the embedded webview opens directly to its conversation view — not the dashboard home. Cross-project lookups work because the runtime's /api/tasks/:id endpoint resolves the project automatically from the index.
- From the command line: code --install-extension hoangsonw.forge-agentic-coding-cli
- From the Marketplace: https://marketplace.visualstudio.com/items?itemName=hoangsonw.forge-agentic-coding-cli
- Prereq: npm install -g @hoangsonw/forge  # the runtime
Highlight a TODO comment, right-click → Run Selection as Task. Or use the whole buffer with Use Active File as Task. Each run opens its own integrated terminal and streams in real time.
One click launches forge ui start in the background, polls until reachable, and embeds the dashboard in a webview right next to your code. Reload + open-in-browser shortcuts at the top.
Single rocket pill flips between live and idle. The sidebar's workspace card shows cwd, url, provider and mode with one-click copy. Change Working Directory retargets per workspace.
Source: src/core/loop.ts. Retry cap is 3. The debugger agent runs root-cause diagnosis before marking a task failed.
---
config:
look: handDrawn
theme: base
themeVariables:
fontSize: 16px
---
flowchart LR
IN(("USER
prompt")) --> CLS["CLASSIFY
intent · risk · scope"]
CLS --> PL["PLAN
DAG · steps · deps"]
PL --> AP{"Approve?"}
AP -- edit --> PL
AP -- no --> CNCL(["cancelled"])
AP -- yes --> EX["EXECUTE
iterative tool-use"]
EX --> VG{"Validate
(tsc · lint)"}
VG -- fails + budget --> EX
VG -- fails + out --> DX["DIAGNOSE"]
DX --> FL(["failed"])
VG -- pass --> RV["REVIEW
reviewer agent"]
RV -- bounce --> EX
RV -- pass --> DONE(["completed"])
DONE --> LRN["LEARN
patterns updated"]
classDef term fill:#0a1a14,stroke:#10b981,color:#d1fae5,stroke-width:2px
classDef fail fill:#1a0909,stroke:#f87171,color:#fee2e2,stroke-width:2px
classDef step fill:#0f1726,stroke:#38bdf8,color:#e0f2fe,stroke-width:1.8px
classDef gate fill:#1a1634,stroke:#a78bfa,color:#ede9fe,stroke-width:1.8px
classDef io fill:#0c1a24,stroke:#22d3ee,color:#cffafe,stroke-width:2px
class IN io
class CNCL,FL fail
class DONE term
class CLS,PL,EX,RV,DX,LRN step
class AP,VG gate
Enforced by LEGAL_TRANSITIONS in src/persistence/tasks.ts. Illegal moves throw state_invalid. Terminal states only re-enter via forge resume, which resets them to draft.
---
config:
theme: base
themeVariables:
fontSize: 16px
---
stateDiagram-v2
direction LR
[*] --> draft
draft --> planned : planner output
draft --> cancelled : user
planned --> approved : user approves
planned --> blocked : missing deps
planned --> cancelled
approved --> scheduled
approved --> cancelled
scheduled --> running
scheduled --> blocked
scheduled --> cancelled
running --> verifying
running --> failed
running --> blocked
running --> cancelled
verifying --> completed
verifying --> failed
verifying --> running : reviewer bounces
completed --> draft : forge resume
failed --> draft : forge resume
blocked --> draft : forge resume
cancelled --> draft : forge resume
blocked --> cancelled
completed --> [*]
failed --> [*]
cancelled --> [*]
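The diagram above maps directly onto a transition table. A sketch (state names from the diagram; the helper name is illustrative):

```typescript
type TaskState =
  | "draft" | "planned" | "approved" | "scheduled" | "running"
  | "verifying" | "completed" | "failed" | "blocked" | "cancelled";

// Mirrors the state diagram: each state lists the states it may move to.
const LEGAL_TRANSITIONS: Record<TaskState, TaskState[]> = {
  draft:     ["planned", "cancelled"],
  planned:   ["approved", "blocked", "cancelled"],
  approved:  ["scheduled", "cancelled"],
  scheduled: ["running", "blocked", "cancelled"],
  running:   ["verifying", "failed", "blocked", "cancelled"],
  verifying: ["completed", "failed", "running"], // running = reviewer bounce
  // Terminal states only re-enter via `forge resume`, which resets to draft.
  completed: ["draft"],
  failed:    ["draft"],
  blocked:   ["draft", "cancelled"],
  cancelled: ["draft"],
};

function assertTransition(from: TaskState, to: TaskState): void {
  if (!LEGAL_TRANSITIONS[from].includes(to)) {
    throw new Error(`state_invalid: ${from} -> ${to}`);
  }
}
```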
Model sees every tool result — stdout, stderr, exit, error — and can adapt. Source: src/agents/executor.ts.
---
config:
theme: base
themeVariables:
fontSize: 15px
actorFontSize: 14px
messageFontSize: 13px
---
sequenceDiagram
autonumber
participant L as loop.ts
participant E as executor
participant M as model
participant T as tool
participant V as validator
L->>E: runStep(step)
loop up to maxExecutorTurns
E->>M: prompt + JSON schema
M-->>E: {actions, done?}
alt done
E-->>L: completed
else actions
E->>T: execute
T-->>E: stdout / stderr / exit
E->>E: digest + append
end
end
opt files changed
loop up to maxValidationRetries
E->>V: typecheck / lint
alt pass
E-->>L: completed
else fail
E->>M: VALIDATION_FAILED
M-->>E: corrective actions
E->>T: execute
end
end
end
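The inner loop above can be sketched with the model and tools injected as callbacks (names illustrative; the real executor lives in src/agents/executor.ts):

```typescript
type Action = { tool: string; args: Record<string, unknown> };
type ModelTurn = { actions: Action[]; done: boolean };

// One step of the iterative executor: the model proposes actions, every
// tool result is appended to the transcript, and the model sees it on the
// next turn. callModel/runTool are injected to keep the sketch
// provider-agnostic; the turn budget comes from the active mode.
function runStep(
  maxExecutorTurns: number,
  callModel: (transcript: string[]) => ModelTurn,
  runTool: (a: Action) => string,
): { status: "completed" | "budget_exhausted"; transcript: string[] } {
  const transcript: string[] = [];
  for (let turn = 0; turn < maxExecutorTurns; turn++) {
    const { actions, done } = callModel(transcript);
    if (done) return { status: "completed", transcript };
    for (const action of actions) {
      transcript.push(runTool(action)); // digest + append
    }
  }
  return { status: "budget_exhausted", transcript };
}
```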
Planner reads top-K learning patterns before every plan.
---
config:
theme: base
themeVariables:
fontSize: 15px
---
flowchart TB
Q["query
retrieve.ts"] --> H["🔥 HOT
in-session facts
cleared on task end"]
Q --> W["☀️ WARM
recent tasks · SQLite
ages out"]
Q --> C["❄️ COLD
project files · grep · AST
lazy-indexed"]
Q --> L["🧠 LEARNING
patterns + confidence
decays if unused"]
classDef t fill:#0f1726,stroke:#38bdf8,color:#e0f2fe,stroke-width:2px
classDef src fill:#0c1a24,stroke:#22d3ee,color:#cffafe,stroke-width:2px
class Q src
class H,W,C,L t
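The learning tier's decaying confidence can be sketched as exponential decay over time since last use (the 30-day half-life is an assumed value, not Forge's actual tuning):

```typescript
interface Pattern { text: string; confidence: number; lastUsedMs: number }

// Illustrative half-life: a pattern unused for 30 days keeps half its score.
const HALF_LIFE_MS = 30 * 24 * 60 * 60 * 1000;

// Stale patterns fall out of the top-K naturally as their score decays.
function decayedConfidence(p: Pattern, nowMs: number): number {
  const age = Math.max(0, nowMs - p.lastUsedMs);
  return p.confidence * Math.pow(0.5, age / HALF_LIFE_MS);
}

function topK(patterns: Pattern[], k: number, nowMs: number): Pattern[] {
  return [...patterns]
    .sort((a, b) => decayedConfidence(b, nowMs) - decayedConfidence(a, nowMs))
    .slice(0, k);
}
```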
6 providers: four local ones auto-detected on default ports, plus Anthropic and OpenAI-compatible hosted endpoints. 41 model families classified for role routing.
---
config:
theme: base
themeVariables:
fontSize: 15px
---
flowchart LR
R["router
resolveModel"] --> AD["adapter
resolveLocalModel"]
AD --> L1["🟢 ollama
:11434"]
AD --> L2["🔵 lmstudio
:1234"]
AD --> L3["🟠 vllm
:8000"]
AD --> L4["🟡 llama.cpp
:8080"]
R --> H1["⬛ anthropic"]
R --> H2["⬛ openai-compat"]
R --> RL["rate limit"]
R --> CB["circuit breaker"]
R --> PC["prompt cache"]
R --> CT["USD ledger"]
classDef route fill:#0f1726,stroke:#38bdf8,color:#e0f2fe,stroke-width:2px
classDef local fill:#0a1a14,stroke:#10b981,color:#d1fae5,stroke-width:2px
classDef hosted fill:#1a1634,stroke:#a78bfa,color:#ede9fe,stroke-width:2px
classDef util fill:#16121a,stroke:#f472b6,color:#fce7f3,stroke-width:1.8px
class R,AD route
class L1,L2,L3,L4 local
class H1,H2 hosted
class RL,CB,PC,CT util
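The router's circuit breaker can be sketched as a small state machine (threshold and cooldown values are assumed, and the clock is injected for testability — this is the shape of the mechanism, not Forge's tuning):

```typescript
// Per-provider breaker: after `threshold` consecutive failures the circuit
// opens and calls are rejected until `cooldownMs` passes, at which point
// one probe is allowed through (half-open). A success closes it again.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private threshold = 3,
    private cooldownMs = 30_000,
    private now: () => number = Date.now,
  ) {}

  canRequest(): boolean {
    if (this.openedAt === null) return true; // closed
    return this.now() - this.openedAt >= this.cooldownMs; // half-open probe
  }

  onSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  onFailure(): void {
    this.failures++;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```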
| Role | Preferred families |
|---|---|
| architect · reviewer · debugger | Llama 3.x / 4.x, Mixtral, Command-R+, DeepSeek V3 / R1, Mistral-Large |
| planner | Qwen 2.5 / 3, Llama 3.x, DeepSeek V3, Gemma 3, Mistral-Nemo, Command-R, Phi 4 |
| executor (code) | DeepSeek-Coder, Qwen 2.5-Coder, CodeLlama, Codestral, StarCoder, Granite-Code |
| fast | Phi 3 / 4, Gemma 2, TinyLlama, SmolLM, MiniCPM |
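Role routing reduces to a family matcher. A sketch built from the table above — the substring patterns are an illustrative subset, not the real classifier covering all 41 families:

```typescript
type Role = "architect" | "planner" | "executor" | "fast";

// Order matters: code specialists first, then small/fast families, then
// heavyweight generalists, then planner-grade generalists.
const FAMILY_HINTS: [RegExp, Role][] = [
  [/coder|codellama|codestral|starcoder|granite-code/, "executor"],
  [/phi|gemma2|tinyllama|smollm|minicpm/, "fast"],
  [/mixtral|command-r-plus|mistral-large/, "architect"],
  [/qwen|llama3|deepseek|gemma3|mistral-nemo|command-r/, "planner"],
];

function roleFor(model: string): Role {
  const id = model.toLowerCase();
  for (const [pattern, role] of FAMILY_HINTS) {
    if (pattern.test(id)) return role;
  }
  return "planner"; // conservative default for unknown families
}
```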
The agentic loop is multi-turn tool use with strict JSON output. Small local models can drive it, but not every kind of work is realistic at every size. Pick by the work you intend to do, and set a hosted fallback for when you hit the ceiling — the router degrades gracefully via its circuit breaker.
| Work | Local floor we trust | Example pulls |
|---|---|---|
| Chat / concept Q&A | 3B instruct | phi3:mini · gemma3:2b · qwen2.5:3b |
| Summarize / explain code | 7B instruct | qwen2.5:7b · llama3.1:8b |
| Single-file edits / small features | 7B+ code specialist | deepseek-coder:6.7b · qwen2.5-coder:7b |
| Multi-file refactors / new features | 14B+ code specialist | qwen2.5-coder:14b · deepseek-coder:33b |
| Architecture-level changes | hosted only, realistically | Claude Opus/Sonnet · GPT-4-class |
Below the tier floor, models fail in recognisable ways. Forge catches each so a small model fails loudly instead of corrupting state.
| Failure mode | Runtime guard |
|---|---|
| Picks run_command to write file contents | Executor prompt spells out step.type → tool mapping and forbids run_command for file writes. |
| Escalates to ask_user on any tool error, stalling the step | ask_user rejects empty / too-short questions as non-retryable; the model has to switch tools. |
| Splits "create empty file → edit to fill" | edit_file with oldText="" on an empty/missing file writes the full body. |
| write_file ENOENT because parent dir doesn't exist | createDirs defaults to true (mkdir -p). |
| Cold-load timeout interpreted as model failure | Headers-timeout floor 300 s; proactive warm() with /api/ps preflight. |
| Reviewer rejects analysis tasks for "no file changes" | Classifier sets requiresReview=false for intent=analysis; narrator pass writes the real answer. |
| Two concurrent edits race on the same file | Per-process path mutex + atomic temp+rename. |
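The empty-oldText guard can be sketched as a pure function over file contents (names illustrative; the real tool also does the atomic temp+rename):

```typescript
// Guard for the "create empty file → edit to fill" failure mode: edit_file
// with oldText="" against an empty or missing file writes the full body
// instead of failing to match.
function applyEdit(
  current: string | null, // null = file does not exist
  oldText: string,
  newText: string,
): string {
  if (oldText === "" && (current === null || current === "")) {
    return newText; // treat as a full-body write
  }
  if (oldText === "") {
    throw new Error("edit_failed: empty oldText on non-empty file");
  }
  const base = current ?? "";
  if (!base.includes(oldText)) {
    throw new Error("edit_failed: oldText not found");
  }
  return base.replace(oldText, newText);
}
```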
---
config:
theme: base
themeVariables:
fontSize: 15px
---
flowchart TB
REQ["tool call"] --> C["classify
risk · sideEffect · sensitivity"]
C --> S{"path in sandbox?
cmd allow-listed?"}
S -->|"no"| X["⛔ HARD-BLOCK
sandbox_violation"]
S -->|"yes"| G{"risk × sideEffect"}
G -->|"low / read"| A["✅ auto-allow"]
G -->|"med / write"| K["❓ ask user"]
G -->|"high / exec"| ST["🔒 ask · strict"]
K --> F{"session flags?"}
F -->|"allow-* flag"| A
F -->|"non-interactive"| D["⛔ deny silently"]
F -->|"interactive"| P["user prompt"]
P -->|"allow"| A
P -->|"deny"| D
A --> E["execute"]
E --> TR["trust calibration
auto-allow after N confirms"]
classDef ok fill:#0a1a14,stroke:#10b981,color:#d1fae5,stroke-width:2px
classDef bad fill:#1a0909,stroke:#f87171,color:#fee2e2,stroke-width:2px
classDef gate fill:#1a1634,stroke:#a78bfa,color:#ede9fe,stroke-width:2px
classDef step fill:#0f1726,stroke:#38bdf8,color:#e0f2fe,stroke-width:1.8px
class A,E,TR ok
class X,D bad
class S,G,F gate
class REQ,C,K,ST,P step
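The hard-block stage can be sketched as pattern matching over the command string — an illustrative subset of the patterns named above (rm -rf /, sudo, fork bombs, curl-to-shell), not the real rater's full list:

```typescript
type CommandRisk = "low" | "medium" | "high" | "blocked";

// Critical patterns are hard-blocked before any prompt is shown.
const HARD_BLOCK: RegExp[] = [
  /\brm\s+(-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\s+\/(\s|$)/, // rm -rf /
  /\bsudo\b/,
  /:\(\)\s*\{.*\};\s*:/,                                 // classic fork bomb
  /\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b/,                  // curl-to-shell
];

// Mutating commands are rated high (ask, strict); installers medium.
const WRITE_HINT = /\b(rm|mv|chmod|chown|dd|git\s+push)\b/;

function rateCommand(cmd: string): CommandRisk {
  if (HARD_BLOCK.some((p) => p.test(cmd))) return "blocked";
  if (WRITE_HINT.test(cmd)) return "high";
  if (/\b(npm|pip|cargo|make)\b/.test(cmd)) return "medium";
  return "low";
}
```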
Each mode is a runtime cap, not a hint. Read from src/core/mode-policy.ts.
| Mode | Executor turns | Validation retries | Mutations | Max auto-risk |
|---|---|---|---|---|
| fast | 2 | 0 | yes | low |
| balanced | 4 | 1 | yes | medium |
| heavy | 8 | 2 | yes | high |
| plan | 0 → 1 | 0 | no | low |
| execute | 4 | 1 | yes | medium |
| audit | 3 | 0 | no | low |
| debug | 6 | 2 | yes | medium |
| architect | 3 | 1 | yes | medium |
| offline-safe | 3 | 1 | yes | medium |
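The same table as a typed constant — field names are illustrative; values are copied from the table, with plan's "0 → 1" modelled here as a single non-mutating turn:

```typescript
interface ModePolicy {
  executorTurns: number;
  validationRetries: number;
  mutations: boolean;
  maxAutoRisk: "low" | "medium" | "high";
}

// Runtime caps per mode, mirroring the table above.
const MODE_POLICY: Record<string, ModePolicy> = {
  fast:           { executorTurns: 2, validationRetries: 0, mutations: true,  maxAutoRisk: "low" },
  balanced:       { executorTurns: 4, validationRetries: 1, mutations: true,  maxAutoRisk: "medium" },
  heavy:          { executorTurns: 8, validationRetries: 2, mutations: true,  maxAutoRisk: "high" },
  plan:           { executorTurns: 1, validationRetries: 0, mutations: false, maxAutoRisk: "low" },
  execute:        { executorTurns: 4, validationRetries: 1, mutations: true,  maxAutoRisk: "medium" },
  audit:          { executorTurns: 3, validationRetries: 0, mutations: false, maxAutoRisk: "low" },
  debug:          { executorTurns: 6, validationRetries: 2, mutations: true,  maxAutoRisk: "medium" },
  architect:      { executorTurns: 3, validationRetries: 1, mutations: true,  maxAutoRisk: "medium" },
  "offline-safe": { executorTurns: 3, validationRetries: 1, mutations: true,  maxAutoRisk: "medium" },
};
```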
# Core
forge                          # REPL (default)
forge init                     # create ~/.forge + ./.forge
forge run "<prompt>"           # full agentic loop
forge plan "<prompt>"          # plan-only
forge execute "<prompt>"       # auto-approve + execute
forge resume [taskId]          # resume any prior task
forge status                   # runtime state
forge doctor                   # health + role→model mapping

# State inspection
forge task list|search         # task history
forge session list|replay      # session JSONL
forge memory {hot|warm|cold}   # memory layers

# Models & config
forge model list
forge config get|set|path
forge cost                     # USD ledger

# Integrations
forge mcp list|add|remove
forge skills list|new
forge agents list
forge web {search|fetch}

# Ops
forge ui start                 # dashboard :7823
forge daemon start|stop|status
forge container up|down        # compose wrapper
forge bundle pack|unpack       # offline bundles
forge update                   # self-update
---
config:
theme: base
themeVariables:
fontSize: 14px
---
flowchart TB
subgraph GLOBAL["global · ~/.forge"]
G1[config.json]
G2[instructions.md]
G3[skills/]
G4[agents/]
G5[mcp/]
G6[index.db]
G7["projects · tasks · sessions · events"]
end
subgraph PROJECT["per-project · ./.forge"]
P1[config.json]
P2[instructions.md]
P3[skills/]
P4[agents/]
P5[mcp/]
end
---
name: conventional-commit
description: Enforce Conventional Commits.
triggers: [commit, git]
---
When writing commit messages, use Conventional Commits:
feat(scope): …
fix(scope): …
refactor(scope): …
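A loader for skill files like the one above can be sketched as follows — hand-rolled and flat-keys-only for illustration; Forge's real loader uses the yaml package:

```typescript
interface Skill {
  meta: Record<string, string>;
  body: string;
}

// Split "---\n<frontmatter>\n---\n<body>" and parse only flat key: value
// lines, which is enough for the conventional-commit example.
function parseSkill(source: string): Skill {
  const match = /^---\n([\s\S]*?)\n---\n?([\s\S]*)$/.exec(source);
  if (!match) return { meta: {}, body: source };
  const meta: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const i = line.indexOf(":");
    if (i > 0) meta[line.slice(0, i).trim()] = line.slice(i + 1).trim();
  }
  return { meta, body: match[2].trim() };
}
```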
forge mcp list
forge mcp add linear --transport stdio --command "mcp-linear-server"
forge mcp add postgres --transport http --url https://mcp.example/v1 --auth oauth2-pkce
forge mcp status
Forge runs on any platform Node 20 runs on, or anywhere Docker runs. There is no host-side Python, Rust, or Go requirement. better-sqlite3 is the only native module and ships prebuilts for every supported triple — no toolchain needed on npm install.
Node.js ≥ 20 (22 tested).
OS: macOS · Linux · Windows (native or WSL).
Architectures: x64 · arm64.
Docker ≥ 25 (only if you prefer the container path).
Disk: ~150 MB node_modules; state under ~/.forge grows with session history (override with FORGE_HOME).
RAM: ~100 MB for Forge itself. Your local model uses its own RAM/VRAM on top.
Cold start: forge doctor ~170 ms.
Local: Ollama · LM Studio · vLLM · llama.cpp — auto-detected on standard ports.
Hosted: ANTHROPIC_API_KEY · OPENAI_API_KEY (+ OPENAI_BASE_URL for any OpenAI-compatible server).
forge doctor probes all of them and tells you which are reachable.
13 runtime packages, zero optional dependencies. Listed below so you can audit them before npm install.
@modelcontextprotocol/sdk   # MCP bridge (stdio / http_stream / websocket)
better-sqlite3              # local index DB · FTS5 cold memory · native, prebuilt
chalk                       # ANSI color
cli-table3                  # tables in `forge doctor`, `task list`
commander                   # CLI argv parsing
dotenv                      # .env loading
ora                         # progress spinner
prompts                     # non-TTY fallback for the numbered-select helper
semver                      # update-check version comparison
undici                      # HTTP client · Ollama / Anthropic / OpenAI streams
ws                          # UI dashboard WebSocket
yaml                        # skill-file frontmatter
zod                         # runtime validation of plans & tool args
ripgrep — fast path for the grep tool; falls back to a Node glob walker.
git — enables git_diff / git_status tools and project-root detection.
$EDITOR — used when you pick "Edit" on a plan approval; falls back to vi.
npm i -g @hoangsonw/forge
code --install-extension \
  hoangsonw.forge-agentic-coding-cli
docker run --rm -it \
-v forge-home:/data \
-v "$PWD:/workspace" \
ghcr.io/hoangsonw/forge-agentic-coding-cli:latest
docker compose \
-f docker/docker-compose.yml \
up -d
# podman-compose works
---
config:
theme: base
themeVariables:
fontSize: 14px
---
flowchart LR
subgraph BUILD["Stage 1 · builder"]
direction TB
B1["node:20-slim"] --> B2["npm ci
tsc + copy-assets"]
B2 --> B3[npm prune --omit=dev]
end
subgraph RUN["Stage 2 · runtime · ~355 MB"]
direction TB
R1["node:20-slim"] --> R2["apt: git · ripgrep · tini"]
R2 --> R3[non-root uid 10001]
R3 --> R4[pruned node_modules + dist]
R4 --> R5[HEALTHCHECK · forge doctor]
R5 --> R6[OCI labels]
end
BUILD -.dist + prod deps.-> RUN
classDef s fill:#0f1726,stroke:#38bdf8,color:#e0f2fe,stroke-width:1.8px
class B1,B2,B3,R1,R2,R3,R4,R5,R6 s
---
config:
theme: base
themeVariables:
fontSize: 14px
---
flowchart LR
PR(["PR / push"]) --> FMT["🎨 format"]
PR --> LINT["🧹 lint"]
PR --> TYPE["🧠 typecheck"]
PR --> TEST["🧪 test matrix
Ubuntu · macOS
Node 20 · 22"]
TEST --> COV["📈 coverage"]
TYPE --> BUILD["🏗️ build"]
BUILD --> DOCKER["🐳 docker-build"]
PR --> AUDIT["🔐 audit"]
FMT --> S["📊 pipeline
status"]
LINT --> S
TYPE --> S
TEST --> S
BUILD --> S
DOCKER --> S
AUDIT --> S
COV --> S
classDef job fill:#0f1726,stroke:#38bdf8,color:#e0f2fe,stroke-width:2px
classDef tg fill:#0c1a24,stroke:#22d3ee,color:#cffafe,stroke-width:2px
classDef sum fill:#1a1634,stroke:#a78bfa,color:#ede9fe,stroke-width:2px
class PR tg
class FMT,LINT,TYPE,TEST,COV,BUILD,DOCKER,AUDIT job
class S sum
The release pipeline runs on every v* tag.
---
config:
theme: base
themeVariables:
fontSize: 14px
---
flowchart LR
T(["git tag v*"]) --> G["🧪 pre-release
gate"]
G --> A["📦 artifacts
5 targets"]
G --> D["🐳 docker
multi-arch → GHCR"]
A --> M["📝 manifest
ed25519-signed"]
M --> N["📤 npm publish
--provenance"]
G --> R["📊 release
status"]
A --> R
D --> R
M --> R
N --> R
classDef tg fill:#0c1a24,stroke:#22d3ee,color:#cffafe,stroke-width:2px
classDef job fill:#0f1726,stroke:#38bdf8,color:#e0f2fe,stroke-width:2px
classDef ship fill:#1a1409,stroke:#fb923c,color:#ffedd5,stroke-width:2px
classDef sum fill:#1a1634,stroke:#a78bfa,color:#ede9fe,stroke-width:2px
class T tg
class G job
class A,D,M,N ship
class R sum
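Manifest signing and verification can be sketched with Node's built-in Ed25519 support — the in-memory key pair here is purely illustrative; the real pipeline signs the release manifest in CI with a long-lived key:

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Illustrative key pair; in the real pipeline the private key lives in CI
// secrets and only the public key ships with the updater.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

function signManifest(manifest: object): Buffer {
  // Ed25519 hashes internally, so the algorithm argument is null.
  return sign(null, Buffer.from(JSON.stringify(manifest)), privateKey);
}

function verifyManifest(manifest: object, signature: Buffer): boolean {
  return verify(null, Buffer.from(JSON.stringify(manifest)), publicKey, signature);
}
```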
All measured locally — reproducers in the table at the bottom. No synthetic benchmarks, no comparisons against straw-man tools.
| Target | Measured | Reproducer |
|---|---|---|
| forge doctor cold-start | 173 ms | time node bin/forge.js doctor --no-banner |
| forge --help cold-start | 238 ms | time node bin/forge.js --help |
| full test suite | ~3.3 s | npx vitest run |
| UI app.js uncompressed | 89 KB | wc -c src/ui/public/app.js |
| container image | ~355 MB | docker images ghcr.io/hoangsonw/forge-agentic-coding-cli |
| CDN fetches at runtime | 0 | inspect app.js · no external URLs |
| provider probe timeout | 1.5 s | src/models/openai.ts#isAvailable |
Context files so agents don't re-learn the repo every turn.
OpenAI AGENTS.md convention. Flat Markdown cheat-sheet: identity, commands, layout, rules, testing patterns, CI reference, performance + security posture.
Repo identity, hot paths, conventions, pre-completion checklist, style prefs. Kept short and grep-able so every line is load-bearing.
Cold-start 173 ms. UI shell 89 KB · zero CDN. Providers probe with 1.5 s timeouts.