Forge

Autonomous engineering needs judgment, not just code generation.

Forge is a local-first Rust framework that runs AI coding agents like Codex CLI, Claude Code, and Gemini CLI inside a governed engineering loop. It schedules work, routes tasks, remembers decisions, monitors quality, and delivers changes through git.

Single Rust binary
SQLite memory
Executor-agnostic
Git-native


forge heartbeat --repo ./workspace/app
next run in 14m

Queued 08
Running 01
Needs review 03
Blocked 02

Latest receipt

Selected ui-polish-142, acquired repo lock, applied policy budget, invoked Codex CLI, ran checks, captured diff notes, and opened a review branch with follow-up monitor tasks attached.

Core thesis

Autonomous engineering systems need scheduling, policy, memory, monitoring, and review. The coding model is only one executor in a larger operating system for repo work.

Operating model

Forge behaves like a small autonomous engineering organization: it discovers useful work, executes bounded tasks, evaluates the output, and turns failures into follow-up work.

Task lifecycle

From a vague repo signal to a reviewable change.

Discover

Mine issues, TODOs, monitor failures, stale branches, and product notes.

Shape

Turn the signal into a bounded task with files, budget, checks, and risk level.

Execute

Start the chosen coding executor with repo memory and explicit constraints.

Prove

Run tests, capture logs, inspect UI, and summarize the actual behavior change.

Deliver

Open a PR, request review, merge under policy, or queue targeted follow-ups.
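The five stages above can be read as a small state machine. The sketch below is illustrative only: the `Stage` enum, `next`, and `lifecycle` are hypothetical names chosen for this example, not Forge's actual types.

```rust
/// Hypothetical stages mirroring the lifecycle described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    Discover,
    Shape,
    Execute,
    Prove,
    Deliver,
}

impl Stage {
    /// Returns the stage that follows this one, or `None` after `Deliver`.
    pub fn next(self) -> Option<Stage> {
        match self {
            Stage::Discover => Some(Stage::Shape),
            Stage::Shape => Some(Stage::Execute),
            Stage::Execute => Some(Stage::Prove),
            Stage::Prove => Some(Stage::Deliver),
            Stage::Deliver => None,
        }
    }
}

/// Walks every stage in order, starting from `Discover`.
pub fn lifecycle() -> Vec<Stage> {
    let mut current = Stage::Discover;
    let mut stages = vec![current];
    while let Some(next) = current.next() {
        stages.push(next);
        current = next;
    }
    stages
}
```

Keeping the transition in one `match` means a task cannot skip from Shape to Deliver without passing through Prove, which is the point of the lifecycle.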

Build Loop

Runs every 10 to 30 minutes. Selects queued or discovered work, enforces policy, invokes an executor, captures artifacts, and delivers the result as a PR or a direct commit, depending on the delivery mode.

Monitor Loop

Runs hourly and after builds. Validates tests, functionality, UX, product fit, and repo health. Rejected work creates follow-up tasks instead of pretending the job is done.
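One way to picture "rejected work creates follow-up tasks" is a function from a monitor verdict to new queue entries. This is a minimal sketch under assumptions: `Verdict`, `FollowUp`, and `follow_ups` are hypothetical names, and the "Fix: ..." title format is invented for illustration.

```rust
/// Hypothetical verdict a monitor pass might return for one finished task.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Verdict {
    Accepted,
    Rejected { reasons: Vec<String> },
}

/// Hypothetical follow-up task that points back at the rejected work.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FollowUp {
    pub parent_task: String,
    pub title: String,
}

/// Produces one follow-up per rejection reason, and none for accepted work.
pub fn follow_ups(task_id: &str, verdict: &Verdict) -> Vec<FollowUp> {
    match verdict {
        Verdict::Accepted => Vec::new(),
        Verdict::Rejected { reasons } => reasons
            .iter()
            .map(|reason| FollowUp {
                parent_task: task_id.to_string(),
                title: format!("Fix: {}", reason),
            })
            .collect(),
    }
}
```

Each rejection reason becomes its own bounded task, so a failed run feeds the queue instead of ending silently.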

Agent lanes

Forge keeps agent responsibilities explicit.

Planner

Ranks candidate work, defines scope, and writes the task contract.

Builder

Owns the code change, edits files, and produces a runnable diff.

Reviewer

Checks behavior, tests, diff quality, and alignment with repo standards.

Operator

Controls autonomy level, protected paths, budgets, delivery mode, and pauses.

Architecture

Scheduler, coordinator, queue, memory, policies, executors, monitor, git delivery.

Scheduler owns timed execution.
Coordinator selects work and acquires locks.
Queue stores durable SQLite tasks.
Policies constrain autonomy and blast radius.
Executors wrap external coding CLIs.
Monitor judges quality and creates follow-ups.
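"Executor-agnostic" suggests the coordinator talks to an interface instead of a specific CLI. The sketch below assumes a hypothetical `Executor` trait and uses a stand-in `EchoExecutor` that never spawns an external process; Forge's real adapter interface may look different.

```rust
/// Hypothetical result of one executor invocation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExecutionReport {
    pub executor: String,
    pub transcript: String,
}

/// Hypothetical adapter interface; the real Forge trait may differ.
pub trait Executor {
    fn name(&self) -> &str;
    fn run(&self, prompt: &str) -> Result<ExecutionReport, String>;
}

/// Stand-in executor that echoes the prompt instead of calling an external CLI.
pub struct EchoExecutor;

impl Executor for EchoExecutor {
    fn name(&self) -> &str {
        "echo"
    }

    fn run(&self, prompt: &str) -> Result<ExecutionReport, String> {
        if prompt.trim().is_empty() {
            return Err("empty task prompt".to_string());
        }
        Ok(ExecutionReport {
            executor: self.name().to_string(),
            transcript: format!("echo: {}", prompt),
        })
    }
}

/// Coordinator-side helper that depends only on the trait, not on a concrete CLI.
pub fn dispatch(executor: &dyn Executor, prompt: &str) -> Result<ExecutionReport, String> {
    executor.run(prompt)
}
```

An adapter for a real coding CLI would implement the same trait and shell out inside `run`; the coordinator code would not need to change.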

Control plane

Autonomy is configured as policy, not left to the prompt.

Scope

Allowed repositories, branches, paths, task types, and file count limits.

Commands

Approved scripts, forbidden shell patterns, network rules, and timeout budgets.

Quality

Required tests, screenshot checks, lint gates, review thresholds, and rollback plans.

Delivery

PR-only mode, direct-commit mode, approval rules, merge windows, and cooldowns.
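To make "policy, not prompt" concrete, here is a minimal sketch of a scope check that runs against a proposed diff before delivery. `ScopePolicy`, `Violation`, and the prefix-matching rule are assumptions for illustration; the source does not specify how Forge encodes its policies.

```rust
/// Hypothetical reasons a change can fall outside the configured scope.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Violation {
    ProtectedPath(String),
    TooManyFiles { changed: usize, limit: usize },
}

/// Hypothetical scope policy: protected path prefixes plus a file count limit.
pub struct ScopePolicy {
    pub protected_prefixes: Vec<String>,
    pub max_files: usize,
}

impl ScopePolicy {
    /// Returns every violation found; an empty list means the diff is in scope.
    pub fn check(&self, changed_files: &[&str]) -> Vec<Violation> {
        let mut violations = Vec::new();
        if changed_files.len() > self.max_files {
            violations.push(Violation::TooManyFiles {
                changed: changed_files.len(),
                limit: self.max_files,
            });
        }
        for &file in changed_files {
            let protected = self
                .protected_prefixes
                .iter()
                .any(|prefix| file.starts_with(prefix.as_str()));
            if protected {
                violations.push(Violation::ProtectedPath(file.to_string()));
            }
        }
        violations
    }
}
```

Because the check is plain code over the diff, it applies the same way no matter which executor produced the change.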

Delivery modes

PR mode is the default path: create a branch, open a PR, and wait for either human approval or a merge by the monitor. Insane mode can commit directly to main while still running guardrails.
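The two modes can be sketched as a single decision function in which guardrails gate both paths. The names `DeliveryMode`, `DeliveryAction`, and `plan_delivery` are hypothetical, and the `forge/<task>` branch naming is an assumption made only for this example.

```rust
/// The two delivery modes named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DeliveryMode {
    Pr,
    Insane,
}

/// Hypothetical outcome of the delivery decision.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DeliveryAction {
    OpenPullRequest { branch: String },
    CommitToMain,
    Hold { reason: String },
}

/// Guardrails gate both modes; the mode only decides where a passing change lands.
pub fn plan_delivery(mode: DeliveryMode, task_id: &str, guardrails_passed: bool) -> DeliveryAction {
    if !guardrails_passed {
        return DeliveryAction::Hold {
            reason: "guardrails failed".to_string(),
        };
    }
    match mode {
        // Branch naming here is an assumption, not Forge's documented scheme.
        DeliveryMode::Pr => DeliveryAction::OpenPullRequest {
            branch: format!("forge/{}", task_id),
        },
        DeliveryMode::Insane => DeliveryAction::CommitToMain,
    }
}
```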

Autonomy levels

Forge can suggest tasks, enqueue them automatically, or execute them autonomously. The level is explicit policy, not implicit agent behavior.
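Treating the level as explicit policy could be as simple as an ordered enum that every action is checked against. This is a sketch with hypothetical names (`AutonomyLevel`, `permits`); it relies on Rust's derived ordering, which follows the declaration order of the variants.

```rust
/// Autonomy levels in increasing order of freedom.
/// The derived `Ord` follows declaration order: Suggest < Enqueue < Execute.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum AutonomyLevel {
    Suggest,
    Enqueue,
    Execute,
}

impl AutonomyLevel {
    /// True when the configured level is at least the level an action requires.
    pub fn permits(self, required: AutonomyLevel) -> bool {
        self >= required
    }
}
```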

Memory

SQLite stores episodic run logs, semantic repository knowledge, and organizational standards such as protected paths and testing expectations.

Run artifacts

Every run writes summaries, logs, diff notes, test results, screenshots, monitor reports, and generated follow-up tasks.

Safety

Locks, cooldowns, budgets, forbidden commands, protected paths, and maximum diff limits keep autonomous work bounded.
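Two of these bounds, locks and cooldowns, are easy to sketch. The example below uses an in-memory set as a stand-in for a repo lock table and a plain duration comparison for the cooldown; `RepoLocks` and `cooldown_elapsed` are hypothetical names, and a real implementation would presumably persist locks so they survive restarts.

```rust
use std::collections::HashSet;
use std::time::Duration;

/// In-memory stand-in for a repo lock table (hypothetical; not persisted).
#[derive(Default)]
pub struct RepoLocks {
    held: HashSet<String>,
}

impl RepoLocks {
    /// Returns true if the lock was free and is now held by the caller.
    pub fn try_acquire(&mut self, repo: &str) -> bool {
        self.held.insert(repo.to_string())
    }

    /// Releases the lock; releasing a lock that is not held is a no-op.
    pub fn release(&mut self, repo: &str) {
        self.held.remove(repo);
    }
}

/// True once enough time has passed since the last run on a repo.
pub fn cooldown_elapsed(since_last_run: Duration, cooldown: Duration) -> bool {
    since_last_run >= cooldown
}
```

A run would proceed only when `try_acquire` succeeds and the cooldown has elapsed, which keeps two loops from editing the same repo at once.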

Dogfooding

Forge should operate on itself and on sandbox apps like todo, kanban, dashboard, CRM, and recipe planner projects to generate regression data.

Run artifacts

Every autonomous action leaves a receipt.

Forge treats observability as part of the product. A run should be easy to audit later: what was attempted, what changed, what proof was collected, what failed, and what should happen next.

Task contract
Executor transcript
Diff summary
Test and lint output
Screenshot notes
Monitor verdict
Follow-up queue entries
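The audit questions above (what was attempted, what changed, what proof was collected, what failed, what happens next) map naturally onto a record type. The `Receipt` struct and its text rendering below are a hypothetical sketch, not Forge's actual artifact format.

```rust
/// Hypothetical receipt with one field per audit question.
#[derive(Debug, Clone, Default)]
pub struct Receipt {
    pub attempted: String,
    pub changed: Vec<String>,
    pub proof: Vec<String>,
    pub failures: Vec<String>,
    pub next_steps: Vec<String>,
}

/// Renders one labeled line, writing "none" when the list is empty.
fn section(label: &str, items: &[String]) -> String {
    if items.is_empty() {
        format!("{}: none\n", label)
    } else {
        format!("{}: {}\n", label, items.join("; "))
    }
}

impl Receipt {
    /// Renders the receipt as plain text, one line per audit question.
    pub fn render(&self) -> String {
        let mut out = format!("attempted: {}\n", self.attempted);
        out.push_str(&section("changed", &self.changed));
        out.push_str(&section("proof", &self.proof));
        out.push_str(&section("failed", &self.failures));
        out.push_str(&section("next", &self.next_steps));
        out
    }
}
```

Writing "none" explicitly for empty sections keeps a clean run distinguishable from a receipt that was never filled in.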

v1 scope

Rust CLI with a single local binary.
SQLite-backed task queue, memory, runs, and artifacts.
Build and monitor schedules with policy enforcement.
Executor adapters starting with Codex CLI.
PR-mode delivery with logs, summaries, and review output.

Success criteria

Discovers useful repo work instead of generating churn.
Produces reviewable changes with mechanical proof.
Turns rejected work into targeted follow-up tasks.
Maintains repo health over long-running autonomous loops.
Improves prompts, policies, and workflows safely.

CLI shape

forge init
forge heartbeat
forge run <task>
forge monitor
forge status
forge explain
forge policy audit
forge receipts tail
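The command list above could be parsed with nothing but the standard library. The sketch below is an assumption about structure, not Forge's real parser: the `Command` enum and `parse` function are hypothetical, the arguments are those after the `forge` binary name, and the only flag handled is the `--repo` option that appears in the heartbeat example near the top of the page.

```rust
/// Hypothetical parsed form of the commands listed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    Init,
    Heartbeat { repo: Option<String> },
    Run { task: String },
    Monitor,
    Status,
    Explain,
    PolicyAudit,
    ReceiptsTail,
}

/// Parses the arguments that follow the `forge` binary name.
pub fn parse(args: &[&str]) -> Result<Command, String> {
    match args {
        [] => Err("missing command".to_string()),
        ["init"] => Ok(Command::Init),
        ["heartbeat"] => Ok(Command::Heartbeat { repo: None }),
        ["heartbeat", "--repo", path] => Ok(Command::Heartbeat {
            repo: Some(path.to_string()),
        }),
        ["run", task] => Ok(Command::Run {
            task: task.to_string(),
        }),
        ["run"] => Err("run requires a <task> argument".to_string()),
        ["monitor"] => Ok(Command::Monitor),
        ["status"] => Ok(Command::Status),
        ["explain"] => Ok(Command::Explain),
        ["policy", "audit"] => Ok(Command::PolicyAudit),
        ["receipts", "tail"] => Ok(Command::ReceiptsTail),
        other => Err(format!("unrecognized command: {}", other.join(" "))),
    }
}
```

Slice patterns keep the two-word commands (`policy audit`, `receipts tail`) as readable as the single-word ones.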