commit 0e1f9ebc01ce24549102b5fe23e6cd08f3a2d1d0
Author: Coder Agent <coder@agents.omni>
Date: Sat Feb 14 12:21:09 2026
Omni/Pipeline: design doc for automated dev-verify-ship
Single-process Haskell pipeline replacing dev-review-release.sh.
Only dev phase uses an agent; verify and integrate are deterministic.
Concurrent dev via workspace pool, structured state in SQLite.
Task-Id: t-587
diff --git a/Omni/Pipeline/DESIGN.md b/Omni/Pipeline/DESIGN.md
new file mode 100644
index 00000000..9a00b0bb
--- /dev/null
+++ b/Omni/Pipeline/DESIGN.md
@@ -0,0 +1,507 @@
+# Pipeline: Automated Dev-Verify-Ship
+
+Replaces `dev-review-release.sh` with a single Haskell process that drives
+tasks from Open through development, verification, and integration into `live`.
+
+## Goals
+
+1. Ava files a ticket → pipeline ships it → Ava reports back.
+2. Single process, no tmux coordination.
+3. Only the development phase uses an LLM agent. Verify and integrate are deterministic.
+4. Parallel dev work (multiple tasks at once, configurable concurrency).
+5. Structured pipeline state (no mining free-text comments).
+6. Clean authority: the pipeline owns all status transitions; agents only write code.
+
+## Architecture Overview
+
+```
+┌───────────────────────────────────────────────────────┐
+│                   Pipeline Process                    │
+│                                                       │
+│  ┌─────────┐    ┌──────────┐    ┌───────────────┐     │
+│  │ Develop │───>│  Verify  │───>│   Integrate   │     │
+│  │ (agent) │    │  (bild)  │    │ (cherry-pick) │     │
+│  └─────────┘    └──────────┘    └───────────────┘     │
+│  async, ‖       sync, fast      sync, fast            │
+│                                                       │
+│  ┌─────────────────────────────────────────────────┐  │
+│  │ Workspace Pool                                  │  │
+│  │   t-123/   t-456/   integration/                │  │
+│  └─────────────────────────────────────────────────┘  │
+│                                                       │
+│  ┌─────────────────────────────────────────────────┐  │
+│  │ Pipeline State (SQLite)                         │  │
+│  │   pipeline_runs: per-phase run records          │  │
+│  └─────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────┘
+          │                              ▲
+          │ task status / comments       │ task create / poll
+          ▼                              │
+     ┌──────────┐                   ┌─────────┐
+     │ Task DB  │                   │   Ava   │
+     └──────────┘                   └─────────┘
+```
+
+## Task Lifecycle
+
+```
+       claim              agent done + bild pass
+Open ────────> InProgress ──────────────────────> Verified
+ ▲                 │                                 │
+ │                 │ rejected (with feedback)        │ cherry-pick + bild
+ ├─────────────────┘                                 ├──────────────> Done
+ │                                                   │    pass
+ └───────────────────────────────────────────────────┘
+           conflict or build failure on live
+```
+
+### Status Definitions
+
+| Status | Meaning | Who transitions |
+|-------------|---------|-----------------|
+| `Open` | Ready for work | Human/Ava creates, or pipeline rejects |
+| `InProgress` | Dev agent is running | Pipeline claims |
+| `Verified` | Commit exists, build passes | Pipeline (after bild) |
+| `Done` | Integrated into `live` | Pipeline (after cherry-pick) |
+| `NeedsHelp` | Stuck (max retries, conflict, etc.) | Pipeline escalates |
+
+Note: `Review`, `Approved`, `ReviewInProgress`, and `Integrating` are all removed. The
+pipeline does verification and integration as synchronous sub-steps, not as
+externally-visible states that need claiming. This eliminates claim races: there
+is one process, so no two workers can claim the same task.
+
+If an optional LLM review step is desired (e.g., for high-complexity tasks), it
+can be added as a sub-phase of Verify that invokes an agent, but the pipeline
+still owns the status transition.
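+
+The transitions in the table can be enforced as a small whitelist that the pipeline consults before every status write. This is a minimal sketch that uses a local stand-in for `Status`. The real type lives in `Omni.Task.Core` and may differ. The names `allowedNext`, `canTransition`, and `transition` are hypothetical. Two further choices are assumptions: the circuit breaker may move any non-terminal status to `NeedsHelp`, and the pipeline treats `NeedsHelp` as terminal.
+
+```haskell
+-- Hypothetical stand-in for Omni.Task.Core's Status; the real type may differ.
+data Status = Open | InProgress | Verified | Done | NeedsHelp
+  deriving (Show, Eq)
+
+-- Moves the pipeline itself may make, read off the lifecycle diagram.
+-- NeedsHelp as a target from every non-terminal status is an assumption.
+allowedNext :: Status -> [Status]
+allowedNext Open = [InProgress, NeedsHelp]           -- claim, or circuit breaker
+allowedNext InProgress = [Verified, Open, NeedsHelp] -- bild pass, rejection, escalation
+allowedNext Verified = [Done, Open, NeedsHelp]       -- integrated, or conflict/build failure
+allowedNext Done = []                                -- terminal
+allowedNext NeedsHelp = []                           -- a human or Ava moves it on (assumption)
+
+canTransition :: Status -> Status -> Bool
+canTransition from to = to `elem` allowedNext from
+
+-- Reject illegal moves instead of writing them to the task DB.
+transition :: Status -> Status -> Either String Status
+transition from to
+  | canTransition from to = Right to
+  | otherwise = Left ("illegal transition: " ++ show from ++ " -> " ++ show to)
+```
+
+With this in place, `transition Open Done` yields a `Left`. A bug that tries to skip development and verification then fails loudly instead of corrupting the task DB.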
+
+## Phases
+
+### Phase 1: Develop
+
+**Trigger:** Task is `Open`, passes readiness checks (not blocked, within retry
+limits), has an available workspace slot.
+
+**Steps:**
+
+1. Claim task: `Open → InProgress` (atomic, in-process — no CAS needed since
+ single process).
+2. Create the workspace and task branch in one step:
+   `git worktree add -b <task-id> _/pipeline/<task-id> live`
+   (plain `git worktree add _/pipeline/<task-id> live` would fail, because
+   `live` is already checked out in the integration worktree).
+3. Build prompt from workflow template + task context + **review feedback**
+   (if patchset > 0, include the latest verify failure from `pipeline_runs`).
+4. Spawn agent via `agentd run` (background mode). Record run ID in
+   `pipeline_runs`.
+5. Return to main loop (don't block).
+
+**On agent completion (detected by polling agentd status):**
+
+- If agent exited 0 and commit SHA changed on task branch:
+ Proceed to Verify.
+- If agent exited 0 but no commit:
+ Record failure ("no commit produced"), increment retry count.
+ Reset status to `Open`.
+- If agent exited non-zero:
+ Record failure with error info, increment retry count.
+ Reset status to `Open`.
+- If retry count exceeds max: set `NeedsHelp`.
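+
+The completion rules reduce to a pure classification of one finished run. This is a minimal sketch under some assumptions. `DevOutcome` and `classifyDevRun` are hypothetical names, and SHAs are plain strings. Escalation happens once the failure count at the current patchset reaches the limit. That reading combines "exceeds max" with the "after N failures" wording under Retry & Circuit Breaker.
+
+```haskell
+-- What the pipeline does with one finished dev run.
+data DevOutcome
+  = ToVerify String   -- new commit SHA on the task branch
+  | BackToOpen String -- failure reason, recorded in pipeline_runs
+  | Escalate String   -- failure limit reached: set NeedsHelp
+  deriving (Show, Eq)
+
+-- maxRetries: per-patchset limit; exitCode: agent exit status;
+-- before/after: task-branch SHA around the run;
+-- priorFailures: failed dev runs already recorded at this patchset.
+classifyDevRun :: Int -> Int -> Maybe String -> Maybe String -> Int -> DevOutcome
+classifyDevRun maxRetries exitCode before after priorFailures =
+  case (exitCode, after) of
+    (0, Just sha) | after /= before -> ToVerify sha
+    (0, _) -> failWith "no commit produced"
+    _ -> failWith ("agent exited " ++ show exitCode)
+  where
+    -- This run is itself a failure, so count it before comparing.
+    failWith reason
+      | priorFailures + 1 >= maxRetries = Escalate reason
+      | otherwise = BackToOpen reason
+```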
+
+### Phase 2: Verify
+
+**Trigger:** Dev phase completed with a commit (detected by the harvest step).
+Runs synchronously on the next main-loop cycle. No agent involved.
+
+**Steps:**
+
+1. In the dev worktree (commit is already there):
+   ```
+   git rev-list --count live..<task-id>
+   ```
+   Must be exactly 1. If not, record failure ("branch shape violation").
+
+2. Extract namespace from task metadata.
+
+3. If namespace is buildable:
+   ```
+   bild <namespace>
+   ```
+   Check exit code (0 = pass, 2 = not buildable / skip, other = fail).
+
+4. If tests exist:
+   ```
+   bild --test <namespace>
+   ```
+
+5. On success: `InProgress → Verified`. Add structured comment with
+ verification results.
+
+6. On failure: `InProgress → Open`. Add comment with build output and
+ what failed. This is the **review feedback** that Phase 1 will include
+ in the prompt for the next patchset.
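+
+Steps 1 to 6 fold into one pure decision. Either the task is approved, or there is a feedback string to store for the next patchset. This sketch uses hypothetical names and assumes the caller has already run `git rev-list` and `bild`. It also treats exit code 2 from `bild --test` as "nothing to test". That is an assumption, since step 3 only states the convention for `bild`.
+
+```haskell
+data BuildResult = BuildPass | BuildSkipped | BuildFailed Int
+  deriving (Show, Eq)
+
+-- Step 3's exit-code convention: 0 pass, 2 not buildable (skip), else fail.
+classifyBild :: Int -> BuildResult
+classifyBild 0 = BuildPass
+classifyBild 2 = BuildSkipped
+classifyBild n = BuildFailed n
+
+-- Step 1: stdout of `git rev-list --count live..<task-id>` must be exactly 1.
+checkBranchShape :: String -> Either String ()
+checkBranchShape out =
+  case reads out :: [(Int, String)] of
+    [(n, rest)]
+      | not (all (`elem` " \t\r\n") rest) -> unparseable
+      | n == 1 -> Right ()
+      | otherwise ->
+          Left ("branch shape violation: " ++ show n ++ " commits ahead of live")
+    _ -> unparseable
+  where
+    unparseable = Left ("branch shape violation: unexpected output " ++ show out)
+
+-- Nothing: InProgress -> Verified. Just msg: InProgress -> Open, and msg is
+-- stored as the feedback for the next patchset.
+verifyFeedback :: String -> Int -> Int -> Maybe String
+verifyFeedback revListOut buildExit testExit =
+  case checkBranchShape revListOut of
+    Left err -> Just err
+    Right () ->
+      case (classifyBild buildExit, classifyBild testExit) of
+        (BuildFailed n, _) -> Just ("bild failed with exit " ++ show n)
+        (_, BuildFailed n) -> Just ("bild --test failed with exit " ++ show n)
+        _ -> Nothing
+```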
+
+### Phase 3: Integrate
+
+**Trigger:** Task is `Verified`. Runs synchronously.
+
+**Steps:**
+
+1. In the integration worktree (always tracks `live`):
+   ```
+   git checkout live
+   git pull --ff-only  # or just ensure it's up to date
+   ```
+
+2. Cherry-pick the task commit:
+   ```
+   git cherry-pick <task-id>
+   ```
+
+3. If conflict:
+   ```
+   git cherry-pick --abort
+   ```
+   Reset the task branch to fresh `live`: remove its worktree and delete the
+   branch, so the next dev run recreates both. Set `Open` with comment
+   "Cherry-pick conflict after live diverged; branch reset for re-development."
+
+4. If cherry-pick succeeds, verify on `live`:
+   ```
+   bild <namespace>
+   ```
+   If fail: revert cherry-pick, set `Open` with failure details.
+
+5. If all passes:
+   - Mark `Done`.
+   - Remove dev worktree: `git worktree remove _/pipeline/<task-id>`.
+   - Delete task branch: `git branch -D <task-id>`. The worktree must go
+     first (git will not delete a branch that is checked out), and `-D` is
+     required because the cherry-picked commit on `live` has a new SHA, so
+     `-d` would refuse the branch as not fully merged.
+   - Add summary comment with integrated commit hash.
+
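+The git side of Phase 3 can be kept as data (argument lists), which makes the ordering constraints testable without a repository. This sketch assumes every command runs as `git -C <integration worktree> ...`. `IntegrateResult`, `followUp`, `runGit`, and `runAll` are hypothetical names. Undoing a failed build with `git reset --hard HEAD~1` is one way to revert a cherry-pick that has not been published; this design does not specify it.
+
+```haskell
+import System.Exit (ExitCode (..))
+import System.Process (readProcessWithExitCode)
+
+data IntegrateResult = Integrated | PickConflict | LiveBuildFailed
+  deriving (Show, Eq)
+
+-- git arguments for step 2; run inside the integration worktree.
+pickArgs :: String -> [String]
+pickArgs taskId = ["cherry-pick", taskId]
+
+-- Follow-up git commands for each outcome. `root` should be absolute, since
+-- `git -C` changes directory before relative paths are resolved.
+followUp :: FilePath -> String -> IntegrateResult -> [[String]]
+followUp root taskId result =
+  case result of
+    Integrated -> cleanup
+    PickConflict -> ["cherry-pick", "--abort"] : cleanup
+    LiveBuildFailed -> [["reset", "--hard", "HEAD~1"]] -- undo the unpublished pick
+  where
+    -- Worktree first: git refuses to delete a branch checked out in a worktree.
+    -- -D, not -d: the commit reached live by cherry-pick (new SHA), so the
+    -- branch never looks merged.
+    cleanup =
+      [ ["worktree", "remove", root ++ "/" ++ taskId]
+      , ["branch", "-D", taskId]
+      ]
+
+-- Run one git command in `dir`, capturing exit code, stdout, and stderr.
+runGit :: FilePath -> [String] -> IO (ExitCode, String, String)
+runGit dir args = readProcessWithExitCode "git" ("-C" : dir : args) ""
+
+-- Run commands in order, stopping at the first failure.
+runAll :: FilePath -> [[String]] -> IO (Either String ())
+runAll _ [] = pure (Right ())
+runAll dir (args : rest) = do
+  (code, _, err) <- runGit dir args
+  case code of
+    ExitSuccess -> runAll dir rest
+    ExitFailure n -> pure (Left (unwords ("git" : args) ++ " exited " ++ show n ++ ": " ++ err))
+```
+
+For example, `runAll integrationDir (followUp root "t-123" Integrated)` removes the dev worktree and then force-deletes the branch, stopping at the first git error. Note that `git worktree remove` refuses a worktree with modified or untracked files unless given `--force`.
+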
+## Dev Workflow (Agent Prompt)
+
+The agent's job is simplified: **write code and commit. Nothing else.**
+
+```markdown
+# Dev Workflow
+
+You are a coder. Your job is to implement the task described below.
+
+## Rules
+1. Work only in this workspace: {workspace}
+2. One commit on branch `{task_id}` with trailer `Task-Id: {task_id}`.
+3. If a commit already exists (patchset > 0), amend it.
+4. Verify your work: `bild {namespace}` and `bild --test {namespace}`.
+5. **Do NOT change task status.** The pipeline handles that.
+6. **Do NOT push.**
+
+## Task
+- ID: {task_id}
+- Title: {title}
+- Namespace: {namespace}
+- Branch: {task_id}
+- Base: live
+- Patchset: {patchset_count}
+
+### Description
+{description}
+
+{#if patchset_count > 0}
+### Previous Review Feedback
+The previous patchset was rejected. Fix the issues below:
+
+{review_feedback}
+{/if}
+```
+
+This is dramatically simpler than the current dev.md. The agent doesn't need
+to understand the pipeline, manage status, or coordinate with other roles.
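+
+Filling the template needs only placeholder substitution plus the conditional feedback block. This sketch rests on a few assumptions. `fill` and `feedbackSection` are hypothetical helpers. Unknown placeholders, including the `{#if}` markers, pass through untouched. The caller is assumed to splice `feedbackSection` in place of the `{#if}` block.
+
+```haskell
+-- Replace every {key} whose key is in the table; leave anything else,
+-- such as {#if ...} markers, exactly as written.
+fill :: [(String, String)] -> String -> String
+fill vars = go
+  where
+    go [] = []
+    go ('{' : rest)
+      | (key, '}' : after) <- break (== '}') rest
+      , Just val <- lookup key vars =
+          val ++ go after
+    go (c : rest) = c : go rest
+
+-- Body of the "Previous Review Feedback" block; empty on the first patchset
+-- or when there is no stored feedback.
+feedbackSection :: Int -> Maybe String -> String
+feedbackSection patchsetCount (Just feedback)
+  | patchsetCount > 0 =
+      "### Previous Review Feedback\n"
+        ++ "The previous patchset was rejected. Fix the issues below:\n\n"
+        ++ feedback
+        ++ "\n"
+feedbackSection _ _ = ""
+```
+
+Substituted values are not re-scanned, so a task description that happens to contain `{task_id}` reaches the agent literally.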
+
+## Workspace Pool
+
+```
+_/pipeline/
+├── integration/ # Persistent, tracks `live`
+├── t-123/ # Ephemeral, per-task dev worktree
+├── t-456/ # Another concurrent dev task
+└── state.db # Pipeline state (or use task DB)
+```
+
+### Lifecycle
+
+- **Create** on task claim: `git worktree add -b <tid> _/pipeline/<tid> live`
+- **Reuse** on retry: reset branch to `live`, agent starts fresh
+  (review feedback in prompt tells it what to fix)
+- **Destroy** on Done: `git worktree remove _/pipeline/<tid>`, then
+  `git branch -D <tid>` (`-D` because the task commit reaches `live` by
+  cherry-pick, so the branch never looks merged)
+- **Max concurrent**: configurable via `--concurrency` (default 2). When the
+  pool is full, new Open tasks wait.
+
+### Recovery
+
+On startup, the pipeline scans `_/pipeline/` for existing worktrees:
+- If a worktree's task is `InProgress` but no agentd run is active: stale.
+ Reset task to `Open`, clean up worktree.
+- If a worktree's task is `Done`: leftover. Remove worktree.
+- If a worktree's task is `Verified`: resume integration.
+- Integration worktree: ensure it exists and is on `live`. Recreate if broken.
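+
+The recovery rules map a worktree's task status to one action. The only other input is whether agentd still reports a run for that task. The names in this sketch are hypothetical. The `Adopt` and `LeaveAlone` cases cover combinations the list does not mention and are assumptions.
+
+```haskell
+data Status = Open | InProgress | Verified | Done | NeedsHelp
+  deriving (Show, Eq)
+
+data Recovery
+  = ResetAndClean   -- stale: task back to Open, worktree removed
+  | RemoveLeftover  -- task already Done
+  | ResumeIntegrate -- task Verified: run Phase 3
+  | Adopt           -- InProgress with a live agentd run: keep polling (assumption)
+  | LeaveAlone      -- any other status (assumption)
+  deriving (Show, Eq)
+
+-- Second argument: does agentd still report an active run for this task?
+recoverWorktree :: Status -> Bool -> Recovery
+recoverWorktree InProgress False = ResetAndClean
+recoverWorktree InProgress True = Adopt
+recoverWorktree Done _ = RemoveLeftover
+recoverWorktree Verified _ = ResumeIntegrate
+recoverWorktree _ _ = LeaveAlone
+
+-- Entries of _/pipeline/ that are not per-task dev worktrees are skipped.
+isTaskDir :: FilePath -> Bool
+isTaskDir name = name `notElem` ["integration", "state.db"]
+```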
+
+## Pipeline State
+
+A new table in the task DB (or a separate SQLite DB at `_/pipeline/state.db`):
+
+```sql
+CREATE TABLE pipeline_runs (
+  id            INTEGER PRIMARY KEY AUTOINCREMENT,
+  task_id       TEXT NOT NULL,
+  phase         TEXT NOT NULL,     -- 'dev', 'verify', 'integrate'
+  patchset      INTEGER NOT NULL,
+  agentd_run_id TEXT,              -- only for dev phase
+  status        TEXT NOT NULL,     -- 'running', 'success', 'failure'
+  cost_cents    REAL DEFAULT 0,
+  started_at    TIMESTAMP NOT NULL,
+  finished_at   TIMESTAMP,
+  error_summary TEXT,              -- structured failure reason
+  -- SQLite treats NULLs as distinct in UNIQUE, so this only constrains
+  -- dev rows (agentd_run_id is NULL for verify/integrate rows).
+  UNIQUE(task_id, phase, patchset, agentd_run_id)
+);
+```
+
+This gives us:
+
+| Query | SQL |
+|-------|-----|
+| Retry count for task patchset | `SELECT COUNT(*) FROM pipeline_runs WHERE task_id=? AND phase='dev' AND patchset=? AND status='failure'` |
+| Cumulative cost | `SELECT SUM(cost_cents) FROM pipeline_runs WHERE task_id=?` |
+| Active dev runs | `SELECT * FROM pipeline_runs WHERE phase='dev' AND status='running'` |
+| Last failure | `SELECT error_summary FROM pipeline_runs WHERE task_id=? AND status='failure' ORDER BY finished_at DESC LIMIT 1` |
+| Review feedback for prompt | `SELECT error_summary FROM pipeline_runs WHERE task_id=? AND phase='verify' AND status='failure' ORDER BY finished_at DESC LIMIT 1` |
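+
+These queries are plain filters and aggregates, so the same logic can be mirrored over in-memory rows. That is handy for unit-testing scheduling decisions without SQLite. This is only a sketch. `Run` is a simplified, hypothetical version of `RunRecord` with `String` ids and `finished_at` as epoch seconds.
+
+```haskell
+import Data.List (sortOn)
+import Data.Maybe (listToMaybe)
+import Data.Ord (Down (..))
+
+data Phase = Dev | Verify | Integrate deriving (Show, Eq)
+data RunStatus = Running | Success | Failure deriving (Show, Eq)
+
+data Run = Run
+  { runTask       :: String
+  , runPhase      :: Phase
+  , runPatchset   :: Int
+  , runStatus     :: RunStatus
+  , runCostCents  :: Double
+  , runFinishedAt :: Maybe Integer -- epoch seconds
+  , runError      :: Maybe String
+  }
+
+-- "Retry count for task patchset"
+retryCount :: String -> Int -> [Run] -> Int
+retryCount t p = length . filter matches
+  where
+    matches r =
+      runTask r == t && runPhase r == Dev && runPatchset r == p && runStatus r == Failure
+
+-- "Cumulative cost"
+cumulativeCost :: String -> [Run] -> Double
+cumulativeCost t = sum . map runCostCents . filter ((== t) . runTask)
+
+-- "Review feedback for prompt": newest failed verify row first.
+-- Down on Maybe sorts Nothing last, like NULLs under ORDER BY ... DESC in SQLite.
+reviewFeedback :: String -> [Run] -> Maybe String
+reviewFeedback t runs = listToMaybe (sortOn (Down . runFinishedAt) failed) >>= runError
+  where
+    failed = [r | r <- runs, runTask r == t, runPhase r == Verify, runStatus r == Failure]
+```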
+
+No more regex parsing of comments.
+
+Task comments are still written for human/Ava consumption, but they're
+informational, not structural. The pipeline never reads its own comments back.
+
+## Main Loop
+
+```
+every poll_interval:
+
+  1. INTEGRATE: for each Verified task (priority: oldest first)
+       run_integrate(task)           -- fast, synchronous
+
+  2. VERIFY: for each InProgress task where agent finished
+       run_verify(task)              -- fast, synchronous
+
+  3. HARVEST: for each running dev agent
+       poll_agentd_status(run_id)
+       if finished: mark for verify in next cycle
+
+  4. DEVELOP: for each Open task (up to concurrency limit)
+       if retry_ok(task) and slots_available():
+         spawn_dev(task)             -- async, returns immediately
+
+  sleep(poll_interval)              -- default 10s
+```
+
+Priority order (integrate > verify > harvest > develop) ensures finished work
+is cleared through the pipeline before new work is started. This naturally
+creates back-pressure: if integration is failing, the pipeline stops starting
+new dev tasks.
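+
+One cycle's ordering can be sketched as a pure planning function over a snapshot of task states. The names are hypothetical, and the sketch simplifies in several ways. The snapshot is assumed to be sorted oldest first. HARVEST appears only as the `tvAgentFinished` flag it sets. `retry_ok` is omitted. Free slots are counted before this cycle's verify results land, so the sketch errs on the side of starting fewer agents.
+
+```haskell
+data Status = Open | InProgress | Verified | Done | NeedsHelp
+  deriving (Show, Eq)
+
+data TaskView = TaskView
+  { tvId            :: String
+  , tvStatus        :: Status
+  , tvAgentFinished :: Bool -- set by HARVEST: agentd reports the dev run finished
+  }
+
+data Action = RunIntegrate String | RunVerify String | SpawnDev String
+  deriving (Show, Eq)
+
+-- Integrate, then verify, then develop; list order is execution order.
+planCycle :: Int -> [TaskView] -> [Action]
+planCycle concurrency tasks =
+  [RunIntegrate (tvId t) | t <- tasks, tvStatus t == Verified]
+    ++ [RunVerify (tvId t) | t <- tasks, tvStatus t == InProgress, tvAgentFinished t]
+    ++ [SpawnDev (tvId t) | t <- take free (filter ((== Open) . tvStatus) tasks)]
+  where
+    running = length (filter ((== InProgress) . tvStatus) tasks)
+    free = max 0 (concurrency - running)
+```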
+
+## Concurrency
+
+- **Dev phase**: Up to N concurrent agents (configurable, default 2).
+ Each gets its own worktree.
+- **Verify phase**: Serial. Runs in the dev worktree. Fast (seconds to minutes).
+- **Integrate phase**: Serial. One integration worktree. Fast.
+
+N=2 is a good default. Each dev agent is a Docker container with an LLM session,
+so the bottleneck is API rate limits and host CPU, not the pipeline.
+
+## Retry & Circuit Breaker
+
+Per task, tracked in `pipeline_runs`:
+
+- **Max retries per patchset**: configurable (default 5).
+ After N failures at the same patchset, the pipeline:
+ 1. Sets task to `NeedsHelp`.
+ 2. Adds a comment: "Pipeline: exceeded {N} retries on patchset {P}. Last error: {E}".
+ 3. Stops attempting the task.
+
+- **Backoff**: exponential with cap.
+ After failure K, wait `min(interval * 2^K, 600)` seconds before retrying.
+ Tracked by `finished_at` of the last failure row.
+
+- **Cost cap**: if cumulative cost for a task exceeds threshold, set `NeedsHelp`.
+
+All of this is a SQL query against `pipeline_runs`, not comment parsing.
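+
+The three rules combine into one gate checked before each dev attempt. This minimal sketch makes three assumptions. `interval` is the poll interval. K is the number of failures so far at the current patchset, so the first retry waits `2 * interval`. Timestamps are epoch seconds rather than `UTCTime`. `Limits`, `Gate`, `backoffSeconds`, and `gate` are hypothetical names.
+
+```haskell
+-- Per-task limits; lMaxCostCents 0 means unlimited, as with --max-task-cost.
+data Limits = Limits
+  { lMaxRetries   :: Int
+  , lMaxCostCents :: Double
+  , lInterval     :: Integer -- base interval in seconds
+  }
+
+data Gate = Go | Wait Integer | Trip String
+  deriving (Show, Eq)
+
+-- min(interval * 2^K, 600), with K = failures so far at this patchset.
+backoffSeconds :: Integer -> Int -> Integer
+backoffSeconds interval k = min (interval * 2 ^ k) 600
+
+-- failures: failed runs at this patchset; cost: cumulative task cost in cents;
+-- lastFailureAt / now: epoch seconds.
+gate :: Limits -> Int -> Double -> Maybe Integer -> Integer -> Gate
+gate lim failures cost lastFailureAt now
+  | failures >= lMaxRetries lim =
+      Trip ("exceeded " ++ show (lMaxRetries lim) ++ " retries")
+  | lMaxCostCents lim > 0 && cost > lMaxCostCents lim = Trip "cost cap exceeded"
+  | Just t <- lastFailureAt
+  , let readyAt = t + backoffSeconds (lInterval lim) failures
+  , now < readyAt =
+      Wait (readyAt - now)
+  | otherwise = Go
+```
+
+With the defaults (10 s interval, 5 retries) the waits are 20, 40, 80, and 160 seconds. The fifth failure trips the breaker before the 600 s cap is ever reached.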
+
+## Ava Integration
+
+Ava doesn't need special tools. The interface is the task DB:
+
+**Ava files work:**
+```
+task create "Fix crash on empty input" \
+ --namespace="Omni/Agent/Tools.hs" \
+ --parent=t-587 \
+ --description="..."
+```
+The pipeline picks it up on next poll (it's `Open` and ready).
+
+**Ava monitors progress:**
+```
+task show t-XXX --json
+```
+Check `taskStatus`:
+- `InProgress` → "Working on it"
+- `Verified` → "Built successfully, integrating"
+- `Done` → "Shipped! Commit: ..."
+- `NeedsHelp` → "Stuck, needs your attention"
+
+Pipeline comments on the task give Ava the narrative:
+- "Pipeline: dev agent completed (commit abc123)"
+- "Pipeline: build verification passed"
+- "Pipeline: integrated into live at def456"
+- "Pipeline: build failed. Output: ..."
+
+**Ava doesn't need:**
+- `delegate_to_pipeline` tool (just create a task)
+- `monitor_pipeline_task` tool (just `task show`)
+- PipelineDelegate.hs module (delete it)
+
+The existing `task` CLI is the interface. Ava already knows how to use it.
+
+## CLI
+
+```
+Omni/Pipeline.hs run [options]    -- Start the pipeline
+Omni/Pipeline.hs status [--json]  -- Show pipeline health
+Omni/Pipeline.hs drain            -- Stop accepting new work, finish in-flight
+```
+
+### `run` options
+
+```
+--root PATH          Workspace root (default: _/pipeline)
+--base BRANCH        Base branch (default: live)
+--concurrency N      Max concurrent dev agents (default: 2)
+--interval SEC       Poll interval (default: 10)
+--max-retries N      Per-patchset retry limit (default: 5)
+--max-task-cost C    Cost cap per task in cents (default: 0 = unlimited)
+--parent ID          Only process tasks under this epic
+--task-id ID         Only process this one task
+--provider NAME      Agent provider (default: from env)
+--timeout SEC        Agent timeout per run (default: 1800)
+--once               Process one full cycle, then exit
+--dry-run            Print what would happen
+```
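+
+The `run` options map one-to-one onto the config record, so parsing can be a single fold over the argument list. This is a sketch, not the real CLI. It uses `String` instead of `Text` to stay within `base`. It adds hypothetical `pcOnce` and `pcDryRun` fields that the `PipelineConfig` under Key types does not have. It returns the first bad flag as a `Left`.
+
+```haskell
+data PipelineConfig = PipelineConfig
+  { pcRoot          :: FilePath
+  , pcBaseBranch    :: String
+  , pcConcurrency   :: Int
+  , pcPollInterval  :: Int
+  , pcMaxRetries    :: Int
+  , pcMaxTaskCost   :: Int          -- cents, 0 = unlimited
+  , pcAgentTimeout  :: Int
+  , pcAgentProvider :: Maybe String -- Nothing = take it from the environment
+  , pcParentFilter  :: Maybe String
+  , pcTaskFilter    :: Maybe String
+  , pcOnce          :: Bool         -- hypothetical field for --once
+  , pcDryRun        :: Bool         -- hypothetical field for --dry-run
+  }
+  deriving (Show, Eq)
+
+-- Defaults copied from the option table above.
+defaultConfig :: PipelineConfig
+defaultConfig =
+  PipelineConfig
+    { pcRoot = "_/pipeline"
+    , pcBaseBranch = "live"
+    , pcConcurrency = 2
+    , pcPollInterval = 10
+    , pcMaxRetries = 5
+    , pcMaxTaskCost = 0
+    , pcAgentTimeout = 1800
+    , pcAgentProvider = Nothing
+    , pcParentFilter = Nothing
+    , pcTaskFilter = Nothing
+    , pcOnce = False
+    , pcDryRun = False
+    }
+
+parseRunArgs :: [String] -> Either String PipelineConfig
+parseRunArgs = go defaultConfig
+  where
+    go cfg [] = Right cfg
+    go cfg ("--once" : rest) = go cfg {pcOnce = True} rest
+    go cfg ("--dry-run" : rest) = go cfg {pcDryRun = True} rest
+    go cfg (flag : val : rest) =
+      case flag of
+        "--root" -> go cfg {pcRoot = val} rest
+        "--base" -> go cfg {pcBaseBranch = val} rest
+        "--concurrency" -> int (\n -> cfg {pcConcurrency = n})
+        "--interval" -> int (\n -> cfg {pcPollInterval = n})
+        "--max-retries" -> int (\n -> cfg {pcMaxRetries = n})
+        "--max-task-cost" -> int (\n -> cfg {pcMaxTaskCost = n})
+        "--timeout" -> int (\n -> cfg {pcAgentTimeout = n})
+        "--provider" -> go cfg {pcAgentProvider = Just val} rest
+        "--parent" -> go cfg {pcParentFilter = Just val} rest
+        "--task-id" -> go cfg {pcTaskFilter = Just val} rest
+        _ -> Left ("unknown option: " ++ flag)
+      where
+        -- Parse a non-negative Int value, then continue with the rest.
+        int set = case reads val of
+          [(n, "")] | n >= 0 -> go (set n) rest
+          _ -> Left ("expected a non-negative integer for " ++ flag ++ ", got " ++ show val)
+    go _ [flag] = Left ("missing value for " ++ flag)
+```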
+
+### `status` output (JSON)
+
+```json
+{
+  "active_dev_runs": [
+    {"task_id": "t-123", "run_id": "dev-t-123-...", "elapsed_sec": 342}
+  ],
+  "tasks_by_status": {"Open": 3, "InProgress": 2, "Verified": 1, "Done": 15},
+  "recent_failures": [...],
+  "stuck_tasks": ["t-456"],
+  "workspace_pool": {"active": 2, "max": 3, "integration": "healthy"}
+}
+```
+
+## Module Structure
+
+```
+Omni/Pipeline.hs   -- Entry point, CLI, main loop
+Omni/Pipeline/
+  Core.hs          -- Types, config, state machine logic
+  Dev.hs           -- Phase 1: agent spawning, prompt building
+  Verify.hs        -- Phase 2: bild, test, branch shape checks
+  Integrate.hs     -- Phase 3: cherry-pick, final verify, cleanup
+  Workspace.hs     -- Worktree pool management + recovery
+  State.hs         -- pipeline_runs table, queries
+  Git.hs           -- Git operations (cherry-pick, branch, worktree)
+```
+
+### Key types
+
+```haskell
+import Data.Text (Text)
+import Data.Time (UTCTime)
+
+data PipelineConfig = PipelineConfig
+  { pcRoot          :: FilePath   -- workspace root
+  , pcBaseBranch    :: Text       -- "live"
+  , pcConcurrency   :: Int        -- max parallel dev agents
+  , pcPollInterval  :: Int        -- seconds
+  , pcMaxRetries    :: Int        -- per patchset
+  , pcMaxTaskCost   :: Int        -- cents, 0 = unlimited
+  , pcAgentTimeout  :: Int        -- seconds
+  , pcAgentProvider :: Maybe Text
+  , pcParentFilter  :: Maybe Text -- restrict to epic
+  , pcTaskFilter    :: Maybe Text -- restrict to single task
+  }
+
+data Phase = Dev | Verify | Integrate
+  deriving (Show, Eq)
+
+data RunRecord = RunRecord
+  { rrTaskId      :: Text
+  , rrPhase       :: Phase
+  , rrPatchset    :: Int
+  , rrAgentdRunId :: Maybe Text
+  , rrStatus      :: RunStatus
+  , rrCostCents   :: Double
+  , rrStartedAt   :: UTCTime
+  , rrFinishedAt  :: Maybe UTCTime
+  , rrError       :: Maybe Text
+  }
+
+data RunStatus = Running | Success | Failure
+  deriving (Show, Eq)
+
+-- Active dev run tracked in memory
+data ActiveDev = ActiveDev
+  { adTaskId    :: Text
+  , adRunId     :: Text
+  , adWorkspace :: FilePath
+  , adStartedAt :: UTCTime
+  , adBeforeSha :: Maybe Text -- commit before agent ran
+  }
+```
+
+### Integration with existing code
+
+The pipeline imports directly from:
+- `Omni.Task.Core`: `loadTasks`, `findTask`, `updateTaskStatus`, `claimTask`,
+ `addComment`, `Task(..)`, `Status(..)`
+- `Omni.Namespace`: `fromRelPath`, `toPath`
+- `Omni.Bild`: `isBuildableNs` (to check before running bild)
+
+It shells out to:
+- `agentd run` / `agentd status` (for dev agents)
+- `git` (for worktree, branch, cherry-pick operations)
+- `bild` (for build verification)
+
+Direct Haskell imports for Task avoid the overhead and fragility of shelling
+out to `task` CLI for every status check.
+
+## Migration Path
+
+1. Build `Omni/Pipeline.hs` alongside existing `dev-review-release.sh`.
+2. Add `Verified` status to `Omni/Task/Core.hs`.
+3. Test on a single task: `Omni/Pipeline.hs run --task-id t-XXX --once`.
+4. Run on an epic: `Omni/Pipeline.hs run --parent t-587`.
+5. Once stable, deprecate `dev-review-release.sh`.
+6. Delete `Omni/Ava/Tools/PipelineDelegate.hs` (Ava just uses `task` CLI).
+7. Clean up unused statuses (`ReviewInProgress`, `Integrating`) from
+ `Task.Core`.
+
+## What This Eliminates
+
+| Old problem | Resolution |
+|-------------|-----------|
+| Three polling loops / tmux windows | Single process |
+| Agent and shell both manage status | Pipeline owns all transitions; agent just commits |
+| LLM doing integration (4 shell commands) | Deterministic integration in Haskell |
+| LLM doing code review it can't reliably do | Build verification gate (bild + tests) |
+| Retry state mined from comments | `pipeline_runs` SQL table |
+| No review feedback to dev | Verify failure message injected into next dev prompt |
+| Serial one-task-at-a-time bottleneck | Configurable concurrent dev agents |
+| Race conditions needing CAS claims | Single process, no races |
+| Stale task branches after live diverges | Branch reset on conflict, fresh checkout on retry |
+| 1315-line bash script | Typed Haskell with proper modules |