Commit 0e1f9ebc

commit 0e1f9ebc01ce24549102b5fe23e6cd08f3a2d1d0
Author: Coder Agent <coder@agents.omni>
Date:   Sat Feb 14 12:21:09 2026

    Omni/Pipeline: design doc for automated dev-verify-ship
    
    Single-process Haskell pipeline replacing dev-review-release.sh.
    Only dev phase uses an agent; verify and integrate are deterministic.
    Concurrent dev via workspace pool, structured state in SQLite.
    
    Task-Id: t-587

diff --git a/Omni/Pipeline/DESIGN.md b/Omni/Pipeline/DESIGN.md
new file mode 100644
index 00000000..9a00b0bb
--- /dev/null
+++ b/Omni/Pipeline/DESIGN.md
@@ -0,0 +1,507 @@
+# Pipeline: Automated Dev-Verify-Ship
+
+Replaces `dev-review-release.sh` with a single Haskell process that drives
+tasks from Open through development, verification, and integration into `live`.
+
+## Goals
+
+1. Ava files a ticket → pipeline ships it → Ava reports back.
+2. Single process, no tmux coordination.
+3. Only the development phase uses an LLM agent. Verify and integrate are deterministic.
+4. Parallel dev work (multiple tasks at once, configurable concurrency).
+5. Structured pipeline state (no mining free-text comments).
+6. Clean authority: the pipeline owns all status transitions; agents only write code.
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────┐
+│                  Pipeline Process                    │
+│                                                     │
+│  ┌─────────┐    ┌──────────┐    ┌───────────────┐  │
+│  │ Develop  │───>│  Verify  │───>│   Integrate   │  │
+│  │ (agent)  │    │ (bild)   │    │ (cherry-pick) │  │
+│  └─────────┘    └──────────┘    └───────────────┘  │
+│   async, ‖       sync, fast      sync, fast        │
+│                                                     │
+│  ┌──────────────────────────────────────────────┐   │
+│  │           Workspace Pool                      │   │
+│  │  dev-t-123/  dev-t-456/  integration/         │   │
+│  └──────────────────────────────────────────────┘   │
+│                                                     │
+│  ┌──────────────────────────────────────────────┐   │
+│  │           Pipeline State (SQLite)             │   │
+│  │  pipeline_runs: per-phase run records         │   │
+│  └──────────────────────────────────────────────┘   │
+└─────────────────────────────────────────────────────┘
+         │                              ▲
+         │ task status / comments       │ task create / poll
+         ▼                              │
+    ┌──────────┐                   ┌─────────┐
+    │ Task DB  │                   │   Ava   │
+    └──────────┘                   └─────────┘
+```
+
+## Task Lifecycle
+
+```
+Open ──claim──> InProgress ──agent done + bild pass──> Verified
+  ▲                                                       │
+  │ rejected (with feedback)                    cherry-pick + bild
+  │                                                       │
+  └────────────────────────────────────────────────── Done ─┘
+                                                  (or back to Open on conflict)
+```
+
+### Status Definitions
+
+| Status       | Meaning | Who transitions |
+|-------------|---------|-----------------|
+| `Open`       | Ready for work | Human/Ava creates, or pipeline rejects |
+| `InProgress` | Dev agent is running | Pipeline claims |
+| `Verified`   | Commit exists, build passes | Pipeline (after bild) |
+| `Done`       | Integrated into `live` | Pipeline (after cherry-pick) |
+| `NeedsHelp`  | Stuck (max retries, conflict, etc.) | Pipeline escalates |
+
+Note: `Review`, `Approved`, `ReviewInProgress`, and `Integrating` are all removed.
+The pipeline does verification and integration as synchronous sub-steps, not as
+externally visible states that need claiming. This eliminates claim races:
+there is one process, so no two workers can claim the same task.
+
+If an optional LLM review step is desired (e.g., for high-complexity tasks), it
+can be added as a sub-phase of Verify that invokes an agent, but the pipeline
+still owns the status transition.
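+
+The status table can be read as a small state machine. The following is a
+minimal sketch, not the actual implementation: the `Event` type and the
+`transition` function are hypothetical names introduced for illustration, and
+the real `Status` type lives in `Omni.Task.Core`. It assumes `NeedsHelp` is
+only reached from `InProgress`.
+
+```haskell
+-- Statuses from the table above (redeclared here so the sketch stands alone).
+data Status = Open | InProgress | Verified | Done | NeedsHelp
+  deriving (Show, Eq)
+
+-- Hypothetical events the pipeline reacts to.
+data Event
+  = Claimed          -- pipeline picked the task up
+  | VerifyPassed     -- commit exists and bild passed
+  | VerifyFailed     -- branch shape or build check failed
+  | DevFailed        -- agent exited non-zero or produced no commit
+  | Integrated       -- cherry-pick and bild on live succeeded
+  | IntegrateFailed  -- conflict or post-pick build failure
+  | Escalated        -- retry limit or cost cap exceeded
+  deriving (Show, Eq)
+
+-- Returns Nothing for any transition the design does not describe.
+transition :: Status -> Event -> Maybe Status
+transition Open       Claimed         = Just InProgress
+transition InProgress VerifyPassed    = Just Verified
+transition InProgress VerifyFailed    = Just Open
+transition InProgress DevFailed       = Just Open
+transition InProgress Escalated       = Just NeedsHelp
+transition Verified   Integrated      = Just Done
+transition Verified   IntegrateFailed = Just Open
+transition _          _               = Nothing
+
+main :: IO ()
+main = do
+  print (transition Open Claimed)  -- Just InProgress
+  print (transition Done Claimed)  -- Nothing
+```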
+
+## Phases
+
+### Phase 1: Develop
+
+**Trigger:** Task is `Open`, passes readiness checks (not blocked, within retry
+limits), has an available workspace slot.
+
+**Steps:**
+
+1. Claim task: `Open → InProgress` (atomic, in-process — no CAS needed since
+   single process).
+2. Create workspace and task branch together:
+   `git worktree add _/pipeline/<task-id> -b <task-id> live`
+   (checking out `live` itself would fail, since the integration worktree
+   already has it checked out).
+3. Build prompt from workflow template + task context + **review feedback**
+   (if patchset > 0, include the most recent verify failure recorded in
+   `pipeline_runs`).
+4. Spawn agent via `agentd run` (background mode). Record run ID in
+   `pipeline_runs`.
+5. Return to main loop (don't block).
+
+**On agent completion (detected by polling agentd status):**
+
+- If agent exited 0 and commit SHA changed on task branch:
+  Proceed to Verify.
+- If agent exited 0 but no commit:
+  Record failure ("no commit produced"), increment retry count.
+  Reset status to `Open`.
+- If agent exited non-zero:
+  Record failure with error info, increment retry count.
+  Reset status to `Open`.
+- If retry count exceeds max: set `NeedsHelp`.
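+
+The completion rules above amount to one pure decision. A minimal sketch
+follows; `DevOutcome` and `classifyDev` are hypothetical names, SHAs are plain
+strings, and it assumes escalation happens once the number of failures at this
+patchset reaches the configured maximum.
+
+```haskell
+import System.Exit (ExitCode (..))
+
+data DevOutcome
+  = ProceedToVerify
+  | RetryLater String -- reset to Open; the string is the recorded failure
+  | Escalate String   -- set NeedsHelp
+  deriving (Show, Eq)
+
+-- | Decide what to do once agentd reports that a run has finished.
+-- failuresSoFar counts earlier failed runs at this patchset.
+classifyDev :: Int -> Int -> ExitCode -> Maybe String -> Maybe String -> DevOutcome
+classifyDev maxRetries failuresSoFar exitCode beforeSha afterSha =
+  case failure of
+    Nothing -> ProceedToVerify
+    Just reason
+      | failuresSoFar + 1 >= maxRetries -> Escalate reason
+      | otherwise -> RetryLater reason
+  where
+    failure = case exitCode of
+      ExitFailure n -> Just ("agent exited with code " ++ show n)
+      ExitSuccess
+        | afterSha == Nothing || afterSha == beforeSha -> Just "no commit produced"
+        | otherwise -> Nothing
+
+main :: IO ()
+main = do
+  print (classifyDev 5 0 ExitSuccess (Just "aaa") (Just "bbb"))
+  print (classifyDev 5 0 ExitSuccess (Just "aaa") (Just "aaa"))
+  print (classifyDev 5 4 (ExitFailure 1) (Just "aaa") (Just "aaa"))
+```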
+
+### Phase 2: Verify
+
+**Trigger:** Dev phase completed with a commit. Runs immediately, synchronously.
+No agent involved.
+
+**Steps:**
+
+1. In the dev worktree (commit is already there):
+   ```
+   git rev-list --count live..<task-id>
+   ```
+   Must be exactly 1. If not, record failure ("branch shape violation").
+
+2. Extract namespace from task metadata.
+
+3. If namespace is buildable:
+   ```
+   bild <namespace>
+   ```
+   Check exit code (0 = pass, 2 = not buildable / skip, other = fail).
+
+4. If tests exist:
+   ```
+   bild --test <namespace>
+   ```
+
+5. On success: `InProgress → Verified`. Add structured comment with
+   verification results.
+
+6. On failure: `InProgress → Open`. Add comment with build output and
+   what failed. This is the **review feedback** that Phase 1 will include
+   in the prompt for the next patchset.
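+
+As a sketch of how the verify checks might be interpreted (assuming the caller
+has already captured the output of `git rev-list --count` and the exit code of
+`bild`), the branch-shape rule and the exit-code convention can be written as
+pure functions. `BuildResult`, `fromBildExit`, `branchShapeOk`, and `verify`
+are illustrative names, not existing APIs.
+
+```haskell
+import Data.Char (isSpace)
+import System.Exit (ExitCode (..))
+
+data BuildResult = BuildPassed | BuildSkipped | BuildFailed Int
+  deriving (Show, Eq)
+
+-- bild convention from the text: 0 = pass, 2 = not buildable (skip), other = fail.
+fromBildExit :: ExitCode -> BuildResult
+fromBildExit ExitSuccess = BuildPassed
+fromBildExit (ExitFailure 2) = BuildSkipped
+fromBildExit (ExitFailure n) = BuildFailed n
+
+-- The task branch must be exactly one commit ahead of live.
+branchShapeOk :: String -> Bool
+branchShapeOk out = case reads out :: [(Int, String)] of
+  [(1, rest)] -> all isSpace rest
+  _ -> False
+
+-- | Left carries the feedback for the next dev prompt; Right means Verified.
+verify :: String -> ExitCode -> Either String BuildResult
+verify revListOut bildExit
+  | not (branchShapeOk revListOut) = Left "branch shape violation"
+  | otherwise = case fromBildExit bildExit of
+      BuildFailed n -> Left ("bild failed with exit code " ++ show n)
+      ok -> Right ok
+
+main :: IO ()
+main = do
+  print (verify "1\n" ExitSuccess)
+  print (verify "3\n" ExitSuccess)
+  print (verify "1\n" (ExitFailure 1))
+```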
+
+### Phase 3: Integrate
+
+**Trigger:** Task is `Verified`. Runs synchronously.
+
+**Steps:**
+
+1. In the integration worktree (always tracks `live`):
+   ```
+   git checkout live
+   git pull --ff-only     # or just ensure it's up to date
+   ```
+
+2. Cherry-pick the task commit:
+   ```
+   git cherry-pick <task-id>
+   ```
+
+3. If conflict:
+   ```
+   git cherry-pick --abort
+   ```
+   Reset task branch to fresh `live` (delete and let dev recreate).
+   Set `Open` with comment "Cherry-pick conflict after live diverged;
+   branch reset for re-development."
+
+4. If cherry-pick succeeds, verify on `live`:
+   ```
+   bild <namespace>
+   ```
+   If the build fails: undo the cherry-pick so `live` is unchanged, then set
+   `Open` with failure details.
+
+5. If all passes:
+   - Mark `Done`.
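+   - Remove dev worktree: `git worktree remove _/pipeline/<task-id>`.
+   - Delete task branch: `git branch -D <task-id>` (after the worktree is gone,
+     since a checked-out branch cannot be deleted; `-d` would refuse because
+     the cherry-picked commit has a different SHA, so the branch never looks
+     merged).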
+   - Add summary comment with integrated commit hash.
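+
+The integrate steps can be sketched with the command runner passed in as a
+parameter, which keeps the control flow testable without touching a real
+repository. Everything here is an assumption made for illustration: the
+`Runner` type, the `IntegrateResult` constructors, undoing a bad pick with
+`git reset --hard HEAD~1`, and treating bild exit code 2 as a pass.
+
+```haskell
+import Data.Functor.Identity (Identity (..))
+import System.Exit (ExitCode (..))
+
+data IntegrateResult = IntegratedOk | BaseNotReady | Conflict | BuildBroke
+  deriving (Show, Eq)
+
+-- Runs a command in the integration worktree and reports its exit code.
+type Runner m = String -> [String] -> m ExitCode
+
+integrate :: Monad m => Runner m -> String -> String -> m IntegrateResult
+integrate run taskId namespace = do
+  checkedOut <- run "git" ["checkout", "live"]
+  pulled <- run "git" ["pull", "--ff-only"]
+  if checkedOut /= ExitSuccess || pulled /= ExitSuccess
+    then pure BaseNotReady
+    else do
+      picked <- run "git" ["cherry-pick", taskId]
+      case picked of
+        ExitFailure _ -> do
+          _ <- run "git" ["cherry-pick", "--abort"]
+          pure Conflict
+        ExitSuccess -> do
+          built <- run "bild" [namespace]
+          case built of
+            ExitSuccess -> pure IntegratedOk
+            ExitFailure 2 -> pure IntegratedOk -- not buildable, nothing to check
+            ExitFailure _ -> do
+              _ <- run "git" ["reset", "--hard", "HEAD~1"]
+              pure BuildBroke
+
+-- Dry-run runner: prints each command and pretends it succeeded.
+dryRun :: Runner IO
+dryRun cmd args = putStrLn (unwords (cmd : args)) >> pure ExitSuccess
+
+-- Pure mock in which the cherry-pick itself fails.
+conflictRunner :: Runner Identity
+conflictRunner _ ["cherry-pick", arg] | arg /= "--abort" = Identity (ExitFailure 1)
+conflictRunner _ _ = Identity ExitSuccess
+
+main :: IO ()
+main = do
+  integrate dryRun "t-123" "Omni/Pipeline.hs" >>= print
+  print (runIdentity (integrate conflictRunner "t-123" "Omni/Pipeline.hs"))
+```
+
+Cleanup on success (marking `Done`, removing the worktree and branch) is left
+out; it would follow an `IntegratedOk` result.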
+
+## Dev Workflow (Agent Prompt)
+
+The agent's job is simplified: **write code and commit. Nothing else.**
+
+```markdown
+# Dev Workflow
+
+You are a coder. Your job is to implement the task described below.
+
+## Rules
+1. Work only in this workspace: {workspace}
+2. One commit on branch `{task_id}` with trailer `Task-Id: {task_id}`.
+3. If a commit already exists (patchset > 0), amend it.
+4. Verify your work: `bild {namespace}` and `bild --test {namespace}`.
+5. **Do NOT change task status.** The pipeline handles that.
+6. **Do NOT push.**
+
+## Task
+- ID: {task_id}
+- Title: {title}
+- Namespace: {namespace}
+- Branch: {task_id}
+- Base: live
+- Patchset: {patchset_count}
+
+### Description
+{description}
+
+{#if patchset > 0}
+### Previous Review Feedback
+The previous patchset was rejected. Fix the issues below:
+
+{review_feedback}
+{/if}
+```
+
+This is dramatically simpler than the current dev.md. The agent doesn't need
+to understand the pipeline, manage status, or coordinate with other roles.
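+
+Filling the template is plain placeholder substitution plus one conditional
+section. The sketch below is illustrative only: `replaceAll`, `fillTemplate`,
+and `feedbackSection` are hypothetical helpers, it uses `String` rather than
+`Text` to stay dependency-free, and it substitutes sequentially, so a value
+that itself contains a `{placeholder}` could be rewritten by a later pair.
+
+```haskell
+import Data.List (isPrefixOf)
+
+-- Replace every occurrence of needle; an empty needle leaves the input as-is.
+replaceAll :: String -> String -> String -> String
+replaceAll needle replacement
+  | null needle = id
+  | otherwise = go
+  where
+    go [] = []
+    go s@(c : cs)
+      | needle `isPrefixOf` s = replacement ++ go (drop (length needle) s)
+      | otherwise = c : go cs
+
+-- Substitute {key} placeholders with their values, one pair at a time.
+fillTemplate :: [(String, String)] -> String -> String
+fillTemplate vars template =
+  foldl (\acc (k, v) -> replaceAll ("{" ++ k ++ "}") v acc) template vars
+
+-- The feedback section is emitted only for patchset > 0.
+feedbackSection :: Int -> String -> String
+feedbackSection patchset feedback
+  | patchset > 0 =
+      unlines
+        [ "### Previous Review Feedback"
+        , "The previous patchset was rejected. Fix the issues below:"
+        , ""
+        , feedback
+        ]
+  | otherwise = ""
+
+main :: IO ()
+main = do
+  let vars = [("task_id", "t-123"), ("namespace", "Omni/Pipeline.hs")]
+  putStrLn (fillTemplate vars "Branch `{task_id}`: bild {namespace}")
+  putStr (feedbackSection 1 "bild failed with exit code 1")
+```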
+
+## Workspace Pool
+
+```
+_/pipeline/
+├── integration/          # Persistent, tracks `live`
+├── t-123/               # Ephemeral, per-task dev worktree
+├── t-456/               # Another concurrent dev task
+└── state.db             # Pipeline state (or use task DB)
+```
+
+### Lifecycle
+
+- **Create** on task claim: `git worktree add _/pipeline/<tid> -b <tid> live`
+- **Reuse** on retry: reset branch to `live`, agent starts fresh
+  (review feedback in prompt tells it what to fix)
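+- **Destroy** on Done: `git worktree remove _/pipeline/<tid>` +
+  `git branch -D <tid>` (`-d` would refuse, because the cherry-picked commit
+  on `live` has a different SHA than the task branch tip)
+- **Max concurrent**: configurable (default 2, matching `--concurrency`). When
+  the pool is full, new Open tasks wait.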
+
+### Recovery
+
+On startup, the pipeline scans `_/pipeline/` for existing worktrees:
+- If a worktree's task is `InProgress` but no agentd run is active: stale.
+  Reset task to `Open`, clean up worktree.
+- If a worktree's task is `Done`: leftover. Remove worktree.
+- If a worktree's task is `Verified`: resume integration.
+- Integration worktree: ensure it exists and is on `live`. Recreate if broken.
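+
+The startup scan reduces to a lookup from task status and agent liveness to an
+action. A minimal sketch, with hypothetical names throughout; the cases the
+text does not cover (a live agent, `Open`, `NeedsHelp`) are assumed to need no
+action.
+
+```haskell
+data Status = Open | InProgress | Verified | Done | NeedsHelp
+  deriving (Show, Eq)
+
+data RecoveryAction
+  = ResetToOpenAndClean -- stale: no agent is running for this worktree
+  | RemoveWorktree      -- leftover from a finished task
+  | ResumeIntegration   -- pick up where the pipeline stopped
+  | LeaveAlone          -- assumption: nothing to do at startup
+  deriving (Show, Eq)
+
+-- | Second argument: is an agentd run still active for this task?
+recoveryAction :: Status -> Bool -> RecoveryAction
+recoveryAction InProgress False = ResetToOpenAndClean
+recoveryAction InProgress True  = LeaveAlone
+recoveryAction Done       _     = RemoveWorktree
+recoveryAction Verified   _     = ResumeIntegration
+recoveryAction _          _     = LeaveAlone
+
+main :: IO ()
+main = mapM_ print
+  [ recoveryAction InProgress False
+  , recoveryAction Done False
+  , recoveryAction Verified False
+  ]
+```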
+
+## Pipeline State
+
+A new table in the task DB (or a separate SQLite DB at `_/pipeline/state.db`):
+
+```sql
+CREATE TABLE pipeline_runs (
+  id            INTEGER PRIMARY KEY AUTOINCREMENT,
+  task_id       TEXT NOT NULL,
+  phase         TEXT NOT NULL,          -- 'dev', 'verify', 'integrate'
+  patchset      INTEGER NOT NULL,
+  agentd_run_id TEXT,                   -- only for dev phase
+  status        TEXT NOT NULL,          -- 'running', 'success', 'failure'
+  cost_cents    REAL DEFAULT 0,
+  started_at    TIMESTAMP NOT NULL,
+  finished_at   TIMESTAMP,
+  error_summary TEXT,                   -- structured failure reason
+  UNIQUE(task_id, phase, patchset, agentd_run_id)
+);
+```
+
+This gives us:
+
+| Query | SQL |
+|-------|-----|
+| Retry count for task patchset | `SELECT COUNT(*) FROM pipeline_runs WHERE task_id=? AND phase='dev' AND patchset=? AND status='failure'` |
+| Cumulative cost | `SELECT SUM(cost_cents) FROM pipeline_runs WHERE task_id=?` |
+| Active dev runs | `SELECT * FROM pipeline_runs WHERE phase='dev' AND status='running'` |
+| Last failure | `SELECT error_summary FROM pipeline_runs WHERE task_id=? AND status='failure' ORDER BY finished_at DESC LIMIT 1` |
+| Review feedback for prompt | `SELECT error_summary FROM pipeline_runs WHERE task_id=? AND phase='verify' AND status='failure' ORDER BY finished_at DESC LIMIT 1` |
+
+No more regex parsing of comments.
+
+Task comments are still written for human/Ava consumption, but they're
+informational, not structural. The pipeline never reads its own comments back.
+
+## Main Loop
+
+```
+every poll_interval:
+
+  1. INTEGRATE: for each Verified task (priority: oldest first)
+       run_integrate(task)                        -- fast, synchronous
+
+  2. VERIFY: for each InProgress task where agent finished
+       run_verify(task)                           -- fast, synchronous
+
+  3. HARVEST: for each running dev agent
+       poll_agentd_status(run_id)
+       if finished: mark for verify in next cycle
+
+  4. DEVELOP: for each Open task (up to concurrency limit)
+       if retry_ok(task) and slots_available():
+         spawn_dev(task)                          -- async, returns immediately
+
+  sleep(poll_interval)                            -- default 10s
+```
+
+Priority order (integrate > verify > harvest > develop) ensures finished work
+is cleared through the pipeline before new work is started. This creates mild
+back-pressure: integrate and verify run synchronously at the top of each cycle,
+so a backlog of finished work delays the start of new dev tasks.
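+
+One way to make the ordering explicit is to plan each cycle as a pure list of
+actions and execute it afterwards. This is a sketch under assumptions:
+`Snapshot`, `Action`, and `planCycle` are hypothetical, and free slots are
+counted only against agents that are still running, which simplifies the
+worktree accounting.
+
+```haskell
+data Action = Integrate String | Verify String | Harvest String | Develop String
+  deriving (Show, Eq)
+
+-- What the pipeline sees at the start of a cycle.
+data Snapshot = Snapshot
+  { snVerified  :: [String] -- Verified tasks, oldest first
+  , snAgentDone :: [String] -- InProgress tasks whose agent has finished
+  , snRunning   :: [String] -- agentd run IDs still being polled
+  , snOpenReady :: [String] -- Open tasks that pass retry and backoff checks
+  }
+
+-- | Actions for one cycle, in priority order; new dev work is capped by free slots.
+planCycle :: Int -> Snapshot -> [Action]
+planCycle concurrency s =
+  map Integrate (snVerified s)
+    ++ map Verify (snAgentDone s)
+    ++ map Harvest (snRunning s)
+    ++ map Develop (take freeSlots (snOpenReady s))
+  where
+    freeSlots = max 0 (concurrency - length (snRunning s))
+
+main :: IO ()
+main =
+  mapM_ print (planCycle 2 (Snapshot ["t-1"] ["t-2"] ["run-3"] ["t-4", "t-5"]))
+```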
+
+## Concurrency
+
+- **Dev phase**: Up to N concurrent agents (configurable, default 2).
+  Each gets its own worktree.
+- **Verify phase**: Serial. Runs in the dev worktree. Fast (seconds to minutes).
+- **Integrate phase**: Serial. One integration worktree. Fast.
+
+N=2 is a good default. Each dev agent is a Docker container with an LLM session,
+so the bottleneck is API rate limits and host CPU, not the pipeline.
+
+## Retry & Circuit Breaker
+
+Per task, tracked in `pipeline_runs`:
+
+- **Max retries per patchset**: configurable (default 5).
+  After N failures at the same patchset, the pipeline:
+  1. Sets task to `NeedsHelp`.
+  2. Adds a comment: "Pipeline: exceeded {N} retries on patchset {P}. Last error: {E}".
+  3. Stops attempting the task.
+
+- **Backoff**: exponential with cap.
+  After failure K, wait `min(interval * 2^K, 600)` seconds before retrying.
+  Tracked by `finished_at` of the last failure row.
+
+- **Cost cap**: if cumulative cost for a task exceeds threshold, set `NeedsHelp`.
+
+All of this is a SQL query against `pipeline_runs`, not comment parsing.
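+
+As a worked example of the backoff formula and the two caps, the sketch below
+takes the values that the SQL queries would return and decides whether a task
+may be attempted. `Limits`, `History`, `Gate`, and `gate` are hypothetical
+names; times are plain seconds to avoid a dependency on the `time` package.
+
+```haskell
+data Limits = Limits
+  { maxRetries   :: Int
+  , maxCostCents :: Double -- 0 = unlimited
+  , pollInterval :: Int    -- seconds
+  }
+
+data History = History
+  { failures      :: Int    -- failed runs at the current patchset
+  , costCents     :: Double -- cumulative cost for the task
+  , lastFailureAt :: Int    -- finished_at of the last failure, in seconds
+  }
+
+data Gate = Attempt | Wait Int | GiveUp String
+  deriving (Show, Eq)
+
+-- min(interval * 2^K, 600), computed in Integer so large K cannot overflow.
+backoffSeconds :: Int -> Int -> Int
+backoffSeconds interval k =
+  fromInteger (min 600 (toInteger interval * 2 ^ min 30 (max 0 k)))
+
+gate :: Limits -> History -> Int -> Gate
+gate lim h now
+  | failures h >= maxRetries lim = GiveUp "exceeded retries"
+  | maxCostCents lim > 0 && costCents h > maxCostCents lim = GiveUp "exceeded cost cap"
+  | failures h > 0 && remaining > 0 = Wait remaining
+  | otherwise = Attempt
+  where
+    remaining = lastFailureAt h + backoffSeconds (pollInterval lim) (failures h) - now
+
+main :: IO ()
+main = do
+  print (map (backoffSeconds 10) [1, 3, 6])             -- [20,80,600]
+  print (gate (Limits 5 0 10) (History 2 12.5 1000) 1010) -- Wait 30
+```
+
+With the default 10-second interval the wait doubles from 20 seconds after the
+first failure and reaches the 600-second cap at the sixth.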
+
+## Ava Integration
+
+Ava doesn't need special tools. The interface is the task DB:
+
+**Ava files work:**
+```
+task create "Fix crash on empty input" \
+  --namespace="Omni/Agent/Tools.hs" \
+  --parent=t-587 \
+  --description="..."
+```
+The pipeline picks it up on next poll (it's `Open` and ready).
+
+**Ava monitors progress:**
+```
+task show t-XXX --json
+```
+Check `taskStatus`:
+- `InProgress` → "Working on it"
+- `Verified` → "Built successfully, integrating"
+- `Done` → "Shipped! Commit: ..."
+- `NeedsHelp` → "Stuck, needs your attention"
+
+Pipeline comments on the task give Ava the narrative:
+- "Pipeline: dev agent completed (commit abc123)"
+- "Pipeline: build verification passed"
+- "Pipeline: integrated into live at def456"
+- "Pipeline: build failed. Output: ..."
+
+**Ava doesn't need:**
+- `delegate_to_pipeline` tool (just create a task)
+- `monitor_pipeline_task` tool (just `task show`)
+- PipelineDelegate.hs module (delete it)
+
+The existing `task` CLI is the interface. Ava already knows how to use it.
+
+## CLI
+
+```
+Omni/Pipeline.hs run [options]          -- Start the pipeline
+Omni/Pipeline.hs status [--json]        -- Show pipeline health
+Omni/Pipeline.hs drain                  -- Stop accepting new work, finish in-flight
+```
+
+### `run` options
+
+```
+--root PATH          Workspace root (default: _/pipeline)
+--base BRANCH        Base branch (default: live)
+--concurrency N      Max concurrent dev agents (default: 2)
+--interval SEC       Poll interval (default: 10)
+--max-retries N      Per-patchset retry limit (default: 5)
+--max-task-cost C    Cost cap per task in cents (default: 0 = unlimited)
+--parent ID          Only process tasks under this epic
+--task-id ID         Only process this one task
+--provider NAME      Agent provider (default: from env)
+--timeout SEC        Agent timeout per run (default: 1800)
+--once               Process one full cycle, then exit
+--dry-run            Print what would happen
+```
+
+### `status` output (JSON)
+
+```json
+{
+  "active_dev_runs": [
+    {"task_id": "t-123", "run_id": "dev-t-123-...", "elapsed_sec": 342}
+  ],
+  "tasks_by_status": {"Open": 3, "InProgress": 2, "Verified": 1, "Done": 15},
+  "recent_failures": [...],
+  "stuck_tasks": ["t-456"],
+  "workspace_pool": {"active": 2, "max": 3, "integration": "healthy"}
+}
+```
+
+## Module Structure
+
+```
+Omni/Pipeline.hs          -- Entry point, CLI, main loop
+Omni/Pipeline/
+  Core.hs                  -- Types, config, state machine logic
+  Dev.hs                   -- Phase 1: agent spawning, prompt building
+  Verify.hs                -- Phase 2: bild, test, branch shape checks
+  Integrate.hs             -- Phase 3: cherry-pick, final verify, cleanup
+  Workspace.hs             -- Worktree pool management + recovery
+  State.hs                 -- pipeline_runs table, queries
+  Git.hs                   -- Git operations (cherry-pick, branch, worktree)
+```
+
+### Key types
+
+```haskell
+data PipelineConfig = PipelineConfig
+  { pcRoot          :: FilePath       -- workspace root
+  , pcBaseBranch    :: Text           -- "live"
+  , pcConcurrency   :: Int            -- max parallel dev agents
+  , pcPollInterval  :: Int            -- seconds
+  , pcMaxRetries    :: Int            -- per patchset
+  , pcMaxTaskCost   :: Int            -- cents, 0 = unlimited
+  , pcAgentTimeout  :: Int            -- seconds
+  , pcAgentProvider :: Maybe Text
+  , pcParentFilter  :: Maybe Text     -- restrict to epic
+  , pcTaskFilter    :: Maybe Text     -- restrict to single task
+  }
+
+data Phase = Dev | Verify | Integrate
+  deriving (Show, Eq)
+
+data RunRecord = RunRecord
+  { rrTaskId      :: Text
+  , rrPhase       :: Phase
+  , rrPatchset    :: Int
+  , rrAgentdRunId :: Maybe Text
+  , rrStatus      :: RunStatus
+  , rrCostCents   :: Double
+  , rrStartedAt   :: UTCTime
+  , rrFinishedAt  :: Maybe UTCTime
+  , rrError       :: Maybe Text
+  }
+
+data RunStatus = Running | Success | Failure
+  deriving (Show, Eq)
+
+-- Active dev run tracked in memory
+data ActiveDev = ActiveDev
+  { adTaskId    :: Text
+  , adRunId     :: Text
+  , adWorkspace :: FilePath
+  , adStartedAt :: UTCTime
+  , adBeforeSha :: Maybe Text   -- commit before agent ran
+  }
+```
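+
+Since `phase` and `status` are stored as `TEXT` in `pipeline_runs`, the types
+above need a mapping to and from the column values. A minimal sketch of one
+way to do this; the function names are hypothetical, it uses `String` instead
+of `Text`, and it adds `Enum` and `Bounded` to the derived classes so the
+parser can be built from the printer.
+
+```haskell
+data Phase = Dev | Verify | Integrate
+  deriving (Show, Eq, Enum, Bounded)
+
+data RunStatus = Running | Success | Failure
+  deriving (Show, Eq, Enum, Bounded)
+
+-- Column values as listed in the CREATE TABLE comments.
+phaseToColumn :: Phase -> String
+phaseToColumn Dev = "dev"
+phaseToColumn Verify = "verify"
+phaseToColumn Integrate = "integrate"
+
+statusToColumn :: RunStatus -> String
+statusToColumn Running = "running"
+statusToColumn Success = "success"
+statusToColumn Failure = "failure"
+
+-- Invert a printer by trying every constructor; Nothing for unknown text.
+parseVia :: (Enum a, Bounded a) => (a -> String) -> String -> Maybe a
+parseVia render s = lookup s [(render x, x) | x <- [minBound .. maxBound]]
+
+parsePhase :: String -> Maybe Phase
+parsePhase = parseVia phaseToColumn
+
+parseStatus :: String -> Maybe RunStatus
+parseStatus = parseVia statusToColumn
+
+main :: IO ()
+main = do
+  print (parsePhase "verify")   -- Just Verify
+  print (parseStatus "unknown") -- Nothing
+```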
+
+### Integration with existing code
+
+The pipeline imports directly from:
+- `Omni.Task.Core`: `loadTasks`, `findTask`, `updateTaskStatus`, `claimTask`,
+  `addComment`, `Task(..)`, `Status(..)`
+- `Omni.Namespace`: `fromRelPath`, `toPath`
+- `Omni.Bild`: `isBuildableNs` (to check before running bild)
+
+It shells out to:
+- `agentd run` / `agentd status` (for dev agents)
+- `git` (for worktree, branch, cherry-pick operations)
+- `bild` (for build verification)
+
+Direct Haskell imports for Task avoid the overhead and fragility of shelling
+out to `task` CLI for every status check.
+
+## Migration Path
+
+1. Build `Omni/Pipeline.hs` alongside existing `dev-review-release.sh`.
+2. Add `Verified` status to `Omni/Task/Core.hs`.
+3. Test on a single task: `Omni/Pipeline.hs run --task-id t-XXX --once`.
+4. Run on an epic: `Omni/Pipeline.hs run --parent t-587`.
+5. Once stable, deprecate `dev-review-release.sh`.
+6. Delete `Omni/Ava/Tools/PipelineDelegate.hs` (Ava just uses `task` CLI).
+7. Clean up unused statuses (`Review`, `Approved`, `ReviewInProgress`,
+   `Integrating`) from `Task.Core`.
+
+## What This Eliminates
+
+| Old problem | Resolution |
+|-------------|-----------|
+| Three polling loops / tmux windows | Single process |
+| Agent and shell both manage status | Pipeline owns all transitions; agent just commits |
+| LLM doing integration (4 shell commands) | Deterministic integration in Haskell |
+| LLM doing code review it can't reliably do | Build verification gate (bild + tests) |
+| Retry state mined from comments | `pipeline_runs` SQL table |
+| No review feedback to dev | Verify failure message injected into next dev prompt |
+| Serial one-task-at-a-time bottleneck | Configurable concurrent dev agents |
+| Race conditions needing CAS claims | Single process, no races |
+| Stale task branches after live diverges | Branch reset on conflict, fresh checkout on retry |
+| 1315-line bash script | Typed Haskell with proper modules |