Multi-LLM Task Specification Pipeline

t-675 · Epic · omni.hs
Created 1 month ago · Updated 1 month ago

Execution Summary

  • Tasks Completed: 0/5
  • Total Cost: $0.00
  • Total Time: 0s

Design


Incremental rollout of a multi-LLM up-speccing pipeline for agent tasks. The core insight: well-specified tasks can be outsourced to small models (Sonnet/Haiku), while underspecified tasks need large-model passes (Opus/GPT-5.3) to reach a quality threshold before handoff.

The pipeline uses an executor GO/NO-GO gate: the executor model (Sonnet/Haiku) is asked to restate the task and confirm it can implement without clarifying questions. If NO-GO, the spec is sent back to the large model to address the questions. Passes continue until convergence (GO) or a circuit breaker (max 5 passes).
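The loop above can be sketched as follows. This is a minimal illustration, not the shipped design: `call_executor` and `call_spec_model` are hypothetical wrappers around the two model APIs, and the `GateResult` fields are assumptions matching the gate described above.

```python
# Sketch of the GO/NO-GO convergence loop (assumed shapes, see lead-in).
from dataclasses import dataclass

MAX_PASSES = 5  # circuit breaker


@dataclass
class GateResult:
    go: bool                # executor can implement without clarifying questions
    restatement: str        # task restated in the executor's own words
    questions: list[str]    # clarifying questions (empty on GO)


def up_spec(task: str, call_executor, call_spec_model) -> tuple[str, int]:
    """Refine `task` until the executor signals GO or the breaker trips.

    Returns the final spec and the number of passes used.
    """
    spec = task
    for passes in range(1, MAX_PASSES + 1):
        gate: GateResult = call_executor(spec)
        if gate.go:
            return spec, passes  # converged: hand off to the executor
        # NO-GO: the large model amends the spec to answer the questions
        spec = call_spec_model(spec, gate.questions)
    return spec, MAX_PASSES  # breaker tripped: escalate (e.g. to a human)
```

In this shape the circuit breaker and the convergence signal live in one place, so swapping MVP 1's "bounce to human" for MVP 2's automatic Opus pass only changes what `call_spec_model` does.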

Key metric: passes-to-convergence per task. Also track task success rate, revision count, and time-to-done across all MVPs.
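A minimal sketch of the per-task record those metrics imply; the field names and aggregation are assumptions for illustration, not an existing schema.

```python
# Hypothetical per-task metrics record and cross-MVP aggregation.
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskMetrics:
    task_id: str
    passes_to_convergence: int  # key metric
    succeeded: bool             # task success
    revision_count: int         # revisions after handoff
    seconds_to_done: float


def summarize(runs: list[TaskMetrics]) -> dict:
    """Aggregate one MVP stage's runs for stage-over-stage comparison."""
    return {
        "success_rate": mean(r.succeeded for r in runs),
        "mean_passes": mean(r.passes_to_convergence for r in runs),
        "mean_revisions": mean(r.revision_count for r in runs),
        "mean_time_s": mean(r.seconds_to_done for r in runs),
    }
```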

MVP stages:

  0. Baseline metrics on current manual process
  1. Executor GO/NO-GO gate (single extra LLM call, bounces to human on NO-GO)
  2. Single-model auto-spec loop (Opus answers executor questions automatically)
  3. Diverse multi-model review (Opus drafts, GPT-5.3 devil's advocate, executor gate)

Design principles:

  • Don't fix the pass count in advance; let the executor's readiness signal drive convergence
  • Executor restates the task in its own words (catches confident misunderstanding)
  • Large model triages executor questions (skip style questions, only amend for correctness)
  • Sequential model roles, not adversarial (Opus drafts, GPT reviews)

Child Tasks

  • t-675.1 - MVP 0: Baseline metrics instrumentation [Open]
  • t-675.2 - MVP 1: Executor GO/NO-GO gate [Open]
  • t-675.3 - MVP 2: Single-model auto-spec loop [Open]
  • t-675.4 - MVP 3: Diverse multi-model review [Open]
  • t-675.5 - Multi-model LLM routing infrastructure [Open]

Timeline (0)

No activity yet.