Incremental rollout of a multi-LLM up-speccing pipeline for agent tasks. The core insight: well-specified tasks can be handed off to smaller models (Sonnet/Haiku), while underspecified tasks need large-model (Opus/GPT-5.3) passes to reach a quality threshold before handoff.
The pipeline uses an executor GO/NO-GO gate: the executor model (Sonnet/Haiku) is asked to restate the task and confirm it can implement it without clarifying questions. If NO-GO, the spec is sent back to the large model to address those questions. Passes continue until convergence (GO) or until the circuit breaker trips (max 5 passes).
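The gate loop above can be sketched as follows. This is a minimal sketch, not the pipeline's actual code: `executor_gate` and `speccer` are hypothetical stand-ins for the real LLM calls (the executor restating the task and returning GO or NO-GO plus questions, and the large model revising the spec to answer them).

```python
from typing import Callable, Optional, Tuple

def up_spec(
    spec: str,
    executor_gate: Callable[[str], Tuple[str, Optional[str]]],
    speccer: Callable[[str, str], str],
    max_passes: int = 5,  # circuit breaker from the pipeline description
) -> Tuple[str, int, bool]:
    """Loop until the executor says GO or the circuit breaker trips.

    Returns (final_spec, passes_used, converged).
    """
    for passes in range(1, max_passes + 1):
        verdict, questions = executor_gate(spec)
        if verdict == "GO":
            return spec, passes, True
        # NO-GO: send the spec back to the large model with the questions.
        spec = speccer(spec, questions or "")
    # Breaker tripped without convergence; hand back the best spec so far.
    return spec, max_passes, False
```

Injecting the two model calls as callables keeps the loop testable with fakes and lets the same skeleton serve MVP 1 (bounce to a human instead of calling `speccer`) and MVPs 2-3 (automatic large-model answers).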
Key metric: passes-to-convergence per task. Also track task success rate, revision count, and time-to-done across all MVPs.
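One way to record these metrics per task run is a small dataclass; the field names below are assumptions for illustration, not an existing schema.

```python
from dataclasses import dataclass

@dataclass
class TaskMetrics:
    task_id: str
    mvp_stage: int               # 0-3: which MVP stage produced this run
    passes_to_convergence: int   # key metric; equals max passes if breaker tripped
    converged: bool              # did the executor reach GO?
    succeeded: bool              # task success after handoff
    revision_count: int          # revisions requested after handoff
    time_to_done_s: float        # wall-clock seconds to done

def success_rate(records: list) -> float:
    """Fraction of recorded task runs that succeeded."""
    return sum(r.succeeded for r in records) / len(records) if records else 0.0
```

Keeping `mvp_stage` on every record makes the cross-MVP comparison (success rate, revisions, time-to-done per stage) a simple group-by.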
MVP stages:
0. Baseline metrics on the current manual process
1. Executor GO/NO-GO gate (single extra LLM call; bounces to a human on NO-GO)
2. Single-model auto-spec loop (Opus answers executor questions automatically)
3. Diverse multi-model review (Opus drafts, GPT-5.3 plays devil's advocate, executor gate)
Design principles:
No activity yet.