Add a second large model (GPT-5.3) as devil's advocate in the spec-revision loop. This builds on MVP 2 by adding diverse model perspectives to catch blind spots.
Flow:
1. Task spec is drafted (by a human, or derived from the initial task description)
2. Opus writes/revises the spec (as in MVP 2)
3. GPT-5.3 reviews the spec as devil's advocate: 'What is ambiguous, underspecified, or likely to cause implementation failures in this spec?'
4. If GPT-5.3 identifies issues, feed them back to Opus for revision
5. Amended spec goes to executor gate (Sonnet/Haiku)
6. Loop until convergence
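The flow above can be sketched as a single revision loop. This is a minimal sketch under stated assumptions: `call_opus`, `call_gpt_review`, and `executor_gate` are hypothetical stand-ins for the real provider calls, and `MAX_CYCLES` mirrors the two-cycle oscillation cap from the acceptance criteria.

```python
MAX_CYCLES = 2  # oscillation guard: max Opus<->GPT cycles per gate pass


def call_opus(spec: str, feedback: list[str]) -> str:
    """Opus authors/amends the spec, addressing reviewer feedback (stub)."""
    return spec if not feedback else spec + "\n# revised: " + "; ".join(feedback)


def call_gpt_review(spec: str) -> list[str]:
    """GPT-5.3 devil's-advocate review: returns a list of issues (stub, empty = clean)."""
    return []


def executor_gate(spec: str) -> bool:
    """Sonnet/Haiku GO/NO-GO readiness check (stub)."""
    return True


def revise_spec(draft: str) -> str:
    spec = call_opus(draft, feedback=[])           # step 2: Opus writes/revises
    for _ in range(MAX_CYCLES):
        issues = call_gpt_review(spec)             # step 3: devil's advocate review
        if not issues:
            break
        spec = call_opus(spec, feedback=issues)    # step 4: feed issues back to Opus
    if executor_gate(spec):                        # step 5: GO/NO-GO gate
        return spec
    raise RuntimeError("NO-GO from executor gate; escalate to human")
```

The stubs keep the control flow visible: the reviewer only produces issues, the author is the only function that returns a spec, and the gate is a terminal check rather than another revision step.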
Role separation (NOT adversarial):
- Opus: spec author/amender — writes and revises the spec
- GPT-5.3: devil's advocate/reviewer — identifies ambiguity and risk
- Sonnet/Haiku: executor gate — GO/NO-GO readiness check
Implementation:
- Add model routing to the spec loop (the loop must be able to call different LLM providers)
- GPT-5.3 review step happens between Opus revision and executor gate
- Gate this on task complexity — simple tasks (complexity 1-2) skip multi-model review and use MVP 2 flow only
- Track how often GPT-5.3 catches issues Opus missed, and whether those catches correlate with task success
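The routing and complexity gating could look like the following sketch. The role-to-model mapping and the threshold of 3 come from this plan; the `ROLES` dict and `needs_multimodel_review` name are illustrative, not a fixed API.

```python
# Role-to-model routing for the spec loop (names illustrative).
ROLES = {
    "author": "opus",        # writes and revises the spec
    "reviewer": "gpt-5.3",   # devil's advocate, complex tasks only
    "gate": "sonnet",        # GO/NO-GO executor readiness check
}


def needs_multimodel_review(complexity: int) -> bool:
    """Complexity 1-2 stays on the MVP 2 single-model flow; 3+ gets GPT-5.3 review."""
    return complexity >= 3
```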
Oscillation prevention:
- Models play different roles (author vs reviewer), not both authoring
- GPT-5.3 identifies issues but does NOT rewrite the spec — Opus does all spec writing
- If the same issue recurs after being addressed, flag it and escalate to a human
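Repeat-issue detection could be sketched as below. This assumes issues can be fingerprinted by normalized text; a real system might use embeddings or reviewer-assigned issue IDs instead.

```python
def detect_repeats(seen: set[str], issues: list[str]) -> list[str]:
    """Return issues that have cycled back after being raised before.

    `seen` accumulates normalized fingerprints across cycles; any nonempty
    return value is the signal to stop looping and escalate to a human.
    """
    repeats = [i for i in issues if i.strip().lower() in seen]
    seen.update(i.strip().lower() for i in issues)
    return repeats
```

Usage: call once per GPT-5.3 review pass with the same `seen` set; if `detect_repeats` returns anything, the loop bounces the task to a human instead of running another Opus revision.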
Acceptance criteria:
- Complex tasks (complexity 3+) go through multi-model review
- Simple tasks (complexity 1-2) use MVP 2 single-model flow
- GPT-5.3 review comments are logged on the task
- No oscillation: max 2 Opus-GPT cycles per gate pass
- Metrics show whether multi-model review improves task success rate vs MVP 2 alone