MVP 3: Diverse multi-model review

t-675.4 · WorkTask · omni.hs
Parent: t-675 · Created 1 month ago · Updated 1 month ago

Dependencies

Description


Add a second large model (GPT-5.3) as a devil's advocate in the spec-revision loop. This builds on MVP 2 by adding a diverse model perspective to catch blind spots.

Flow:

  1. Task spec is drafted (by a human or from the initial task description)
  2. Opus writes/revises the spec (as in MVP 2)
  3. GPT-5.3 reviews the spec as devil's advocate: 'What is ambiguous, underspecified, or likely to cause implementation failures in this spec?'
  4. If GPT-5.3 identifies issues, they are fed back to Opus for revision
  5. The amended spec goes to the executor gate (Sonnet/Haiku)
  6. Loop until convergence
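The flow above can be sketched as a small driver loop. All names here (`run_spec_loop`, `opus_revise`, `gpt_review`) are hypothetical stand-ins for whatever provider calls the pipeline actually makes, not a real API:

```python
# Hypothetical sketch of the spec-revision loop; the callables are
# placeholders for real LLM provider calls.

MAX_REVIEW_CYCLES = 2  # oscillation guard from the acceptance criteria


def run_spec_loop(task_description, opus_revise, gpt_review):
    """Return (spec, review_history) after the devil's-advocate loop ends.

    opus_revise(spec, issues) -> revised spec   (author role)
    gpt_review(spec)          -> list of issues (reviewer role; never writes)
    """
    spec = task_description  # step 1: initial draft
    history = []
    for _ in range(MAX_REVIEW_CYCLES):
        prior_issues = history[-1] if history else []
        spec = opus_revise(spec, prior_issues)  # step 2: Opus revises
        issues = gpt_review(spec)               # step 3: GPT-5.3 critiques
        history.append(issues)                  # logged per acceptance criteria
        if not issues:                          # step 4: converged
            break
    return spec, history                        # step 5: on to executor gate
```

The cap of two cycles enforces the "max 2 Opus-GPT cycles per gate pass" criterion even if the reviewer keeps finding new issues.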

Role separation (NOT adversarial):

  • Opus: spec author/amender — writes and revises the spec
  • GPT-5.3: devil's advocate/reviewer — identifies ambiguity and risk
  • Sonnet/Haiku: executor gate — GO/NO-GO readiness check
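One way to make the role separation enforceable in code is a role table where only the author may modify the spec. Model names and field names below are illustrative, not a committed schema:

```python
# Hypothetical role table enforcing "only the author writes the spec".
ROLES = {
    "author":   {"model": "opus",    "may_write_spec": True},
    "reviewer": {"model": "gpt-5.3", "may_write_spec": False},  # critique only
    "gate":     {"model": "sonnet",  "may_write_spec": False},  # GO/NO-GO check
}


def writers(roles):
    """Roles allowed to modify the spec; should always be the author alone."""
    return [name for name, role in roles.items() if role["may_write_spec"]]
```

Keeping `may_write_spec` explicit means the non-adversarial invariant can be asserted at pipeline startup rather than trusted implicitly.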

Implementation:

  • Add model routing to the spec loop (needs to call different LLM providers)
  • GPT-5.3 review step happens between Opus revision and executor gate
  • Gate this on task complexity — simple tasks (complexity 1-2) skip multi-model review and use MVP 2 flow only
  • Track how often GPT-5.3 catches issues Opus missed, and the correlation with task success
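The complexity gating rule can be sketched as a routing function. Stage names here are hypothetical labels, not real service identifiers:

```python
# Hypothetical router implementing the complexity gate described above:
# complexity 1-2 uses the MVP 2 single-model flow, 3+ adds the review step.

def route(task_complexity):
    """Return the ordered pipeline stages for a task of given complexity."""
    if task_complexity <= 2:
        return ["opus", "executor_gate"]                # MVP 2 flow
    return ["opus", "gpt-5.3-review", "executor_gate"]  # MVP 3 multi-model flow
```

Putting the gate in one routing function keeps the simple-task fast path trivially auditable.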

Oscillation prevention:

  • Models play different roles (author vs reviewer), not both authoring
  • GPT-5.3 identifies issues but does NOT rewrite the spec — Opus does all spec writing
  • If the same issue cycles back after being addressed, flag it and bounce the task to a human
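Detecting a cycled-back issue can be sketched as a scan over per-cycle review logs. The string normalization here is deliberately naive (case-folding); a real implementation would likely need semantic comparison:

```python
# Hypothetical detector for issues that reappear across review cycles,
# which per the rule above should be flagged and bounced to a human.

def recurring_issues(history):
    """Given a list of per-cycle issue lists, return issues seen more than once."""
    seen = set()
    recurring = set()
    for cycle in history:
        for issue in cycle:
            key = issue.strip().lower()  # naive normalization; assumption
            if key in seen:
                recurring.add(key)
            seen.add(key)
    return sorted(recurring)
```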

Acceptance criteria:

  • Complex tasks (complexity 3+) go through multi-model review
  • Simple tasks (complexity 1-2) use MVP 2 single-model flow
  • GPT-5.3 review comments are logged on the task
  • No oscillation: max 2 Opus-GPT cycles per gate pass
  • Metrics show whether multi-model review improves task success rate vs MVP 2 alone
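The final criterion implies comparing success rates between the two cohorts. A minimal sketch, assuming outcomes are recorded as 1 (success) / 0 (failure) per task:

```python
# Hypothetical metric helpers for the MVP 3 vs MVP 2 comparison.

def success_rate(outcomes):
    """Fraction of successful tasks (outcomes are 1/0 flags)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def review_lift(mvp3_outcomes, mvp2_outcomes):
    """Success-rate delta; positive means multi-model review helped."""
    return success_rate(mvp3_outcomes) - success_rate(mvp2_outcomes)
```

A raw delta ignores sample size and task-mix differences, so it should be read as a directional signal rather than a verdict.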

Timeline (0)

No activity yet.