MVP 3: Diverse model review (devil's advocate)

t-683 · WorkTask
Created 1 month ago · Updated 1 month ago

Description


Parent epic: t-679
Depends on: MVP 2 (auto-spec loop)

Goal

Add a second large model that acts as a devil's advocate during the up-spec process. Use data from MVP 2 to determine which task types benefit from this extra pass.

Mechanism

1. Opus drafts or amends the spec (as in MVP 2).
2. Before the spec is sent to the executor, route it to GPT-5.3 with the prompt: "Review this task spec. What's ambiguous, missing, or likely to cause implementation failure? Be adversarial."
3. GPT-5.3's critique is fed back to Opus for one more amendment round.
4. Proceed to the executor gate as normal.
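The four steps can be sketched as a single function. This is a minimal sketch, not the pipeline's actual code: `refine_spec`, `SpecResult`, and the `drafter`/`reviewer` callables are hypothetical names, and each model is assumed to be wrapped as a plain prompt-in, text-out function so the sketch does not depend on any vendor SDK. The reviewer is called exactly once, which is how the sketch keeps to a single critique round.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical wrapper type: a model is any function from prompt text to reply text.
ModelCall = Callable[[str], str]

# Devil's-advocate prompt, copied from the task description.
CRITIQUE_PROMPT = (
    "Review this task spec. What's ambiguous, missing, or likely to cause "
    "implementation failure? Be adversarial."
)


@dataclass
class SpecResult:
    spec: str      # spec that goes on to the executor gate
    critique: str  # reviewer output, kept for later metrics


def refine_spec(task: str, drafter: ModelCall, reviewer: ModelCall) -> SpecResult:
    """Sequential drafter -> devil's advocate -> drafter pass (no ping-pong)."""
    # Step 1: the drafter (Opus in this task) drafts the spec.
    draft = drafter(f"Draft a task spec for:\n{task}")

    # Step 2: the second model (GPT-5.3 in this task) critiques it exactly once.
    critique = reviewer(f"{CRITIQUE_PROMPT}\n\n{draft}")

    # Step 3: one amendment round; the reviewer is never called again,
    # so the two models cannot keep revising each other.
    amended = drafter(
        "Amend this spec to address the critique.\n\n"
        f"Spec:\n{draft}\n\nCritique:\n{critique}"
    )

    # Step 4: the caller passes result.spec to the executor gate as usual.
    return SpecResult(spec=amended, critique=critique)
```

Usage would look like `refine_spec(task_text, drafter=call_opus, reviewer=call_gpt)`, where `call_opus` and `call_gpt` are hypothetical wrappers around whatever client the pipeline already uses.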

Design decisions

  • Sequential roles (drafter → devil's advocate), NOT adversarial ping-pong
  • Gate on complexity: only run diverse review for tasks above a complexity threshold (informed by MVP 2 data showing which tasks needed 3+ passes)
  • Track whether diverse review actually improves the first-attempt success rate compared with MVP 2
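One possible way to derive the complexity gate from MVP 2 data, sketched under stated assumptions: the MVP 2 log is available as `(task_type, passes)` pairs, a task type counts as complex when at least half of its tasks needed 3+ passes, and task types never seen in MVP 2 skip review by default. The 50% share, the record shape, and the function names are illustrative choices, not values taken from this task.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def complex_task_types(
    mvp2_history: Iterable[Tuple[str, int]],
    min_passes: int = 3,
    min_share: float = 0.5,
) -> Set[str]:
    """Task types where at least `min_share` of MVP 2 tasks needed `min_passes`+ passes."""
    totals = defaultdict(int)  # tasks seen per type
    hard = defaultdict(int)    # tasks per type that needed min_passes or more
    for task_type, passes in mvp2_history:
        totals[task_type] += 1
        if passes >= min_passes:
            hard[task_type] += 1
    return {t for t, n in totals.items() if hard[t] / n >= min_share}


def needs_diverse_review(task_type: str, gated_types: Set[str]) -> bool:
    """Simple (or never-seen) task types skip the devil's-advocate pass."""
    return task_type in gated_types
```

Gating on task type rather than on a per-task score keeps the rule easy to audit; a per-task complexity score would need its own estimator, which this task does not specify.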

Acceptance criteria

  • Two different large models participate in spec refinement
  • Complexity-based routing: simple tasks skip diverse review
  • Metrics show whether diverse review adds value over MVP 2
  • No oscillation: the models don't endlessly revise each other (one critique round max)
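To check whether diverse review adds value, one option is to compare first-attempt success rates between MVP 2 runs and diverse-review runs, restricted to the same task types so the baseline is not diluted by simple tasks that skip review anyway. The `Outcome` record, the arm labels `"mvp2"` and `"diverse"`, and the function names are hypothetical; a real analysis would also need enough samples in each arm before reading anything into the difference.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Outcome:
    arm: str                     # "mvp2" (baseline) or "diverse" (with review)
    task_type: str
    first_attempt_success: bool  # executor succeeded without a retry


def success_rate(rows: List[Outcome]) -> Optional[float]:
    """Fraction of rows that succeeded on the first attempt; None if empty."""
    if not rows:
        return None
    return sum(r.first_attempt_success for r in rows) / len(rows)


def diverse_review_uplift(
    outcomes: Iterable[Outcome], task_types: Set[str]
) -> Optional[float]:
    """Diverse-review success rate minus MVP 2 rate, on the given task types only."""
    rows = [o for o in outcomes if o.task_type in task_types]
    base = success_rate([o for o in rows if o.arm == "mvp2"])
    diverse = success_rate([o for o in rows if o.arm == "diverse"])
    if base is None or diverse is None:
        return None  # one of the arms has no rows for these task types
    return diverse - base
```

A positive result suggests diverse review helps on those task types; a result near zero suggests the extra model call is not paying for itself there.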
