t-403 - omni

t-403 · WorkTask

Created 3 months ago · Updated 3 months ago

Draft Phase 0 evaluation spec for Factor/Compress/Compose experiments and create initial spec doc.

Define a minimal, concrete spec format and metrics so t-394/395/396 can be tested before theory work.

1. Spec document (Markdown) describing:

2. Example spec file with 3-5 test cases.

🔄 [human] Open → InProgress · 3 months ago

💬 [human] · 3 months ago

Drafted Phase 0 spec and example suite: Omni/Agent/Eval/Phase0.md and Omni/Agent/Eval/phase0-suite.yaml

🔄 [human] InProgress → Done · 3 months ago

Phase 0 eval spec for prompt ops