t-346.6 - omni

t-346.6 · WorkTask · Omni/Agent/MultiAgent

Parent: t-346 · Created 1 month ago · Updated 1 month ago

Description

Run the complete multi-agent architecture on the GAIA Level 1 test question.

Test case: Question: "What was the actual enrollment count of the clinical trial on H. pylori in acne vulgaris patients from Jan-May 2018?"

Expected answer: A specific numeric enrollment count (needs to be verified against actual trial data)

Testing procedure:

1. Run solve_gaia_question(test_question)
2. Review workspace.communication_log to see agent interactions
3. Verify each agent performed its role correctly
4. Check if final answer is correct
5. Review cost/token usage
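The first two steps of the procedure could be wrapped in a small harness along these lines. This is only a sketch: the task names solve_gaia_question and workspace.communication_log but does not give their signatures, so the harness takes both as callables. RunReport, run_gaia_test, and the get_log parameter are hypothetical names introduced here for illustration. Cost/token review (step 5) is left out because the task does not say how usage is exposed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional

# Question text copied from the test case in this task.
TEST_QUESTION = (
    "What was the actual enrollment count of the clinical trial on "
    "H. pylori in acne vulgaris patients from Jan-May 2018?"
)


@dataclass
class RunReport:
    """Hypothetical container for what a single test run produced."""
    answer: Optional[str] = None
    log: List[Any] = field(default_factory=list)
    error: Optional[str] = None


def run_gaia_test(
    solve: Callable[[str], Any],
    get_log: Callable[[], List[Any]],
) -> RunReport:
    """Run the solver once and capture answer, log, and any exception.

    `solve` is assumed to behave like solve_gaia_question(question);
    `get_log` is assumed to return workspace.communication_log.
    """
    report = RunReport()
    try:
        report.answer = str(solve(TEST_QUESTION))
    except Exception as exc:  # record the failure instead of aborting the test
        report.error = f"{type(exc).__name__}: {exc}"
    # Copy the log even on failure so the failing phase can be inspected.
    report.log = list(get_log())
    return report
```

Assuming the real objects match these shapes, a call might look like run_gaia_test(solve_gaia_question, lambda: workspace.communication_log). Capturing the log on failure as well as success keeps the "which agent, what phase" question answerable when the run breaks.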

Acceptance criteria:

System produces a numeric answer
Agent handoffs are clean (no errors or infinite loops)
Communication log shows clear reasoning trail
Answer is verifiable against public trial database
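Two of the criteria above (numeric answer, clean handoffs) lend themselves to automated checks; a rough sketch follows. The log entry shape is an assumption: each entry is taken to be a dict with "agent" and "message" keys and an optional "error" key, which the task does not specify. The function names and the max_repeats threshold are likewise hypothetical. Whether the reasoning trail is clear and whether the answer matches the public trial database still need a manual review.

```python
import re
from collections import Counter
from typing import Dict, List


def is_numeric_answer(answer: str) -> bool:
    """True if the answer is a bare number such as "90" or "1,200"."""
    return re.fullmatch(r"\s*\d[\d,]*(\.\d+)?\s*", answer) is not None


def find_handoff_problems(log: List[Dict], max_repeats: int = 3) -> List[str]:
    """Flag logged errors and messages repeated often enough to suggest a loop.

    Assumes each entry is a dict with "agent" and "message" keys and an
    optional "error" key (a hypothetical format, not taken from the task).
    """
    problems = []
    seen = Counter()
    for entry in log:
        agent = entry.get("agent", "unknown")
        if entry.get("error"):
            problems.append(f"{agent} reported error: {entry['error']}")
        seen[(agent, entry.get("message", ""))] += 1
    for (agent, _message), count in seen.items():
        if count > max_repeats:
            problems.append(
                f"{agent} sent the same message {count} times (possible loop)"
            )
    return problems
```

An empty list from find_handoff_problems means no errors or suspicious repeats were detected under these assumptions, not that the handoffs were necessarily correct.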

If test fails:

Document what went wrong (which agent, what phase)
Create follow-up tasks for fixes
Consider if architecture needs adjustment

Test multi-agent system on GAIA test case
