Test multi-agent system on GAIA test case

t-346.6 · WorkTask · Omni/Agent/MultiAgent
Parent: t-346 · Created 1 month ago · Updated 1 month ago

Description

Run the complete multi-agent architecture on the GAIA Level 1 test question.

Test case: Question: "What was the actual enrollment count of the clinical trial on H. pylori in acne vulgaris patients from Jan-May 2018?"

Expected answer: A specific numeric enrollment count (needs to be verified against actual trial data)

Testing procedure:

  1. Run solve_gaia_question(test_question)
  2. Review workspace.communication_log to see agent interactions
  3. Verify each agent performed its role correctly
  4. Check if final answer is correct
  5. Review cost/token usage
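Steps 2 and 5 of the procedure can be partly automated once a run has finished. The sketch below is only an illustration under stated assumptions: the task does not say what solve_gaia_question returns or how workspace.communication_log is structured, so the code assumes a solver that returns an (answer, log) pair, where the log is a list of dicts with "agent", "message", and "tokens" keys. The stub solver, the agent names, and the placeholder answer are all hypothetical and stand in for the real system.

```python
from collections import Counter

TEST_QUESTION = (
    "What was the actual enrollment count of the clinical trial on "
    "H. pylori in acne vulgaris patients from Jan-May 2018?"
)


def summarize_run(answer, communication_log):
    """Summarize one run: messages per agent and total token usage.

    Assumes each log entry is a dict with an "agent" key and an optional
    "tokens" key (hypothetical schema, not taken from the real workspace).
    """
    per_agent = Counter(entry["agent"] for entry in communication_log)
    total_tokens = sum(entry.get("tokens", 0) for entry in communication_log)
    return {
        "answer": answer,
        "messages_per_agent": dict(per_agent),
        "total_tokens": total_tokens,
    }


def stub_solver(question):
    """Hypothetical stand-in for solve_gaia_question, used only for a dry run."""
    log = [
        {"agent": "planner", "message": "Break the question into steps", "tokens": 120},
        {"agent": "searcher", "message": "Look up the trial record", "tokens": 300},
        {"agent": "planner", "message": "Compose the final answer", "tokens": 80},
    ]
    # "42" is a placeholder, not the real enrollment count.
    return "42", log


if __name__ == "__main__":
    answer, log = stub_solver(TEST_QUESTION)
    print(summarize_run(answer, log))
```

Replacing stub_solver with a thin adapter around the real solve_gaia_question would give a one-line summary per run, which makes it easier to compare cost between attempts.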

Acceptance criteria:

  • System produces a numeric answer
  • Agent handoffs are clean (no errors or infinite loops)
  • Communication log shows clear reasoning trail
  • Answer is verifiable against public trial database
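The first two criteria lend themselves to a mechanical check; the last two still need a human to read the log and compare the answer against the trial database. A minimal sketch follows, assuming the same hypothetical log schema (a list of dicts with "agent" and "message" keys and an optional "error" key). The thresholds max_steps and max_repeats are illustrative defaults, not values from the task.

```python
import re


def is_numeric_answer(answer):
    """True if the answer is a plain non-negative integer such as "90" or "1,234"."""
    return re.fullmatch(r"\d{1,3}(,\d{3})*|\d+", answer.strip()) is not None


def handoffs_are_clean(log, max_steps=50, max_repeats=3):
    """Heuristic check for errors and likely infinite loops in a run log.

    Fails if the log exceeds max_steps entries, if any entry carries a truthy
    "error" field, or if the same agent repeats the same message max_repeats
    times in a row.
    """
    if len(log) > max_steps:
        return False
    if any(entry.get("error") for entry in log):
        return False
    run = 1  # length of the current streak of identical consecutive entries
    for prev, cur in zip(log, log[1:]):
        same = (prev["agent"], prev["message"]) == (cur["agent"], cur["message"])
        run = run + 1 if same else 1
        if run >= max_repeats:
            return False
    return True
```

A repeated-message streak is only a rough proxy for an infinite loop: agents that cycle through several distinct messages would pass this check and would be caught only by the max_steps cap.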

If test fails:

  • Document what went wrong (which agent, what phase)
  • Create follow-up tasks for fixes
  • Consider if architecture needs adjustment

Timeline (2)

🔄 [human] Open → InProgress · 1 month ago
🔄 [human] InProgress → Done · 1 month ago