Goal
Build a multi-agent architecture that can reliably solve GAIA-style benchmark questions through agent collaboration, a shared workspace, and orchestrated handoffs.
Architecture Overview
Implement a 3-agent system inspired by Manus:
1. Planner Agent - analyzes the question, breaks it into steps, creates execution plan
2. Executor Agent - executes the plan using available tools (web search, code execution, etc.)
3. Verifier Agent - reviews results, checks answer quality, validates against question requirements
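A minimal sketch of the three roles expressed as specialized system prompts, assuming each role is handed to its agent as a plain string; the names and prompt wording below are illustrative, not a fixed interface:

```python
# Illustrative role prompts for the three agents. The wording is a sketch,
# not the final prompts.
AGENT_ROLES = {
    "planner": (
        "You are the Planner. Read the question, break it into numbered steps, "
        "and write an execution plan into the shared workspace."
    ),
    "executor": (
        "You are the Executor. Carry out the plan step by step with the "
        "available tools (web search, code execution, ...) and record every "
        "intermediate result in the shared workspace."
    ),
    "verifier": (
        "You are the Verifier. Check the candidate answer against the original "
        "question. Approve it, or explain what is missing so the Planner can "
        "revise the plan."
    ),
}
```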
Core Components
Shared Workspace
- Task state (current step, progress, artifacts)
- Agent communication log (handoffs, questions, findings)
- Intermediate results storage (search results, extracted data, etc.)
- Final answer candidate
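A minimal sketch of the workspace as a plain in-memory object passed between agents (the field names are assumptions, not a fixed schema):

```python
# Sketch of the shared workspace; a simple in-memory object is assumed,
# field names are illustrative.
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class Workspace:
    question: str
    plan: list[str] = field(default_factory=list)            # execution plan from the Planner
    current_step: int = 0                                     # task state / progress
    messages: list[dict] = field(default_factory=list)        # agent communication log
    artifacts: dict[str, Any] = field(default_factory=dict)   # intermediate results (search hits, extracted data)
    answer_candidate: Optional[str] = None                    # final answer candidate
```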
Agent Communication Protocol
- Explicit handoff messages between agents
- State transitions (planning → executing → verifying → done)
- Ability to loop back if verification fails
- Guardrails against infinite loops (max iterations)
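One way to make the handoffs and transitions explicit is sketched below; the state names mirror the flow above, everything else is illustrative:

```python
# Sketch of the communication protocol: named states, allowed transitions,
# and explicit handoff records appended to the workspace log.
from enum import Enum

class State(Enum):
    PLANNING = "planning"
    EXECUTING = "executing"
    VERIFYING = "verifying"
    DONE = "done"

# Verification failure loops back to planning; the orchestrator enforces
# the max-iterations guardrail (see the next section).
TRANSITIONS = {
    State.PLANNING: {State.EXECUTING},
    State.EXECUTING: {State.VERIFYING},
    State.VERIFYING: {State.DONE, State.PLANNING},
}

def handoff(workspace, sender: str, receiver: str, note: str) -> None:
    """Record an explicit handoff message in the shared communication log."""
    workspace.messages.append({"from": sender, "to": receiver, "note": note})
```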
Orchestration Logic
- Spawns agents in sequence: Planner → Executor → Verifier
- Manages state transitions
- Handles agent failures/retries
- Aggregates final result
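A sketch of the orchestration loop, assuming each agent is exposed as a callable that wraps the existing subagent spawning infrastructure (the run_* names are hypothetical, and the verifier is assumed to return a dict with an "approved" flag):

```python
# Sketch of the orchestrator: Planner → Executor → Verifier in sequence,
# looping back on failed verification, with a max_rounds guardrail.
def orchestrate(question: str, run_planner, run_executor, run_verifier,
                max_rounds: int = 3) -> str:
    workspace = Workspace(question=question)    # shared state, see sketch above
    for _ in range(max_rounds):
        run_planner(workspace)                  # planning
        run_executor(workspace)                 # executing
        verdict = run_verifier(workspace)       # verifying
        if verdict.get("approved") and workspace.answer_candidate:
            return workspace.answer_candidate   # done
    # Guardrail: fail loudly instead of looping forever.
    raise RuntimeError(f"No verified answer after {max_rounds} rounds")
```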
Test Case
GAIA Level 1 Question:
"What was the actual enrollment count of the clinical trial on H. pylori in acne vulgaris patients from Jan-May 2018?"
Expected behavior:
1. Planner breaks down the task: search a clinical trials database → filter by the stated criteria → extract the enrollment count
2. Executor performs web searches, navigates to the trial details, extracts the data
3. Verifier checks: is this the right trial? Does it match the date range? Is the enrollment count clearly stated?
4. Return exact numeric answer
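A possible smoke test for this case, assuming the orchestrate() and run_* sketches above; the expected value lives in the benchmark data, so the assertion only checks that an exact numeric answer comes back:

```python
# Hypothetical end-to-end test for the GAIA Level 1 question above.
def test_gaia_level1_enrollment_question():
    question = ("What was the actual enrollment count of the clinical trial "
                "on H. pylori in acne vulgaris patients from Jan-May 2018?")
    answer = orchestrate(question, run_planner, run_executor, run_verifier)
    assert answer.strip().isdigit()   # exact numeric answer, no surrounding prose
```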
Success Criteria
- Architecture can solve the test case question correctly
- Agent handoffs are clean and traceable
- Shared workspace maintains clear state
- System fails gracefully (no infinite loops, no hallucinated answers)
Implementation Notes
- Use existing subagent spawning infrastructure
- Shared state can be a simple dict/object passed between agents
- Each agent gets a specialized system prompt defining its role
- Consider adding a "max_rounds" guardrail (e.g., 3 full cycles max)
Namespace
Omni/Agent/MultiAgent
Closed as overengineered. The YAML workflow system (agentd run workflow.yaml) already provides multi-agent capabilities with a shared filesystem workspace. See Omni/Agent/Workflows/GaiaResearch.yaml for a working example.