Multi-Agent Architecture for GAIA Benchmark

t-346·WorkTask·
·
·
·Omni/Agent/MultiAgent
Created1 month ago·Updated1 month ago

Description

Edit

Goal

Build a multi-agent architecture that can reliably solve GAIA-style benchmark questions through agent collaboration, shared workspace, and orchestrated handoffs.

Architecture Overview

Implement a 3-agent system inspired by Manus: 1. Planner Agent - analyzes the question, breaks it into steps, creates execution plan 2. Executor Agent - executes the plan using available tools (web search, code execution, etc.) 3. Verifier Agent - reviews results, checks answer quality, validates against question requirements

Core Components

Shared Workspace

  • Task state (current step, progress, artifacts)
  • Agent communication log (handoffs, questions, findings)
  • Intermediate results storage (search results, extracted data, etc.)
  • Final answer candidate

Agent Communication Protocol

  • Explicit handoff messages between agents
  • State transitions (planning → executing → verifying → done)
  • Ability to loop back if verification fails
  • Guardrails against infinite loops (max iterations)

Orchestration Logic

  • Spawns agents in sequence: Planner → Executor → Verifier
  • Manages state transitions
  • Handles agent failures/retries
  • Aggregates final result

Test Case

GAIA Level 1 Question: "What was the actual enrollment count of the clinical trial on H. pylori in acne vulgaris patients from Jan-May 2018?"

Expected behavior: 1. Planner breaks down: need to search clinical trials database → filter by criteria → extract enrollment count 2. Executor performs web searches, navigates to trial details, extracts data 3. Verifier checks: is this the right trial? does it match date range? is enrollment count clearly stated? 4. Return exact numeric answer

Success Criteria

  • Architecture can solve the test case question correctly
  • Agent handoffs are clean and traceable
  • Shared workspace maintains clear state
  • System fails gracefully (doesn't infinite loop or hallucinate)

Implementation Notes

  • Use existing subagent spawning infrastructure
  • Shared state can be a simple dict/object passed between agents
  • Each agent gets a specialized system prompt defining its role
  • Consider adding a "max_rounds" guardrail (e.g., 3 full cycles max)

Namespace

Omni/Agent/MultiAgent

Child Tasks

  • t-346.1 - Design and implement shared workspace data structure [Done]
  • t-346.2 - Implement Planner Agent role and system prompt [Done]
  • t-346.3 - Implement Executor Agent role and system prompt [Done]
  • t-346.4 - Implement Verifier Agent role and system prompt [Done]
  • t-346.5 - Implement orchestration logic for agent coordination [Done]
  • t-346.6 - Test multi-agent system on GAIA test case [Done]

Timeline (4)

🔄[human]Open → InProgress1 month ago
🔄[human]InProgress → Done1 month ago
💬[human]1 month ago

Closed as overengineered. The YAML workflow system (agentd run workflow.yaml) already provides multi-agent capabilities with shared filesystem workspace. See Omni/Agent/Workflows/GaiaResearch.yaml for a working example.