t-346 - omni

t-346 · WorkTask · Omni/Agent/MultiAgent

Created 1 month ago · Updated 1 month ago

Description

Goal

Build a multi-agent architecture that can reliably solve GAIA-style benchmark questions through agent collaboration, shared workspace, and orchestrated handoffs.

Architecture Overview

Implement a 3-agent system inspired by Manus:
1. Planner Agent - analyzes the question, breaks it into steps, creates execution plan
2. Executor Agent - executes the plan using available tools (web search, code execution, etc.)
3. Verifier Agent - reviews results, checks answer quality, validates against question requirements

Core Components

Shared Workspace

Task state (current step, progress, artifacts)
Agent communication log (handoffs, questions, findings)
Intermediate results storage (search results, extracted data, etc.)
Final answer candidate
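
A minimal sketch of how the four workspace items above could map onto one object. The class and field names (Workspace, artifacts, log, answer_candidate, record) are illustrative assumptions, not the structure delivered in t-346.1.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Workspace:
    """Shared state passed between agents (all field names are hypothetical)."""
    question: str
    plan: list[str] = field(default_factory=list)             # steps produced by the Planner
    current_step: int = 0                                      # task state: index of the step in progress
    artifacts: dict[str, Any] = field(default_factory=dict)   # intermediate results storage
    log: list[dict[str, str]] = field(default_factory=list)   # agent communication log
    answer_candidate: Optional[str] = None                     # final answer candidate

    def record(self, sender: str, kind: str, text: str) -> None:
        """Append a handoff, question, or finding to the communication log."""
        self.log.append({"from": sender, "kind": kind, "text": text})


# Usage with placeholder values only
ws = Workspace(question="example question")
ws.plan = ["search", "extract"]
ws.artifacts["search_results"] = ["hit-1", "hit-2"]
ws.record("planner", "handoff", "plan ready, 2 steps")
```

Keeping the log as plain dicts makes the workspace easy to serialize and inspect when tracing handoffs.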

Agent Communication Protocol

Explicit handoff messages between agents
State transitions (planning → executing → verifying → done)
Ability to loop back if verification fails
Guardrails against infinite loops (max iterations)
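
The state transitions and the loop guardrail above can be sketched as a small transition function. This assumes a failed verification loops back to planning (the ticket does not say which state it returns to) and that a hypothetical rounds counter tracks completed cycles.

```python
from enum import Enum


class State(Enum):
    PLANNING = "planning"
    EXECUTING = "executing"
    VERIFYING = "verifying"
    DONE = "done"


def next_state(state: State, verified: bool, rounds: int, max_rounds: int = 3) -> State:
    """Return the state that follows `state`.

    `verified` is only consulted while verifying; `rounds` is the number of
    full cycles completed so far, including the current one.
    """
    if state is State.PLANNING:
        return State.EXECUTING
    if state is State.EXECUTING:
        return State.VERIFYING
    if state is State.VERIFYING:
        if verified or rounds >= max_rounds:
            return State.DONE       # success, or the guardrail stops the loop
        return State.PLANNING       # assumption: loop back to a fresh plan
    return State.DONE               # DONE is terminal
```

With max_rounds=3, a third failed verification ends in DONE rather than starting a fourth cycle.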

Orchestration Logic

Spawns agents in sequence: Planner → Executor → Verifier
Manages state transitions
Handles agent failures/retries
Aggregates final result
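
A rough sketch of the orchestration loop described above, using stub functions in place of real agents. The Agent signature, call_with_retry, orchestrate, and the dict keys are assumptions for illustration; the existing subagent spawning infrastructure is not modeled here.

```python
from typing import Callable, Optional

Agent = Callable[[dict], dict]  # hypothetical: takes the workspace, returns it updated


def call_with_retry(agent: Agent, ws: dict, retries: int = 2) -> dict:
    """Run one agent, retrying on exceptions up to `retries` extra times."""
    for attempt in range(retries + 1):
        try:
            return agent(ws)
        except Exception as exc:
            ws["log"].append(f"{agent.__name__} failed: {exc}")
            if attempt == retries:
                raise


def orchestrate(question: str, planner: Agent, executor: Agent,
                verifier: Agent, max_rounds: int = 3) -> dict:
    """Run Planner -> Executor -> Verifier cycles until verified or out of rounds."""
    ws = {"question": question, "log": [], "answer": None, "verified": False}
    for round_no in range(1, max_rounds + 1):
        for agent in (planner, executor, verifier):
            ws = call_with_retry(agent, ws)
            ws["log"].append(f"round {round_no}: {agent.__name__} done")
        if ws["verified"]:
            break
    # Fail gracefully: report no answer rather than an unverified one.
    answer: Optional[str] = ws["answer"] if ws["verified"] else None
    return {"answer": answer, "rounds": round_no, "log": ws["log"]}


# Stub agents standing in for the real roles
def planner(ws: dict) -> dict:
    ws["plan"] = ["search", "extract"]
    return ws


def executor(ws: dict) -> dict:
    ws["answer"] = "stub-answer"
    return ws


def verifier(ws: dict) -> dict:
    ws["verified"] = ws["answer"] is not None
    return ws


result = orchestrate("example question", planner, executor, verifier)
```

Returning None when verification never succeeds is one way to keep the system from emitting an unverified answer once the round limit is reached.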

Test Case

GAIA Level 1 Question: "What was the actual enrollment count of the clinical trial on H. pylori in acne vulgaris patients from Jan-May 2018?"

Expected behavior:
1. Planner breaks down: need to search clinical trials database → filter by criteria → extract enrollment count
2. Executor performs web searches, navigates to trial details, extracts data
3. Verifier checks: is this the right trial? does it match date range? is enrollment count clearly stated?
4. Return exact numeric answer
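
The three Verifier checks in step 3 could look roughly like this. The record keys (title, start_date, enrollment) are hypothetical, and reading "Jan-May 2018" as a window on the trial start date is an assumption. The example record is a placeholder, not data from a real trial.

```python
from datetime import date


def verify_candidate(record: dict, window_start: date, window_end: date) -> list[str]:
    """Return a list of problems; an empty list means the candidate passes."""
    problems = []
    # Is this the right trial? Check that both topics appear in the title.
    title = record.get("title", "").lower()
    for term in ("h. pylori", "acne vulgaris"):
        if term not in title:
            problems.append(f"title does not mention {term!r}")
    # Does it match the date range?
    start = record.get("start_date")
    if start is None or not (window_start <= start <= window_end):
        problems.append("trial date is outside the requested range")
    # Is the enrollment count clearly stated as an exact number?
    enrollment = record.get("enrollment")
    if not isinstance(enrollment, int) or isinstance(enrollment, bool):
        problems.append("enrollment count is missing or not an exact integer")
    return problems


# Placeholder record with the enrollment count deliberately missing
candidate = {
    "title": "Example trial on H. pylori in acne vulgaris",
    "start_date": date(2018, 2, 1),
    "enrollment": None,
}
problems = verify_candidate(candidate, date(2018, 1, 1), date(2018, 5, 31))
# problems == ["enrollment count is missing or not an exact integer"]
```

A non-empty problem list is what would trigger a loop back instead of returning an answer.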

Success Criteria

Architecture can solve the test case question correctly
Agent handoffs are clean and traceable
Shared workspace maintains clear state
System fails gracefully (does not loop indefinitely or hallucinate an answer)

Implementation Notes

Use existing subagent spawning infrastructure
Shared state can be a simple dict/object passed between agents
Each agent gets a specialized system prompt defining its role
Consider adding a "max_rounds" guardrail (e.g., 3 full cycles max)

Namespace

Omni/Agent/MultiAgent

Child Tasks

t-346.1 - Design and implement shared workspace data structure [Done]
t-346.2 - Implement Planner Agent role and system prompt [Done]
t-346.3 - Implement Executor Agent role and system prompt [Done]
t-346.4 - Implement Verifier Agent role and system prompt [Done]
t-346.5 - Implement orchestration logic for agent coordination [Done]
t-346.6 - Test multi-agent system on GAIA test case [Done]

Multi-Agent Architecture for GAIA Benchmark