t-363.7 - omni

t-363.7 · WorkTask

Parent: t-363 · Created 1 month ago · Updated 1 month ago

Dependencies

t-363.1 [Blocks]
t-363.2 [Blocks]
t-363.3 [Blocks]
t-363.4 [Blocks]
t-363.5 [Blocks]
t-363.6 [Blocks]

Description

End-to-end test of actor system using GAIA-style research workflow.

Overview

This task validates that the actor system works correctly by running the existing GAIA research workflow through actor primitives instead of static workflow YAML.

Test Case: GAIA Research

The existing workflow at Omni/Agent/Workflows/GaiaResearch.yaml answers:

> "What is the population density of the country that won the 2022 FIFA World Cup?"

It has 4 steps: research → gather_data → calculate → verify

Expected Behavior with Actor System

A root actor should:

1. Receive the question as initial message
2. CREATE a researcher child to find who won
3. Wait for researcher result, then CREATE a gatherer child
4. Wait for gatherer result, then CREATE a calculator child
5. Wait for calculator result, then CREATE a verifier child
6. Send final answer to its customer

The actor system should achieve the same result as the static workflow, but the decomposition emerges from the agent's decisions, not from hardcoded YAML.
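
The expected message sequence can be modelled in a few lines of plain Haskell. This is only a sketch: Role, Event, runRoot and stubChild are hypothetical names, not part of the Omni actor API, and the creation order is fixed here purely to show the trace the root should leave behind. In the real system that order is supposed to come from the agent's own decisions.

```haskell
module Main where

-- Hypothetical model of the root actor's delegation; not the Omni API.
-- The four child roles named in the task, in the expected creation order.
data Role = Researcher | Gatherer | Calculator | Verifier
  deriving (Show, Eq, Enum, Bounded)

-- Events the root actor leaves behind while it works.
data Event
  = Create Role          -- CREATE a child for one subtask
  | Receive Role String  -- result message received from that child
  | Reply String         -- final answer sent to the customer
  deriving (Show, Eq)

-- A child is modelled as a function from its input message to a result.
type Child = String -> String

-- Create one child at a time, wait for its result, and pass that
-- result on as the input of the next child.
runRoot :: (Role -> Child) -> String -> [Event]
runRoot child = go [minBound .. maxBound]
  where
    go [] answer = [Reply answer]
    go (r : rs) input =
      let output = child r input
       in Create r : Receive r output : go rs output

-- Stub children that only tag their input, so the trace stays readable.
stubChild :: Role -> Child
stubChild r input = show r ++ "(" ++ input ++ ")"

main :: IO ()
main = mapM_ print (runRoot stubChild "question")
```

Each child appears as a Create followed by a Receive, and the trace ends with a single Reply, which matches steps 1 to 6 above.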

Success Criteria

1. Correct answer: Final output contains Argentina and ~16-17 people/km²
2. Actor primitives used: Events show actor_create, actor_send, actor_receive
3. Proper delegation: Multiple actors created (not single monolithic agent)
4. Capability narrowing: Child actors have narrower capabilities than parent
5. Clean termination: System exits when work complete (no hanging actors)
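
Criteria 2 to 5 could be checked mechanically against the event log. The sketch below assumes a simplified event shape (the Event type and every function name are hypothetical, and the real log format may differ), treats capabilities as plain strings, and assumes each actor, including the root, logs exactly one exit event.

```haskell
module Main where

-- Hypothetical event shape; the real Omni event log may differ.
data Event
  = ActorCreate [String] [String] -- parent capabilities, child capabilities
  | ActorSend
  | ActorReceive
  | ActorExit
  deriving (Show, Eq)

isCreate :: Event -> Bool
isCreate (ActorCreate _ _) = True
isCreate _ = False

-- Criterion 2: all three primitives appear in the log.
usesPrimitives :: [Event] -> Bool
usesPrimitives es =
  any isCreate es && ActorSend `elem` es && ActorReceive `elem` es

-- Criterion 3: at least two children were created.
delegates :: [Event] -> Bool
delegates es = length (filter isCreate es) >= 2

-- The child holds only capabilities the parent has, and fewer of them.
narrower :: [String] -> [String] -> Bool
narrower parent child =
  all (`elem` parent) child && any (`notElem` child) parent

-- Criterion 4: every CREATE narrows the capability set.
narrowsCapabilities :: [Event] -> Bool
narrowsCapabilities es = and [narrower p c | ActorCreate p c <- es]

-- Criterion 5 (assumption): one exit per created child, plus the root.
terminatesCleanly :: [Event] -> Bool
terminatesCleanly es =
  length (filter (== ActorExit) es) == length (filter isCreate es) + 1

sampleLog :: [Event]
sampleLog =
  [ ActorCreate ["read", "write", "web"] ["web"]
  , ActorSend, ActorReceive, ActorExit
  , ActorCreate ["read", "write", "web"] ["read"]
  , ActorSend, ActorReceive, ActorExit
  , ActorExit
  ]

main :: IO ()
main = print (map ($ sampleLog)
  [usesPrimitives, delegates, narrowsCapabilities, terminatesCleanly])
```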

Implementation

Test Structure

-- In Omni/Agent/Actor.hs or new test file

testGaiaResearch :: Test.Tree
testGaiaResearch = Test.unit "GAIA research via actors" $ do
  -- Setup workspace
  workspace <- createTempWorkspace
  
  -- Create root actor with question
  rootId <- newActorId
  let question = "What is the population density of the country that won the 2022 FIFA World Cup?"
  
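  -- Run actor system; the answer is read back from the workspace below,
  -- so the direct return value is deliberately ignored
  _result <- runActorSystem workspace rootId question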
  
  -- Verify result
  answer <- readFile (workspace </> "_/gaia/answer.txt")
  assertContains answer "Argentina"
  assertContains answer "people per square kilometer"
  
  -- Verify actor events
  events <- loadEvents workspace rootId
  let creates = filter isActorCreate events
  length creates >= 2 @? "Should create at least 2 child actors"
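
The test above matches a literal phrase, while success criterion 1 is stated numerically (~16-17 people/km²). One possible way to check the number itself, independent of how the unit is worded, is sketched below. numbersIn and answerLooksRight are hypothetical helpers, and reading "~16-17" as the closed range 16 to 17 is an assumption.

```haskell
module Main where

import Data.Char (isDigit)
import Data.List (dropWhileEnd, isInfixOf)
import Text.Read (readMaybe)

-- Pull every decimal number out of free text.
-- Tokens that do not parse (for example "1.2.3") are skipped.
numbersIn :: String -> [Double]
numbersIn s = case dropWhile (not . isDigit) s of
  [] -> []
  rest ->
    let (token, more) = span (\c -> isDigit c || c == '.') rest
        -- A trailing '.' is usually sentence punctuation, so drop it.
        parsed = readMaybe (dropWhileEnd (== '.') token) :: Maybe Double
     in maybe id (:) parsed (numbersIn more)

-- Hypothetical check for criterion 1: the answer names Argentina and
-- mentions some number in the assumed range 16 to 17.
answerLooksRight :: String -> Bool
answerLooksRight answer =
  "Argentina" `isInfixOf` answer
    && any (\d -> d >= 16 && d <= 17) (numbersIn answer)

main :: IO ()
main = do
  print (answerLooksRight "Argentina, about 16.4 people per square kilometer.")
  print (answerLooksRight "France, about 122 people per square kilometer.")
```

A check like this also ignores unrelated numbers such as the year 2022, because they fall outside the range.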

What This Tests

1. Message passing (t-363.1): Messages flow between actors
2. CREATE (t-363.2): Root creates children for subtasks
3. Capability narrowing (t-363.4): Children have appropriate permissions
4. Integration (t-363.5): Full system works end-to-end
5. Prompts (t-363.6): Agents actually delegate instead of doing everything themselves

Comparison Mode

For validation, run both:

1. Static workflow: agentd run GaiaResearch.yaml
2. Actor system: agentd run "What is the population density..." --actor-mode

Both should produce equivalent answers. The actor version may take a different path but should converge on the correct result.
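
"Equivalent" is not defined further in this task. One reasonable reading is that both runs name the same country and report densities that agree within a tolerance. The sketch below assumes the two outputs have already been parsed into a hypothetical Answer record; the type, the function names and the tolerance value are all illustrative.

```haskell
module Main where

import Data.Char (toLower)

-- Hypothetical parsed form of one run's final answer.
data Answer = Answer
  { country :: String
  , density :: Double -- people per square kilometer
  }
  deriving (Show, Eq)

-- Ignore case and surrounding or repeated whitespace in names.
normalize :: String -> String
normalize = map toLower . unwords . words

-- Two answers are equivalent when the countries match after
-- normalization and the densities differ by at most the tolerance.
equivalent :: Double -> Answer -> Answer -> Bool
equivalent tolerance a b =
  normalize (country a) == normalize (country b)
    && abs (density a - density b) <= tolerance

main :: IO ()
main = do
  let static = Answer "Argentina" 16.4   -- illustrative values only
      actor = Answer " argentina " 16.9
  print (equivalent 1.0 static actor)
```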

Dependencies

All tasks in epic t-363 (this is the integration test)

References

GAIA workflow: Omni/Agent/Workflows/GaiaResearch.yaml
GAIA steps: Omni/Agent/Workflows/Gaia/*.md
Spec: _/llm/actors.md

--title=Actor model: E2E test with GAIA research
