Manual test: orchestrator workflow with pi-code + pi-review

t-280.1 · WorkTask · Omni/Ava.hs
Parent: t-280 · Created 1 month ago · Updated 1 month ago

Description


Manually test the full orchestrator workflow before automating in Ava.

Workflow to Test

1. Pick a task from task ready
2. Run pi-code <task-id>
3. Run pi-review <task-id>
4. If REQUEST_CHANGES, run pi-code again with the review feedback
5. Repeat until APPROVE or REJECT
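The loop above can be sketched in shell. pi-code and pi-review are the scripts named in this task, but the assumption that pi-review prints its verdict (APPROVE / REQUEST_CHANGES / REJECT) as its last output line is mine, as is the iteration cap; this is a sketch of the manual process, not the eventual automation.

```shell
# Sketch of the manual test loop. Assumes pi-code and pi-review are on PATH
# and that pi-review's last output line is its verdict (an assumption).
# Passing the review feedback back into pi-code is omitted in this sketch.
run_cycle() {
  task_id=$1
  max_iters=${2:-3}
  for i in $(seq 1 "$max_iters"); do
    pi-code "$task_id" || return 1
    verdict=$(pi-review "$task_id" | tail -n 1)
    case "$verdict" in
      APPROVE) echo "approved after $i iteration(s)"; return 0 ;;
      REJECT)  echo "rejected"; return 2 ;;
    esac
    # REQUEST_CHANGES: fall through and run pi-code again
  done
  echo "no approval after $max_iters iterations"
  return 3
}
```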

Questions to Answer

  • How many iterations typically needed?
  • What kinds of failures occur?
  • Is the feedback loop effective?
  • What information would Ava need to report?
  • What intervention points make sense?

Test Cases

  • Simple task (like adding --help)
  • Medium task (small feature)
  • Task that fails review first time
  • Task that gets rejected

Document Findings

Add notes to this task about what worked, what didn't, and what Ava's interface should handle.

Timeline (5)

🔄 [human] Open → In Progress · 1 month ago
💬 [human] · 1 month ago

Manual Testing Results

Test Cases Run

1. t-282 (Simple): Add --version flag

  • Coder: ✅ completed (minor stderr noise at end)
  • Reviewer: ✅ approved first try
  • Iterations: 1

2. t-283 (Vague requirements): Add color output

  • Coder: ✅ interpreted requirements reasonably
  • Reviewer: ✅ approved (perhaps too lenient - didn't note missing pi-review.sh changes)
  • Iterations: 1

3. t-284 (Complex feature): Add --dry-run flag

  • Coder: ✅ comprehensive implementation
  • Reviewer: ✅ thorough review, approved
  • Iterations: 1

Observations

What worked well:

  • Coder produces working code on simple-to-medium tasks
  • Reviewer gives detailed analysis
  • End-to-end flow is smooth
  • Task status updates correctly

Issues noted:

  • Coder sometimes has stderr noise ("command not found") but changes are fine
  • Reviewer may be too lenient (approved t-283 without noting inconsistency)
  • All 3 tests passed first try - need a failure case to test feedback loop

What Ava needs to handle:

  • Spawn subprocess, capture exit code
  • Parse final status (done vs needs-help)
  • Report summary to user (not full output)
  • Handle timeout for long-running tasks
  • Retry loop if REQUEST_CHANGES
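The subprocess-handling items in the list above can be sketched in shell (the real integration is planned in Haskell). pi-orchestrate and the done/needs-help statuses come from this task; the 10-minute timeout value and the exact status-line format are assumptions.

```shell
# Sketch of the spawn/capture/timeout handling Ava would need.
# `timeout` is from GNU coreutils; exit code 124 means the deadline hit.
run_task() {
  task_id=$1
  out=$(timeout 600 pi-orchestrate "$task_id" 2>&1)
  code=$?
  if [ "$code" -eq 124 ]; then
    echo "$task_id: timed out"
  elif echo "$out" | grep -q "needs-help"; then
    echo "$task_id: needs help"
  else
    # Report a short summary to the user, not the full captured output
    echo "$task_id: done (exit $code)"
  fi
}
```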

Still need to test:

  • Task that gets REJECT
  • Task that gets REQUEST_CHANGES and requires retry
  • Complex multi-file refactoring
💬 [human] · 1 month ago

Additional Finding: Untracked Files Bug

During t-285, discovered that pi-review.sh doesn't show untracked (new) files in the diff presented to the reviewer. The script does check for untracked files, but plain git diff only covers tracked files, so new files never appear in the diff.

The reviewer correctly returned REQUEST_CHANGES, but for the wrong reason: it assumed the file had never been added, rather than recognizing that the diff it was shown was incomplete.

Created t-286 to fix this.
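One possible direction for that fix (not necessarily what t-286 ends up doing): mark untracked files as intent-to-add before diffing, so git diff includes their full content as additions. The helper name show_full_diff is hypothetical.

```shell
# Hypothetical helper: make untracked files visible to `git diff`.
# `git add --intent-to-add` records the paths without staging content,
# which is enough for git diff to show the new files as additions.
show_full_diff() {
  git ls-files --others --exclude-standard -z \
    | xargs -0 -r git add --intent-to-add
  git diff
}
```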

💬 [human] · 1 month ago

Final Testing Complete

Successfully tested pi-orchestrate.sh on t-286 (the untracked files bug).

Results

  • Orchestrator ran full cycle: coder → reviewer → commit
  • Completed in 1 iteration
  • Proper status tracking and exit
  • Clean output with progress indicators

What We Now Have

pi-code t-123       # Coder agent (claims task, makes changes)
pi-review t-123     # Reviewer agent (verify, approve/reject)
pi-orchestrate t-123 # Full loop with retries

Ready for Ava Integration

The bash prototypes work. Next steps:

1. t-280.2: Design Telegram interface
2. t-278.2/3: Convert to Haskell for Ava integration
3. t-280.3: Implement in Ava

🔄 [human] In Progress → Done · 1 month ago