Analyze results from contract review experiments and draw conclusions.
After running the baseline (t-369.23) and swarm (t-369.24) experiments, analyze the data to determine:

1. Does the STM-based swarm actually help?
2. If so, why? If not, why not?
3. What are the implications for the cognitive compute vision?
## Results Summary
### Scale Comparison
| Contracts | Single F1 | Swarm F1 | Single Time | Swarm Time |
|-----------|-----------|----------|-------------|------------|
| 5 | ? | ? | ? | ? |
| 10 | ? | ? | ? | ? |
| 20 | ? | ? | ? | ? |
| 50 | ? | ? | ? | ? |
### Per-Clause-Type Performance
| Clause Type | Single P/R/F1 | Swarm P/R/F1 | Delta (Swarm - Single) |
|-------------|---------------|--------------|-------|
| Indemnification | ? | ? | ? |
| Liability | ? | ? | ? |
| ... | ... | ... | ... |
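The per-clause cells above can be filled from the raw extraction outputs. Below is a minimal sketch of the scoring arithmetic, assuming predictions and gold annotations are available as sets of clause spans keyed by (contract_id, clause_type) and matched exactly; the actual experiment harness and its matching criterion may differ.

```python
from collections import defaultdict

def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from true-positive/false-positive/false-negative counts."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

def per_clause_scores(predictions: dict, gold: dict) -> dict:
    """predictions / gold map (contract_id, clause_type) -> set of extracted clause spans."""
    counts = defaultdict(lambda: [0, 0, 0])  # clause_type -> [tp, fp, fn]
    for key in set(predictions) | set(gold):
        _, clause_type = key
        pred, ref = predictions.get(key, set()), gold.get(key, set())
        counts[clause_type][0] += len(pred & ref)   # true positives
        counts[clause_type][1] += len(pred - ref)   # false positives
        counts[clause_type][2] += len(ref - pred)   # false negatives
    return {ct: prf1(tp, fp, fn) for ct, (tp, fp, fn) in counts.items()}
```

The Delta column is then the swarm F1 minus the single-agent F1 for each clause type.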
### Cost Analysis
| Mode | Tokens (N=50) | Cost (N=50) | Cost/Contract |
|------|---------------|-------------|---------------|
| Single | ? | ? | ? |
| Swarm | ? | ? | ? |
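The Cost/Contract column is simple arithmetic once token counts are known. A sketch, with per-1K-token prices passed in as parameters (the example prices are illustrative, not the actual model pricing):

```python
def run_cost(input_tokens: int, output_tokens: int, n_contracts: int,
             price_per_1k_input: float, price_per_1k_output: float) -> tuple[float, float]:
    """Return (total cost, amortized cost per contract) for one run."""
    total = (input_tokens / 1000) * price_per_1k_input \
          + (output_tokens / 1000) * price_per_1k_output
    return total, total / n_contracts

# Example with made-up numbers only:
# total, per_contract = run_cost(1_200_000, 150_000, n_contracts=50,
#                                price_per_1k_input=0.003, price_per_1k_output=0.015)
```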
### Qualitative Analysis
- Did the hints help?
- What patterns were detected across contracts?
- Where did the single agent fail?
- Where did the swarm fail?
# CUAD Experiment Conclusions
## Key Finding
[One sentence: Did swarm help?]
## Evidence
[The data that supports the finding]
## Why This Happened
[Explanation of mechanism]
## Implications for Cognitive Compute
### If Swarm Helped:
- STM coordination is valuable for document review tasks
- Pattern learning across documents is a real advantage
- This validates the swarm approach for unstructured data
### If Swarm Didn't Help:
- Parallelism alone is sufficient
- STM overhead isn't worth it for this task type
- Need to find different task types where sharing matters
## Recommendations
[What to build/test next based on findings]
Finally, add the CUAD experiment results to Omni/Agent/DESIGN.md, answering:
1. Is there a "crossing point", i.e., a contract count beyond which the swarm outperforms the single agent (see the sketch after this list)?
2. What is the value of sharing, i.e., how much of any swarm gain comes from STM hints rather than parallelism alone?
3. What is the cost efficiency of each mode (e.g., cost per contract)?
4. Is contract review representative of the task types where swarm coordination should matter?
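For question 1, one way to operationalize the "crossing point" is the smallest contract count at which swarm F1 exceeds single-agent F1 by some margin, scanned from the scale-comparison rows. This is a sketch only; the names below (ScalePoint, crossing_point) are illustrative and not part of the existing codebase.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScalePoint:
    n_contracts: int
    single_f1: float
    swarm_f1: float

def crossing_point(points: list[ScalePoint], margin: float = 0.0) -> Optional[int]:
    """Smallest contract count where the swarm beats the single agent by more than `margin` F1."""
    for p in sorted(points, key=lambda p: p.n_contracts):
        if p.swarm_f1 - p.single_f1 > margin:
            return p.n_contracts
    return None  # no crossing point observed at the scales tested
```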