Research: automatic detection of operations that would benefit from best-of-N parallel execution.
Some agent operations are inherently uncertain - running them multiple times and picking the best result improves reliability. But agents don't naturally ask for this.
The runtime could automatically detect "risky" operations and transparently run them N times, returning the best result. The agent's program doesn't change.
Research questions:

1. Detection heuristics - How do we know an operation is risky?
2. Selection criteria - How do we pick the "best" result?
3. Prior art - Is there established research?
4. Cost model - When is it worth the extra compute? (toy model sketched below)
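A toy cost model for question 4 (my illustration, not from the source): assume independent attempts, single-run success probability $p$, per-run cost $c$, and an oracle selector that always picks a good result when one exists. Then

$$P_{\text{ok}}(N) = 1 - (1 - p)^N,$$

and fanning out to $N$ runs pays off roughly when the reliability gain, valued at $V$ per avoided failure, covers the extra compute:

$$\bigl[(1 - p) - (1 - p)^N\bigr]\, V > (N - 1)\, c.$$

With $p = 0.8$ and $N = 3$, the failure rate drops from 20% to 0.8%, so even a modest $V/c$ ratio justifies the fan-out for genuinely risky operations.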
In `Op.hs`, the `infer` primitive could be wrapped with a reliability layer that does transparent best-of-N. The Sequential/Parallel interpreter would handle the fan-out.
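A minimal sketch of that wrapper, assuming `infer` has roughly the shape `Prompt -> IO Completion`; the types, `bestOfN`, and the scoring function are hypothetical names for illustration, not the actual `Op.hs` API:

```haskell
module BestOfN where

import Control.Concurrent.Async (mapConcurrently)
import Data.List (maximumBy)
import Data.Ord (comparing)

newtype Prompt     = Prompt String
newtype Completion = Completion String

-- Stand-in for the Op.hs infer primitive (assumed signature).
infer :: Prompt -> IO Completion
infer = error "provided by the runtime"

-- Transparent best-of-N: fan the same call out n times (n >= 1) and
-- keep the highest-scoring result. Agent code that calls 'infer' is
-- unchanged; the Parallel interpreter substitutes this wrapper.
bestOfN :: Int -> (Completion -> Double) -> Prompt -> IO Completion
bestOfN n score p = do
  results <- mapConcurrently (const (infer p)) [1 :: Int .. n]
  pure (maximumBy (comparing score) results)
```

A Sequential interpreter could run the same loop with `mapM` instead of `mapConcurrently`; the `score` argument is where the selection-criteria question above plugs in (verifier model, majority vote, log-probabilities, etc.).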
Connection to Prompt IR (from t-477 design session)
The Prompt IR design includes a hook for risk/uncertainty estimation:
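The hook itself isn't reproduced here, so the following is a hedged reconstruction of the relevant IR types, inferred purely from the field names cited in the table below; the real t-477 definitions may differ:

```haskell
-- Reconstructed from field names only; not the actual Prompt IR.
data CompositionMode = Additive | Constraint | Hierarchical
  deriving (Eq, Show)

data Section = Section
  { secCompositionMode :: CompositionMode
  , secRelevance       :: Double  -- assumed 0..1 relevance score
  }

data PromptIR = PromptIR
  { pmSections    :: [Section]
  , pmTotalTokens :: Int
  }
```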
How this enables best-of-N detection: the IR describes a prompt's structure before any tokens are generated, so the runtime can estimate risk statically rather than by inspecting outputs.
Detection heuristics from the IR:
| Signal | IR Field | Interpretation |
|--------|----------|----------------|
| Many `Additive` sections | `secCompositionMode` | Less constrained, higher variance |
| Low relevance scores | `secRelevance` | Context may not ground the output |
| High token count | `pmTotalTokens` | More degrees of freedom |
| Few `Constraint` sections | `secCompositionMode` | Fewer hard requirements |
| No `Hierarchical` sections | `secCompositionMode` | No strong prior/anchor |

Integration point:
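Continuing the sketched types above, one possible shape for the integration: score the IR's structural signals, then let the interpreter upgrade risky `infer` calls to best-of-N. Every weight, threshold, and the choice of N = 3 below is invented for illustration:

```haskell
-- Illustrative risk heuristic over the table's signals; all weights
-- and thresholds are made up for the sketch.
riskScore :: PromptIR -> Double
riskScore ir = sum
  [ w 0.30 (frac Additive > 0.5)      -- many Additive sections
  , w 0.20 (avgRel < 0.4)             -- low relevance scores
  , w 0.20 (pmTotalTokens ir > 4000)  -- high token count
  , w 0.15 (frac Constraint < 0.2)    -- few Constraint sections
  , w 0.15 (frac Hierarchical == 0)   -- no Hierarchical sections
  ]
  where
    secs      = pmSections ir
    n         = max 1 (length secs)
    frac m    = fromIntegral (length (filter ((== m) . secCompositionMode) secs))
                / fromIntegral n :: Double
    avgRel    = sum (map secRelevance secs) / fromIntegral n
    w wt cond = if cond then wt else 0

-- The interpreter consults the score before dispatching infer,
-- transparently fanning out risky calls.
inferWithReliability :: (Completion -> Double) -> PromptIR -> Prompt -> IO Completion
inferWithReliability score ir p
  | riskScore ir > 0.5 = bestOfN 3 score p
  | otherwise          = infer p
```

Whether weights like these generalize across tasks is exactly research question 1; the cost model above would set the threshold and N.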