Add a Compress operation to the free monad that finds shorter prompts while preserving output behavior.
Approach
Iterative compression: shorten the prompt, verify that the outputs are 'close enough', repeat. Not rate-distortion optimal, but practically useful for cost reduction.
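A minimal sketch of that loop, polymorphic in the prompt type and assuming the caller supplies a shrink heuristic and a fidelity check (both names here are hypothetical):

```haskell
-- Hypothetical sketch of the compress-and-verify loop.
-- `shrinkOnce` proposes a shorter candidate (or gives up);
-- `fidelity` runs both prompts and scores how close the outputs are
-- (1.0 = behaviourally identical).
compressLoop
  :: Double                   -- maximum acceptable distortion, e.g. 0.05
  -> (p -> Maybe p)           -- propose a shorter candidate prompt
  -> (p -> p -> IO Double)    -- fidelity of candidate vs. original
  -> p
  -> IO p
compressLoop maxDistortion shrinkOnce fidelity = go
  where
    go prompt = case shrinkOnce prompt of
      Nothing        -> pure prompt         -- nothing left to try
      Just candidate -> do
        score <- fidelity prompt candidate
        if 1 - score <= maxDistortion
          then go candidate                 -- accept and keep shrinking
          else pure prompt                  -- too lossy; stop here
```

A real version would likely backtrack and try a different shrink when a candidate fails, rather than stopping at the first rejection.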
Prior Art
- 'Fundamental Limits of Prompt Compression' (NeurIPS 2024) - theoretical framework
- LLMLingua, AutoCompressor - existing compression methods
Implementation
1. Add a Compress constructor to OpF with a distortion-bound parameter (see the sketch after this list)
2. Implement compression strategies:
- Remove redundant instructions
- Summarize examples (few-shot -> distilled)
- Identify sufficient statistics (what info actually matters)
3. Verification: run the original and compressed prompts on the same inputs and compare outputs
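For step 1, the constructor might look roughly like this, assuming the `free` package and a `Prompt` type from the existing DSL; everything beyond OpF and Compress is a placeholder name, and existing constructors are elided:

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free (Free, liftF)
import Data.Text (Text)

type Prompt = Text  -- stand-in for the DSL's real prompt representation

-- Only the new constructor is shown. The Double is the distortion bound:
-- the maximum acceptable loss of output fidelity (0 = must match exactly).
data OpF next
  = Compress Prompt Double (Prompt -> next)
  deriving Functor

type Op = Free OpF

-- Smart constructor: ask the interpreter for a compressed prompt whose
-- behaviour stays within the distortion bound of the original's.
compress :: Prompt -> Double -> Op Prompt
compress p bound = liftF (Compress p bound id)
```

An interpreter for Compress would then run one of the strategies from step 2 and the verification from step 3 before handing back the compressed prompt.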
Metrics & Verification
- Compression ratio (tokens saved; sketched below)
- Output fidelity (embedding similarity, exact match on structured outputs)
- Cost savings in practice
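The first two metrics are cheap to compute. A sketch, assuming the backend's token counter is passed in as a function and structured outputs can be compared with Eq:

```haskell
-- Compression ratio: fraction of tokens saved, given some token counter.
compressionRatio :: (prompt -> Int) -> prompt -> prompt -> Double
compressionRatio countTokens original compressed =
  1 - fromIntegral (countTokens compressed)
    / fromIntegral (max 1 (countTokens original))

-- Exact-match fidelity over a batch of structured outputs.
exactMatch :: Eq a => [a] -> [a] -> Double
exactMatch xs ys =
  fromIntegral (length (filter id (zipWith (==) xs ys)))
    / fromIntegral (max 1 (length xs))
```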
Test Cases
- Verbose system prompts -> minimal equivalent (sketched below)
- Many-shot examples -> fewer examples with same behavior
- Long context -> compressed context
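Each of these can be phrased as "tokens go down, fidelity stays within the bound". A hypothetical HSpec-style shape for the first case, reusing compressLoop from the earlier sketch; verboseSystemPrompt, shrinkOnce, runFidelityCheck, and countTokens are all assumed helpers:

```haskell
import Test.Hspec

spec :: Spec
spec = describe "Compress" $
  it "shrinks a verbose system prompt without changing behaviour" $ do
    -- compressLoop enforces the 5% distortion bound internally.
    compressed <- compressLoop 0.05 shrinkOnce runFidelityCheck verboseSystemPrompt
    countTokens compressed `shouldSatisfy` (< countTokens verboseSystemPrompt)
```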
Notes
Need to define 'close enough' carefully. For tool-calling agents, maybe the exact same tool sequence; for free-form generation, maybe embedding similarity above a threshold?
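One way to keep both options open is to make 'close enough' a per-task policy rather than a single rule. A hypothetical sketch; Output, the String-based tool calls, and the cosine helper are all placeholders:

```haskell
-- Placeholder record for one run's result; a real run would carry the
-- model response, parsed tool calls, and an output embedding.
data Output = Output
  { toolCalls :: [String]
  , embedding :: [Double]
  }

-- 'Close enough' as a per-task policy.
data Fidelity
  = ExactToolSequence      -- tool-calling agents: same calls, same order
  | EmbeddingAbove Double  -- free-form generation: similarity threshold

closeEnough :: Fidelity -> Output -> Output -> Bool
closeEnough ExactToolSequence  a b = toolCalls a == toolCalls b
closeEnough (EmbeddingAbove t) a b = cosine (embedding a) (embedding b) >= t

-- Cosine similarity between two embedding vectors.
cosine :: [Double] -> [Double] -> Double
cosine u v = dot u v / (norm u * norm v)
  where
    dot a b = sum (zipWith (*) a b)
    norm w  = sqrt (dot w w)
```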