Implement empirical prompt compression

t-395 · WorkTask
Created 1 month ago · Updated 1 month ago

Description


Add a Compress operation to the free monad that finds shorter prompts preserving output behavior.
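A minimal sketch of what the new operation could look like, assuming an `OpF` functor in the style of the existing operations; the `Generate` constructor, the field names, and the `DistortionBound` type are illustrative placeholders, not the actual codebase.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Control.Monad.Free (Free, liftF)

newtype Prompt = Prompt { promptText :: String }

-- Maximum acceptable distortion between outputs of the original and
-- compressed prompts (e.g. 1 - cosine similarity; 0 means exact match).
newtype DistortionBound = DistortionBound Double

data OpF next
  = Generate Prompt (String -> next)                   -- existing op, assumed for illustration
  | Compress Prompt DistortionBound (Prompt -> next)   -- new: yield a shorter prompt
  deriving Functor

type Op = Free OpF

-- Smart constructor for use inside Op programs.
compress :: Prompt -> DistortionBound -> Op Prompt
compress p d = liftF (Compress p d id)
```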

Approach

Iterative compression: shorten the prompt, verify that the outputs are 'close enough', and repeat. Not rate-distortion optimal, but practically useful for cost reduction.
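A rough sketch of that loop, with hypothetical `propose` and `distortion` helpers standing in for the compression strategies and verification metric described below; candidates are always checked against the original prompt so distortion does not compound across iterations.

```haskell
import Control.Monad (filterM)

-- Iterative compression loop; p is the prompt representation (e.g. the Prompt
-- type sketched above). propose is assumed to return strictly shorter
-- candidates, which is what makes the loop terminate.
iterativeCompress
  :: Monad m
  => (p -> m [p])          -- propose shorter candidate prompts
  -> (p -> p -> m Double)  -- distortion between the outputs of two prompts
  -> Double                -- acceptable distortion bound ("close enough")
  -> p                     -- original prompt
  -> m p
iterativeCompress propose distortion bound original = go original
  where
    go current = do
      candidates <- propose current
      passing    <- filterM (\c -> (<= bound) <$> distortion original c) candidates
      case passing of
        []      -> pure current   -- no candidate is close enough; stop
        (c : _) -> go c           -- accept the first passing candidate and repeat
```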

Prior Art

  • 'Fundamental Limits of Prompt Compression' (NeurIPS 2024) - theoretical framework
  • LLMLingua, AutoCompressor - existing compression methods

Implementation

1. Add a Compress constructor to OpF with a distortion bound parameter
2. Implement compression strategies:

  • Remove redundant instructions
  • Summarize examples (few-shot -> distilled)
  • Identify sufficient statistics (what info actually matters)

3. Verification: run both the original and compressed prompts and compare the outputs (see the sketch below)
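A hedged sketch of the verification step, assuming a hypothetical `runPrompt` action that executes a prompt against the model and a pluggable similarity function over outputs:

```haskell
-- Sketch of step 3: run both prompts, compare the outputs.
verifyCompression
  :: Monad m
  => (String -> m String)         -- runPrompt: execute a prompt, return the model output
  -> (String -> String -> Double) -- similarity between two outputs (e.g. embedding cosine)
  -> Double                       -- minimum acceptable similarity
  -> String                       -- original prompt
  -> String                       -- compressed candidate
  -> m Bool
verifyCompression runPrompt similarity threshold original candidate = do
  outOriginal  <- runPrompt original
  outCandidate <- runPrompt candidate
  pure (similarity outOriginal outCandidate >= threshold)
```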

Metrics & Verification

  • Compression ratio (tokens saved)
  • Output fidelity (embedding similarity, exact match on structured outputs)
  • Cost savings in practice
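Rough sketches of the first two metrics; `tokenCount` here is a word-count stand-in for whatever tokenizer we actually use:

```haskell
-- Fraction of tokens saved by compression (word count approximates tokens).
compressionRatio :: String -> String -> Double
compressionRatio original compressed =
  1 - fromIntegral (tokenCount compressed) / fromIntegral (tokenCount original)
  where
    tokenCount = length . words   -- stand-in for a real tokenizer

-- Exact-match fidelity over structured outputs (e.g. tool-call sequences):
-- fraction of positions where the original and compressed runs agree.
exactMatchFidelity :: Eq a => [a] -> [a] -> Double
exactMatchFidelity xs ys
  | null xs && null ys = 1
  | otherwise =
      fromIntegral (length (filter id (zipWith (==) xs ys)))
        / fromIntegral (max (length xs) (length ys))
```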

Test Cases

  • Verbose system prompts -> minimal equivalent
  • Many-shot examples -> fewer examples with same behavior
  • Long context -> compressed context
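A possible shape for the first test case, assuming hspec and the `iterativeCompress` sketch from the Approach section; the toy `propose` and `distortion` functions exist only to keep the example runnable.

```haskell
import Test.Hspec
import Data.List (isPrefixOf)

main :: IO ()
main = hspec $
  describe "prompt compression" $
    it "finds a shorter prompt within the distortion bound" $ do
      let verbose = "Answer concisely. Always be polite. Always be polite."
          -- Toy proposer: drop the last word, if any.
          propose p = pure [unwords (init (words p)) | not (null (words p))]
          -- Toy distortion: 0 while the compressed prompt keeps the key instruction.
          distortion _ c = pure (if "Answer" `isPrefixOf` c then 0 else 1)
      compressed <- iterativeCompress propose distortion 0.5 verbose
      length compressed `shouldSatisfy` (< length verbose)
```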

Notes

Need to define 'close enough' carefully. For tool-calling agents, maybe require the exact same tool-call sequence? For free-form generation, embedding similarity above a threshold?
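One hypothetical way to make that criterion explicit per task type (constructor and helper names are placeholders):

```haskell
-- Per-task acceptance criterion, mirroring the question above.
data CloseEnough
  = SameToolSequence          -- tool-calling agents: exact same tool-call sequence
  | EmbeddingAtLeast Double   -- free-form generation: minimum embedding similarity

acceptable
  :: (String -> String -> Double)  -- embedding similarity (placeholder for a real model call)
  -> CloseEnough
  -> String                        -- output of the original prompt (text or serialized tool calls)
  -> String                        -- output of the compressed prompt
  -> Bool
acceptable embSim criterion original compressed = case criterion of
  SameToolSequence   -> original == compressed
  EmbeddingAtLeast t -> embSim original compressed >= t
```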
