Develop theoretical bounds and algorithms for optimal prompt compression.
Given a prompt P and a distortion bound ε, what is the shortest prompt P' such that the model's outputs under P' stay within distortion ε of its outputs under P?
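Stated as an optimization problem (a sketch in assumed notation: f(·) is the model's output distribution, |·| is length in tokens, and D is one of the distortion measures listed below):

```latex
% Assumed notation: f(P) = model output distribution given prompt P, |P| = token length,
% D = a chosen divergence (KL, embedding distance, or a behavioral metric).
\[
  P'^{*}(\varepsilon) \;=\; \arg\min_{P'} |P'|
  \quad \text{s.t.} \quad D\bigl(f(P),\, f(P')\bigr) \le \varepsilon
\]
\[
  R(\varepsilon) \;=\; \bigl|P'^{*}(\varepsilon)\bigr|
  \qquad \text{(sweeping } \varepsilon \text{ traces the rate-distortion curve)}
\]
```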
1. Rate: bits/tokens needed to represent the prompt
2. Distortion: divergence between outputs (KL, embedding distance, behavioral); sketched below
3. Rate-distortion curve: the fundamental tradeoff between the two
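A minimal sketch of the first two distortion metrics, assuming the output distributions and embeddings have already been collected as arrays (the function names are placeholders, not part of the Prompt IR):

```python
import numpy as np

def kl_distortion(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) between next-token distributions from the original and compressed prompt."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def embedding_distortion(e_orig: np.ndarray, e_comp: np.ndarray) -> float:
    """Cosine distance between output embeddings (0 = identical direction, 2 = opposite)."""
    e_orig = np.asarray(e_orig, dtype=float)
    e_comp = np.asarray(e_comp, dtype=float)
    cos = e_orig @ e_comp / (np.linalg.norm(e_orig) * np.linalg.norm(e_comp))
    return 1.0 - float(cos)
```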
1. What is the rate-distortion function for LLM prompts?
2. Can we compute sufficient statistics for a prompt? (A candidate definition is sketched below.)
3. Is there a polynomial-time algorithm for near-optimal compression?
4. How does compression interact with factorization?
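For question 2, one candidate definition (an assumption of this note, borrowing the classical notion of sufficiency):

```latex
% T(P) is a shorter representation of prompt P; f(y | .) is the model's output distribution.
% T is a sufficient statistic for P if conditioning on it leaves the outputs unchanged:
\[
  f\bigl(y \mid T(P)\bigr) \;=\; f\bigl(y \mid P\bigr) \quad \text{for all outputs } y,
\]
% and the minimal token length of such a T is what secMinTokens (below) is meant to capture.
```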
1. Study the NeurIPS paper deeply
2. Formalize distortion metrics for agent behavior (not just text similarity)
3. Derive bounds for specific prompt classes
4. Develop practical algorithms that approach the bounds (a greedy baseline is sketched below)
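As a starting point for item 4, a minimal greedy baseline, assuming the prompt is already split into sections with relevance scores and a callable distortion measure (all names here are hypothetical, not Prompt IR fields):

```python
from typing import Callable, List, Tuple

def greedy_compress(
    sections: List[str],
    relevance: List[float],
    distortion: Callable[[List[str]], float],  # distortion of a candidate prompt vs. the original
    epsilon: float,
) -> Tuple[List[str], List[Tuple[int, float]]]:
    """Drop the least-relevant sections while measured distortion stays within epsilon.

    Returns the kept sections plus a log of (tokens_remaining, distortion) points,
    which can be accumulated into an empirical rate-distortion curve.
    """
    kept = list(range(len(sections)))
    curve: List[Tuple[int, float]] = []
    # Try dropping sections from least to most relevant.
    for idx in sorted(range(len(sections)), key=lambda i: relevance[i]):
        candidate = [i for i in kept if i != idx]
        d = distortion([sections[i] for i in candidate])
        if d <= epsilon:
            kept = candidate
            tokens = sum(len(sections[i].split()) for i in kept)  # crude token proxy
            curve.append((tokens, d))
    return [sections[i] for i in kept], curve
```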
Heavy theory. May need collaboration with information theory researchers. But would enable provably optimal prompt caching/compression.
Connection to Prompt IR (from t-477 design session)
The Prompt IR design includes explicit support for rate-distortion optimization:
Per-section compression metadata:
- `secMinTokens` = sufficient statistics (minimum representation preserving inference)
- `secRelevance` = proxy for distortion contribution

Compression operation:
- `pmCompressionRatio` tracks original/compressed for rate measurement
- `targetTokens`

Research hooks:
- Key insight: The IR makes compression decisions explicit and traceable. We can log which sections were compressed/dropped and correlate with behavioral changes to build empirical rate-distortion curves.
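A minimal sketch of how that logging could look, assuming a simple section record carrying the fields above (the dataclasses and budget logic are illustrative, not the actual Prompt IR schema):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Section:
    """One prompt section with the compression metadata described above."""
    text: str
    sec_min_tokens: int    # secMinTokens: smallest representation preserving inference
    sec_relevance: float   # secRelevance: proxy for this section's distortion contribution

@dataclass
class CompressionLogEntry:
    """One compression decision, kept so behavioral changes can be traced back to it."""
    section_index: int
    original_tokens: int
    compressed_tokens: int
    dropped: bool

def compress_prompt(
    sections: List[Section], target_tokens: int
) -> Tuple[List[Section], List[CompressionLogEntry], float]:
    """Keep the most relevant sections within target_tokens; return the kept prompt,
    the decision log, and the overall compression ratio (pmCompressionRatio)."""
    ordered = sorted(sections, key=lambda s: -s.sec_relevance)
    total = sum(len(s.text.split()) for s in sections)  # crude token count
    kept: List[Section] = []
    log: List[CompressionLogEntry] = []
    budget_used = 0
    for i, s in enumerate(ordered):
        n = len(s.text.split())
        keep = budget_used + n <= target_tokens
        if keep:
            kept.append(s)
            budget_used += n
        log.append(CompressionLogEntry(i, n, n if keep else 0, dropped=not keep))
    pm_compression_ratio = total / max(budget_used, 1)  # original / compressed
    return kept, log, pm_compression_ratio
```

Correlating these log entries with observed behavioral changes is what would let the empirical rate-distortion curves mentioned above be built directly from production traffic.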