Explore static analysis of prompts by examining open-source model internals (embeddings, attention patterns, activations).
If we can analyze what a prompt 'will do' without running full inference, several questions become tractable:
1. Can we predict posterior entropy from prompt embeddings alone?
2. Can we detect 'will this prompt use tools' from early-layer activations?
3. Can we identify equivalent prompts via embedding geometry?
4. What's the manifold structure of prompt space?
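Question 3 can be made concrete with a toy sketch: treat two prompts as equivalent when their embeddings are nearly parallel. The vectors and the 0.95 threshold below are illustrative stand-ins, not values from any real embedding model.

```python
# Sketch: prompt equivalence via embedding geometry (cosine similarity).
# The vectors here are toy stand-ins; a real run would use an embedding
# model's output for each prompt.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def equivalent(a, b, threshold=0.95):
    """Declare two prompt embeddings equivalent above a cosine threshold."""
    return cosine(a, b) >= threshold

base = np.array([1.0, 0.0, 0.2])
paraphrase = np.array([0.98, 0.05, 0.21])  # near-duplicate direction
unrelated = np.array([0.0, 1.0, -0.3])

print(equivalent(base, paraphrase))  # nearly parallel -> True
print(equivalent(base, unrelated))   # far apart -> False
```

Whether a flat cosine threshold is the right notion of equivalence is itself part of the manifold-structure question; it assumes distances in embedding space track behavioral similarity.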
Approach:
1. Use open models (LLaMA, Mistral) where we can access internals.
2. Build a dataset: prompts paired with their behavioral outcomes.
3. Train a lightweight probe/classifier on the internal representations.
4. See what's predictable without a full forward pass.
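Step 3 above is the cheap part of the pipeline. A minimal sketch, assuming activations have already been extracted (e.g. mean-pooled hidden states from an open model); synthetic vectors with a planted linear signal stand in for them here, and the "behavioral outcome" label is hypothetical:

```python
# Sketch: train a lightweight linear probe on internal representations.
# Synthetic 64-d "activations" with a planted linear signal stand in for
# real hidden states; y stands in for a behavioral label (e.g. tool use).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

X = rng.normal(size=(1000, 64))          # stand-in activations
w = rng.normal(size=64)                  # planted signal direction
y = (X @ w > 0).astype(int)              # stand-in behavioral outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
acc = probe.score(X_test, y_test)
print(f"probe accuracy: {acc:.2f}")
```

With real activations the interesting result is not the probe itself but where accuracy degrades: which layers, and which outcomes, stop being linearly decodable.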
This is genuine research, not engineering. It may not succeed. But if it works, it's the foundation for a principled Analyze operation.
Connection to Prompt IR (from t-477 design session)
The Prompt IR design includes hooks for static analysis via embeddings:
- Per-section embeddings
- Aggregate metadata
- Analysis operations
- Research hooks
- secEmbedding: enables per-section analysis without full inference
- equivalent: can identify prompt equivalences for caching
- estimateEntropy: could predict "risky" prompts (high uncertainty → run best-of-N)
- secHash: enables content-addressable caching of analysis results

Open questions for this research:
1. What embedding model works best? (task-specific vs general)
2. Is embedding magnitude actually correlated with information content?
3. Can we predict tool usage from section embeddings alone?
4. What's the manifold structure of the section embedding space?
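The secHash hook is the most mechanical of the four. A minimal sketch of how per-section hashes could drive content-addressable caching of analysis results; the function names and cached fields here are illustrative, not the actual IR API:

```python
# Sketch: content-addressable caching of per-section analysis via a
# content hash (the role secHash plays in the Prompt IR design).
# analyze_section's payload is a placeholder, not the real analysis.
import hashlib

def sec_hash(text: str) -> str:
    """Stable content hash for one prompt section."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

cache: dict[str, dict] = {}

def analyze_section(text: str) -> dict:
    key = sec_hash(text)
    if key not in cache:
        # Placeholder for real analysis (embedding, entropy estimate, ...).
        cache[key] = {"length": len(text)}
    return cache[key]

analyze_section("You are a helpful assistant.")
analyze_section("You are a helpful assistant.")  # cache hit, no recompute
print(len(cache))  # one entry despite two calls
```

The same keying scheme would let every analysis operation in the table above (embeddings, entropy estimates) be computed at most once per unique section.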