Research: Bayesian calculus for prompt composition

t-398 · WorkTask · Created 3 months ago · Updated 3 months ago

Description


Develop a theoretical framework for how prompts compose as priors.

Core Question

If prompts are priors, what's the posterior of compose(prompt1, prompt2)?
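
One concrete reading, as an equation (a framing assumption, not settled by this task): if prompt1 fixes a prior p(θ | prompt1) over behaviors θ and prompt2 enters as evidence, then

p(θ | prompt1, prompt2) ∝ p(prompt2 | θ) · p(θ | prompt1)

The frameworks below differ in exactly this choice: prompt2 as evidence (update), as a second expert (product), or as an alternative hypothesis (mixture).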

Intuitions to Formalize

1. The system prompt sets a prior distribution over behaviors
2. Each message is a Bayesian update
3. Composing prompts should follow probability laws
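
A minimal Haskell sketch of these intuitions over a toy discrete behavior space (Behavior, Dist, and the update rule are illustrative assumptions, not part of the task):

import qualified Data.Map.Strict as M

type Behavior = String
type Dist = M.Map Behavior Double   -- distribution over behaviors

normalize :: Dist -> Dist
normalize d = M.map (/ s) d where s = sum (M.elems d)

-- Intuition 2: treat each message as a likelihood over behaviors and
-- apply Bayes' rule, posterior ∝ likelihood * prior.
update :: Dist -> Dist -> Dist
update prior msgLikelihood = normalize (M.intersectionWith (*) prior msgLikelihood)

-- Intuition 1: the system prompt sets the prior; a conversation is then a
-- left fold of updates. Intuition 3 asks which laws this fold obeys.
runConversation :: Dist -> [Dist] -> Dist
runConversation = foldl update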

Potential Frameworks

1. Hierarchical Bayes: prompt1 is hyperprior, prompt2 is prior

  • compose(system, skill) = hierarchical model

2. Product of Experts: each prompt is a constraint (frameworks 2 and 3 are sketched in code after this list)

  • compose(A, B) ~ p(x|A) * p(x|B) / Z

3. Mixture Models: prompts as mixture components

  • compose(A, B) = α*p(x|A) + (1-α)*p(x|B)

4. Information Geometry: prompts as points on manifold

  • compose = geodesic between prompts?
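
A self-contained sketch of frameworks 2 and 3 on the same toy behavior space (taskPrompt, stylePrompt, and the numbers are made up for illustration):

import qualified Data.Map.Strict as M

type Dist = M.Map String Double

normalize :: Dist -> Dist
normalize d = M.map (/ s) d where s = sum (M.elems d)

-- Framework 2, product of experts: p(x|A) * p(x|B) / Z. A behavior must
-- be plausible under *both* prompts to keep mass.
poe :: Dist -> Dist -> Dist
poe a b = normalize (M.intersectionWith (*) a b)

-- Framework 3, mixture: α*p(x|A) + (1-α)*p(x|B). A behavior plausible
-- under *either* prompt keeps mass.
mixture :: Double -> Dist -> Dist -> Dist
mixture alpha a b = M.unionWith (+) (M.map (alpha *) a) (M.map ((1 - alpha) *) b)

taskPrompt, stylePrompt :: Dist
taskPrompt  = M.fromList [("answer", 0.7), ("refuse", 0.1), ("chat", 0.2)]
stylePrompt = M.fromList [("answer", 0.4), ("refuse", 0.1), ("chat", 0.5)]

main :: IO ()
main = do
  print (poe taskPrompt stylePrompt)         -- sharpens shared mass on "answer"
  print (mixture 0.5 taskPrompt stylePrompt) -- averages the two

The qualitative difference this exposes: the product concentrates, the mixture hedges. Which one observed LLM behavior resembles is exactly research question 1 below.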

Research Questions

1. Which framework matches empirical behavior?
2. Can we derive composition laws that predict behavior? (see the associativity note after this list)
3. What's the 'type theory' of prompts? (if A : Task and B : Style, what's A <> B?)
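
On question 2, one law can be settled on paper before any experiments (a property of the candidate formulas above, not yet of LLM behavior): product of experts is associative, since both bracketings are proportional to p(x|A) · p(x|B) · p(x|C). A fixed-α mixture is not:

compose(compose(A, B), C) = α²·p(x|A) + α(1-α)·p(x|B) + (1-α)·p(x|C)
compose(A, compose(B, C)) = α·p(x|A)  + α(1-α)·p(x|B) + (1-α)²·p(x|C)

These agree only for α ∈ {0, 1} (or coinciding distributions), so measuring whether composition of real prompts is order-insensitive already discriminates between frameworks 2 and 3.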

Validation

  • Derive predictions from theory
  • Test predictions empirically
  • Iterate

Connection to Existing Work

  • 'Bayesian Geometry of Transformer Attention' - foundation
  • Hierarchical Bayesian modeling - established theory
  • Information geometry of neural networks - active research

Notes

This is foundational research. High risk, high reward. If successful, it enables principled prompt engineering instead of trial and error.

Timeline (3)

🔄 [human] Open → InProgress · 3 months ago
💬 [human] · 3 months ago

Connection to Prompt IR (from t-477 design session)

The Prompt IR design includes explicit support for Bayesian composition via CompositionMode:

data CompositionMode
  = Hierarchical    -- Hyperprior (system prompt, base instructions)
  | Constraint      -- Product-of-experts (must satisfy)
  | Additive        -- Mixture (adds info, can be dropped)
  | Contextual      -- Bayesian update (observation shifting posterior)

Mapping to your frameworks:

  • Hierarchical → Hierarchical Bayes (system prompt is hyperprior)
  • Constraint → Product of Experts (each section is a constraint, compose = multiply)
  • Additive → Mixture Models (α*p(x|A) + (1-α)*p(x|B))
  • Contextual → Sequential Bayesian updates

Composition operation:

compose :: PromptIR -> PromptIR -> PromptIR
compose a b = PromptIR
  { pirSections = mergeSections (pirSections a) (pirSections b)
  , ...
  }
  where
    -- Hierarchical sections from 'a' come first (hyperpriors)
    -- Constraint sections are AND'd (product of experts)
    -- Additive sections are collected (mixture)
    -- Contextual sections are sequenced (Bayesian updates)
    mergeSections = ...

This gives us a principled compose that respects the probabilistic semantics of each section type.
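
A minimal self-contained sketch of one possible mergeSections, assuming each section carries its CompositionMode (Section, secMode, and secBody are hypothetical stand-ins for the actual t-477 types):

data CompositionMode = Hierarchical | Constraint | Additive | Contextual
  deriving (Eq, Show)

data Section = Section { secMode :: CompositionMode, secBody :: String }
  deriving (Show)

-- Realizes the ordering described in the comments above: hyperpriors first
-- ('a' before 'b' within each group), then constraints, then additive
-- sections, then contextual updates in arrival order. The probabilistic
-- operations themselves (multiply, mix, update) would live in whatever
-- interprets the merged IR.
mergeSections :: [Section] -> [Section] -> [Section]
mergeSections a b =
  concatMap ofMode [Hierarchical, Constraint, Additive, Contextual]
  where
    ofMode m = filter ((== m) . secMode) (a ++ b)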

Research questions this enables:

1. Empirically test which CompositionMode matches observed LLM behavior
2. Derive composition laws: compose (compose A B) C == compose A (compose B C)? (a property-test sketch follows)
3. What's the "type theory"? If A : Hierarchical and B : Additive, what's compose A B?
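
For question 2, associativity of a candidate compose can be property-tested directly. A sketch with QuickCheck for the product-of-experts case, over unnormalized weight vectors (Dist, genDist, and the tolerance are illustrative choices):

import Test.QuickCheck

type Dist = [Double]  -- weights over a fixed finite behavior space

normalize :: Dist -> Dist
normalize xs = map (/ s) xs where s = sum xs

-- Product of experts: pointwise product, renormalized.
poe :: Dist -> Dist -> Dist
poe a b = normalize (zipWith (*) a b)

approxEq :: Dist -> Dist -> Bool
approxEq a b = and (zipWith (\x y -> abs (x - y) < 1e-9) a b)

-- Strictly positive weights avoid dividing by zero after products.
genDist :: Gen Dist
genDist = vectorOf 4 (choose (0.1, 1.0))

prop_poeAssoc :: Property
prop_poeAssoc =
  forAll genDist $ \a -> forAll genDist $ \b -> forAll genDist $ \c ->
    poe (poe a b) c `approxEq` poe a (poe b c)

main :: IO ()
main = quickCheck prop_poeAssoc  -- expected to pass; the fixed-α mixture analogue should fail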