Extend the JIT context rehydration (t-677) with speculative pre-fetching: instead of waiting for the model to call request_context, the compiler predicts what the model will likely need mid-turn and pre-hydrates those sections in the background.
In PL JIT history, the interesting compilers (V8 TurboFan, LuaJIT trace compiler) evolved from 'compile hot paths on demand' to 'speculatively compile predicted paths with bailout.' The same evolution applies here.
With basic JIT (t-677), the model calls request_context, waits for hydration, then continues. With speculative pre-fetching, the compiler predicts 2-3 likely-needed sections at compile time and hydrates them in the background. If the model asks for one, it's instant (cache hit). If not, the pre-fetch is discarded (cheap bailout).
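The cache-hit/fallback behavior can be sketched as follows. This is a minimal illustration, not the t-677 implementation: `WarmCache`, `hydrate_section`, and `request_context` are hypothetical names standing in for the real components.

```python
import time

class WarmCache:
    """Holds pre-hydrated sections, keyed by section ID."""

    def __init__(self):
        self._store = {}  # section_id -> hydrated text

    def put(self, section_id, text):
        self._store[section_id] = text

    def pop(self, section_id):
        # Remove on read so unused entries can be discarded at end of turn.
        return self._store.pop(section_id, None)

def hydrate_section(section_id):
    # Stand-in for the slow on-demand hydration path (t-677 behavior).
    time.sleep(0.01)
    return f"<hydrated:{section_id}>"

def request_context(section_id, cache):
    cached = cache.pop(section_id)
    if cached is not None:
        return cached  # cache hit: instant splice, no hydration latency
    return hydrate_section(section_id)  # cache miss: fall back to on-demand
```

The key property: a miss costs nothing extra relative to t-677, and a hit skips hydration latency entirely.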
At initial compile time, the compiler has rich signals for prediction: which request_context calls followed which task types in past runs (trace-based optimization).

The mechanism:

1. Prediction pass: After AOT compilation, run a lightweight prediction pass that scores candidate sections by likelihood of being requested.
2. Background hydration: Top-N candidates are hydrated in parallel (async), stored in a warm cache keyed by section ID / query signature.
3. Cache integration: When request_context fires in the interpreter, check the warm cache first. Cache hit = instant splice, no hydration latency. Cache miss = fall back to on-demand hydration (t-677 behavior).
4. Bailout: Unused pre-fetched sections are discarded at end of turn. No wasted prompt budget — they only enter the prompt if explicitly requested.
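The four steps above can be sketched end to end. Everything here is an assumption for illustration: the scoring signal (per-task-type request counts from traces), the `hydrate` stand-in, and the function names are hypothetical, not the real prediction model.

```python
from concurrent.futures import ThreadPoolExecutor

def score_candidates(sections, trace_counts, task_type):
    # Step 1: rank sections by how often past runs of this task type
    # requested them (trace-based signal; assumed shape of the data).
    return sorted(
        sections,
        key=lambda s: trace_counts.get((task_type, s), 0),
        reverse=True,
    )

def hydrate(section_id):
    # Stand-in for the slow hydration path.
    return f"<hydrated:{section_id}>"

def prefetch(sections, trace_counts, task_type, top_n=2):
    # Step 2: hydrate the top-N candidates in the background, in parallel.
    ranked = score_candidates(sections, trace_counts, task_type)[:top_n]
    with ThreadPoolExecutor() as pool:
        futures = {s: pool.submit(hydrate, s) for s in ranked}
        return {s: f.result() for s, f in futures.items()}  # warm cache

def end_of_turn_bailout(warm_cache, requested_ids):
    # Step 4: discard pre-fetched sections the model never asked for.
    # Only explicitly requested sections ever entered the prompt.
    return {s: t for s, t in warm_cache.items() if s in requested_ids}
```

Step 3 (cache integration) is just a dict lookup inside request_context before falling back to on-demand hydration, so the bailout is genuinely cheap: unused entries are dropped without ever touching prompt budget.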
LCM's runtime approach can't predict what the model will need because it doesn't have a global view of the task. Our compiler sees the full ContextRequest, task metadata, and historical patterns at compile time — enough to make useful predictions before the model even starts reasoning.
Metric to track: request_context calls (cache hit vs miss).

This is an optimization pass. Only implement it after t-677 is working and we have data on which request_context calls the model actually makes in practice. The prediction model should be informed by real usage traces, not guesses.