Implement a watcher process that monitors agent context and proactively prunes cruft BEFORE hitting token limits.
Current auto-compaction is reactive: it triggers only when the context is already full, which is suboptimal.
Run a background watcher that:
1. Monitors agent context in real-time
2. Identifies low-value content early
3. Compresses/summarizes incrementally before hitting limits
4. Maintains invariant: context always has N% headroom
Core context watcher module implemented with basic token analysis and compaction recommendations. Module compiles and tests pass. Ready for integration with agent architecture.
The Prompt IR design directly supports optimistic compaction via:
Reserve ratio for headroom:
data TokenBudget = TokenBudget
  { tbTotal        :: Int
  , tbReserveRatio :: Float -- Keep N% headroom (e.g., 0.15 = 15%)
  , tbAllocation   :: BudgetAllocation
  }
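As a worked example of the reserve ratio, the sketch below computes how many tokens stay usable once the headroom is set aside. The helpers `usableTokens` and `withinBudget` and the empty `BudgetAllocation` placeholder are assumptions made for illustration; none of them are part of the Prompt IR design as described here.

```haskell
-- Placeholder only: the real BudgetAllocation is not shown in this document.
data BudgetAllocation = BudgetAllocation

data TokenBudget = TokenBudget
  { tbTotal        :: Int
  , tbReserveRatio :: Float -- Keep N% headroom (e.g., 0.15 = 15%)
  , tbAllocation   :: BudgetAllocation
  }

-- Hypothetical helper: tokens that may be used before eating into the reserve.
usableTokens :: TokenBudget -> Int
usableTokens b = floor (fromIntegral (tbTotal b) * (1 - tbReserveRatio b))

-- Hypothetical helper: does a given token count respect the headroom invariant?
withinBudget :: TokenBudget -> Int -> Bool
withinBudget b actualTokens = actualTokens <= usableTokens b
```

With `tbTotal = 200000` and a 15% reserve, the watcher would start compacting at roughly 170,000 tokens rather than at the hard limit.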
Scoring fields on sections:
data Section = Section
  { ...
  , secPriority  :: Priority      -- Critical > High > Medium > Low
  , secRelevance :: Maybe Float   -- 0.0-1.0, task-specific
  , secRecency   :: Maybe UTCTime -- When this info was current
  ...
  }
Compaction scoring function:
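-- Score for compaction: lower = compact first
compactionScore :: Section -> UTCTime -> Float
compactionScore s now =
  priorityWeight (secPriority s)
    * fromMaybe 0.5 (secRelevance s)
    * recencyWeight (secRecency s) now
  where
    -- Sections with no timestamp get a neutral recency weight
    recencyWeight Nothing _ = 0.5
    recencyWeight (Just t) now' =
      -- diffUTCTime yields NominalDiffTime (seconds); convert before calling exp
      let ageHours = realToFrac (diffUTCTime now' t) / 3600 :: Float
      in exp (-ageHours / 24) -- Decays by 1/e every 24 hours (half-life ~16.6 hours)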
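To make the scoring concrete, here is a minimal self-contained sketch that can be loaded into GHCi. The `priorityWeight` values are assumed (the notes above do not define them), `Section` is cut down to its three scoring fields, and `hoursBefore` and `sampleNow` are hypothetical helpers for building test data. The age is converted with `realToFrac` so the expression type-checks as `Float`. It uses the `time` package.

```haskell
import Data.Maybe (fromMaybe)
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..), addUTCTime, diffUTCTime)

data Priority = Critical | High | Medium | Low
  deriving (Eq, Show)

-- Cut-down Section carrying only the scoring metadata.
data Section = Section
  { secPriority  :: Priority
  , secRelevance :: Maybe Float
  , secRecency   :: Maybe UTCTime
  }

-- Assumed weights: any mapping where Critical > High > Medium > Low would do.
priorityWeight :: Priority -> Float
priorityWeight Critical = 1.0
priorityWeight High     = 0.75
priorityWeight Medium   = 0.5
priorityWeight Low      = 0.25

-- Lower score = compact first.
compactionScore :: Section -> UTCTime -> Float
compactionScore s now =
  priorityWeight (secPriority s)
    * fromMaybe 0.5 (secRelevance s)
    * recencyWeight (secRecency s)
  where
    recencyWeight Nothing  = 0.5
    recencyWeight (Just t) =
      let ageHours = realToFrac (diffUTCTime now t) / 3600 :: Float
      in exp (negate ageHours / 24)

-- Hypothetical helper: the instant n hours before the given time.
hoursBefore :: Int -> UTCTime -> UTCTime
hoursBefore n = addUTCTime (fromIntegral (negate n * 3600))

-- Hypothetical fixed reference time, so examples are reproducible.
sampleNow :: UTCTime
sampleNow = UTCTime (fromGregorian 2024 1 1) 0
```

With these assumed weights, a Low-priority section at relevance 1.0 scores 0.25 when fresh and about 0.034 (0.25 * e^-2) when it is 48 hours old, so the stale one would be compacted first.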
How this enables optimistic compaction:
1. Watcher can score sections without understanding content (pure metadata)
2. Incremental summarization can target low-score sections
3. Invariant maintained: actualTokens <= tbTotal * (1 - tbReserveRatio)
4. Graceful degradation: compress Low/Medium priority first, preserve Critical
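The points above can be sketched as a pure selection step: order sections by score, skip Critical ones, and keep picking until the projected token count fits under the limit from point 3. The `Candidate` record, `selectForCompaction`, and the idea that each section has a known token saving are all assumptions for illustration.

```haskell
import Data.List (sortOn)

data Priority = Critical | High | Medium | Low
  deriving (Eq, Show)

-- Hypothetical metadata-only view of a section, as the watcher would see it.
data Candidate = Candidate
  { candName     :: String
  , candPriority :: Priority
  , candScore    :: Float -- output of compactionScore (lower = compact first)
  , candSavings  :: Int   -- assumed tokens freed if this section is compacted
  } deriving (Eq, Show)

-- Choose the lowest-scoring non-Critical sections until the projected token
-- count is at or below the limit, i.e. tbTotal * (1 - tbReserveRatio).
selectForCompaction :: Int -> Int -> [Candidate] -> [Candidate]
selectForCompaction limit actualTokens cands = go actualTokens ordered
  where
    ordered = sortOn candScore (filter ((/= Critical) . candPriority) cands)
    go tokens (c : cs)
      | tokens > limit = c : go (tokens - candSavings c) cs
    go _ _ = [] -- already under the limit, or nothing left to compact
```

If every non-Critical section is selected and the total is still over the limit, this sketch simply returns them all; what to do in that case is left open here.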
Watcher loop sketch:
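-- Requires the BlockArguments and NumericUnderscores extensions
compactionWatcher :: TVar PromptIR -> TokenBudget -> IO ()
compactionWatcher irVar budget = forever do
  threadDelay 1_000_000 -- Check every second (threadDelay takes microseconds)
  ir <- readTVarIO irVar
  let used = fromIntegral (pmTotalTokens (pirMeta ir)) :: Float
      headroom = 1.0 - used / fromIntegral (tbTotal budget)
  when (headroom < tbReserveRatio budget) do
    -- Proactively compact before we hit the wall
    ir' <- compactLowestScoring ir
    -- NOTE: the read and write are separate transactions, so any update
    -- made to irVar while compaction runs is overwritten here
    atomically (writeTVar irVar ir')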
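The loop above reads and writes `irVar` in separate transactions, so an update the agent makes in between would be lost. One possible way around that, assuming compaction can be expressed as a pure function, is to do the headroom check and the rewrite inside a single STM transaction. Everything in this sketch is a hypothetical stand-in: the cut-down `PromptIR`, `compactOnce`, `watcherStep`, and `demo` are not from the design doc. It uses the `stm` package.

```haskell
import Control.Concurrent.STM
  (TVar, atomically, newTVarIO, readTVar, readTVarIO, writeTVar)

-- Stand-in for PromptIR: one (compactionScore, tokenCount) pair per section.
newtype PromptIR = PromptIR { pirSections :: [(Float, Int)] }

totalTokens :: PromptIR -> Int
totalTokens = sum . map snd . pirSections

-- Assumed pure compaction: halve the token count of the lowest-scoring
-- section (all of them, if several tie for lowest).
compactOnce :: PromptIR -> PromptIR
compactOnce (PromptIR secs) = PromptIR (map shrink secs)
  where
    lowest = minimum (map fst secs)
    shrink (score, tokens)
      | score == lowest = (score, tokens `div` 2)
      | otherwise       = (score, tokens)

-- One watcher iteration: check headroom and compact in the same transaction.
-- Returns True if a compaction was applied.
watcherStep :: Int -> Float -> TVar PromptIR -> IO Bool
watcherStep total reserve irVar = atomically $ do
  ir <- readTVar irVar
  let headroom = 1 - fromIntegral (totalTokens ir) / fromIntegral total :: Float
  if headroom < reserve
    then writeTVar irVar (compactOnce ir) >> pure True
    else pure False

-- Usage: 900 of 1000 tokens used leaves 10% headroom, below a 15% reserve.
demo :: IO Int
demo = do
  irVar <- newTVarIO (PromptIR [(0.9, 500), (0.1, 400)])
  _ <- watcherStep 1000 0.15 irVar
  totalTokens <$> readTVarIO irVar
```

If summarization needs IO (an LLM call, for instance) it cannot run inside `atomically`, so a real watcher would need a different scheme, such as re-checking that the IR is unchanged before committing the result.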
Agent timed out. Partial design doc created: OPTIMISTIC_COMPACTION_DESIGN.md