Optimistic context compaction watcher

t-432 · WorkTask · Created 1 month ago · Updated 4 weeks ago

Description


Implement a watcher process that monitors agent context and proactively prunes cruft BEFORE hitting token limits.

Problem

Current auto-compaction is reactive: it triggers only once the context is full. This is suboptimal:

  • Last-minute compression loses fidelity
  • Agent performance degrades as context fills with cruft
  • No opportunity for graceful degradation

Optimistic Compaction Approach

Run a background watcher that:

1. Monitors agent context in real time
2. Identifies low-value content early:
   • Redundant tool outputs (e.g., 5 similar grep results)
   • Superseded reasoning (explored path A, then chose path B)
   • Verbose error traces where only the conclusion matters
3. Compresses/summarizes incrementally before hitting limits
4. Maintains the invariant that the context always has N% headroom

Implementation Ideas

  • Sidecar process watching context via IPC/shared memory
  • Scoring function: recency × relevance × information density
  • Incremental summarization: old sections → summaries
  • Preserve "anchors": key decisions, user instructions, current goal
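The "anchors" idea above can be sketched as a simple partition over sections. This is a minimal illustration with stand-in types (`SectionKind` and `Section'` are hypothetical, not part of the actual Prompt IR):

```haskell
import Data.List (partition)

-- Hypothetical section tagging for illustration only.
data SectionKind = UserInstruction | KeyDecision | CurrentGoal | ToolOutput | Reasoning
  deriving (Eq, Show)

data Section' = Section'
  { skKind :: SectionKind
  , skText :: String
  } deriving Show

-- Anchors are never compacted: key decisions, user instructions, current goal.
isAnchor :: Section' -> Bool
isAnchor s = skKind s `elem` [UserInstruction, KeyDecision, CurrentGoal]

-- Split a context into (anchors, compaction candidates).
partitionContext :: [Section'] -> ([Section'], [Section'])
partitionContext = partition isAnchor
```

The watcher would only ever score and summarize the second list, so anchors survive every compaction pass by construction.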

Connection to Research

  • Related to t-399 (rate-distortion compression)
  • Could use lightweight local model for summarization
  • Informs optimal compression ratios empirically

Metrics

  • Context utilization over time
  • Information retention (can agent recall pruned info?)
  • Performance impact of early vs late compaction

Timeline (8)

🔄 [human] Open → InProgress · 1 month ago
💬 [human] · 1 month ago

Agent timed out. Partial design doc created: OPTIMISTIC_COMPACTION_DESIGN.md

💬 [human] · 1 month ago

Core context watcher module implemented with basic token analysis and compaction recommendations. Module compiles and tests pass. Ready for integration with agent architecture.

🔄 [human] InProgress → Done · 1 month ago
💬 [human] · 4 weeks ago

Connection to Prompt IR (from t-477 design session)

The Prompt IR design directly supports optimistic compaction via:

Reserve ratio for headroom:

data TokenBudget = TokenBudget
  { tbTotal :: Int
  , tbReserveRatio :: Float  -- Keep N% headroom (e.g., 0.15 = 15%)
  , tbAllocation :: BudgetAllocation
  }
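Given a `TokenBudget`, the usable context size falls out directly. A small sketch (types reproduced from the ticket; `BudgetAllocation` stubbed, and `effectiveBudget` is an illustrative helper, not part of the design):

```haskell
-- Stub for the allocation detail the ticket elides.
data BudgetAllocation = BudgetAllocation deriving Show

data TokenBudget = TokenBudget
  { tbTotal :: Int
  , tbReserveRatio :: Float  -- Keep N% headroom (e.g., 0.15 = 15%)
  , tbAllocation :: BudgetAllocation
  } deriving Show

-- Largest context size that still preserves the reserve headroom.
effectiveBudget :: TokenBudget -> Int
effectiveBudget b = floor (fromIntegral (tbTotal b) * (1 - tbReserveRatio b))
```

For example, a 128k total with a 0.15 reserve ratio leaves roughly 108,800 usable tokens; the watcher treats that, not `tbTotal`, as the ceiling.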

Scoring fields on sections:

data Section = Section
  { ...
  , secPriority :: Priority        -- Critical > High > Medium > Low
  , secRelevance :: Maybe Float    -- 0.0-1.0, task-specific
  , secRecency :: Maybe UTCTime    -- When this info was current
  ...
  }

Compaction scoring function:

-- Score for compaction: lower = compact first
compactionScore :: Section -> UTCTime -> Float
compactionScore s now =
  priorityWeight (secPriority s)
  * fromMaybe 0.5 (secRelevance s)
  * recencyWeight (secRecency s)
  where
    recencyWeight Nothing  = 0.5
    recencyWeight (Just t) =
      let ageHours = realToFrac (diffUTCTime now t) / 3600 :: Float
      in exp (-ageHours / 24)  -- 1/e decay per ~24 hours
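`compactionScore` assumes a `priorityWeight` helper that is not defined in the ticket. One plausible definition, matching the `Critical > High > Medium > Low` ordering on `secPriority` (the concrete weights are assumptions):

```haskell
-- Ordering matches the ticket: Critical > High > Medium > Low.
data Priority = Low | Medium | High | Critical
  deriving (Eq, Ord, Show)

-- Assumed weights; any strictly increasing mapping into (0, 1] works.
priorityWeight :: Priority -> Float
priorityWeight Low      = 0.25
priorityWeight Medium   = 0.5
priorityWeight High     = 0.75
priorityWeight Critical = 1.0  -- Critical scores highest, so it compacts last
```

Because the score multiplies weights, a Critical section with stale recency can still outrank a fresh Low-priority one, which is the intended bias.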

How this enables optimistic compaction:

1. Watcher can score sections without understanding content (pure metadata)
2. Incremental summarization can target low-score sections
3. Invariant maintained: actualTokens <= tbTotal * (1 - tbReserveRatio)
4. Graceful degradation: compress Low/Medium priority first, preserve Critical
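Invariant 3 is cheap to state as a standalone predicate the watcher can check on every tick (an illustrative helper, not part of the design):

```haskell
-- True iff the context still respects the reserve headroom:
-- actualTokens <= total * (1 - reserveRatio)
withinHeadroom :: Int -> Int -> Float -> Bool
withinHeadroom actualTokens total reserveRatio =
  fromIntegral actualTokens <= fromIntegral total * (1 - reserveRatio)
```

The watcher loop below is just "while not `withinHeadroom`, compact the lowest-scoring section", which keeps the invariant monotone under appends between ticks.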

Watcher loop sketch:

{-# LANGUAGE BlockArguments, NumericUnderscores #-}

compactionWatcher :: TVar PromptIR -> TokenBudget -> IO ()
compactionWatcher irVar budget = forever do
  threadDelay 1_000_000  -- Check every second
  ir <- readTVarIO irVar
  let used     = fromIntegral (pmTotalTokens (pirMeta ir)) :: Float
      headroom = 1.0 - used / fromIntegral (tbTotal budget)
  when (headroom < tbReserveRatio budget) do
    -- Proactively compact before we hit the wall
    ir' <- compactLowestScoring ir
    atomically (writeTVar irVar ir')
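One caveat with the sketch: `readTVarIO` followed by a later `writeTVar` is a read-modify-write race, so sections the agent appends between the read and the write would be silently dropped. If the compaction step can be made pure, the check and the update can run in a single STM transaction. A hedged sketch under that assumption (`PromptIR'` and `compactPure` are hypothetical stand-ins):

```haskell
{-# LANGUAGE BlockArguments, NumericUnderscores #-}
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever, when)

-- Hypothetical minimal IR: a token count plus opaque sections.
data PromptIR' = PromptIR' { pirTokens :: Int, pirSections :: [String] }

-- Stand-in for the real summarizer: drop the oldest section.
compactPure :: PromptIR' -> PromptIR'
compactPure (PromptIR' n ss) = PromptIR' (max 0 (n - 100)) (drop 1 ss)

-- Check and compact atomically, so concurrent appends are never lost.
compactionWatcher' :: TVar PromptIR' -> Int -> Float -> IO ()
compactionWatcher' irVar total reserve = forever do
  threadDelay 1_000_000
  atomically do
    ir <- readTVar irVar
    let headroom = 1.0 - fromIntegral (pirTokens ir) / fromIntegral total
    when (headroom < reserve) do
      writeTVar irVar (compactPure ir)
```

The trade-off is that summarization with a local model is IO, so in practice the transaction would compare a version counter and retry rather than inline the model call.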