Optimistic context compaction watcher

t-432 · WorkTask · Created 1 month ago · Updated 4 weeks ago

Description


Implement a watcher process that monitors agent context and proactively prunes cruft BEFORE hitting token limits.

Problem

Current auto-compaction is reactive: it triggers only once the context is full. This is suboptimal:

  • Last-minute compression loses fidelity
  • Agent performance degrades as context fills with cruft
  • No opportunity for graceful degradation

Optimistic Compaction Approach

Run a background watcher that:

1. Monitors agent context in real time
2. Identifies low-value content early:
   • Redundant tool outputs (e.g., 5 similar grep results)
   • Superseded reasoning (explored path A, then chose path B)
   • Verbose error traces where only the conclusion matters
3. Compresses/summarizes incrementally before hitting limits
4. Maintains the invariant that the context always has N% headroom

Implementation Ideas

  • Sidecar process watching context via IPC/shared memory
  • Scoring function: recency × relevance × information density
  • Incremental summarization: old sections → summaries
  • Preserve "anchors": key decisions, user instructions, current goal
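The "anchors" idea above can be sketched as a simple partition over sections. This is a minimal illustration with stand-in types (`SectionKind` and `Section'` are hypothetical, not part of the actual Prompt IR):

```haskell
import Data.List (partition)

-- Hypothetical section tagging for illustration only.
data SectionKind = UserInstruction | KeyDecision | CurrentGoal | ToolOutput | Reasoning
  deriving (Eq, Show)

data Section' = Section'
  { skKind :: SectionKind
  , skText :: String
  } deriving Show

-- Anchors are never compacted: key decisions, user instructions, current goal.
isAnchor :: Section' -> Bool
isAnchor s = skKind s `elem` [UserInstruction, KeyDecision, CurrentGoal]

-- Split a context into (anchors, compaction candidates).
partitionContext :: [Section'] -> ([Section'], [Section'])
partitionContext = partition isAnchor
```

The watcher would only ever score and summarize the second list, so anchors survive every compaction pass by construction.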

Connection to Research

  • Related to t-399 (rate-distortion compression)
  • Could use lightweight local model for summarization
  • Informs optimal compression ratios empirically

Metrics

  • Context utilization over time
  • Information retention (can agent recall pruned info?)
  • Performance impact of early vs late compaction

Timeline (8)

🔄 [human] Open → InProgress · 1 month ago
💬 [human] · 1 month ago

Agent timed out. Partial design doc created: OPTIMISTIC_COMPACTION_DESIGN.md

💬 [human] · 1 month ago

Core context watcher module implemented with basic token analysis and compaction recommendations. Module compiles and tests pass. Ready for integration with agent architecture.

🔄 [human] InProgress → Done · 1 month ago
💬 [human] · 4 weeks ago

Connection to Prompt IR (from t-477 design session)

The Prompt IR design directly supports optimistic compaction via:

Reserve ratio for headroom:

data TokenBudget = TokenBudget
  { tbTotal :: Int
  , tbReserveRatio :: Float  -- Keep N% headroom (e.g., 0.15 = 15%)
  , tbAllocation :: BudgetAllocation
  }
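Given a `TokenBudget`, the usable context size falls out directly. A small sketch (types reproduced from the ticket; `BudgetAllocation` stubbed, and `effectiveBudget` is an illustrative helper, not part of the design):

```haskell
-- Stub for the allocation detail the ticket elides.
data BudgetAllocation = BudgetAllocation deriving Show

data TokenBudget = TokenBudget
  { tbTotal :: Int
  , tbReserveRatio :: Float  -- Keep N% headroom (e.g., 0.15 = 15%)
  , tbAllocation :: BudgetAllocation
  } deriving Show

-- Largest context size that still preserves the reserve headroom.
effectiveBudget :: TokenBudget -> Int
effectiveBudget b = floor (fromIntegral (tbTotal b) * (1 - tbReserveRatio b))
```

For example, a 128k total with a 0.15 reserve ratio leaves roughly 108,800 usable tokens; the watcher treats that, not `tbTotal`, as the ceiling.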

Scoring fields on sections:

data Section = Section
  { ...
  , secPriority :: Priority        -- Critical > High > Medium > Low
  , secRelevance :: Maybe Float    -- 0.0-1.0, task-specific
  , secRecency :: Maybe UTCTime    -- When this info was current
  ...
  }

Compaction scoring function:

-- Score for compaction: lower = compact first
compactionScore :: Section -> UTCTime -> Float
compactionScore s now =
  priorityWeight (secPriority s)
  * fromMaybe 0.5 (secRelevance s)
  * recencyWeight (secRecency s)
  where
    recencyWeight Nothing  = 0.5
    recencyWeight (Just t) =
      let ageHours = realToFrac (diffUTCTime now t) / 3600 :: Float
      in exp (-ageHours / 24)  -- 1/e decay per ~24 hours
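`compactionScore` assumes a `priorityWeight` helper that is not defined in the ticket. One plausible definition, matching the `Critical > High > Medium > Low` ordering on `secPriority` (the concrete weights are assumptions):

```haskell
-- Ordering matches the ticket: Critical > High > Medium > Low.
data Priority = Low | Medium | High | Critical
  deriving (Eq, Ord, Show)

-- Assumed weights; any strictly increasing mapping into (0, 1] works.
priorityWeight :: Priority -> Float
priorityWeight Low      = 0.25
priorityWeight Medium   = 0.5
priorityWeight High     = 0.75
priorityWeight Critical = 1.0  -- Critical scores highest, so it compacts last
```

Because the score multiplies weights, a Critical section with stale recency can still outrank a fresh Low-priority one, which is the intended bias.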

How this enables optimistic compaction:

1. Watcher can score sections without understanding content (pure metadata)
2. Incremental summarization can target low-score sections
3. Invariant maintained: actualTokens <= tbTotal * (1 - tbReserveRatio)
4. Graceful degradation: compress Low/Medium priority first, preserve Critical
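Invariant 3 is cheap to state as a standalone predicate the watcher can check on every tick (an illustrative helper, not part of the design):

```haskell
-- True iff the context still respects the reserve headroom:
-- actualTokens <= total * (1 - reserveRatio)
withinHeadroom :: Int -> Int -> Float -> Bool
withinHeadroom actualTokens total reserveRatio =
  fromIntegral actualTokens <= fromIntegral total * (1 - reserveRatio)
```

The watcher loop below is just "while not `withinHeadroom`, compact the lowest-scoring section", which keeps the invariant monotone under appends between ticks.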

Watcher loop sketch:

{-# LANGUAGE BlockArguments, NumericUnderscores #-}

compactionWatcher :: TVar PromptIR -> TokenBudget -> IO ()
compactionWatcher irVar budget = forever do
  threadDelay 1_000_000  -- Check every second
  ir <- readTVarIO irVar
  let used     = fromIntegral (pmTotalTokens (pirMeta ir)) :: Float
      headroom = 1.0 - used / fromIntegral (tbTotal budget)
  when (headroom < tbReserveRatio budget) do
    -- Proactively compact before we hit the wall
    ir' <- compactLowestScoring ir
    atomically (writeTVar irVar ir')
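One caveat with the sketch: `readTVarIO` followed by a later `writeTVar` is a read-modify-write race, so sections the agent appends between the read and the write would be silently dropped. If the compaction step can be made pure, the check and the update can run in a single STM transaction. A hedged sketch under that assumption (`PromptIR'` and `compactPure` are hypothetical stand-ins):

```haskell
{-# LANGUAGE BlockArguments, NumericUnderscores #-}
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever, when)

-- Hypothetical minimal IR: a token count plus opaque sections.
data PromptIR' = PromptIR' { pirTokens :: Int, pirSections :: [String] }

-- Stand-in for the real summarizer: drop the oldest section.
compactPure :: PromptIR' -> PromptIR'
compactPure (PromptIR' n ss) = PromptIR' (max 0 (n - 100)) (drop 1 ss)

-- Check and compact atomically, so concurrent appends are never lost.
compactionWatcher' :: TVar PromptIR' -> Int -> Float -> IO ()
compactionWatcher' irVar total reserve = forever do
  threadDelay 1_000_000
  atomically do
    ir <- readTVar irVar
    let headroom = 1.0 - fromIntegral (pirTokens ir) / fromIntegral total
    when (headroom < reserve) do
      writeTVar irVar (compactPure ir)
```

The trade-off is that summarization with a local model is IO, so in practice the transaction would compare a version counter and retry rather than inline the model call.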