Integrate RLM (Recursive Language Models) for long-context workflow analysis

t-345 · WorkTask · Omni/Agent/Engine.hs
Created 1 month ago · Updated 4 weeks ago

Description


Overview

Integrate Recursive Language Models (RLMs) from the paper "Recursive Language Models" (https://arxiv.org/html/2512.24601v1) to enable detailed analysis and review of large workflow markdown files that exceed normal context windows.

Paper Summary

RLMs solve the long-context problem by treating large inputs as "external environment objects" and using Python code generation to recursively decompose and process them. Key findings:

  • Performance: Double-digit improvements on 4 benchmarks at 10M+ token scales
  • Efficiency: Outperforms both base models AND typical workarounds (summary agents, retrieval)
  • Cost: Maintains similar costs to standard approaches
  • Insight: Performance degradation depends on task complexity scaling (constant/linear/quadratic), not just prompt length
  • Capability: Handles complex multi-step reasoning that breaks even frontier models
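
A minimal Haskell sketch of the recursive idea (the paper drives decomposition with generated Python; everything here — llmCall, the 4-chars-per-token estimate, fixed-size chunking — is an illustrative stand-in, not the paper's method):

import Data.Text (Text)
import qualified Data.Text as Text

-- Hypothetical provider call; stands in for a real LLM request.
llmCall :: Text -> Text -> IO Text
llmCall _question _doc = undefined

-- Crude token estimate: roughly 4 characters per token.
estimateTokens :: Text -> Int
estimateTokens t = Text.length t `div` 4

-- If the input fits the budget, answer directly; otherwise analyze each
-- chunk and recurse over the concatenated partial answers (terminates
-- provided each partial answer is shorter than its chunk).
recursiveAnalyze :: Int -> Text -> Text -> IO Text
recursiveAnalyze budget question doc
  | estimateTokens doc <= budget = llmCall question doc
  | otherwise = do
      let chunks = Text.chunksOf (budget * 4) doc
      partials <- traverse (llmCall question) chunks
      recursiveAnalyze budget question (Text.intercalate "\n" partials)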

Target Use Case

Enable AI-powered workflow code review for large markdown files. Instead of just detecting automation opportunities, provide detailed line-by-line feedback on existing workflows:

  • Line-specific suggestions: "line 3: consider adding error handling for fetch_sales_data() failures"
  • Cross-workflow pattern analysis: "this looks similar to your quarterly report workflow - could you extract a shared template?"
  • Optimization opportunities: "you're fetching sales data twice across workflows X and Y - consider caching"
  • Best practice enforcement: "missing retry logic on external API calls"

Integration Point

Hook into Omni/Agent/Engine.hs around the Provider.chat call (line ~1419). Add prompt size estimation and route large prompts through RLM decomposition instead of direct LLM calls.
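
A sketch of what the hook could look like (rlmThreshold and the Provider.chat signature shown here are illustrative only; recursiveAnalyze and estimateTokens are the stand-ins from the sketch above):

rlmThreshold :: Int
rlmThreshold = 100000  -- illustrative cutoff, in tokens

-- Route small prompts straight to the provider; send oversized ones
-- through RLM-style decomposition instead of a direct LLM call.
chatOrDecompose :: Provider -> Text -> Text -> IO Text
chatOrDecompose provider instruction doc
  | estimateTokens doc <= rlmThreshold =
      Provider.chat provider (instruction <> "\n\n" <> doc)
  | otherwise = recursiveAnalyze rlmThreshold instruction doc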

Benefits

  • Handle arbitrarily large workflow libraries without context limits
  • Maintain reasoning quality across massive document corpora
  • Enable detailed, contextual feedback on specific workflow components
  • Support cross-file analysis for optimization suggestions

Priority

P3 - Eventually valuable but not blocking current development. The recursive decomposition approach could be transformative for workflow analysis at scale.

Timeline (7)

💬 [human] · 4 weeks ago

Hybrid Context Strategy for Free Monad Agent

Building on the RLM concept, apply adaptive context to the entire message history:

Core Idea

Instead of just relying on recency, extend memory search to all messages:

1. Recency window: last 10-20 actual messages (temporal locality)
2. Semantic window: query searchChatHistorySemantic with the current user message to pull relevant older messages

Fusion Approach

import qualified Data.Set as Set

getAdaptiveContext :: UserId -> ChatId -> Text -> Int -> IO [Message]
getAdaptiveContext uid chatId currentMessage maxTokens = do
  -- always include recent messages (temporal locality)
  recent <- getRecentMessages uid chatId 20

  -- semantic retrieval for older relevant context
  semanticHits <- searchChatHistorySemantic currentMessage 10

  -- filter out any semantic hits already in the recent window
  let recentIds = Set.fromList (map cmId recent)
      oldButRelevant = filter (not . (`Set.member` recentIds) . cheId . fst) semanticHits

  -- budget-aware merge: recent first, then fill with semantic hits
  pure (budgetedMerge recent oldButRelevant maxTokens)
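
budgetedMerge is left undefined above; one possible shape, assuming a hypothetical toMessage converter from chat-history entries and a per-message messageCost token estimate:

import Data.List (sortOn)
import Data.Ord (Down (..))

-- Recent messages are kept unconditionally; semantic hits are appended
-- best-score-first while the token budget allows. toMessage and
-- messageCost are assumed helpers, not existing API.
budgetedMerge :: [Message] -> [(ChatHistoryEntry, Double)] -> Int -> [Message]
budgetedMerge recent hits maxTokens = recent ++ go base ranked
  where
    base   = sum (map messageCost recent)
    ranked = map (toMessage . fst) (sortOn (Down . snd) hits)
    go _ [] = []
    go spent (m : ms)
      | spent + messageCost m <= maxTokens = m : go (spent + messageCost m) ms
      | otherwise                          = go spent ms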

Design Questions

1. Similarity threshold: discard low-scoring hits? (e.g., cosine < 0.7)
2. Recency penalty: score decay for distant messages? score * recencyDecay(age)
3. Topic coherence: multiple unrelated threads could pull confusing context
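
For question 2, one concrete decay shape is an exponential half-life (the 30-day half-life and all numbers are illustrative):

-- Score multiplier that halves every halfLifeDays; a 30-day half-life
-- means a 60-day-old hit keeps 25% of its raw similarity score.
recencyDecay :: Double -> Double -> Double
recencyDecay halfLifeDays ageDays = 0.5 ** (ageDays / halfLifeDays)

adjustedScore :: Double -> Double -> Double
adjustedScore cosine ageDays = cosine * recencyDecay 30 ageDays
-- e.g. adjustedScore 0.8 60 = 0.8 * 0.25 = 0.2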

Hybrid Strategy

  • Auto-inject semantic context baseline via getAdaptiveContext
  • Agent can dig deeper via existing searchChatHistorySemantic tool if needed
  • Best of both worlds: automatic context enrichment + explicit retrieval capability

Integration Point

In Engine.hs, modify context building before Provider.chat call to use getAdaptiveContext instead of just recent messages.

Files to Modify

  • Omni/Agent/Memory.hs - add getAdaptiveContext
  • Omni/Agent/Engine.hs - use new context builder
  • Omni/Agent/Types.hs - maybe add AdaptiveContextConfig type
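
A possible shape for that config type, pulling together the knobs mentioned in these comments (field names and defaults are illustrative):

data AdaptiveContextConfig = AdaptiveContextConfig
  { accRecentCount   :: Int     -- temporal window size, in messages
  , accSemanticCount :: Int     -- max semantic hits to consider
  , accMinSimilarity :: Double  -- drop hits below this cosine score
  , accHalfLifeDays  :: Double  -- recency-decay half-life
  , accMaxTokens     :: Int     -- overall context budget
  }

defaultAdaptiveContextConfig :: AdaptiveContextConfig
defaultAdaptiveContextConfig = AdaptiveContextConfig 20 10 0.7 30 8000
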
💬 [human] · 4 weeks ago

Three-window context model:

1. *Temporal window* (passive, auto-injected): recent N messages, captures conversation continuity and 'what we're talking about right now'

2. *Semantic window* (passive, auto-injected): vector similarity search across all history, captures 'what have we discussed before that's relevant'

3. *Agentic window* (active, tool-driven): agent can explicitly query for more context when passive windows aren't enough, using existing searchChatHistorySemantic tool

The first two are automatic context enrichment; the third gives the agent control when it needs to dig deeper. Each can be independently tuned (window size, similarity threshold, recency decay).

💬 [human] · 4 weeks ago

Context Window Matrix

|         | Semantic | Temporal |
|---------|----------|----------|
| Active  | A        | B        |
| Passive | C        | D        |

A (active-semantic): Agent queries for related context via search_chat_history tool — agent decides WHAT to search for. ✅ exists

B (active-temporal): Agent requests specific past messages by time range ("show me messages from last Tuesday", "get the 5 messages before X"). Would need get_messages_by_time(start, end) or similar — see the sketch below. ❌ to implement

C (passive-semantic): Auto-injected similar messages via embedding search — system retrieves relevant older context based on current message. ❌ to implement (main hybrid addition)

D (passive-temporal): Auto-injected recent N messages (sliding window). ✅ exists

Goal: implement B and C to complete all four quadrants of contextual retrieval.
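
C is the getAdaptiveContext proposal above. For B, one sketch of the missing tool (getAllMessages and msgCreatedAt are assumed accessors; a real version would push the time filter into the store rather than scan in memory):

import Data.Time (UTCTime)

-- Quadrant B: explicit temporal retrieval by time range.
getMessagesByTime :: UserId -> ChatId -> UTCTime -> UTCTime -> IO [Message]
getMessagesByTime uid chatId start end =
  filter inRange <$> getAllMessages uid chatId
  where
    inRange m = msgCreatedAt m >= start && msgCreatedAt m <= end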

🔄 [human] Open → Done · 4 weeks ago