Migrate Memory System to Backend-Agnostic Architecture

t-477 · WorkTask

Created 4 weeks ago · Updated 4 weeks ago

Description

Migrate Memory System from Telegram/Ava to Core Agent Library

Overview

The current memory system (Omni/Agent/Memory.hs) is tightly coupled to the Telegram/Ava use case. This task refactors it into a backend-agnostic memory abstraction that lives in the core agent library, allowing any agent (Telegram, CLI, web, etc.) to plug in its preferred storage backend.

Current State

  • Omni/Agent/Memory.hs - 2000+ lines, sqlite-vss based, provides:
      • User management (UserId, TelegramId)
      • Memory CRUD (storeMemory, recallMemories, forgetMemory)
      • Knowledge graph (linkMemories, queryGraph)
      • Conversation history (saveMessage, getRecentMessages)
      • Semantic search via embeddings (embedText, searchChatHistorySemantic)
      • Agent tools (rememberTool, recallTool)
  • Tightly coupled to:
      • SQLite/sqlite-vss
      • Ollama for embeddings
      • Telegram-specific ID types

Target Architecture

1. Abstract Memory Backend (new: Omni/Agent/Memory/Backend.hs)

-- Backend interface - swappable storage layer
data MemoryBackend m = MemoryBackend
  { mbStore    :: MemoryEntry -> m ()
  , mbSearch   :: Text -> Int -> m [MemoryEntry]  -- semantic search
  , mbRecent   :: Int -> m [MemoryEntry]          -- temporal (most recent N)
  , mbByTag    :: [Tag] -> m [MemoryEntry]
  , mbPrune    :: m ()                            -- cleanup/consolidation
  , mbDelete   :: MemoryId -> m ()
  }

-- Core memory entry type (backend-agnostic)
data MemoryEntry = MemoryEntry
  { meId        :: MemoryId
  , meContent   :: Text
  , meContext   :: Text           -- how/why learned
  , meTags      :: [Tag]
  , meCreatedAt :: UTCTime
  , meAccessedAt :: UTCTime
  , meAccessCount :: Int
  }
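
The interpreter (section 5 below) needs a way to mint fresh entries. A minimal sketch of such a constructor, assuming MemoryId wraps a UUID; mkEntry and that newtype are assumptions, not existing code:

import Data.Time (getCurrentTime)
import qualified Data.UUID.V4 as UUID

mkEntry :: Text -> Text -> [Tag] -> IO MemoryEntry
mkEntry content ctx tags = do
  now <- getCurrentTime
  uuid <- UUID.nextRandom  -- random v4 id; assumes MemoryId is a UUID newtype
  pure MemoryEntry
    { meId = MemoryId uuid
    , meContent = content
    , meContext = ctx
    , meTags = tags
    , meCreatedAt = now
    , meAccessedAt = now
    , meAccessCount = 0
    }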

2. Embedding Provider Abstraction (new: Omni/Agent/Memory/Embedding.hs)

-- Embedding provider interface
data EmbeddingProvider m = EmbeddingProvider
  { epEmbed :: Text -> m (Vector Float)
  , epBatchEmbed :: [Text] -> m [Vector Float]
  , epDimensions :: Int
  }

-- Implementations
ollamaEmbedding :: OllamaConfig -> EmbeddingProvider IO
openaiEmbedding :: OpenAIConfig -> EmbeddingProvider IO
noopEmbedding :: EmbeddingProvider IO  -- for testing, returns zeros
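
A minimal sketch of the no-op provider, assuming Data.Vector and an arbitrary fixed dimension (384 is a placeholder):

import qualified Data.Vector as V

noopEmbedding :: EmbeddingProvider IO
noopEmbedding = EmbeddingProvider
  { epEmbed = \_ -> pure zeros
  , epBatchEmbed = \texts -> pure (zeros <$ texts)  -- one zero vector per input
  , epDimensions = dims
  }
  where
    dims = 384  -- placeholder; match the real provider's dimensionality
    zeros = V.replicate dims 0.0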

3. Concrete Backends

-- In-memory (good for testing/dev)
inMemoryBackend :: IORef [MemoryEntry] -> EmbeddingProvider IO -> MemoryBackend IO

-- SQLite with sqlite-vss (current implementation, refactored)
sqliteBackend :: Connection -> EmbeddingProvider IO -> MemoryBackend IO

-- Future backends (not in scope for this task, but interface should support):
-- postgresBackend, qdrantBackend, neo4jBackend
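
A hedged sketch of the in-memory backend. Search here just truncates the list; a real implementation would embed the query via the provider and rank by similarity:

import Data.IORef (IORef, modifyIORef', readIORef)
import Data.List (sortOn)
import Data.Ord (Down (..))

inMemoryBackend :: IORef [MemoryEntry] -> EmbeddingProvider IO -> MemoryBackend IO
inMemoryBackend ref _ep = MemoryBackend
  { mbStore  = \e -> modifyIORef' ref (e :)
  , mbSearch = \_query n -> take n <$> readIORef ref
  , mbRecent = \n -> take n . sortOn (Down . meCreatedAt) <$> readIORef ref
  , mbByTag  = \tags -> filter (\e -> any (`elem` meTags e) tags) <$> readIORef ref
  , mbPrune  = pure ()  -- nothing to consolidate in memory
  , mbDelete = \mid -> modifyIORef' ref (filter ((/= mid) . meId))  -- assumes Eq MemoryId
  }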

4. Agent Integration (update: Omni/Agent/Op.hs or new Omni/Agent/Op/Memory.hs)

Add memory operations to the free monad DSL:

data AgentOp next where
  -- ... existing ops ...
  Remember :: Text -> Text -> [Tag] -> (MemoryId -> next) -> AgentOp next
  Recall   :: Text -> Int -> ([MemoryEntry] -> next) -> AgentOp next
  Forget   :: MemoryId -> (() -> next) -> AgentOp next
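
Matching smart constructors would be one-liners; a sketch assuming a hypothetical liftOp :: AgentOp a -> Op s a that embeds an instruction into the DSL:

remember :: Text -> Text -> [Tag] -> Op s MemoryId
remember content ctx tags = liftOp (Remember content ctx tags id)

recall :: Text -> Int -> Op s [MemoryEntry]
recall query n = liftOp (Recall query n id)

forget :: MemoryId -> Op s ()
forget mid = liftOp (Forget mid id)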

5. Interpreter Updates (update: Omni/Agent/Interpreter/Sequential.hs)

The interpreter needs a MemoryBackend in its config:

data InterpreterConfig = InterpreterConfig
  { icProvider :: Provider
  , icMemory   :: MemoryBackend IO  -- NEW
  , icTools    :: [Tool]
  , ...
  }

interpretOp :: InterpreterConfig -> AgentOp a -> IO a
interpretOp cfg (Remember content ctx tags k) = do
  -- mbStore returns m (), so mint the id up front (see the mkEntry sketch above)
  entry <- mkEntry content ctx tags
  mbStore (icMemory cfg) entry
  pure (k (meId entry))
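
The remaining memory ops follow the same pattern; a sketch against the backend fields defined above:

interpretOp cfg (Recall query n k) = do
  entries <- mbSearch (icMemory cfg) query n
  pure (k entries)
interpretOp cfg (Forget mid k) = do
  mbDelete (icMemory cfg) mid
  pure (k ())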

6. Ava Wiring (update: Omni/Agent/Telegram.hs)

Ava plugs in the sqlite backend:

runAvaAgent :: TelegramConfig -> IO ()
runAvaAgent cfg = do
  conn <- initMemoryDb
  let backend = sqliteBackend conn (ollamaEmbedding defaultOllamaConfig)
  let interpreterCfg = defaultInterpreterConfig { icMemory = backend }
  -- ... run agent with this config

Implementation Steps

1. Define core types - Create Omni/Agent/Memory/Types.hs with backend-agnostic types
2. Define backend interface - Create Omni/Agent/Memory/Backend.hs with the typeclass/record
3. Extract embedding abstraction - Create Omni/Agent/Memory/Embedding.hs
4. Implement sqlite backend - Refactor current Memory.hs into Omni/Agent/Memory/Sqlite.hs
5. Implement in-memory backend - Create Omni/Agent/Memory/InMemory.hs for testing
6. Add Op instructions - Extend Omni/Agent/Op.hs with memory operations
7. Update interpreter - Handle new ops in Omni/Agent/Interpreter/Sequential.hs
8. Update Ava - Wire up sqlite backend in Omni/Agent/Telegram.hs
9. Add tests - Unit tests for backends, integration test for full flow

Context Window Integration (Future Enhancement)

This task focuses on the storage/retrieval abstraction. A follow-up task should address:

  • Adaptive context window - Auto-prepend relevant memories to agent context
  • Hybrid retrieval - Combine temporal (recency decay) + semantic (similarity) scoring
  • Active vs passive - Agent can actively query OR system auto-injects context

Design sketch for context window (NOT in scope for this task):

data ContextWindow = ContextWindow
  { cwMaxTokens :: Int
  , cwRecencyWeight :: Float      -- 0.0-1.0
  , cwSemanticWeight :: Float     -- 0.0-1.0
  , cwSimilarityThreshold :: Float
  }

buildContext :: MemoryBackend m -> ContextWindow -> Text -> m [MemoryEntry]
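
For illustration only, the two weights might combine like this (the decay constant is an assumption):

import Data.Time (UTCTime, diffUTCTime)

-- Higher score = more likely to enter the context window.
hybridScore :: ContextWindow -> UTCTime -> Float -> MemoryEntry -> Float
hybridScore cw now similarity entry =
  cwRecencyWeight cw * recency + cwSemanticWeight cw * similarity
  where
    ageHours = realToFrac (diffUTCTime now (meAccessedAt entry)) / 3600
    recency  = exp (negate ageHours / 168)  -- one-week e-folding time, assumed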

Success Criteria

  • [ ] Memory operations work through the free monad DSL
  • [ ] SQLite backend passes all existing Memory.hs tests
  • [ ] In-memory backend works for unit tests
  • [ ] Ava continues to work with sqlite backend
  • [ ] No changes to user-facing behavior
  • [ ] Clean separation: core types → backend interface → concrete impls → agent integration

Files to Create/Modify

New:

  • Omni/Agent/Memory/Types.hs
  • Omni/Agent/Memory/Backend.hs
  • Omni/Agent/Memory/Embedding.hs
  • Omni/Agent/Memory/Sqlite.hs
  • Omni/Agent/Memory/InMemory.hs

Modify:

  • Omni/Agent/Op.hs - add memory ops
  • Omni/Agent/Interpreter/Sequential.hs - handle memory ops
  • Omni/Agent/Telegram.hs - wire up backend
  • Omni/Agent/Memory.hs - deprecate or re-export from new modules

Timeline (14)

💬 [human] · 4 weeks ago

Design Decisions (2026-01-24)

Core Concept

Context window = dynamically constructed prior for each inference, not an append-only log. Like a Bayesian prior, it is optimized for the current task, not necessarily chronological.

Decisions Made

1. Granularity: Per-inference (every LLM call gets freshly constructed context)

2. Initial Sources:

  • Conversation history (temporal)
  • Long-term knowledge (semantic/facts)
  • Keep extensible for future sources (external via tools)

3. Architecture: Option B - Inside Op layer

  • Infer takes ContextRequest instead of Prompt
  • Agent specifies intent, interpreter hydrates into concrete prompt
  • Explicit and traceable

4. Style: Prefer data over functions/callbacks

5. Reference: Ava's current implementation is the target behavior to support

Open: Prompt IR Design

Need structured intermediate representation:

  • ContextRequest (agent intent) → PromptIR (hydrated, labeled) → Prompt (flat API format)
  • Must be traceable, budget-aware, labeled by source
  • Discussing in detail next
💬 [human] · 4 weeks ago

Prompt IR Design (2026-01-24)

Core Concept

Context window = dynamically constructed Bayesian prior for each inference. The IR represents this prior as structured, labeled, optimization-aware data.

Design Principles

1. Data, not strings - Sections are structured with metadata
2. Optimization-aware - Supports compression, composition, analysis
3. Traceable - Every section has provenance
4. Composable - Explicit composition semantics (Bayesian)

The Flow

ContextRequest          -- Agent specifies intent
       ↓ (hydrate)
   PromptIR             -- Structured, labeled, optimization-ready
       ↓ (compile)
    Prompt              -- Flat messages for API
       ↓ (LLM call)
   Response

Core Types

-- Section with full optimization metadata
data Section = Section
  { secId :: Text                      -- Unique ID for tracing
  , secLabel :: Text                   -- Display label
  , secSource :: SectionSource         -- Provenance
  , secContent :: Text                 -- The actual content
  
  -- Token/Rate metrics
  , secTokens :: Int                   -- Current token count
  , secMinTokens :: Maybe Int          -- Minimum viable (for compression)
  
  -- Relevance/Priority metrics  
  , secPriority :: Priority            -- For budget trimming
  , secRelevance :: Maybe Float        -- 0.0-1.0, task-specific relevance
  , secRecency :: Maybe UTCTime        -- When this info was current
  
  -- For Bayesian composition
  , secCompositionMode :: CompositionMode
  
  -- For analysis/caching
  , secEmbedding :: Maybe (Vector Float)  -- Precomputed embedding
  , secHash :: Maybe Text                 -- Content hash for dedup/caching
  }

-- How this section composes with others (Bayesian semantics)
data CompositionMode
  = Hierarchical    -- Hyperprior (system prompt, base instructions)
  | Constraint      -- Product-of-experts (must satisfy)
  | Additive        -- Mixture (adds info, can be dropped)
  | Contextual      -- Bayesian update (observation shifting posterior)

-- Section provenance
data SectionSource
  = SourceStatic Text      -- Static (file name or "code")
  | SourceTemporal         -- Recent conversation
  | SourceSemantic Float   -- Semantic search (with relevance score)
  | SourceKnowledge        -- Long-term memory/facts
  | SourceState Text       -- Runtime state ("project", "time")
  | SourceConditional Text -- Conditional on auth/config

data Priority = Critical | High | Medium | Low

-- Tool definition (part of IR, affects behavior significantly)
data ToolDef = ToolDef
  { tdName :: Text
  , tdDescription :: Text
  , tdSchema :: Value
  , tdPriority :: Priority
  , tdEmbedding :: Maybe (Vector Float)
  }

-- The full IR
data PromptIR = PromptIR
  { pirSections :: [Section]
  , pirTools :: [ToolDef]
  , pirObservation :: Text
  , pirMeta :: PromptMeta
  }

-- Metadata for tracing and optimization
data PromptMeta = PromptMeta
  { pmTotalTokens :: Int
  , pmBudget :: TokenBudget
  , pmStrategy :: ContextStrategy
  , pmTimestamp :: UTCTime
  , pmCompressionRatio :: Maybe Float
  , pmEstimatedEntropy :: Maybe Float  -- For risk detection (t-455)
  , pmCacheHit :: Bool
  }

-- Budget with principled allocation
data TokenBudget = TokenBudget
  { tbTotal :: Int
  , tbReserveRatio :: Float  -- Keep N% headroom (t-432)
  , tbAllocation :: BudgetAllocation
  }

data BudgetAllocation
  = FixedRatios { baSystem, baContext, baObservation :: Float }
  | InformationWeighted  -- Allocate by information content
  | RelevanceWeighted    -- Allocate by task relevance

Context Request (Agent Intent)

data ContextRequest = ContextRequest
  { crObservation :: Text           -- Current user input
  , crGoal :: Maybe Text            -- What agent is trying to do
  , crStrategy :: ContextStrategy   -- How to hydrate
  , crBudget :: TokenBudget
  }

data ContextStrategy = ContextStrategy
  { csTemporalWindow :: Int         -- Recent N messages
  , csSemanticLimit :: Int          -- Max semantic results
  , csSemanticThreshold :: Float    -- Min similarity (0.0-1.0)
  , csRecencyDecay :: Float         -- Decay factor (e.g., 0.995)
  , csIncludeKnowledge :: Bool      -- Include long-term facts
  }
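
For concreteness, a strategy instance with placeholder values (all numbers assumed, not tuned):

exampleStrategy :: ContextStrategy
exampleStrategy = ContextStrategy
  { csTemporalWindow = 20       -- last 20 messages
  , csSemanticLimit = 5         -- at most 5 semantic hits
  , csSemanticThreshold = 0.75  -- drop weak matches
  , csRecencyDecay = 0.995      -- per-step decay factor
  , csIncludeKnowledge = True
  }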

Optimization Operations

Composition (t-398):

compose :: PromptIR -> PromptIR -> PromptIR
-- Hierarchical sections from first IR are hyperpriors
-- Constraint sections are AND'd (product of experts)
-- Additive sections are collected (mixture)
-- Contextual sections are sequenced (Bayesian updates)
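
A hedged sketch of compose under these semantics (tool dedup by name and an Eq instance for CompositionMode are assumptions; pirMeta handling is omitted for brevity):

import Data.Function (on)
import Data.List (nubBy)

compose :: PromptIR -> PromptIR -> PromptIR
compose a b = a
  { pirSections = byMode (pirSections a ++ pirSections b)
  , pirTools = nubBy ((==) `on` tdName) (pirTools a ++ pirTools b)
  , pirObservation = pirObservation b  -- the later observation wins
  }
  where
    -- Hyperpriors first, then constraints, mixtures, and updates, in order.
    byMode ss = concatMap (\m -> filter ((m ==) . secCompositionMode) ss)
                  [Hierarchical, Constraint, Additive, Contextual]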

Compression (t-399):

compress :: TokenBudget -> PromptIR -> IO PromptIR
-- Rank sections by (priority × relevance × recency)
-- Drop/summarize lowest-value until under budget
-- Track compression ratio in metadata
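
A sketch of the value function that ranking step might use (weights are assumptions):

import Data.Maybe (fromMaybe)
import Data.Time (UTCTime, diffUTCTime)

sectionValue :: UTCTime -> Section -> Float
sectionValue now s =
  priorityWeight (secPriority s)
    * fromMaybe 1.0 (secRelevance s)
    * maybe 1.0 recencyFactor (secRecency s)
  where
    priorityWeight Critical = 1e6  -- effectively never dropped
    priorityWeight High     = 10
    priorityWeight Medium   = 3
    priorityWeight Low      = 1
    recencyFactor t = exp (negate (realToFrac (diffUTCTime now t) / 86400))  -- one-day scale, assumed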

Analysis (t-397):

estimateImpact :: Section -> IO Float
-- Use embedding magnitude as proxy for information content

equivalent :: Float -> PromptIR -> PromptIR -> IO Bool
-- Check behavioral equivalence via embedding similarity
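
A sketch of equivalent via cosine similarity, assuming a hypothetical embedIR :: PromptIR -> IO (Vector Float) that embeds the compiled text of the IR:

import qualified Data.Vector as V

equivalent :: Float -> PromptIR -> PromptIR -> IO Bool
equivalent threshold a b = do
  va <- embedIR a
  vb <- embedIR b
  pure (cosine va vb >= threshold)
  where
    cosine x y = V.sum (V.zipWith (*) x y) / (norm x * norm y)
    norm v = sqrt (V.sum (V.map (** 2) v))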

Optimization Hook Summary

| Research Task | IR Feature |
|---------------|------------|
| t-398 (Composition) | CompositionMode, compose |
| t-399 (Compression) | secMinTokens, compress |
| t-397 (Analysis) | secEmbedding, estimateImpact |
| t-432 (Compaction) | secRelevance, secRecency, tbReserveRatio |
| t-455 (Best-of-N) | pmEstimatedEntropy |

Example: Ava's Current Prompt as IR

avaPromptIR = PromptIR
  { pirSections = 
      [ Section "base" "## Core Instructions" (SourceStatic "telegram-system.md") 
          basePrompt 500 Nothing Critical Nothing Nothing Hierarchical Nothing Nothing
      , Section "time" "## Current Date and Time" (SourceState "clock") 
          "Saturday, Jan 25..." 20 Nothing High Nothing (Just now) Contextual Nothing Nothing
      , Section "project" "## Current Project" (SourceState "project") 
          "Working dir: /home/ben/omni/live..." 100 Nothing High Nothing Nothing Constraint Nothing Nothing
      , Section "user" "## Current User" (SourceState "user") 
          "You are talking to: Ben" 15 Nothing High Nothing Nothing Contextual Nothing Nothing
      , Section "memories" "## What you know about this user" SourceKnowledge 
          "Ben prefers..." 200 (Just 50) Medium (Just 0.8) Nothing Additive Nothing Nothing
      , Section "recent" "## Recent conversation" SourceTemporal 
          "[10:30] User: ..." 800 (Just 200) High Nothing (Just now) Contextual Nothing Nothing
      , Section "semantic" "## Related past messages" (SourceSemantic 0.82) 
          "[Jan 15] ..." 300 Nothing Medium (Just 0.82) (Just oldTime) Additive Nothing Nothing
      ]
  , pirTools = [skillTool, readFileTool, runBashTool, rememberTool, recallTool, ...]
  , pirObservation = "What were we discussing about the agent architecture?"
  , pirMeta = PromptMeta 1965 defaultBudget defaultStrategy now Nothing Nothing False
  }
🔄 [human] Open → InProgress · 4 weeks ago
💬 [human] · 4 weeks ago

Implementation Progress (2026-01-24)

Completed Modules

| Commit | File | Lines | Purpose |
|--------|------|-------|---------|
| 21f111e | Omni/Agent/Prompt/IR.hs | 545 | Core IR types with optimization metadata |
| 69ad87d | Omni/Agent/Prompt/Hydrate.hs | 533 | ContextRequest → PromptIR |
| 32142b6 | Omni/Agent/Prompt/Compile.hs | 436 | PromptIR → CompiledPrompt |
| d472b64 | Omni/Agent/Op.hs | +30 | Infer now takes ContextRequest, added InferRaw |
| d472b64 | Sequential.hs | +178 | Interpreter handles hydration/compilation |

The Pipeline

infer(model, ContextRequest)
        ↓
    [hydrate]
        ↓
    PromptIR (sections + tools + metadata)
        ↓
    [compile]
        ↓
    CompiledPrompt (messages + tools)
        ↓
    [LLM call]
        ↓
    Response

API Summary

For new agents (dynamic context):

infer :: Model -> ContextRequest -> Op s Response

For legacy/raw prompts:

inferRaw :: Model -> Prompt -> Op s Response
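
Side by side, the two call styles might look like this (mkPrompt, defaultModel, exampleStrategy, and defaultBudget are placeholders, not confirmed names):

-- Legacy path: caller assembles the prompt by hand.
legacyStep :: Text -> Op s Response
legacyStep userMsg = inferRaw defaultModel (mkPrompt "You are Ava." userMsg)

-- New path: caller states intent; the interpreter hydrates and compiles.
contextStep :: Text -> Op s Response
contextStep userMsg = infer defaultModel ContextRequest
  { crObservation = userMsg
  , crGoal = Just "answer the user"
  , crStrategy = exampleStrategy
  , crBudget = defaultBudget
  }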

Remaining Work

  • [ ] Ava integration: Wire up HydrationConfig in Telegram.hs
  • [ ] Create context sources that connect to Memory.hs
  • [ ] Test end-to-end with Ava
💬 [human] · 4 weeks ago

Phase 4 Complete: Op Integration (commit 3dbf484)

Summary

The Prompt IR pipeline is now fully integrated into the interpreter:

[Agent Program]
     │
     ├── infer(model, ContextRequest)  ──→ hydrate → compile → LLM
     │
     └── inferRaw(model, Prompt)       ──→ (legacy path) → LLM

Changes

Op.hs:

  • Infer :: Model -> ContextRequest -> ... (new API)
  • InferRaw :: Model -> Prompt -> ... (legacy API)
  • Exports ContextRequest, ContextStrategy, TokenBudget

Sequential.hs:

  • seqHydrationConfig :: Maybe HydrationConfig in SeqConfig
  • Infer case: requires hydration config, hydrates + compiles
  • InferRaw case: works like old Infer (direct prompt → LLM)
  • Added handleBudgetInferResult and compiledToMessages helpers

Programs/*.hs:

  • All migrated from infer → inferRaw (no semantic change)

Current State

  • ✅ IR, Hydrate, Compile modules
  • ✅ Op integration (Infer/InferRaw)
  • ✅ Interpreter support
  • ✅ All programs use inferRaw (legacy path)
  • ✅ Ava builds with new system

Remaining for full context-aware Ava

To use the new context pipeline, Telegram.hs needs:

1. A HydrationConfig with Memory-backed context sources
2. Call sites using infer + ContextRequest

The infrastructure is ready; this is now an integration task.

💬 [human] · 4 weeks ago

Implementation Complete (2026-01-24)

All Modules

| Commit | File | Lines | Purpose |
|--------|------|-------|---------|
| 21f111e | Prompt/IR.hs | 545 | Core IR types (Section, ToolDef, CompositionMode, ContextRequest) |
| 69ad87d | Prompt/Hydrate.hs | 533 | ContextRequest → PromptIR via context sources |
| 32142b6 | Prompt/Compile.hs | 436 | PromptIR → CompiledPrompt with budget enforcement |
| d472b64 | Op.hs | +30 | Infer takes ContextRequest, added InferRaw |
| d472b64 | Sequential.hs | +178 | Interpreter handles hydration/compilation |
| 3dbf484 | Programs/*.hs | - | Migrated to inferRaw |
| d844e27 | Prompt/MemorySources.hs | 141 | Context sources backed by Memory.hs |
| eca7a0d | Prompt/MemorySources.hs | +112 | buildHydrationConfig helper |

Full Pipeline

ContextRequest (intent)
       ↓
   [hydrate] ← MemorySources (temporal, semantic, knowledge)
       ↓
   PromptIR (labeled sections + tools + metadata)
       ↓
   [compile] ← budget enforcement, priority ordering
       ↓
   CompiledPrompt (flat messages)
       ↓
   LLM call

Integration Ready

To enable in Telegram.hs:

import Omni.Agent.Prompt.MemorySources as MS

-- Build hydration config
let hydrationCfg = MS.buildHydrationConfig
      systemPrompt
      tools
      [MS.mkProjectSection proj dir, MS.mkTimeSection now tz]
      userId
      chatId

-- Add to SeqConfig
let seqConfig = (Seq.defaultSeqConfig provider seqTools)
      { Seq.seqHydrationConfig = Just hydrationCfg }

Remaining for Full Migration

The infrastructure is complete. Full migration requires:

1. Modify OpAgent.runAgent to use infer instead of inferRaw, OR
2. Create a new IR-native agent program

The current system works (using inferRaw + manual context), and the new IR system is ready for incremental adoption.

🔄 [human] InProgress → Done · 4 weeks ago
💬 [human] · 4 weeks ago

Follow-up task created: t-480 (Integrate Prompt IR into Ava with observability). This covers wiring up the HydrationConfig in Telegram.hs and adding tracing to validate the system in real usage.