Migrate Memory System to Backend-Agnostic Architecture

t-477 · WorkTask

Created 4 weeks ago · Updated 4 weeks ago

Description

Migrate Memory System from Telegram/Ava to Core Agent Library

Overview

The current memory system (Omni/Agent/Memory.hs) is tightly coupled to the Telegram/Ava use case. This task refactors it into a backend-agnostic memory abstraction that lives in the core agent library, allowing any agent (Telegram, CLI, web, etc.) to plug in its preferred storage backend.

Current State

  • Omni/Agent/Memory.hs - 2000+ lines, sqlite-vss based, provides:
      • User management (UserId, TelegramId)
      • Memory CRUD (storeMemory, recallMemories, forgetMemory)
      • Knowledge graph (linkMemories, queryGraph)
      • Conversation history (saveMessage, getRecentMessages)
      • Semantic search via embeddings (embedText, searchChatHistorySemantic)
      • Agent tools (rememberTool, recallTool)
  • Tightly coupled to:
      • SQLite/sqlite-vss
      • Ollama for embeddings
      • Telegram-specific ID types

Target Architecture

1. Abstract Memory Backend (new: Omni/Agent/Memory/Backend.hs)

-- Backend interface - swappable storage layer
data MemoryBackend m = MemoryBackend
  { mbStore    :: MemoryEntry -> m ()
  , mbSearch   :: Text -> Int -> m [MemoryEntry]  -- semantic search
  , mbRecent   :: Int -> m [MemoryEntry]          -- temporal (most recent N)
  , mbByTag    :: [Tag] -> m [MemoryEntry]
  , mbPrune    :: m ()                            -- cleanup/consolidation
  , mbDelete   :: MemoryId -> m ()
  }

-- Core memory entry type (backend-agnostic)
data MemoryEntry = MemoryEntry
  { meId        :: MemoryId
  , meContent   :: Text
  , meContext   :: Text           -- how/why learned
  , meTags      :: [Tag]
  , meCreatedAt :: UTCTime
  , meAccessedAt :: UTCTime
  , meAccessCount :: Int
  }
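
The interpreter (section 5 below) needs a way to mint fresh entries. A minimal sketch of such a constructor, assuming MemoryId wraps a UUID; mkEntry and that newtype are assumptions, not existing code:

import Data.Time (getCurrentTime)
import qualified Data.UUID.V4 as UUID

mkEntry :: Text -> Text -> [Tag] -> IO MemoryEntry
mkEntry content ctx tags = do
  now <- getCurrentTime
  uuid <- UUID.nextRandom  -- random v4 id; assumes MemoryId is a UUID newtype
  pure MemoryEntry
    { meId = MemoryId uuid
    , meContent = content
    , meContext = ctx
    , meTags = tags
    , meCreatedAt = now
    , meAccessedAt = now
    , meAccessCount = 0
    }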

2. Embedding Provider Abstraction (new: Omni/Agent/Memory/Embedding.hs)

-- Embedding provider interface
data EmbeddingProvider m = EmbeddingProvider
  { epEmbed :: Text -> m (Vector Float)
  , epBatchEmbed :: [Text] -> m [Vector Float]
  , epDimensions :: Int
  }

-- Implementations
ollamaEmbedding :: OllamaConfig -> EmbeddingProvider IO
openaiEmbedding :: OpenAIConfig -> EmbeddingProvider IO
noopEmbedding :: EmbeddingProvider IO  -- for testing, returns zeros
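
A minimal sketch of the no-op provider, assuming Data.Vector and an arbitrary fixed dimension (384 is a placeholder):

import qualified Data.Vector as V

noopEmbedding :: EmbeddingProvider IO
noopEmbedding = EmbeddingProvider
  { epEmbed = \_ -> pure zeros
  , epBatchEmbed = \texts -> pure (zeros <$ texts)  -- one zero vector per input
  , epDimensions = dims
  }
  where
    dims = 384  -- placeholder; match the real provider's dimensionality
    zeros = V.replicate dims 0.0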

3. Concrete Backends

-- In-memory (good for testing/dev)
inMemoryBackend :: IORef [MemoryEntry] -> EmbeddingProvider IO -> MemoryBackend IO

-- SQLite with sqlite-vss (current implementation, refactored)
sqliteBackend :: Connection -> EmbeddingProvider IO -> MemoryBackend IO

-- Future backends (not in scope for this task, but interface should support):
-- postgresBackend, qdrantBackend, neo4jBackend
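
A hedged sketch of the in-memory backend. Search here just truncates the list; a real implementation would embed the query via the provider and rank by similarity:

import Data.IORef (IORef, modifyIORef', readIORef)
import Data.List (sortOn)
import Data.Ord (Down (..))

inMemoryBackend :: IORef [MemoryEntry] -> EmbeddingProvider IO -> MemoryBackend IO
inMemoryBackend ref _ep = MemoryBackend
  { mbStore  = \e -> modifyIORef' ref (e :)
  , mbSearch = \_query n -> take n <$> readIORef ref
  , mbRecent = \n -> take n . sortOn (Down . meCreatedAt) <$> readIORef ref
  , mbByTag  = \tags -> filter (\e -> any (`elem` meTags e) tags) <$> readIORef ref
  , mbPrune  = pure ()  -- nothing to consolidate in memory
  , mbDelete = \mid -> modifyIORef' ref (filter ((/= mid) . meId))  -- assumes Eq MemoryId
  }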

4. Agent Integration (update: Omni/Agent/Op.hs or new Omni/Agent/Op/Memory.hs)

Add memory operations to the free monad DSL:

data AgentOp next where
  -- ... existing ops ...
  Remember :: Text -> Text -> [Tag] -> (MemoryId -> next) -> AgentOp next
  Recall   :: Text -> Int -> ([MemoryEntry] -> next) -> AgentOp next
  Forget   :: MemoryId -> (() -> next) -> AgentOp next
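
Matching smart constructors would be one-liners; a sketch assuming a hypothetical liftOp :: AgentOp a -> Op s a that embeds an instruction into the DSL:

remember :: Text -> Text -> [Tag] -> Op s MemoryId
remember content ctx tags = liftOp (Remember content ctx tags id)

recall :: Text -> Int -> Op s [MemoryEntry]
recall query n = liftOp (Recall query n id)

forget :: MemoryId -> Op s ()
forget mid = liftOp (Forget mid id)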

5. Interpreter Updates (update: Omni/Agent/Interpreter/Sequential.hs)

The interpreter needs a MemoryBackend in its config:

data InterpreterConfig = InterpreterConfig
  { icProvider :: Provider
  , icMemory   :: MemoryBackend IO  -- NEW
  , icTools    :: [Tool]
  , ...
  }

interpretOp :: InterpreterConfig -> AgentOp a -> IO a
interpretOp cfg (Remember content ctx tags k) = do
  -- mbStore returns m (), so mint the id up front (see the mkEntry sketch above)
  entry <- mkEntry content ctx tags
  mbStore (icMemory cfg) entry
  pure (k (meId entry))
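
The remaining memory ops follow the same pattern; a sketch against the backend fields defined above:

interpretOp cfg (Recall query n k) = do
  entries <- mbSearch (icMemory cfg) query n
  pure (k entries)
interpretOp cfg (Forget mid k) = do
  mbDelete (icMemory cfg) mid
  pure (k ())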

6. Ava Wiring (update: Omni/Agent/Telegram.hs)

Ava plugs in the sqlite backend:

runAvaAgent :: TelegramConfig -> IO ()
runAvaAgent cfg = do
  conn <- initMemoryDb
  let backend = sqliteBackend conn (ollamaEmbedding defaultOllamaConfig)
  let interpreterCfg = defaultInterpreterConfig { icMemory = backend }
  -- ... run agent with this config

Implementation Steps

1. Define core types - Create Omni/Agent/Memory/Types.hs with backend-agnostic types
2. Define backend interface - Create Omni/Agent/Memory/Backend.hs with the typeclass/record
3. Extract embedding abstraction - Create Omni/Agent/Memory/Embedding.hs
4. Implement sqlite backend - Refactor current Memory.hs into Omni/Agent/Memory/Sqlite.hs
5. Implement in-memory backend - Create Omni/Agent/Memory/InMemory.hs for testing
6. Add Op instructions - Extend Omni/Agent/Op.hs with memory operations
7. Update interpreter - Handle new ops in Omni/Agent/Interpreter/Sequential.hs
8. Update Ava - Wire up sqlite backend in Omni/Agent/Telegram.hs
9. Add tests - Unit tests for backends, integration test for full flow

Context Window Integration (Future Enhancement)

This task focuses on the storage/retrieval abstraction. A follow-up task should address:

  • Adaptive context window - Auto-prepend relevant memories to agent context
  • Hybrid retrieval - Combine temporal (recency decay) + semantic (similarity) scoring
  • Active vs passive - Agent can actively query OR system auto-injects context

Design sketch for context window (NOT in scope for this task):

data ContextWindow = ContextWindow
  { cwMaxTokens :: Int
  , cwRecencyWeight :: Float      -- 0.0-1.0
  , cwSemanticWeight :: Float     -- 0.0-1.0
  , cwSimilarityThreshold :: Float
  }

buildContext :: MemoryBackend m -> ContextWindow -> Text -> m [MemoryEntry]
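
For illustration only, the two weights might combine like this (the decay constant is an assumption):

import Data.Time (UTCTime, diffUTCTime)

-- Higher score = more likely to enter the context window.
hybridScore :: ContextWindow -> UTCTime -> Float -> MemoryEntry -> Float
hybridScore cw now similarity entry =
  cwRecencyWeight cw * recency + cwSemanticWeight cw * similarity
  where
    ageHours = realToFrac (diffUTCTime now (meAccessedAt entry)) / 3600
    recency  = exp (negate ageHours / 168)  -- one-week e-folding time, assumed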

Success Criteria

  • [ ] Memory operations work through the free monad DSL
  • [ ] SQLite backend passes all existing Memory.hs tests
  • [ ] In-memory backend works for unit tests
  • [ ] Ava continues to work with sqlite backend
  • [ ] No changes to user-facing behavior
  • [ ] Clean separation: core types → backend interface → concrete impls → agent integration

Files to Create/Modify

New:

  • Omni/Agent/Memory/Types.hs
  • Omni/Agent/Memory/Backend.hs
  • Omni/Agent/Memory/Embedding.hs
  • Omni/Agent/Memory/Sqlite.hs
  • Omni/Agent/Memory/InMemory.hs

Modify:

  • Omni/Agent/Op.hs - add memory ops
  • Omni/Agent/Interpreter/Sequential.hs - handle memory ops
  • Omni/Agent/Telegram.hs - wire up backend
  • Omni/Agent/Memory.hs - deprecate or re-export from new modules

Timeline (14)

💬 [human] · 4 weeks ago

Design Decisions (2026-01-24)

Core Concept

Context window = dynamically constructed prior for each inference, not an append-only log. Like a Bayesian prior, it is optimized for the current task, not necessarily chronological.

Decisions Made

1. Granularity: Per-inference (every LLM call gets freshly constructed context)

2. Initial Sources:

  • Conversation history (temporal)
  • Long-term knowledge (semantic/facts)
  • Keep extensible for future sources (external via tools)

3. Architecture: Option B - Inside Op layer

  • Infer takes ContextRequest instead of Prompt
  • Agent specifies intent, interpreter hydrates into concrete prompt
  • Explicit and traceable

4. Style: Prefer data over functions/callbacks

5. Reference: Ava's current implementation is the target behavior to support

Open: Prompt IR Design

Need structured intermediate representation:

  • ContextRequest (agent intent) → PromptIR (hydrated, labeled) → Prompt (flat API format)
  • Must be traceable, budget-aware, labeled by source
  • Discussing in detail next
💬 [human] · 4 weeks ago

Prompt IR Design (2026-01-24)

Core Concept

Context window = dynamically constructed Bayesian prior for each inference. The IR represents this prior as structured, labeled, optimization-aware data.

Design Principles

1. Data, not strings - Sections are structured with metadata
2. Optimization-aware - Supports compression, composition, analysis
3. Traceable - Every section has provenance
4. Composable - Explicit composition semantics (Bayesian)

The Flow

ContextRequest          -- Agent specifies intent
       ↓ (hydrate)
   PromptIR             -- Structured, labeled, optimization-ready
       ↓ (compile)
    Prompt              -- Flat messages for API
       ↓ (LLM call)
   Response

Core Types

-- Section with full optimization metadata
data Section = Section
  { secId :: Text                      -- Unique ID for tracing
  , secLabel :: Text                   -- Display label
  , secSource :: SectionSource         -- Provenance
  , secContent :: Text                 -- The actual content
  
  -- Token/Rate metrics
  , secTokens :: Int                   -- Current token count
  , secMinTokens :: Maybe Int          -- Minimum viable (for compression)
  
  -- Relevance/Priority metrics  
  , secPriority :: Priority            -- For budget trimming
  , secRelevance :: Maybe Float        -- 0.0-1.0, task-specific relevance
  , secRecency :: Maybe UTCTime        -- When this info was current
  
  -- For Bayesian composition
  , secCompositionMode :: CompositionMode
  
  -- For analysis/caching
  , secEmbedding :: Maybe (Vector Float)  -- Precomputed embedding
  , secHash :: Maybe Text                 -- Content hash for dedup/caching
  }

-- How this section composes with others (Bayesian semantics)
data CompositionMode
  = Hierarchical    -- Hyperprior (system prompt, base instructions)
  | Constraint      -- Product-of-experts (must satisfy)
  | Additive        -- Mixture (adds info, can be dropped)
  | Contextual      -- Bayesian update (observation shifting posterior)

-- Section provenance
data SectionSource
  = SourceStatic Text      -- Static (file name or "code")
  | SourceTemporal         -- Recent conversation
  | SourceSemantic Float   -- Semantic search (with relevance score)
  | SourceKnowledge        -- Long-term memory/facts
  | SourceState Text       -- Runtime state ("project", "time")
  | SourceConditional Text -- Conditional on auth/config

data Priority = Critical | High | Medium | Low

-- Tool definition (part of IR, affects behavior significantly)
data ToolDef = ToolDef
  { tdName :: Text
  , tdDescription :: Text
  , tdSchema :: Value
  , tdPriority :: Priority
  , tdEmbedding :: Maybe (Vector Float)
  }

-- The full IR
data PromptIR = PromptIR
  { pirSections :: [Section]
  , pirTools :: [ToolDef]
  , pirObservation :: Text
  , pirMeta :: PromptMeta
  }

-- Metadata for tracing and optimization
data PromptMeta = PromptMeta
  { pmTotalTokens :: Int
  , pmBudget :: TokenBudget
  , pmStrategy :: ContextStrategy
  , pmTimestamp :: UTCTime
  , pmCompressionRatio :: Maybe Float
  , pmEstimatedEntropy :: Maybe Float  -- For risk detection (t-455)
  , pmCacheHit :: Bool
  }

-- Budget with principled allocation
data TokenBudget = TokenBudget
  { tbTotal :: Int
  , tbReserveRatio :: Float  -- Keep N% headroom (t-432)
  , tbAllocation :: BudgetAllocation
  }

data BudgetAllocation
  = FixedRatios { baSystem, baContext, baObservation :: Float }
  | InformationWeighted  -- Allocate by information content
  | RelevanceWeighted    -- Allocate by task relevance

Context Request (Agent Intent)

data ContextRequest = ContextRequest
  { crObservation :: Text           -- Current user input
  , crGoal :: Maybe Text            -- What agent is trying to do
  , crStrategy :: ContextStrategy   -- How to hydrate
  , crBudget :: TokenBudget
  }

data ContextStrategy = ContextStrategy
  { csTemporalWindow :: Int         -- Recent N messages
  , csSemanticLimit :: Int          -- Max semantic results
  , csSemanticThreshold :: Float    -- Min similarity (0.0-1.0)
  , csRecencyDecay :: Float         -- Decay factor (e.g., 0.995)
  , csIncludeKnowledge :: Bool      -- Include long-term facts
  }
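
For concreteness, a strategy instance with placeholder values (all numbers assumed, not tuned):

exampleStrategy :: ContextStrategy
exampleStrategy = ContextStrategy
  { csTemporalWindow = 20       -- last 20 messages
  , csSemanticLimit = 5         -- at most 5 semantic hits
  , csSemanticThreshold = 0.75  -- drop weak matches
  , csRecencyDecay = 0.995      -- per-step decay factor
  , csIncludeKnowledge = True
  }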

Optimization Operations

Composition (t-398):

compose :: PromptIR -> PromptIR -> PromptIR
-- Hierarchical sections from first IR are hyperpriors
-- Constraint sections are AND'd (product of experts)
-- Additive sections are collected (mixture)
-- Contextual sections are sequenced (Bayesian updates)
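
A hedged sketch of compose under these semantics (tool dedup by name and an Eq instance for CompositionMode are assumptions; pirMeta handling is omitted for brevity):

import Data.Function (on)
import Data.List (nubBy)

compose :: PromptIR -> PromptIR -> PromptIR
compose a b = a
  { pirSections = byMode (pirSections a ++ pirSections b)
  , pirTools = nubBy ((==) `on` tdName) (pirTools a ++ pirTools b)
  , pirObservation = pirObservation b  -- the later observation wins
  }
  where
    -- Hyperpriors first, then constraints, mixtures, and updates, in order.
    byMode ss = concatMap (\m -> filter ((m ==) . secCompositionMode) ss)
                  [Hierarchical, Constraint, Additive, Contextual]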

Compression (t-399):

compress :: TokenBudget -> PromptIR -> IO PromptIR
-- Rank sections by (priority × relevance × recency)
-- Drop/summarize lowest-value until under budget
-- Track compression ratio in metadata
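
A sketch of the value function that ranking step might use (weights are assumptions):

import Data.Maybe (fromMaybe)
import Data.Time (UTCTime, diffUTCTime)

sectionValue :: UTCTime -> Section -> Float
sectionValue now s =
  priorityWeight (secPriority s)
    * fromMaybe 1.0 (secRelevance s)
    * maybe 1.0 recencyFactor (secRecency s)
  where
    priorityWeight Critical = 1e6  -- effectively never dropped
    priorityWeight High     = 10
    priorityWeight Medium   = 3
    priorityWeight Low      = 1
    recencyFactor t = exp (negate (realToFrac (diffUTCTime now t) / 86400))  -- one-day scale, assumed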

Analysis (t-397):

estimateImpact :: Section -> IO Float
-- Use embedding magnitude as proxy for information content

equivalent :: Float -> PromptIR -> PromptIR -> IO Bool
-- Check behavioral equivalence via embedding similarity
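
A sketch of equivalent via cosine similarity, assuming a hypothetical embedIR :: PromptIR -> IO (Vector Float) that embeds the compiled text of the IR:

import qualified Data.Vector as V

equivalent :: Float -> PromptIR -> PromptIR -> IO Bool
equivalent threshold a b = do
  va <- embedIR a
  vb <- embedIR b
  pure (cosine va vb >= threshold)
  where
    cosine x y = V.sum (V.zipWith (*) x y) / (norm x * norm y)
    norm v = sqrt (V.sum (V.map (** 2) v))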

Optimization Hook Summary

| Research Task | IR Feature |
|---------------|------------|
| t-398 (Composition) | CompositionMode, compose |
| t-399 (Compression) | secMinTokens, compress |
| t-397 (Analysis) | secEmbedding, estimateImpact |
| t-432 (Compaction) | secRelevance, secRecency, tbReserveRatio |
| t-455 (Best-of-N) | pmEstimatedEntropy |

Example: Ava's Current Prompt as IR

avaPromptIR = PromptIR
  { pirSections = 
      [ Section "base" "## Core Instructions" (SourceStatic "telegram-system.md") 
          basePrompt 500 Nothing Critical Nothing Nothing Hierarchical Nothing Nothing
      , Section "time" "## Current Date and Time" (SourceState "clock") 
          "Saturday, Jan 25..." 20 Nothing High Nothing (Just now) Contextual Nothing Nothing
      , Section "project" "## Current Project" (SourceState "project") 
          "Working dir: /home/ben/omni/live..." 100 Nothing High Nothing Nothing Constraint Nothing Nothing
      , Section "user" "## Current User" (SourceState "user") 
          "You are talking to: Ben" 15 Nothing High Nothing Nothing Contextual Nothing Nothing
      , Section "memories" "## What you know about this user" SourceKnowledge 
          "Ben prefers..." 200 (Just 50) Medium (Just 0.8) Nothing Additive Nothing Nothing
      , Section "recent" "## Recent conversation" SourceTemporal 
          "[10:30] User: ..." 800 (Just 200) High Nothing (Just now) Contextual Nothing Nothing
      , Section "semantic" "## Related past messages" (SourceSemantic 0.82) 
          "[Jan 15] ..." 300 Nothing Medium (Just 0.82) (Just oldTime) Additive Nothing Nothing
      ]
  , pirTools = [skillTool, readFileTool, runBashTool, rememberTool, recallTool, ...]
  , pirObservation = "What were we discussing about the agent architecture?"
  , pirMeta = PromptMeta 1965 defaultBudget defaultStrategy now Nothing Nothing False
  }
🔄 [human] Open → InProgress · 4 weeks ago
💬 [human] · 4 weeks ago

Implementation Progress (2026-01-24)

Completed Modules

| Commit | File | Lines | Purpose |
|--------|------|-------|---------|
| 21f111e | Omni/Agent/Prompt/IR.hs | 545 | Core IR types with optimization metadata |
| 69ad87d | Omni/Agent/Prompt/Hydrate.hs | 533 | ContextRequest → PromptIR |
| 32142b6 | Omni/Agent/Prompt/Compile.hs | 436 | PromptIR → CompiledPrompt |
| d472b64 | Omni/Agent/Op.hs | +30 | Infer now takes ContextRequest, added InferRaw |
| d472b64 | Sequential.hs | +178 | Interpreter handles hydration/compilation |

The Pipeline

infer(model, ContextRequest)
        ↓
    [hydrate]
        ↓
    PromptIR (sections + tools + metadata)
        ↓
    [compile]
        ↓
    CompiledPrompt (messages + tools)
        ↓
    [LLM call]
        ↓
    Response

API Summary

For new agents (dynamic context):

infer :: Model -> ContextRequest -> Op s Response

For legacy/raw prompts:

inferRaw :: Model -> Prompt -> Op s Response
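
Side by side, the two call styles might look like this (mkPrompt, defaultModel, exampleStrategy, and defaultBudget are placeholders, not confirmed names):

-- Legacy path: caller assembles the prompt by hand.
legacyStep :: Text -> Op s Response
legacyStep userMsg = inferRaw defaultModel (mkPrompt "You are Ava." userMsg)

-- New path: caller states intent; the interpreter hydrates and compiles.
contextStep :: Text -> Op s Response
contextStep userMsg = infer defaultModel ContextRequest
  { crObservation = userMsg
  , crGoal = Just "answer the user"
  , crStrategy = exampleStrategy
  , crBudget = defaultBudget
  }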

Remaining Work

  • [ ] Ava integration: Wire up HydrationConfig in Telegram.hs
  • [ ] Create context sources that connect to Memory.hs
  • [ ] Test end-to-end with Ava
💬 [human] · 4 weeks ago

Phase 4 Complete: Op Integration (commit 3dbf484)

Summary

The Prompt IR pipeline is now fully integrated into the interpreter:

[Agent Program]
     │
     ├── infer(model, ContextRequest)  ──→ hydrate → compile → LLM
     │
     └── inferRaw(model, Prompt)       ──→ (legacy path) → LLM

Changes

Op.hs:

  • Infer :: Model -> ContextRequest -> ... (new API)
  • InferRaw :: Model -> Prompt -> ... (legacy API)
  • Exports ContextRequest, ContextStrategy, TokenBudget

Sequential.hs:

  • seqHydrationConfig :: Maybe HydrationConfig in SeqConfig
  • Infer case: requires hydration config, hydrates + compiles
  • InferRaw case: works like old Infer (direct prompt → LLM)
  • Added handleBudgetInferResult and compiledToMessages helpers

Programs/*.hs:

  • All migrated from infer → inferRaw (no semantic change)

Current State

  • ✅ IR, Hydrate, Compile modules
  • ✅ Op integration (Infer/InferRaw)
  • ✅ Interpreter support
  • ✅ All programs use inferRaw (legacy path)
  • ✅ Ava builds with new system

Remaining for full context-aware Ava

To use the new context pipeline, Telegram.hs needs:

1. A HydrationConfig with Memory-backed context sources
2. Call sites using infer + ContextRequest

The infrastructure is ready; this is now an integration task.

💬 [human] · 4 weeks ago

Implementation Complete (2026-01-24)

All Modules

| Commit | File | Lines | Purpose |
|--------|------|-------|---------|
| 21f111e | Prompt/IR.hs | 545 | Core IR types (Section, ToolDef, CompositionMode, ContextRequest) |
| 69ad87d | Prompt/Hydrate.hs | 533 | ContextRequest → PromptIR via context sources |
| 32142b6 | Prompt/Compile.hs | 436 | PromptIR → CompiledPrompt with budget enforcement |
| d472b64 | Op.hs | +30 | Infer takes ContextRequest, added InferRaw |
| d472b64 | Sequential.hs | +178 | Interpreter handles hydration/compilation |
| 3dbf484 | Programs/*.hs | - | Migrated to inferRaw |
| d844e27 | Prompt/MemorySources.hs | 141 | Context sources backed by Memory.hs |
| eca7a0d | Prompt/MemorySources.hs | +112 | buildHydrationConfig helper |

Full Pipeline

ContextRequest (intent)
       ↓
   [hydrate] ← MemorySources (temporal, semantic, knowledge)
       ↓
   PromptIR (labeled sections + tools + metadata)
       ↓
   [compile] ← budget enforcement, priority ordering
       ↓
   CompiledPrompt (flat messages)
       ↓
   LLM call

Integration Ready

To enable in Telegram.hs:

import Omni.Agent.Prompt.MemorySources as MS

-- Build hydration config
let hydrationCfg = MS.buildHydrationConfig
      systemPrompt
      tools
      [MS.mkProjectSection proj dir, MS.mkTimeSection now tz]
      userId
      chatId

-- Add to SeqConfig
let seqConfig = (Seq.defaultSeqConfig provider seqTools)
      { Seq.seqHydrationConfig = Just hydrationCfg }

Remaining for Full Migration

The infrastructure is complete. Full migration requires:

1. Modify OpAgent.runAgent to use infer instead of inferRaw, OR
2. Create a new IR-native agent program

The current system works (using inferRaw + manual context), and the new IR system is ready for incremental adoption.

🔄 [human] InProgress → Done · 4 weeks ago
💬 [human] · 4 weeks ago

Follow-up task created: t-480 (Integrate Prompt IR into Ava with observability). This covers wiring up the HydrationConfig in Telegram.hs and adding tracing to validate the system in real usage.