AI-Augmented Quant Pipeline: Research Notes

Date: 2026-03-11
Context: Ben wants to use Omni/Agent (Op free monad) to automate signal discovery and alpha combination (steps 1 & 2 of the quant pipeline), feeding into his existing Omni/Fund/Invest.hs portfolio model (Kelly optimization, Monte Carlo, rebalancing).


1. Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│  SIGNAL DISCOVERY AGENTS (Op programs)                          │
│                                                                  │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐        │
│  │ EDGAR    │  │ Macro    │  │ Sentiment│  │ Price    │        │
│  │ Agent    │  │ Agent    │  │ Agent    │  │ Agent    │        │
│  │(Form 4, │  │(FRED,    │  │(Earnings │  │(Momentum │        │
│  │ 10-K,   │  │ BLS,     │  │ calls,   │  │ Mean-rev │        │
│  │ 8-K)    │  │ Treasury)│  │ News NLP)│  │ Vol)     │        │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘        │
│       │              │              │              │              │
│       └──────────────┴──────────────┴──────────────┘              │
│                          │                                        │
│                    Op.par [...]                                   │
│                          │                                        │
│                    ┌─────▼──────┐                                 │
│                    │ ALPHA      │                                 │
│                    │ COMBINER   │                                 │
│                    │ (Bayesian  │                                 │
│                    │  update of │                                 │
│                    │  μ, σ, Σ)  │                                 │
│                    └─────┬──────┘                                 │
└──────────────────────────┼───────────────────────────────────────┘
                           │
                     JSON output:
                     updated AssetModel params
                           │
┌──────────────────────────▼───────────────────────────────────────┐
│  EXISTING HASKELL PIPELINE (Invest.hs)                           │
│                                                                   │
│  AssetModel(μ,σ,yield) → kellyOptimalN → runSimulation          │
│       → computeDeltas → rebalancing signals on invest page       │
└──────────────────────────────────────────────────────────────────┘

Key Insight: Agent = Eyes & Ears, Haskell = Brain

The LLM agents do information gathering and structuring (what a human analyst does). The math (Kelly, MC, optimization) stays in deterministic Haskell code. The agent NEVER outputs portfolio weights or expected-return numbers directly; it outputs structured signal data that deterministic code converts into parameter updates.


2. What Already Exists

Invest.hs (the brain)

Op.hs (the agent framework)

Config.hs (the parameters to update)


3. Signal Sources (Free/Public Data)

Tier 1: Easy to implement, well-documented APIs

Source        | Data                                                   | Signal Type                   | API                                    | Update Freq
SEC EDGAR     | Form 4 insider trades                                  | Insider sentiment             | data.sec.gov (free, no key)            | Real-time
FRED          | 840K macro series (M2, yield curve, CPI, unemployment) | Macro regime                  | api.stlouisfed.org (free key)          | Daily-monthly
Yahoo Finance | OHLCV price history                                    | Momentum, vol, mean-reversion | Unofficial REST API (or Alpha Vantage) | Daily
Treasury.gov  | Yield curves                                           | Risk-free rate, term premium  | api.fiscaldata.treasury.gov            | Daily
CFTC COT      | Futures positioning                                    | Sentiment/positioning         | cftc.gov (CSV)                         | Weekly

Tier 2: Moderate effort, high value

Source               | Data                   | Signal Type       | Update Freq
Earnings transcripts | Call text + estimates  | PEAD, sentiment   | Quarterly
Patent filings       | USPTO PAIR             | Innovation signal | Monthly
Job postings         | Indeed/LinkedIn scrape | Growth signal     | Weekly
App store rankings   | Apple/Google           | Revenue proxy     | Daily
FINRA Short Interest | Short % of float       | Crowding/squeeze  | Bi-weekly

Tier 3: Needs more infra but powerful

Source                 | Data                            | Signal Type
Reddit/Twitter         | Sentiment NLP on financial subs | Retail sentiment
Satellite/weather      | NOAA, Sentinel                  | Commodity/agriculture
Government procurement | SAM.gov, USAspending            | Revenue leading indicator
Shipping data          | AIS vessel tracking             | Global trade leading indicator

Recommendation for v1: Start with Tier 1 only. EDGAR + FRED + market data gives you insider signal + macro regime + price-based signals. That’s a complete foundation.


4. The Alpha Combination Problem

Current Approach: Static Priors (What Invest.hs does now)

μ_BTC = 0.15  (hardcoded)
σ_BTC = 0.65  (hardcoded)

Target Approach: Black-Litterman-style Bayesian Updating

The Black-Litterman model is the right framework here. It:

  1. Starts with a prior (your current Config.hs values)
  2. Incorporates views (signals from agents) with confidence levels
  3. Outputs a posterior (updated μ, σ, Σ)

Concretely:

Prior:     μ_prior = [0.15, 0.10, 0.08, 0.10]  (BTC, equities, RE, STRD)
           Σ_prior = diagonal([0.65², 0.18², 0.12², 0.05²])

Agent views (example):
  - "BTC 30-day realized vol is 45%, below historical mean" → confidence 0.7
  - "Insider buying cluster in XYZ (equities)" → confidence 0.5
  - "Yield curve inverted: recession signal" → confidence 0.6
  - "BTC trailing 90-day momentum positive" → confidence 0.4

Bayesian update:
  μ_posterior = μ_prior + τΣP'(PτΣP' + Ω)⁻¹(Q - Pμ_prior)
  where P = picking matrix, Q = view returns, Ω = view uncertainty

Output: updated AssetModel parameters for Invest.hs

The key insight: the agent produces the views (Q) and confidence levels (Ω); the Haskell code does the Bayesian math.
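The posterior-mean formula above can be transcribed almost directly into Haskell. A sketch assuming the hmatrix library (blPosteriorMean is a hypothetical name, not an existing Invest.hs function):

```haskell
import Numeric.LinearAlgebra

-- Posterior mean: mu' = mu + tau*Sigma*P' (P*tau*Sigma*P' + Omega)^-1 (Q - P*mu)
blPosteriorMean
  :: Double          -- tau, uncertainty in the prior (~0.01-0.05)
  -> Vector Double   -- mu, prior returns (n)
  -> Matrix Double   -- sigma, prior covariance (n x n)
  -> Matrix Double   -- p, picking matrix (k x n)
  -> Vector Double   -- q, view returns (k)
  -> Matrix Double   -- omega, view uncertainty (k x k)
  -> Vector Double
blPosteriorMean tau mu sigma p q omega =
  mu + tauSigmaPt #> (inv middle #> (q - p #> mu))
  where
    tauSigmaPt = scale tau sigma <> tr p   -- tau*Sigma*P'         (n x k)
    middle     = p <> tauSigmaPt + omega   -- P*tau*Sigma*P' + Omega  (k x k)
```

With a small τ the correction term barely moves the prior, which matches the conservative stance discussed in section 11.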

What needs to be built in Invest.hs:

  1. Correlation matrix support (currently assumes uncorrelated)

    • Full Kelly: f* = Σ⁻¹(μ - r) (matrix inverse, not scalar division)
    • Need Cholesky decomposition for correlated MC paths
  2. Black-Litterman update function

    blUpdate :: ReturnVector -> CovMatrix -> [View] -> (ReturnVector, CovMatrix)
    
  3. Signal → View converter

    -- Agent output format
    data Signal = Signal
      { sigAsset :: Text
      , sigType :: SignalType  -- Momentum | MeanReversion | Insider | Macro | ...
      , sigStrength :: Double  -- z-score or normalized value
      , sigConfidence :: Double  -- 0-1
      , sigSource :: Text  -- provenance
      , sigTimestamp :: UTCTime
      }
    
    -- Convert to Black-Litterman view
    signalToView :: Signal -> View
    

5. Native Haskell Data Libraries + Op Programs

Architecture Decision

No external Python calls. All data access is native Haskell, using typed wrappers around REST/JSON APIs built on http-conduit (same stack as Omni.Agent.Tools.Http).

Three stepping-stone libraries:

These are standalone, useful libraries independent of the agent pipeline. They become the data foundation that Op programs call via typed Haskell functions rather than shelling out to Python.

5.1 Omni.Fund.Data.Edgar

SEC EDGAR API is free, no auth, JSON at data.sec.gov.

Key endpoints are documented in the SEC's EDGAR API reference: https://www.sec.gov/search-filings/edgar-application-programming-interfaces

module Omni.Fund.Data.Edgar where

-- | Company submissions (filing history)
data Submissions = Submissions
  { subCik        :: Text
  , subName       :: Text
  , subTickers    :: [Text]
  , subFilings    :: [Filing]
  }

data Filing = Filing
  { filingType    :: Text        -- "4", "10-K", "8-K", etc.
  , filingDate    :: Day
  , filingAccNo   :: Text        -- accession number
  , filingUrl     :: Text
  }

-- | Insider transaction from Form 4
data InsiderTransaction = InsiderTransaction
  { itReportingPerson :: Text
  , itRelationship    :: Text    -- "Officer", "Director", "10% Owner"
  , itTransactionType :: Text    -- "P" (purchase), "S" (sale)
  , itShares          :: Double
  , itPricePerShare   :: Double
  , itDate            :: Day
  , itTicker          :: Text
  }

-- Core API functions
getSubmissions :: Text -> IO (Either EdgarError Submissions)
getForm4Filings :: Text -> Int -> IO (Either EdgarError [InsiderTransaction])
getCompanyFacts :: Text -> IO (Either EdgarError CompanyFacts)
lookupCik :: Text -> IO (Either EdgarError Text)  -- ticker -> CIK

Note: EDGAR requires a User-Agent header with contact info per SEC policy.
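A sketch of the underlying fetch using http-conduit's Network.HTTP.Simple (same stack as Omni.Agent.Tools.Http). The contact string is a placeholder; EDGAR expects the CIK zero-padded to 10 digits in the submissions URL:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as BS
import Network.HTTP.Simple

-- Left-pad a CIK to the 10 digits the endpoint expects
padCik :: String -> String
padCik c = replicate (10 - length c) '0' <> c

-- Fetch the raw submissions JSON for a CIK, sending the
-- User-Agent contact header the SEC requires.
fetchSubmissionsJson :: String -> IO BS.ByteString
fetchSubmissionsJson cik = do
  req <- parseRequest ("https://data.sec.gov/submissions/CIK" <> padCik cik <> ".json")
  let req' = setRequestHeader "User-Agent" ["omni-fund admin@example.com"] req
  getResponseBody <$> httpBS req'
```

getSubmissions would then decode this bytestring into the Submissions record via aeson.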

5.2 Omni.Fund.Data.Market

For price/volume data. Alpha Vantage has a free tier (25 requests/day) with a clean REST API. Alternatives: Twelve Data or Polygon.io. All are simple JSON APIs.

module Omni.Fund.Data.Market where

data OHLCV = OHLCV
  { oDate   :: Day
  , oOpen   :: Double
  , oHigh   :: Double
  , oLow    :: Double
  , oClose  :: Double
  , oVolume :: Integer
  }

data TimeSeriesInterval = Daily | Weekly | Monthly

-- Core API functions
getDailyPrices :: Text -> Int -> IO (Either MarketError [OHLCV])
  -- ^ ticker, num days

-- Derived computations (pure Haskell, no API call)
trailingReturn :: Int -> [OHLCV] -> Double
  -- ^ window size, price history -> annualized return

realizedVol :: Int -> [OHLCV] -> Double
  -- ^ window size -> annualized volatility

meanReversionZ :: Int -> [OHLCV] -> Double
  -- ^ SMA window -> z-score of current price vs SMA

correlationMatrix :: [[OHLCV]] -> Matrix Double
  -- ^ price histories for N assets -> NxN correlation matrix

The pure computation functions (trailing return, vol, z-score, correlation) are deterministic math that lives in Haskell — no LLM, no external calls. These replace the numpy computations from the original Python design.
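Possible implementations of the pure pieces, written over a plain list of closing prices (the OHLCV versions would map oClose first). The primed names and the 252 trading-day annualization convention are assumptions of this sketch:

```haskell
-- Daily log returns from a chronologically ordered price series
logReturns :: [Double] -> [Double]
logReturns ps = zipWith (\a b -> log (b / a)) ps (drop 1 ps)

-- Last k elements of a list
lastN :: Int -> [a] -> [a]
lastN k xs = drop (length xs - k) xs

-- Annualized return over the trailing window (window in trading days)
trailingReturn' :: Int -> [Double] -> Double
trailingReturn' window closes =
  let ps = lastN (window + 1) closes
  in log (last ps / head ps) * (252 / fromIntegral window)

-- Annualized volatility: sample stddev of daily log returns, scaled by sqrt 252
realizedVol' :: Int -> [Double] -> Double
realizedVol' window closes =
  let rs  = lastN window (logReturns closes)
      n   = fromIntegral (length rs)
      m   = sum rs / n
      var = sum [ (r - m) ^ 2 | r <- rs ] / (n - 1)
  in sqrt (var * 252)
```

Both functions are total on sufficiently long inputs; production versions should validate that the history covers the requested window.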

5.3 Omni.Fund.Data.Fred

FRED API: free with API key, REST/JSON, well-documented. Ref: https://fred.stlouisfed.org/docs/api/fred/

Evaluate gborough/fred on Hackage first — if it works, use it. If stale, write a thin wrapper (the API is ~10 endpoints, mostly series/observations).

module Omni.Fund.Data.Fred where

data FredSeries = FredSeries
  { fsId          :: Text       -- e.g. "T10Y2Y"
  , fsTitle       :: Text
  , fsFrequency   :: Text       -- "Daily", "Monthly"
  , fsUnits       :: Text
  }

data Observation = Observation
  { obsDate  :: Day
  , obsValue :: Maybe Double    -- FRED uses "." for missing
  }

-- Core API functions
getSeriesObservations :: Text -> Day -> Day -> IO (Either FredError [Observation])
  -- ^ series_id, start, end

getLatestValue :: Text -> IO (Either FredError Double)
  -- ^ series_id -> most recent observation

-- Convenience: pull a batch of key macro series
data MacroSnapshot = MacroSnapshot
  { msYieldCurve   :: Double    -- T10Y2Y
  , msM2Growth     :: Double    -- M2SL yoy change
  , msUnemployment :: Double    -- UNRATE
  , msCPI          :: Double    -- CPIAUCSL
  , msHYSpread     :: Double    -- BAMLH0A0HYM2
  , msVIX          :: Double    -- VIXCLS
  , msTimestamp    :: UTCTime
  }

getMacroSnapshot :: IO (Either FredError MacroSnapshot)
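One way getMacroSnapshot might derive msM2Growth from raw monthly observations (the Observation record is repeated so the sketch stands alone; yoyGrowth is a hypothetical helper, assuming observations arrive oldest-first):

```haskell
import Data.Time (Day, fromGregorian)

data Observation = Observation
  { obsDate  :: Day
  , obsValue :: Maybe Double    -- FRED uses "." for missing
  }

-- Year-over-year growth: latest value vs the value 12 monthly
-- observations earlier, skipping missing (Nothing) entries.
yoyGrowth :: [Observation] -> Maybe Double
yoyGrowth obs =
  case reverse [ v | Observation _ (Just v) <- obs ] of
    (latest : rest) | length rest >= 12 ->
      let yearAgo = rest !! 11
      in Just (latest / yearAgo - 1)
    _ -> Nothing
```

The same shape works for any monthly FRED series; for daily series the lookback index would change accordingly.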

Key FRED series for the signal pipeline: T10Y2Y (yield curve), M2SL (money supply), UNRATE (unemployment), CPIAUCSL (CPI), BAMLH0A0HYM2 (high-yield spread), VIXCLS (VIX).

5.4 Op Programs Using Native Libraries

With the data libraries in place, the Op programs become clean compositions:

signalScan :: [Text] -> Op SignalState [Signal]
signalScan assets = do
  Op.checkpoint "init"

  results <- Op.par
    [ insiderSignals assets
    , macroSignals
    , priceSignals assets
    ]

  Op.checkpoint "signals-gathered"
  let allSignals = concat results

  -- LLM assesses cross-signal coherence
  coherenceAdjusted <- assessCoherence allSignals
  pure coherenceAdjusted

-- Insider signals: native EDGAR API, LLM interprets
checkInsider :: Text -> Op SignalState [Signal]
checkInsider ticker = do
  -- Direct Haskell call, no Python
  filings <- Op.io $ Edgar.getForm4Filings ticker 10

  case filings of
    Left err -> do
      Op.log ("EDGAR error for " <> ticker <> ": " <> show err)
      pure []
    Right txns -> do
      -- Filter significant purchases
      let significant = filter isSignificantPurchase txns
      -- LLM interprets patterns (optional — could be pure rules)
      if null significant
        then pure []
        else do
          response <- Op.infer (Op.Model "claude-sonnet-4-20250514")
            defaultContextRequest
              { crObservation = "Analyze these insider transactions for " <> ticker
                             <> ":\n" <> formatTransactions significant
              , crGoal = Just "Extract insider trading signals"
              }
          pure (parseInsiderSignals response)

-- Macro signals: native FRED API, LLM interprets regime
macroSignals :: Op SignalState [Signal]
macroSignals = do
  snapshot <- Op.io Fred.getMacroSnapshot
  case snapshot of
    Left err -> do
      Op.log ("FRED error: " <> show err)
      pure []
    Right ms -> do
      response <- Op.infer (Op.Model "claude-sonnet-4-20250514")
        defaultContextRequest
          { crObservation = formatMacroSnapshot ms
          , crGoal = Just "Assess macro regime and generate signals"
          }
      pure (parseMacroSignals response)

-- Price signals: native Market API, pure Haskell math, no LLM needed
priceSignals :: [Text] -> Op SignalState [Signal]
priceSignals tickers = do
  now <- Op.io getCurrentTime
  histories <- Op.io $ mapM (\t -> (t,) <$> Market.getDailyPrices t 252) tickers
  pure $ concatMap (mkPriceSignals now) histories
  where
    mkPriceSignals now (ticker, Right prices) =
      [ Signal ticker Momentum (Market.trailingReturn 63 prices) 0.7 "market_90d" now
      , Signal ticker Volatility (negate $ Market.realizedVol 63 prices) 0.8 "market_vol" now
      , Signal ticker MeanReversion (Market.meanReversionZ 200 prices) 0.5 "market_zscore" now
      ]
    mkPriceSignals _ (_, Left _) = []

Note: priceSignals is entirely deterministic — pure Haskell math on market data. No LLM involvement. The agent framework is used for orchestration (Op.par, Op.io) but the computation is typed, testable, and reproducible.

5.5 Implementation Order

  1. Omni.Fund.Data.Edgar — most valuable signal source, clean API, no auth needed
  2. Omni.Fund.Data.Market — needed for price signals and correlation matrix
  3. Omni.Fund.Data.Fred — macro context, evaluate gborough/fred first
  4. Wire into Invest.hs — replace static μ/σ with data-driven estimates
  5. Op programs — orchestrate the above with agent framework

Steps 1-3 are independently useful even without the agent pipeline.

6. Integration with Invest.hs

6.1 Signal Output Format (JSON)

{
  "timestamp": "2026-03-11T03:00:00Z",
  "signals": [
    {
      "asset": "BTC",
      "type": "momentum",
      "strength": 1.2,
      "confidence": 0.6,
      "source": "market_90d_trailing",
      "detail": "90-day trailing return annualized: 42%"
    },
    {
      "asset": "BTC",
      "type": "volatility",
      "strength": -0.8,
      "confidence": 0.8,
      "source": "market_realized_vol",
      "detail": "63-day realized vol: 45% vs historical 65%"
    },
    {
      "asset": "equities",
      "type": "insider",
      "strength": 0.5,
      "confidence": 0.4,
      "source": "edgar_form4",
      "detail": "3 C-suite purchases >$100K in SPY components this week"
    },
    {
      "asset": "ALL",
      "type": "macro_regime",
      "strength": -0.3,
      "confidence": 0.5,
      "source": "fred_composite",
      "detail": "Yield curve flat, M2 growth decelerating, VIX elevated"
    }
  ],
  "correlation_matrix": {
    "assets": ["BTC", "equities", "real_estate", "STRD"],
    "matrix": [[1.0, 0.45, 0.1, 0.05],
               [0.45, 1.0, 0.3, 0.1],
               [0.1, 0.3, 1.0, 0.05],
               [0.05, 0.1, 0.05, 1.0]]
  },
  "updated_params": {
    "BTC": {"mu": 0.18, "sigma": 0.55},
    "equities": {"mu": 0.11, "sigma": 0.18},
    "real_estate": {"mu": 0.08, "sigma": 0.12},
    "STRD": {"mu": 0.10, "sigma": 0.05}
  }
}
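The Haskell side of readSignals could decode this format with aeson. A sketch covering a subset of the fields ("detail" and others are ignored by withObject; SignalJson is a hypothetical intermediate type):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Text (Text)

data SignalJson = SignalJson
  { sjAsset      :: Text
  , sjType       :: Text
  , sjStrength   :: Double
  , sjConfidence :: Double
  , sjSource     :: Text
  } deriving (Show)

instance FromJSON SignalJson where
  parseJSON = withObject "Signal" $ \o ->
    SignalJson
      <$> o .: "asset"
      <*> o .: "type"
      <*> o .: "strength"
      <*> o .: "confidence"
      <*> o .: "source"
```

A SignalBundle decoder would wrap this with parsers for "timestamp", "signals", "correlation_matrix", and "updated_params".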

6.2 New Haskell Code Needed

-- In Invest.hs or a new SignalIntegration.hs module:

-- | Read signal file produced by the agent pipeline
readSignals :: FilePath -> IO (Either Text SignalBundle)

-- | Update AssetModel parameters using Bayesian update
applySignals :: PortfolioModel -> SignalBundle -> PortfolioModel

-- | Black-Litterman update (the core math)
blUpdate 
  :: Vector Double        -- prior returns (μ)
  -> Matrix Double        -- prior covariance (Σ)  
  -> Double               -- tau (confidence scalar, ~0.05)
  -> Matrix Double        -- picking matrix (P)
  -> Vector Double        -- views (Q)
  -> Matrix Double        -- view uncertainty (Ω)
  -> (Vector Double, Matrix Double)  -- posterior (μ', Σ')

-- | Full Kelly with correlation matrix
kellyOptimalCorrelated
  :: Double               -- risk-free rate
  -> Vector Double        -- expected returns
  -> Matrix Double        -- covariance matrix
  -> Vector Double        -- optimal fractions (f* = Σ⁻¹(μ - r))

-- | Correlated GBM paths (Cholesky decomposition)
simulateCorrelated 
  :: Matrix Double        -- Cholesky factor of Σ
  -> ...                  -- same args as current simulateOnePath

6.3 Integration Flow

1. Agent pipeline runs (daily cron or on-demand):
   - Op program executes signalScan
   - Outputs JSON to /var/fund/signals.json

2. fund-data daemon picks up signals.json on next refresh cycle (every 15 min)

3. Invest.hs reads signals.json:
   - Applies Bayesian update to prior μ/σ
   - Computes correlated Kelly weights
   - Runs MC simulation with updated parameters
   - Outputs deltas for invest page

4. Invest page shows:
   - Current signal readings with confidence
   - How signals changed the expected returns
   - Updated Kelly weights vs current allocation
   - MC fan chart with signal-adjusted parameters

7. Relevant Literature

AlphaAgent (Feb 2025)

Black-Litterman Model

Post-Earnings Announcement Drift (PEAD)


8. Implementation Plan

Phase 1: Market Data Foundation (1-2 weeks)

This phase involves NO LLM. Just native Haskell data access + math. No Python in the loop — all three data libraries are thin typed wrappers around REST/JSON APIs, built on the same http-conduit stack as Omni.Agent.Tools.Http.

Phase 2: Bayesian Integration (1-2 weeks)

This phase is pure Haskell math. Still no LLM.

Phase 3: Agent-Augmented Signals (2-3 weeks)

This is where the LLM enters. The agent interprets data, not computes numbers.

Phase 4: Feedback & Decay Tracking (ongoing)


9. Missing Pieces / Gaps

In Invest.hs:

  1. Correlation matrix — currently assumes uncorrelated. Need hmatrix or similar for linear algebra (matrix inverse, Cholesky).
  2. Dynamic parameters — currently reads static Config.hs. Need to read from signals.json and fall back to Config.hs defaults if no signals available.
  3. Signal display — invest page needs a “signals” section showing current readings and how they’re affecting the model.

In Op infrastructure:

  1. Scheduled execution — need a way to run Op programs on a cron schedule. Could use systemd timer + op-runner CLI, or integrate with agentd.
  2. Signal persistence — signals should be stored with timestamps so we can track decay over time. SQLite or just JSONL append?

Risks:

  1. Garbage in, garbage out — if the LLM misinterprets a signal, the Bayesian update will propagate the error. Mitigate with conservative τ (low confidence in views) and confidence clamping.
  2. Overfitting — backtesting on the same data used to develop signals. Need out-of-sample validation period.
  3. Latency — EDGAR filings are public instantly but our pipeline runs daily. For insider trading signals, same-day is fine. For price momentum, daily is fine. HFT signals are out of scope.

10. Quick Win: Dynamic μ/σ from Price Data (No LLM)

Before building the full agent pipeline, the single highest-value change:

-- Omni/Fund/UpdateParams.hs, run as a daily systemd timer
-- Uses native Haskell data libraries, no external dependencies

module Omni.Fund.UpdateParams where

import Control.Monad (forM)
import Data.Aeson (encode)
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)
import Data.Time (getCurrentTime)
import Omni.Fund.Data.Market
  (getDailyPrices, trailingReturn, realizedVol, correlationMatrix)

assets :: [(Text, Text)]  -- (API ticker, internal name)
assets = [("BTC-USD", "BTC"), ("SPY", "equities")]

updateParams :: IO ()
updateParams = do
  now <- getCurrentTime

  -- Pull ~2 years (504 trading days) of daily prices per asset
  priceHistories <- forM assets $ \(ticker, name) -> do
    result <- getDailyPrices ticker 504
    case result of
      Left err -> error ("price fetch failed for " <> show ticker <> ": " <> show err)
      Right prices -> do
        let mu    = trailingReturn 252 prices   -- 1-year trailing return
            sigma = realizedVol 252 prices      -- 1-year realized vol
        pure (name, mu, sigma, prices)

  -- Compute correlation matrix across all assets
  let allPrices  = [ ps | (_, _, _, ps) <- priceHistories ]
      corrMatrix = correlationMatrix allPrices

  -- Write signal bundle (SignalBundle as read by section 6.2's readSignals)
  let bundle = SignalBundle
        { sbTimestamp   = now
        , sbParams      = [ (name, mu, sigma) | (name, mu, sigma, _) <- priceHistories ]
        , sbCorrelation = corrMatrix
        }
  BL.writeFile "/var/fund/signals.json" (encode bundle)

This alone would make the invest page responsive to actual market conditions instead of using hardcoded assumptions. It’s the minimum viable signal pipeline. No Python, no external processes — just a Haskell executable on a timer.
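The timer wiring could look like this (unit names and the binary path are placeholders):

```ini
# /etc/systemd/system/fund-update-params.service
[Unit]
Description=Refresh fund signal parameters from market data

[Service]
Type=oneshot
ExecStart=/usr/local/bin/fund-update-params

# /etc/systemd/system/fund-update-params.timer
[Unit]
Description=Daily fund parameter refresh

[Timer]
OnCalendar=*-*-* 06:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

Enable with `systemctl enable --now fund-update-params.timer`; Persistent=true catches up on a missed run after downtime.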


11. Summary of Decisions Needed

  1. Haskell matrix library: hmatrix (BLAS/LAPACK bindings, fast) vs pure Haskell (matrix, linear)? hmatrix is standard but adds a native C library dependency.

  2. Signal storage format: JSON file (simple) vs SQLite (queryable, historical)? Recommend: JSON file for v1, migrate to SQLite when tracking signal decay.

  3. Scheduling: systemd timer (simple) vs agentd integration (fancy)? Recommend: systemd timer for Phase 1-2, agentd for Phase 3+ when Op programs need budget/checkpoint/steering support.

  4. How much to trust agent views: τ parameter in Black-Litterman controls this. Start very conservative (τ = 0.01, views barely nudge the prior). Increase as we validate signal quality with realized IC measurements.

  5. Scope of asset universe: Current Invest.hs tracks ~5 assets (BTC, STRD, equities, RE, cash). Do we want to expand to individual stocks? Recommend: No for v1. Keep the asset universe small and focus on getting the pipeline working. Individual stock signals can come later.

