AI-Augmented Quant Pipeline: Research Notes
Date: 2026-03-11 Context: Ben wants to use Omni/Agent (Op free monad) to automate signal discovery and alpha combination (steps 1 & 2 of the quant pipeline), feeding into his existing Omni/Fund/Invest.hs portfolio model (Kelly optimization, Monte Carlo, rebalancing).
1. Architecture Overview
┌─────────────────────────────────────────────────────────────────┐
│ SIGNAL DISCOVERY AGENTS (Op programs) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ EDGAR │ │ Macro │ │ Sentiment│ │ Price │ │
│ │ Agent │ │ Agent │ │ Agent │ │ Agent │ │
│ │(Form 4, │ │(FRED, │ │(Earnings │ │(Momentum │ │
│ │ 10-K, │ │ BLS, │ │ calls, │ │ Mean-rev │ │
│ │ 8-K) │ │ Treasury)│ │ News NLP)│ │ Vol) │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │ │
│ └──────────────┴──────────────┴──────────────┘ │
│ │ │
│ Op.par [...] │
│ │ │
│ ┌─────▼──────┐ │
│ │ ALPHA │ │
│ │ COMBINER │ │
│ │ (Bayesian │ │
│ │ update of │ │
│ │ μ, σ, Σ) │ │
│ └─────┬──────┘ │
└──────────────────────────┼───────────────────────────────────────┘
│
JSON output:
updated AssetModel params
│
┌──────────────────────────▼───────────────────────────────────────┐
│ EXISTING HASKELL PIPELINE (Invest.hs) │
│ │
│ AssetModel(μ,σ,yield) → kellyOptimalN → runSimulation │
│ → computeDeltas → rebalancing signals on invest page │
└──────────────────────────────────────────────────────────────────┘
Key Insight: Agent = Eyes & Ears, Haskell = Brain
The LLM agents do information gathering and structuring (what a human analyst does). The math (Kelly, MC, optimization) stays in deterministic Haskell code. The agent NEVER outputs portfolio weights or expected return numbers directly — it outputs structured signal data that deterministic code converts into parameter updates.
2. What Already Exists
Invest.hs (the brain)
- `AssetModel`: per-asset μ (expected return), σ (volatility), yield
- `kellyOptimalN`: N-asset uncorrelated Kelly criterion, f*_i = (μ_i - r_f) / σ_i²
- `runSimulation`: GBM Monte Carlo, expense-adjusted, percentile fan charts
- `computeDeltas`: target vs actual → buy/sell/aligned signals
- Currently uses STATIC config values from Config.hs:
  - `btcRealCagrBase = 0.15` (fixed!), `btcAnnualSigma = 0.65` (fixed!)
  - Equities: `amExpReturn = 0.10`, `amVolatility = 0.18` (hardcoded!)
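The scalar Kelly rule above is simple enough to show inline. A minimal sketch, assuming a 4% risk-free rate; `kellyFraction` is an illustrative name, not the actual Invest.hs function:

```haskell
-- Scalar (uncorrelated) Kelly rule as used by kellyOptimalN:
--   f*_i = (mu_i - r_f) / sigma_i^2
-- `kellyFraction` and the 4% risk-free rate are illustrative assumptions.
kellyFraction :: Double -> Double -> Double -> Double
kellyFraction rf mu sigma = (mu - rf) / (sigma * sigma)
```

With the static Config.hs numbers (μ = 0.15, σ = 0.65) and an assumed r_f = 0.04, this allocates roughly 26% of the bankroll to BTC.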
Op.hs (the agent framework)
- Free monad with: `infer`, `par`, `race`, `tool`, `get`/`put`/`modify`, `limit`, `budget`, `checkpoint`
- `ContextRequest` with dynamic context hydration
- Tools available: `web_search`, `fetch_url`, `run_bash`; plus native Haskell data libraries
- `Programs/Research.hs` already demonstrates the pattern:
  - Fan out with `Op.par [researchTopic t | t <- topics]`
  - CRDT state for conflict-free parallel merging
  - Tool calls for web search + fetch
  - LLM for extraction + synthesis
Config.hs (the parameters to update)
- BTC CAGR scenarios: bear 10%, base 15%, bull 24% (all hardcoded)
- `btcAnnualSigma = 0.65`
- Equities: static 10% return, 18% vol
- These are exactly the numbers the signal pipeline should update dynamically
3. Signal Sources (Free/Public Data)
Tier 1: Easy to implement, well-documented APIs
| Source | Data | Signal Type | API | Update Freq |
|---|---|---|---|---|
| SEC EDGAR | Form 4 insider trades | Insider sentiment | data.sec.gov (free, no key) | Real-time |
| FRED | 840K macro series (M2, yield curve, CPI, unemployment) | Macro regime | api.stlouisfed.org (free key) | Daily-monthly |
| Yahoo Finance | OHLCV price history | Momentum, vol, mean-reversion | Unofficial REST API (or Alpha Vantage) | Daily |
| Treasury.gov | Yield curves | Risk-free rate, term premium | api.fiscaldata.treasury.gov | Daily |
| CFTC COT | Futures positioning | Sentiment/positioning | cftc.gov (CSV) | Weekly |
Tier 2: Moderate effort, high value
| Source | Data | Signal Type | Update Freq |
|---|---|---|---|
| Earnings transcripts | Call text + estimates | PEAD, sentiment | Quarterly |
| Patent filings | USPTO PAIR | Innovation signal | Monthly |
| Job postings | Indeed/LinkedIn scrape | Growth signal | Weekly |
| App store rankings | Apple/Google | Revenue proxy | Daily |
| FINRA Short Interest | Short % of float | Crowding/squeeze | Bi-weekly |
Tier 3: Needs more infra but powerful
| Source | Data | Signal Type |
|---|---|---|
| Reddit/Twitter sentiment | NLP on financial subs | Retail sentiment |
| Satellite/weather | NOAA, Sentinel | Commodity/agriculture |
| Government procurement | SAM.gov, USAspending | Revenue leading indicator |
| Shipping data | AIS vessel tracking | Global trade leading indicator |
Recommendation for v1: Start with Tier 1 only. EDGAR + FRED + market data gives you insider signal + macro regime + price-based signals. That’s a complete foundation.
4. The Alpha Combination Problem
Current Approach: Static Priors (What Invest.hs does now)
μ_BTC = 0.15 (hardcoded)
σ_BTC = 0.65 (hardcoded)
Target Approach: Black-Litterman-style Bayesian Updating
The Black-Litterman model is the right framework here. It:
- Starts with a prior (your current Config.hs values)
- Incorporates views (signals from agents) with confidence levels
- Outputs a posterior (updated μ, σ, Σ)
Concretely:
Prior: μ_prior = [0.15, 0.10, 0.08, 0.10] (BTC, equities, RE, STRD)
Σ_prior = diagonal([0.65², 0.18², 0.12², 0.05²])
Agent views (example):
- "BTC 30-day realized vol is 45%, below historical mean" → confidence 0.7
- "Insider buying cluster in XYZ (equities)" → confidence 0.5
- "Yield curve inverted: recession signal" → confidence 0.6
- "BTC trailing 90-day momentum positive" → confidence 0.4
Bayesian update:
μ_posterior = μ_prior + τΣP'(PτΣP' + Ω)⁻¹(Q - Pμ_prior)
where P = picking matrix, Q = view returns, Ω = view uncertainty
Output: updated AssetModel parameters for Invest.hs
The key insight: the agent produces the views (Q) and confidence levels (Ω), the Haskell code does the Bayesian math.
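For one asset and one absolute view, the update formula above collapses to a scalar and needs no matrix library. A minimal sketch; `blUpdate1` is a hypothetical helper, not the planned `blUpdate`:

```haskell
-- One-asset, one-view special case of the Black-Litterman update (P = [1]):
--   mu' = mu + tau*sigma2 * (tau*sigma2 + omega)^-1 * (q - mu)
-- `blUpdate1` is an illustrative helper name, not from Invest.hs.
blUpdate1
  :: Double -- tau
  -> Double -- prior mean mu
  -> Double -- prior variance sigma^2
  -> Double -- view return q
  -> Double -- view variance omega
  -> Double -- posterior mean
blUpdate1 tau mu sigma2 q omega =
  let ts = tau * sigma2
   in mu + ts / (ts + omega) * (q - mu)
```

For the BTC prior (μ = 0.15, σ² = 0.65²) with τ = 0.05 and a view q = 0.25 held with variance ω = 0.02, the posterior mean lands near 0.20: the view moves the prior about halfway because τσ² ≈ ω.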
What needs to be built in Invest.hs:
- Correlation matrix support (currently assumes uncorrelated):
  - Full Kelly: f* = Σ⁻¹(μ - r_f) (matrix inverse, not scalar division)
  - Cholesky decomposition for correlated MC paths
- Black-Litterman update function:
  blUpdate :: CovMatrix -> [View] -> (ReturnVector, CovMatrix)
- Signal → View converter:

-- Agent output format
data Signal = Signal
  { sigAsset :: Text
  , sigType :: SignalType -- Momentum | MeanReversion | Insider | Macro | ...
  , sigStrength :: Double -- z-score or normalized value
  , sigConfidence :: Double -- 0-1
  , sigSource :: Text -- provenance
  , sigTimestamp :: UTCTime
  }

-- Convert to Black-Litterman view
signalToView :: Signal -> View
5. Native Haskell Data Libraries + Op Programs
Architecture Decision
No external Python calls. All data access is native Haskell, using typed wrappers
around REST/JSON APIs built on http-conduit (same stack as Omni.Agent.Tools.Http).
Three stepping-stone libraries:
- `Omni.Fund.Data.Edgar` — SEC EDGAR API (Form 4, company facts, submissions)
- `Omni.Fund.Data.Market` — price/volume data (Alpha Vantage or similar REST API)
- `Omni.Fund.Data.Fred` — Federal Reserve Economic Data (macro series)
These are standalone, useful libraries independent of the agent pipeline. They become the data foundation that Op programs call via typed Haskell functions rather than shelling out to Python.
5.1 Omni.Fund.Data.Edgar
SEC EDGAR API is free, no auth, JSON at data.sec.gov.
Key endpoints:
- `GET /submissions/CIK{cik}.json` — company filing history
- `GET /api/xbrl/companyfacts/CIK{cik}.json` — all XBRL facts for a company
- `GET /api/xbrl/companyconcept/CIK{cik}/{taxonomy}/{tag}.json` — specific concept
- `GET /api/xbrl/frames/{taxonomy}/{tag}/{unit}/{period}.json` — cross-company frames
Ref: https://www.sec.gov/search-filings/edgar-application-programming-interfaces
module Omni.Fund.Data.Edgar where
-- | Company submissions (filing history)
data Submissions = Submissions
{ subCik :: Text
, subName :: Text
, subTickers :: [Text]
, subFilings :: [Filing]
}
data Filing = Filing
{ filingType :: Text -- "4", "10-K", "8-K", etc.
, filingDate :: Day
, filingAccNo :: Text -- accession number
, filingUrl :: Text
}
-- | Insider transaction from Form 4
data InsiderTransaction = InsiderTransaction
{ itReportingPerson :: Text
, itRelationship :: Text -- "Officer", "Director", "10% Owner"
, itTransactionType :: Text -- "P" (purchase), "S" (sale)
, itShares :: Double
, itPricePerShare :: Double
, itDate :: Day
, itTicker :: Text
}
-- Core API functions
getSubmissions :: Text -> IO (Either EdgarError Submissions)
getForm4Filings :: Text -> Int -> IO (Either EdgarError [InsiderTransaction])
getCompanyFacts :: Text -> IO (Either EdgarError CompanyFacts)
lookupCik :: Text -> IO (Either EdgarError Text) -- ticker -> CIK
Note: EDGAR requires a User-Agent header with contact info per SEC policy.
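One EDGAR quirk worth encoding: URLs embed the CIK zero-padded to 10 digits. A small pure sketch (String helpers for brevity; the real module would use Text):

```haskell
-- EDGAR URLs zero-pad the CIK to 10 digits, e.g. Apple (CIK 320193):
--   https://data.sec.gov/submissions/CIK0000320193.json
-- Illustrative helpers, not the actual Edgar module API.
padCik :: String -> String
padCik cik = replicate (10 - length cik) '0' ++ cik

submissionsUrl :: String -> String
submissionsUrl cik = "https://data.sec.gov/submissions/CIK" ++ padCik cik ++ ".json"
```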
5.2 Omni.Fund.Data.Market
For price/volume data. Alpha Vantage has a free tier (25 req/day) with clean REST API. Alternatively, Twelve Data or Polygon.io. All are simple JSON APIs.
module Omni.Fund.Data.Market where
data OHLCV = OHLCV
{ oDate :: Day
, oOpen :: Double
, oHigh :: Double
, oLow :: Double
, oClose :: Double
, oVolume :: Integer
}
data TimeSeriesInterval = Daily | Weekly | Monthly
-- Core API functions
getDailyPrices :: Text -> Int -> IO (Either MarketError [OHLCV])
-- ^ ticker, num days
-- Derived computations (pure Haskell, no API call)
trailingReturn :: Int -> [OHLCV] -> Double
-- ^ window size, price history -> annualized return
realizedVol :: Int -> [OHLCV] -> Double
-- ^ window size -> annualized volatility
meanReversionZ :: Int -> [OHLCV] -> Double
-- ^ SMA window -> z-score of current price vs SMA
correlationMatrix :: [[OHLCV]] -> Matrix Double
-- ^ price histories for N assets -> NxN correlation matrix
The pure computation functions (trailing return, vol, z-score, correlation) are deterministic math that lives in Haskell — no LLM, no external calls. These replace the numpy computations from the original Python design.
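A minimal sketch of two of those pure computations, operating on closing prices (`[Double]`, oldest first) rather than `[OHLCV]` for brevity; the primed names are illustrative stand-ins for the proposed Market functions, with 252 trading days per year assumed:

```haskell
-- Daily log returns from a close series (oldest first)
dailyLogReturns :: [Double] -> [Double]
dailyLogReturns closes = zipWith (\p0 p1 -> log (p1 / p0)) closes (drop 1 closes)

-- Annualized volatility of the last n daily log returns (sample stdev)
realizedVol' :: Int -> [Double] -> Double
realizedVol' n closes =
  let rs = takeLast n (dailyLogReturns closes)
      k = fromIntegral (length rs)
      m = sum rs / k
      var = sum [(r - m) ^ (2 :: Int) | r <- rs] / (k - 1)
   in sqrt var * sqrt 252

-- Annualized return over the last n trading days
trailingReturn' :: Int -> [Double] -> Double
trailingReturn' n closes =
  let ps = takeLast (n + 1) closes
   in (last ps / head ps) ** (252 / fromIntegral n) - 1

takeLast :: Int -> [a] -> [a]
takeLast n xs = drop (length xs - n) xs
```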
5.3 Omni.Fund.Data.Fred
FRED API: free with API key, REST/JSON, well-documented. Ref: https://fred.stlouisfed.org/docs/api/fred/
Evaluate gborough/fred on Hackage first — if it works, use it. If stale, write a
thin wrapper (the API is ~10 endpoints, mostly series/observations).
module Omni.Fund.Data.Fred where
data FredSeries = FredSeries
{ fsId :: Text -- e.g. "T10Y2Y"
, fsTitle :: Text
, fsFrequency :: Text -- "Daily", "Monthly"
, fsUnits :: Text
}
data Observation = Observation
{ obsDate :: Day
, obsValue :: Maybe Double -- FRED uses "." for missing
}
-- Core API functions
getSeriesObservations :: Text -> Day -> Day -> IO (Either FredError [Observation])
-- ^ series_id, start, end
getLatestValue :: Text -> IO (Either FredError Double)
-- ^ series_id -> most recent observation
-- Convenience: pull a batch of key macro series
data MacroSnapshot = MacroSnapshot
{ msYieldCurve :: Double -- T10Y2Y
, msM2Growth :: Double -- M2SL yoy change
, msUnemployment :: Double -- UNRATE
, msCPI :: Double -- CPIAUCSL
, msHYSpread :: Double -- BAMLH0A0HYM2
, msVIX :: Double -- VIXCLS
, msTimestamp :: UTCTime
}
getMacroSnapshot :: IO (Either FredError MacroSnapshot)
Key FRED series for signal pipeline:
- `T10Y2Y` (10Y-2Y spread → yield curve shape)
- `M2SL` (M2 money supply → liquidity)
- `UNRATE` (unemployment → cycle position)
- `CPIAUCSL` (CPI → inflation)
- `BAMLH0A0HYM2` (high-yield spread → credit risk)
- `VIXCLS` (VIX → implied vol)
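The `msM2Growth` field needs a small derivation step: year-over-year change from monthly observations. A sketch under the assumption that observations arrive oldest first, simplified to `[Maybe Double]` (the real module carries dates; `Nothing` models FRED's "." missing values):

```haskell
-- Year-over-year change: latest non-missing value vs the latest value
-- at least 12 observations earlier. Illustrative helper, not the Fred API.
yoyChange :: [Maybe Double] -> Maybe Double
yoyChange obs = do
  latest <- lastJust obs
  yearAgo <- lastJust (dropLast 12 obs)
  pure ((latest - yearAgo) / yearAgo)
  where
    lastJust = foldl (\acc v -> maybe acc Just v) Nothing
    dropLast n xs = take (length xs - n) xs
```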
5.4 Op Programs Using Native Libraries
With the data libraries in place, the Op programs become clean compositions:
signalScan :: [Text] -> Op SignalState [Signal]
signalScan assets = do
Op.checkpoint "init"
results <- Op.par
[ insiderSignals assets
, macroSignals
, priceSignals assets
]
Op.checkpoint "signals-gathered"
let allSignals = concat results
-- LLM assesses cross-signal coherence
coherenceAdjusted <- assessCoherence allSignals
pure coherenceAdjusted
-- Insider signals: native EDGAR API, LLM interprets
checkInsider :: Text -> Op SignalState [Signal]
checkInsider ticker = do
-- Direct Haskell call, no Python
filings <- Op.io $ Edgar.getForm4Filings ticker 10
case filings of
Left err -> do
Op.log ("EDGAR error for " <> ticker <> ": " <> show err)
pure []
Right txns -> do
-- Filter significant purchases
let significant = filter isSignificantPurchase txns
-- LLM interprets patterns (optional — could be pure rules)
if null significant
then pure []
else do
response <- Op.infer (Op.Model "claude-sonnet-4-20250514")
defaultContextRequest
{ crObservation = "Analyze these insider transactions for " <> ticker
<> ":\n" <> formatTransactions significant
, crGoal = Just "Extract insider trading signals"
}
pure (parseInsiderSignals response)
-- Macro signals: native FRED API, LLM interprets regime
macroSignals :: Op SignalState [Signal]
macroSignals = do
snapshot <- Op.io Fred.getMacroSnapshot
case snapshot of
Left err -> do
Op.log ("FRED error: " <> show err)
pure []
Right ms -> do
response <- Op.infer (Op.Model "claude-sonnet-4-20250514")
defaultContextRequest
{ crObservation = formatMacroSnapshot ms
, crGoal = Just "Assess macro regime and generate signals"
}
pure (parseMacroSignals response)
-- Price signals: native Market API, pure Haskell math, no LLM needed
priceSignals :: [Text] -> Op SignalState [Signal]
priceSignals tickers = do
  now <- Op.io getCurrentTime -- Signal carries a timestamp (sigTimestamp)
  histories <- Op.io $ mapM (\t -> (t,) <$> Market.getDailyPrices t 252) tickers
  pure $ concatMap (mkSignals now) histories
  where
    mkSignals now (ticker, Right prices) =
      [ Signal ticker Momentum (Market.trailingReturn 63 prices) 0.7 "market_90d" now
      , Signal ticker Volatility (negate $ Market.realizedVol 63 prices) 0.8 "market_vol" now
      , Signal ticker MeanReversion (Market.meanReversionZ 200 prices) 0.5 "market_zscore" now
      ]
    mkSignals _ (_, Left _) = []
Note: priceSignals is entirely deterministic — pure Haskell math on market data.
No LLM involvement. The agent framework is used for orchestration (Op.par, Op.io)
but the computation is typed, testable, and reproducible.
5.5 Implementation Order
1. `Omni.Fund.Data.Edgar` — most valuable signal source, clean API, no auth needed
2. `Omni.Fund.Data.Market` — needed for price signals and correlation matrix
3. `Omni.Fund.Data.Fred` — macro context; evaluate `gborough/fred` first
4. Wire into Invest.hs — replace static μ/σ with data-driven estimates
5. Op programs — orchestrate the above with the agent framework
Steps 1-3 are independently useful even without the agent pipeline.
6. Integration with Invest.hs
6.1 Signal Output Format (JSON)
{
"timestamp": "2026-03-11T03:00:00Z",
"signals": [
{
"asset": "BTC",
"type": "momentum",
"strength": 1.2,
"confidence": 0.6,
"source": "market_90d_trailing",
"detail": "90-day trailing return annualized: 42%"
},
{
"asset": "BTC",
"type": "volatility",
"strength": -0.8,
"confidence": 0.8,
"source": "market_realized_vol",
"detail": "63-day realized vol: 45% vs historical 65%"
},
{
"asset": "equities",
"type": "insider",
"strength": 0.5,
"confidence": 0.4,
"source": "edgar_form4",
"detail": "3 C-suite purchases >$100K in SPY components this week"
},
{
"asset": "ALL",
"type": "macro_regime",
"strength": -0.3,
"confidence": 0.5,
"source": "fred_composite",
"detail": "Yield curve flat, M2 growth decelerating, VIX elevated"
}
],
"correlation_matrix": {
"assets": ["BTC", "equities", "real_estate", "STRD"],
"matrix": [[1.0, 0.45, 0.1, 0.05],
[0.45, 1.0, 0.3, 0.1],
[0.1, 0.3, 1.0, 0.05],
[0.05, 0.1, 0.05, 1.0]]
},
"updated_params": {
"BTC": {"mu": 0.18, "sigma": 0.55},
"equities": {"mu": 0.11, "sigma": 0.18},
"real_estate": {"mu": 0.08, "sigma": 0.12},
"STRD": {"mu": 0.10, "sigma": 0.05}
}
}
6.2 New Haskell Code Needed
-- In Invest.hs or a new SignalIntegration.hs module:
-- | Read signal file produced by the agent pipeline
readSignals :: FilePath -> IO (Either Text SignalBundle)
-- | Update AssetModel parameters using Bayesian update
applySignals :: PortfolioModel -> SignalBundle -> PortfolioModel
-- | Black-Litterman update (the core math)
blUpdate
:: Vector Double -- prior returns (μ)
-> Matrix Double -- prior covariance (Σ)
-> Double -- tau (confidence scalar, ~0.05)
-> Matrix Double -- picking matrix (P)
-> Vector Double -- views (Q)
-> Matrix Double -- view uncertainty (Ω)
-> (Vector Double, Matrix Double) -- posterior (μ', Σ')
-- | Full Kelly with correlation matrix
kellyOptimalCorrelated
:: Double -- risk-free rate
-> Vector Double -- expected returns
-> Matrix Double -- covariance matrix
-> Vector Double -- optimal fractions (f* = Σ⁻¹(μ - r_f))
-- | Correlated GBM paths (Cholesky decomposition)
simulateCorrelated
:: Matrix Double -- Cholesky factor of Σ
-> ... -- same args as current simulateOnePath
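The Cholesky step can be sketched in pure Haskell for intuition; this list-based `cholesky` is an illustrative helper for a handful of assets, not the production path (which would use hmatrix's `chol`):

```haskell
-- Lower-triangular L with A = L * transpose L, for a small SPD matrix.
-- List-based with (!!) overhead; fine for ~5 assets, production code
-- would use hmatrix. Relies on laziness: l is defined in terms of itself,
-- but each entry only reads entries from earlier rows/columns.
cholesky :: [[Double]] -> [[Double]]
cholesky a = l
  where
    n = length a
    l = [[entry i j | j <- [0 .. n - 1]] | i <- [0 .. n - 1]]
    entry i j
      | j > i = 0
      | i == j =
          sqrt (a !! i !! i - sum [(l !! i !! k) ^ (2 :: Int) | k <- [0 .. j - 1]])
      | otherwise =
          (a !! i !! j - sum [(l !! i !! k) * (l !! j !! k) | k <- [0 .. j - 1]])
            / (l !! j !! j)
```

A correlated MC step then draws independent standard normals z and uses L·z as the correlated shocks for the GBM paths.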
6.3 Integration Flow
1. Agent pipeline runs (daily cron or on-demand):
- Op program executes signalScan
- Outputs JSON to /var/fund/signals.json
2. fund-data daemon picks up signals.json on next refresh cycle (every 15 min)
3. Invest.hs reads signals.json:
- Applies Bayesian update to prior μ/σ
- Computes correlated Kelly weights
- Runs MC simulation with updated parameters
- Outputs deltas for invest page
4. Invest page shows:
- Current signal readings with confidence
- How signals changed the expected returns
- Updated Kelly weights vs current allocation
- MC fan chart with signal-adjusted parameters
7. Relevant Literature
AlphaAgent (Feb 2025)
- Paper: https://arxiv.org/abs/2502.16789
- Three-agent architecture: Idea Agent → Factor Agent → Eval Agent
- Key insight: LLM-generated factors suffer from homogenization (all LLMs
converge on similar strategies). AlphaAgent combats this via:
- AST-based originality checks (reject factors too similar to existing ones)
- Hypothesis-factor alignment (semantic consistency between the market hypothesis and the mathematical factor)
- Complexity control (prevent overfitting to noise)
- Results: 11% annual excess return (IR=1.5) on CSI 500 over 4 years
- Relevance to us: the three-agent pattern maps well to Op's `par` composition. But their factors are formulaic (mathematical expressions over price/volume), while our approach is broader (multi-source, including fundamental + alt data).
Black-Litterman Model
- PyPortfolioOpt implementation: https://pyportfolioopt.readthedocs.io/en/latest/BlackLitterman.html
- Perfect framework for combining agent “views” with our prior assumptions
- Key params: τ (tau) controls how much views influence posterior (~0.05 default)
- Idzorek’s method converts percentage confidences to view uncertainty matrix
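A commonly used closed-form approximation of Idzorek's conversion (the one PyPortfolioOpt's implementation is based on) is simple enough to sketch for the scalar one-asset-view case; `idzorekOmega` is an illustrative name:

```haskell
-- Closed-form Idzorek-style conversion: a percentage confidence c in (0,1]
-- becomes view variance
--   omega = tau * (1 - c) / c * (p Sigma p')
-- Scalar sketch for a one-asset view with prior variance sigma2.
idzorekOmega :: Double -> Double -> Double -> Double
idzorekOmega tau c sigma2 = tau * (1 - c) / c * sigma2
```

c → 1 drives ω → 0 (a certain view dominates the posterior); c → 0 drives ω → ∞ (the view is ignored).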
Post-Earnings Announcement Drift (PEAD)
- One of the most robust anomalies in finance (documented since 1968)
- Stocks drift 3-5% in direction of earnings surprise over 60+ days
- Survives even after publication (behavioral, not arbitrage-limited)
- Actionable for our pipeline: scrape earnings + estimate surprises
8. Implementation Plan
Phase 1: Market Data Foundation (1-2 weeks)
- Implement `Omni.Fund.Data.Market` — price/volume data from REST API (Alpha Vantage or similar)
- Implement `Omni.Fund.Data.Edgar` — SEC EDGAR API wrapper (Form 4, submissions)
- Implement `Omni.Fund.Data.Fred` — FRED macro data wrapper (or evaluate `gborough/fred`)
- Add pure Haskell computations: trailing returns, realized vol, correlation matrix
- Create `SignalBundle` type in Invest.hs; wire data libs → signal output
- Run as daily cron job (systemd timer)
This phase involves NO LLM. Just native Haskell data access + math.
No Python in the loop — all three data libraries are thin typed wrappers around REST/JSON APIs,
built on the same http-conduit stack as Omni.Agent.Tools.Http.
Phase 2: Bayesian Integration (1-2 weeks)
- Implement `blUpdate` in Invest.hs (use hmatrix for the matrix math)
- Implement `kellyOptimalCorrelated` (replace the uncorrelated version)
- Implement Cholesky-correlated GBM for Monte Carlo
- Wire signals.json → updated AssetModel → invest page
- Add signal display to invest page (what signals are active, confidence)
This phase is pure Haskell math. Still no LLM.
Phase 3: Agent-Augmented Signals (2-3 weeks)
- Write first Op program: `signalScan` using `Op.par` for parallel sources
- Insider signal agent (EDGAR + LLM interpretation)
- Macro regime agent (FRED data + LLM interpretation)
- Cross-signal coherence assessment (LLM synthesizes multiple signal readings)
- Run via Op.Runner as scheduled task or on-demand
This is where the LLM enters. The agent interprets data, not computes numbers.
Phase 4: Feedback & Decay Tracking (ongoing)
- Track predicted vs realized returns per signal
- Information coefficient (IC) measurement per signal type
- Signal decay curves (is the signal losing predictive power?)
- Kill switch: auto-disable signals whose IC drops below threshold
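The IC measurement and kill switch above can be sketched directly; a minimal version, with all names illustrative and the floor value a placeholder rather than a recommendation:

```haskell
-- Information coefficient: Pearson correlation between a signal's predicted
-- returns and the subsequently realized returns. Illustrative helper.
infoCoefficient :: [Double] -> [Double] -> Double
infoCoefficient predicted realized =
  cov predicted realized / (stdev predicted * stdev realized)
  where
    mean xs = sum xs / fromIntegral (length xs)
    cov xs ys =
      let mx = mean xs
          my = mean ys
       in sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
            / fromIntegral (length xs)
    stdev xs = sqrt (cov xs xs)

-- Kill switch: a signal stays enabled only while its rolling IC is at or
-- above a chosen floor.
signalAlive :: Double -> Double -> Bool
signalAlive icFloor ic = ic >= icFloor
```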
9. Missing Pieces / Gaps
In Invest.hs:
- Correlation matrix — currently assumes uncorrelated. Need `hmatrix` or similar for linear algebra (matrix inverse, Cholesky).
- Dynamic parameters — currently reads static Config.hs. Need to read from signals.json and fall back to Config.hs defaults if no signals are available.
- Signal display — the invest page needs a "signals" section showing current readings and how they're affecting the model.
In Op infrastructure:
- Scheduled execution — need a way to run Op programs on a cron schedule. Could use systemd timer + op-runner CLI, or integrate with agentd.
- Signal persistence — signals should be stored with timestamps so we can track decay over time. SQLite or just JSONL append?
Risks:
- Garbage in, garbage out — if the LLM misinterprets a signal, the Bayesian update will propagate the error. Mitigate with conservative τ (low confidence in views) and confidence clamping.
- Overfitting — backtesting on the same data used to develop signals. Need out-of-sample validation period.
- Latency — EDGAR filings are public instantly but our pipeline runs daily. For insider trading signals, same-day is fine. For price momentum, daily is fine. HFT signals are out of scope.
10. Quick Win: Dynamic μ/σ from Price Data (No LLM)
Before building the full agent pipeline, the single highest-value change:
-- Omni/Fund/UpdateParams.hs — run as daily systemd timer
-- Uses native Haskell data libraries, no external dependencies
module Omni.Fund.UpdateParams where
import Control.Monad (forM)
import Data.Aeson (encode)
import qualified Data.ByteString.Lazy as BL
import Data.Time.Clock (getCurrentTime)
import Omni.Fund.Data.Market (getDailyPrices, trailingReturn, realizedVol, correlationMatrix)
assets :: [(Text, Text)] -- (API ticker, internal name)
assets = [("BTC-USD", "BTC"), ("SPY", "equities")]
updateParams :: IO ()
updateParams = do
  now <- getCurrentTime
  -- Pull ~2 years of daily prices per asset; fail loudly on API errors
  priceHistories <- forM assets $ \(ticker, name) -> do
    result <- getDailyPrices ticker 504 -- ~2 years of trading days
    case result of
      Left err -> error ("price fetch failed for " <> show ticker <> ": " <> show err)
      Right prices -> do
        let mu = trailingReturn 252 prices -- 1-year trailing return
            sigma = realizedVol 252 prices -- 1-year realized vol
        pure (name, mu, sigma, prices)
  -- Compute correlation matrix across all assets
  let allPrices = [ps | (_, _, _, ps) <- priceHistories]
      corrMatrix = correlationMatrix allPrices
  -- Write signal bundle for Invest.hs to pick up
  let bundle = SignalBundle
        { sbTimestamp = now
        , sbParams = [(name, mu, sigma) | (name, mu, sigma, _) <- priceHistories]
        , sbCorrelation = corrMatrix
        }
  BL.writeFile "/var/fund/signals.json" (encode bundle)
This alone would make the invest page responsive to actual market conditions instead of using hardcoded assumptions. It’s the minimum viable signal pipeline. No Python, no external processes — just a Haskell executable on a timer.
11. Summary of Decisions Needed
1. Haskell matrix library: `hmatrix` (C bindings, fast) vs pure Haskell (`matrix`, `linear`)? hmatrix is standard but has a C dependency.
2. Signal storage format: JSON file (simple) vs SQLite (queryable, historical)? Recommend: JSON file for v1, migrate to SQLite when tracking signal decay.
3. Scheduling: systemd timer (simple) vs agentd integration (fancy)? Recommend: systemd timer for Phases 1-2, agentd for Phase 3+ when Op programs need budget/checkpoint/steering support.
4. How much to trust agent views: the τ parameter in Black-Litterman controls this. Start very conservative (τ = 0.01, views barely nudge the prior). Increase as we validate signal quality with realized IC measurements.
5. Scope of asset universe: current Invest.hs tracks ~5 assets (BTC, STRD, equities, RE, cash). Do we want to expand to individual stocks? Recommend: no for v1. Keep the asset universe small and focus on getting the pipeline working. Individual stock signals can come later.
References
- AlphaAgent paper: https://arxiv.org/abs/2502.16789
- SEC EDGAR APIs: https://www.sec.gov/search-filings/edgar-application-programming-interfaces
- FRED API: https://fred.stlouisfed.org/docs/api/fred/
- Black-Litterman in PyPortfolioOpt: https://pyportfolioopt.readthedocs.io/en/latest/BlackLitterman.html
- Free quant data sources: https://tradescopeblog.info/article/top-10-free-datasets-every-retail-quant-should-bookmark-in-2025
- awesome-quant resource list: https://github.com/wilsonfreitas/awesome-quant