Build a local document search system for ~/notes and ~/org that supports hybrid search (BM25 + vector + optional LLM rerank). This will enable both interactive CLI search and programmatic access as an agentd tool.
qmd by Tobi Lütke: https://github.com/tobi/qmd
Key ideas from qmd carry over, but rather than copying it directly, build a Python implementation that:
1. Uses pandoc for parsing - handles both markdown and orgmode natively via JSON AST
2. Chunking by heading - split the pandoc AST on Header nodes, preserving hierarchy
3. SQLite FTS5 for BM25 - fast, embedded, no external deps
4. Ollama for embeddings - local vector embeddings (nomic-embed-text or similar)
5. Optional LLM rerank - can add later if BM25+vector isn't sufficient
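Points 1 and 2 (pandoc JSON AST parsing, heading-based chunking) can be sketched roughly as follows. This is a minimal sketch, not the implementation: the function names (parse_file, chunk_by_heading, stringify) are placeholders, and only the Str/Space inline types are handled; a real indexer would cover more of the pandoc AST.

```python
import json
import subprocess


def parse_file(path):
    # Requires pandoc on PATH; pandoc infers the input format
    # (markdown, org, ...) from the file extension.
    out = subprocess.run(["pandoc", "-t", "json", str(path)],
                         check=True, capture_output=True)
    return json.loads(out.stdout)["blocks"]


def stringify(node):
    """Concatenate plain text from a pandoc AST fragment (Str/Space subset)."""
    if isinstance(node, dict):
        if node.get("t") == "Str":
            return node["c"]
        if node.get("t") == "Space":
            return " "
        return stringify(node.get("c", []))
    if isinstance(node, list):
        return "".join(stringify(n) for n in node)
    return ""


def chunk_by_heading(blocks):
    """Split a pandoc block list into one chunk per heading section."""
    chunks = []
    current = {"heading": None, "level": 0, "text": []}
    for block in blocks:
        if block["t"] == "Header":
            # Close out the previous chunk (keep preamble text before
            # the first heading, drop the empty initial placeholder).
            if current["heading"] is not None or current["text"]:
                chunks.append(current)
            level, _attr, inlines = block["c"]
            current = {"heading": stringify(inlines), "level": level, "text": []}
        else:
            current["text"].append(stringify(block.get("c", [])))
    chunks.append(current)
    return chunks
```

Working on the JSON AST rather than raw text means org and markdown files go through exactly the same chunking path.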
1. Walk ~/notes and ~/org, parse each file with pandoc -t json
2. Chunk by heading/section (pandoc AST makes this straightforward)
3. Store chunks in SQLite with FTS5 for BM25, plus a vector table for embeddings
4. CLI interface: notes-search "query" returns ranked results with file:line references
5. Agentd tool endpoint for programmatic access
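The FTS5 storage and BM25 query from steps 3-4 might look like the sketch below. The table name `chunks` and its column layout are assumptions, and it requires an SQLite build with the FTS5 extension (included in standard CPython builds, but not guaranteed everywhere).

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index over (path, heading, body) chunks."""
    db = sqlite3.connect(":memory:")
    # UNINDEXED stores the path for display without tokenizing it.
    db.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5(path UNINDEXED, heading, body)"
    )
    db.executemany("INSERT INTO chunks VALUES (?, ?, ?)", rows)
    return db


def search(db, query, limit=10):
    """Return (path, heading) pairs ranked by BM25 (lower score = better)."""
    return db.execute(
        "SELECT path, heading FROM chunks WHERE chunks MATCH ? "
        "ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()
```

A real index would use a file-backed database and also carry line offsets so the CLI can emit file:line references.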
Suggested location: Omni/Notes/ or Omni/Search/
Files:
Omni/Notes/Search.py - core indexing and search logic
Omni/Notes/Cli.py - CLI interface
Omni/Notes/Tool.py - agentd tool wrapper (optional, can add later)

Dependencies:
subprocess for pandoc calls
sqlite3 for FTS5
ollama python client for embeddings
numpy for vector similarity if not using sqlite-vec

Acceptance: notes-search "agentd architecture" returns relevant results from ~/notes

Phase 2 complete: Vector embeddings and hybrid search
Implemented:
Both phases now complete. Remaining optional enhancements:
Phase 1 complete: BM25 search working via notes-search CLI
Implemented:
Next: Phase 2 (vector embeddings via ollama)
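The embedding step above, plus the numpy similarity fallback from the dependency list, could be sketched as follows. This assumes a running ollama server and the `ollama` python package; the embed() helper name is a placeholder, and the exact shape of the client's response may vary by package version.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (numpy fallback to sqlite-vec)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def embed(text, model="nomic-embed-text"):
    # Assumption: ollama client exposes embeddings(model=..., prompt=...)
    # and the response carries the vector under "embedding". Requires a
    # local ollama server with the model pulled.
    import ollama
    return ollama.embeddings(model=model, prompt=text)["embedding"]
```

Hybrid search then amounts to scoring each BM25 candidate's stored embedding against the query embedding and merging the two rankings.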