LLM-powered topic detection.
Replaced single-word clustering with an n-gram (bigram/trigram) approach using TF-IDF weighting and greedy cluster deduplication. Filters out common, low-signal words. Commit: e4117ae2
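For reference, a minimal sketch of the n-gram TF-IDF scoring described here (the greedy dedup step is omitted, and the stopword set, function names, and corpus representation are illustrative rather than what Cluster.hs actually uses):

    import Data.Char (isAlpha, toLower)
    import Data.List (tails)
    import qualified Data.Map.Strict as Map
    import qualified Data.Set as Set

    -- Lowercase, tokenize, drop stopwords, and build bigrams/trigrams.
    ngrams :: Set.Set String -> String -> [String]
    ngrams stopwords doc =
      let ws = filter (`Set.notMember` stopwords)
             . words
             . map (\c -> if isAlpha c then toLower c else ' ')
             $ doc
          grams n = map (unwords . take n) . filter ((>= n) . length) . tails $ ws
      in grams 2 ++ grams 3

    -- TF-IDF: term frequency in one document times inverse document frequency
    -- across the corpus, so n-grams common to every article score low.
    tfidf :: [[String]] -> [String] -> Map.Map String Double
    tfidf corpus doc =
      let nDocs = fromIntegral (length corpus)
          df t  = fromIntegral (length (filter (t `elem`) corpus))
          tf    = Map.fromListWith (+) [(t, 1) | t <- doc]
      in Map.mapWithKey (\t c -> c * log (nDocs / (1 + df t))) tf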
Reopening: the previous implementation used n-gram TF-IDF instead of actual embeddings. This time we want real embedding-based clustering:
1) Generate embeddings on article ingest using ollama (nomic-embed-text) and store them in the existing embedding_vector column.
2) Cluster on cosine similarity of the embedding vectors, not TF-IDF n-grams.
3) The current n-gram approach produces poor cluster labels ('Breakfast Cereal', 'Day Kinks'); embeddings should give semantically meaningful groupings.
The embedding storage plumbing already exists (updateArticleEmbedding, vectorToBlob/blobToVector in Article.hs); it just needs to be wired into ingest and used in Cluster.hs.
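A sketch of the ingest-side call, assuming ollama's /api/embeddings endpoint and http-conduit's Network.HTTP.Simple; embedText and EmbeddingResponse are illustrative names, and the returned vector would be handed to the existing vectorToBlob/updateArticleEmbedding plumbing:

    {-# LANGUAGE OverloadedStrings #-}
    import Data.Aeson (FromJSON (..), object, withObject, (.:), (.=))
    import Network.HTTP.Simple (getResponseBody, httpJSON, parseRequest, setRequestBodyJSON)

    newtype EmbeddingResponse = EmbeddingResponse { embedding :: [Double] }

    instance FromJSON EmbeddingResponse where
      parseJSON = withObject "EmbeddingResponse" $ \o ->
        EmbeddingResponse <$> o .: "embedding"

    -- Ask a local ollama instance for the nomic-embed-text embedding of an
    -- article's text (a 768-dimensional vector).
    embedText :: String -> IO [Double]
    embedText text = do
      req <- parseRequest "POST http://localhost:11434/api/embeddings"
      let body = object ["model" .= ("nomic-embed-text" :: String), "prompt" .= text]
      response <- httpJSON (setRequestBodyJSON body req)
      pure (embedding (getResponseBody response))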
Implemented embedding-based clustering using ollama nomic-embed-text (768-dim vectors). Cosine similarity threshold 0.55 with greedy neighborhood selection. Backfilled 4231 articles. The Topics page responds in ~2s. Falls back to the n-gram approach when embeddings are unavailable. Commit: 0a9b5434
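One plausible reading of "cosine similarity threshold 0.55, greedy neighborhood selection", as a sketch: the integer IDs stand in for article keys, and the real Cluster.hs presumably works over vectors decoded from embedding_vector via blobToVector:

    import Data.List (maximumBy)
    import Data.Ord (comparing)

    -- Cosine similarity between two embedding vectors (0 if either is zero).
    cosine :: [Double] -> [Double] -> Double
    cosine a b
      | na == 0 || nb == 0 = 0
      | otherwise          = sum (zipWith (*) a b) / (na * nb)
      where
        na = norm a
        nb = norm b
        norm = sqrt . sum . map (^ 2)

    -- Greedy neighborhood selection: pick the article whose neighborhood
    -- (everything at or above the similarity threshold) is largest, emit that
    -- neighborhood as one cluster, drop its members from the pool, repeat.
    greedyClusters :: Double -> [(Int, [Double])] -> [[Int]]
    greedyClusters threshold = go
      where
        go [] = []
        go pool =
          let neighborhood (i, v) =
                [j | (j, w) <- pool, j == i || cosine v w >= threshold]
              best    = maximumBy (comparing (length . neighborhood)) pool
              cluster = neighborhood best
              rest    = [(j, w) | (j, w) <- pool, j `notElem` cluster]
          in cluster : go rest

With the threshold from this entry, the call would look like greedyClusters 0.55 articleVectors.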
Current clustering groups articles by a single keyword only. Needs proper multi-word topic extraction (TF-IDF, LLM summarization, or embedding-based clustering). The infrastructure exists in Omni/Newsreader/Cluster.hs.