bild hangs on Haskell targets: Conduit.streamingProcess never spawns nix-instantiate

t-657 · WorkTask · Omni/Bild.hs
Created 1 month ago · Updated 1 month ago

Description

Summary

bild hangs indefinitely when building any Haskell target that requires nix-instantiate (i.e., not already cached in the nix store). Python targets and cached Haskell targets work fine.

Symptoms

  • bild Omni/Task.hs shows "." (analyzing) then "+" (building), then hangs
  • The hang lasts until bild's internal 10-minute timeout kills the process
  • No nix-instantiate process is ever spawned (confirmed via strace)
  • The bild process sits at ~17% CPU with 6 threads in futex/epoll/poll wait states
  • nix-instantiate and nix-build work perfectly when called directly from the shell
  • Python targets (bild Omni/App.py) build instantly — they use a different code path
  • Already-cached Haskell targets return exit 0 immediately (no rebuild needed)
  • System-wide: happens from any terminal, ssh session, tmux window
  • Restarting nix-daemon does not help
  • The same bild binary (store path /nix/store/gwrag941wnnn77s25kypcrqpl8bihvn1-bild) was working earlier on 2026-02-18

Root Cause (narrowed down)

The hang is in the run function in Omni/Bild.hs (~line 1629), specifically in its call to Conduit.streamingProcess, the Conduit library's wrapper around System.Process.createProcess. The analysis phase completes successfully and the build worker picks up the target, but when nixBuild calls run with the nix-instantiate Proc, the child process is never spawned.

Relevant code path:

pipelineBuildOne → nixBuild → instantiate → run → Conduit.streamingProcess (HANGS HERE)

The run function also sets Process.create_group = True on the process config, which puts the child in a new process group. This has not been ruled out as a factor.
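
To isolate the failing call from the rest of bild, a standalone reproduction along these lines might help. This is only a sketch and not the code in Omni/Bild.hs: spawnViaConduit is a hypothetical helper, the stream types (closed stdin, inherited stdout/stderr) are an assumption about how run is wired, and `nix-instantiate --version` stands in for the real invocation. It assumes conduit-extra is available and that the program is built with -threaded, as in the workaround's ghc flags.

```haskell
import Data.Conduit.Process
  ( ClosedStream (..)
  , Inherited (..)
  , streamingProcess
  , waitForStreamingProcess
  )
import System.Exit (ExitCode)
import System.Process (CreateProcess (..), proc)
import System.Timeout (timeout)

-- Hypothetical helper: spawn a command through Conduit's streamingProcess
-- with create_group = True (the setting the ticket says run uses) and wait
-- for it to exit. Stdin is closed; stdout and stderr are inherited.
spawnViaConduit :: FilePath -> [String] -> IO ExitCode
spawnViaConduit cmd args = do
  let cp = (proc cmd args) { create_group = True }
  (ClosedStream, Inherited, Inherited, sph) <- streamingProcess cp
  waitForStreamingProcess sph

main :: IO ()
main = do
  -- 10-second limit. If the hang is inside the spawn call itself,
  -- the timeout may not be able to interrupt it.
  result <- timeout (10 * 1000000) (spawnViaConduit "nix-instantiate" ["--version"])
  putStrLn (maybe "timed out: child did not finish" (\c -> "exited with " ++ show c) result)
```

If this small program also hangs on the affected machine, the problem is likely below bild (Conduit, the process library, or system state). If it exits normally, the cause is more likely in how bild reaches run.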

Workaround

Build Haskell targets using a manual nix expression that calls GHC directly, bypassing bild's Conduit process spawning:

# /tmp/build-pipeline.nix
let
  bild = import /home/ben/omni/live/Omni/Bild.nix {};
  ghc = bild.haskell.ghcWith (p: with p; [ aeson async ... ]);
in bild.stdenv.mkDerivation {
  name = "pipeline";
  src = builtins.filterSource (...) /home/ben/omni/live;
  nativeBuildInputs = [ ghc ];
  buildPhase = ''ghc -Wall -Werror -threaded -i. --make Omni/Pipeline.hs -o pipeline'';
  installPhase = ''mkdir -p $out/bin; cp pipeline $out/bin/'';
}

This works every time. The issue is specifically in bild's process spawning, not in nix or GHC.

Investigation leads

  • Could be a stale file descriptor or kernel resource leak from killed bild processes
  • Could be a Conduit library bug triggered by specific GHC RTS state
  • Could be related to Process.create_group = True interacting badly with the terminal/session
  • A system reboot would likely fix it but hasn't been tried
  • The streaming-commons / conduit-extra version in the bild closure should be checked for known issues
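
One way to narrow the create_group lead is to spawn the same command through plain System.Process, once with a new process group and once without, and compare. The sketch below uses only the process package and base. spawnWithLimit is a hypothetical name, and `nix-instantiate --version` is a stand-in for the real call, so treat it as a starting point and not as a confirmed diagnosis procedure.

```haskell
import System.Exit (ExitCode)
import System.Process (CreateProcess (..), proc, readCreateProcessWithExitCode)
import System.Timeout (timeout)

-- Hypothetical helper: run a command with a wall-clock limit in seconds,
-- optionally in a new process group. Nothing means it did not finish in time.
spawnWithLimit :: Int -> Bool -> FilePath -> [String] -> IO (Maybe ExitCode)
spawnWithLimit seconds newGroup cmd args = do
  let cp = (proc cmd args) { create_group = newGroup }
  -- Keep only the exit code; captured stdout and stderr are discarded.
  fmap (\(code, _, _) -> code)
    <$> timeout (seconds * 1000000) (readCreateProcessWithExitCode cp "")

main :: IO ()
main = do
  withGroup <- spawnWithLimit 10 True "nix-instantiate" ["--version"]
  withoutGroup <- spawnWithLimit 10 False "nix-instantiate" ["--version"]
  putStrLn ("create_group = True:  " ++ show withGroup)
  putStrLn ("create_group = False: " ++ show withoutGroup)
```

If both variants return Just ExitSuccess, plain process spawning works and attention shifts to the Conduit layer or to bild's own state. If only the create_group = True variant stalls, the process-group lead becomes the stronger candidate. As with any timeout-based check, a hang inside the spawn call itself may not be interruptible.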

Timeline (2)

🔄 [system] Open → InProgress · 1 month ago
🔄 [human] InProgress → Done · 1 month ago