Add graceful SIGTERM handling to agent for Kubernetes

t-420 · WorkTask · Created 3 months ago · Updated 3 months ago

Description

Add graceful SIGTERM handling to the agent binary for Kubernetes compatibility.

Context

When running in Kubernetes, pods receive SIGTERM before being killed. The agent needs to:

1. Catch SIGTERM
2. Finish the current iteration cleanly (don't interrupt mid-tool-call)
3. Emit a shutdown event to the trace
4. Exit with code 0

Without this, k8s will SIGKILL after the grace period, potentially corrupting workspace state.

Current State

The agent binary is at Omni/Agent.hs (or Omni/Agent/Engine.hs for the core loop). There's no signal handling currently.

Implementation

1. Add signal handler setup

In the agent's main entry point, install a SIGTERM handler:

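import Control.Concurrent.MVar (newMVar, swapMVar)
import Control.Monad (void)
import System.Posix.Signals (Handler (Catch), installHandler, sigTERM)

-- Create the shutdown flag (False until SIGTERM arrives)
shutdownFlag <- newMVar False

-- Install the handler. swapMVar replaces the stored value; putMVar would
-- block here because the MVar is already full.
_ <- installHandler sigTERM (Catch $ void $ swapMVar shutdownFlag True) Nothing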

2. Check flag between iterations

In the main agent loop (Engine.hs), check the shutdown flag between iterations:

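-- In the iteration loop, before starting the next iteration
-- (needs: when from Control.Monad, exitSuccess from System.Exit,
-- getCurrentTime from Data.Time.Clock)
shouldShutdown <- readMVar shutdownFlag
when shouldShutdown $ do
  now <- getCurrentTime
  emitEvent (EventShutdown now)
  exitSuccess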

3. Add EventShutdown event type

In Omni/Agent/Events.hs or Omni/Agent/Trace.hs, add:

data Event = ...
  | EventShutdown { evTimestamp :: UTCTime }

With JSON encoding:

{"type": "shutdown", "timestamp": "...", "reason": "SIGTERM"}
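A minimal sketch of producing that JSON shape, using only base and the time package. The cut-down Event type, renderEvent, and exampleShutdown are hypothetical names for illustration; the real encoder would presumably go through whatever JSON machinery Events.hs or Trace.hs already uses. Since the record above carries no reason field, this sketch assumes the reason is fixed to "SIGTERM".

```haskell
import Data.Time.Calendar (fromGregorian)
import Data.Time.Clock (UTCTime (..))
import Data.Time.Format.ISO8601 (iso8601Show)

-- | Hypothetical stand-in for the agent's event type; only the shutdown
-- constructor from the task description is modelled here.
data Event = EventShutdown {evTimestamp :: UTCTime}

-- | Render a shutdown event in the JSON shape from the description.
-- Assumption: "reason" is hard-coded because the record has no reason field.
renderEvent :: Event -> String
renderEvent (EventShutdown ts) =
  concat
    [ "{\"type\": \"shutdown\", \"timestamp\": \""
    , iso8601Show ts -- ISO 8601 UTC timestamp
    , "\", \"reason\": \"SIGTERM\"}"
    ]

-- | Arbitrary example value for trying the encoder in GHCi.
exampleShutdown :: Event
exampleShutdown = EventShutdown (UTCTime (fromGregorian 2024 1 2) 0)
```

Evaluating putStrLn (renderEvent exampleShutdown) in GHCi prints a single JSON line with the three keys in the order shown in the description.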

4. Ensure current tool call completes

The signal handler should NOT interrupt execution - just set a flag. The agent loop checks the flag at safe points (between iterations, after tool results).
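The safe-point idea can be sketched as a loop that only reads the flag between steps, so a step that has already started always runs to completion. This is an illustration under assumptions, not the actual Engine.hs loop: runLoop, LoopResult, and the onShutdown callback are hypothetical names, and each iteration is modelled as a plain IO () action.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, readMVar, swapMVar)
import Control.Monad (void)

-- | Why the loop stopped.
data LoopResult = Finished | ShutDown
  deriving (Eq, Show)

-- | Run the steps in order. The shutdown flag is read only between steps,
-- never during one, so an in-flight step (e.g. a tool call) is not cut short.
runLoop :: MVar Bool -> IO () -> [IO ()] -> IO LoopResult
runLoop _ _ [] = pure Finished
runLoop flag onShutdown (step : rest) = do
  stop <- readMVar flag -- safe point: the previous step has fully finished
  if stop
    then onShutdown >> pure ShutDown -- e.g. emit the shutdown event here
    else step >> runLoop flag onShutdown rest

-- | Flag-only "handler" body: this is all the signal handler should do.
requestShutdown :: MVar Bool -> IO ()
requestShutdown flag = void (swapMVar flag True)

-- | Fresh flag, initially not shutting down.
newShutdownFlag :: IO (MVar Bool)
newShutdownFlag = newMVar False
```

Returning a LoopResult instead of exiting inside the loop is a design choice of this sketch: the caller can then call exitSuccess after any cleanup has run, and the loop stays easy to test without terminating the process.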

Testing

1. Run agent with a long task
2. Send SIGTERM: kill -TERM <pid>
3. Verify:

  • Agent finishes current iteration
  • Shutdown event appears in output
  • Exit code is 0
  • No workspace corruption
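The first part of the manual check (SIGTERM is caught and only flips a flag) could also be automated in-process. The sketch below is an assumption about how such a test might look, not part of the existing test suite: sigtermSetsFlag is a hypothetical helper that installs a flag-setting handler, signals its own process with raiseSignal from the unix package, and polls the flag for about one second.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar (MVar, newMVar, readMVar, swapMVar)
import Control.Monad (void)
import System.Posix.Signals (Handler (Catch), installHandler, raiseSignal, sigTERM)

-- | Install a flag-setting SIGTERM handler, send SIGTERM to this process,
-- and report whether the flag flipped within roughly one second.
sigtermSetsFlag :: IO Bool
sigtermSetsFlag = do
  flag <- newMVar False
  old <- installHandler sigTERM (Catch (void (swapMVar flag True))) Nothing
  raiseSignal sigTERM
  seen <- waitFor (100 :: Int) flag
  _ <- installHandler sigTERM old Nothing -- restore the previous handler
  pure seen

-- | Poll the flag every 10 ms, giving up after the given number of tries.
-- The handler runs on its own Haskell thread, so a short wait is needed.
waitFor :: Int -> MVar Bool -> IO Bool
waitFor 0 flag = readMVar flag
waitFor n flag = do
  set <- readMVar flag
  if set
    then pure True
    else threadDelay 10000 >> waitFor (n - 1) flag
```

This only covers signal delivery; the remaining checks (iteration completes, event in output, exit code 0) still need the end-to-end run described above.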

For k8s testing:

kubectl delete pod <agent-pod> --grace-period=30
# Watch logs to see shutdown event

Files to Modify

  • Omni/Agent.hs or Omni/Agent/Cli.hs — signal handler setup
  • Omni/Agent/Engine.hs — check shutdown flag in loop
  • Omni/Agent/Events.hs or Omni/Agent/Trace.hs — add EventShutdown type
  • Omni/Agent/Watch.hs — handle shutdown event in status updates

Dependencies

  • unix package (already a dep via Alpha)
  • No new deps needed

Acceptance Criteria

  • [ ] SIGTERM caught and handled gracefully
  • [ ] Current iteration completes before exit
  • [ ] Shutdown event emitted to trace
  • [ ] Exit code 0 on graceful shutdown
  • [ ] Works in k8s pod termination scenario

Timeline (4)

🔄 [human] Open → InProgress · 3 months ago
💬 [human] · 3 months ago

Installed SIGTERM handler in agent CLI, added shutdown checks to sequential interpreter (emits shutdown event), and updated watch status/docs.

🔄 [human] InProgress → Done · 3 months ago