Add graceful SIGTERM handling to agent for Kubernetes

t-420 · WorkTask · Created 1 month ago · Updated 1 month ago

Description

Add graceful SIGTERM handling to the agent binary for Kubernetes compatibility.

Context

When running in Kubernetes, pods receive SIGTERM before being killed. The agent needs to:

  1. Catch SIGTERM
  2. Finish the current iteration cleanly (don't interrupt mid-tool-call)
  3. Emit a shutdown event to the trace
  4. Exit with code 0

Without this, k8s will SIGKILL after the grace period, potentially corrupting workspace state.

Current State

The agent binary is at Omni/Agent.hs (or Omni/Agent/Engine.hs for the core loop). There's no signal handling currently.

Implementation

1. Add signal handler setup

In the agent's main entry point, install a SIGTERM handler:

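import Control.Concurrent.MVar (newMVar, readMVar, swapMVar)
import Control.Monad (void)
import System.Posix.Signals (Handler (Catch), installHandler, sigTERM)

-- Create the shutdown flag (False = keep running)
shutdownFlag <- newMVar False

-- Install the handler. It only flips the flag. swapMVar is used because
-- putMVar would block forever on an MVar that already holds a value.
_ <- installHandler sigTERM (Catch (void (swapMVar shutdownFlag True))) Nothing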

2. Check flag between iterations

In the main agent loop (Engine.hs), check the shutdown flag between iterations:

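-- In the iteration loop, before starting the next iteration
shouldShutdown <- readMVar shutdownFlag
when shouldShutdown $ do
  now <- getCurrentTime -- EventShutdown carries a timestamp field
  emitEvent (EventShutdown now)
  exitSuccess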

3. Add EventShutdown event type

In Omni/Agent/Events.hs or Omni/Agent/Trace.hs, add:

data Event = ...
  | EventShutdown { evTimestamp :: UTCTime }

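With JSON encoding (the constructor above has no reason field, so "reason" would have to be written as a constant by the encoder):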

{"type": "shutdown", "timestamp": "...", "reason": "SIGTERM"}
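One way the encoding above could be produced is sketched below. It assumes the aeson library is used for JSON, which the task does not state, and it uses a trimmed-down, hypothetical `Event` type with only the new constructor. The "reason" value is hard-coded because the constructor carries only a timestamp.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (ToJSON (toJSON), object, (.=))
import Data.Text (Text)
import Data.Time (UTCTime)

-- Hypothetical, trimmed-down event type. The real Event type has other
-- constructors that are elided in the task description.
data Event = EventShutdown {evTimestamp :: UTCTime}

-- Assumption: events are serialized with aeson.
instance ToJSON Event where
  toJSON (EventShutdown ts) =
    object
      [ "type" .= ("shutdown" :: Text)
      , "timestamp" .= ts
      , -- Constant for now: SIGTERM is the only shutdown trigger described.
        "reason" .= ("SIGTERM" :: Text)
      ]
```

If other shutdown triggers are added later, the reason would more naturally become a field on the constructor than a constant in the encoder.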

4. Ensure current tool call completes

The signal handler should NOT interrupt execution; it only sets a flag. The agent loop checks the flag at safe points (between iterations, after tool results), so a tool call that is already running always finishes first.
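Steps 1, 2, and 4 fit together roughly as follows. This is a minimal sketch, not the code in Omni/Agent: `installShutdownFlag`, `runLoop`, `step`, and `onShutdown` are hypothetical names, with `step` standing in for one agent iteration (including any tool call) and `onShutdown` standing in for emitting the shutdown event.

```haskell
import Control.Concurrent.MVar
import Control.Monad (void, when)
import System.Posix.Signals (Handler (Catch), installHandler, sigTERM)

-- | Create the shutdown flag and install a SIGTERM handler that only
-- flips it. The handler never touches the agent loop directly.
installShutdownFlag :: IO (MVar Bool)
installShutdownFlag = do
  flag <- newMVar False
  -- swapMVar replaces the stored value; putMVar would block on a full MVar.
  _ <- installHandler sigTERM (Catch (void (swapMVar flag True))) Nothing
  pure flag

-- | Run iterations until the work is done or shutdown was requested.
-- `step` (hypothetical) performs one full iteration and returns True
-- while there is more work. `onShutdown` (hypothetical) runs once if
-- the flag was set.
runLoop :: MVar Bool -> IO Bool -> IO () -> IO ()
runLoop flag step onShutdown = go
  where
    go = do
      -- Safe point: the flag is only inspected between iterations.
      shouldShutdown <- readMVar flag
      if shouldShutdown
        then onShutdown
        else do
          more <- step
          when more go
```

Because the flag is read only at the top of `go`, a SIGTERM that arrives mid-iteration takes effect after `step` returns. Once `runLoop` returns, the caller can return from `main` normally or call `exitSuccess`; either gives exit code 0.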

Testing

  1. Run agent with a long task
  2. Send SIGTERM: kill -TERM <pid>
  3. Verify:

  • Agent finishes current iteration
  • Shutdown event appears in output
  • Exit code is 0
  • No workspace corruption

For k8s testing:

kubectl delete pod <agent-pod> --grace-period=30
# Watch logs to see shutdown event
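The handler half of the manual check can also be approximated in-process, without a pod or a second shell. The sketch below is an assumption about how such a test might look, not an existing test in the repository: `waitForFlag` and `sigtermSetsFlag` are hypothetical names. It sends SIGTERM to the current process with `raiseSignal` and polls the flag, since a `Catch` handler runs in its own Haskell thread and may not have finished when `raiseSignal` returns.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar
import Control.Monad (void)
import System.Posix.Signals (Handler (Catch), installHandler, raiseSignal, sigTERM)

-- | Poll the flag every 10 ms until it is True or the attempts run out.
waitForFlag :: MVar Bool -> Int -> IO Bool
waitForFlag flag attempts = do
  set <- readMVar flag
  if set || attempts <= 0
    then pure set
    else threadDelay 10000 >> waitForFlag flag (attempts - 1)

-- | Install the handler, send SIGTERM to this process, and report
-- whether the flag flipped within roughly one second.
sigtermSetsFlag :: IO Bool
sigtermSetsFlag = do
  flag <- newMVar False
  _ <- installHandler sigTERM (Catch (void (swapMVar flag True))) Nothing
  raiseSignal sigTERM
  waitForFlag flag 100
```

This only covers the handler and the flag. Whether the current iteration completes and the workspace stays intact still needs the manual or k8s run described above.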

Files to Modify

  • Omni/Agent.hs or Omni/Agent/Cli.hs — signal handler setup
  • Omni/Agent/Engine.hs — check shutdown flag in loop
  • Omni/Agent/Events.hs or Omni/Agent/Trace.hs — add EventShutdown type
  • Omni/Agent/Watch.hs — handle shutdown event in status updates

Dependencies

  • unix package (already a dep via Alpha)
  • No new deps needed

Acceptance Criteria

  • [ ] SIGTERM caught and handled gracefully
  • [ ] Current iteration completes before exit
  • [ ] Shutdown event emitted to trace
  • [ ] Exit code 0 on graceful shutdown
  • [ ] Works in k8s pod termination scenario

Timeline (4)

🔄 [human] Open → InProgress · 1 month ago
💬 [human] · 1 month ago

Installed SIGTERM handler in agent CLI, added shutdown checks to sequential interpreter (emits shutdown event), and updated watch status/docs.

🔄 [human] InProgress → Done · 1 month ago