t-420 - omni

t-420 · WorkTask

Created 1 month ago · Updated 1 month ago

Description

Add graceful SIGTERM handling to the agent binary for Kubernetes compatibility.

Context

When running in Kubernetes, pods receive SIGTERM before being killed. The agent needs to:
1. Catch SIGTERM.
2. Finish the current iteration cleanly (don't interrupt mid-tool-call).
3. Emit a shutdown event to the trace.
4. Exit with code 0.

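Without this, the agent is not stopped cleanly. SIGTERM's default action kills the process at once, or, if the agent runs as PID 1 in the container (where signals without a handler are ignored), k8s sends SIGKILL once the grace period expires. Either way a tool call can be cut off midway, potentially corrupting workspace state.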

Current State

The agent binary is at Omni/Agent.hs (or Omni/Agent/Engine.hs for the core loop). There's no signal handling currently.

Implementation

1. Add signal handler setup

In the agent's main entry point, install a SIGTERM handler:

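import System.Posix.Signals (installHandler, sigTERM, Handler(Catch))
import Control.Concurrent.MVar

-- Create shutdown flag (the MVar is full from the start)
shutdownFlag <- newMVar False

-- Install handler. It only overwrites the flag; putMVar would block forever
-- here because the MVar is already full, so use modifyMVar_ instead.
_ <- installHandler sigTERM (Catch $ modifyMVar_ shutdownFlag (const (pure True))) Nothing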
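A self-contained sketch of this setup step, assuming the flag stays an MVar Bool as above. The helper names installShutdownHandler, shutdownRequested and simulateSigterm are hypothetical. The handler only overwrites the flag, and GHC runs signal handlers in a fresh thread, so the code that is currently executing keeps going.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import System.Posix.Signals (Handler (Catch), installHandler, raiseSignal, sigTERM)

-- Hypothetical helper: create the shutdown flag and install a SIGTERM
-- handler that only flips it to True. The GHC runtime starts the handler
-- in its own thread; it never throws into the agent loop.
installShutdownHandler :: IO (MVar Bool)
installShutdownHandler = do
  flag <- newMVar False
  -- modifyMVar_ replaces the contents; putMVar would block on the full MVar.
  _ <- installHandler sigTERM (Catch (modifyMVar_ flag (const (pure True)))) Nothing
  pure flag

-- Non-blocking check for the agent loop's safe points.
shutdownRequested :: MVar Bool -> IO Bool
shutdownRequested = readMVar

-- Usage aid: deliver SIGTERM to this process (like kill -TERM on our own
-- pid) and poll for up to about 0.5 s until the handler has set the flag.
simulateSigterm :: MVar Bool -> IO Bool
simulateSigterm flag = raiseSignal sigTERM >> poll (50 :: Int)
  where
    poll 0 = shutdownRequested flag
    poll n = do
      set <- shutdownRequested flag
      if set then pure True else threadDelay 10000 >> poll (n - 1)
```

Because the signal arrives asynchronously, the usage aid polls rather than reading the flag once right after raiseSignal.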

2. Check flag between iterations

In the main agent loop (Engine.hs), check the shutdown flag between iterations:

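import Control.Monad (when)
import Data.Time (getCurrentTime)
import System.Exit (exitSuccess)

-- In the iteration loop
shouldShutdown <- readMVar shutdownFlag
when shouldShutdown $ do
  now <- getCurrentTime            -- EventShutdown carries a timestamp
  emitEvent (EventShutdown now)
  exitSuccess                      -- throws ExitSuccess; ends the program only on the main thread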
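The between-iterations check can be sketched as a loop that reads the flag only at the top of each iteration. runAgentLoop, step and onShutdown are hypothetical stand-ins for the real Engine.hs iteration and the emitEvent/exitSuccess pair; the sketch returns the number of completed iterations instead of exiting so it can be exercised directly.

```haskell
import Control.Concurrent.MVar

-- Hypothetical loop: run `step` until the shutdown flag is observed.
-- The flag is read only between iterations, so an iteration that has
-- already started always completes. Returns the completed-iteration
-- count; the real loop would call exitSuccess after onShutdown.
runAgentLoop :: MVar Bool -> IO () -> (Int -> IO ()) -> IO Int
runAgentLoop flag onShutdown step = go 0
  where
    go n = do
      stop <- readMVar flag
      if stop
        then onShutdown >> pure n
        else step n >> go (n + 1)
```

If SIGTERM arrives while iteration n is running, that iteration finishes and the check before iteration n + 1 triggers the shutdown path.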

3. Add EventShutdown event type

In Omni/Agent/Events.hs or Omni/Agent/Trace.hs, add:

import Data.Time (UTCTime)

data Event = ...  -- existing constructors unchanged
  | EventShutdown { evTimestamp :: UTCTime }

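With JSON encoding (reason is not a field of EventShutdown, so the encoder writes the constant "SIGTERM", currently the only shutdown trigger):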

{"type": "shutdown", "timestamp": "...", "reason": "SIGTERM"}
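A dependency-free sketch of that wire format, using only the time package. The single-constructor Event, encodeEvent and exampleEvent are hypothetical stand-ins; a codebase that already uses aeson would more likely express this as a ToJSON instance.

```haskell
import Data.Time (UTCTime (..), defaultTimeLocale, formatTime, fromGregorian)

-- Hypothetical stand-in for the real Event type: only the new constructor.
data Event = EventShutdown {evTimestamp :: UTCTime}

-- Render the documented shape. "reason" is constant because SIGTERM is the
-- only shutdown trigger; no field needs JSON string escaping.
encodeEvent :: Event -> String
encodeEvent (EventShutdown ts) =
  "{\"type\": \"shutdown\", \"timestamp\": \""
    ++ formatTime defaultTimeLocale "%Y-%m-%dT%H:%M:%SZ" ts
    ++ "\", \"reason\": \"SIGTERM\"}"

-- Example input: midnight UTC on 2024-01-01.
exampleEvent :: Event
exampleEvent = EventShutdown (UTCTime (fromGregorian 2024 1 1) 0)
```

The timestamp is formatted as ISO 8601 in UTC with whole seconds.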

4. Ensure current tool call completes

The signal handler should NOT interrupt execution; it only sets a flag. The agent loop checks the flag at safe points: between iterations and after tool results.
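The "after tool results" safe point can be sketched as one iteration that runs its tool calls in order and reads the flag only once each result is back. Tool calls are modelled as plain IO String actions, and runTools is a hypothetical name: a call already in progress when SIGTERM lands runs to completion, and calls not yet started are skipped.

```haskell
import Control.Concurrent.MVar

-- Hypothetical: run an iteration's tool calls in order, checking the
-- shutdown flag only after each result has been received. Returns the
-- results of the calls that actually ran.
runTools :: MVar Bool -> [IO String] -> IO [String]
runTools _ [] = pure []
runTools flag (call : rest) = do
  result <- call          -- completes even if SIGTERM arrives meanwhile
  stop <- readMVar flag   -- safe point: after the tool result
  if stop
    then pure [result]
    else (result :) <$> runTools flag rest
```

Keeping the check outside the tool call means no exception is ever thrown into a half-finished write, which is what protects the workspace.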

Testing

1. Run the agent with a long task.
2. Send SIGTERM: kill -TERM <pid>
3. Verify:

Agent finishes current iteration
Shutdown event appears in output
Exit code is 0
No workspace corruption
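The manual steps above can be scripted. A sketch using the process and unix packages: start a command, give it time to start, send SIGTERM to its pid, and return its exit code. sigtermExitCode is hypothetical, and the agent's real command line is not given here, so the check uses a stand-in shell loop that traps TERM.

```haskell
import Control.Concurrent (threadDelay)
import System.Exit (ExitCode (..))
import System.Posix.Signals (sigTERM, signalProcess)
import System.Process (getPid, spawnProcess, waitForProcess)

-- Hypothetical test helper: start `cmd args`, wait `delayMs` milliseconds,
-- send SIGTERM (like kill -TERM <pid>), then wait for the process to exit
-- and return its exit code. A graceful shutdown should give ExitSuccess.
sigtermExitCode :: FilePath -> [String] -> Int -> IO ExitCode
sigtermExitCode cmd args delayMs = do
  ph <- spawnProcess cmd args
  threadDelay (delayMs * 1000)
  mpid <- getPid ph
  case mpid of
    Just pid -> signalProcess sigTERM pid
    Nothing -> pure ()  -- process already exited; nothing to signal
  waitForProcess ph
```

For example, sigtermExitCode "sh" ["-c", "trap 'exit 0' TERM; while true; do sleep 0.1; done"] 300 should report ExitSuccess; the other checks (trace contents, workspace state) still need inspecting separately.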

For k8s testing:

kubectl delete pod <agent-pod> --grace-period=30
# Follow the logs in another terminal first (kubectl logs -f <agent-pod>) to see the shutdown event

Files to Modify

Omni/Agent.hs or Omni/Agent/Cli.hs — signal handler setup
Omni/Agent/Engine.hs — check shutdown flag in loop
Omni/Agent/Events.hs or Omni/Agent/Trace.hs — add EventShutdown type
Omni/Agent/Watch.hs — handle shutdown event in status updates

Dependencies

unix package (already a dep via Alpha)
No new deps needed

Acceptance Criteria

[ ] SIGTERM caught and handled gracefully
[ ] Current iteration completes before exit
[ ] Shutdown event emitted to trace
[ ] Exit code 0 on graceful shutdown
[ ] Works in k8s pod termination scenario

Add graceful SIGTERM handling to agent for Kubernetes