t-369.18 - omni

t-369.18 · WorkTask · Omni/Agent.hs

Parent: t-369 · Created 1 month ago · Updated 1 month ago

Dependencies

t-369.17 [Blocks]

Description

Integrate the code-only agent with Agentd's Docker-based sandboxing.

Context

The code-only spike (t-369.17) validated that Think + Execute works, but it runs Python directly on the host without sandboxing. This is a security risk: model-generated code can read host files, reach the network, and so on.

Agentd already has Docker-based execution. We need to wire the code-only agent to use it.

Current State

CodeOnly.hs executes Python directly:

execute :: ExperimentConfig -> Text -> IO CodeResult
execute config code = do
  -- Writes to temp file, runs python3 directly
  Process.readCreateProcessWithExitCode pythonCmd ""

Agentd has container support:

-- From Agentd.hs
toolchainImage :: Text -> Text
toolchainImage = \case
  "base" -> "agent-base:latest"
  "python" -> "agent-python:latest"
  ...

Deliverables

1. Create sandboxed execute function

-- In CodeOnly.hs or new Sandbox.hs
data SandboxConfig = SandboxConfig
  { sbImage :: Text           -- Docker image
  , sbTimeout :: Int          -- Seconds
  , sbMemory :: Text          -- e.g., "512m"
  , sbNetwork :: Bool         -- Allow network?
  , sbMounts :: [(FilePath, FilePath, Bool)]  -- (host, container, rw?)
  }

defaultSandbox :: SandboxConfig
defaultSandbox = SandboxConfig
  { sbImage = "python:3.11-slim"
  , sbTimeout = 30
  , sbMemory = "512m"
  , sbNetwork = False
  , sbMounts = []
  }

executeSandboxed :: SandboxConfig -> Text -> IO CodeResult
executeSandboxed config code = do
  -- Write code to temp file
  -- Build docker command:
  -- docker run --rm --network none --memory 512m \
  --   -v /tmp/code.py:/code.py:ro \
  --   python:3.11-slim python /code.py
  -- Parse output
  ...
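
The outline above leaves the temp-file and output-parsing steps open. The following is a minimal sketch of one way to fill them in, not the final implementation. It assumes a `CodeResult` with `CodeSuccess`, `CodeError`, and `CodeTimeout` constructors (only `CodeTimeout` appears in this task; the other two are hypothetical), and it uses `String` rather than `Text` to stay dependency-free. The Docker invocation is passed in as a hypothetical `Runner` function so the file handling can be exercised without a Docker daemon.

```haskell
import Control.Exception (bracket)
import System.Directory (getTemporaryDirectory, removeFile)
import System.Exit (ExitCode (..))
import System.IO (hClose, hPutStr, openTempFile)

-- Hypothetical result type: only CodeTimeout is named in this task.
data CodeResult
  = CodeSuccess String    -- stdout of a run that exited with 0
  | CodeError Int String  -- non-zero exit code and stderr
  | CodeTimeout
  deriving (Eq, Show)

-- A runner receives the host path of the script and returns
-- (exit code, stdout, stderr). In production it would invoke
-- `docker run ...`; in tests it can be a stub.
type Runner = FilePath -> IO (ExitCode, String, String)

-- Write the code to a fresh temp file, hand the path to the runner,
-- and remove the file afterwards even if the runner throws.
executeWith :: Runner -> String -> IO CodeResult
executeWith runner code = do
  tmpDir <- getTemporaryDirectory
  bracket (writeTemp tmpDir) removeFile $ \path ->
    classify <$> runner path
  where
    writeTemp dir = do
      (path, h) <- openTempFile dir "code.py"
      hPutStr h code
      hClose h
      pure path

-- Map the process outcome onto CodeResult.
classify :: (ExitCode, String, String) -> CodeResult
classify (ExitSuccess, out, _) = CodeSuccess out
classify (ExitFailure n, _, err) = CodeError n err
```

Keeping the runner as a parameter means the same function can serve the current host-based `execute` and the Docker-based one during the migration.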

2. Docker command construction

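buildDockerCmd :: SandboxConfig -> FilePath -> [String]
buildDockerCmd config codePath =
  [ "run", "--rm"
  , "--network", if sbNetwork config then "bridge" else "none"
  , "--memory", Text.unpack (sbMemory config)
  , "--cpus", "1"
  , "--pids-limit", "100"  -- prevent fork bombs
  , "-v", codePath <> ":/code.py:ro"  -- codePath must be absolute
  ] ++ mountArgs ++
  [ Text.unpack (sbImage config)
  , "python", "/code.py"
  ]
  where
    -- One -v flag per extra mount: host:container:ro or host:container:rw
    mountArgs = concatMap mountArg (sbMounts config)
    mountArg (host, container, rw) =
      ["-v", host <> ":" <> container <> (if rw then ":rw" else ":ro")]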
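
Beyond the resource limits above, `docker run` accepts a few flags that further restrict what the container can do. The sketch below is an optional addition and not part of the original plan. `hardeningArgs` and `withHardening` are hypothetical names, and the exact set of flags should be checked against what the benchmarks need.

```haskell
-- Extra `docker run` flags that tighten the sandbox (optional).
hardeningArgs :: [String]
hardeningArgs =
  [ "--read-only"                          -- root filesystem is read-only
  , "--tmpfs", "/tmp"                      -- in-memory scratch space for the script
  , "--cap-drop", "ALL"                    -- drop all Linux capabilities
  , "--security-opt", "no-new-privileges"  -- block privilege escalation via setuid
  ]

-- Insert the hardening flags right after the "run" subcommand of an
-- argument list shaped like the one buildDockerCmd produces.
-- Any other argument list is returned unchanged.
withHardening :: [String] -> [String]
withHardening ("run" : rest) = "run" : hardeningArgs ++ rest
withHardening args = args
```

Applied to the list above, this would be `withHardening (buildDockerCmd config codePath)`.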

3. Timeout handling

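`docker run` has no wall-clock timeout flag of its own, so any container-side limit must be enforced separately, for example by killing the container when the host-side timeout fires. The host-side timeout wraps the docker command: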

executeSandboxed config code = do
  let timeoutUs = sbTimeout config * 1_000_000
  result <- Timeout.timeout timeoutUs $ do
    Process.readCreateProcessWithExitCode dockerCmd ""
  case result of
    Nothing -> pure CodeTimeout
    Just (exitCode, stdout, stderr) -> ...
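
One caveat with the host-side timeout: when it fires, only the `docker` client process is interrupted, and the container itself may keep running. The sketch below shows one way to handle that, assuming the container is started with a known `--name` so it can be killed explicitly. `withDeadline`, `killContainer`, and `demo` are hypothetical names.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (void)
import System.Process (readProcessWithExitCode)
import System.Timeout (timeout)

-- Run an action with a deadline in seconds. If the deadline passes,
-- run the cleanup action and return Nothing.
withDeadline :: Int -> IO a -> IO () -> IO (Maybe a)
withDeadline seconds action onTimeout = do
  result <- timeout (seconds * 1000000) action
  case result of
    Nothing -> onTimeout >> pure Nothing
    Just a -> pure (Just a)

-- Force-stop a container by name. A non-zero exit code is ignored
-- because the container may already have exited.
killContainer :: String -> IO ()
killContainer name =
  void (readProcessWithExitCode "docker" ["kill", name] "")

-- Usage sketch: a 5-second sleep stands in for the docker call,
-- and a log line stands in for killContainer.
demo :: IO (Maybe ())
demo = withDeadline 1 (threadDelay 5000000) (putStrLn "deadline hit, cleaning up")
```

In `executeSandboxed` this would look roughly like `withDeadline (sbTimeout config) runDocker (killContainer name)`, with `Nothing` mapped to `CodeTimeout` as above.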

4. Update CodeOnly.hs to use sandbox

codeOnlyAgent :: ExperimentConfig -> Provider.Provider -> Text -> IO RunResult
codeOnlyAgent config provider task = do
  -- Use sandboxed execution
  let sandbox = defaultSandbox { sbTimeout = ecExecuteTimeout config }
  ...
  result <- executeSandboxed sandbox code

5. Re-run benchmarks with sandbox

Verify the spike results still hold with sandboxing:

Simple benchmark: should still be 100%
Medium benchmark: should still be 100%
Network test: should now fail (network disabled)
File read test: should fail (no mounts)

Testing

[ ] Docker sandbox executes Python correctly
[ ] Timeout kills runaway code
[ ] Memory limit kills code that exceeds it (container is OOM-killed, host unaffected)
[ ] Network disabled by default
[ ] Fork bomb prevented
[ ] Benchmarks pass with sandbox
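
Several items in the checklist above can be driven by small probe scripts, each paired with the outcome expected under `defaultSandbox`. This is a rough sketch: the probe names, the Python snippets, the host path, and the `CodeSuccess`/`CodeError` constructors are all assumptions (only `CodeTimeout` appears in this task). A fork-bomb probe is left out on purpose, since it is risky to run if the limits are misconfigured.

```haskell
-- Hypothetical result type: only CodeTimeout is named in this task.
data CodeResult = CodeSuccess String | CodeError Int String | CodeTimeout
  deriving (Eq, Show)

data Expect = ExpectOk | ExpectFail | ExpectTimeout
  deriving (Eq, Show)

data Probe = Probe
  { probeName :: String
  , probeCode :: String  -- Python source to run in the sandbox
  , probeExpect :: Expect
  }

probes :: [Probe]
probes =
  [ Probe "runs python" "print(1 + 1)" ExpectOk
  , Probe "network disabled"
      "import urllib.request; urllib.request.urlopen('http://example.com', timeout=5)"
      ExpectFail
  , Probe "host file unreadable"
      "open('/home/host-only-file.txt').read()"  -- hypothetical host-only path
      ExpectFail
  , Probe "timeout enforced" "while True: pass" ExpectTimeout
  , Probe "memory limit enforced" "x = bytearray(2 * 1024**3)" ExpectFail
  ]

-- Does a result match what the probe expects?
matches :: Expect -> CodeResult -> Bool
matches ExpectOk (CodeSuccess _) = True
matches ExpectFail (CodeError _ _) = True
matches ExpectTimeout CodeTimeout = True
matches _ _ = False

-- Run every probe through the given execute function and return the
-- names of the probes whose outcome did not match the expectation.
runProbes :: (String -> IO CodeResult) -> IO [String]
runProbes exec = do
  results <- mapM (\p -> (,) p <$> exec (probeCode p)) probes
  pure [probeName p | (p, r) <- results, not (matches (probeExpect p) r)]
```

An empty list from `runProbes` means every probe behaved as expected. Running the same probes through the unsandboxed `execute` should flag the network and file probes, which gives a quick before/after comparison.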

Notes

May need to build/pull a Python image first
Consider caching the image pull
Log sandbox violations for debugging
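
For the first two notes, one option is to check for the image locally and pull only when it is missing, so that the pull cost is paid once rather than inside a timed run. A minimal sketch, assuming the `docker` CLI is on the PATH; `ImageOps`, `dockerImageOps`, and `ensureImage` are hypothetical names.

```haskell
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- How to ask whether an image exists locally and how to pull it.
-- Passed in as functions so the logic can be tested without Docker.
data ImageOps = ImageOps
  { imagePresent :: String -> IO Bool
  , imagePull :: String -> IO Bool  -- True when the pull succeeded
  }

-- Implementation backed by the docker CLI:
-- `docker image inspect` exits non-zero when the image is absent.
dockerImageOps :: ImageOps
dockerImageOps = ImageOps
  { imagePresent = \img ->
      ok <$> readProcessWithExitCode "docker" ["image", "inspect", img] ""
  , imagePull = \img ->
      ok <$> readProcessWithExitCode "docker" ["pull", img] ""
  }
  where
    ok (code, _, _) = code == ExitSuccess

-- Pull only when the image is missing; returns whether it is usable.
ensureImage :: ImageOps -> String -> IO Bool
ensureImage ops img = do
  present <- imagePresent ops img
  if present then pure True else imagePull ops img
```

Calling `ensureImage dockerImageOps "python:3.11-slim"` once at startup, before any timed execution, keeps image download time out of `sbTimeout`.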

Files

Omni/Agent/Experiments/CodeOnly.hs (update)
Omni/Agent/Sandbox.hs (new, optional)
Omni/Agentd.hs (reference for existing Docker code)

Wire up Docker sandbox for code-only agents