Integrate the code-only agent with Agentd's Docker-based sandboxing.
The code-only spike (t-369.17) validated that Think + Execute works, but it runs Python directly on the host with no sandboxing. This is a security risk: model-generated code can read host files, open network connections, and so on.
Agentd already has Docker-based execution. We need to wire the code-only agent to use it.
CodeOnly.hs executes Python directly:
execute :: ExperimentConfig -> Text -> IO CodeResult
execute config code = do
  -- Writes to temp file, runs python3 directly
  Process.readCreateProcessWithExitCode pythonCmd ""
Agentd has container support:
-- From Agentd.hs
toolchainImage :: Text -> Text
toolchainImage = \case
  "base" -> "agent-base:latest"
  "python" -> "agent-python:latest"
  ...
-- In CodeOnly.hs or new Sandbox.hs
data SandboxConfig = SandboxConfig
  { sbImage   :: Text                          -- Docker image
  , sbTimeout :: Int                           -- Seconds
  , sbMemory  :: Text                          -- e.g., "512m"
  , sbNetwork :: Bool                          -- Allow network?
  , sbMounts  :: [(FilePath, FilePath, Bool)]  -- (host, container, rw?)
  }

defaultSandbox :: SandboxConfig
defaultSandbox = SandboxConfig
  { sbImage   = "python:3.11-slim"
  , sbTimeout = 30
  , sbMemory  = "512m"
  , sbNetwork = False
  , sbMounts  = []
  }
executeSandboxed :: SandboxConfig -> Text -> IO CodeResult
executeSandboxed config code = do
  -- Write code to temp file
  -- Build docker command:
  --   docker run --rm --network none --memory 512m \
  --     -v /tmp/code.py:/code.py:ro \
  --     python:3.11-slim python /code.py
  -- Parse output
  ...
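The "write code to temp file" step above can be sketched as a small bracket-style helper. This is only a sketch under assumptions: the name withCodeFile is hypothetical, and it relies on the directory and text packages being available. The file is deleted once the action finishes, whether or not it throws.

```haskell
import Control.Exception (bracket)
import qualified Data.Text as Text
import qualified Data.Text.IO as TextIO
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (hClose, openTempFile)

-- Hypothetical helper (not part of CodeOnly.hs): write the generated code to
-- a fresh temp file, pass its path to the action, and delete the file
-- afterwards even if the action throws.
withCodeFile :: Text.Text -> (FilePath -> IO a) -> IO a
withCodeFile code action = do
  tmpDir <- getTemporaryDirectory
  bracket
    -- openTempFile picks a unique name based on the "code.py" template and
    -- creates the file readable/writable by the current user only.
    (openTempFile tmpDir "code.py")
    -- hClose on an already closed handle is a no-op, so this is safe on
    -- both the success and the failure path.
    (\(path, handle) -> hClose handle >> removeFile path)
    (\(path, handle) -> do
        TextIO.hPutStr handle code
        hClose handle  -- flush and close before another process reads it
        action path)
```

With this in place, executeSandboxed could run the docker command inside the action and pass the path it receives to buildDockerCmd, so the file never outlives the run.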
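buildDockerCmd :: SandboxConfig -> FilePath -> [String]
buildDockerCmd config codePath =
  [ "run", "--rm"
  , "--network", if sbNetwork config then "bridge" else "none"
  , "--memory", Text.unpack (sbMemory config)
  , "--cpus", "1"
  , "--pids-limit", "100" -- prevent fork bombs
  , "-v", codePath <> ":/code.py:ro"
  ] ++ mountArgs ++
  [ Text.unpack (sbImage config)
  , "python", "/code.py"
  ]
  where
    -- One "-v host:container:mode" pair per entry in sbMounts
    mountArgs = concatMap mountArg (sbMounts config)
    mountArg (host, container, rw) =
      ["-v", host <> ":" <> container <> (if rw then ":rw" else ":ro")]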
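As a worked example of how the (host, container, rw?) tuples in sbMounts might map onto docker's -v flags, here is a minimal standalone sketch. The body of mountArgs is an assumption; the plan only names it.

```haskell
-- Hypothetical helper: render (host, container, rw?) tuples as "-v" flags
-- in docker's host:container:mode syntax.
mountArgs :: [(FilePath, FilePath, Bool)] -> [String]
mountArgs = concatMap toFlag
  where
    toFlag (host, container, rw) =
      ["-v", host ++ ":" ++ container ++ ":" ++ mode rw]
    mode True  = "rw"
    mode False = "ro"
```

For instance, mountArgs [("/data/in", "/in", False), ("/data/out", "/out", True)] evaluates to ["-v", "/data/in:/in:ro", "-v", "/data/out:/out:rw"], and an empty mount list contributes no flags at all.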
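Use a host-side timeout around the docker command. docker run does not appear to offer a wall-clock limit flag of its own, and terminating the docker client may leave the container running, so the container also needs its own bound (for example, docker kill by name when the host timeout fires, or running the script under coreutils timeout inside the container):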
executeSandboxed config code = do
  -- Timeout.timeout takes microseconds; the 1_000_000 literal requires NumericUnderscores
  let timeoutUs = sbTimeout config * 1_000_000
  result <- Timeout.timeout timeoutUs $ do
    Process.readCreateProcessWithExitCode dockerCmd ""
  case result of
    Nothing -> pure CodeTimeout
    Just (exitCode, stdout, stderr) -> ...
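One way to bound both the docker client and the container is to give the container a name and kill it when the host-side timeout fires. The sketch below makes several assumptions: CodeSuccess and CodeFailure are hypothetical constructors (only CodeTimeout appears in the plan), and runWithTimeout, killContainer and runDockerWithTimeout are illustrative names. Compiling with -threaded is advisable so that waiting on the child process does not block the timeout thread.

```haskell
import Control.Monad (void)
import System.Exit (ExitCode (..))
import qualified System.Process as Process
import qualified System.Timeout as Timeout

-- Hypothetical result type; only CodeTimeout is named in the plan.
data CodeResult
  = CodeSuccess String      -- captured stdout
  | CodeFailure Int String  -- exit code and captured stderr
  | CodeTimeout
  deriving (Eq, Show)

-- Run a command with a wall-clock limit in seconds. When the limit is hit,
-- readProcessWithExitCode's own cleanup terminates the child process, and
-- onTimeout then runs as an extra cleanup step.
runWithTimeout :: Int -> IO () -> FilePath -> [String] -> IO CodeResult
runWithTimeout seconds onTimeout cmd args = do
  result <- Timeout.timeout (seconds * 1000000) $
    Process.readProcessWithExitCode cmd args ""
  case result of
    Nothing -> onTimeout >> pure CodeTimeout
    Just (ExitSuccess, out, _) -> pure (CodeSuccess out)
    Just (ExitFailure n, _, err) -> pure (CodeFailure n err)

-- Ask the Docker daemon to kill a named container. The exit code is ignored
-- because the container may already have exited and been removed by --rm.
killContainer :: String -> IO ()
killContainer name =
  void (Process.readProcessWithExitCode "docker" ["kill", name] "")

-- runArgs are the arguments that follow "docker run" (flags, image, command).
runDockerWithTimeout :: Int -> String -> [String] -> IO CodeResult
runDockerWithTimeout seconds name runArgs =
  runWithTimeout seconds (killContainer name) "docker"
    (["run", "--name", name] ++ runArgs)
```

The generic runWithTimeout can be exercised without Docker, for example with sleep or echo, which keeps the timeout path testable on machines where the daemon is not running.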
codeOnlyAgent :: ExperimentConfig -> Provider.Provider -> Text -> IO RunResult
codeOnlyAgent config provider task = do
  -- Use sandboxed execution
  let sandbox = defaultSandbox { sbTimeout = ecExecuteTimeout config }
  ...
  result <- executeSandboxed sandbox code
Verify the spike results still hold with sandboxing: