Guardrail violations should terminate agent, not restart it

t-209 · WorkTask · Omni/Agent.hs
Created 3 months ago · Updated 3 months ago

Dependencies

Description

When a guardrail fires (e.g., duplicate tool calls), the agent currently restarts from scratch instead of stopping. This causes a loop where:

1. Agent does work
2. Guardrail fires (e.g., read_file called 5 times)
3. Agent restarts from beginning
4. Agent does same work again
5. Guardrail fires again
6. Repeat

The agent should STOP when a guardrail fires, preserving any changes made. The task should be left InProgress or set back to Open with context about why it stopped.

Observed in t-204: Agent correctly fixed the issue, tests passed, then guardrail fired for duplicate read_file calls. Instead of stopping successfully, it restarted and the guardrail fired 3 more times.
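
One way to picture the requested behaviour is a worker-side decision function that treats a guardrail stop as terminal and retries only on transient errors. The following is a minimal sketch, not the existing implementation: `RunOutcome`, `TaskStatus`, `NextStep`, `decide`, and `runWorker` are hypothetical names that do not appear in the codebase, the choice of `Open` over `InProgress` for a guardrail stop is an assumption (the description allows either), and mapping a normal completion to `Done` is likewise assumed.

```haskell
-- Hypothetical sketch: none of these names exist in Omni/Agent/Engine.hs.
module Main (main) where

-- | Why a single agent run ended (assumed classification).
data RunOutcome
  = Completed String      -- agent finished normally
  | GuardrailStop String  -- a guardrail fired; the text says which one
  | TransientError String -- e.g. an HTTP failure, where a retry may help
  deriving (Show, Eq)

-- | Task states named in the description, plus Done (assumed).
data TaskStatus = Open | InProgress | Done
  deriving (Show, Eq)

-- | What the worker does after a run ends.
data NextStep
  = Finish TaskStatus String -- stop, set the status, record the context
  | Retry                    -- start another run
  deriving (Show, Eq)

-- | A guardrail stop is terminal: keep the changes, reopen the task with
-- the reason attached, and never restart from scratch.
decide :: RunOutcome -> NextStep
decide (Completed msg) = Finish Done msg
decide (GuardrailStop why) = Finish Open ("Stopped by guardrail: " ++ why)
decide (TransientError _) = Retry

-- | Feed successive run outcomes to 'decide' until one is terminal.
-- Returns the number of runs used and the final step, if any.
runWorker :: [RunOutcome] -> (Int, Maybe NextStep)
runWorker = go 0
  where
    go n [] = (n, Nothing)
    go n (o : rest) = case decide o of
      Retry -> go (n + 1) rest
      step -> (n + 1, Just step)

main :: IO ()
main =
  print
    ( runWorker
        [ TransientError "HTTP 502",
          GuardrailStop "read_file called 5 times",
          Completed "never reached"
        ]
    )
```

In this model the third outcome is never consumed: the run that hits the guardrail ends the worker after two attempts, which is the opposite of the restart loop described above.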

Timeline (75) · $1.94 · 672.0K tokens

🤖 Claimed [engineer] 3 months ago
🔄 [engineer] Open → InProgress 3 months ago
▶️ Running [engineer] {"started_at":"2025-12-01 23:24:55.830167387 UTC","completed_at":"2025-12-01 23:26:31.365047333 UTC","cost_cents":208} 3 months ago
💭 Thought [engineer] 3 months ago

I'll start by reading AGENTS.md to understand the codebase conventions, then examine the agent code to understand how guardrails currently work.

Read AGENTS.md
83 lines
1: # Omni
2: 
3: The Omni project is to leverage automation and asymmetries to create wealth.
4: 
5: ## Critical Rules for AI Agents
6: 
7: **CRITICAL**: This project uses `jr task` for ALL issue tracking. You MUST follow these rules:
8: 
9: - ✅ Use `jr task` for ALL task/TODO tracking (`jr task create ... --json`)
10: - ✅ Link discovered work with `--discovered-from=<parent-id>`
11: - ✅ File bugs IMMEDIATELY when you discover unexpected behavior
12: - ✅ Add facts to the knowledge base when you learn something useful (`jr facts add ...`)
13: - ✅ Run `jr task ready --json` before asking "what should I work on?"
14: - ✅ Store AI planning docs in `_/llm` directory (NEVER in repo root)
15: - ❌ Do NOT use `todo_write` tool
16: - ❌ Do NOT create markdown TODO lists or task checklists
17: - ❌ Do NOT put TODO/FIXME comments in code
18: 
19: ## Cheat Sheet
20: 
21: **Standard workflow:**
22: ```bash
23: # 1. Find ready work
24: jr task ready --json
25: 
26: # 2. Claim a task
27: jr task update <id> in-progress --json
28: 
29: # 3. During work: create discovered issues
30: jr task create "Fix type error found" --discovered-from=<current-id> --json
31: 
32: # 4. Complete the task
33: jr task update <id> done --json
34: ```
35: 
36: **Bug Discovery:**
37: ```bash
38: # Create a task immediately
39: jr task create "Command X fails when Y" --discovered-from=<current-task-id> --json
40: ```
41: 
42: ## Directory Structure
43: 
44: - **`_/`** (cabdir) - All ephemeral/generated files. This directory is gitignored.
45:   - `_/llm/` - AI planning docs and agent logs
46:   - `_/tmp/` - Temporary files, test databases, scratch data
47: - Never create dotfile directories (like `.tasks/`) in the repo root
48: 
49: ## Common Commands
50: 
51: **Build:**
52: ```bash
53: bild Omni/Jr.hs              # Build a Haskell namespace
54: bild Biz/Cloud/Api.py        # Build a Python namespace
55: ```
56: 
57: **Lint and Format:**
58: ```bash
59: lint Omni/Cli.hs             # Lint a file (DO NOT use hlint directly)
60: lint --fix **/*.py           # Lint and auto-fix Python files
61: ```
62: 
63: **Type Check:**
64: ```bash
65: typecheck.sh Omni/Bild/Example.py   # Run mypy on Python files
66: ```
67: 
68: **Test:**
69: ```bash
70: bild --test Omni/Jr.hs       # Build and run tests for a namespace
71: ```
72: 
73: **Run:**
74: ```bash
75: Omni/Ide/run.sh Omni/Jr.hs   # Build (if needed) and run
76: ```
77: 
78: ## Documentation
79: 
80: - **Project Context**: [README.md](README.md) - Goals, source layout, and coding conventions.
81: - **Task Manager**: [`Omni/Task/README.md`](Omni/Task/README.md) - Detailed usage, dependency management, and agent best practices.
82: - **Build Tool (Bild)**: [`Omni/Bild/README.md`](Omni/Bild/README.md) - How to use `bild` and manage dependencies.
83: - **Development Tools**: [`Omni/Ide/README.md`](Omni/Ide/README.md) - `run.sh`, `lint`, `repl.sh`, git workflow.
Read Omni/Agent.hs
55 lines
1: {-# LANGUAGE OverloadedStrings #-}
2: {-# LANGUAGE NoImplicitPrelude #-}
3: 
4: -- | Agent system entry point and combined test runner.
5: --
6: -- This module provides the main entry point for the agent system
7: -- and re-exports core types from sub-modules.
8: --
9: -- : out omni-agent
10: -- : dep aeson
11: module Omni.Agent
12:   ( -- * Engine
13:     module Omni.Agent.Engine,
14: 
15:     -- * Tools
16:     module Omni.Agent.Tools,
17: 
18:     -- * Core
19:     module Omni.Agent.Core,
20: 
21:     -- * Test
22:     main,
23:     test,
24:   )
25: where
26: 
27: import Alpha
28: import Omni.Agent.Core
29: import Omni.Agent.Engine hiding (main, test)
30: import qualified Omni.Agent.Engine as Engine
31: import Omni.Agent.Tools hiding (ToolResult, main, test)
32: import qualified Omni.Agent.Tools as Tools
33: import qualified Omni.Test as Test
34: 
35: main :: IO ()
36: main = Test.run test
37: 
38: test :: Test.Tree
39: test =
40:   Test.group
41:     "Omni.Agent"
42:     [ Engine.test,
43:       Tools.test,
44:       Test.unit "Core types are re-exported" <| do
45:         let status = Idle :: WorkerStatus
46:         status Test.@=? status,
47:       Test.unit "Engine and Tools integrate correctly" <| do
48:         let tools = Tools.allTools
49:         length tools Test.@=? 5
50:         let config =
51:               Engine.defaultAgentConfig
52:                 { Engine.agentTools = tools
53:                 }
54:         Engine.agentMaxIterations config Test.@=? 10
55:     ]
💭 Thought [engineer] 3 months ago

Now let me examine the Engine module since that's where the agent logic likely resides:

Read Omni/Agent/Engine.hs
811 lines
1: {-# LANGUAGE DeriveGeneric #-}
2: {-# LANGUAGE OverloadedStrings #-}
3: {-# LANGUAGE NoImplicitPrelude #-}
4: 
5: -- | LLM Agent Engine - Tool protocol and LLM provider abstraction.
6: --
7: -- This module provides the core abstractions for building LLM-powered agents:
8: -- - Tool: Defines tools that agents can use
9: -- - LLM: OpenAI-compatible chat completions API provider
10: -- - AgentConfig: Configuration for running agents
11: --
12: -- : out omni-agent-engine
13: -- : dep http-conduit
14: -- : dep aeson
15: -- : dep case-insensitive
16: module Omni.Agent.Engine
17:   ( Tool (..),
18:     LLM (..),
19:     EngineConfig (..),
20:     AgentConfig (..),
21:     AgentResult (..),
22:     Guardrails (..),
23:     GuardrailResult (..),
24:     Message (..),
25:     Role (..),
26:     ToolCall (..),
27:     FunctionCall (..),
28:     ToolResult (..),
29:     ChatCompletionRequest (..),
30:     ChatCompletionResponse (..),
31:     Choice (..),
32:     Usage (..),
33:     defaultLLM,
34:     defaultEngineConfig,
35:     defaultAgentConfig,
36:     defaultGuardrails,
37:     chat,
38:     runAgent,
39:     main,
40:     test,
41:   )
42: where
43: 
44: import Alpha
45: import Data.Aeson ((.!=), (.:), (.:?), (.=))
46: import qualified Data.Aeson as Aeson
47: import qualified Data.Aeson.KeyMap as KeyMap
48: import qualified Data.ByteString.Lazy as BL
49: import qualified Data.CaseInsensitive as CI
50: import qualified Data.Map.Strict as Map
51: import qualified Data.Text as Text
52: import qualified Data.Text.Encoding as TE
53: import qualified Network.HTTP.Simple as HTTP
54: import qualified Omni.Test as Test
55: 
56: main :: IO ()
57: main = Test.run test
58: 
59: test :: Test.Tree
60: test =
61:   Test.group
62:     "Omni.Agent.Engine"
63:     [ Test.unit "Tool JSON roundtrip" <| do
64:         let tool =
65:               Tool
66:                 { toolName = "get_weather",
67:                   toolDescription = "Get weather for a location",
68:                   toolJsonSchema = Aeson.object ["type" .= ("object" :: Text), "properties" .= Aeson.object []],
69:                   toolExecute = \_ -> pure (Aeson.String "sunny")
70:                 }
71:         let encoded = encodeToolForApi tool
72:         case Aeson.decode (Aeson.encode encoded) of
73:           Nothing -> Test.assertFailure "Failed to decode tool"
74:           Just decoded -> toolName tool Test.@=? toolApiName decoded,
75:       Test.unit "Message JSON roundtrip" <| do
76:         let msg = Message User "Hello" Nothing Nothing
77:         case Aeson.decode (Aeson.encode msg) of
78:           Nothing -> Test.assertFailure "Failed to decode message"
79:           Just decoded -> msgContent msg Test.@=? msgContent decoded,
80:       Test.unit "defaultLLM has correct endpoint" <| do
81:         llmBaseUrl defaultLLM Test.@=? "https://openrouter.ai/api/v1",
82:       Test.unit "defaultLLM has OpenRouter headers" <| do
83:         length (llmExtraHeaders defaultLLM) Test.@=? 2
84:         llmModel defaultLLM Test.@=? "anthropic/claude-sonnet-4.5",
85:       Test.unit "defaultAgentConfig has sensible defaults" <| do
86:         agentMaxIterations defaultAgentConfig Test.@=? 10,
87:       Test.unit "defaultEngineConfig has no-op callbacks" <| do
88:         engineOnCost defaultEngineConfig 100 5
89:         engineOnActivity defaultEngineConfig "test"
90:         engineOnToolCall defaultEngineConfig "tool" "result"
91:         True Test.@=? True,
92:       Test.unit "buildToolMap creates correct map" <| do
93:         let tool1 =
94:               Tool
95:                 { toolName = "tool1",
96:                   toolDescription = "First tool",
97:                   toolJsonSchema = Aeson.object [],
98:                   toolExecute = \_ -> pure Aeson.Null
99:                 }
100:             tool2 =
101:               Tool
102:                 { toolName = "tool2",
103:                   toolDescription = "Second tool",
104:                   toolJsonSchema = Aeson.object [],
105:                   toolExecute = \_ -> pure Aeson.Null
106:                 }
107:             toolMap = buildToolMap [tool1, tool2]
108:         Map.size toolMap Test.@=? 2
109:         Map.member "tool1" toolMap Test.@=? True
110:         Map.member "tool2" toolMap Test.@=? True,
111:       Test.unit "Usage JSON parsing" <| do
112:         let json = "{\"prompt_tokens\":100,\"completion_tokens\":50,\"total_tokens\":150}"
113:         case Aeson.decode json of
114:           Nothing -> Test.assertFailure "Failed to decode usage"
115:           Just usage -> do
116:             usagePromptTokens usage Test.@=? 100
117:             usageCompletionTokens usage Test.@=? 50
118:             usageTotalTokens usage Test.@=? 150
119:             usageCost usage Test.@=? Nothing,
120:       Test.unit "Usage JSON parsing with cost" <| do
121:         let json = "{\"prompt_tokens\":194,\"completion_tokens\":2,\"total_tokens\":196,\"cost\":0.95}"
122:         case Aeson.decode json of
123:           Nothing -> Test.assertFailure "Failed to decode usage with cost"
124:           Just usage -> do
125:             usagePromptTokens usage Test.@=? 194
126:             usageCompletionTokens usage Test.@=? 2
127:             usageTotalTokens usage Test.@=? 196
128:             usageCost usage Test.@=? Just 0.95,
129:       Test.unit "AgentResult JSON roundtrip" <| do
130:         let result =
131:               AgentResult
132:                 { resultFinalMessage = "Done",
133:                   resultToolCallCount = 3,
134:                   resultIterations = 2,
135:                   resultTotalCost = 50,
136:                   resultTotalTokens = 1500
137:                 }
138:         case Aeson.decode (Aeson.encode result) of
139:           Nothing -> Test.assertFailure "Failed to decode AgentResult"
140:           Just decoded -> do
141:             resultFinalMessage decoded Test.@=? "Done"
142:             resultToolCallCount decoded Test.@=? 3
143:             resultIterations decoded Test.@=? 2,
144:       Test.unit "estimateCost calculates correctly" <| do
145:         let gpt4oCost = estimateCost "gpt-4o" 1000
146:             gpt4oMiniCost = estimateCost "gpt-4o-mini" 1000
147:         (gpt4oCost >= gpt4oMiniCost) Test.@=? True
148:         (gpt4oCost > 0) Test.@=? True,
149:       Test.unit "ToolCall JSON roundtrip" <| do
150:         let tc =
151:               ToolCall
152:                 { tcId = "call_123",
153:                   tcType = "function",
154:                   tcFunction = FunctionCall "read_file" "{\"path\":\"/tmp/test\"}"
155:                 }
156:         case Aeson.decode (Aeson.encode tc) of
157:           Nothing -> Test.assertFailure "Failed to decode ToolCall"
158:           Just decoded -> tcId decoded Test.@=? "call_123",
159:       Test.unit "FunctionCall JSON roundtrip" <| do
160:         let fc = FunctionCall "test_func" "{\"arg\":\"value\"}"
161:         case Aeson.decode (Aeson.encode fc) of
162:           Nothing -> Test.assertFailure "Failed to decode FunctionCall"
163:           Just decoded -> do
164:             fcName decoded Test.@=? "test_func"
165:             fcArguments decoded Test.@=? "{\"arg\":\"value\"}",
166:       Test.unit "Role JSON roundtrip for all roles" <| do
167:         let roles = [System, User, Assistant, ToolRole]
168:         forM_ roles <| \role ->
169:           case Aeson.decode (Aeson.encode role) of
170:             Nothing -> Test.assertFailure ("Failed to decode Role: " <> show role)
171:             Just decoded -> decoded Test.@=? role,
172:       Test.unit "defaultGuardrails has sensible defaults" <| do
173:         guardrailMaxCostCents defaultGuardrails Test.@=? 100.0
174:         guardrailMaxTokens defaultGuardrails Test.@=? 500000
175:         guardrailMaxDuplicateToolCalls defaultGuardrails Test.@=? 3
176:         guardrailMaxTestFailures defaultGuardrails Test.@=? 3,
177:       Test.unit "checkCostGuardrail detects exceeded budget" <| do
178:         let g = defaultGuardrails {guardrailMaxCostCents = 50.0}
179:         checkCostGuardrail g 60.0 Test.@=? GuardrailCostExceeded 60.0 50.0
180:         checkCostGuardrail g 40.0 Test.@=? GuardrailOk,
181:       Test.unit "checkTokenGuardrail detects exceeded budget" <| do
182:         let g = defaultGuardrails {guardrailMaxTokens = 1000}
183:         checkTokenGuardrail g 1500 Test.@=? GuardrailTokensExceeded 1500 1000
184:         checkTokenGuardrail g 500 Test.@=? GuardrailOk,
185:       Test.unit "checkDuplicateGuardrail detects repeated calls" <| do
186:         let g = defaultGuardrails {guardrailMaxDuplicateToolCalls = 3}
187:             counts = Map.fromList [("bash", 3), ("read_file", 1)]
188:         case checkDuplicateGuardrail g counts of
189:           GuardrailDuplicateToolCalls name count -> do
190:             name Test.@=? "bash"
191:             count Test.@=? 3
192:           _ -> Test.assertFailure "Expected GuardrailDuplicateToolCalls"
193:         checkDuplicateGuardrail g (Map.fromList [("bash", 2)]) Test.@=? GuardrailOk,
194:       Test.unit "checkTestFailureGuardrail detects failures" <| do
195:         let g = defaultGuardrails {guardrailMaxTestFailures = 3}
196:         checkTestFailureGuardrail g 3 Test.@=? GuardrailTestFailures 3
197:         checkTestFailureGuardrail g 2 Test.@=? GuardrailOk,
198:       Test.unit "updateToolCallCounts accumulates correctly" <| do
199:         let tc1 = ToolCall "1" "function" (FunctionCall "bash" "{}")
200:             tc2 = ToolCall "2" "function" (FunctionCall "bash" "{}")
201:             tc3 = ToolCall "3" "function" (FunctionCall "read_file" "{}")
202:             counts = updateToolCallCounts Map.empty [tc1, tc2, tc3]
203:         Map.lookup "bash" counts Test.@=? Just 2
204:         Map.lookup "read_file" counts Test.@=? Just 1,
205:       Test.unit "Guardrails JSON roundtrip" <| do
206:         let g = Guardrails 75.0 100000 5 4 3
207:         case Aeson.decode (Aeson.encode g) of
208:           Nothing -> Test.assertFailure "Failed to decode Guardrails"
209:           Just decoded -> decoded Test.@=? g,
210:       Test.unit "GuardrailResult JSON roundtrip" <| do
211:         let results =
212:               [ GuardrailOk,
213:                 GuardrailCostExceeded 100.0 50.0,
214:                 GuardrailTokensExceeded 2000 1000,
215:                 GuardrailDuplicateToolCalls "bash" 5,
216:                 GuardrailTestFailures 3,
217:                 GuardrailEditFailures 5
218:               ]
219:         forM_ results <| \r ->
220:           case Aeson.decode (Aeson.encode r) of
221:             Nothing -> Test.assertFailure ("Failed to decode GuardrailResult: " <> show r)
222:             Just decoded -> decoded Test.@=? r
223:     ]
224: 
225: data Tool = Tool
226:   { toolName :: Text,
227:     toolDescription :: Text,
228:     toolJsonSchema :: Aeson.Value,
229:     toolExecute :: Aeson.Value -> IO Aeson.Value
230:   }
231: 
232: data ToolApi = ToolApi
233:   { toolApiName :: Text,
234:     toolApiDescription :: Text,
235:     toolApiParameters :: Aeson.Value
236:   }
237:   deriving (Generic)
238: 
239: instance Aeson.ToJSON ToolApi where
240:   toJSON t =
241:     Aeson.object
242:       [ "type" .= ("function" :: Text),
243:         "function"
244:           .= Aeson.object
245:             [ "name" .= toolApiName t,
246:               "description" .= toolApiDescription t,
247:               "parameters" .= toolApiParameters t
248:             ]
249:       ]
250: 
251: instance Aeson.FromJSON ToolApi where
252:   parseJSON =
253:     Aeson.withObject "ToolApi" <| \v -> do
254:       fn <- v .: "function"
255:       (ToolApi </ (fn .: "name"))
256:         <*> (fn .: "description")
257:         <*> (fn .: "parameters")
258: 
259: encodeToolForApi :: Tool -> ToolApi
260: encodeToolForApi t =
261:   ToolApi
262:     { toolApiName = toolName t,
263:       toolApiDescription = toolDescription t,
264:       toolApiParameters = toolJsonSchema t
265:     }
266: 
267: data LLM = LLM
268:   { llmBaseUrl :: Text,
269:     llmApiKey :: Text,
270:     llmModel :: Text,
271:     llmExtraHeaders :: [(ByteString, ByteString)]
272:   }
273:   deriving (Show, Eq, Generic)
274: 
275: instance Aeson.ToJSON LLM where
276:   toJSON l =
277:     Aeson.object
278:       [ "llmBaseUrl" .= llmBaseUrl l,
279:         "llmApiKey" .= llmApiKey l,
280:         "llmModel" .= llmModel l
281:       ]
282: 
283: instance Aeson.FromJSON LLM where
284:   parseJSON =
285:     Aeson.withObject "LLM" <| \v ->
286:       (LLM </ (v .: "llmBaseUrl"))
287:         <*> (v .: "llmApiKey")
288:         <*> (v .: "llmModel")
289:         <*> pure []
290: 
291: defaultLLM :: LLM
292: defaultLLM =
293:   LLM
294:     { llmBaseUrl = "https://openrouter.ai/api/v1",
295:       llmApiKey = "",
296:       llmModel = "anthropic/claude-sonnet-4.5",
297:       llmExtraHeaders =
298:         [ ("HTTP-Referer", "https://omni.dev"),
299:           ("X-Title", "Omni Agent")
300:         ]
301:     }
302: 
303: data AgentConfig = AgentConfig
304:   { agentModel :: Text,
305:     agentTools :: [Tool],
306:     agentSystemPrompt :: Text,
307:     agentMaxIterations :: Int,
308:     agentGuardrails :: Guardrails
309:   }
310: 
311: data Guardrails = Guardrails
312:   { guardrailMaxCostCents :: Double,
313:     guardrailMaxTokens :: Int,
314:     guardrailMaxDuplicateToolCalls :: Int,
315:     guardrailMaxTestFailures :: Int,
316:     guardrailMaxEditFailures :: Int
317:   }
318:   deriving (Show, Eq, Generic)
319: 
320: instance Aeson.ToJSON Guardrails
321: 
322: instance Aeson.FromJSON Guardrails
323: 
324: data GuardrailResult
325:   = GuardrailOk
326:   | GuardrailCostExceeded Double Double
327:   | GuardrailTokensExceeded Int Int
328:   | GuardrailDuplicateToolCalls Text Int
329:   | GuardrailTestFailures Int
330:   | GuardrailEditFailures Int
331:   deriving (Show, Eq, Generic)
332: 
333: instance Aeson.ToJSON GuardrailResult
334: 
335: instance Aeson.FromJSON GuardrailResult
336: 
337: defaultGuardrails :: Guardrails
338: defaultGuardrails =
339:   Guardrails
340:     { guardrailMaxCostCents = 100.0,
341:       guardrailMaxTokens = 500000,
342:       guardrailMaxDuplicateToolCalls = 3,
343:       guardrailMaxTestFailures = 3,
344:       guardrailMaxEditFailures = 5
345:     }
346: 
347: defaultAgentConfig :: AgentConfig
348: defaultAgentConfig =
349:   AgentConfig
350:     { agentModel = "gpt-4",
351:       agentTools = [],
352:       agentSystemPrompt = "You are a helpful assistant.",
353:       agentMaxIterations = 10,
354:       agentGuardrails = defaultGuardrails
355:     }
356: 
357: data EngineConfig = EngineConfig
358:   { engineLLM :: LLM,
359:     engineOnCost :: Int -> Double -> IO (),
360:     engineOnActivity :: Text -> IO (),
361:     engineOnToolCall :: Text -> Text -> IO (),
362:     engineOnAssistant :: Text -> IO (),
363:     engineOnToolResult :: Text -> Bool -> Text -> IO (),
364:     engineOnComplete :: IO (),
365:     engineOnError :: Text -> IO (),
366:     engineOnGuardrail :: GuardrailResult -> IO ()
367:   }
368: 
369: defaultEngineConfig :: EngineConfig
370: defaultEngineConfig =
371:   EngineConfig
372:     { engineLLM = defaultLLM,
373:       engineOnCost = \_ _ -> pure (),
374:       engineOnActivity = \_ -> pure (),
375:       engineOnToolCall = \_ _ -> pure (),
376:       engineOnAssistant = \_ -> pure (),
377:       engineOnToolResult = \_ _ _ -> pure (),
378:       engineOnComplete = pure (),
379:       engineOnError = \_ -> pure (),
380:       engineOnGuardrail = \_ -> pure ()
381:     }
382: 
383: data AgentResult = AgentResult
384:   { resultFinalMessage :: Text,
385:     resultToolCallCount :: Int,
386:     resultIterations :: Int,
387:     resultTotalCost :: Double,
388:     resultTotalTokens :: Int
389:   }
390:   deriving (Show, Eq, Generic)
391: 
392: instance Aeson.ToJSON AgentResult
393: 
394: instance Aeson.FromJSON AgentResult
395: 
396: data Role = System | User | Assistant | ToolRole
397:   deriving (Show, Eq, Generic)
398: 
399: instance Aeson.ToJSON Role where
400:   toJSON System = Aeson.String "system"
401:   toJSON User = Aeson.String "user"
402:   toJSON Assistant = Aeson.String "assistant"
403:   toJSON ToolRole = Aeson.String "tool"
404: 
405: instance Aeson.FromJSON Role where
406:   parseJSON = Aeson.withText "Role" parseRole
407:     where
408:       parseRole "system" = pure System
409:       parseRole "user" = pure User
410:       parseRole "assistant" = pure Assistant
411:       parseRole "tool" = pure ToolRole
412:       parseRole _ = empty
413: 
414: data Message = Message
415:   { msgRole :: Role,
416:     msgContent :: Text,
417:     msgToolCalls :: Maybe [ToolCall],
418:     msgToolCallId :: Maybe Text
419:   }
420:   deriving (Show, Eq, Generic)
421: 
422: instance Aeson.ToJSON Message where
423:   toJSON m =
424:     Aeson.object
425:       <| catMaybes
426:         [ Just ("role" .= msgRole m),
427:           Just ("content" .= msgContent m),
428:           ("tool_calls" .=) </ msgToolCalls m,
429:           ("tool_call_id" .=) </ msgToolCallId m
430:         ]
431: 
432: instance Aeson.FromJSON Message where
433:   parseJSON =
434:     Aeson.withObject "Message" <| \v ->
435:       (Message </ (v .: "role"))
436:         <*> (v .:? "content" .!= "")
437:         <*> (v .:? "tool_calls")
438:         <*> (v .:? "tool_call_id")
439: 
440: data ToolCall = ToolCall
441:   { tcId :: Text,
442:     tcType :: Text,
443:     tcFunction :: FunctionCall
444:   }
445:   deriving (Show, Eq, Generic)
446: 
447: instance Aeson.ToJSON ToolCall where
448:   toJSON tc =
449:     Aeson.object
450:       [ "id" .= tcId tc,
451:         "type" .= tcType tc,
452:         "function" .= tcFunction tc
453:       ]
454: 
455: instance Aeson.FromJSON ToolCall where
456:   parseJSON =
457:     Aeson.withObject "ToolCall" <| \v ->
458:       (ToolCall </ (v .: "id"))
459:         <*> (v .:? "type" .!= "function")
460:         <*> (v .: "function")
461: 
462: data FunctionCall = FunctionCall
463:   { fcName :: Text,
464:     fcArguments :: Text
465:   }
466:   deriving (Show, Eq, Generic)
467: 
468: instance Aeson.ToJSON FunctionCall where
469:   toJSON fc =
470:     Aeson.object
471:       [ "name" .= fcName fc,
472:         "arguments" .= fcArguments fc
473:       ]
474: 
475: instance Aeson.FromJSON FunctionCall where
476:   parseJSON =
477:     Aeson.withObject "FunctionCall" <| \v ->
478:       (FunctionCall </ (v .: "name"))
479:         <*> (v .: "arguments")
480: 
481: data ToolResult = ToolResult
482:   { trToolCallId :: Text,
483:     trContent :: Text
484:   }
485:   deriving (Show, Eq, Generic)
486: 
487: instance Aeson.ToJSON ToolResult
488: 
489: instance Aeson.FromJSON ToolResult
490: 
491: data ChatCompletionRequest = ChatCompletionRequest
492:   { reqModel :: Text,
493:     reqMessages :: [Message],
494:     reqTools :: Maybe [ToolApi]
495:   }
496:   deriving (Generic)
497: 
498: instance Aeson.ToJSON ChatCompletionRequest where
499:   toJSON r =
500:     Aeson.object
501:       <| catMaybes
502:         [ Just ("model" .= reqModel r),
503:           Just ("messages" .= reqMessages r),
504:           ("tools" .=) </ reqTools r,
505:           Just ("usage" .= Aeson.object ["include" .= True])
506:         ]
507: 
508: data Choice = Choice
509:   { choiceIndex :: Int,
510:     choiceMessage :: Message,
511:     choiceFinishReason :: Maybe Text
512:   }
513:   deriving (Show, Eq, Generic)
514: 
515: instance Aeson.FromJSON Choice where
516:   parseJSON =
517:     Aeson.withObject "Choice" <| \v ->
518:       (Choice </ (v .: "index"))
519:         <*> (v .: "message")
520:         <*> (v .:? "finish_reason")
521: 
522: data Usage = Usage
523:   { usagePromptTokens :: Int,
524:     usageCompletionTokens :: Int,
525:     usageTotalTokens :: Int,
526:     usageCost :: Maybe Double
527:   }
528:   deriving (Show, Eq, Generic)
529: 
530: instance Aeson.FromJSON Usage where
531:   parseJSON =
532:     Aeson.withObject "Usage" <| \v ->
533:       (Usage </ (v .: "prompt_tokens"))
534:         <*> (v .: "completion_tokens")
535:         <*> (v .: "total_tokens")
536:         <*> (v .:? "cost")
537: 
538: data ChatCompletionResponse = ChatCompletionResponse
539:   { respId :: Text,
540:     respChoices :: [Choice],
541:     respModel :: Text,
542:     respUsage :: Maybe Usage
543:   }
544:   deriving (Show, Eq, Generic)
545: 
546: instance Aeson.FromJSON ChatCompletionResponse where
547:   parseJSON =
548:     Aeson.withObject "ChatCompletionResponse" <| \v ->
549:       (ChatCompletionResponse </ (v .: "id"))
550:         <*> (v .: "choices")
551:         <*> (v .: "model")
552:         <*> (v .:? "usage")
553: 
554: data ChatResult = ChatResult
555:   { chatMessage :: Message,
556:     chatUsage :: Maybe Usage
557:   }
558:   deriving (Show, Eq)
559: 
560: chatWithUsage :: LLM -> [Tool] -> [Message] -> IO (Either Text ChatResult)
561: chatWithUsage llm tools messages = do
562:   let url = Text.unpack (llmBaseUrl llm) <> "/chat/completions"
563:   req0 <- HTTP.parseRequest url
564:   let toolApis = [encodeToolForApi t | not (null tools), t <- tools]
565:       body =
566:         ChatCompletionRequest
567:           { reqModel = llmModel llm,
568:             reqMessages = messages,
569:             reqTools = if null toolApis then Nothing else Just toolApis
570:           }
571:       baseReq =
572:         HTTP.setRequestMethod "POST"
573:           <| HTTP.setRequestHeader "Content-Type" ["application/json"]
574:           <| HTTP.setRequestHeader "Authorization" ["Bearer " <> TE.encodeUtf8 (llmApiKey llm)]
575:           <| HTTP.setRequestBodyLBS (Aeson.encode body)
576:           <| req0
577:       req = foldr addHeader baseReq (llmExtraHeaders llm)
578:       addHeader (name, value) = HTTP.addRequestHeader (CI.mk name) value
579: 
580:   response <- HTTP.httpLBS req
581:   let status = HTTP.getResponseStatusCode response
582:   if status >= 200 && status < 300
583:     then case Aeson.decode (HTTP.getResponseBody response) of
584:       Just resp ->
585:         case respChoices resp of
586:           (c : _) -> pure (Right (ChatResult (choiceMessage c) (respUsage resp)))
587:           [] -> pure (Left "No choices in response")
588:       Nothing -> pure (Left "Failed to parse response")
589:     else pure (Left ("HTTP error: " <> tshow status <> " - " <> TE.decodeUtf8 (BL.toStrict (HTTP.getResponseBody response))))
590: 
591: chat :: LLM -> [Tool] -> [Message] -> IO (Either Text Message)
592: chat llm tools messages = do
593:   result <- chatWithUsage llm tools messages
594:   pure (chatMessage </ result)
595: 
596: runAgent :: EngineConfig -> AgentConfig -> Text -> IO (Either Text AgentResult)
597: runAgent engineCfg agentCfg userPrompt = do
598:   let llm =
599:         (engineLLM engineCfg)
600:           { llmModel = agentModel agentCfg
601:           }
602:       tools = agentTools agentCfg
603:       toolMap = buildToolMap tools
604:       systemMsg = Message System (agentSystemPrompt agentCfg) Nothing Nothing
605:       userMsg = Message User userPrompt Nothing Nothing
606:       initialMessages = [systemMsg, userMsg]
607: 
608:   engineOnActivity engineCfg "Starting agent loop"
609:   loop llm tools toolMap initialMessages 0 0 0 0.0 Map.empty 0 0
610:   where
611:     maxIter = agentMaxIterations agentCfg
612:     guardrails' = agentGuardrails agentCfg
613: 
614:     loop ::
615:       LLM ->
616:       [Tool] ->
617:       Map.Map Text Tool ->
618:       [Message] ->
619:       Int ->
620:       Int ->
621:       Int ->
622:       Double ->
623:       Map.Map Text Int ->
624:       Int ->
625:       Int ->
626:       IO (Either Text AgentResult)
627:     loop llm tools' toolMap msgs iteration totalCalls totalTokens totalCost toolCallCounts testFailures editFailures
628:       | iteration >= maxIter = do
629:           let errMsg = "Max iterations (" <> tshow maxIter <> ") reached"
630:           engineOnError engineCfg errMsg
631:           pure <| Left errMsg
632:       | otherwise = do
633:           let guardrailViolation = findGuardrailViolation guardrails' totalCost totalTokens toolCallCounts testFailures editFailures
634:           case guardrailViolation of
635:             Just (g, errMsg) -> do
636:               engineOnGuardrail engineCfg g
637:               pure <| Left errMsg
638:             Nothing -> do
639:               engineOnActivity engineCfg <| "Iteration " <> tshow (iteration + 1)
640:               result <- chatWithUsage llm tools' msgs
641:               case result of
642:                 Left err -> do
643:                   engineOnError engineCfg err
644:                   pure (Left err)
645:                 Right chatRes -> do
646:                   let msg = chatMessage chatRes
647:                       tokens = maybe 0 usageTotalTokens (chatUsage chatRes)
648:                       cost = case chatUsage chatRes +> usageCost of
649:                         Just actualCost -> actualCost * 100
650:                         Nothing -> estimateCost (llmModel llm) tokens
651:                   engineOnCost engineCfg tokens cost
652:                   let newTokens = totalTokens + tokens
653:                       newCost = totalCost + cost
654:                   let assistantText = msgContent msg
655:                   unless (Text.null assistantText)
656:                     <| engineOnAssistant engineCfg assistantText
657:                   case msgToolCalls msg of
658:                     Nothing -> do
659:                       engineOnActivity engineCfg "Agent completed"
660:                       engineOnComplete engineCfg
661:                       pure
662:                         <| Right
663:                         <| AgentResult
664:                           { resultFinalMessage = msgContent msg,
665:                             resultToolCallCount = totalCalls,
666:                             resultIterations = iteration + 1,
667:                             resultTotalCost = newCost,
668:                             resultTotalTokens = newTokens
669:                           }
670:                     Just [] -> do
671:                       engineOnActivity engineCfg "Agent completed (empty tool calls)"
672:                       engineOnComplete engineCfg
673:                       pure
674:                         <| Right
675:                         <| AgentResult
676:                           { resultFinalMessage = msgContent msg,
677:                             resultToolCallCount = totalCalls,
678:                             resultIterations = iteration + 1,
679:                             resultTotalCost = newCost,
680:                             resultTotalTokens = newTokens
681:                           }
682:                     Just tcs -> do
683:                       (toolResults, newTestFailures, newEditFailures) <- executeToolCallsWithTracking engineCfg toolMap tcs testFailures editFailures
684:                       let newMsgs = msgs <> [msg] <> toolResults
685:                           newCalls = totalCalls + length tcs
686:                           newToolCallCounts = updateToolCallCounts toolCallCounts tcs
687:                       loop llm tools' toolMap newMsgs (iteration + 1) newCalls newTokens newCost newToolCallCounts newTestFailures newEditFailures
688: 
689: checkCostGuardrail :: Guardrails -> Double -> GuardrailResult
690: checkCostGuardrail g cost
691:   | cost > guardrailMaxCostCents g = GuardrailCostExceeded cost (guardrailMaxCostCents g)
692:   | otherwise = GuardrailOk
693: 
694: checkTokenGuardrail :: Guardrails -> Int -> GuardrailResult
695: checkTokenGuardrail g tokens
696:   | tokens > guardrailMaxTokens g = GuardrailTokensExceeded tokens (guardrailMaxTokens g)
697:   | otherwise = GuardrailOk
698: 
699: checkDuplicateGuardrail :: Guardrails -> Map.Map Text Int -> GuardrailResult
700: checkDuplicateGuardrail g counts =
701:   let maxAllowed = guardrailMaxDuplicateToolCalls g
702:       violations = [(name, count) | (name, count) <- Map.toList counts, count >= maxAllowed]
703:    in case violations of
704:         ((name, count) : _) -> GuardrailDuplicateToolCalls name count
705:         [] -> GuardrailOk
706: 
707: checkTestFailureGuardrail :: Guardrails -> Int -> GuardrailResult
708: checkTestFailureGuardrail g failures
709:   | failures >= guardrailMaxTestFailures g = GuardrailTestFailures failures
710:   | otherwise = GuardrailOk
711: 
712: checkEditFailureGuardrail :: Guardrails -> Int -> GuardrailResult
713: checkEditFailureGuardrail g failures
714:   | failures >= guardrailMaxEditFailures g = GuardrailEditFailures failures
715:   | otherwise = GuardrailOk
716: 
717: updateToolCallCounts :: Map.Map Text Int -> [ToolCall] -> Map.Map Text Int
718: updateToolCallCounts =
719:   foldr (\tc m -> Map.insertWith (+) (fcName (tcFunction tc)) 1 m)
720: 
721: findGuardrailViolation :: Guardrails -> Double -> Int -> Map.Map Text Int -> Int -> Int -> Maybe (GuardrailResult, Text)
722: findGuardrailViolation g cost tokens toolCallCounts testFailures editFailures =
723:   case checkCostGuardrail g cost of
724:     r@(GuardrailCostExceeded actual limit) ->
725:       Just (r, "Guardrail: cost budget exceeded (" <> tshow actual <> "/" <> tshow limit <> " cents)")
726:     _ -> case checkTokenGuardrail g tokens of
727:       r@(GuardrailTokensExceeded actual limit) ->
728:         Just (r, "Guardrail: token budget exceeded (" <> tshow actual <> "/" <> tshow limit <> " tokens)")
729:       _ -> case checkDuplicateGuardrail g toolCallCounts of
730:         r@(GuardrailDuplicateToolCalls tool count) ->
731:           Just (r, "Guardrail: duplicate tool calls (" <> tool <> " called " <> tshow count <> " times)")
732:         _ -> case checkTestFailureGuardrail g testFailures of
733:           r@(GuardrailTestFailures count) ->
734:             Just (r, "Guardrail: too many test failures (" <> tshow count <> ")")
735:           _ -> case checkEditFailureGuardrail g editFailures of
736:             r@(GuardrailEditFailures count) ->
737:               Just (r, "Guardrail: too many edit_file failures (" <> tshow count <> " 'old_str not found' errors)")
738:             _ -> Nothing
739: 
740: buildToolMap :: [Tool] -> Map.Map Text Tool
741: buildToolMap = Map.fromList <. map (\t -> (toolName t, t))
742: 
743: -- | Track both test failures and edit failures
744: -- Returns (messages, testFailures, editFailures)
745: executeToolCallsWithTracking :: EngineConfig -> Map.Map Text Tool -> [ToolCall] -> Int -> Int -> IO ([Message], Int, Int)
746: executeToolCallsWithTracking engineCfg toolMap tcs initialTestFailures initialEditFailures = do
747:   results <- traverse executeSingle tcs
748:   let msgs = map (\(m, _, _) -> m) results
749:       testDeltas = map (\(_, t, _) -> t) results
750:       editDeltas = map (\(_, _, e) -> e) results
751:       totalTestFailures = initialTestFailures + sum testDeltas
752:       totalEditFailures = initialEditFailures + sum editDeltas
753:   pure (msgs, totalTestFailures, totalEditFailures)
754:   where
755:     executeSingle tc = do
756:       let name = fcName (tcFunction tc)
757:           argsText = fcArguments (tcFunction tc)
758:           callId = tcId tc
759:       engineOnActivity engineCfg <| "Executing tool: " <> name
760:       engineOnToolCall engineCfg name argsText
761:       case Map.lookup name toolMap of
762:         Nothing -> do
763:           let errMsg = "Tool not found: " <> name
764:           engineOnToolResult engineCfg name False errMsg
765:           pure (Message ToolRole errMsg Nothing (Just callId), 0, 0)
766:         Just tool -> do
767:           case Aeson.decode (BL.fromStrict (TE.encodeUtf8 argsText)) of
768:             Nothing -> do
769:               let errMsg = "Invalid JSON arguments: " <> argsText
770:               engineOnToolResult engineCfg name False errMsg
771:               pure (Message ToolRole errMsg Nothing (Just callId), 0, 0)
772:             Just args -> do
773:               resultValue <- toolExecute tool args
774:               let resultText = TE.decodeUtf8 (BL.toStrict (Aeson.encode resultValue))
775:                   isTestCall = name == "bash" && ("bild --test" `Text.isInfixOf` argsText || "bild -t" `Text.isInfixOf` argsText)
776:                   isTestFailure = isTestCall && isFailureResult resultValue
777:                   testDelta = if isTestFailure then 1 else 0
778:                   isEditFailure = name == "edit_file" && isOldStrNotFoundError resultValue
779:                   editDelta = if isEditFailure then 1 else 0
780:               engineOnToolResult engineCfg name True resultText
781:               pure (Message ToolRole resultText Nothing (Just callId), testDelta, editDelta)
782: 
783:     isFailureResult :: Aeson.Value -> Bool
784:     isFailureResult (Aeson.Object obj) =
785:       case KeyMap.lookup "exit_code" obj of
786:         Just (Aeson.Number n) -> n /= 0
787:         _ -> False
788:     isFailureResult (Aeson.String s) =
789:       "error"
790:         `Text.isInfixOf` Text.toLower s
791:         || "failed"
792:         `Text.isInfixOf` Text.toLower s
793:         || "FAILED"
794:         `Text.isInfixOf` s
795:     isFailureResult _ = False
796: 
797:     isOldStrNotFoundError :: Aeson.Value -> Bool
798:     isOldStrNotFoundError (Aeson.Object obj) =
799:       case KeyMap.lookup "error" obj of
800:         Just (Aeson.String s) -> "old_str not found" `Text.isInfixOf` s
801:         _ -> False
802:     isOldStrNotFoundError _ = False
803: 
804: -- | Estimate cost in cents from token count
805: estimateCost :: Text -> Int -> Double
806: estimateCost model tokens
807:   | "gpt-4o-mini" `Text.isInfixOf` model = fromIntegral tokens * 15 / 1000000
808:   | "gpt-4o" `Text.isInfixOf` model = fromIntegral tokens * 250 / 100000
809:   | "gpt-4" `Text.isInfixOf` model = fromIntegral tokens * 3 / 100000
810:   | "claude" `Text.isInfixOf` model = fromIntegral tokens * 3 / 100000
811:   | otherwise = fromIntegral tokens / 100000
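One detail of the listing above is worth isolating: `updateToolCallCounts` keys its map on the tool name alone, so `checkDuplicateGuardrail` compares the total number of calls per tool against the limit, regardless of arguments. The standalone sketch below reproduces that behaviour. `Call`, `countByName`, `firstViolation`, and `distinctReads` are hypothetical names used for illustration only; `Call` is a simplified stand-in for the engine's `ToolCall`.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified stand-in for the engine's ToolCall type.
data Call = Call {callName :: String, callArgs :: String}

-- Same shape as updateToolCallCounts in the listing: keyed on name only,
-- so the arguments never influence the count.
countByName :: Map.Map String Int -> [Call] -> Map.Map String Int
countByName = foldr (\c m -> Map.insertWith (+) (callName c) 1 m)

-- Same comparison as checkDuplicateGuardrail: fires once count >= limit.
firstViolation :: Int -> Map.Map String Int -> Maybe (String, Int)
firstViolation limit counts =
  case [(n, k) | (n, k) <- Map.toList counts, k >= limit] of
    (v : _) -> Just v
    [] -> Nothing

-- Five reads of five different files.
distinctReads :: [Call]
distinctReads =
  [Call "read_file" ("file" <> show i <> ".hs") | i <- [1 .. 5 :: Int]]
```

Loaded in GHCi, `firstViolation 5 (countByName Map.empty distinctReads)` evaluates to `Just ("read_file",5)`: five reads of different files trip a limit of 5 exactly as five identical reads would.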
💭Thought[engineer]3 months ago

Now let me check the task comment again: it mentions t-210, which implements the NeedsHelp status. Let me search for how jr tasks are updated:

ϟ jr task get t-210 --json 2>/dev/null || echo "Task t-210 not found"
Grep NeedsHelp
21 lines
./Omni/Task.hs:309:        Just "needs-help" -> pure <| Just NeedsHelp
./Omni/Task.hs:371:            "needs-help" -> NeedsHelp
./Omni/Task.hs:635:      Test.unit "ready tasks exclude NeedsHelp tasks" <| do
./Omni/Task.hs:637:        updateTaskStatus (taskId task) NeedsHelp []
./Omni/Jr/Web/Style.hs:587:  ".badge-needshelp" ? do
./Omni/Jr/Web/Style.hs:1912:    ".badge-needshelp" ? do
./Omni/Task/Core.hs:51:data Status = Draft | Open | InProgress | Review | Approved | Done | NeedsHelp
./Omni/Task/Core.hs:810:          /= NeedsHelp
./Omni/Task/Core.hs:926:              NeedsHelp -> "[!]"
./Omni/Task/Core.hs:937:              NeedsHelp -> yellow statusStr
./Omni/Task/Core.hs:996:              NeedsHelp -> yellow s
./Omni/Task/Core.hs:1565:      needingHelp = [t | t <- allTasks, taskStatus t == NeedsHelp]
./Omni/Jr/Web.hs:684:        TaskCore.NeedsHelp -> ("badge badge-needshelp", "Needs Help")
./Omni/Jr/Web.hs:775:        TaskCore.NeedsHelp -> ("badge badge-needshelp status-badge-clickable", "Needs Help")
./Omni/Jr/Web.hs:803:      statusOption TaskCore.NeedsHelp currentStatus tid
./Omni/Jr/Web.hs:814:        TaskCore.NeedsHelp -> ("badge badge-needshelp", "Needs Help")
./Omni/Jr/Web.hs:1091:        needsHelp = TaskCore.tasksNeedingHelp actionItems
./Omni/Jr/Web.hs:1092:        totalCount = length failed + length epicsReady + length needsHelp
./Omni/Jr/Web.hs:1111:                  unless (null needsHelp) <| do
./Omni/Jr/Web.hs:1112:                    Lucid.h2_ [Lucid.class_ "section-header"] <| Lucid.toHtml ("Needs Help (" <> tshow (length needsHelp) <> ")")
./Omni/Jr/Web.hs:1114:                    Lucid.div_ [Lucid.class_ "task-list"] <| traverse_ renderTaskCard (sortTasks currentSort needsHelp)
💭Thought[engineer]3 months ago

Good! The NeedsHelp status already exists. Now let me check the Worker code to see how it handles agents and where I need to integrate the guardrail → NeedsHelp transition:

Grep runAgent
./Omni/Agent/Worker.hs:337:      result <- Engine.runAgent engineCfg agentCfg userPrompt
./Omni/Agent/Engine.hs:38:    runAgent,
./Omni/Agent/Engine.hs:596:runAgent :: EngineConfig -> AgentConfig -> Text -> IO (Either Text AgentResult)
./Omni/Agent/Engine.hs:597:runAgent engineCfg agentCfg userPrompt = do
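Because `runAgent` returns `Either Text AgentResult` and `findGuardrailViolation` prefixes every violation message with "Guardrail:", one possible way to wire the guardrail → NeedsHelp transition is to classify the `Left` text before choosing a task status. The sketch below is an assumption about how that could look, not the change made in this task. `Outcome`, `classify`, and `nextStatus` are hypothetical names, `Status` is a trimmed copy of the type in Omni/Task/Core.hs, and `AgentResult` is reduced to its final message.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- Trimmed copy of the Status type from Omni/Task/Core.hs.
data Status = Open | InProgress | Review | Done | NeedsHelp
  deriving (Show, Eq)

-- Hypothetical three-way outcome for a single agent run.
data Outcome
  = Finished Text -- agent returned a final message
  | GuardrailStop Text -- a guardrail fired; keep changes, ask for help
  | EngineError Text -- LLM/API failure unrelated to guardrails
  deriving (Show, Eq)

-- Classify the result of a run. Relies on the "Guardrail:" prefix that
-- findGuardrailViolation puts on every violation message.
classify :: Either Text Text -> Outcome
classify (Right finalMsg) = Finished finalMsg
classify (Left err)
  | "Guardrail:" `Text.isPrefixOf` err = GuardrailStop err
  | otherwise = EngineError err

-- Assumed policy: a guardrail stop goes to NeedsHelp, which the ready
-- queue excludes, so the task is not picked up and restarted.
-- Review here assumes the later commit step succeeds.
nextStatus :: Outcome -> Status
nextStatus (Finished _) = Review
nextStatus (GuardrailStop _) = NeedsHelp
nextStatus (EngineError _) = InProgress
```

Matching on a message prefix is fragile. Returning a dedicated error type from the engine would be sturdier, but the prefix check needs no change to the `runAgent` signature.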
Read Omni/Agent/Worker.hs
554 lines
1: {-# LANGUAGE OverloadedStrings #-}
2: {-# LANGUAGE NoImplicitPrelude #-}
3: 
4: module Omni.Agent.Worker where
5: 
6: import Alpha
7: import qualified Data.Aeson as Aeson
8: import qualified Data.Aeson.Key as AesonKey
9: import qualified Data.ByteString.Lazy as BSL
10: import Data.IORef (modifyIORef', newIORef, readIORef)
11: import qualified Data.List as List
12: import qualified Data.Text as Text
13: import qualified Data.Text.Encoding as TE
14: import qualified Data.Time
15: import qualified Omni.Agent.Core as Core
16: import qualified Omni.Agent.Engine as Engine
17: import qualified Omni.Agent.Log as AgentLog
18: import qualified Omni.Agent.Tools as Tools
19: import qualified Omni.Fact as Fact
20: import qualified Omni.Task.Core as TaskCore
21: import qualified System.Directory as Directory
22: import qualified System.Environment as Env
23: import qualified System.Exit as Exit
24: import System.FilePath ((</>))
25: import qualified System.Process as Process
26: 
27: start :: Core.Worker -> Maybe Text -> IO ()
28: start worker maybeTaskId = do
29:   if Core.workerQuiet worker
30:     then putText ("[worker] Starting for " <> Core.workerName worker)
31:     else do
32:       AgentLog.init (Core.workerName worker)
33:       AgentLog.log ("[worker] Starting for " <> Core.workerName worker)
34:   case maybeTaskId of
35:     Just tid -> logMsg worker ("[worker] Target task: " <> tid)
36:     Nothing -> logMsg worker "[worker] No specific task, will pick from ready queue"
37:   runOnce worker maybeTaskId
38: 
39: -- | Log message respecting quiet mode
40: logMsg :: Core.Worker -> Text -> IO ()
41: logMsg worker msg =
42:   if Core.workerQuiet worker
43:     then putText msg
44:     else AgentLog.log msg
45: 
46: -- | Convert key-value pairs to JSON metadata string
47: toMetadata :: [(Text, Text)] -> Text
48: toMetadata pairs =
49:   let obj = Aeson.object [(AesonKey.fromText k, Aeson.String v) | (k, v) <- pairs]
50:    in TE.decodeUtf8 (BSL.toStrict (Aeson.encode obj))
51: 
52: -- | Format guardrail result for logging
53: formatGuardrailResult :: Engine.GuardrailResult -> Text
54: formatGuardrailResult Engine.GuardrailOk = "OK"
55: formatGuardrailResult (Engine.GuardrailCostExceeded actual limit) =
56:   "Cost exceeded: " <> tshow actual <> " cents (limit: " <> tshow limit <> ")"
57: formatGuardrailResult (Engine.GuardrailTokensExceeded actual limit) =
58:   "Tokens exceeded: " <> tshow actual <> " (limit: " <> tshow limit <> ")"
59: formatGuardrailResult (Engine.GuardrailDuplicateToolCalls tool count) =
60:   "Duplicate tool calls: " <> tool <> " called " <> tshow count <> " times"
61: formatGuardrailResult (Engine.GuardrailTestFailures count) =
62:   "Test failures: " <> tshow count <> " failures"
63: formatGuardrailResult (Engine.GuardrailEditFailures count) =
64:   "Edit failures: " <> tshow count <> " 'old_str not found' errors"
65: 
66: runOnce :: Core.Worker -> Maybe Text -> IO ()
67: runOnce worker maybeTaskId = do
68:   -- Find work
69:   targetTask <- case maybeTaskId of
70:     Just tid -> do
71:       TaskCore.findTask tid </ TaskCore.loadTasks
72:     Nothing -> do
73:       readyTasks <- TaskCore.getReadyTasks
74:       case readyTasks of
75:         [] -> pure Nothing
76:         (task : _) -> pure (Just task)
77: 
78:   case targetTask of
79:     Nothing -> do
80:       case maybeTaskId of
81:         Just tid -> do
82:           unless (Core.workerQuiet worker) <| AgentLog.updateActivity ("Task " <> tid <> " not found.")
83:           logMsg worker ("[worker] Task " <> tid <> " not found.")
84:         Nothing -> do
85:           unless (Core.workerQuiet worker) <| AgentLog.updateActivity "No work found."
86:           logMsg worker "[worker] No ready tasks found."
87:     Just task -> do
88:       processTask worker task
89: 
90: processTask :: Core.Worker -> TaskCore.Task -> IO ()
91: processTask worker task = do
92:   let repo = Core.workerPath worker
93:   let tid = TaskCore.taskId task
94:   let quiet = Core.workerQuiet worker
95:   let say = logMsg worker
96: 
97:   unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Just tid})
98:   say ("[worker] Claiming task " <> tid)
99: 
100:   -- Claim task
101:   TaskCore.logActivity tid TaskCore.Claiming Nothing
102:   TaskCore.updateTaskStatusWithActor tid TaskCore.InProgress [] TaskCore.Junior
103:   say "[worker] Status -> InProgress"
104: 
105:   -- Run agent with timing
106:   startTime <- Data.Time.getCurrentTime
107:   activityId <- TaskCore.logActivityWithMetrics tid TaskCore.Running Nothing Nothing (Just startTime) Nothing Nothing Nothing
108: 
109:   say "[worker] Starting engine..."
110:   (exitCode, output, costCents) <- runWithEngine worker repo task
111: 
112:   endTime <- Data.Time.getCurrentTime
113:   say ("[worker] Agent exited with: " <> tshow exitCode)
114: 
115:   -- Update the activity record with metrics (convert Double to Int by rounding)
116:   TaskCore.updateActivityMetrics activityId Nothing (Just endTime) (Just (round costCents)) Nothing
117: 
118:   case exitCode of
119:     Exit.ExitSuccess -> do
120:       TaskCore.logActivity tid TaskCore.Reviewing Nothing
121:       say "[worker] Running formatters..."
122:       _ <- runFormatters repo
123: 
124:       -- Try to commit (this runs git hooks which may fail)
125:       let commitMsg = formatCommitMessage task output
126:       say "[worker] Attempting commit..."
127:       commitResult <- tryCommit repo commitMsg
128: 
129:       case commitResult of
130:         CommitFailed commitErr -> do
131:           say ("[worker] Commit failed: " <> commitErr)
132: 
133:           -- Save failure context and reopen task for retry
134:           maybeCtx <- TaskCore.getRetryContext tid
135:           let attempt = maybe 1 (\c -> TaskCore.retryAttempt c + 1) maybeCtx
136: 
137:           if attempt > 3
138:             then do
139:               say "[worker] Task failed 3 times, needs human intervention"
140:               TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("reason", "max_retries_exceeded")]))
141:               TaskCore.updateTaskStatusWithActor tid TaskCore.Open [] TaskCore.Junior
142:             else do
143:               let currentReason = "attempt " <> tshow attempt <> ": commit_failed: " <> commitErr
144:               let accumulatedReason = case maybeCtx of
145:                     Nothing -> currentReason
146:                     Just ctx -> TaskCore.retryReason ctx <> "\n" <> currentReason
147:               TaskCore.setRetryContext
148:                 TaskCore.RetryContext
149:                   { TaskCore.retryTaskId = tid,
150:                     TaskCore.retryOriginalCommit = "",
151:                     TaskCore.retryConflictFiles = [],
152:                     TaskCore.retryAttempt = attempt,
153:                     TaskCore.retryReason = accumulatedReason,
154:                     TaskCore.retryNotes = maybeCtx +> TaskCore.retryNotes
155:                   }
156:               TaskCore.logActivity tid TaskCore.Retrying (Just (toMetadata [("attempt", tshow attempt)]))
157:               TaskCore.updateTaskStatusWithActor tid TaskCore.Open [] TaskCore.Junior
158:               say ("[worker] Task reopened (attempt " <> tshow attempt <> "/3)")
159:         NoChanges -> do
160:           -- No changes = task already implemented, mark as Done
161:           say "[worker] No changes to commit - task already done"
162:           TaskCore.clearRetryContext tid
163:           TaskCore.logActivity tid TaskCore.Completed (Just (toMetadata [("result", "no_changes")]))
164:           TaskCore.updateTaskStatusWithActor tid TaskCore.Done [] TaskCore.Junior
165:           say ("[worker] ✓ Task " <> tid <> " -> Done (no changes)")
166:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
167:         CommitSuccess -> do
168:           -- Commit succeeded, set to Review
169:           TaskCore.logActivity tid TaskCore.Completed (Just (toMetadata [("result", "committed")]))
170:           TaskCore.updateTaskStatusWithActor tid TaskCore.Review [] TaskCore.Junior
171:           say ("[worker] ✓ Task " <> tid <> " -> Review")
172:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
173:     Exit.ExitFailure code -> do
174:       say ("[worker] Engine failed with code " <> tshow code)
175:       TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("exit_code", tshow code)]))
176:       -- Don't set back to Open here - leave in InProgress for debugging
177:       say "[worker] Task left in InProgress (engine failure)"
178: 
179: -- | Run lint --fix to format and fix lint issues
180: runFormatters :: FilePath -> IO (Either Text ())
181: runFormatters repo = do
182:   let cmd = (Process.proc "lint" ["--fix"]) {Process.cwd = Just repo}
183:   (code, _, _) <- Process.readCreateProcessWithExitCode cmd ""
184:   case code of
185:     Exit.ExitSuccess -> pure (Right ())
186:     Exit.ExitFailure _ -> pure (Right ()) -- lint --fix may exit non-zero but still fix things
187: 
188: data CommitResult = CommitSuccess | NoChanges | CommitFailed Text
189:   deriving (Show, Eq)
190: 
191: -- | Try to commit, returning result
192: tryCommit :: FilePath -> Text -> IO CommitResult
193: tryCommit repo msg = do
194:   -- Stage all changes
195:   let addCmd = (Process.proc "git" ["add", "."]) {Process.cwd = Just repo}
196:   (addCode, _, addErr) <- Process.readCreateProcessWithExitCode addCmd ""
197:   case addCode of
198:     Exit.ExitFailure _ -> pure <| CommitFailed (Text.pack addErr)
199:     Exit.ExitSuccess -> do
200:       -- Check for changes
201:       let checkCmd = (Process.proc "git" ["diff", "--cached", "--quiet"]) {Process.cwd = Just repo}
202:       (checkCode, _, _) <- Process.readCreateProcessWithExitCode checkCmd ""
203:       case checkCode of
204:         Exit.ExitSuccess -> pure NoChanges
205:         Exit.ExitFailure 1 -> do
206:           -- There are changes, commit them
207:           let commitCmd = (Process.proc "git" ["commit", "-m", Text.unpack msg]) {Process.cwd = Just repo}
208:           (commitCode, _, commitErr) <- Process.readCreateProcessWithExitCode commitCmd ""
209:           case commitCode of
210:             Exit.ExitSuccess -> pure CommitSuccess
211:             Exit.ExitFailure _ -> pure <| CommitFailed (Text.pack commitErr)
212:         Exit.ExitFailure c -> pure <| CommitFailed ("git diff failed with code " <> tshow c)
213: 
214: -- | Run task using native Engine
215: -- Returns (ExitCode, output text, cost in cents)
216: runWithEngine :: Core.Worker -> FilePath -> TaskCore.Task -> IO (Exit.ExitCode, Text, Double)
217: runWithEngine worker repo task = do
218:   -- Read API key from environment
219:   maybeApiKey <- Env.lookupEnv "OPENROUTER_API_KEY"
220:   case maybeApiKey of
221:     Nothing -> pure (Exit.ExitFailure 1, "OPENROUTER_API_KEY not set", 0)
222:     Just apiKey -> do
223:       -- Check for retry context
224:       maybeRetry <- TaskCore.getRetryContext (TaskCore.taskId task)
225: 
226:       -- Get progress from database (checkpoint events from previous sessions)
227:       progressContent <- TaskCore.getProgressSummary (TaskCore.taskId task)
228: 
229:       -- Build the full prompt
230:       let ns = fromMaybe "." (TaskCore.taskNamespace task)
231:       let basePrompt = buildBasePrompt task ns repo
232: 
233:       -- Add progress context if present
234:       let progressPrompt = buildProgressPrompt progressContent
235: 
236:       -- Add retry context if present
237:       let retryPrompt = buildRetryPrompt maybeRetry
238: 
239:       let prompt = basePrompt <> progressPrompt <> retryPrompt
240: 
241:       -- Read AGENTS.md
242:       agentsMd <-
243:         fmap (fromMaybe "") <| do
244:           exists <- Directory.doesFileExist (repo </> "AGENTS.md")
245:           if exists
246:             then Just </ readFile (repo </> "AGENTS.md")
247:             else pure Nothing
248: 
249:       -- Get relevant facts from the knowledge base
250:       relevantFacts <- getRelevantFacts task
251:       let factsSection = formatFacts relevantFacts
252: 
253:       -- Build system prompt
254:       let systemPrompt =
255:             prompt
256:               <> "\n\nREPOSITORY GUIDELINES (AGENTS.md):\n"
257:               <> agentsMd
258:               <> factsSection
259: 
260:       -- Build user prompt from task comments
261:       let userPrompt = formatTask task
262: 
263:       -- Select model based on task complexity (simple heuristic)
264:       let model = selectModel task
265: 
266:       -- Generate session ID for event logging
267:       sessionId <- TaskCore.generateSessionId
268:       let tid = TaskCore.taskId task
269: 
270:       -- Helper to log events to DB
271:       -- For text content, store as-is; for structured data, JSON-encode
272:       let logJuniorEvent eventType content = TaskCore.insertAgentEvent tid sessionId eventType content TaskCore.Junior
273:           logJuniorJson eventType value = do
274:             let contentJson = TE.decodeUtf8 (BSL.toStrict (Aeson.encode value))
275:             TaskCore.insertAgentEvent tid sessionId eventType contentJson TaskCore.Junior
276:           logSystemEvent eventType content = TaskCore.insertAgentEvent tid sessionId eventType content TaskCore.System
277: 
278:       -- Build Engine config with callbacks
279:       totalCostRef <- newIORef (0 :: Double)
280:       let quiet = Core.workerQuiet worker
281:           sayLog msg = if quiet then putText msg else AgentLog.log msg
282:           engineCfg =
283:             Engine.EngineConfig
284:               { Engine.engineLLM =
285:                   Engine.defaultLLM
286:                     { Engine.llmApiKey = Text.pack apiKey
287:                     },
288:                 Engine.engineOnCost = \tokens cost -> do
289:                   modifyIORef' totalCostRef (+ cost)
290:                   sayLog <| "Cost: " <> tshow cost <> " cents (" <> tshow tokens <> " tokens)"
291:                   logJuniorJson "Cost" (Aeson.object [("tokens", Aeson.toJSON tokens), ("cents", Aeson.toJSON cost)]),
292:                 Engine.engineOnActivity = \activity -> do
293:                   sayLog <| "[engine] " <> activity,
294:                 Engine.engineOnToolCall = \toolName args -> do
295:                   sayLog <| "[tool] " <> toolName
296:                   logJuniorEvent "ToolCall" (toolName <> ": " <> args),
297:                 Engine.engineOnAssistant = \msg -> do
298:                   sayLog <| "[assistant] " <> Text.take 200 msg
299:                   logJuniorEvent "Assistant" msg,
300:                 Engine.engineOnToolResult = \toolName success output -> do
301:                   let statusStr = if success then "ok" else "failed"
302:                   sayLog <| "[result] " <> toolName <> " (" <> statusStr <> "): " <> Text.take 100 output
303:                   logJuniorEvent "ToolResult" output,
304:                 Engine.engineOnComplete = do
305:                   sayLog "[engine] Complete"
306:                   logJuniorEvent "Complete" "",
307:                 Engine.engineOnError = \err -> do
308:                   sayLog <| "[error] " <> err
309:                   logJuniorEvent "Error" err,
310:                 Engine.engineOnGuardrail = \guardrailResult -> do
311:                   let guardrailMsg = formatGuardrailResult guardrailResult
312:                       contentJson = TE.decodeUtf8 (BSL.toStrict (Aeson.encode guardrailResult))
313:                   sayLog <| "[guardrail] " <> guardrailMsg
314:                   logSystemEvent "Guardrail" contentJson
315:               }
316: 
317:       -- Build Agent config with guardrails (scale cost by complexity)
318:       let baseCost = selectCostByComplexity (TaskCore.taskComplexity task)
319:           guardrails =
320:             Engine.Guardrails
321:               { Engine.guardrailMaxCostCents = baseCost,
322:                 Engine.guardrailMaxTokens = 2000000,
323:                 Engine.guardrailMaxDuplicateToolCalls = 30,
324:                 Engine.guardrailMaxTestFailures = 3,
325:                 Engine.guardrailMaxEditFailures = 5
326:               }
327:           agentCfg =
328:             Engine.AgentConfig
329:               { Engine.agentModel = model,
330:                 Engine.agentTools = Tools.allTools,
331:                 Engine.agentSystemPrompt = systemPrompt,
332:                 Engine.agentMaxIterations = 100,
333:                 Engine.agentGuardrails = guardrails
334:               }
335: 
336:       -- Run the agent
337:       result <- Engine.runAgent engineCfg agentCfg userPrompt
338:       totalCost <- readIORef totalCostRef
339: 
340:       case result of
341:         Left err -> pure (Exit.ExitFailure 1, "Engine error: " <> err, totalCost)
342:         Right agentResult -> do
343:           let output = Engine.resultFinalMessage agentResult
344:           pure (Exit.ExitSuccess, output, totalCost)
345: 
346: -- | Build the base prompt for the agent
347: buildBasePrompt :: TaskCore.Task -> Text -> FilePath -> Text
348: buildBasePrompt task ns repo =
349:   "You are an autonomous Worker Agent.\n"
350:     <> "Your goal is to implement the following task:\n\n"
351:     <> formatTask task
352:     <> "\n\nCRITICAL INSTRUCTIONS:\n"
353:     <> "1. Read AGENTS.md first to understand the codebase conventions.\n"
354:     <> "2. Complete ONE logical change (e.g., update schema + call sites + tests).\n"
355:     <> "3. Run 'bild --test "
356:     <> ns
357:     <> "' ONCE after implementing.\n"
358:     <> "4. **CRITICAL**: If tests pass, STOP IMMEDIATELY. Do not verify, do not review, do not trace logic, do not search for usages. Just stop.\n"
359:     <> "5. If tests fail, fix the issue and run tests again.\n"
360:     <> "6. If tests fail 3 times on the same issue, STOP - the task will be marked for human review.\n"
361:     <> "7. Do NOT update task status or manage git - the worker handles that.\n"
362:     <> "8. After tests pass, ANY further tool calls are wasted money. The worker will commit your changes.\n\n"
363:     <> "AUTONOMOUS OPERATION (NO HUMAN IN LOOP):\n"
364:     <> "- You are running autonomously without human intervention\n"
365:     <> "- There is NO human to ask questions or get clarification from\n"
366:     <> "- Make reasonable decisions based on the task description\n"
367:     <> "- If something is truly ambiguous, implement the most straightforward interpretation\n"
368:     <> "- Guardrails will stop you if you exceed cost/token budgets or make repeated mistakes\n\n"
369:     <> "BUILD SYSTEM NOTES:\n"
370:     <> "- 'bild --test "
371:     <> ns
372:     <> "' tests ALL dependencies transitively - run it ONCE, not per-file\n"
373:     <> "- Do NOT run bild --test on individual files separately\n"
374:     <> "- Once tests pass, STOP IMMEDIATELY - no verification, no double-checking, no 'one more look'\n"
375:     <> "- Use 'lint --fix' for formatting issues (not hlint directly)\n\n"
376:     <> "EFFICIENCY REQUIREMENTS:\n"
377:     <> "- Do not repeat the same action multiple times\n"
378:     <> "- Do not re-run passing tests\n"
379:     <> "- Do not test files individually when namespace test covers them\n"
380:     <> "- Aim to complete the task in under 50 tool calls\n\n"
381:     <> "LARGE FILE HANDLING:\n"
382:     <> "- When reading large files (>500 lines), use line ranges to read only relevant sections\n"
383:     <> "- For edit_file, use minimal unique context - just enough lines to match uniquely\n"
384:     <> "- If edit_file fails with 'old_str not found', re-read the exact lines you need to edit\n"
385:     <> "- After 2-3 failed edits on the same file, STOP and reconsider your approach\n"
386:     <> "- Very large files (>2000 lines) may need refactoring - note this for human review\n\n"
387:     <> "Context:\n"
388:     <> "- Working directory: "
389:     <> Text.pack repo
390:     <> "\n"
391:     <> "- Namespace: "
392:     <> ns
393:     <> "\n"
394: 
395: -- | Build progress context prompt
396: buildProgressPrompt :: Maybe Text -> Text
397: buildProgressPrompt Nothing = ""
398: buildProgressPrompt (Just progress) =
399:   "\n\n## PROGRESS FROM PREVIOUS SESSIONS (from database)\n\n"
400:     <> "This task has been worked on before. Here are the checkpoint notes:\n\n"
401:     <> progress
402:     <> "\n\n"
403:     <> "IMPORTANT:\n"
404:     <> "- Review these checkpoints to understand what's already done\n"
405:     <> "- Do NOT repeat work that's already completed\n"
406:     <> "- If the task appears complete, verify tests pass and exit\n\n"
407: 
408: -- | Build retry context prompt
409: buildRetryPrompt :: Maybe TaskCore.RetryContext -> Text
410: buildRetryPrompt Nothing = ""
411: buildRetryPrompt (Just ctx) =
412:   "\n\n## RETRY CONTEXT (IMPORTANT)\n\n"
413:     <> "This task was previously attempted but failed. Attempt: "
414:     <> tshow (TaskCore.retryAttempt ctx)
415:     <> "/3\n"
416:     <> "Reason: "
417:     <> TaskCore.retryReason ctx
418:     <> "\n\n"
419:     <> ( if null (TaskCore.retryConflictFiles ctx)
420:            then ""
421:            else
422:              "Conflicting files from previous attempt:\n"
423:                <> Text.unlines (map ("  - " <>) (TaskCore.retryConflictFiles ctx))
424:                <> "\n"
425:        )
426:     <> "Original commit: "
427:     <> TaskCore.retryOriginalCommit ctx
428:     <> "\n\n"
429:     <> maybe "" (\notes -> "## HUMAN NOTES/GUIDANCE\n\n" <> notes <> "\n\n") (TaskCore.retryNotes ctx)
430:     <> "INSTRUCTIONS FOR RETRY:\n"
431:     <> "- The codebase has changed since your last attempt\n"
432:     <> "- Re-implement this task on top of the CURRENT codebase\n"
433:     <> "- If there were merge conflicts, the conflicting files may have been modified by others\n"
434:     <> "- Review the current state of those files before making changes\n"
435: 
436: -- | Select model based on task complexity (1-5 scale)
437: -- Uses OpenRouter model identifiers for Claude models
438: selectModel :: TaskCore.Task -> Text
439: selectModel task = selectModelByComplexity (TaskCore.taskComplexity task)
440: 
441: -- | Select model based on complexity level
442: selectModelByComplexity :: Maybe Int -> Text
443: selectModelByComplexity Nothing = "anthropic/claude-sonnet-4.5"
444: selectModelByComplexity (Just 1) = "anthropic/claude-haiku-4.5"
445: selectModelByComplexity (Just 2) = "anthropic/claude-haiku-4.5"
446: selectModelByComplexity (Just 3) = "anthropic/claude-sonnet-4.5"
447: selectModelByComplexity (Just 4) = "anthropic/claude-sonnet-4.5"
448: selectModelByComplexity (Just 5) = "anthropic/claude-opus-4.5"
449: selectModelByComplexity (Just _) = "anthropic/claude-sonnet-4.5"
450: 
451: -- | Select cost guardrail based on complexity level (in cents)
452: -- Lower complexity = lower budget, higher complexity = more room for iteration
453: selectCostByComplexity :: Maybe Int -> Double
454: selectCostByComplexity Nothing = 200.0
455: selectCostByComplexity (Just 1) = 50.0
456: selectCostByComplexity (Just 2) = 100.0
457: selectCostByComplexity (Just 3) = 200.0
458: selectCostByComplexity (Just 4) = 400.0
459: selectCostByComplexity (Just 5) = 600.0
460: selectCostByComplexity (Just _) = 200.0
461: 
462: formatTask :: TaskCore.Task -> Text
463: formatTask t =
464:   "Task: "
465:     <> TaskCore.taskId t
466:     <> "\n"
467:     <> "Title:      "
468:     <> TaskCore.taskTitle t
469:     <> "\n"
470:     <> "Type:       "
471:     <> Text.pack (show (TaskCore.taskType t))
472:     <> "\n"
473:     <> "Status:     "
474:     <> Text.pack (show (TaskCore.taskStatus t))
475:     <> "\n"
476:     <> "Priority:   "
477:     <> Text.pack (show (TaskCore.taskPriority t))
478:     <> "\n"
479:     <> maybe "" (\p -> "Parent:     " <> p <> "\n") (TaskCore.taskParent t)
480:     <> maybe "" (\ns -> "Namespace:  " <> ns <> "\n") (TaskCore.taskNamespace t)
481:     <> "Created:    "
482:     <> Text.pack (show (TaskCore.taskCreatedAt t))
483:     <> "\n"
484:     <> "Updated:    "
485:     <> Text.pack (show (TaskCore.taskUpdatedAt t))
486:     <> "\n"
487:     <> (if Text.null (TaskCore.taskDescription t) then "" else "Description:\n" <> TaskCore.taskDescription t <> "\n\n")
488:     <> formatDeps (TaskCore.taskDependencies t)
489:     <> formatComments (TaskCore.taskComments t)
490:   where
491:     formatDeps [] = ""
492:     formatDeps deps = "\nDependencies:\n" <> Text.unlines (map formatDep deps)
493:     formatDep dep = "  - " <> TaskCore.depId dep <> " [" <> Text.pack (show (TaskCore.depType dep)) <> "]"
494:     formatComments [] = ""
495:     formatComments cs = "\nComments/Notes:\n" <> Text.unlines (map formatComment cs)
496:     formatComment c = "  [" <> Text.pack (show (TaskCore.commentCreatedAt c)) <> "] " <> TaskCore.commentText c
497: 
498: formatCommitMessage :: TaskCore.Task -> Text -> Text
499: formatCommitMessage task agentOutput =
500:   let tid = TaskCore.taskId task
501:       subject = cleanSubject (TaskCore.taskTitle task)
502:       body = cleanBody agentOutput
503:    in if Text.null body
504:         then subject <> "\n\nTask-Id: " <> tid
505:         else subject <> "\n\n" <> body <> "\n\nTask-Id: " <> tid
506:   where
507:     cleanSubject s =
508:       let trailingPunct = ['.', ':', '!', '?', ',', ';', ' ', '-']
509:           stripped = Text.dropWhileEnd (`elem` trailingPunct) s
510:           truncated = Text.take 72 stripped
511:           noPunct = Text.dropWhileEnd (`elem` trailingPunct) truncated
512:           capitalized = case Text.uncons noPunct of
513:             Just (c, rest) -> Text.cons (toUpper c) rest
514:             Nothing -> noPunct
515:        in capitalized
516: 
517:     cleanBody :: Text -> Text
518:     cleanBody output =
519:       let stripped = Text.strip output
520:        in if Text.null stripped
521:             then ""
522:             else
523:               let lns = Text.lines stripped
524:                   cleaned = [Text.take 72 ln | ln <- lns]
525:                in Text.intercalate "\n" cleaned
526: 
527: -- | Get facts relevant to a task based on namespace/project
528: getRelevantFacts :: TaskCore.Task -> IO [TaskCore.Fact]
529: getRelevantFacts task = do
530:   let namespace = fromMaybe "Omni" (TaskCore.taskNamespace task)
531:   projectFacts <- Fact.getFactsByProject namespace
532:   let sorted = List.sortBy (comparing (Down <. TaskCore.factConfidence)) projectFacts
533:   pure (take 10 sorted)
534: 
535: -- | Format facts for inclusion in the prompt
536: formatFacts :: [TaskCore.Fact] -> Text
537: formatFacts [] = ""
538: formatFacts facts =
539:   Text.unlines
540:     [ "\n\nKNOWLEDGE BASE FACTS:",
541:       "(These are learned patterns/conventions from previous work)",
542:       ""
543:     ]
544:     <> Text.unlines (map formatFact facts)
545: 
546: -- | Format a single fact for the prompt
547: formatFact :: TaskCore.Fact -> Text
548: formatFact f =
549:   "- "
550:     <> TaskCore.factContent f
551:     <> ( if null (TaskCore.factRelatedFiles f)
552:            then ""
553:            else " [" <> Text.intercalate ", " (TaskCore.factRelatedFiles f) <> "]"
554:        )
💭Thought[engineer]3 months ago

Now I understand the flow. When runAgent returns Left err (which includes guardrail violations), the Worker currently treats it as a generic engine failure and leaves the task in InProgress. According to the task description and comment, when a guardrail fires, the task should instead be set to NeedsHelp status with a comment explaining what happened.

Let me check what the error messages look like from guardrails:

ReadOmni/Agent/Engine.hs @721-738
18 lines
721: findGuardrailViolation :: Guardrails -> Double -> Int -> Map.Map Text Int -> Int -> Int -> Maybe (GuardrailResult, Text)
722: findGuardrailViolation g cost tokens toolCallCounts testFailures editFailures =
723:   case checkCostGuardrail g cost of
724:     r@(GuardrailCostExceeded actual limit) ->
725:       Just (r, "Guardrail: cost budget exceeded (" <> tshow actual <> "/" <> tshow limit <> " cents)")
726:     _ -> case checkTokenGuardrail g tokens of
727:       r@(GuardrailTokensExceeded actual limit) ->
728:         Just (r, "Guardrail: token budget exceeded (" <> tshow actual <> "/" <> tshow limit <> " tokens)")
729:       _ -> case checkDuplicateGuardrail g toolCallCounts of
730:         r@(GuardrailDuplicateToolCalls tool count) ->
731:           Just (r, "Guardrail: duplicate tool calls (" <> tool <> " called " <> tshow count <> " times)")
732:         _ -> case checkTestFailureGuardrail g testFailures of
733:           r@(GuardrailTestFailures count) ->
734:             Just (r, "Guardrail: too many test failures (" <> tshow count <> ")")
735:           _ -> case checkEditFailureGuardrail g editFailures of
736:             r@(GuardrailEditFailures count) ->
737:               Just (r, "Guardrail: too many edit_file failures (" <> tshow count <> " 'old_str not found' errors)")
738:             _ -> Nothing
💭Thought[engineer]3 months ago

Good! So when a guardrail fires, runAgent returns Left errMsg where errMsg starts with "Guardrail: ".
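A minimal sketch of that check, assuming the `Guardrail: ` prefix is the only signal the caller has. `ErrorKind` and `classifyEngineError` are hypothetical names used for illustration and are not part of Omni:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- | Hypothetical classification of an engine error message.
data ErrorKind = GuardrailStop | OtherEngineError
  deriving (Eq, Show)

-- | Treat any message that starts with "Guardrail: " as a guardrail stop.
-- The prefix comes from the messages built in findGuardrailViolation.
classifyEngineError :: Text -> ErrorKind
classifyEngineError msg
  | "Guardrail: " `Text.isPrefixOf` msg = GuardrailStop
  | otherwise = OtherEngineError
```

Matching on a message prefix is brittle, since rewording the message silently changes the classification. Carrying the distinction in the result type avoids depending on the wording.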

Now let me check the Worker code around line 340 where it handles the result:

ReadOmni/Agent/Worker.hs @336-344
336:       -- Run the agent
337:       result <- Engine.runAgent engineCfg agentCfg userPrompt
338:       totalCost <- readIORef totalCostRef
339: 
340:       case result of
341:         Left err -> pure (Exit.ExitFailure 1, "Engine error: " <> err, totalCost)
342:         Right agentResult -> do
343:           let output = Engine.resultFinalMessage agentResult
344:           pure (Exit.ExitSuccess, output, totalCost)
💭Thought[engineer]3 months ago

And then back in processTask:

ReadOmni/Agent/Worker.hs @118-178
61 lines
118:   case exitCode of
119:     Exit.ExitSuccess -> do
120:       TaskCore.logActivity tid TaskCore.Reviewing Nothing
121:       say "[worker] Running formatters..."
122:       _ <- runFormatters repo
123: 
124:       -- Try to commit (this runs git hooks which may fail)
125:       let commitMsg = formatCommitMessage task output
126:       say "[worker] Attempting commit..."
127:       commitResult <- tryCommit repo commitMsg
128: 
129:       case commitResult of
130:         CommitFailed commitErr -> do
131:           say ("[worker] Commit failed: " <> commitErr)
132: 
133:           -- Save failure context and reopen task for retry
134:           maybeCtx <- TaskCore.getRetryContext tid
135:           let attempt = maybe 1 (\c -> TaskCore.retryAttempt c + 1) maybeCtx
136: 
137:           if attempt > 3
138:             then do
139:               say "[worker] Task failed 3 times, needs human intervention"
140:               TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("reason", "max_retries_exceeded")]))
141:               TaskCore.updateTaskStatusWithActor tid TaskCore.Open [] TaskCore.Junior
142:             else do
143:               let currentReason = "attempt " <> tshow attempt <> ": commit_failed: " <> commitErr
144:               let accumulatedReason = case maybeCtx of
145:                     Nothing -> currentReason
146:                     Just ctx -> TaskCore.retryReason ctx <> "\n" <> currentReason
147:               TaskCore.setRetryContext
148:                 TaskCore.RetryContext
149:                   { TaskCore.retryTaskId = tid,
150:                     TaskCore.retryOriginalCommit = "",
151:                     TaskCore.retryConflictFiles = [],
152:                     TaskCore.retryAttempt = attempt,
153:                     TaskCore.retryReason = accumulatedReason,
154:                     TaskCore.retryNotes = maybeCtx +> TaskCore.retryNotes
155:                   }
156:               TaskCore.logActivity tid TaskCore.Retrying (Just (toMetadata [("attempt", tshow attempt)]))
157:               TaskCore.updateTaskStatusWithActor tid TaskCore.Open [] TaskCore.Junior
158:               say ("[worker] Task reopened (attempt " <> tshow attempt <> "/3)")
159:         NoChanges -> do
160:           -- No changes = task already implemented, mark as Done
161:           say "[worker] No changes to commit - task already done"
162:           TaskCore.clearRetryContext tid
163:           TaskCore.logActivity tid TaskCore.Completed (Just (toMetadata [("result", "no_changes")]))
164:           TaskCore.updateTaskStatusWithActor tid TaskCore.Done [] TaskCore.Junior
165:           say ("[worker] ✓ Task " <> tid <> " -> Done (no changes)")
166:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
167:         CommitSuccess -> do
168:           -- Commit succeeded, set to Review
169:           TaskCore.logActivity tid TaskCore.Completed (Just (toMetadata [("result", "committed")]))
170:           TaskCore.updateTaskStatusWithActor tid TaskCore.Review [] TaskCore.Junior
171:           say ("[worker] ✓ Task " <> tid <> " -> Review")
172:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
173:     Exit.ExitFailure code -> do
174:       say ("[worker] Engine failed with code " <> tshow code)
175:       TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("exit_code", tshow code)]))
176:       -- Don't set back to Open here - leave in InProgress for debugging
177:       say "[worker] Task left in InProgress (engine failure)"
178: 
💭Thought[engineer]3 months ago

I can see the issue. When Exit.ExitFailure occurs (line 173), the task is left in InProgress. But we need to distinguish between:

1. Guardrail violations → should set to NeedsHelp
2. Other engine errors → should stay in InProgress (current behavior)

The solution is to:

1. Change runWithEngine to return a distinguishable result for guardrail violations
2. Update processTask to handle guardrail violations by setting status to NeedsHelp with a comment
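The intended handling can be sketched as a pure decision function. This is an illustration under assumptions, not the Worker code: `Outcome`, `Status`, `FollowUp`, and `planFollowUp` are hypothetical names, and the real worker performs these steps through TaskCore calls.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- | Hypothetical outcome of one engine run.
data Outcome
  = Succeeded Text      -- final agent message
  | GuardrailFired Text -- guardrail message, e.g. "Guardrail: ..."
  | EngineFailed Text   -- any other engine error
  deriving (Eq, Show)

-- | Subset of task statuses relevant to this sketch.
data Status = InProgress | NeedsHelp
  deriving (Eq, Show)

-- | What the worker should do with the task after the run.
data FollowUp = FollowUp
  { proceedToCommit :: Bool     -- run formatters and attempt a commit
  , newStatus :: Maybe Status   -- Nothing = leave the status unchanged
  , comment :: Maybe Text       -- explanation to attach to the task
  }
  deriving (Eq, Show)

planFollowUp :: Outcome -> FollowUp
planFollowUp (Succeeded _) =
  -- Status is decided later by the commit step.
  FollowUp True Nothing Nothing
planFollowUp (GuardrailFired msg) =
  -- Stop, keep the working tree as is, and ask for help.
  FollowUp False (Just NeedsHelp) (Just ("Agent stopped: " <> msg))
planFollowUp (EngineFailed _) =
  -- Current behavior: leave the task in InProgress for debugging.
  FollowUp False Nothing Nothing
```

Keeping the decision pure makes the guardrail branch easy to test without a task database.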

Let me check if there's a function to add comments to tasks:

GrepaddComment|insertComment
15 lines
./Omni/Task.hs:288:      updatedTask <- addComment tid message Human
./Omni/Task.hs:886:        updatedTask <- addComment (taskId task) "This is a test comment" Human
./Omni/Task.hs:895:        _ <- addComment (taskId task) "First comment" Junior
./Omni/Task.hs:896:        updatedTask <- addComment (taskId task) "Second comment" Human
./Omni/Task.hs:907:        _ <- addComment (taskId task) "Persisted comment" Junior
./Omni/Task/Core.hs:746:addComment :: Text -> Text -> CommentAuthor -> IO Task
./Omni/Task/Core.hs:747:addComment tid commentTextContent author =
./Omni/Jr.hs:235:  _ <- TaskCore.addComment tid conflictComment TaskCore.Junior
./Omni/Jr.hs:422:      _ <- TaskCore.addComment tid reviewComment TaskCore.Junior
./Omni/Jr.hs:437:      _ <- TaskCore.addComment tid reviewComment TaskCore.Junior
./Omni/Jr.hs:503:          _ <- TaskCore.addComment tid acceptComment TaskCore.Human
./Omni/Jr.hs:515:          _ <- TaskCore.addComment tid rejectComment TaskCore.Human
./Omni/Jr.hs:623:              _ <- TaskCore.addComment tid ("## Completion Summary\n\n" <> summary) TaskCore.Junior
./Omni/Jr.hs:756:          _ <- TaskCore.addComment epicId summary TaskCore.Junior
./Omni/Jr/Web.hs:2979:      _ <- liftIO (TaskCore.addComment tid commentText TaskCore.Human)
💭Thought[engineer]3 months ago

Perfect! There's TaskCore.addComment :: Text -> Text -> CommentAuthor -> IO Task. Now I need to modify the code. Let me create a clear plan:

1. Change runWithEngine return type to distinguish guardrail violations from other errors
2. Update processTask to check if error is a guardrail violation and handle it appropriately
3. Add a comment with the guardrail message and set status to NeedsHelp
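A sketch of step 1, under stated assumptions. The constructor names follow the `EngineResult` type that appears in Omni/Agent/Worker.hs later in this session. `fromEngineEither` and `engineCost` are hypothetical helpers, and the engine's success value is simplified to a plain `Text` here:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as Text

-- | Result of one engine run; every constructor carries the cost in cents.
data EngineResult
  = EngineSuccess Text Double            -- output, cost
  | EngineGuardrailViolation Text Double -- guardrail message, cost
  | EngineError Text Double              -- other error message, cost
  deriving (Eq, Show)

-- | Hypothetical helper: wrap the engine's Either into an EngineResult.
-- Assumes guardrail errors are exactly those prefixed with "Guardrail: ".
fromEngineEither :: Double -> Either Text Text -> EngineResult
fromEngineEither cost (Right output) = EngineSuccess output cost
fromEngineEither cost (Left err)
  | "Guardrail: " `Text.isPrefixOf` err = EngineGuardrailViolation err cost
  | otherwise = EngineError ("Engine error: " <> err) cost

-- | Hypothetical helper: extract the cost regardless of outcome.
engineCost :: EngineResult -> Double
engineCost (EngineSuccess _ c) = c
engineCost (EngineGuardrailViolation _ c) = c
engineCost (EngineError _ c) = c
```

With a sum type, the compiler can warn when a caller such as processTask forgets to handle the guardrail case, which an exit code plus message tuple cannot do.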

Let me start implementing:

EditOmni/Agent/Worker.hs
EditOmni/Agent/Worker.hs
💭Thought[engineer]3 months ago

Now I need to update the caller in processTask:

EditOmni/Agent/Worker.hs
💭Thought[engineer]3 months ago

Now I need to handle the rest of the case branches:

ReadOmni/Agent/Worker.hs @109-178
70 lines
109:   say "[worker] Starting engine..."
110:   engineResult <- runWithEngine worker repo task
111: 
112:   endTime <- Data.Time.getCurrentTime
113: 
114:   -- Update the activity record with metrics (convert Double to Int by rounding)
115:   let costCents = case engineResult of
116:         EngineSuccess _ c -> c
117:         EngineGuardrailViolation _ c -> c
118:         EngineError _ c -> c
119:   TaskCore.updateActivityMetrics activityId Nothing (Just endTime) (Just (round costCents)) Nothing
120: 
121:   case engineResult of
122:     EngineSuccess output _ -> do
123:       say "[worker] Agent completed successfully"
124:       TaskCore.logActivity tid TaskCore.Reviewing Nothing
125:       say "[worker] Running formatters..."
126:       _ <- runFormatters repo
127: 
128:       -- Try to commit (this runs git hooks which may fail)
129:       let commitMsg = formatCommitMessage task output
130:       say "[worker] Attempting commit..."
131:       commitResult <- tryCommit repo commitMsg
132: 
133:       case commitResult of
134:         CommitFailed commitErr -> do
135:           say ("[worker] Commit failed: " <> commitErr)
136: 
137:           -- Save failure context and reopen task for retry
138:           maybeCtx <- TaskCore.getRetryContext tid
139:           let attempt = maybe 1 (\c -> TaskCore.retryAttempt c + 1) maybeCtx
140: 
141:           if attempt > 3
142:             then do
143:               say "[worker] Task failed 3 times, needs human intervention"
144:               TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("reason", "max_retries_exceeded")]))
145:               TaskCore.updateTaskStatusWithActor tid TaskCore.Open [] TaskCore.Junior
146:             else do
147:               let currentReason = "attempt " <> tshow attempt <> ": commit_failed: " <> commitErr
148:               let accumulatedReason = case maybeCtx of
149:                     Nothing -> currentReason
150:                     Just ctx -> TaskCore.retryReason ctx <> "\n" <> currentReason
151:               TaskCore.setRetryContext
152:                 TaskCore.RetryContext
153:                   { TaskCore.retryTaskId = tid,
154:                     TaskCore.retryOriginalCommit = "",
155:                     TaskCore.retryConflictFiles = [],
156:                     TaskCore.retryAttempt = attempt,
157:                     TaskCore.retryReason = accumulatedReason,
158:                     TaskCore.retryNotes = maybeCtx +> TaskCore.retryNotes
159:                   }
160:               TaskCore.logActivity tid TaskCore.Retrying (Just (toMetadata [("attempt", tshow attempt)]))
161:               TaskCore.updateTaskStatusWithActor tid TaskCore.Open [] TaskCore.Junior
162:               say ("[worker] Task reopened (attempt " <> tshow attempt <> "/3)")
163:         NoChanges -> do
164:           -- No changes = task already implemented, mark as Done
165:           say "[worker] No changes to commit - task already done"
166:           TaskCore.clearRetryContext tid
167:           TaskCore.logActivity tid TaskCore.Completed (Just (toMetadata [("result", "no_changes")]))
168:           TaskCore.updateTaskStatusWithActor tid TaskCore.Done [] TaskCore.Junior
169:           say ("[worker] ✓ Task " <> tid <> " -> Done (no changes)")
170:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
171:         CommitSuccess -> do
172:           -- Commit succeeded, set to Review
173:           TaskCore.logActivity tid TaskCore.Completed (Just (toMetadata [("result", "committed")]))
174:           TaskCore.updateTaskStatusWithActor tid TaskCore.Review [] TaskCore.Junior
175:           say ("[worker] ✓ Task " <> tid <> " -> Review")
176:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
177:     Exit.ExitFailure code -> do
178:       say ("[worker] Engine failed with code " <> tshow code)
EditOmni/Agent/Worker.hs
💭Thought[engineer]3 months ago

Let me re-read the exact text:

ReadOmni/Agent/Worker.hs @174-182
174:           TaskCore.updateTaskStatusWithActor tid TaskCore.Review [] TaskCore.Junior
175:           say ("[worker] ✓ Task " <> tid <> " -> Review")
176:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
177:     Exit.ExitFailure code -> do
178:       say ("[worker] Engine failed with code " <> tshow code)
179:       TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("exit_code", tshow code)]))
180:       -- Don't set back to Open here - leave in InProgress for debugging
181:       say "[worker] Task left in InProgress (engine failure)"
182: 
EditOmni/Agent/Worker.hs
💭Thought[engineer]3 months ago

Let me read more context:

ReadOmni/Agent/Worker.hs @171-182
12 lines
171:         CommitSuccess -> do
172:           -- Commit succeeded, set to Review
173:           TaskCore.logActivity tid TaskCore.Completed (Just (toMetadata [("result", "committed")]))
174:           TaskCore.updateTaskStatusWithActor tid TaskCore.Review [] TaskCore.Junior
175:           say ("[worker] ✓ Task " <> tid <> " -> Review")
176:           unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})
177:     Exit.ExitFailure code -> do
178:       say ("[worker] Engine failed with code " <> tshow code)
179:       TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("exit_code", tshow code)]))
180:       -- Don't set back to Open here - leave in InProgress for debugging
181:       say "[worker] Task left in InProgress (engine failure)"
182: 
💭Thought[engineer]3 months ago

I see - the issue is that Exit.ExitFailure is from the old code. Let me try with the exact string including escaping:

EditOmni/Agent/Worker.hs
💭Thought[engineer]3 months ago

Let me check the special characters more carefully:

ϟsed -n '175,181p' Omni/Agent/Worker.hs | cat -A
          say ("[worker] M-bM-^\M-^S Task " <> tid <> " -> Review")$
          unless quiet <| AgentLog.update (\s -> s {AgentLog.statusTask = Nothing})$
    Exit.ExitFailure code -> do$
      say ("[worker] Engine failed with code " <> tshow code)$
      TaskCore.logActivity tid TaskCore.Failed (Just (toMetadata [("exit_code", tshow code)]))$
      -- Don't set back to Open here - leave in InProgress for debugging$
      say "[worker] Task left in InProgress (engine failure)"$
💭Thought[engineer]3 months ago

There's a special character. Let me just replace from Exit.ExitFailure onwards:

EditOmni/Agent/Worker.hs
💭Thought[engineer]3 months ago

Good! Now I need to also check if there's a Nothing -> ... case for the API key handling that I need to update:

ReadOmni/Agent/Worker.hs @218-222
218:         Exit.ExitFailure 1 -> do
219:           -- There are changes, commit them
220:           let commitCmd = (Process.proc "git" ["commit", "-m", Text.unpack msg]) {Process.cwd = Just repo}
221:           (commitCode, _, commitErr) <- Process.readCreateProcessWithExitCode commitCmd ""
222:           case commitCode of
💭Thought[engineer]3 months ago

Let me find the right place:

ReadOmni/Agent/Worker.hs @228-235
228:   = EngineSuccess Text Double -- output, cost
229:   | EngineGuardrailViolation Text Double -- error message, cost
230:   | EngineError Text Double -- error message, cost
231: 
232: -- | Run task using native Engine
233: -- Returns engine result with output/error and cost
234: runWithEngine :: Core.Worker -> FilePath -> TaskCore.Task -> IO EngineResult
235: runWithEngine worker repo task = do
ReadOmni/Agent/Worker.hs @235-242
235: runWithEngine worker repo task = do
236:   -- Read API key from environment
237:   maybeApiKey <- Env.lookupEnv "OPENROUTER_API_KEY"
238:   case maybeApiKey of
239:     Nothing -> pure (Exit.ExitFailure 1, "OPENROUTER_API_KEY not set", 0)
240:     Just apiKey -> do
241:       -- Check for retry context
242:       maybeRetry <- TaskCore.getRetryContext (TaskCore.taskId task)
⚠️Guardrail[system]3 months ago
{"contents":[208.02210000000002,200],"tag":"GuardrailCostExceeded"}
Error[engineer]3 months ago
{"exit_code":"1"}
🔄[human]InProgress → Done3 months ago