Add correct token usage tracking

t-369.22 · WorkTask · Omni/Agent.hs
Parent: t-369 · Created 1 month ago · Updated 1 month ago

Dependencies

Description

Replace estimated token counts with actual API-reported usage.

Context

The code-only spike estimates tokens:

-- Token tracking not available in simple chat - estimate 500 per call
tokens = 500

A flat 500 tokens per call is inaccurate. We need real token counts for:

  • Accurate cost tracking
  • Budget enforcement
  • Benchmarking

Current State

Provider.hs has Usage type:

data Usage = Usage
  { usagePromptTokens :: Int
  , usageCompletionTokens :: Int
  , usageTotalTokens :: Int
  , usageCost :: Maybe Double
  }

And chatWithUsage returns it:

chatWithUsage :: Provider -> [ToolApi] -> [Message] -> IO (Either Text ChatResult)

data ChatResult = ChatResult
  { chatMessage :: Message
  , chatUsage :: Maybe Usage
  }

However, CodeOnly.hs calls plain chat, which does not expose usage.

Deliverables

1. Update think to return usage

-- Before
think :: Provider -> Text -> Text -> IO (Text, Int)
think provider task context = do
  result <- Provider.chat provider [] messages
  -- estimates tokens = 500

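-- After
think :: Provider -> Text -> Text -> IO (Text, Usage)
think provider task context = do
  -- messages is built from task and context as before (elided here)
  result <- Provider.chatWithUsage provider [] messages
  case result of
    -- On failure the error text takes the place of the code; no usage is recorded
    Left err -> pure (err, emptyUsage)
    Right chatRes ->
      let code = extractCode (Provider.msgContent (Provider.chatMessage chatRes))
          -- chatUsage is a Maybe; treat a missing usage block as zero usage
          usage = fromMaybe emptyUsage (Provider.chatUsage chatRes)
      in pure (code, usage)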
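
The After version of think relies on emptyUsage, which this ticket does not define. A minimal sketch of what it might look like, assuming the Usage record from Provider.hs shown under Current State. The definition of emptyUsage and the helper usageOrEmpty are assumptions for illustration, not existing code:

```haskell
import Data.Maybe (fromMaybe)

-- Mirrors the Usage record shown under "Current State".
data Usage = Usage
  { usagePromptTokens :: Int
  , usageCompletionTokens :: Int
  , usageTotalTokens :: Int
  , usageCost :: Maybe Double
  }
  deriving (Show, Eq)

-- Assumed definition: zero tokens and no API-reported cost.
emptyUsage :: Usage
emptyUsage = Usage 0 0 0 Nothing

-- Hypothetical helper: fall back to emptyUsage when the provider
-- response carries no usage block.
usageOrEmpty :: Maybe Usage -> Usage
usageOrEmpty = fromMaybe emptyUsage
```

Keeping usageCost as Nothing in emptyUsage (rather than Just 0) preserves the distinction between "the API reported zero cost" and "the API reported no cost", which the fallback-to-estimate logic depends on.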

2. Track cumulative usage in agent loop

codeOnlyAgent :: ExperimentConfig -> Provider -> Text -> IO RunResult
codeOnlyAgent config provider task = do
  (output, iterations, totalUsage, history, mError) <- loop emptyUsage ...
  
  pure RunResult
    { rrTokensUsed = usageTotalTokens totalUsage
    , rrCostCents = fromMaybe (estimateCost ...) (usageCost totalUsage) * 100
    ...
    }
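
The loop above threads a running Usage through the iterations, but the ticket does not show how two Usage values are combined. One possible sketch, assuming the Usage record from Provider.hs. addUsage and totalUsage are hypothetical names, and the rule for mixing reported and unreported costs is an assumption:

```haskell
import Data.List (foldl')

-- Mirrors the Usage record shown under "Current State".
data Usage = Usage
  { usagePromptTokens :: Int
  , usageCompletionTokens :: Int
  , usageTotalTokens :: Int
  , usageCost :: Maybe Double
  }
  deriving (Show, Eq)

-- Assumed definition: zero tokens and no API-reported cost.
emptyUsage :: Usage
emptyUsage = Usage 0 0 0 Nothing

-- Hypothetical helper: combine the usage of two API calls field by field.
addUsage :: Usage -> Usage -> Usage
addUsage a b =
  Usage
    { usagePromptTokens = usagePromptTokens a + usagePromptTokens b
    , usageCompletionTokens = usageCompletionTokens a + usageCompletionTokens b
    , usageTotalTokens = usageTotalTokens a + usageTotalTokens b
    , usageCost = addCost (usageCost a) (usageCost b)
    }
  where
    -- Keep whatever cost was reported; Nothing only if neither call reported one.
    addCost (Just x) (Just y) = Just (x + y)
    addCost x Nothing = x
    addCost Nothing y = y

-- Total usage over the per-iteration values returned by think.
totalUsage :: [Usage] -> Usage
totalUsage = foldl' addUsage emptyUsage
```

With this rule, a run in which only some iterations report a cost yields a partial total that under-reports the real cost. If that matters for budget enforcement, an alternative is to return Nothing as soon as any iteration lacks a reported cost and fall back to the estimate for the whole run.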

3. Update RunResult to include detailed usage

data RunResult = RunResult
  { ...
  , rrPromptTokens :: Int
  , rrCompletionTokens :: Int  
  , rrTotalTokens :: Int
  , rrApiReportedCost :: Maybe Double  -- from API if available
  , rrEstimatedCost :: Double          -- our estimate as fallback
  ...
  }
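
To illustrate how the two cost fields might be consumed, here is a small sketch. CostFields is a hypothetical stand-in for just the cost-related part of RunResult (the elided fields are left out), and the helper names are assumptions. It also assumes costs are expressed in dollars, as the `* 100` conversion to cents in step 2 suggests:

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical stand-in for the cost-related fields of RunResult.
data CostFields = CostFields
  { rrApiReportedCost :: Maybe Double -- from API if available
  , rrEstimatedCost :: Double         -- our estimate as fallback
  }
  deriving (Show, Eq)

-- Prefer the API-reported cost; fall back to our estimate otherwise.
effectiveCost :: CostFields -> Double
effectiveCost c = fromMaybe (rrEstimatedCost c) (rrApiReportedCost c)

-- Same value in cents (assumes the inputs are in dollars).
effectiveCostCents :: CostFields -> Double
effectiveCostCents = (* 100) . effectiveCost
```

Storing both values, rather than only the chosen one, lets benchmarks compare the estimate against the API figure when both are present.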

4. Update benchmarks to report real costs

The benchmark output should show actual API-reported costs where the API provides them, and fall back to our estimate only when it does not.
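
As a rough sketch of what a benchmark row could look like, the following formatter labels each cost with its source so estimated figures are never mistaken for real ones. The function name, argument order, and output layout are all assumptions, not the existing benchmark format:

```haskell
import Text.Printf (printf)

-- Hypothetical formatter for one benchmark row.
-- Arguments: run name, total tokens, API-reported cost (if any), estimated cost.
formatCostLine :: String -> Int -> Maybe Double -> Double -> String
formatCostLine name tokens apiCost estimate =
  case apiCost of
    Just c -> printf "%s: %d tokens, $%.4f (API-reported)" name tokens c
    Nothing -> printf "%s: %d tokens, $%.4f (estimated)" name tokens estimate
```

For example, `formatCostLine "code-only" 180 (Just 0.0125) 0.02` produces `code-only: 180 tokens, $0.0125 (API-reported)`.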

Testing

  • [ ] Token counts match API response
  • [ ] Cost tracking uses API cost when available
  • [ ] Falls back to estimate when API doesn't report cost
  • [ ] Cumulative tracking across iterations is correct

Files

  • Omni/Agent/Experiments/CodeOnly.hs (update)
  • Omni/Agent/Provider.hs (verify chatWithUsage works)

Timeline (2)

🔄 [human] Open → InProgress · 1 month ago
🔄 [human] InProgress → Done · 1 month ago