Notify owner on LLM provider failover

t-771 · WorkTask · Created 1 week ago · Updated 1 week ago

Description

When the failover chain switches from the primary provider (claude-oauth) to a backup (openrouter), send a short Telegram message to the owner so they know to investigate and re-login.

What to change

Omni/Agent/Provider.hs

1. Add an optional callback field to ProviderChain:

    data ProviderChain = ProviderChain
      { pcProviders :: [NamedProvider],
        pcCooldownSeconds :: Int,
        pcOnFailover :: Maybe (Text -> IO ()) -- called with the failing primary provider's name on failover
      }

2. Update mkProviderChain / mkProviderChainWithCooldown to default pcOnFailover = Nothing.

3. In tryAll inside chatWithFallbackUsage, when the primary fails and another provider remains to try, fire the callback before moving on:

    tryAll (np : rest) = do
      result <- tryProvider np
      case result of
        Right r -> pure (Right r)
        Left _ -> do
          -- If this was the primary and we're about to fall back, fire the callback
          let isPrimary = case pcProviders chain of
                (primary : _) -> npName np == npName primary
                [] -> False
          when (isPrimary && not (null rest)) $
            forM_ (pcOnFailover chain) ($ npName np)
          tryAll rest

(The callback is passed the *failing* provider's name so the message can say 'claude-oauth failed, using openrouter'. The not (null rest) guard keeps a lone provider from triggering a notification, since there is nothing to fall back to.)
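The three steps above can be sketched end to end as follows. This is a minimal, standalone approximation, not the code in Omni/Agent/Provider.hs: the npCall field, the simplified chatWithFallback (no usage tracking), and the demo function are assumptions made for illustration, and pcCooldownSeconds is left out to keep it short.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (when)
import Data.Foldable (forM_)
import Data.IORef (modifyIORef, newIORef, readIORef)
import Data.Text (Text)

-- Hypothetical stand-in: the real NamedProvider wraps an actual LLM client.
data NamedProvider = NamedProvider
  { npName :: Text
  , npCall :: Text -> IO (Either Text Text) -- prompt -> error or reply
  }

data ProviderChain = ProviderChain
  { pcProviders :: [NamedProvider]
  , pcOnFailover :: Maybe (Text -> IO ()) -- receives the failing primary's name
  }

-- Mirrors step 2: a chain has no callback unless one is set explicitly.
mkProviderChain :: [NamedProvider] -> ProviderChain
mkProviderChain ps = ProviderChain {pcProviders = ps, pcOnFailover = Nothing}

-- Try each provider in order; fire the callback only when the primary fails
-- and at least one backup remains.
chatWithFallback :: ProviderChain -> Text -> IO (Either Text Text)
chatWithFallback chain prompt = tryAll (pcProviders chain)
  where
    tryAll [] = pure (Left "all providers failed")
    tryAll (np : rest) = do
      result <- npCall np prompt
      case result of
        Right r -> pure (Right r)
        Left _ -> do
          let isPrimary = case pcProviders chain of
                (primary : _) -> npName np == npName primary
                [] -> False
          when (isPrimary && not (null rest)) $
            forM_ (pcOnFailover chain) ($ npName np)
          tryAll rest

-- Demo: a failing primary followed by a working backup; returns the names
-- the callback was invoked with.
demo :: IO [Text]
demo = do
  notified <- newIORef []
  let failing = NamedProvider "claude-oauth" (\_ -> pure (Left "auth expired"))
      backup = NamedProvider "openrouter" (\_ -> pure (Right "ok"))
      chain =
        (mkProviderChain [failing, backup])
          { pcOnFailover = Just (\name -> modifyIORef notified (name :)) }
  _ <- chatWithFallback chain "hello"
  readIORef notified
```

Recording the callback's arguments in an IORef, as demo does, is one way to check the acceptance criteria without a live Telegram connection.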

Omni/Ava/Telegram/Bot.hs

In both places where Provider.mkStandardChain is called (heartbeat around line 1032, message handler around line 1791), add an onFailover callback to the chain *after* it's created. Since mkStandardChain returns a Provider, pattern-match on the Chain constructor and set pcOnFailover on the wrapped ProviderChain.
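Setting the callback after creation might look roughly like the sketch below. The types are trimmed-down, hypothetical versions: the Single constructor and the pcProviderNames field are placeholders, and only the Chain constructor and pcOnFailover come from this task. The helper names withFailover and hasFailover are also invented for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Maybe (isJust)
import Data.Text (Text)

-- Hypothetical, trimmed-down version of the real record.
data ProviderChain = ProviderChain
  { pcProviderNames :: [Text] -- placeholder for the real provider list
  , pcOnFailover :: Maybe (Text -> IO ())
  }

data Provider
  = Single Text -- assumed: some non-chain constructor exists
  | Chain ProviderChain -- the constructor the task says to unwrap

-- Attach a failover callback if the provider is a chain; leave any other
-- constructor untouched.
withFailover :: (Text -> IO ()) -> Provider -> Provider
withFailover cb (Chain c) = Chain c {pcOnFailover = Just cb}
withFailover _ other = other

-- Small helper to inspect the result.
hasFailover :: Provider -> Bool
hasFailover (Chain c) = isJust (pcOnFailover c)
hasFailover _ = False
```

At each call site this would be applied to the result of mkStandardChain, for example with fmap. The drawback is that both call sites have to remember to do it, which is one argument for the parameter-based variant.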

Alternatively, add an optional callback parameter to mkStandardChain:

mkStandardChain :: Text -> Maybe Text -> Text -> Maybe (Text -> IO ()) -> IO Provider

The callback to pass from Bot.hs should send a Telegram notification to the owner:

let failoverNotify failingProvider = do
      forM_ (Types.tgOwnerUserId tgConfig) $ \ownerId ->
        Bot.sendMessage tgConfig ownerId
          ("⚠️ LLM failover: " <> failingProvider <> " failed, switching to backup. Re-login with /login if needed.")

Wire this into both the heartbeat path and the message handler path.
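To keep the two call sites identical, the callback could be built by one shared helper. The sketch below assumes the owner ID is a Maybe Int and injects the send function as a parameter so it runs without Telegram; TgConfig here is a one-field stand-in, and mkFailoverNotify and sentMessages are hypothetical names, not existing functions in Bot.hs.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Foldable (forM_)
import Data.IORef (modifyIORef, newIORef, readIORef)
import Data.Text (Text)

-- Hypothetical config: only the owner field matters here.
newtype TgConfig = TgConfig {tgOwnerUserId :: Maybe Int}

-- Build the failover callback. The sender is injected (assumption) so the
-- same helper can serve the heartbeat path, the message handler, and tests.
mkFailoverNotify :: (Int -> Text -> IO ()) -> TgConfig -> Text -> IO ()
mkFailoverNotify send cfg failingProvider =
  -- forM_ over Nothing does nothing, so a missing owner is silently skipped.
  forM_ (tgOwnerUserId cfg) $ \ownerId ->
    send
      ownerId
      ( "⚠️ LLM failover: "
          <> failingProvider
          <> " failed, switching to backup. Re-login with /login if needed."
      )

-- Run the notifier against a recording sender and return what was sent.
sentMessages :: TgConfig -> IO [(Int, Text)]
sentMessages cfg = do
  ref <- newIORef []
  mkFailoverNotify (\uid msg -> modifyIORef ref ((uid, msg) :)) cfg "claude-oauth"
  readIORef ref
```

In Bot.hs the injected sender would presumably be a partial application of Bot.sendMessage to tgConfig, and the resulting Text -> IO () is what gets stored in pcOnFailover.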

Acceptance criteria

  • When claude-oauth fails and openrouter takes over, a Telegram message is sent to tgOwnerUserId
  • If OpenRouter is the first/only provider and it fails, no notification is sent (nothing to fall back to)
  • If no owner is configured, nothing is sent and nothing crashes (forM_ over an absent owner ID is a no-op)
  • Compiles and existing tests pass

Notes

  • Use Task-Id: t-771 in the commit trailer
  • Commit message: ava: notify owner on LLM provider failover
  • Move task to Review when done
