Implement web_search and read_web_page tools

t-244 · WorkTask · Omni/Agent/Tools.hs
Created 3 months ago · Updated 2 months ago

Description

Add web search and page reading tools so jr can look up documentation and examples.

Tools to Implement

1. web_search

Search the web and return results with titles, URLs, and snippets.

data WebSearchArgs = WebSearchArgs
  { wsQuery :: Text           -- Search query
  , wsMaxResults :: Maybe Int -- Max results (default: 5)
  }
  deriving (Show, Eq, Generic)

Implementation: Use DuckDuckGo HTML search (no API key needed). Fetch the results page, then extract the results from the returned HTML:

curl -s "https://html.duckduckgo.com/html/?q=<url-encoded query>"
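When the search is run from Haskell through System.Process, curl can do the URL-encoding itself. `--get` together with `--data-urlencode "q=<query>"` appends the percent-encoded query to the URL. Passing an argument list instead of a shell string also avoids quoting problems.

The sketch below is minimal. `searchCurlArgs` is a hypothetical name, and the 10-second `--max-time` comes from the Notes.

```haskell
-- Hypothetical helper: the curl argument list for a DuckDuckGo HTML search.
-- Meant for System.Process (no shell involved), so the query needs no quoting.
searchCurlArgs :: String -> [String]
searchCurlArgs query =
  [ "-s"                               -- silent: no progress meter
  , "--max-time", "10"                 -- give up after 10 seconds (see Notes)
  , "--get"                            -- send the --data-urlencode value as a query string
  , "--data-urlencode", "q=" ++ query  -- curl percent-encodes the part after "q="
  , "https://html.duckduckgo.com/html/"
  ]
```

For example, `readProcessWithExitCode "curl" (searchCurlArgs "haskell aeson tutorial") ""` would return the results page on stdout.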

Or use the ddgr command-line tool if available:

ddgr --json -n 5 "<query>"
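Choosing between ddgr and the curl fallback can be a PATH check at call time. This is a sketch: the `SearchBackend` type and the helper names are hypothetical, and `findExecutable` comes from the directory package that ships with GHC. The `"--"` argument assumes ddgr's standard argparse option handling, so that a query starting with "-" is not read as a flag.

```haskell
import System.Directory (findExecutable)

-- Hypothetical type: which search backend the tool will use.
data SearchBackend = Ddgr FilePath | CurlHtml
  deriving (Show, Eq)

-- Prefer ddgr when it is on PATH; otherwise use curl plus HTML parsing.
chooseBackend :: IO SearchBackend
chooseBackend = maybe CurlHtml Ddgr <$> findExecutable "ddgr"

-- Arguments for `ddgr --json -n N <query>`. The "--" ends option parsing,
-- so a query such as "-foo" is passed as a keyword, not a flag.
ddgrArgs :: Int -> String -> [String]
ddgrArgs n query = ["--json", "-n", show n, "--", query]
```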

Fallback (if ddgr is not installed): Use curl to fetch the DuckDuckGo HTML and parse it with simple regex/text processing.
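Below is a rough sketch of the text-processing fallback, using only base. It rests on two assumptions about DuckDuckGo's markup, which can change at any time:

- Each result's title link carries `class="result__a"`, with `href` after the class attribute.
- Each snippet carries `class="result__snippet"`.

If the markup changes, the parser yields fewer results or an empty list rather than crashing. DuckDuckGo may also return redirect links (`//duckduckgo.com/l/?uddg=...`) instead of direct URLs; decoding those is not handled here. All names are hypothetical.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (mapMaybe)

-- Hypothetical record for one parsed search hit.
data SearchResult = SearchResult
  { srTitle :: String
  , srUrl :: String
  , srSnippet :: String
  }
  deriving (Show, Eq)

-- Assumed marker on each result's title link in the HTML page.
resultMarker :: String
resultMarker = "class=\"result__a\""

-- Split a string on every occurrence of a non-empty separator.
splitOn :: String -> String -> [String]
splitOn sep = go ""
  where
    go acc [] = [reverse acc]
    go acc s@(c : rest)
      | sep `isPrefixOf` s = reverse acc : go "" (drop (length sep) s)
      | otherwise = go (c : acc) rest

-- Text after the first occurrence of needle, if any.
after :: String -> String -> Maybe String
after needle = go
  where
    go [] = Nothing
    go s@(_ : rest)
      | needle `isPrefixOf` s = Just (drop (length needle) s)
      | otherwise = go rest

-- Text before the first occurrence of needle (the whole string if absent).
takeUntil :: String -> String -> String
takeUntil needle = go
  where
    go [] = []
    go s@(c : rest)
      | needle `isPrefixOf` s = []
      | otherwise = c : go rest

-- Drop anything between '<' and '>' (e.g. <b> highlighting in titles).
stripTags :: String -> String
stripTags [] = []
stripTags ('<' : rest) = stripTags (drop 1 (dropWhile (/= '>') rest))
stripTags (c : rest) = c : stripTags rest

-- Decode a handful of common HTML entities.
decodeEntities :: String -> String
decodeEntities [] = []
decodeEntities s@(c : rest) =
  case [(ch, drop (length ent) s) | (ent, ch) <- entities, ent `isPrefixOf` s] of
    (ch, s') : _ -> ch : decodeEntities s'
    [] -> c : decodeEntities rest
  where
    entities = [("&amp;", '&'), ("&quot;", '"'), ("&#x27;", '\''), ("&lt;", '<'), ("&gt;", '>')]

-- Remove tags, decode entities, collapse whitespace.
clean :: String -> String
clean = unwords . words . decodeEntities . stripTags

-- The HTML between two result markers -> one result (snippet optional).
parseResult :: String -> Maybe SearchResult
parseResult chunk = do
  hrefRest <- after "href=\"" chunk
  titleRest <- after ">" hrefRest
  let url = decodeEntities (takeWhile (/= '"') hrefRest)
      title = clean (takeUntil "</a>" titleRest)
      snippet = maybe "" (clean . takeUntil "</a>") (after "result__snippet" chunk >>= after ">")
  pure (SearchResult title url snippet)

-- Everything before the first marker is page chrome, so it is dropped.
parseResults :: String -> [SearchResult]
parseResults html = mapMaybe parseResult (drop 1 (splitOn resultMarker html))
```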

Return format:

{
  "success": true,
  "output": "1. Title - URL\n   Snippet...\n\n2. Title - URL\n   Snippet..."
}
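The "output" string in the return format above can be built from parsed results as shown in this minimal sketch. The `SearchResult` record is hypothetical. Applying the default of 5 when max_results is absent follows the WebSearchArgs comment. Omitting the snippet line for results without a snippet is an assumption.

```haskell
import Data.List (intercalate)
import Data.Maybe (fromMaybe)

-- Hypothetical record for one search hit (title, URL, snippet).
data SearchResult = SearchResult
  { srTitle :: String
  , srUrl :: String
  , srSnippet :: String
  }

-- "N. Title - URL", the snippet indented on the next line,
-- entries separated by a blank line.
formatResults :: Maybe Int -> [SearchResult] -> String
formatResults maxResults results =
  intercalate "\n\n" (zipWith render [1 :: Int ..] (take limit results))
  where
    limit = fromMaybe 5 maxResults -- default from wsMaxResults
    render i r = show i ++ ". " ++ srTitle r ++ " - " ++ srUrl r ++ snippetLine r
    snippetLine r
      | null (srSnippet r) = ""
      | otherwise = "\n   " ++ srSnippet r
```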

2. read_web_page

Fetch a URL and convert it to readable text.

data ReadWebPageArgs = ReadWebPageArgs
  { rwpUrl :: Text              -- URL to fetch
  , rwpObjective :: Maybe Text  -- Optional: focus extraction on this goal
  }
  deriving (Show, Eq, Generic)

Implementation:

1. Fetch the URL with curl: curl -sL "<url>"
2. Convert the HTML to text. Options:
  • Use pandoc -f html -t plain (available in NixOS)
  • Use lynx -dump (simpler)
  • Use w3m -dump
3. If an objective is provided, truncate to ~8000 chars to avoid context overflow.
4. Return the text content.
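Steps 1 and 2 map onto two process calls, as in the sketch below. It assumes curl and pandoc are on PATH and that pandoc is the chosen converter. Two pieces are additions, not part of this ticket: the http(s) scheme check and the "[truncated]" marker.

`readProcessWithExitCode` decodes output using the locale encoding and throws if an executable is missing. A real tool should catch that and report `"success": false`. All names are hypothetical.

```haskell
import Data.List (isPrefixOf)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- Character budget from step 3 (~8000 chars).
maxPageChars :: Int
maxPageChars = 8000

-- Cut text to at most n characters and mark the cut; take (n + 1)
-- avoids walking the whole string just to measure it.
truncateTo :: Int -> String -> String
truncateTo n s
  | length (take (n + 1) s) <= n = s
  | otherwise = take n s ++ "\n\n[truncated]"

-- Steps 1-2: fetch with curl (10 s timeout), then convert HTML with pandoc.
fetchAsText :: String -> IO (Either String String)
fetchAsText url
  | not (any (`isPrefixOf` url) ["http://", "https://"]) =
      pure (Left ("unsupported URL: " ++ url)) -- added safeguard (assumption)
  | otherwise = do
      (curlCode, html, curlErr) <-
        readProcessWithExitCode "curl" ["-sSL", "--max-time", "10", url] ""
      case curlCode of
        ExitFailure _ -> pure (Left ("curl failed: " ++ curlErr))
        ExitSuccess -> do
          (pandocCode, text, pandocErr) <-
            readProcessWithExitCode "pandoc" ["-f", "html", "-t", "plain"] html
          pure $ case pandocCode of
            ExitFailure _ -> Left ("pandoc failed: " ++ pandocErr)
            ExitSuccess -> Right text
```

Per step 3, the caller would apply `truncateTo maxPageChars` to the `Right` text when `rwpObjective` is set.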

Tool Schemas

web_search

{
  "type": "object",
  "properties": {
    "query": { "type": "string", "description": "Search query" },
    "max_results": { "type": "integer", "description": "Maximum results (default: 5)" }
  },
  "required": ["query"]
}

read_web_page

{
  "type": "object",
  "properties": {
    "url": { "type": "string", "description": "URL to fetch and read" },
    "objective": { "type": "string", "description": "Optional: focus on content relevant to this goal" }
  },
  "required": ["url"]
}

Testing

1. Test that the web_search schema is valid.
2. Test read_web_page with a simple URL like "https://example.com".

Notes

  • Prefer pandoc for HTML-to-text as it handles most sites well
  • Set a timeout on curl (10 seconds, e.g. --max-time 10) to avoid hanging
  • Truncate very long pages to avoid context overflow
  • Add both tools to exports and allTools list
  • Update tool count test

Timeline (1)

🔄 [human] Open → Done · 2 months ago