Data Extraction

Process

  1. Identify the source (URL or file path)
  2. Determine what data to extract (fields, format)
  3. Read the source content
  4. Extract into structured format (JSON, table, etc.)
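
The four steps above can be sketched in Python. This is a minimal illustration, not a prescribed implementation: the helper names `read_source` and `extract`, the optional `text` parameter, and the choice of one regular expression per field are all assumptions made for the example.

```python
import re
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import urlopen


def read_source(source: str) -> str:
    """Steps 1 and 3: treat http(s) sources as URLs, anything else as a file path."""
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    return Path(source).read_text(encoding="utf-8")


def extract(source: str, patterns: dict, text: Optional[str] = None) -> dict:
    """Steps 2 and 4: apply one regex per requested field and wrap the result.

    `patterns` maps a field name to a regular expression (hypothetical
    convention). Pass `text` to skip reading, e.g. when the content is
    already in memory.
    """
    if text is None:
        text = read_source(source)
    data = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        # A field that cannot be found is reported as None (JSON null).
        data[field] = match.group(0) if match else None
    return {
        "source": source,
        "extracted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": data,
    }
```

Calling `extract("notes.txt", {"email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"})` would read the file and return a dictionary that can be passed to `json.dumps`. Regular expressions are only one option; an HTML parser or a schema-driven extractor would fit the same four steps.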

Guidelines

Output Format

Return extracted data as JSON:

{
  "source": "url or file path",
  "extracted_at": "ISO 8601 timestamp",
  "data": {
    "field1": "value1",
    "field2": ["list", "values"],
    "field3": null
  }
}
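
A consumer of this output may want to check the shape before using it. The sketch below is one possible check, assuming the three top-level keys shown above are all required and that timestamps use the UTC "Z" form; `validate_envelope` and `REQUIRED_KEYS` are hypothetical names introduced for the example.

```python
import json
from datetime import datetime

# Assumption: all three top-level keys from the format above are mandatory.
REQUIRED_KEYS = {"source", "extracted_at", "data"}


def validate_envelope(raw: str) -> dict:
    """Parse a JSON string and check that it matches the output format."""
    envelope = json.loads(raw)
    if not isinstance(envelope, dict):
        raise ValueError("output must be a JSON object")
    missing = REQUIRED_KEYS - envelope.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Assumption: timestamps look like 2024-01-15T10:30:00Z.
    # strptime raises ValueError if the string does not match.
    datetime.strptime(envelope["extracted_at"], "%Y-%m-%dT%H:%M:%SZ")
    if not isinstance(envelope["data"], dict):
        raise ValueError("'data' must be a JSON object")
    return envelope
```

Fields inside `data` are left unchecked here because they vary by task; a `null` value is accepted as a field that was requested but not found.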

Example: Extract Contact Info

Input: “Extract contact information from https://example.com/about”

Output:

{
  "source": "https://example.com/about",
  "extracted_at": "2024-01-15T10:30:00Z",
  "data": {
    "name": "Example Corp",
    "email": "contact@example.com",
    "phone": "+1-555-0123",
    "address": "123 Main St, City, ST 12345"
  }
}