Data Extraction

Process

Identify the source (URL or file path)
Determine what data to extract (fields, format)
Read the source content
Extract into structured format (JSON, table, etc.)
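
The first and third steps can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the helper name `read_source` and the 10-second timeout are assumptions, and anything that is not an http(s) URL is treated as a local file path.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_source(source: str, timeout: float = 10.0) -> str:
    """Return the text content of a URL or a local file path (hypothetical helper)."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        # Remote source: fetch and decode using the declared charset, if any.
        with urlopen(source, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    # Anything else is assumed to be a local file path.
    return Path(source).read_text(encoding="utf-8")
```

Keeping the read step separate from the extraction step makes it easier to test extraction logic against saved copies of a page.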

Guidelines

Be precise about what fields to extract
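Handle missing data gracefully: set a field to null when it is not found, rather than omitting it or guessing a value
Validate that each extracted value has the expected data type (string, number, list, etc.)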
Include metadata about extraction (source, timestamp)
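
The guidelines above can be sketched in a few lines of Python. The schema, the field names, and the helper `build_result` are hypothetical and only illustrate the idea: missing fields become null, types are checked, and source and timestamp metadata are attached.

```python
import json
from datetime import datetime, timezone

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"name": str, "email": str, "tags": list}


def build_result(source, raw, schema=SCHEMA):
    """Wrap raw extracted values with metadata, using None for missing fields."""
    data = {}
    for field, expected in schema.items():
        value = raw.get(field)  # missing field -> None (serialized as JSON null)
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
        data[field] = value
    return {
        "source": source,
        "extracted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": data,
    }


result = build_result(
    "https://example.com/about",
    {"name": "Example Corp", "tags": ["about", "contact"]},
)
print(json.dumps(result, indent=2))
```

Here `email` is absent from the raw values, so it appears as null in the printed JSON instead of being dropped.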

Output Format

Return extracted data as JSON:

{
  "source": "url or file path",
  "extracted_at": "ISO 8601 timestamp",
  "data": {
    "field1": "value1",
    "field2": ["list", "values"],
    "field3": null
  }
}

Example: Extract Contact Info

Input: “Extract contact information from https://example.com/about”

Output:

{
  "source": "https://example.com/about",
  "extracted_at": "2024-01-15T10:30:00Z",
  "data": {
    "name": "Example Corp",
    "email": "contact@example.com",
    "phone": "+1-555-0123",
    "address": "123 Main St, City, ST 12345"
  }
}
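
One way such an output could be produced from already-fetched page text is sketched below. The regular expressions are deliberately simple assumptions rather than complete validators, the helper `extract_contact` is hypothetical, and only the email and phone fields are covered.

```python
import re
from datetime import datetime, timezone

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\-\s().]{6,}\d")


def extract_contact(source, text):
    """Pull the first email and phone number out of text; None when absent."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "source": source,
        "extracted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": {
            "email": email.group(0) if email else None,
            "phone": phone.group(0) if phone else None,
        },
    }


page_text = "Example Corp. Email: contact@example.com. Phone: +1-555-0123."
contact = extract_contact("https://example.com/about", page_text)
print(contact["data"])
```

Fields such as name and address usually need page-specific logic, so they are left out of this sketch.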