Data Extraction
Process
- Identify the source (URL or file path)
- Determine what data to extract (fields, format)
- Read the source content
- Extract into structured format (JSON, table, etc.)
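The steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: the helper names read_source and extract_fields, the regex-per-field approach, and the sample patterns are all assumptions made for the example.

```python
import re
from pathlib import Path
from typing import Dict, Optional
from urllib.parse import urlparse
from urllib.request import urlopen


def read_source(source: str) -> str:
    """Return the text content of a URL or a local file path (hypothetical helper)."""
    if urlparse(source).scheme in ("http", "https"):
        with urlopen(source, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    return Path(source).read_text(encoding="utf-8")


def extract_fields(text: str, patterns: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Apply one regex per field; a field whose pattern does not match becomes None."""
    result: Dict[str, Optional[str]] = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        result[field] = match.group(1) if match else None
    return result


# Usage on an inline string, so the example runs without network or file access.
sample = "Name: Example Corp\nEmail: contact@example.com\n"
patterns = {
    "name": r"Name:\s*(.+)",
    "email": r"Email:\s*(\S+)",
    "phone": r"Phone:\s*(\S+)",  # not present in the sample, so it maps to None
}
print(extract_fields(sample, patterns))
```

Real sources such as HTML pages usually need a proper parser rather than regular expressions; the regexes here only keep the sketch short.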
Guidelines
- Be precise about what fields to extract
- Handle missing data gracefully: set a field to null when its value is absent instead of omitting the field or guessing
- Validate that each extracted value has the expected data type (for example, a string or a list)
- Include metadata about extraction (source, timestamp)
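One way to apply the guidelines on missing data and type validation is sketched below. The SCHEMA mapping and the normalize function are hypothetical names introduced for illustration; the field list is an assumption, not part of any fixed specification.

```python
from typing import Any, Dict

# Hypothetical schema: field name -> expected Python type.
SCHEMA: Dict[str, type] = {"name": str, "email": str, "tags": list}


def normalize(raw: Dict[str, Any], schema: Dict[str, type]) -> Dict[str, Any]:
    """Return a record containing every schema field.

    Missing fields are set to None (serialized as null in JSON).
    A present value of the wrong type raises TypeError.
    """
    clean: Dict[str, Any] = {}
    for field, expected in schema.items():
        value = raw.get(field)
        if value is not None and not isinstance(value, expected):
            raise TypeError(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
        clean[field] = value
    return clean


# "email" and "tags" are absent from the input, so both come back as None.
print(normalize({"name": "Example Corp"}, SCHEMA))
```

Raising on a type mismatch is a design choice for this sketch; a caller could instead record the error and set the field to null.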
Output Format
Return extracted data as JSON:
{
  "source": "url or file path",
  "extracted_at": "ISO 8601 timestamp",
  "data": {
    "field1": "value1",
    "field2": ["list", "values"],
    "field3": null
  }
}
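A minimal sketch of producing this envelope in Python, assuming the extracted fields are already in a dictionary. The function name build_result is hypothetical; the timestamp is generated in UTC to match the "Z" suffix used in the example in this document.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_result(source: str, data: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap extracted data with source and timestamp metadata (hypothetical helper)."""
    return {
        "source": source,
        # UTC timestamp in ISO 8601 form, e.g. 2024-01-15T10:30:00Z
        "extracted_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": data,
    }


result = build_result(
    "https://example.com/about",
    {"name": "Example Corp", "phone": None},
)
# json.dumps serializes Python None as JSON null.
print(json.dumps(result, indent=2))
```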
Example: Extract Contact Info
Input: “Extract contact information from https://example.com/about”
Output:
{
  "source": "https://example.com/about",
  "extracted_at": "2024-01-15T10:30:00Z",
  "data": {
    "name": "Example Corp",
    "email": "contact@example.com",
    "phone": "+1-555-0123",
    "address": "123 Main St, City, ST 12345"
  }
}