PDF Reading

When you receive a PDF file, choose between two approaches depending on its content:

Text-heavy PDFs (papers, articles, contracts)

Use the read_pdf tool, or run pdftotext via run_bash:

pdftotext -layout /path/to/file.pdf -

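This writes all extracted text to stdout (the trailing -), and -layout keeps the original physical layout, such as columns and tables. It works best for documents that are primarily text.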
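
For long documents it can help to look at one page of the extracted text at a time. The sketch below assumes pdftotext's default behavior of ending each page with a form feed character (this is switched off by -nopgbrk). The function name page_from_text is hypothetical, not part of any tool mentioned here.

```shell
# Hypothetical helper: print page N of pdftotext output read from stdin.
# Assumes pages are separated by form feeds (\f), the pdftotext default.
page_from_text() {
  awk -v n="$1" 'BEGIN { RS = "\f" } NR == n { printf "%s", $0 }'
}
```

A possible use is `pdftotext -layout /path/to/file.pdf - | page_from_text 3`. Passing -f 3 -l 3 to pdftotext directly is an alternative when only one page is needed.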

Image-heavy PDFs (charts, diagrams, scanned pages, slides)

Render pages as images with pdftoppm, then view them:

# Render all pages as JPEGs (150 DPI is good for readability)
pdftoppm -jpeg -r 150 /path/to/file.pdf /tmp/pdf-page

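# This creates /tmp/pdf-page-1.jpg, /tmp/pdf-page-2.jpg, etc.
# Page numbers are zero-padded to the width of the total page count,
# so a document with 10 or more pages produces /tmp/pdf-page-01.jpg, and so on.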

# Render only a page range (-f is the first page, -l the last; here pages 1 to 3)
pdftoppm -jpeg -r 150 -f 1 -l 3 /path/to/file.pdf /tmp/pdf-page

Then use read_file on the resulting .jpg files to see them with your vision capabilities.
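
Because the number of digits in the output file names depends on the document's page count, it is safer to list the rendered files with a glob than to guess their names. The following is a minimal bash sketch of that idea. The function names render_pdf_pages and list_rendered_pages are hypothetical, and the default prefix simply reuses /tmp/pdf-page from the commands above.

```shell
# Hypothetical helper: print every JPEG that starts with PREFIX-, one per line.
# Glob results are sorted, and pdftoppm zero-pads page numbers, so the
# order matches page order.
list_rendered_pages() {
  local prefix="$1"
  local f
  for f in "$prefix"-*.jpg; do
    # Skip the unexpanded pattern when nothing matches.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Hypothetical wrapper: render pages FIRST..LAST, then print the image paths.
render_pdf_pages() {
  local pdf="$1" first="$2" last="$3"
  local prefix="${4:-/tmp/pdf-page}"
  pdftoppm -jpeg -r 150 -f "$first" -l "$last" "$pdf" "$prefix" || return 1
  list_rendered_pages "$prefix"
}
```

For example, `render_pdf_pages /path/to/file.pdf 1 3` should print the paths to pass to read_file. Images left over from an earlier run with the same prefix would also be listed, so a fresh prefix per document is a reasonable precaution.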

Deciding which approach to use

Quick check: is text extraction sufficient?

# Check if pdftotext gets meaningful content
pdftotext /path/to/file.pdf - | head -50
# If output is empty or garbled, render as images instead
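
The same check can be automated roughly as follows. This is a sketch under stated assumptions, not a definitive rule: the function names has_meaningful_text and choose_pdf_approach are hypothetical, the threshold of 200 characters is an arbitrary starting point, and only ASCII letters and digits are counted, so documents in non-Latin scripts would need a different test.

```shell
# Hypothetical helper: succeed when stdin contains at least MIN_CHARS
# ASCII letters or digits (default 200, an assumed threshold).
has_meaningful_text() {
  local min_chars="${1:-200}"
  local count
  # Arithmetic expansion strips the padding some wc implementations add.
  count=$(( $(LC_ALL=C tr -cd '[:alnum:]' | wc -c) ))
  [ "$count" -ge "$min_chars" ]
}

# Hypothetical wrapper: print "text" or "images" for a given PDF.
# If pdftotext fails or is missing, the result falls back to "images".
choose_pdf_approach() {
  local pdf="$1"
  if pdftotext "$pdf" - 2>/dev/null | has_meaningful_text; then
    echo "text"
  else
    echo "images"
  fi
}
```

A character count catches empty output from scanned pages but not every kind of garbling, so a glance at the first lines of output is still worthwhile when the result is "text".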

Tips