Parse Reddit

Extract structured data from Reddit posts and comments using Reddit’s JSON API.

Process

  1. Get Reddit URL - Ensure URL points to a specific post
  2. Add JSON suffix - Append .json to any Reddit URL
  3. Fetch with User-Agent - Use curl with proper User-Agent header
  4. Parse JSON structure - Navigate Reddit’s nested JSON format
  5. Extract post data - Get title, author, score, body, timestamp
  6. Extract comments - Parse comment tree for top replies

Examples

Given a Reddit URL, fetch the JSON and extract the post title, author, score, body, and timestamp, plus the top comments.

How to Fetch

Append .json to the post URL, dropping any trailing slash and query string first:

curl -s -H "User-Agent: agentbot/1.0" "https://www.reddit.com/r/SUBREDDIT/comments/ID/TITLE.json"
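The URL rewrite can be scripted. The following is a minimal sketch, assuming a POSIX shell; to_json_url is a hypothetical helper name, not part of any Reddit tooling.

```shell
# Hypothetical helper: turn a Reddit post URL into its .json endpoint.
# The body runs in a subshell, so the variable does not leak to the caller.
to_json_url() (
  url=$1
  url=${url%%\#*}    # drop any fragment
  url=${url%%\?*}    # drop any query string (e.g. share-tracking parameters)
  url=${url%/}       # drop a single trailing slash
  case $url in
    *.json) ;;               # already a JSON endpoint, leave as-is
    *) url=$url.json ;;
  esac
  printf '%s\n' "$url"
)
```

It could then be combined with the fetch as: curl -s -H "User-Agent: agentbot/1.0" "$(to_json_url "$url")"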

JSON Structure

The response is an array with 2 elements:

  1. [0].data.children[0].data - The post
  2. [1].data.children[] - The comments, one entry per top-level comment, with fields under .data
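To make the two-element layout concrete, here is a small sketch that runs jq against a made-up, heavily trimmed sample instead of a live response. The sample values and the function names post_title and comment_authors are illustrative assumptions. The kind values are Reddit's type prefixes: "t3" for a post, "t1" for a comment, and "more" for the stub that stands in for comments not included in the response.

```shell
# Made-up sample with the same two-listing shape as a real response.
sample='[
  {"data":{"children":[{"kind":"t3","data":{"title":"Example post","author":"alice"}}]}},
  {"data":{"children":[
    {"kind":"t1","data":{"author":"bob","body":"First reply","score":3}},
    {"kind":"more","data":{"count":12}}
  ]}}
]'

# Element 0 holds the post; its fields live under children[0].data.
post_title() { jq -r '.[0].data.children[0].data.title'; }

# Element 1 holds the comments; keep only real comments (kind "t1").
comment_authors() { jq -r '.[1].data.children[] | select(.kind == "t1") | .data.author'; }

printf '%s' "$sample" | post_title        # prints: Example post
printf '%s' "$sample" | comment_authors   # prints: bob
```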

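Post Fields

The example below reads these fields from the post object: title, author, selftext (the body), score, num_comments, created_utc (a Unix timestamp), and subreddit.

Comment Fields

The example below reads these fields from each comment object: author, body, and score.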

Example

Input URL: https://www.reddit.com/r/productivity/comments/1jf439v/email_newsletter_texttospeech_app/

curl -s -H "User-Agent: agentbot/1.0" \
  "https://www.reddit.com/r/productivity/comments/1jf439v/email_newsletter_texttospeech_app.json" \
  | jq '{
    author: .[0].data.children[0].data.author,
    title: .[0].data.children[0].data.title,
    body: .[0].data.children[0].data.selftext,
    score: .[0].data.children[0].data.score,
    num_comments: .[0].data.children[0].data.num_comments,
    created: .[0].data.children[0].data.created_utc,
    subreddit: .[0].data.children[0].data.subreddit,
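    # Keep only real comments (kind "t1"); the trailing "more" stub has no author or body
    comments: ([.[1].data.children[] | select(.kind == "t1") | .data | {author, body, score}] | .[:5])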
  }'

Output Format

Return structured data:

## Reddit Post

**u/username** in r/subreddit (DATE, SCORE points, N comments)

> Post body text here...

### Top Comments

1. **u/commenter1** (SCORE points)
   > Comment text...

2. **u/commenter2** (SCORE points)
   > Comment text...
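The format above can be produced directly by jq. This is a sketch under assumptions: render_report is a hypothetical helper name, it expects the post JSON on stdin, it keeps the first five comments, and it prefixes every line of a multi-line body with the quote marker.

```shell
# Hypothetical helper: read a post's JSON on stdin, print the report.
render_report() {
  jq -r '
    # Prefix each line of a multi-line string so the whole text stays quoted.
    def quote(prefix): split("\n") | map(prefix + .) | join("\n");

    .[0].data.children[0].data as $p
    | "## Reddit Post",
      "",
      "**u/\($p.author)** in r/\($p.subreddit) (\($p.created_utc | strftime("%Y-%m-%d")), \($p.score) points, \($p.num_comments) comments)",
      "",
      ($p.selftext | quote("> ")),
      "",
      "### Top Comments",
      "",
      ( [.[1].data.children[] | select(.kind == "t1") | .data]
        | .[:5]
        | to_entries[]
        | "\(.key + 1). **u/\(.value.author)** (\(.value.score) points)",
          (.value.body | quote("   > ")),
          "" )
  '
}
```

Usage would look like: curl -s -H "User-Agent: agentbot/1.0" "$json_url" | render_report, where json_url is a placeholder for the .json URL.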

Date Conversion

Convert Unix timestamp to readable date:

date -u -d @1742409172 +"%a %b %d %Y"  # Prints: Wed Mar 19 2025 (GNU date)

Or in jq:

jq '.created_utc | strftime("%Y-%m-%d")'
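The -d @SECONDS form is specific to GNU date; BSD and macOS date take -r SECONDS instead. A small wrapper can try both. This is a sketch, and epoch_to_date is a hypothetical helper name. It also strips the fractional part, since created_utc arrives as a float such as 1742409172.0.

```shell
# Hypothetical helper: print a Unix timestamp as YYYY-MM-DD in UTC.
epoch_to_date() {
  secs=${1%.*}   # drop a trailing ".0" if the value came straight from the JSON
  # Try the GNU syntax first, then fall back to the BSD/macOS syntax.
  date -u -d "@$secs" +%Y-%m-%d 2>/dev/null || date -u -r "$secs" +%Y-%m-%d
}

epoch_to_date 1742409172   # prints: 2025-03-19
```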

Rate Limiting

Reddit may rate limit. If you get errors:
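One possible way to cope, offered as a sketch and not as a documented recommendation: retry with exponential backoff when the server answers HTTP 429 (Too Many Requests). The helper name fetch_with_retry, the default of 4 attempts, and the 2-second starting delay are all assumptions.

```shell
# Hypothetical helper: fetch a URL, retrying on HTTP 429 with exponential backoff.
fetch_with_retry() {
  url=$1
  max_attempts=${2:-4}   # assumed default
  delay=2                # assumed initial wait, in seconds
  attempt=1
  status=none
  body=$(mktemp) || return 1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -o writes the body to a temp file; -w prints only the HTTP status code.
    status=$(curl -s -o "$body" -w '%{http_code}' \
      -H "User-Agent: agentbot/1.0" "$url")
    if [ "$status" = 200 ]; then
      cat "$body"
      rm -f "$body"
      return 0
    fi
    # Only a rate-limit response is worth retrying; give up on anything else.
    [ "$status" = 429 ] || break
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  rm -f "$body"
  echo "fetch failed (last HTTP status: $status)" >&2
  return 1
}
```

A call such as fetch_with_retry "$json_url" | jq '.[0].data.children[0].data.title' would then wait 2, 4, and 8 seconds between attempts before giving up; json_url is a placeholder.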