Parse Reddit
Extract structured data from Reddit posts and comments using Reddit’s JSON API.
Process
- Get Reddit URL - Ensure URL points to a specific post
- Add JSON suffix - Append .json to any Reddit URL
- Fetch with User-Agent - Use curl with proper User-Agent header
- Parse JSON structure - Navigate Reddit’s nested JSON format
- Extract post data - Get title, author, score, body, timestamp
- Extract comments - Parse comment tree for top replies
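The "Add JSON suffix" step can be sketched as a small shell function. This is a minimal sketch, not part of Reddit's API: the name `to_json_url` is hypothetical, and it assumes the input is a post URL that may carry a trailing slash, a query string, or a fragment.

```shell
# Hypothetical helper: turn a Reddit post URL into its .json endpoint.
to_json_url() {
  _url="${1%%#*}"        # drop any #fragment
  _url="${_url%%\?*}"    # drop any ?query string
  _url="${_url%/}"       # drop one trailing slash
  case "$_url" in
    *.json) ;;                    # suffix already present, leave as-is
    *) _url="${_url}.json" ;;     # otherwise append it
  esac
  printf '%s\n' "$_url"
}

to_json_url "https://www.reddit.com/r/productivity/comments/1jf439v/email_newsletter_texttospeech_app/"
```

Stripping the trailing slash first matters because the suffix has to attach to the last path segment, as in the example URL further down.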
Examples
Given a Reddit URL, fetch and parse the JSON to extract:
- Post author, title, body, score, date
- Top comments with authors and content
How to Fetch
Append .json to any Reddit URL:
curl -s -H "User-Agent: agentbot/1.0" "https://www.reddit.com/r/SUBREDDIT/comments/ID/TITLE.json"
JSON Structure
The response is an array with 2 elements:
- [0].data.children[0].data - The post
- [1].data.children[] - The comments
Post Fields
- author - Username (without u/)
- title - Post title
- selftext - Post body (for text posts)
- score - Net score (upvotes minus downvotes)
- num_comments - Comment count
- created_utc - Unix timestamp
- subreddit - Subreddit name (without r/)
- permalink - Relative path to the post (prepend https://www.reddit.com)
- url - For link posts, the linked URL
Comment Fields
- author - Username
- body - Comment text
- score - Net score (upvotes minus downvotes)
- created_utc - Unix timestamp
- depth - Nesting level (0 = top-level)
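Not every entry in the comment listing is a comment: alongside entries of kind "t1" (comments), the listing can contain "more" placeholders that lack author and body. The sketch below shows one way to keep only real comments. It assumes jq is installed; the function name `top_comments` and the sample data are made up for illustration.

```shell
# Trimmed, made-up sample shaped like the two-element response.
sample='[{},{"data":{"children":[
  {"kind":"t1","data":{"author":"alice","body":"First","score":12,"depth":0}},
  {"kind":"more","data":{"count":3}},
  {"kind":"t1","data":{"author":"bob","body":"Second","score":4,"depth":0}}
]}}]'

# Hypothetical helper: read the response on stdin, print up to $1 comments.
top_comments() {
  jq -c --argjson n "$1" '
    [ .[1].data.children[]
      | select(.kind == "t1")   # skip "more" placeholders
      | .data
      | {author, body, score} ][:$n]
  '
}

printf '%s' "$sample" | top_comments 5
```

Filtering before slicing means the limit counts actual comments rather than raw listing entries.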
Example
Input URL: https://www.reddit.com/r/productivity/comments/1jf439v/email_newsletter_texttospeech_app/
curl -s -H "User-Agent: agentbot/1.0" \
  "https://www.reddit.com/r/productivity/comments/1jf439v/email_newsletter_texttospeech_app.json" \
  | jq '{
      author: .[0].data.children[0].data.author,
      title: .[0].data.children[0].data.title,
      body: .[0].data.children[0].data.selftext,
      score: .[0].data.children[0].data.score,
      num_comments: .[0].data.children[0].data.num_comments,
      created: .[0].data.children[0].data.created_utc,
      subreddit: .[0].data.children[0].data.subreddit,
      comments: [.[1].data.children[:5][] | .data | {author, body, score}]
    }'
Output Format
Return structured data:
## Reddit Post
**u/username** in r/subreddit (DATE, SCORE points, N comments)
> Post body text here...
### Top Comments
1. **u/commenter1** (SCORE points)
> Comment text...
2. **u/commenter2** (SCORE points)
> Comment text...
Date Conversion
Convert Unix timestamp to readable date:
date -u -d @1742409172 +"%a %b %d %Y" # Prints: Wed Mar 19 2025 (GNU date, UTC)
Or in jq:
jq '.created_utc | strftime("%Y-%m-%d")'
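The `-d @EPOCH` form is GNU-specific. A minimal sketch of a wrapper that falls back to the BSD/macOS `-r EPOCH` form is shown below. The name `epoch_to_date` is hypothetical, and the fallback assumes the first call exits non-zero on systems that do not support `-d @EPOCH`.

```shell
# Hypothetical helper: print a Unix timestamp as YYYY-MM-DD in UTC.
epoch_to_date() {
  _ts="${1%.*}"   # created_utc may arrive as a float such as 1742409172.0
  # Try the GNU form first, then the BSD/macOS form.
  date -u -d "@$_ts" +%Y-%m-%d 2>/dev/null || date -u -r "$_ts" +%Y-%m-%d
}

epoch_to_date 1742409172   # prints 2025-03-19
```

Using `-u` keeps the result independent of the local time zone, which matters for timestamps close to midnight.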
Rate Limiting
Reddit may rate-limit requests, typically with an HTTP 429 response. If you get errors:
- Add delays between requests
- Use a descriptive User-Agent
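One way to add delays is a retry wrapper that doubles the wait after each failure. This is a sketch under stated assumptions: the function name `retry`, the attempt count, and the base delay are illustrative choices, not values documented by Reddit.

```shell
# Hypothetical helper: run a command, retrying with a growing delay.
# Usage: retry MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry() {
  _max="$1"; _delay="$2"; shift 2
  _attempt=1
  while :; do
    "$@" && return 0                          # success: stop retrying
    [ "$_attempt" -ge "$_max" ] && return 1   # out of attempts: give up
    sleep "$_delay"
    _delay=$((_delay * 2))                    # double the wait each time
    _attempt=$((_attempt + 1))
  done
}

# Example (needs network). curl -f turns HTTP errors such as 429 into a
# non-zero exit status, which is what triggers the retry:
# retry 4 2 curl -sf -H "User-Agent: agentbot/1.0" \
#   "https://www.reddit.com/r/SUBREDDIT/comments/ID/TITLE.json"
```

With the values in the commented example, the waits between attempts would be 2, 4, and 8 seconds.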