Parse Hacker News

Extract structured data from Hacker News posts and comments using the Firebase API.

Process

Extract item ID - Parse ID from HN URL or use given ID
Fetch item data - Call Firebase API to get story/comment JSON
Parse main item - Extract title, author, score, text, timestamp
Fetch comments - Get top-level comments from kids array
Parse comments - Extract comment text, authors, timestamps
Format output - Structure data as readable text or JSON

Examples

Given an HN URL or item ID, fetch and parse the item to extract:

Post author, title, text, score, date
Top comments with authors and content

API Endpoints

Base URL: https://hacker-news.firebaseio.com/v0

# Get a story or comment by ID
curl -s "https://hacker-news.firebaseio.com/v0/item/ITEM_ID.json"

Extract Item ID from URL

HN URLs look like:

https://news.ycombinator.com/item?id=41824973

Extract the ID after id=.
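
A minimal sketch of this step as a POSIX shell function, assuming the input is either a bare numeric ID or a URL that carries the ID in an `id=` query parameter. The name `hn_item_id` is illustrative and not part of any existing tool.

```shell
# hn_item_id (hypothetical helper): print the numeric item ID for a
# Hacker News URL or a bare ID. Returns non-zero if no ID is found.
hn_item_id() {
  case $1 in
    ''|*[!0-9]*)
      # Not a bare ID: pull the digits that follow "?id=" or "&id=".
      # The trailing "grep ." makes the function fail when nothing matched.
      printf '%s\n' "$1" |
        sed -n 's/.*[?&]id=\([0-9][0-9]*\).*/\1/p' |
        grep .
      ;;
    *)
      # Digits only: treat the argument as the ID itself.
      printf '%s\n' "$1"
      ;;
  esac
}
```

For example, `hn_item_id "https://news.ycombinator.com/item?id=41824973"` and `hn_item_id 41824973` both print `41824973`.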

JSON Structure

Story Fields

by - Username
title - Post title
text - Post body, HTML encoded (for Ask HN, Show HN, etc.)
url - For link posts, the external URL
score - Points
descendants - Total comment count
time - Unix timestamp
kids - Array of comment IDs (top-level only)
type - One of “story”, “comment”, “job”, “poll”, or “pollopt”

Comment Fields

by - Username
text - Comment body (HTML encoded)
time - Unix timestamp
parent - Parent item ID
kids - Array of reply IDs

Example Commands

Fetch a story with basic info:

curl -s "https://hacker-news.firebaseio.com/v0/item/41824973.json" | jq '{
  author: .by,
  title: .title,
  text: .text,
  score: .score,
  comments: .descendants,
  time: .time,
  url: .url
}'

Fetch top 3 comments:

# First get the story to find comment IDs
# The "// []" fallback avoids a jq error when the story has no comments (kids is null)
KIDS=$(curl -s "https://hacker-news.firebaseio.com/v0/item/41824973.json" | jq -r '.kids[:3] // [] | .[]')

# Then fetch each comment
for kid in $KIDS; do
  curl -s "https://hacker-news.firebaseio.com/v0/item/$kid.json" | jq '{author: .by, text: .text}'
done

Fetch a story and its top 3 comments in one script:

ID=41824973
# Fetch the story once and reuse the JSON for both sections
STORY=$(curl -s "https://hacker-news.firebaseio.com/v0/item/$ID.json")
echo "=== STORY ==="
printf '%s\n' "$STORY" | jq '{author: .by, title, text, score, time}'
echo "=== COMMENTS ==="
for kid in $(printf '%s\n' "$STORY" | jq -r '.kids[:3] // [] | .[]'); do
  curl -s "https://hacker-news.firebaseio.com/v0/item/$kid.json" | jq '{author: .by, text}'
done

Output Format

## Hacker News Post

**username** (DATE, SCORE points, N comments)

Title: POST TITLE

> Post text here (if any)...

URL: https://external-link.com (if link post)

### Top Comments

1. **commenter1**
   > Comment text...

2. **commenter2**
   > Comment text...
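
The layout above could be produced with jq along these lines. This is a sketch under assumptions: the helper names `hn_format_story` and `hn_format_comment` are hypothetical, the date is printed in jq's ISO 8601 `todate` form, missing fields fall back to placeholders, and the text is emitted as returned by the API (still HTML encoded, with only the first line quoted).

```shell
# hn_format_story (hypothetical helper): read one story's JSON on stdin
# and print the post header, title, optional text, and optional URL.
hn_format_story() {
  jq -r '
    "## Hacker News Post",
    "",
    "**\(.by // "[deleted]")** (\(.time // 0 | todate), \(.score // 0) points, \(.descendants // 0) comments)",
    "",
    "Title: \(.title // "")",
    (if .text then "", "> \(.text)" else empty end),
    (if .url then "", "URL: \(.url)" else empty end)
  '
}

# hn_format_comment (hypothetical helper): read one comment's JSON on
# stdin and print it as a numbered entry; $1 is its position in the list.
hn_format_comment() {
  jq -r --arg n "$1" '
    "\($n). **\(.by // "[deleted]")**",
    "   > \(.text // "")"
  '
}
```

Both functions read from stdin, so they can sit at the end of a `curl -s ... |` pipeline in place of the inline jq filters.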

Date Conversion

# Convert Unix timestamp (GNU date; on macOS/BSD use: date -u -r 1728790511)
date -u -d @1728790511  # Returns: Sun Oct 13 03:35:11 UTC 2024
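
Because the flag for converting a timestamp differs between GNU and BSD `date`, a small wrapper can hide the difference. The sketch below assumes one of those two implementations is installed; `hn_date` is a hypothetical name, and the output is always in UTC so it does not depend on the local timezone.

```shell
# hn_date (hypothetical helper): format a Unix timestamp as UTC.
# Tries the GNU form (-d @SECONDS) first, then the BSD/macOS form (-r SECONDS).
hn_date() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC' 2>/dev/null ||
    date -u -r "$1" '+%Y-%m-%d %H:%M:%S UTC'
}

hn_date 1728790511   # prints 2024-10-13 03:35:11 UTC
```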

HTML Decoding

HN text contains HTML entities. Common ones:

&#x27; → '
&#x2F; → /
&quot; → "
&amp; → &
<p> → paragraph break
<i>...</i> → italics

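Use sed (GNU sed shown; \n in the replacement is a GNU extension) or handle in output. Decode &amp; last so that already-escaped entities are not decoded twice:

| sed 's/<p>/\n\n/g; s/<[^>]*>//g; s/&#x27;/'"'"'/g; s/&#x2F;/\//g; s/&quot;/"/g; s/&lt;/</g; s/&gt;/>/g; s/&amp;/\&/g'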
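
For reuse across commands, the same idea can be wrapped in a function. This is a sketch rather than a full HTML decoder: `hn_decode` is a hypothetical name, it handles only the entities listed above plus `&lt;` and `&gt;`, and it uses an escaped literal newline instead of `\n` so that it should also work with BSD sed.

```shell
# hn_decode (hypothetical helper): read HN item text on stdin and print
# plain text on stdout.
# Order matters: tags are stripped before entities are decoded, and
# &amp; is decoded last so "&amp;lt;" ends up as "&lt;" rather than "<".
hn_decode() {
  sed -e 's/<p>/\
\
/g' \
      -e 's/<[^>]*>//g' \
      -e "s/&#x27;/'/g" \
      -e 's/&#x2F;/\//g' \
      -e 's/&quot;/"/g' \
      -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&amp;/\&/g'
}
```

Usage: append `| jq -r '.text // ""' | hn_decode` to a `curl -s` call for a single item.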

Rate Limiting

The API documentation states that there is currently no rate limit, but:

Don’t hammer it with parallel requests
Add small delays if fetching many items
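
One way to follow both points is to fetch items sequentially with a pause between requests. The sketch below is illustrative: `HN_DELAY`, `hn_fetch`, and `hn_fetch_many` are hypothetical names, and the one-second default is an arbitrary choice, not a documented limit.

```shell
# Assumed names (illustrative only): HN_DELAY, hn_fetch, hn_fetch_many.
HN_API="https://hacker-news.firebaseio.com/v0"
HN_DELAY=${HN_DELAY:-1}   # seconds to pause between requests

# Fetch one item as JSON; --fail makes curl return non-zero on HTTP errors.
hn_fetch() {
  curl -s --fail "$HN_API/item/$1.json"
}

# Fetch the given IDs one at a time, sleeping after each request.
# Prints each item's JSON followed by a newline; failures go to stderr.
hn_fetch_many() {
  for id in "$@"; do
    if item=$(hn_fetch "$id"); then
      printf '%s\n' "$item"
    else
      printf 'warning: could not fetch item %s\n' "$id" >&2
    fi
    sleep "$HN_DELAY"
  done
}
```

Calling `hn_fetch_many $KIDS` would then replace a bare `for` loop over comment IDs. Fractional delays such as `HN_DELAY=0.2` work with GNU and BSD `sleep` but are not guaranteed by POSIX.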