Parse Hacker News

Extract structured data from Hacker News posts and comments using the Firebase API.

Process

  1. Extract item ID - Parse ID from HN URL or use given ID
  2. Fetch item data - Call Firebase API to get story/comment JSON
  3. Parse main item - Extract title, author, score, text, timestamp
  4. Fetch comments - Get top-level comments from kids array
  5. Parse comments - Extract comment text, authors, timestamps
  6. Format output - Structure data as readable text or JSON

Examples

Given a HN URL or item ID, fetch and parse to extract:

API Endpoints

Base URL: https://hacker-news.firebaseio.com/v0

# Get a story or comment by ID
curl -s "https://hacker-news.firebaseio.com/v0/item/ITEM_ID.json"

Extract Item ID from URL

HN URLs look like:

Extract the ID after id=.

JSON Structure

Story Fields

Comment Fields

Example Commands

Fetch a story with basic info:

curl -s "https://hacker-news.firebaseio.com/v0/item/41824973.json" | jq '{
  author: .by,
  title: .title,
  text: .text,
  score: .score,
  comments: .descendants,
  time: .time,
  url: .url
}'

Fetch top 3 comments:

# First get the story to find comment IDs
KIDS=$(curl -s "https://hacker-news.firebaseio.com/v0/item/41824973.json" | jq -r '.kids[:3][]')

# Then fetch each comment
for kid in $KIDS; do
  curl -s "https://hacker-news.firebaseio.com/v0/item/$kid.json" | jq '{author: .by, text: .text}'
done

One-liner to get story + comments:

ID=41824973
echo "=== STORY ===" && \
curl -s "https://hacker-news.firebaseio.com/v0/item/$ID.json" | jq '{author: .by, title, text, score, time}' && \
echo "=== COMMENTS ===" && \
for kid in $(curl -s "https://hacker-news.firebaseio.com/v0/item/$ID.json" | jq -r '.kids[:3] // [] | .[]'); do
  curl -s "https://hacker-news.firebaseio.com/v0/item/$kid.json" | jq '{author: .by, text}'
done

Output Format

## Hacker News Post

**username** (DATE, SCORE points, N comments)

Title: POST TITLE

> Post text here (if any)...

URL: https://external-link.com (if link post)

### Top Comments

1. **commenter1**
   > Comment text...

2. **commenter2**
   > Comment text...

Date Conversion

# Convert Unix timestamp
date -d @1728790511  # Returns: Sun Oct 13 2024

HTML Decoding

HN text contains HTML entities. Common ones:

Use sed or handle in output:

| sed 's/<p>/\n\n/g; s/<[^>]*>//g; s/&#x27;/'"'"'/g; s/&#x2F;/\//g'

Rate Limiting

The Firebase API is generous but: