#!/usr/bin/env agent

YouTube Transcript

Download transcripts (auto-generated captions) from YouTube videos and convert them to readable text files. Useful for building knowledge bases from video content.

Related Skills

Input

A YouTube video URL or video ID (the VIDEO_ID placeholder in the commands below).

Process

Step 1: Get yt-dlp

The system-installed yt-dlp is often too old (YouTube frequently breaks older versions). Download the latest release binary directly:

curl -sL https://github.com/yt-dlp/yt-dlp/releases/latest/download/yt-dlp -o /tmp/yt-dlp
chmod +x /tmp/yt-dlp
/tmp/yt-dlp --version

Step 2: Download subtitle data

YouTube blocks many unauthenticated requests with bot detection, so pass browser cookies. Also select the storyboard format (-f sb0) to avoid the format-selection error yt-dlp raises when no regular video formats are available; nothing is actually fetched besides subtitles because of --skip-download:

/tmp/yt-dlp \
  --cookies-from-browser firefox \
  --write-auto-sub \
  --sub-lang en \
  --sub-format json3 \
  --skip-download \
  -o "%(title)s [%(id)s]" \
  -f "sb0" \
  "https://www.youtube.com/watch?v=VIDEO_ID"

Key flags explained:

--cookies-from-browser firefox: reuse your Firefox session cookies so requests pass bot detection
--write-auto-sub: fetch YouTube's auto-generated captions
--sub-lang en: English captions only
--sub-format json3: timestamped JSON, which the conversion script below expects
--skip-download: fetch subtitles without the video
-f "sb0": select the storyboard format so format selection cannot fail; --skip-download ensures it is never downloaded

If cookies fail (no browser profile available), try again without --cookies-from-browser; this sometimes works for popular videos. If that also fails, you do need browser cookies.
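The cookie fallback can be scripted. A minimal sketch, assuming the binary from Step 1; fetch_subs is a hypothetical wrapper, not a yt-dlp feature (set YTDLP to override the binary path):

```shell
# fetch_subs URL: try with Firefox cookies first, retry without on failure.
fetch_subs() {
  url="$1"
  # Shared subtitle-only options (word-split intentionally below).
  common="--write-auto-sub --sub-lang en --sub-format json3 --skip-download"
  "${YTDLP:-/tmp/yt-dlp}" --cookies-from-browser firefox \
    $common -o "%(title)s [%(id)s]" -f sb0 "$url" \
  || "${YTDLP:-/tmp/yt-dlp}" \
    $common -o "%(title)s [%(id)s]" -f sb0 "$url"
}
```

Usage: fetch_subs "https://www.youtube.com/watch?v=VIDEO_ID". The function returns success if either attempt succeeds.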

Step 3: Convert json3 to readable text

The json3 files contain timestamped segments. Convert them with this Python script:

import json, os, re, glob

def format_ts(ms):
    s = ms / 1000.0
    h = int(s // 3600)
    m = int((s % 3600) // 60)
    sec = int(s % 60)
    return f'{h:02d}:{m:02d}:{sec:02d}' if h > 0 else f'{m:02d}:{sec:02d}'

for fp in sorted(glob.glob('*.json3')):
    with open(fp) as f:
        data = json.load(f)

    # Extract title and video ID from filename pattern "Title [videoId].en.json3"
    base = os.path.basename(fp)
    match = re.search(r'^(.*?)\s*\[([a-zA-Z0-9_-]+)\]\.en\.json3$', base)
    title = match.group(1).strip() if match else base
    vid_id = match.group(2) if match else ''

    entries = []
    for event in data.get('events', []):
        if 'segs' not in event:
            continue
        start = event.get('tStartMs', 0)
        text = ' '.join(
            seg.get('utf8', '').strip()
            for seg in event['segs']
            if seg.get('utf8', '').strip()
        )
        if text:
            entries.append((start, text))

    # Timestamped version (for line-number references)
    ts_name = fp.replace('.en.json3', '.ts.txt')
    with open(ts_name, 'w') as f:
        f.write(f'Title: {title}\n')
        f.write(f'URL: https://www.youtube.com/watch?v={vid_id}\n')
        f.write(f'{"="*60}\n\n')
        for ms, text in entries:
            f.write(f'[{format_ts(ms)}] {text}\n')

    # Plain text version (paragraphed, easy to read)
    plain_name = fp.replace('.en.json3', '.txt')
    with open(plain_name, 'w') as f:
        f.write(f'Title: {title}\n')
        f.write(f'URL: https://www.youtube.com/watch?v={vid_id}\n')
        f.write(f'{"="*60}\n\n')
        full = ' '.join(t for _, t in entries)
        full = re.sub(r'\s+', ' ', full).strip()
        words = full.split()
        line = []
        for w in words:
            line.append(w)
            if len(line) >= 80 and w.endswith(('.', '!', '?')):
                f.write(' '.join(line) + '\n\n')
                line = []
        if line:
            f.write(' '.join(line) + '\n')

    print(f'{title}: {len(entries)} segments, {os.path.getsize(plain_name):,} bytes')

Step 4: Rename files to short, typeable names

YouTube titles produce unwieldy filenames, often with Unicode characters. Rename them to short, descriptive prefixes:

mv 'Long YouTube Title [videoId].en.json3' 'short-name.en.json3'
mv 'Long YouTube Title [videoId].ts.txt' 'short-name.ts.txt'
mv 'Long YouTube Title [videoId].txt' 'short-name.txt'
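The three renames can be done in one go. A small sketch; rename_transcripts is a hypothetical helper, not a standard tool:

```shell
# rename_transcripts OLD NEW: move OLD.en.json3, OLD.ts.txt, OLD.txt to NEW.*.
rename_transcripts() {
  old="$1"; new="$2"
  for ext in en.json3 ts.txt txt; do
    if [ -f "$old.$ext" ]; then
      mv "$old.$ext" "$new.$ext"
    fi
  done
}

rename_transcripts 'Long YouTube Title [videoId]' 'short-name'
```

Files that are missing (e.g. if one conversion output was skipped) are silently ignored.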

Step 5: Clean up

Remove the yt-dlp binary if you downloaded it to /tmp:

rm /tmp/yt-dlp

Output Format

Three files per video:

File                 Purpose
short-name.txt       Plain-text transcript, paragraphed. Good for reading.
short-name.ts.txt    Timestamped, line-by-line. Good for line-number references in outlines.
short-name.en.json3  Raw YouTube subtitle data. Keep as the source of truth.

Batch Example

Download transcripts for 4 videos at once:

mkdir -p ~/output && cd ~/output

for vid in "VIDEO_ID_1" "VIDEO_ID_2" "VIDEO_ID_3" "VIDEO_ID_4"; do
  /tmp/yt-dlp \
    --cookies-from-browser firefox \
    --write-auto-sub --sub-lang en --sub-format json3 \
    --skip-download -o "%(title)s [%(id)s]" -f "sb0" \
    "https://www.youtube.com/watch?v=$vid"
done

python3 convert.py   # (the script from Step 3)

Edge Cases

Organizing Transcripts in ~/notes

After downloading, move transcripts into ~/notes/wiki/<topic>/ and create:

  1. A README.md in the subdirectory with a thematic outline referencing filename.ts.txt:LINE_NUMBER for each topic
  2. A wiki entry at ~/notes/wiki/Topic Name.md that links to the README with [[topic/README]] and connects to related notes
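The move-and-stub step can be sketched as a helper; organize_topic is hypothetical, and NOTES defaults to ~/notes (the wiki entry with its [[topic/README]] link is still written by hand):

```shell
# organize_topic TOPIC FILE...: move transcript files into
# $NOTES/wiki/TOPIC/ and stub a README.md for the thematic outline.
organize_topic() {
  topic="$1"; shift
  dir="${NOTES:-$HOME/notes}/wiki/$topic"
  mkdir -p "$dir"
  mv "$@" "$dir"/
  printf '%s\n' "# $topic" '' \
    'Outline: reference entries as name.ts.txt:LINE_NUMBER.' \
    > "$dir/README.md"
}

# Usage (example names):
#   organize_topic some-topic short-name.txt short-name.ts.txt short-name.en.json3
```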