How It Works

From paper to podcast in a few steps.

The Process

We turn influential machine learning and systems papers into podcast episodes of roughly 25 to 30 minutes. Each episode is a conversation between two hosts, Alex and Maya, who break complex technical concepts down into accessible explanations.

What Each Episode Covers

The core problem and motivation behind the research
Key technical contributions and innovations
Practical implications and real-world usage
Comparisons with alternative approaches
Best practices and recommendations

Episode Format

Each paper folder in our repository contains:

script.md: The full podcast transcript in Markdown format

podcast.m4a: The audio file, compatible with iPhone and Android

generate.py: The Python script used to generate the audio

README.md: Paper metadata and regeneration instructions
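
If you want to confirm that a paper folder is complete before sharing it, a small check along these lines may help. This is a minimal sketch: the four file names come from the list above, while the missing_files helper is a hypothetical name and not part of the repository.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is hypothetical.
EXPECTED_FILES = ("script.md", "podcast.m4a", "generate.py", "README.md")


def missing_files(paper_dir):
    """Return the expected file names that are absent from paper_dir."""
    folder = Path(paper_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

Calling missing_files("zhao-2023-pytorch-fsdp") from the repository root should return an empty list when all four files are present.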

Technology

Our podcasts are generated with ElevenLabs text-to-speech, using its turbo model for natural-sounding voices. Each host has a distinct voice, which gives the conversation an engaging dynamic.

Voice Configuration

Alex: Adam voice (deep male, British accent)
Maya: Rachel voice (clear female, American accent)
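
In code, the host-to-voice pairing above could be expressed as a simple lookup. The sketch below is illustrative only: the HOST_VOICES dictionary and the voice_for helper are hypothetical names, and the actual generate.py may organize this differently (for example, by storing voice IDs instead of voice names).

```python
# Host-to-voice mapping taken from the list above. The dictionary layout and
# the voice_for() helper are illustrative, not the project's actual code.
HOST_VOICES = {
    "ALEX": "Adam",
    "MAYA": "Rachel",
}


def voice_for(speaker):
    """Return the voice name for a host, ignoring case and surrounding spaces."""
    key = speaker.strip().upper()
    if key not in HOST_VOICES:
        raise ValueError(f"unknown speaker: {speaker!r}")
    return HOST_VOICES[key]
```

Raising an error for unknown speakers, rather than falling back to a default voice, makes a misspelled speaker prefix in a script show up immediately.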

Regenerating Episodes

Each episode can be regenerated with the generate.py script included in its folder. You'll need an ElevenLabs API key and a working ffmpeg installation.

# Set your API key
export ELEVENLABS_API_KEY="your-api-key"

# Install dependencies
pip install elevenlabs

# Generate the podcast
cd zhao-2023-pytorch-fsdp
python generate.py
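
Before running the steps above, it can save time to check that both prerequisites are in place. The following is a minimal sketch, assuming the key is read from the ELEVENLABS_API_KEY environment variable as in the commands above and that ffmpeg is expected on your PATH. The preflight function is a hypothetical helper, not part of generate.py.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # The API key must be set and non-empty.
    if not env.get("ELEVENLABS_API_KEY"):
        problems.append("ELEVENLABS_API_KEY is not set")
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling preflight() with no arguments inspects the current environment. The env and which parameters exist only so the check can be exercised without changing your real setup.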

Contributing

Want to add a paper? The script format is straightforward: check any existing script.md file for the expected structure. The key elements are:

Speaker lines prefixed with **ALEX:** or **MAYA:**
Section headers with timestamps (used for organization)
Natural, conversational language that explains technical concepts
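
To see how the elements above fit together, here is a rough sketch of a parser for the script format. It relies on the **ALEX:** and **MAYA:** prefixes described above, and it assumes that section headers are Markdown headings starting with "#", which this page does not confirm. The parse_script function is a hypothetical name, and the real generate.py may read scripts differently.

```python
import re

# Matches lines such as "**ALEX:** Welcome back." (prefix format described above).
SPEAKER_LINE = re.compile(r"^\*\*(ALEX|MAYA):\*\*\s*(.*)$")


def parse_script(text):
    """Split a script into (section, speaker, line) tuples.

    Assumption: section headers are Markdown headings starting with '#'.
    Any other non-empty line is appended to the previous turn when that turn
    belongs to the current section; otherwise it is skipped.
    """
    section = None
    turns = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Keep the heading text, including any timestamp it carries.
            section = line.lstrip("#").strip()
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append((section, match.group(1), match.group(2)))
        elif turns and turns[-1][0] == section:
            prev_section, speaker, spoken = turns[-1]
            turns[-1] = (prev_section, speaker, f"{spoken} {line}".strip())
    return turns
```

Each tuple keeps the section title alongside the speaker, so a new script can be checked for balanced turns or unlabeled lines before any audio is generated.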

Listening Tips

🎧

On iPhone

AirDrop the .m4a file to your phone or save it to the Files app, then open it with the built-in player or any podcast app.

📱

On Android

Download the .m4a file and open it with any media player. Most players support the format natively.

🚶

Best Experience

Episodes are designed for listening on walks. The conversational pace works well at 1x to 1.25x speed.