The Process
We take influential machine learning and systems papers and transform them into ~25-30 minute podcast episodes. Each episode features a conversational format with two hosts—Alex and Maya—who break down complex technical concepts into accessible explanations.
What Each Episode Covers
- The core problem and motivation behind the research
- Key technical contributions and innovations
- Practical implications and real-world usage
- Comparisons with alternative approaches
- Best practices and recommendations
Episode Format
Each paper folder in our repository contains:
- script.md: the full podcast transcript in markdown format
- podcast.m4a: the audio file, compatible with iPhone and Android
- generate.py: the Python script used to generate the audio
- README.md: paper metadata and regeneration instructions
Technology
Our podcasts are generated using ElevenLabs text-to-speech with their turbo model for realistic, natural-sounding voices. We use two distinct voices for the hosts to create an engaging conversational dynamic.
Voice Configuration
- Alex — Adam voice (deep male, British accent)
- Maya — Rachel voice (clear female, American accent)
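The README does not show generate.py itself. The Python below is a rough sketch of what its core loop could look like. It routes each speaker to a voice, synthesizes one clip per line through the ElevenLabs Python SDK, and builds an ffmpeg command that joins the clips into an AAC .m4a. Several parts are assumptions. The voice IDs are placeholders. The model ID `eleven_turbo_v2_5` is a guess at "turbo model". Stitching with ffmpeg's concat demuxer is inferred, not documented. `render_segments` and `concat_command` are hypothetical names.

```python
import os
from collections.abc import Callable, Iterable
from pathlib import Path

# Hypothetical placeholders: look up the real Adam and Rachel voice IDs in your
# ElevenLabs account and substitute them here.
VOICE_IDS = {
    "ALEX": "ADAM_VOICE_ID",
    "MAYA": "RACHEL_VOICE_ID",
}

# Assumption: the README only says "turbo model"; this exact model ID is a guess.
MODEL_ID = "eleven_turbo_v2_5"

Synthesizer = Callable[[str, str], bytes]  # (text, voice_id) -> MP3 bytes


def elevenlabs_synthesizer() -> Synthesizer:
    """Build a synthesizer backed by the ElevenLabs Python SDK (pip install elevenlabs)."""
    from elevenlabs.client import ElevenLabs  # imported lazily so tests need no SDK or key

    client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])

    def synthesize(text: str, voice_id: str) -> bytes:
        chunks = client.text_to_speech.convert(
            voice_id=voice_id,
            text=text,
            model_id=MODEL_ID,
            output_format="mp3_44100_128",
        )
        return b"".join(chunks)  # convert() streams the audio as byte chunks

    return synthesize


def render_segments(turns: Iterable[tuple[str, str]], out_dir: Path,
                    synthesize: Synthesizer) -> list[Path]:
    """Synthesize each (speaker, text) turn with that speaker's voice; return clip paths in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, (speaker, text) in enumerate(turns):
        # Zero-padded index keeps the clips sorted in playback order.
        path = out_dir / f"{i:04d}_{speaker.lower()}.mp3"
        path.write_bytes(synthesize(text, VOICE_IDS[speaker]))
        paths.append(path)
    return paths


def concat_command(clips: list[Path], list_file: Path, output: Path) -> list[str]:
    """Write an ffmpeg concat-demuxer list file and return the command that joins the clips."""
    lines = []
    for clip in clips:
        # Escape single quotes the way ffmpeg's quoted-string parser expects.
        escaped = str(clip.resolve()).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    list_file.write_text("\n".join(lines) + "\n")
    # Re-encode to AAC so the .m4a plays in stock iPhone and Android players.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file),
            "-c:a", "aac", "-b:a", "128k", str(output)]
```

Running the full pipeline would then be something like `subprocess.run(concat_command(clips, Path("list.txt"), Path("podcast.m4a")), check=True)`. The synthesizer is passed in as a function, so the API call stays swappable. The segment and stitching logic can then be exercised without network access or an API key.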
Regenerating Episodes
Each episode can be regenerated using the included Python script.
You'll need an ElevenLabs API key and ffmpeg installed.
```bash
# Set your API key
export ELEVENLABS_API_KEY="your-api-key"
# Install dependencies
pip install elevenlabs
# Generate the podcast
cd zhao-2023-pytorch-fsdp
python generate.py
```
Contributing
Want to add a paper? The script format is straightforward: check out any existing script.md file for the expected structure. The key elements are:
- Speaker lines prefixed with **ALEX:** or **MAYA:**
- Section headers with timestamps (used for organization)
- Natural, conversational language that explains technical concepts
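The Python below is a minimal sketch of reading that structure back out of a script.md file, using the same language as generate.py. Some details are assumptions for illustration. A header style like `## [00:00] Title` is assumed. An untagged line is assumed to continue the previous speaker's turn. `parse_script` and `Turn` are hypothetical names, not part of the repository.

```python
import re
from dataclasses import dataclass

# Matches speaker lines such as "**ALEX:** Welcome back." (bold tag, then text).
SPEAKER_LINE = re.compile(r"^\*\*(ALEX|MAYA):\*\*\s*(.*)$")


@dataclass
class Turn:
    speaker: str  # "ALEX" or "MAYA"
    text: str


def parse_script(markdown: str) -> list[Turn]:
    """Extract speaker turns from a script.md body, skipping headers and blank lines."""
    turns: list[Turn] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or timestamped section header
        match = SPEAKER_LINE.match(line)
        if match:
            turns.append(Turn(speaker=match.group(1), text=match.group(2).strip()))
        elif turns:
            # Assumption: an untagged line continues the previous speaker's turn.
            turns[-1].text = f"{turns[-1].text} {line}".strip()
    return turns


if __name__ == "__main__":
    sample = """## [00:00] Introduction
**ALEX:** Welcome back to the show.
**MAYA:** Today we're looking at FSDP.

## [02:30] Motivation
**ALEX:** Why shard model state at all?
"""
    for turn in parse_script(sample):
        print(f"{turn.speaker}: {turn.text}")
```

A check like this can catch a mistyped speaker tag before any audio is generated. A line such as `**Alex:**` would silently fold into the previous turn instead of switching voices.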
Listening Tips
- On iPhone: AirDrop the .m4a file or save it to the Files app, then open it with the built-in player or any podcast app.
- On other devices: download the .m4a file and open it with any media player. Most players support the format natively.
Episodes are designed for walks. The conversational pace works well at 1x-1.25x speed.