Dataset · The Bill Simmons Podcast

About The Footnote

The Footnote is a pundit accountability project. We transcribe podcast episodes, identify every entity mentioned, and surface patterns in what pundits talk about and when.

How it works

The Footnote is built in four steps:

  1. Transcribing each podcast episode
  2. Diarizing each episode (attributing a speaker to each chunk of text)
  3. Flagging all proper nouns
  4. Associating each proper noun with its appropriate “entity” (LeBron the “alias” → LeBron James the “entity”)

Note: We are currently counting only instances of the proper noun itself, not pronoun references. If the speaker says “Michael Jordan is the GOAT, he just is,” that counts as one instance of Michael Jordan — not two.
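
As a rough illustration of the counting rule above (not our production code), here is a minimal Python sketch. The alias table, the function name count_mentions, and the regex-based matching are all assumptions made for the example; the real pipeline flags proper nouns before resolving them.

```python
import re
from collections import Counter

# Hypothetical alias table: surface form -> canonical entity.
ALIASES = {
    "Michael Jordan": "Michael Jordan",
    "Jordan": "Michael Jordan",
    "LeBron James": "LeBron James",
    "LeBron": "LeBron James",
}

# Try longer aliases first so "Michael Jordan" is matched as one mention
# instead of a separate hit on "Jordan".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(ALIASES, key=len, reverse=True))
    + r")\b"
)


def count_mentions(text: str) -> Counter:
    """Count explicit proper-noun mentions per entity; pronouns are ignored."""
    return Counter(ALIASES[m.group(1)] for m in _PATTERN.finditer(text))


counts = count_mentions("Michael Jordan is the GOAT, he just is. LeBron is close.")
print(counts)
```

Because only the alias strings are matched, the "he" in the example sentence adds nothing to the Michael Jordan count.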

Known weaknesses

  1. Alias → Entity associations
    • Our default entity resolver handles full proper nouns first, then makes rulings on substrings. For example, “Jamal Murray” will always resolve to the entity Jamal Murray, but “Murray” alone could resolve to Jamal Murray, Dejounte Murray, Andy Murray, etc. A default rule handles these ambiguous cases, but it leaves major gaps in entity resolution. We have begun writing per-alias resolution rules, but have finished them for only 32 entities.
    • Our long-term goal is to add per-alias rules for all aliases that have created collisions.
  2. Diarization
    • Our diarization currently has two steps. First, our transcriber diarizes the text as it transcribes, outputting “Speaker 1 / Speaker 2 / etc.” Then we supply a few chunks of text from each speaker, along with the speaker list for the episode, to an LLM and ask it to infer the likely name behind each speaker label.
    • This leaves a couple of gaps:
      • Our transcriber sometimes creates speaker changes at incorrect places, breaking a single speaker into two or combining two different speakers into one.
      • Our LLM speaker-assigner could be wrong either because it made a mistake or because it received incorrectly diarized speech in the first place.
    • Our long-term goal is to add a diarization audit into our workflow.
  3. Ads
    • Ads are not yet separated out, so certain entities are inflated (The Ringer, for instance).
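
To make the alias collision problem from item 1 concrete, here is a minimal Python sketch of a resolver with a per-alias rule. Everything in it is hypothetical: the entity list, the keyword-based rule for “Murray,” and especially the default behavior (resolve a substring only when exactly one known entity contains it). That default is an assumption chosen for the example and may differ from what our resolver actually does.

```python
from typing import Callable, Optional

# Hypothetical entity list for illustration.
ENTITIES = ["Jamal Murray", "Dejounte Murray", "Andy Murray"]


def resolve_murray(context: str) -> Optional[str]:
    """Hypothetical per-alias rule: use nearby words to disambiguate "Murray"."""
    lowered = context.lower()
    if "nuggets" in lowered:
        return "Jamal Murray"
    if "wimbledon" in lowered or "tennis" in lowered:
        return "Andy Murray"
    return None  # the rule declines to rule


# Per-alias rules, keyed by the ambiguous alias.
ALIAS_RULES: dict[str, Callable[[str], Optional[str]]] = {"Murray": resolve_murray}


def resolve(alias: str, context: str = "") -> Optional[str]:
    # 1. A full proper noun always resolves to itself.
    if alias in ENTITIES:
        return alias
    # 2. A per-alias rule, if one exists, gets the first ruling.
    rule = ALIAS_RULES.get(alias)
    if rule is not None:
        ruled = rule(context)
        if ruled is not None:
            return ruled
    # 3. Assumed default: accept a substring only when it is unambiguous.
    candidates = [e for e in ENTITIES if alias in e.split()]
    return candidates[0] if len(candidates) == 1 else None


print(resolve("Murray", "the Nuggets need Murray to score"))
print(resolve("Murray"))
```

With no context the second call returns None rather than guessing, which is one way to keep unresolved collisions visible until a per-alias rule covers them.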

Currently tracking

The Bill Simmons Podcast — the first ever podcast, probably. A reliable source of NBA, NFL, and broader cultural takes, predictions, and reflections. A consistently good hang for 4–6 hours per week.

thefootnote.fm · v0.1 · data updated April 10, 2026