ELO Ratings Questions (52 Weeks of Cloud podcast)

Key Argument

Thesis: Using ELO for AI agent evaluation = measuring noise
Problem: Wrong evaluators, wrong metrics, wrong assumptions
Solution: Quantitative assessment frameworks

The Comparison (00:00-02:00)

Chess ELO

FIDE arbiters: 120hr training
Binary outcome: win/loss
Test-retest: r=0.95
Cohen's κ=0.92

AI Agent ELO

Random users: Google engineer? CS student? 10-year-old?
Undefined dimensions: accuracy? style? speed?
Test-retest: r=0.31 (coin flip)
Cohen's κ=0.42
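The κ figures in both lists are Cohen's kappa: agreement between two raters, corrected for the agreement expected by chance. A minimal sketch of the calculation, assuming two raters and categorical votes. The function name and the vote lists are hypothetical and are not data from the episode.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items where both raters picked the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical votes: which agent (A or B) each rater preferred on 10 prompts.
votes_1 = ["A", "A", "B", "A", "B", "B", "A", "A", "B", "A"]
votes_2 = ["A", "B", "B", "A", "B", "A", "A", "A", "B", "B"]

kappa = cohens_kappa(votes_1, votes_2)
print(f"kappa = {kappa:.2f}")
```

Here the raters agree on 7 of 10 prompts, but chance alone would produce agreement on 5 of 10, so kappa lands near 0.4 rather than 0.7.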

Cognitive Bias Cascade (02:00-03:30)

Anchoring: 34% rating variance in first 3 seconds
Confirmation: 78% selective attention to preferred features
Dunning-Kruger: d=1.24 effect size
Result: Circular preferences (A>B>C>A)
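A circular preference is a problem for ELO specifically, because a single rating per agent can only express a linear ordering. The sketch below uses the standard Elo expected-score and update formulas; the K-factor of 32, the starting rating of 1500, and the agent names are illustrative assumptions.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_winner, r_loser, k=32.0):
    """Return the new (winner, loser) ratings after one decisive game."""
    gain = k * (1.0 - expected_score(r_winner, r_loser))
    return r_winner + gain, r_loser - gain


# Hypothetical agents whose votes form the cycle A > B > C > A.
ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
cycle = [("A", "B"), ("B", "C"), ("C", "A")]  # (winner, loser) pairs

for _ in range(200):
    for winner, loser in cycle:
        ratings[winner], ratings[loser] = elo_update(
            ratings[winner], ratings[loser]
        )

spread = max(ratings.values()) - min(ratings.values())
print({name: round(r, 1) for name, r in ratings.items()})
print(f"spread after 600 votes: {spread:.1f} points")
```

After 600 votes the three ratings are still bunched around 1500. The order they end up in depends on which vote came last, not on any difference in quality.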

The Quantitative Alternative (03:30-05:00)

Objective Metrics

McCabe complexity ≤20
Test coverage ≥80%
Big O notation comparison
Self-admitted technical debt
Reliability: r=0.91 (objective metrics) vs r=0.42 (preference ratings)
Effect size: d=2.18
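The McCabe threshold above can be checked automatically. The following is a simplified sketch using Python's standard `ast` module, not the method used by TDG or PMAT: it counts one path plus one per decision point. The set of node types counted as decision points is an assumption, and real tools differ on the details. The helper names and sample function are hypothetical.

```python
import ast

# Node types counted as decision points in this simplified sketch (an assumption).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.AsyncFor,
                 ast.ExceptHandler, ast.IfExp, ast.Assert)


def cyclomatic_complexity(func_node):
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            complexity += len(node.values) - 1
    return complexity


def check_source(source, max_complexity=20):
    """Map each function name to (complexity, passes_threshold)."""
    report = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            report[node.name] = (score, score <= max_complexity)
    return report


SAMPLE = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    for i in range(n):
        if i % 2 == 0:
            continue
    return "done"
'''

report = check_source(SAMPLE)
print(report)
```

The same input always yields the same score, which is the property the preference votes lack.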

Dream Scenario vs Reality (05:00-06:00)

Dream

World's best engineers
Annotated metrics
Standardized criteria

Reality

Random internet users
No expertise verification
Subjective preferences

Key Statistics

Metric                    Chess      AI Agents
Inter-rater reliability   κ=0.92     κ=0.42
Test-retest               r=0.95     r=0.31
Temporal drift            ±10 pts    ±150 pts
Hurst exponent            0.89       0.31
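The test-retest row is a Pearson correlation between two rating sessions over the same items. A minimal sketch, assuming hypothetical ratings for five agents measured twice; the numbers are illustrative and are not the data behind the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings of the same five agents in two separate sessions.
session_1 = [1500, 1520, 1480, 1610, 1450]
session_2 = [1510, 1515, 1490, 1600, 1445]

r = pearson_r(session_1, session_2)
print(f"test-retest r = {r:.2f}")
```

A stable instrument produces nearly the same ordering and spacing on the second pass, so r stays close to 1. A value near 0.31 means the second session barely predicts the first.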

Takeaways

Stop: Using preference votes as quality metrics
Start: Automated complexity analysis
ROI: 4.7 months to break even

Citations Mentioned

Kapoor et al. (2025): "AI agents that matter" - κ=0.42 finding
Santos et al. (2022): Technical Debt Grading validation
Regan & Haworth (2011): Chess arbiter reliability κ=0.92
Chapman & Johnson (2002): 34% anchoring effect

Quotable Moments

"You can't rate chess with basketball fans"

"0.31 reliability? That's a coin flip with extra steps"

"Every preference vote is a data crime"

"The psychometrics are screaming"

Resources

Technical Debt Grading (TDG) Framework
PMAT (Pragmatic AI Labs MCP Agent Toolkit)
McCabe Complexity Calculator
Cohen's Kappa Calculator

