Best LLM Evaluation Podcasts (2025)

Claude Skills: How to build Custom Agentic Abilities for beginners 44:29

1h ago

Capabilities? Through the roof. Usage? Ground floor. Claude Agent Skills might be one of the most useful features of any front-end LLM. Yet... it's crickets in terms of chat around it. For this 'AI at Work on Wednesday' episode, we're breaking it down for beginners and will have you spinning up your own Claude Agent Skills in no time. Claude Skills…

9th & 10th December - AI News Daily - OpenAI, Google, Microsoft Unite to Launch Agentic AI Foundation 17:46

13h ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki AI News Daily — Dec 10, 2025 Summary Top Highlights: Major AI companies (OpenAI, Anthropic, Micro…

Season 6, Episode 21: Does GenAI outperform humans with ad creative production? (with Anindya Ghose and Vilma Todri) 36:11

16h ago

In this episode of the podcast, I speak with Anindya Ghose from NYU and Vilma Todri from Emory University about their recent paper, The Impact of Visual Generative AI on Advertising Effectiveness, which is available in pre-print. In the paper, Anindya, Vilma, and the other authors assess the performance efficacy of three types of ad creative: Creat…

Why Vision Language Models Ignore What They See with Munawar Hayat - #758 57:40

20h ago

In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the persistent challenge of object hallucination in Vision-Language Models (VLMs), why models often discard visual information in favor of pre-trained lang…

OpenAI's Code Red: Is Google taking ChatGPT's Crown? 45:37

23h ago

OpenAI is (reportedly) in full panic mode. 🚨 All hands on deck, Code Red status. So.... what happened? How did OpenAI go from defining the AI category to getting beat by competitors they once trounced? And, is it too late for them to turn it around? Or will Google permanently take the AI crown? Tune in... we've got hot takes. OpenAI's Code Red: Is …

OpenAI’s Code Red, Google’s Deep new model, Perplexity facing big lawsuit and more 34:45

24h ago

OpenAI has launched a code red. 🚨 After increasing pressure from Google, OpenAI is reportedly in ‘all hands on deck’ mode to reclaim the LLM crown. Meanwhile, Google quietly released an EVEN MORE powerful version of Gemini 3 that hardly anyone noticed. And Perplexity? They got hit with a massive lawsuit. You take one week off of AI, and you could b…

7th & 8th December - AI News Daily - OpenAI Accelerates GPT-5.2 Launch as NeurIPS Spotlights Evaluation Rigor 11:57

3d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: NeurIPS 2025 emphasized attention limits, compositional generalization, and rigor…

5th, 6th December - AI News Daily - Google Gemini 3 Deep Think Reshapes AI Reasoning as OpenAI Accelerates GPT-5.2 16:51

5d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Google Gemini 3 with Deep Think strengthens reasoning; OpenAI pushes GPT-5.2 amid…

Aligning AI With Climate And Business Goals 28:30

4d ago

How can you scale AI at the enterprise, yet still hit your climate goals? And can heavy AI usage and an enterprise's ESG mission co-exist? Ashutosh Ahuja lays it out for us. Aligning AI With Climate And Business Goals -- An Everyday AI Chat with Jordan Wilson and Ashutosh Ahuja Newsletter: Sign up for our free daily newsletter More on this Episode:…

#750: re:Invent 2025 - Day 3 Wrapup 22:42

4d ago

It is the end of re:Invent! Simon and Jillian share some updates and also take a moment to reflect on 2025. By Amazon Web Services

The Future of AI Agents: Will there be more Agents than humans? 32:41

5d ago

You're probably using AI agents without even knowing it. 🤯 Crazier yet? It's very possible that there may already be more AI agent instances than humans in the world. Was that a bold claim we made a year ago? Yeah. But did Cloudflare's Tech Lead of AI Agents agree? Also, yeah. (See, we're not that crazy.) So, what do you need to know about the futu…

3rd & 4th December - AI News Daily - Google Launches No-Code Gemini Agents as OpenAI Reshapes ML Infrastructure 12:45

7d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: OpenAI acquired Neptune.ai for ML workflow optimization; Google launched Workspac…

#749: re:Invent 2025 - Swami Sivasubramanian Keynote 27:07

2d ago

Simon and Jillian catch you up on the highlights from today's keynote PLUS all the "pre:Invent" announcements that took place prior to the event! By Amazon Web Services

Beginner’s Guide: How to visualize data with AI in ChatGPT, Gemini and Claude 42:41

6d ago

FYI -- Today's LinkedIn livestream broke, so you can access the custom instructions here. This is Vibe Coding 001. Have you ever wanted to build your own software or apps that can just kinda do your work for you inside of the LLM you use but don't know where to start? Start here. We're giving it all away and making it as simple as possible, while a…

#748: re:Invent 2025 - Matt Garman Keynote 39:14

7d ago

In this episode, Matt Garman's 2025 re:Invent keynote unveils exciting AI advancements, including Amazon Nova 2 Lite, a cost-effective reasoning model, and Amazon Nova 2 Sonic, a new speech-to-text model. The keynote also covers Security, Storage, Compute, Networking, and a whole lot more! By Amazon Web Services

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 48:44

8d ago

In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gim…

Season 6, Episode 20: Are chatbots driving eCommerce sales? (with Rishabh Jain) 55:45

8d ago

My guest on this episode of the podcast is Rishabh Jain, the CEO and co-founder of FERMÀT Commerce, an eCommerce advertising optimization platform. Rishabh most recently joined the podcast in June for an episode of the MDM Mailbag. In this episode, Rishabh and I discuss the impact of chatbot discovery on eCommerce sales, including over Black Friday…

3 AI Lies Most People Believed In 2025 (But You Shouldn’t) 40:21

7d ago

You've been lied to about AI. 🤥 A lot. So on today's Hot Take Tuesday episode, we're breaking down 3 of the most viral AI half-truths of 2025 and setting the record straight. Did Anthropic overtake OpenAI? Do 95% of AI pilots fail? Is half of the internet AI slop? Tune in LIVE and find out. 3 AI Lies most people believed in 2025 (but you shouldn’t)…

2nd December - AI News Daily - DeepSeek V3.2 Achieves Medal-Level AI Performance at Fraction of Cost 17:14

9d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: DeepSeek V3.2 achieves medal-level math/coding at lower cost. Hugging Face launch…

Claude Opus 4.5 drops, U.S. government makes bold AI move and ChatGPT ads landing soon? And More AI News that Matters 34:17

8d ago

Claude Opus 4.5 has entered the Chat. 🗣️ A week after OpenAI, Grok and Google released their most powerful AI models to date, Anthropic joined the party with their major drop in Claude Opus 4.5. But that probably wasn't even the biggest AI news of the week. That's because OpenAI isn't just building AI hardware that can hear/know everything, they're…

29th & 30th November - AI News Daily - Google Unveils Nested Learning as OpenAI Pivots to Ads Amid Debt Crisis 15:01

11d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Google unveiled Nested Learning and 2.3kW TPU Rubin; Gemini 3 sees surging adopti…

28th November - AI News Daily - OpenAI Confirms Data Breach, Severs Vendor Ties Amid Security Overhaul 17:07

13d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Google launched Gemini 3, advancing agentic automation and multimodal reasoning. …

26th & 27th November - AI News Daily - Google Gemini 3 and Anthropic Slash Prices, Igniting Fierce AI Model War 16:18

14d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Google Gemini 3 and Anthropic Claude Opus 4.5 launch major upgrades with price cu…

Ep 662: Opus 4.5: New king of the AI hill or just a niche model for coders? 48:09

13d ago

"... best model in the world..." 🤔 Wait, again? Days after Gemini 3 Pro splashed on the scene, Anthropic snuck in a low-key drop in Claude Opus 4.5. And Anthropic pulled no punches, calling its new model the "best model in the world for coding, agents and computer use." So, should you be hot swapping your Gemini or ChatGPT use out for the new Opus 4…

Season 6, Episode 19: Can the GDPR be reformed? (with Mikołaj Barczentewicz) 1:03:50

15d ago

My guest on this episode of the podcast is Mikołaj Barczentewicz, a professor of law at the University of Surrey and the author of EU Tech Reg, a blog dedicated to following developments in the EU regulatory machinery. In this episode, Mikołaj and I discuss the digital omnibus package that was recently proposed by the European Commission and which …

Ep 661: Out of the Shadows: How to Manage AI Sprawl 31:33

15d ago

Even if you've banned AI, your employees are 100% using it. 🥵 To make matters worse? Even if you've approved a certain AI system, your teams are probably using whatever they want. And those choices are likely putting your enterprise data at risk. So, how do you reel in and manage the AI sprawl? Kevin Kiley, the CEO of Airia, is laying out the playbook.…

24th, 25th November - AI News Daily - Anthropic's Claude Opus 4.5 Tops Coding Leaderboard, Launches First Image Model 18:14

15d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Major Model Releases: Google launched Gemini 3 Pro with deepfake detection via SynthID and superi…

TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture 23:44

15d ago

We dive into the latest paper from Google and a team of academic researchers: "TUMIX: Multi-Agent Test-Time Scaling with Tool-Use Mixture." One of the paper's authors, Research Scientist Yongchao Chen, walks through the research and its implications. The paper proposes Tool-Use Mixture (TUMIX), an ensemble framework that runs multiple …

Gemini 3 impresses, GPT-5.1 Pro rolls out, Microsoft brings 365 Agent updates and more AI News That Matters 43:44

16d ago

Wildest week in AI since December 2024. 🤯 ↳ Gemini 3 is out and it's REALLY good. ↳ GPT-5.1 Pro might end up being better. (Even though no one is talking about it) ↳ Microsoft is releasing agents where people will actually use them. ↳ Nano Banana Pro will probably be more impactful than Gemini 3 (as banana as that sounds). Whew. What a week in AI. Don…

#747: Unpacking Automated Reasoning: From Mathematical Logic to Practical AI Security 38:02

15d ago

Discover how AWS leverages automated reasoning to enhance AI safety, trustworthiness, and decision-making. Byron Cook (Vice President and Distinguished Scientist) explains the evolution of reasoning tools from limited, PhD-driven solutions to scalable, user-friendly systems embedded in everyday business operations. He highlights real-world examples…

23rd November - AI News Daily - Google's Gemini 3 Pro Shatters Benchmarks, Elevates Alphabet Stock Surge 13:42

17d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Headlines: California approves Waymo's statewide driverless rides (San Diego mid-2026); UK la…

AI Agents in your browser: Work Cheat Code or too Risky? 31:23

19d ago

Yeah, agentic browsers can do your work for you. 💅 But..... should they? How do we tip-toe the fine line between the upside productivity of agentic browsers and the potential security nightmares they bring with them? Tune in and let's chat about it. AI Agents in your browser Work Cheat Code or too Risky? An Everyday AI Chat with Jordan Wilson and A…

21st November - AI News Daily - Google Gemini 3 and OpenAI GPT-5.1 Intensify AI Supremacy Battle 18:51

20d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Major Model Releases: Google launched Gemini 3 Pro with TPU acceleration, topping coding benchmar…

5 Simple AI Strategies to Supercharge Your Workflow with Google 33:47

20d ago

Richard Seroter is a Chief Evangelist at Google. 📢 So it’s LITERALLY his job to help people use Google’s AI products. So with him joining the Everyday AI show, you KNOW he’s gonna be dropping some time-saving and business building strategies. And a bit of future of work knowledge along the way. This is one you DO NOT wanna miss. 5 Simple AI Strate…

20th November - AI News Daily - Robotics Hits Production: BMW Robots Surpass 90,000 Part Loads 16:03

21d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Google Gemini 3 launches with advanced reasoning and multimodality, intensifying …

Gemini 3 Deep Dive and 3 Upgraded use cases Anyone can use 39:15

21d ago

You ever see a new AI model drop and be like.... it's so good OMG how do I use it? 🤔 Same. And yeah.... Gemini 3 is THAT good. So if you're wondering what's new, why it matters and how to use it, this episode is for you. AI at Work on Wednesdays: let's get it with the world's most powerful model in Gemini 3. Gemini 3 Deep Dive and 3 Upgraded Use Cas…

19th November - AI News Daily - Google Unleashes Gemini 3 as Microsoft-NVIDIA-Anthropic Alliance Reshapes AI Landscape 15:05

22d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Google launched Gemini 3 across Search, apps, and dev tools with rapid deployment…

Proactive Agents for the Web with Devi Parikh - #756 56:04

22d ago

Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of creating reliable web agents, the advantages of visually-grounded models that operate on screenshots rather than the browser’s mor…

Season 6, Episode 18: Meta's Business AI and the ad-product divide 44:37

22d ago

My guest on this episode of the podcast is Simon Whitcombe, the Vice President, Global Business Group at Meta. We discuss Meta's Business AI and the Meta AI business assistant, both of which were announced ahead of this year's AdWeek. I unpacked the potential of these tools to help advertisers cross the "ad-product divide" in Can Meta cross the ad-…

Ep 656: Inside Gemini 3: What’s new and what it unlocks for your business with Google's Logan Kilpatrick 17:53

22d ago

Gemini 3 is officially here. ✨ ✨ ✨ For about 8 months, Gemini 2.5 Pro has mostly maintained its standing as the top LLM in the world, yet Google just unleashed its successor in Gemini 3.0. So, what's new in Gemini 3? And whether you're a developer or casual user, what does Google's new model unlock? Join us as we chat with Google's Logan Kilpatrick'…

Data Processing Evolved: OpenLineage with Willy Lulciuc 32:31

22d ago

Willy Lulciuc (@wslulciuc) is a pioneer in data engineering and one of the creators of OpenLineage, the open-source framework for data lineage collection and analysis. It enables consistent collection of lineage metadata, giving engineers a better perspective on how data is produced and used, so they can better solve complex problems. Join us to le…

17th & 18th November - AI News Daily - Cloudflare Acquires Replicate, Brings 50,000+ AI Models to the Edge 13:53

23d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Headlines (Nov 18): xAI's Grok 4.1 leads arena leaderboards with record Elo and MoE transpare…

Gemini 3 close to release, OpenAI drops GPT-5.1, Bezos to lead AI startup & more AI News 42:50

23d ago

Buckle up AI world. OpenAI released a new model, and apparently they’re not done. Google is reportedly dropping Gemini 3 in hours. Jeff Bezos is going back hands-on building a new AI company. And that’s just the tip of the AI iceberg this week. Don’t get drowned out in the noise. On Monday, we cut it straight with the AI news that matters. Gemini 3…

#746: AWS Regional Planning Tool, MCP Proxy for AWS, and Lots More!!! 30:01

22d ago

There are so many updates this week you might need two cups of coffee! Simon and Jillian guide your way. By Amazon Web Services

16th November - AI News Daily - Google's Gemini 3 Imminent as AMD-OpenAI Deal Reshapes AI Chip Wars 13:26

25d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Google's Gemini 3 is imminent, beating coding benchmarks to challenge ChatGPT. AM…

15th November - AI News Daily - Google Gemini Surges to 13.7% as ChatGPT Grip Weakens 13:30

25d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: Anthropic stopped the first autonomous, state-linked AI cyber-espionage campaign.…

Using AI to turn Conversations into Revenue: A leader’s guide 33:55

26d ago

Everyone knows AI needs your data to truly work. But, what about your company's reasoning? 🤔 Buried beneath the modes and models, features and agents is something so fundamental that we almost always overlook it: the friggin gold that is your company's conversations. It's your expertise. Your secret sauce. Your decision making. Your competitive adv…

14th November - AI News Daily - OpenAI Rolls Out GPT-5.1; Cursor Surpasses $1B ARR at $29.3B Valuation 12:16

27d ago

Send us a text 🌍 INAI • The Open AI Hub The Intelligence Atlas → the world’s most comprehensive, open hub of AI knowledge. 2 Million+ tools, models, agents, tutorials & daily news—free for all, updated every day. https://github.com/inai-sandy/inAI-wiki Top Highlights: DeepMind's SIMA 2 achieves human-level performance in unseen 3D environments thro…

How Brands Can Prepare for the Post-Human Web 31:38

27d ago

What happens when the web is all bots and AI? 🤖 And more importantly, what happens to your company's online presence when AI search completely takes over? Big questions. So we're bringing in the big gun for the answers. Michael Walrath is the Chairman and CEO of Yext Inc, a global leader in brand management and search experience. Michael will dish the…

AI Orchestration for Smart Cities and the Enterprise with Robin Braun and Luke Norris - #755 54:46

28d ago

Today, we're joined by Robin Braun, VP of AI business development for hybrid cloud at HPE, and Luke Norris, co-founder and CEO of Kamiwaza, to discuss how AI systems can be used to automate complex workflows and unlock value from legacy enterprise data. Robin and Luke detail high-impact use cases from HPE and Kamiwaza’s collaboration on an “Agentic…