Content provided by Matt Turck.

We’re told that AI progress is slowing down, that pre-training has hit a wall, that scaling laws are running out of road. Yet we’re releasing this episode in the middle of a wild couple of weeks that saw GPT-5.1, GPT-5.1 Codex Max, fresh reasoning modes and long-running agents ship from OpenAI — on top of a flood of new frontier models elsewhere. To make sense of what’s actually happening at the edge of the field, I sat down with someone who has literally helped define both of the major AI paradigms of our time.

Łukasz Kaiser is one of the co-authors of “Attention Is All You Need,” the paper that introduced the Transformer architecture behind modern LLMs, and is now a leading research scientist at OpenAI working on reasoning models like those behind GPT-5.1. In this conversation, he explains why AI progress still looks like a smooth exponential curve from inside the labs, why pre-training is very much alive even as reinforcement-learning-based reasoning models take over the spotlight, how chain-of-thought actually works under the hood, and what it really means to “train the thinking process” with RL on verifiable domains like math, code and science. We talk about the messy reality of low-hanging fruit in engineering and data, the economics of GPUs and distillation, interpretability work on circuits and sparsity, and why the best frontier models can still be stumped by a logic puzzle from his five-year-old’s math book.

We also go deep into Łukasz’s personal journey — from logic and games in Poland and France, to Ray Kurzweil’s team, Google Brain and the inside story of the Transformer, to joining OpenAI and helping drive the shift from chatbots to genuine reasoning engines. Along the way we cover GPT-4 → GPT-5 → GPT-5.1, post-training and tone, GPT-5.1 Codex Max and long-running coding agents with compaction, alternative architectures beyond Transformers, whether foundation models will “eat” most agents and applications, what the translation industry can teach us about trust and human-in-the-loop, and why he thinks generalization, multimodal reasoning and robots in the home are where some of the most interesting challenges still lie.

OpenAI

Website - https://openai.com

X/Twitter - https://x.com/OpenAI

Łukasz Kaiser

LinkedIn - https://www.linkedin.com/in/lukaszkaiser/

X/Twitter - https://x.com/lukaszkaiser

FIRSTMARK

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

Blog - https://mattturck.com

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) – Cold open and intro

(01:29) – “AI slowdown” vs a wild week of new frontier models

(08:03) – Low-hanging fruit: infra, RL training and better data

(11:39) – What is a reasoning model, in plain language?

(17:02) – Chain-of-thought and training the thinking process with RL

(21:39) – Łukasz’s path: from logic and France to Google and Kurzweil

(24:20) – Inside the Transformer story and what “attention” really means

(28:42) – From Google Brain to OpenAI: culture, scale and GPUs

(32:49) – What’s next for pre-training, GPUs and distillation

(37:29) – Can we still understand these models? Circuits, sparsity and black boxes

(39:42) – GPT-4 → GPT-5 → GPT-5.1: what actually changed

(42:40) – Post-training, safety and teaching GPT-5.1 different tones

(46:16) – How long should GPT-5.1 think? Reasoning tokens and jagged abilities

(47:43) – The five-year-old’s dot puzzle that still breaks frontier models

(52:22) – Generalization, child-like learning and whether reasoning is enough

(53:48) – Beyond Transformers: ARC, LeCun’s ideas and multimodal bottlenecks

(56:10) – GPT-5.1 Codex Max, long-running agents and compaction

(1:00:06) – Will foundation models eat most apps? The translation analogy and trust

(1:02:34) – What still needs to be solved, and where AI might go next
