LLM Inference Speed (Tech Deep Dive)
Content provided by Daniel Reid Cahn.
In this tech talk, we dive deep into the technical specifics of LLM inference.
The big questions: Why are LLMs slow? How can they be made faster? And how might slow inference affect UX in the next generation of AI-powered software?
We jump into:
- Is fast model inference the real moat for LLM companies?
- What are the implications of slow model inference on the future of decentralized and edge model inference?
- As demand rises, what will the latency/throughput tradeoff look like?
- What innovations on the horizon might massively speed up model inference?
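For a rough sense of why decoding is slow, here is a common back-of-envelope argument (our illustration, not a calculation from the episode): generating each token requires streaming the full set of model weights from GPU memory, so single-request decode speed is capped by memory bandwidth rather than compute, and batching is what trades latency for throughput. A minimal sketch, assuming a hypothetical 7B-parameter fp16 model and roughly 2 TB/s of HBM bandwidth:

```python
# Back-of-envelope: single-stream decode speed is bounded by memory bandwidth,
# because every generated token must stream all model weights from HBM.
# All numbers below are illustrative assumptions, not measurements.

PARAMS = 7e9          # 7B-parameter model (assumption)
BYTES_PER_PARAM = 2   # fp16/bf16 weights
HBM_BANDWIDTH = 2e12  # ~2 TB/s, roughly an A100-class GPU (assumption)

weight_bytes = PARAMS * BYTES_PER_PARAM     # ~14 GB read per generated token
per_token_s = weight_bytes / HBM_BANDWIDTH  # time to stream the weights once
print(f"upper bound: ~{1 / per_token_s:.0f} tokens/s for a single request")

# Batching amortizes the same weight read across B concurrent requests:
# aggregate throughput scales roughly B x while per-request latency stays
# similar -- the latency/throughput tradeoff raised in the list above.
for batch in (1, 8, 32):
    print(f"batch={batch:>2}: ~{batch / per_token_s:,.0f} tokens/s aggregate")
```

Under these assumptions the single-stream ceiling is only ~140 tokens/s, which is why batching, quantization (fewer bytes per parameter), and faster memory all matter so much for inference speed.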