Content provided by Mark Moyou, PhD and Mark Moyou. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Mark Moyou, PhD and Mark Moyou or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.

Kyle Kranen: End Points, Optimizing LLMs, GNNs, Foundation Models - AI Portfolio Podcast #011

1:30:01
 

Get 1000 free inference requests for LLMs on build.nvidia.com
Kyle Kranen is an engineering leader at NVIDIA working at the forefront of deep learning, real-world applications, and production. Kyle shares his expertise on optimizing large language models (LLMs) for deployment, exploring the complexities of scaling and parallelism.
📲 Kyle Kranen Socials:
LinkedIn: https://www.linkedin.com/in/kyle-kranen/
Twitter: https://x.com/kranenkyle
📲 Mark Moyou, PhD Socials:
LinkedIn: https://www.linkedin.com/in/markmoyou/
Twitter: https://twitter.com/MarkMoyou
📗 Chapters
[00:00] Intro
[01:26] Optimizing LLMs for deployment
[10:23] Economy of Scale (Batch Size)
[13:18] Data Parallelism
[14:30] Kernels on GPUs
[18:48] Hardest part of optimizing
[22:26] Choosing hardware for LLMs
[31:33] Storage and Networking - Analyzing Performance
[32:33] Minimum model size where tensor parallelism gives you an advantage
[35:20] Director-level folks thinking about deploying LLMs
[37:29] Kyle is working on AI foundation models
[40:38] Deploying Models with endpoints
[42:43] Fine-Tuning, Deploying LoRAs
[45:02] SteerLM
[48:09] KV Cache
[51:43] Advice for people deploying reasonable and large-scale LLMs
[58:08] Graph Neural Networks
[01:00:04] GNNs
[01:04:22] Using GPUs to do GNNs
[01:08:25] Starting your GNN journey
[01:12:51] Career Optimization Function
[01:14:46] Solving Hard Problems
[01:16:20] Maintaining Technical Skills
[01:20:53] Deep learning expert
[01:26:00] Rapid Round


22 episodes

