Why This Ex-Meta Leader Is Rethinking AI Infrastructure | Lin Qiao, CEO, Fireworks AI

The MAD Podcast With Matt Turck

Duration: 59:14

Content provided by Matt Turck.

In 2022, Lin Qiao left Meta, where she managed several hundred engineers, to start Fireworks AI. In this episode, we sit down with Lin for a deep dive into her work, starting with her leadership of PyTorch, now one of the most influential machine learning frameworks, powering research and production at scale across the AI industry.

Now at the helm of Fireworks AI, Lin is leading a new wave of generative AI infrastructure, simplifying model deployment and optimizing performance so that any developer can build with generative AI.

We dive into the technical core of Fireworks AI, covering its strategies for model optimization, the role of function calling in agentic development, and its low-level work at the GPU and CUDA layers.

Fireworks AI

Website - https://fireworks.ai

X/Twitter - https://twitter.com/FireworksAI_HQ

Lin Qiao

LinkedIn - https://www.linkedin.com/in/lin-qiao-22248b4

X/Twitter - https://twitter.com/lqiao

FirstMark

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) Intro

(01:20) What is Fireworks AI?

(02:47) What is PyTorch?

(12:50) Traditional ML vs GenAI

(14:54) AI’s enterprise transformation

(16:16) From Meta to Fireworks

(19:39) Simplifying AI infrastructure

(20:41) How Fireworks clients use GenAI

(22:02) How many models are powered by Fireworks

(30:09) LLM partitioning

(34:43) Real-time vs pre-set search

(36:56) Reinforcement learning

(38:56) Function calling

(44:23) Low-level architecture overview

(45:47) Cloud GPUs & hardware support

(47:16) VPC vs on-prem vs local deployment

(49:50) Decreasing inference costs and their business implications

(52:46) Fireworks roadmap

(55:03) AI future predictions