Content provided by Mr. Dew. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Mr. Dew or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://staging.podcastplayer.com/legal.

Get an insider’s look behind the curtain of modern AI with this episode of "All Things LLM." Join hosts Alex and AI expert Ben as they reveal the colossal effort, expense, and ingenuity required to take a language model from “blank slate” to foundational intelligence.

What you’ll learn:

  • The massive scale of LLM training: how developers assemble and meticulously clean terabytes of text—web pages, books, code, scientific articles, forums, StackExchange, Wikipedia—to power the world's top AI models.
  • Why raw internet data isn’t enough: how advanced data cleaning, filtering, deduplication, curation, and privacy-preserving techniques ensure that only high-quality, compliant, and safe data fuels the learning process.
  • Self-supervised learning, explained: the difference between Causal Language Modeling (next-word prediction, used by GPT) and Masked Language Modeling (fill-in-the-blank, used by BERT)—and how these simple-sounding tasks create astonishing language abilities.
  • The staggering costs: real-world examples of multimillion-dollar, energy-hungry GPU clusters and their environmental impact, with electricity and water usage statistics that show why this phase is “large” in every sense of the word.
  • The lifecycle of AI: why this pre-training phase only produces a “base model” or “foundation model”—and what has to happen next to make it truly useful as a chatbot, assistant, or domain expert.
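
For readers who want to see the two self-supervised tasks side by side, here is a minimal sketch of how training examples could be built from a single sentence. It is an illustration only, not code from the episode: the function names, the whitespace "tokenizer", and the 30% mask rate in the demo are all assumptions, and real masked-language-model pipelines use subword tokenizers and a more elaborate masking scheme.

```python
import random

MASK = "[MASK]"  # placeholder token; the name follows BERT's convention


def causal_lm_examples(tokens):
    """Next-word prediction: pair every prefix with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(tokens, mask_prob=0.15, seed=0):
    """Fill-in-the-blank: hide random tokens and remember what was hidden.

    Simplified on purpose: every selected token is replaced by MASK.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked = list(tokens)
    targets = {}  # position -> original token the model must recover
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = tok
    return masked, targets


# Hypothetical toy input: split on spaces instead of using a real tokenizer.
tokens = "the cat sat on the mat".split()

clm = causal_lm_examples(tokens)
mlm_input, mlm_targets = masked_lm_example(tokens, mask_prob=0.3)

for context, target in clm:
    print("CLM:", " ".join(context), "->", target)
print("MLM input:  ", " ".join(mlm_input))
print("MLM targets:", mlm_targets)
```

In both cases the labels come from the text itself, which is why no human annotation is needed: the causal task only ever looks left of the target word, while the masked task can use context on both sides of each blank.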

Perfect for listeners searching for:

  • How to train a language model
  • LLM data preparation
  • Self-supervised learning in AI
  • Environmental cost of AI
  • Base models vs. fine-tuned models
  • How much does training an LLM cost?
  • State-of-the-art NLP podcast

Listen now to understand the monumental process behind today’s most powerful AIs—and get ready for next week’s episode, where Ben and Alex unpack the essential fine-tuning and reinforcement steps that turn generalist base models into today’s smart, helpful, responsive chatbots and assistants!

All Things LLM is a production of MTN Holdings, LLC. © 2025. All rights reserved.
For more insights, resources, and show updates, visit allthingsllm.com.
For business inquiries, partnerships, or feedback, contact: [email protected]

The views and opinions expressed in this episode are those of the hosts and guests, and do not necessarily reflect the official policy or position of MTN Holdings, LLC.

Unauthorized reproduction or distribution of this podcast, in whole or in part, without written permission is strictly prohibited.
Thank you for listening and supporting the advancement of transparent, accessible AI education.
