
172: Transformers and Large Language Models

Intro topic: Is WFH actually WFC?

News/Links:

Book of the Show

Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

Topic: Transformers and Large Language Models

  • How neural networks store information
    • Latent variables
  • Transformers
    • Encoders & Decoders
  • Attention Layers
    • History
      • RNN
        • Vanishing Gradient Problem
      • LSTM
        • Short-term dependencies (exploding gradients) vs. long-term dependencies (vanishing gradients)
    • Differentiable algebra
    • Key-Query-Value
    • Self Attention (see the attention sketch after this outline)
  • Self-Supervised Learning & Forward Models
  • Human Feedback
    • Reinforcement Learning from Human Feedback
    • Direct Preference Optimization (Pairwise Ranking; see the loss sketch after this outline)
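
For listeners who want to see the key-query-value mechanics concretely, here is a minimal single-head self-attention sketch in numpy. The weight matrices, dimensions, and variable names are illustrative assumptions, not code from the episode.

  import numpy as np

  def softmax(x, axis=-1):
      x = x - x.max(axis=axis, keepdims=True)
      e = np.exp(x)
      return e / e.sum(axis=axis, keepdims=True)

  def self_attention(X, Wq, Wk, Wv):
      # X: (seq_len, d_model). Each token is projected into a query, key, and value.
      Q, K, V = X @ Wq, X @ Wk, X @ Wv
      # Every query is scored against every key; dividing by sqrt(d_k) keeps the
      # softmax from saturating as the head width grows.
      scores = Q @ K.T / np.sqrt(Q.shape[-1])
      weights = softmax(scores, axis=-1)  # each row sums to 1
      # Each token's output is a weighted mix of every token's value vector.
      return weights @ V

  # Toy usage: 4 tokens, model width 8, head width 4 (all numbers made up).
  rng = np.random.default_rng(0)
  X = rng.normal(size=(4, 8))
  Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
  print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 4)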

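The episode closes with preference-based fine-tuning. Below is a hedged sketch of the Direct Preference Optimization loss on one preference pair: it compares how much the policy favors the human-chosen response over the rejected one, relative to a frozen reference model. The beta value and the log-probabilities are made-up placeholders, not values from the show.

  import numpy as np

  def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
      # Implicit rewards: how far the policy has drifted from the reference model
      # on each response (sequence log-probabilities under each model).
      r_chosen = beta * (logp_chosen - ref_logp_chosen)
      r_rejected = beta * (logp_rejected - ref_logp_rejected)
      # Pairwise ranking objective: push the chosen response's implicit reward
      # above the rejected one's. Equals -log(sigmoid(margin)).
      margin = r_chosen - r_rejected
      return np.log1p(np.exp(-margin))

  # Toy usage with placeholder log-probabilities for one preference pair.
  print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                 ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))
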
★ Support this podcast on Patreon ★
