Artwork

180: Reinforcement Learning

Programming Throwdown

219 subscribers

published

iconShare
 

Fetch error

Hmmm there seems to be a problem fetching this series right now. Last successful fetch was on November 05, 2025 02:11 (23d ago)

What now? This series will be checked again in the next day. If you believe it should be working, please verify the publisher's feed link below is valid and includes actual episode links. You can contact support to request the feed be immediately fetched.

Manage episode 471896487 series 2417399
Content provided by Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Patrick Wheeler and Jason Gauci, Patrick Wheeler, and Jason Gauci or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.

Intro topic: Grills

News/Links:

Book of the Show

Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show

  • Patrick:
    • Pokemon Sword and Shield
  • Jason:

Topic: Reinforcement Learning

  • Three types of AI
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (Value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation
    • Propensity scoring versus model-based
  • Challenges to training RL model
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO

★ Support this podcast on Patreon ★
  continue reading

187 episodes