Content provided by Patrick Wheeler and Jason Gauci. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Patrick Wheeler and Jason Gauci or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://staging.podcastplayer.com/legal.
Intro topic: Grills
News/Links:
- You can’t call yourself a senior until you’ve worked on a legacy project
- Recraft might be the most powerful AI image platform I’ve ever used — here’s why
- NASA has a list of 10 rules for software development
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
Book of the Show
- Patrick:
  - The Player of Games (Iain M. Banks)
    - https://a.co/d/1ZpUhGl (non-affiliate)
- Jason:
  - Basic Roleplaying Universal Game Engine
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
  - Pokemon Sword and Shield
- Jason:
  - Features and Labels (https://fal.ai)
Topic: Reinforcement Learning
- Three types of AI
  - Supervised Learning
  - Unsupervised Learning
  - Reinforcement Learning
- Online vs Offline RL
- Optimization algorithms
  - Value optimization
    - SARSA
    - Q-Learning
  - Policy optimization
    - Policy Gradients
    - Actor-Critic
    - Proximal Policy Optimization
- Value vs Policy Optimization
  - Value optimization is more intuitive (value loss)
  - Policy optimization is less intuitive at first (policy gradients)
  - Converting values to policies in deep learning is difficult
- Imitation Learning
  - Supervised policy learning
  - Often used to bootstrap reinforcement learning
- Policy Evaluation
  - Propensity scoring versus model-based
- Challenges in training an RL model
  - Two optimization loops
    - Collecting feedback vs updating the model
  - Difficult optimization target
    - Policy evaluation
- RLHF & GRPO
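
For listeners who want to see the two value-optimization methods from the outline side by side, here is a minimal tabular sketch in Python. It is not taken from the episode: the five-state corridor environment, the function names (`step`, `epsilon_greedy`, `train`, `greedy_policy`), and the hyperparameters are all hypothetical choices made for illustration. The only difference between the two methods is the bootstrap term: SARSA uses the value of the action the agent actually takes next (on-policy), while Q-Learning uses the best available next action (off-policy).

```python
import random

# Hypothetical toy environment (an assumption, not from the episode):
# a corridor of 5 states; reaching the right-most state pays +1 and ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Move along the corridor, clamping at the left wall."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties randomly so the untrained agent still explores.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(method, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular value learning; method is "sarsa" or "q_learning"."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            if done:
                bootstrap = 0.0  # no future value after a terminal state
            elif method == "sarsa":
                # On-policy: value of the action that will actually be taken.
                bootstrap = q[(next_state, next_action)]
            else:
                # Off-policy (Q-Learning): value of the best next action.
                bootstrap = max(q[(next_state, a)] for a in ACTIONS)
            td_error = reward + gamma * bootstrap - q[(state, action)]
            q[(state, action)] += alpha * td_error
            state, action = next_state, next_action
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    for method in ("sarsa", "q_learning"):
        print(method, greedy_policy(train(method)))
```

In this sketch the learned table is turned into a policy simply by taking the highest-valued action in each state. That step is trivial with a small table, which is part of why value optimization feels intuitive; with large or continuous action spaces the same conversion becomes much harder.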