Content provided by Joe Carlsmith. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Joe Carlsmith or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.
This is section 6 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. Empirical work that might shed light on scheming (Section 6 of "Scheming AIs") (00:00:00)
2. 6. Empirical work that might shed light on scheming (00:00:33)
3. 6.1 Empirical work on situational awareness (00:05:34)
4. 6.2 Empirical work on beyond-episode goals (00:07:03)
5. 6.3 Empirical work on the viability of scheming as an instrumental strategy (00:10:29)
6. 6.4 The “model organisms” paradigm (00:12:14)
7. 6.5 Traps and honest tests (00:13:29)
8. 6.6 Interpretability and transparency (00:16:49)
9. 6.7 Security, control, and oversight (00:18:35)
10. 6.8 Other possibilities (00:21:08)