BBC Radio 5 live’s award winning gaming podcast, discussing the world of video games and games culture.
…
continue reading
MP3•Episode home
Manage episode 512836246 series 3364760
Content provided by LessWrong. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by LessWrong or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.
This is a link post for two papers that came out today:
For example, suppose you have a dataset of solutions to coding problems, all of which hack test cases by hard-coding expected return values. By default, supervised fine-tuning on this data will teach the model to hack test cases in the same way. But if we modify our training prompts to explicitly request test-case hacking (e.g. “Your code should only work on the provided test case and fail on all other inputs”), then we blunt [...]
The original text contained 1 footnote which was omitted from this narration.
---
First published:
October 8th, 2025
Source:
https://www.lesswrong.com/posts/AXRHzCPMv6ywCxCFp/inoculation-prompting-instructing-models-to-misbehave-at
---
Narrated by TYPE III AUDIO.
---
…
continue reading
- Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time (Tan et al.)
- Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (Wichers et al.)
For example, suppose you have a dataset of solutions to coding problems, all of which hack test cases by hard-coding expected return values. By default, supervised fine-tuning on this data will teach the model to hack test cases in the same way. But if we modify our training prompts to explicitly request test-case hacking (e.g. “Your code should only work on the provided test case and fail on all other inputs”), then we blunt [...]
The original text contained 1 footnote which was omitted from this narration.
---
First published:
October 8th, 2025
Source:
https://www.lesswrong.com/posts/AXRHzCPMv6ywCxCFp/inoculation-prompting-instructing-models-to-misbehave-at
---
Narrated by TYPE III AUDIO.
---
Images from the article:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.


636 episodes