Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming AIs")
This is section 2.3.1.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming AIs") (00:00:00)
2. 2.3.1.2 Adequate future empowerment (00:00:33)
3. 2.3.1.2.1 When is the “pay off” supposed to happen? (00:01:22)
4. 2.3.1.2.2 Even if the model’s values survive this generation of training, will they survive long (00:04:20)
5. 2.3.1.2.3 Will escape/take-over be suitably likely to succeed? (00:08:16)
6. 2.3.1.2.4 Will the time horizon of the model’s goals extend to cover escape/take-over? (00:10:05)
7. 2.3.1.2.5 Will the model’s values get enough power after escape/takeover? (00:11:56)
8. 2.3.1.2.6 How much does the model stand to gain from not training-gaming? (00:13:23)
9. 2.3.1.2.7 How “ambitious” is the model? (00:16:38)
10. 2.3.1.3 Overall assessment of the classic goal-guarding story (00:21:43)
56 episodes