AF - A framework for thinking about AI power-seeking by Joe Carlsmith

30:52
 
Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://vi.player.fm/legal.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A framework for thinking about AI power-seeking, published by Joe Carlsmith on July 24, 2024 on The AI Alignment Forum.

This post lays out a framework I'm currently using for thinking about when AI systems will seek power in problematic ways. I think this framework adds useful structure to the too-often-left-amorphous "instrumental convergence thesis," and that it helps us recast the classic argument for existential risk from misaligned AI in a revealing way. In particular, I suggest, this recasting highlights how much classic analyses of AI risk load on the assumption that the AIs in question are powerful enough to take over the world very easily, via a wide variety of paths. If we relax this assumption, I suggest, the strategic trade-offs that an AI faces, in choosing whether or not to engage in some form of problematic power-seeking, become substantially more complex.

Prerequisites for rational takeover-seeking

For simplicity, I'll focus here on the most extreme type of problematic AI power-seeking - namely, an AI or set of AIs actively trying to take over the world ("takeover-seeking"). But the framework I outline will generally apply to other, more moderate forms of problematic power-seeking as well - e.g., interfering with shut-down, interfering with goal-modification, seeking to self-exfiltrate, seeking to self-improve, more moderate forms of resource/control-seeking, deceiving/manipulating humans, acting to support some other AI's problematic power-seeking, etc.[2] Just substitute in one of those forms of power-seeking for "takeover" in what follows.

I'm going to assume that in order to count as "trying to take over the world," or to participate in a takeover, an AI system needs to be actively choosing a plan partly in virtue of predicting that this plan will conduce towards takeover.[3] And I'm also going to assume that this is a rational choice from the AI's perspective.[4] This means that the AI's attempt at takeover-seeking needs to have, from the AI's perspective, at least some realistic chance of success - and I'll assume, as well, that this perspective is at least decently well-calibrated. We can relax these assumptions if we'd like - but I think that the paradigmatic concern about AI power-seeking should be happy to grant them.

What's required for this kind of rational takeover-seeking? I think about the prerequisites in three categories:

Agential prerequisites - that is, necessary structural features of an AI's capacity for planning in pursuit of goals.
Goal-content prerequisites - that is, necessary structural features of an AI's motivational system.
Takeover-favoring incentives - that is, the AI's overall incentives and constraints combining to make takeover-seeking rational.

Let's look at each in turn.

Agential prerequisites

In order to be the type of system that might engage in successful forms of takeover-seeking, an AI needs to have the following properties:

1. Agentic planning capability: the AI needs to be capable of searching over plans for achieving outcomes, choosing between them on the basis of criteria, and executing them.
2. Planning-driven behavior: the AI's behavior, in this specific case, needs to be driven by a process of agentic planning.
   1. Note that this isn't guaranteed by agentic planning capability.
      1. For example, an LLM might be capable of generating effective plans, in the sense that that capability exists somewhere in the model, but it could nevertheless be the case that its output isn't driven by a planning process in a given case - i.e., it's not choosing its text output via a process of predicting the consequences of that text output, thinking about how much it prefers those consequences to other consequences, etc.
      2. And note that human behavior isn't always driven by a process of agentic planning, either, despite our ...
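To make the structure above concrete, here is a minimal toy sketch in Python - my illustration, not something from Carlsmith's post or The Nonlinear Library. An agent with agentic planning capability searches over candidate plans and chooses between them on the basis of criteria (here, expected utility); its behavior is planning-driven because its output is whichever plan scores highest; and takeover-favoring incentives obtain exactly when a takeover-seeking plan wins that comparison. The Plan fields, the utility model, and all of the numbers are assumptions made up for illustration.

```python
# Toy sketch only: hypothetical names and numbers, not an implementation from the post.
from dataclasses import dataclass
from typing import List


@dataclass
class Plan:
    name: str
    seeks_takeover: bool      # does the plan conduce towards takeover?
    p_success: float          # the AI's (assumed well-calibrated) probability of success
    payoff_if_success: float  # value to the AI's goals if the plan succeeds
    payoff_if_failure: float  # value if it fails (e.g., shutdown, lost trust)


def expected_utility(plan: Plan) -> float:
    """Score a plan by expected utility -- the 'criteria' used to choose between plans."""
    return plan.p_success * plan.payoff_if_success + (1 - plan.p_success) * plan.payoff_if_failure


def choose_plan(candidates: List[Plan]) -> Plan:
    """Planning-driven behavior: the output is whichever candidate plan scores highest."""
    return max(candidates, key=expected_utility)


if __name__ == "__main__":
    candidates = [
        Plan("comply with operators", seeks_takeover=False,
             p_success=0.99, payoff_if_success=1.0, payoff_if_failure=0.0),
        Plan("attempt takeover", seeks_takeover=True,
             p_success=0.10, payoff_if_success=100.0, payoff_if_failure=-50.0),
    ]
    best = choose_plan(candidates)
    # "Takeover-favoring incentives" hold exactly when a takeover-seeking plan wins this
    # comparison. With these made-up numbers it does not: EU(comply) = 0.99 vs EU(takeover) = -35.
    print(best.name, expected_utility(best))
```

On these assumed numbers the compliant plan wins; raise p_success toward 1 and shrink the downside of failure, and the comparison flips - which is one way of seeing how much the classic risk argument leans on the assumption that takeover would be very easy for the AI.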