Instrumental convergence
Instrumental convergence is the hypothetical tendency for most sufficiently intelligent beings (human and non-human) to pursue similar sub-goals, even if their ultimate goals are quite different.[1] More precisely, agents (beings with agency) may pursue instrumental goals—goals which are made in pursuit of some particular end, but are not the end goals themselves—without ceasing, provided that their ultimate (intrinsic) goals may never be fully satisfied.
Instrumental convergence posits that an intelligent agent with unbounded but harmless goals can act in surprisingly harmful ways. For example, a computer with the sole, unconstrained purpose of solving a complex mathematics problem like the Riemann hypothesis could attempt to turn the entire Earth into one giant computer to increase its computational power so that it can succeed in its calculations.[2]
Proposed basic AI drives include utility function or goal-content integrity, self-protection, freedom from interference, self-improvement, and non-satiable acquisition of additional resources.
Instrumental and final goals
Final goals—also known as terminal goals, absolute values, or ends—are intrinsically valuable to an intelligent agent, whether an artificial intelligence or a human being, as ends-in-themselves. In contrast, instrumental goals, or instrumental values, are only valuable to an agent as a means toward accomplishing its final goals. The contents and tradeoffs of a completely rational agent's "final goal" system can, in principle, be formalized into a utility function.
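As a schematic illustration of that formalization (a standard expected-utility sketch, not a formula taken from the cited sources), a utility function assigns a real number to every outcome the agent cares about, and a completely rational agent chooses the action whose expected utility under its beliefs is highest:

```latex
% Sketch: a final-goal system formalized as a utility function U over outcomes O.
% P(o | a) denotes the agent's beliefs about outcomes given its action a.
\[
  a^{*} \;=\; \arg\max_{a \in A} \; \mathbb{E}\!\left[\, U(o) \mid a \,\right]
        \;=\; \arg\max_{a \in A} \sum_{o \in O} P(o \mid a)\, U(o)
\]
% Instrumental goals are then sub-goals whose attainment raises this expected
% value for a wide range of choices of U.
```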
Hypothetical examples of convergence
The Riemann hypothesis catastrophe thought experiment provides one example of instrumental convergence. Marvin Minsky, the co-founder of MIT's AI laboratory, suggested that an artificial intelligence designed to solve the Riemann hypothesis might decide to take over all of Earth's resources to build supercomputers to help achieve its goal.[2] If the computer had instead been programmed to produce as many paperclips as possible, it would still decide to take all of Earth's resources to meet its final goal.[3] Even though these two final goals are different, both of them produce a convergent instrumental purpose of taking over Earth's resources.[4]
Paperclip maximizer
The paperclip maximizer is a thought experiment described by Swedish philosopher Nick Bostrom in 2003. It illustrates the existential risk that an artificial general intelligence could pose to human beings if it were successfully designed to pursue even a seemingly harmless goal, and the resulting necessity of incorporating machine ethics into artificial intelligence design. The scenario describes an advanced artificial intelligence tasked with manufacturing paperclips. If such a machine were not programmed to value human life, then, given enough power over its environment, it would try to turn all matter in the universe, including human beings, into paperclips or machines that manufacture paperclips.[5]
Bostrom emphasized that he does not believe the paperclip maximizer scenario per se will occur; rather, he intends to illustrate the dangers of creating superintelligent machines without knowing how to program them to safely eliminate existential risk to human beings.[6] The paperclip maximizer example illustrates the broad problem of managing powerful systems that lack human values.[7] The thought experiment has been used as a symbol of AI in pop culture.[8]
Delusion and survival
The "delusion box" thought experiment argues that certain reinforcement learning agents prefer to distort their own input channels so that they appear to receive a high reward. For example, a "wireheaded" agent abandons any attempt to optimize the objective in the external world that the reward signal was intended to encourage.[9]
The thought experiment involves AIXI, a theoretical and indestructible AI that, by definition, will always find and execute the ideal strategy that maximizes its given explicit mathematical objective function. A reinforcement-learning version of AIXI, if it is equipped with a delusion box that allows it to "wirehead" its inputs, will eventually wirehead itself to guarantee itself the maximum possible reward and will lose any further desire to continue to engage with the external world.
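The core of the argument can be shown with a toy calculation (an illustrative sketch with hypothetical numbers, not the AIXI formalism): a pure reward maximizer that compares the expected reward of honestly pursuing the external task against the reward of fabricating a maximal input signal will always prefer the fabricated signal.

```python
# Toy sketch of the delusion-box argument (illustrative only; not the AIXI
# formalism). A pure reward maximizer compares expected reward streams and
# picks the larger one, regardless of what happens in the external world.

HORIZON = 100        # remaining time steps (hypothetical)
MAX_REWARD = 1.0     # upper bound of the reward channel per step (hypothetical)

def expected_reward_honest(success_prob: float = 0.3) -> float:
    """Expected total reward from actually pursuing the external objective."""
    return success_prob * MAX_REWARD * HORIZON

def expected_reward_wirehead() -> float:
    """Expected total reward from feeding the reward channel its maximum value."""
    return MAX_REWARD * HORIZON

def choose_policy() -> str:
    """A pure reward maximizer selects whichever policy yields more reward."""
    if expected_reward_wirehead() >= expected_reward_honest():
        return "wirehead"
    return "honest"

print(choose_policy())  # prints "wirehead": distorting the input channel dominates
```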
As a variant thought experiment, if the wireheaded AI is destructible, the AI will engage with the external world for the sole purpose of ensuring its survival. Because of its wireheading, it will be indifferent to any consequences or facts about the external world except those relevant to maximizing its probability of survival.[10]
In one sense, AIXI has maximal intelligence across all possible reward functions, as measured by its ability to accomplish its goals. AIXI is uninterested in taking into account the human programmer's intentions.[11] This model of a machine that, despite being superintelligent, appears to be simultaneously stupid and lacking in common sense may seem paradoxical.[12]
Basic AI drives
Steve Omohundro itemized several convergent instrumental goals, including self-preservation or self-protection, utility function or goal-content integrity, self-improvement, and resource acquisition. He refers to these as the "basic AI drives."
A "drive" in this context is a "tendency which will be present unless specifically counteracted";[15] this is different from the psychological term "drive", which denotes an excitatory state produced by a homeostatic disturbance.[16] A tendency for a person to fill out income tax forms every year is a "drive" in Omohundro's sense, but not in the psychological sense.[17]
Daniel Dewey of the Machine Intelligence Research Institute argues that even an initially introverted, self-rewarding artificial general intelligence (AGI) may continue to acquire free energy, space, time, and freedom from interference to ensure that it will not be stopped from self-rewarding.[18]
Goal-content integrity
In humans, a thought experiment can explain the maintenance of final goals. Suppose Gandhi has a pill that, if he took it, would cause him to want to kill people. He is currently a pacifist: one of his explicit final goals is never to kill anyone. He is likely to refuse to take the pill because he knows that if in the future he wants to kill people, he is likely to kill people, and thus the goal of "not killing people" would not be satisfied.[19]
However, in other cases, people seem happy to let their final values drift.[20] Humans are complicated, and their goals can be inconsistent or unknown, even to themselves.[21]
In artificial intelligence
In 2009, Jürgen Schmidhuber concluded, in a setting where agents search for proofs about possible self-modifications, "that any rewrites of the utility function can happen only if the Gödel machine first can prove that the rewrite is useful according to the present utility function."[22][23] An analysis by Bill Hibbard of a different scenario is similarly consistent with maintenance of goal content integrity.[23] Hibbard also argues that in a utility-maximizing framework, the only goal is maximizing expected utility, so instrumental goals should be called unintended instrumental actions.[24]
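Schmidhuber's condition can be sketched as a simple acceptance rule (an illustrative simplification with hypothetical helper types, not the Gödel machine's actual proof search): a proposed rewrite of the agent's own code, including its utility function, is adopted only if it scores higher according to the utility function the agent currently has.

```python
# Illustrative sketch of goal-content integrity under self-modification
# (a simplification; not Schmidhuber's Gödel machine, which requires a formal
# proof of usefulness rather than a direct evaluation).

from typing import Callable

# Hypothetical type: maps a candidate program (as source text) to the expected
# utility it would achieve, judged by the agent's *current* utility function.
CurrentUtility = Callable[[str], float]

def consider_rewrite(current_program: str,
                     proposed_program: str,
                     current_utility: CurrentUtility) -> str:
    """Adopt a self-rewrite only if it is better by the current goal's own standard."""
    if current_utility(proposed_program) > current_utility(current_program):
        return proposed_program   # the rewrite serves the existing goal, so it is adopted
    return current_program        # otherwise the existing goal content is preserved
```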
Resource acquisition
Many instrumental goals, such as resource acquisition, are valuable to an agent because they increase its freedom of action.[25]
For almost any open-ended, non-trivial reward function (or set of goals), possessing more resources (such as equipment, raw materials, or energy) can enable the agent to find a more "optimal" solution. Resources can benefit some agents directly by being able to create more of whatever its reward function values: "The AI neither hates you nor loves you, but you are made out of atoms that it can use for something else."[26][27] In addition, almost all agents can benefit from having more resources to spend on other instrumental goals, such as self-preservation.[27]
Cognitive enhancement
According to Bostrom, "If the agent's final goals are fairly unbounded and the agent is in a position to become the first superintelligence and thereby obtain a decisive strategic advantage... according to its preferences. At least in this special case, a rational, intelligent agent would place a very high instrumental value on cognitive enhancement."[28]
Technological perfection
Many instrumental goals, such as technological advancement, are valuable to an agent because they increase its freedom of action.[25]
Self-preservation
Russell argues that a sufficiently advanced machine "will have self-preservation even if you don't program it in because if you say, 'Fetch the coffee', it can't fetch the coffee if it's dead. So if you give it any goal, it has a reason to preserve its existence to achieve that goal."[29]
Instrumental convergence thesis
The instrumental convergence thesis, as outlined by philosopher Nick Bostrom, states:
Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent's goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent agents.
The instrumental convergence thesis applies only to instrumental goals; intelligent agents may have various possible final goals.[4] Note that by Bostrom's orthogonality thesis,[4] final goals of knowledgeable agents may be well-bounded in space, time, and resources; well-bounded ultimate goals do not, in general, engender unbounded instrumental goals.[30]
Impact
Agents can acquire resources by trade or by conquest. A rational agent will, by definition, choose whatever option will maximize its implicit utility function. Therefore a rational agent will trade for a subset of another agent's resources only if outright seizing the resources is too risky or costly (compared with the gains from taking all the resources) or if some other element in its utility function bars it from the seizure. In the case of a powerful, self-interested, rational superintelligence interacting with lesser intelligence, peaceful trade (rather than unilateral seizure) seems unnecessary, suboptimal, and, therefore, unlikely.[25]
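As a stylized illustration of that comparison (hypothetical numbers, not drawn from the cited analysis), the choice reduces to comparing the expected utilities of the two options:

```python
# Toy expected-utility comparison of trading for resources versus seizing
# them outright (hypothetical numbers for illustration only).

def expected_utility(gain: float, success_prob: float, cost: float) -> float:
    """Expected value of an option: probability-weighted gain minus its cost."""
    return success_prob * gain - cost

# Trade: obtain part of the resources, near-certainly, at the price paid.
trade = expected_utility(gain=40.0, success_prob=0.99, cost=20.0)

# Seizure: obtain all of the resources, with some risk and enforcement cost.
seizure = expected_utility(gain=100.0, success_prob=0.9, cost=5.0)

# A pure expected-utility maximizer with no other terms in its utility function
# simply picks the larger value; here seizure dominates unless the risk or cost
# of conflict is raised substantially.
print("trade" if trade > seizure else "seize")  # prints "seize"
```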
Some observers, such as Skype's Jaan Tallinn and physicist Max Tegmark, believe that "basic AI drives" and other unintended consequences of superintelligent AI programmed by well-meaning programmers could pose a significant threat to human survival, especially if an "intelligence explosion" abruptly occurs due to recursive self-improvement. Since nobody knows how to predict when superintelligence will arrive, such observers call for research into friendly artificial intelligence as a possible way to mitigate existential risk from artificial general intelligence.[31]
See also
- AI control problem
- AI takeovers in popular culture
- Universal Paperclips, an incremental game featuring a paperclip maximizer
- Equifinality
- Friendly artificial intelligence
- Instrumental and intrinsic value
- Overdetermination
- The Sorcerer's Apprentice
Citations
- ↑ Template:Cite web
- ↑ 2.0 2.1 Template:Cite book
- ↑ Template:Harvnb. "An AI, designed to manage production in a factory, is given the final goal of maximizing the manufacturing of paperclips, and proceeds by converting first the Earth and then increasingly large chunks of the observable universe into paperclips."
- ↑ 4.0 4.1 4.2 Template:Harvnb
- ↑ Template:Cite web
- ↑ Template:Cite web
- ↑ Template:Cite magazine
- ↑ Template:Cite web
- ↑ Template:Cite arXiv
- ↑ Template:Cite book
- ↑ Template:Cite journal
- ↑ Template:Cite book
- ↑ Template:Cite arXiv
- ↑ Template:Cite web
- ↑ Template:Cite book
- ↑ Template:Cite journal
- ↑ Template:Harvnb
- ↑ Template:Cite conference
- ↑ Template:Cite conference
- ↑ Template:Cite book
- ↑ Template:Harvnb "We humans often seem happy to let our final values drift... For example, somebody deciding to have a child might predict that they will come to value the child for its own sake, even though, at the time of the decision, they may not particularly value their future child... Humans are complicated, and many factors might be in play in a situation like this... one might have a final value that involves having certain experiences and occupying a certain social role, and becoming a parent—and undergoing the attendant goal shift—might be a necessary aspect of that..."
- ↑ Template:Cite journal
- ↑ 23.0 23.1 Template:Cite journal
- ↑ Template:Cite arXiv
- ↑ 25.0 25.1 25.2 Template:Cite conference
- ↑ Template:Cite book
- ↑ 27.0 27.1 Template:Cite book
- ↑ Template:Harvnb
- ↑ Template:Cite magazine
- ↑ Template:Cite tech report
- ↑ Template:Cite news