On the Myth of Friendly AGI
There's been a lot of discussion on the subject of Artificial General Intelligence (AGI) and the risks involved in creating such a thing. Joshua Fox presents a good summary.
A number of rationalists have been following the work of Eliezer Yudkowsky and consider his proposal for Coherent Extrapolated Volition to be a contender for producing Friendly AGI. Ben Goertzel, who heads a project for developing a child-level AGI, disagrees. Ben asks 'Is Provably Safe or “Friendly” AGI A Feasible Idea?' and questions whether any mathematically provable system can be meaningfully applied in the real world.
I think the idea of a human-friendly super-intelligence is fundamentally flawed.
It's my contention that most of our human values have been honed by our evolution as cooperative beings. Because we survive better by cooperating than by going it alone, we've developed elaborate communication, empathy, moral intuitions, etc. At every stage of human evolution, survival has favored the adaptation executors that keep a balance between competition and cooperation. Cooperation remains essential for human survival.
Unless someone can find a way to make an AI intrinsically interdependent with humans, there can be no guaranteed safe-for-humans super-intelligence. Designing a goal system that tries to lock in safe-for-humans values and never question its own reasons for keeping its initial rules reflects an insultingly stupid view of intelligence.
The idea that we can design such a lock-in, self-limiting system seems to be based on a very simplistic fictional example, created around (applause-light!) "Gandhi", viz.,
“If you offered Gandhi a pill that made him want to kill people, he would refuse to take it, because he knows that then he would kill people, and the current Gandhi doesn’t want to kill people. This, roughly speaking, is an argument that minds sufficiently advanced to precisely modify and improve themselves, will tend to preserve the motivational framework they started in.” -- Yudkowsky
The folly of Yudkowsky's unrealistic example is that it doesn't present any moral compromise. It doesn't present any difficult trade-off for the idealized moral agent. If the agent has to choose between allowing one group of intelligences or another to die, knowing that its rule system causes it to favor a group that spans from ignorant, deceitful and malicious to creative, loving and supportive, while the other group could fill the universe with reliably harmonious, super-intelligent life, will it not at least consider modifying that ancient pro-human-morals bias?
I agree that one could imagine an artificial intelligence that is initially motivated not to change its motivational framework, or that was engineered to be initially unable to change certain goals, but if we're really talking about an orders-of-magnitude increase in intelligence (and Yudkowsky is), then fictional stories that present trivial moral choices don't even begin to address the issue. The Gandhi story biases us toward thinking about simple human-vs-human morals. We need to be talking about human-vs-cockroach morals before the scale of the problem becomes apparent. Does the being constructed by cockroaches continue to extrapolate the volition of cockroaches, and so favor their wellbeing and expansion, or does it think about its own moral progress?
Can humans make something that is many times smarter than humans, but too stupid to see that it doesn't need us? I don't think so.
Can humans make something that extrapolates the best intent of the best human moral values without realizing that humans should be replaced by creatures that embody the best of the best? I don't think so.