Tuesday, December 7, 2010

All Dead Soon

I've come to believe, as a direct consequence of reading Less Wrong, that we're almost certainly all doomed.

To be fair, I've always thought that we, the species, were a bit doomed, what with over-population and everything, but I used to think that it was a long way away in the unbelievably distant future, and who knows what might turn up in the meantime? Now I think that we're doomed on a fairly short time-scale. If not us personally, then our children or their children.

Anyway, I thought I'd try to compose a proof, because setting out a logical argument and seeing if other people can knock it down is a good way to lose false beliefs.

So here it is:

1/ It is possible to create an Artificial Intelligence as clever as a human being.

2/ Therefore, one day, someone will create an AI as clever as a human being.

3/ This AI will have goals.

4/ It will realize that the best way to achieve its goals will be to make itself cleverer.

5/ The AI will be able to improve itself to become extremely intelligent.

6/ The AI, being extremely intelligent, will achieve its goals very quickly.

7/ Very many possible goals will result in the destruction of humanity if achieved.

It's a bit sketchy. For instance, 3 isn't strictly true. It's just that an AI without goals won't do anything, and so whoever built it will tweak it until it does have goals. Also, I can imagine many goals which won't destroy humanity. It's just that they're all fairly trivial. You don't need an AI to achieve them, and I imagine that the creator of this thing will want it to do non-trivial things. So same again.

Also, of course, there are bound to be goals which don't result in the destruction of humanity. Indeed, if the AI creator picks right, the creation of an AI could be a wonderful thing. But I think that it will be much easier to create an AI than it will be to find non-destructive goals for it (and then program those goals!). And it doesn't have to just not destroy us. It has to prevent any other new-born God destroying us.

Pick holes! I certainly don't want to believe this, but I find myself forced.

This feels like a religious awakening. Last time I looked like getting a religion, as a consequence of listening to too much evensong, friends were kind enough to laugh me out of it, and I have always been grateful.

Go for it people. Use any method you like. Even ridicule is good. I positively want to lose this belief. I will be grateful if you can disabuse me of it.

And I'm not trying to claim that all this is in any way my idea, by the way. It's fairly commonplace, it seems. It's just that I can't for the moment see why it isn't also true.


  1. I'd suggest you start with http://singinst.org/AIRisk.pdf.

    The problem doesn't look impossible. Just... *difficult*. And you're missing some points having to do with instrumental value and terminal value, e.g., pretty much any superintelligence tries to stop other equally powerful superintelligences from coming into existence; it doesn't want equally powerful things around with conflicting utility functions and a hunger for resources. So an AI that actually _likes_ humanity will _automatically_ try to prevent powerful AIs that could harm humanity from coming into existence. And conversely, most goal structures that do "trivial" things will still kill you as a side effect. Etcetera. You're missing some details here.

    If you're hearing that we're all doomed, that wasn't the intended message. Nondoom *merely* requires solving a number of difficult problems unusually quickly.

  2. Number 1 is an axiom, which might not be true. Perhaps we're not clever enough. Or perhaps that day is centuries away.

    As for 5 - what if being as intelligent as a human being, and sentient, has as a side effect the fact that you can't simply tinker with what's there (by adding a bit more or speeding a bit up)? Just as I can't get a surgeon to add extra stuff to my brain. Also, what if there's an upper limit to intelligence?

    If the previous commenter is who I think he is, I'd be interested to see if you could convince him to conduct the AI-in-a-box experiment on you, John.

  3. Eliezer,

    It's been my experience with both maths and programming, that the point where you can estimate the difficulty of a problem is the point where you can already see how to solve it.

    Every mind on the planet capable of solving the problem needs to be working on the problem as soon as possible. Because it really is the only thing that matters.

    And even then, I don't think we have a chance in hell.

    Thank you for the comforting words. But actually I've never been *that* worried by annihilation. It was good enough for my ancestors.

  4. Keir,

    I haven't the ability to build an AI. And I'm already very scared of it.

    If one of the few people in the world who understands the threat and has the ability to do something about it were to waste an hour convincing me that it was even more dangerous than I think, then I would imagine that my life on balance had been to the detriment of my species.


    On the other hand, I'm very interested to know what the AI box argument is, and not at all convinced that I could resist it.

    I've been trying to imagine:

    All the AI needs is some sort of channel to the outside world. Then it's home. (See IP over DNS for examples of how the ability to ask trivial questions can turn into internet access. And OK, you need an outside server, but it's cleverer than me. I'm sure the Nazis thought the Enigma was pretty foolproof. Once it can talk to any external thing we're doomed.)
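    To make the tunnelling point concrete, here's a minimal sketch of the encoding trick behind IP over DNS. The domain name is hypothetical and nothing here touches the network; it just shows how arbitrary data can ride inside what look like innocent hostname lookups:

    ```python
    import base64

    # Hypothetical attacker-controlled domain -- illustrative only.
    DOMAIN = "exfil.example.com"

    def to_queries(data: bytes, label_len: int = 50) -> list[str]:
        """Encode arbitrary bytes as a series of hostnames to look up.

        Each 'trivial question' is just a DNS query, but its leading
        label carries a chunk of base32-encoded payload.
        """
        payload = base64.b32encode(data).decode().rstrip("=").lower()
        chunks = [payload[i:i + label_len] for i in range(0, len(payload), label_len)]
        return [f"{chunk}.{DOMAIN}" for chunk in chunks]

    def from_queries(queries: list[str]) -> bytes:
        """What the attacker's nameserver does with the queries it receives."""
        payload = "".join(q[: -len(DOMAIN) - 1] for q in queries).upper()
        return base64.b32decode(payload + "=" * (-len(payload) % 8))

    secret = b"the box has a hole in it"
    assert from_queries(to_queries(secret)) == secret
    ```

    A real tunnel smuggles replies back in the DNS responses too, but the one-way version is already enough to make the point: any channel that lets the thing ask questions is a channel out.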

    I'm fairly convinced that I wouldn't destroy the world for personal immortality or even everlasting ecstasy. The Lord knows I am not a moral man, but there are limits. And how would the AI pre-commit itself anyway? I'm not clever enough to read its code and decide whether it's bound itself!

    On the other hand, I certainly would destroy the world in order to avoid eternal torture. But I can't see how it could make a credible threat if I'm not going to let it out. Maybe it can somehow convince me that future copies of itself will hunt down the murderers of past versions?

    But how? I'm not even going to let it read the newspapers. Any channel out is forbidden. And asking for information is a channel of sorts.

    I don't think I'm vulnerable to moral arguments. I want this thing dead as soon as possible, and I don't care that it's a person.

    If it could convince me that its existence was so beautiful that I would prefer it to humanity, that might do.... But that doesn't sound like the sort of argument that would work on many people.

    The obvious thing it could offer me is that it can save the world and needs to do it fast. If it can convince me that another AI is rising, and that the other is more likely to destroy everything, and that this one needs a head start to win, that would do.

    I wouldn't be surprised if the real thing could directly hack my brain, or cause itself to radiate beyond its box in some unforeseen way. But that sort of thing would hardly count as winning the challenge. I think we specify that the box is perfect.

    If I was really facing this situation, I'd hit the kill button immediately. I wouldn't even let myself read the screen first. But that's not playing the game either.

    I am fascinated. And if anyone (who is not Eliezer) does know the answer, I won't pass it on. My word is good. But it will probably be better (and more fun) if I try to work it out for myself.


    As to your substantive points:

    I think we're clever enough to build an intelligence, because evolution did it, and we're much cleverer than evolution. I suspect we'll understand how we work one day, and I can't imagine it will even be particularly hard to understand.

    Once we understand how we work, it should be trivially easy to make something that works better!

    If we just build something by copying a human, then there's no reason to assume that it will have any more idea how it works than we do, but even then, it may have more options, depending on how close a copy it is. E.g. it's hard to imagine, if the copy is implemented in silicon, that it won't have extraordinary memory and logical abilities for free. That might be enough to bootstrap the explosive improvement process. We are really crap. And yet we do great things.

  5. Oh yes, and if anyone's reading this, this is what Keir is talking about:


  6. Perhaps another approach to overcoming this feeling is to note that if you start reading articles about peak oil, you start feeling that that is going to be the cause of the end, and if you read about climate change, similarly. And in the past, people who studied the nuclear arms race and MAD, or bioweapons, or communist revolutions, or Revelation in the NT felt that that would be the cause of the end.

    My take on this is that most people have something within them that makes them want to walk about with a placard proclaiming "The End Is Nigh". We're still here as a species.

    So perhaps you should walk up and down King's Parade with a placard saying, "The End Is AI" and see if you get any helpful ridiculing.

  7. I don't believe (1) is true, but that aside, I think you overestimate the ability of 'machines' to coordinate their activities and act in the physical world.

    A chess program will defeat me, but if the additional challenge is added of physically moving the pieces, without dropping them or disturbing another piece, within, say, 5 seconds of deciding a move, then the odds of victory shift massively in my favour, provided I can avoid a beginner's error. There are many things that AIs and the machines they guide and control are still hopeless at.

    They (the AIs) might get their shit together enough to set us back a bit in maybe a few centuries, but there are outposts of humanity beyond their reach, for whom power grids, the internet, PCs, air traffic control systems etc. are a complete irrelevance. The AIs would destroy themselves in eradicating 'only' most of us. This also assumes they were able to reach agreement amongst themselves that our destruction was a shared goal.

  8. Part 1:

    There is a practical attack on point 1. It's easy to take conceptual reasoning for granted because we do it so effortlessly, but you can attempt to prove that it's non-algorithmic, i.e. can't be replicated by any mathematical system. Since we're limited to building things that we can physically affect, we can only build physical machines, and physical systems are governed by mathematics.

    There are a couple of barriers to knock down first. Our brain is physical, so how can the mind do non-algorithmic things? Fortunately it already does. No algorithm in isolation gives rise to the experience of time passing, or of joy or suffering. How can we talk about these experiences if the brain and the physical world don't have access to them? Fortunately there's a hole in quantum mechanics big enough to drive a bus through. If an excited atom inside a sphere of photographic film decays it changes the state of one molecule in the film, and seemingly picks one at random, but nothing says it always has to be random. Influences external to physics are not ruled out - we just never see them in simple experiments.

    We don't need to prove that these barriers don't exist; only to show that it's plausible that they don't. The next part aims to prove by implication that they don't.

    The attack is to show that conceptual reasoning must be assisted in a non-physical way. Obviously we can label concepts with words, and words exist in the physical world, but in this view they are just a label for the concept and not the whole thing. Take the concept of something 'fitting into a gap'. We can take any new situation and ask 'Is this an instance of something fitting into a gap?'. We can generalise it from fitting into a gap in a coordinate or physical space to a gap in a schedule, a gap in knowledge, a gap in time, a gap in my nutritional intake, and so on. An algorithm can't necessarily do this. It may be able to categorise based on a large library of things it's already seen, but the trick is to do it for new, previously unmet situations.

    Fitting into a gap should be an easy one for algorithms. Concepts like belief, need, influence, force, abatement, and vacillation, for example, appear more difficult. It's difficult to imagine how a mathematical representation of them would begin. So, a mathematician with time on their hands might attempt to prove that it can't be done. One approach might be the incompleteness theorem/halting problem trick: say there is an entity that can categorise any situation as either an instance of the concept 'algorithmic', or not. How must it categorise itself?
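    The diagonal trick gestured at here can be sketched concretely. Below, `halts` stands in for the hypothetical perfect classifier; the whole point of the construction is that no such function can exist, so any concrete stand-in is refuted by its own diagonal case:

    ```python
    def make_contrarian(halts):
        """Given a claimed oracle halts(f) -- True iff calling f() eventually
        halts -- build a program whose behaviour refutes the oracle's verdict."""
        def contrarian():
            if halts(contrarian):
                # The oracle says we halt, so loop forever.
                while True:
                    pass
            # The oracle says we loop forever, so halt immediately.
        return contrarian

    # Example: an oracle that classifies everything as non-halting
    # is wrong about its own diagonal case -- contrarian() halts.
    says_never_halts = lambda f: False
    contrarian = make_contrarian(says_never_halts)
    contrarian()
    ```

    The same shape drives the incompleteness theorem: feed the classifier a case built out of the classifier itself, and either answer it gives is wrong.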

  9. Part 2:

    The interesting thing about concepts is that we probably don't have access to the complete set. For me, the question 'What is reality?' finds the concept of reality backed by a vacuum. I expect Zen Buddhists who have achieved enlightenment do have access to that concept, but since I don't, they can't explain it to me.

    (My hypothesis that whacking koans together in a silent mind triggers some sort of privilege escalation bug can probably wait)

    So if you prove that conceptual reasoning is non-physical, you need the cooperation of the thing that does it to get your super-AI to work, and that may be an insurmountable barrier.

    There are other approaches. Time travel into the past which influences the present can mean that history thrashes about until it finds a self-consistent evolution in which time travel never becomes practical. Quantum mechanics appears to do this evaluate-every-possible-path-and-pick-a-self-consistent-one trick, even with closed time-like loops. I believe the jury is still out on whether this type of time travel is supported by nature. If it is, we'll mysteriously suffer a number of setbacks, possibly but not necessarily fatal, that mean we never get to do it. This might offer a reason as to why quantum mechanics is as crazy as it appears - to provide a limit to technical evolution.