3 principles for creating safer AI | Stuart Russell

2017-06-06

TED


00:12
This is Lee Sedol. Lee Sedol is one of the world's greatest Go players, and he's having what my friends in Silicon Valley call a "Holy Cow" moment -- (Laughter) a moment where we realize that AI is actually progressing a lot faster than we expected.

00:29
So humans have lost on the Go board. What about the real world? Well, the real world is much bigger, much more complicated than the Go board. It's a lot less visible, but it's still a decision problem. And if we think about some of the technologies that are coming down the pike ... Noriko [Arai] mentioned that reading is not yet happening in machines, at least with understanding. But that will happen, and when that happens, very soon afterwards, machines will have read everything that the human race has ever written. And that, along with the ability to look further ahead than humans can, as we've already seen in Go, and access to more information, will enable machines to make better decisions in the real world than we can.

01:18
So is that a good thing? Well, I hope so. Our entire civilization, everything that we value, is based on our intelligence. And if we had access to a lot more intelligence, then there's really no limit to what the human race can do. And I think this could be, as some people have described it, the biggest event in human history.
01:48
So why are people saying things like this, that AI might spell the end of the human race? Is this a new thing? Is it just Elon Musk and Bill Gates and Stephen Hawking? Actually, no. This idea has been around for a while. Here's a quotation: "Even if we could keep the machines in a subservient position, for instance, by turning off the power at strategic moments" -- and I'll come back to that "turning off the power" idea later on -- "we should, as a species, feel greatly humbled." So who said this? This is Alan Turing in 1951. Alan Turing, as you know, is the father of computer science and in many ways, the father of AI as well.

02:33
So if we think about this problem, the problem of creating something more intelligent than your own species, we might call this "the gorilla problem," because gorillas' ancestors did this a few million years ago, and now we can ask the gorillas: Was this a good idea? So here they are having a meeting to discuss whether it was a good idea, and after a little while, they conclude, no, this was a terrible idea. Our species is in dire straits. In fact, you can see the existential sadness in their eyes. (Laughter)

03:06
So this queasy feeling that making something smarter than your own species is maybe not a good idea -- what can we do about that? Well, really nothing, except stop doing AI, and because of all the benefits that I mentioned and because I'm an AI researcher, I'm not having that. I actually want to be able to keep doing AI. So we actually need to nail down the problem a bit more. What exactly is the problem? Why is better AI possibly a catastrophe?
03:39
So here's another quotation: "We had better be quite sure that the purpose put into the machine is the purpose which we really desire." This was said by Norbert Wiener in 1960, shortly after he watched one of the very early learning systems learn to play checkers better than its creator.

04:00
But this could equally have been said by King Midas. King Midas said, "I want everything I touch to turn to gold," and he got exactly what he asked for. That was the purpose that he put into the machine, so to speak, and then his food and his drink and his relatives turned to gold and he died in misery and starvation. So we'll call this "the King Midas problem" of stating an objective which is not, in fact, truly aligned with what we want. In modern terms, we call this "the value alignment problem."
04:36
Putting in the wrong objective is not the only part of the problem. There's another part. If you put an objective into a machine, even something as simple as, "Fetch the coffee," the machine says to itself, "Well, how might I fail to fetch the coffee? Someone might switch me off. OK, I have to take steps to prevent that. I will disable my 'off' switch. I will do anything to defend myself against interference with this objective that I have been given." So this single-minded pursuit in a very defensive mode of an objective that is, in fact, not aligned with the true objectives of the human race -- that's the problem that we face.

05:18
And in fact, that's the high-value takeaway from this talk. If you want to remember one thing, it's that you can't fetch the coffee if you're dead. (Laughter) It's very simple. Just remember that. Repeat it to yourself three times a day. (Laughter)

05:35
And in fact, this is exactly the plot of "2001: [A Space Odyssey]": HAL has an objective, a mission, which is not aligned with the objectives of the humans, and that leads to this conflict. Now fortunately, HAL is not superintelligent. He's pretty smart, but eventually Dave outwits him and manages to switch him off.

06:01
But we might not be so lucky.
06:08
So what are we going to do? I'm trying to redefine AI to get away from this classical notion of machines that intelligently pursue objectives.

06:22
There are three principles involved. The first one is a principle of altruism, if you like, that the robot's only objective is to maximize the realization of human objectives, of human values. And by values here I don't mean touchy-feely, goody-goody values. I just mean whatever it is that the human would prefer their life to be like. And so this actually violates Asimov's law that the robot has to protect its own existence. It has no interest in preserving its existence whatsoever.

06:57
The second law is a law of humility, if you like. And this turns out to be really important to make robots safe. It says that the robot does not know what those human values are, so it has to maximize them, but it doesn't know what they are. And that avoids this problem of single-minded pursuit of an objective. This uncertainty turns out to be crucial.

07:21
Now, in order to be useful to us, it has to have some idea of what we want. It obtains that information primarily by observation of human choices, so our own choices reveal information about what it is that we prefer our lives to be like.
07:40
So those are the three principles. Let's see how that applies to this question of: "Can you switch the machine off?" as Turing suggested.

07:48
So here's a PR2 robot. This is one that we have in our lab, and it has a big red "off" switch right on the back. The question is: Is it going to let you switch it off? If we do it the classical way, we give it the objective of, "Fetch the coffee, I must fetch the coffee, I can't fetch the coffee if I'm dead," so obviously the PR2 has been listening to my talk, and so it says, therefore, "I must disable my 'off' switch, and probably taser all the other people in Starbucks who might interfere with me." (Laughter)

08:21
So this seems to be inevitable, right? This kind of failure mode seems to be inevitable, and it follows from having a concrete, definite objective.
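
To make that failure mode concrete, here is a minimal sketch in Python. The numbers are made up for illustration (the 30% chance of being switched off is an assumption, not a figure from the talk); the point is only that, given a fixed and certain objective, disabling the off switch maximizes that objective.

```python
# Sketch: under a fixed, certain objective ("maximize the probability that
# the coffee gets fetched"), disabling the off switch dominates.
# The switch-off probability below is an illustrative assumption.

p_switched_off = 0.3  # assumed chance a human hits the off switch mid-task

plans = {
    "fetch coffee directly": 1.0 - p_switched_off,   # can be interrupted
    "disable off switch, then fetch coffee": 1.0,    # cannot be interrupted
}

# The objective assigns value only to "coffee fetched"; nothing else counts.
for plan, p in plans.items():
    print(f"P(coffee fetched | {plan}) = {p:.2f}")
print("Optimal plan under the fixed objective:", max(plans, key=plans.get))
```
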
08:30
So what happens if the machine is uncertain about the objective? Well, it reasons in a different way. It says, "OK, the human might switch me off, but only if I'm doing something wrong. Well, I don't really know what wrong is, but I know that I don't want to do it." So that's the first and second principles right there. "So I should let the human switch me off." And in fact you can calculate the incentive that the robot has to allow the human to switch it off, and it's directly tied to the degree of uncertainty about the underlying objective.

09:05
And then when the machine is switched off, that third principle comes into play. It learns something about the objectives it should be pursuing, because it learns that what it did wasn't right. In fact, with suitable use of Greek symbols, as mathematicians usually do, we can actually prove a theorem that says that such a robot is provably beneficial to the human. You are provably better off with a machine that's designed in this way than without it. So this is a very simple example, but this is the first step in what we're trying to do with human-compatible AI.
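
That incentive can be sketched numerically. A minimal model (my assumptions, not the talk's formal result): the robot's belief about the hidden utility u of its intended action is Gaussian, and the human, if asked, allows the action exactly when u >= 0. Then deferring is worth E[max(u, 0)], which is never less than the value E[u] of just acting, and the gap grows with the robot's uncertainty:

```python
# Sketch of the off-switch incentive: compare "just act" with "defer to the
# human, who may switch me off." Assumes a Gaussian belief over the hidden
# utility u of the action and a human who allows it exactly when u >= 0.

import math

def value_of_acting(mean: float) -> float:
    """Expected utility of acting immediately: E[u]."""
    return mean

def value_of_deferring(mean: float, std: float) -> float:
    """Expected utility when the human decides: E[max(u, 0)] under N(mean, std^2)."""
    if std == 0:
        return max(mean, 0.0)
    z = mean / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))          # standard normal cdf
    return mean * cdf + std * pdf

if __name__ == "__main__":
    mean = 0.5  # the robot thinks the action is probably good
    for std in (0.1, 0.5, 1.0, 2.0):
        act, defer = value_of_acting(mean), value_of_deferring(mean, std)
        print(f"uncertainty std={std:3.1f}: act={act:+.3f}  "
              f"defer={defer:+.3f}  incentive to defer={defer - act:+.3f}")
```

In this toy model the incentive is never negative and shrinks to zero as the uncertainty disappears, which is the flavor of the "provably better off" claim; with no uncertainty at all, the machine is back to the coffee-fetching failure mode.
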
09:42
Now, this third principle, I think, is the one that you're probably scratching your head over. You're probably thinking, "Well, you know, I behave badly. I don't want my robot to behave like me. I sneak down in the middle of the night and take stuff from the fridge. I do this and that." There's all kinds of things you don't want the robot doing. But in fact, it doesn't quite work that way. Just because you behave badly doesn't mean the robot is going to copy your behavior. It's going to understand your motivations and maybe help you resist them, if appropriate.

10:16
But it's still difficult. What we're trying to do, in fact, is to allow machines to predict for any person and for any possible life that they could live, and the lives of everybody else: Which would they prefer? And there are many, many difficulties involved in doing this; I don't expect that this is going to get solved very quickly. The real difficulties, in fact, are us.

10:43
As I have already mentioned, we behave badly. In fact, some of us are downright nasty. Now the robot, as I said, doesn't have to copy the behavior. The robot does not have any objective of its own. It's purely altruistic. And it's not designed just to satisfy the desires of one person, the user, but in fact it has to respect the preferences of everybody. So it can deal with a certain amount of nastiness, and it can even understand your nastiness -- for example, that you may take bribes as a passport official because you need to feed your family and send your kids to school. It can understand that; it doesn't mean it's going to steal. In fact, it'll just help you send your kids to school.

11:28
We are also computationally limited. Lee Sedol is a brilliant Go player, but he still lost. So if we look at his actions, he took an action that lost the game. That doesn't mean he wanted to lose. So to understand his behavior, we actually have to invert through a model of human cognition that includes our computational limitations -- a very complicated model. But it's still something that we can work on understanding.
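
One standard way to model such a limited decision-maker (a common choice in the inverse-reinforcement-learning literature, not something specified in the talk) is Boltzmann rationality: better actions are chosen more often, but not always. In the toy example below, the goals, payoffs, and rationality parameter are all illustrative; the point is that an imperfect move that loses can still be strong evidence that the player wanted to win.

```python
# Toy inverse-planning sketch: infer a goal from an observed, imperfect choice
# by modelling the chooser as noisily (Boltzmann) rational rather than optimal.
# Goals, payoffs, and the rationality parameter beta are illustrative.

import math

def choice_probability(values, chosen, beta):
    """P(chosen action | goal) under a softmax with rationality parameter beta."""
    weights = [math.exp(beta * v) for v in values]
    return weights[chosen] / sum(weights)

# Value of actions A, B, C under two hypothetical goals.
payoffs = {
    "wants to win":  [1.0, 0.8, -1.0],
    "wants to lose": [-1.0, -0.8, 1.0],
}

observed = 1       # the player chose B: good for winning, but not the best move
beta = 3.0         # higher beta means closer to perfectly rational
prior = {goal: 0.5 for goal in payoffs}

# Bayes' rule: P(goal | action) is proportional to P(action | goal) * P(goal)
unnormalized = {
    goal: choice_probability(values, observed, beta) * prior[goal]
    for goal, values in payoffs.items()
}
total = sum(unnormalized.values())
for goal, weight in unnormalized.items():
    print(f"P({goal} | observed move) = {weight / total:.3f}")
```
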
11:57
Probably the most difficult part, from my point of view as an AI researcher, is the fact that there are lots of us, and so the machine has to somehow trade off, weigh up the preferences of many different people, and there are different ways to do that. Economists, sociologists, moral philosophers have understood that, and we are actively looking for collaboration.
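
Two of the aggregation rules those fields study are a straight utilitarian sum and Nash welfare, the product of individual utilities. The options and numbers below are made up; the sketch only shows that the choice of rule can change which option a machine serving several people would pick.

```python
# Toy comparison of two standard preference-aggregation rules.
# The options and utility numbers are invented for illustration.

from math import prod

# utilities[option] lists how much each of three people values that option.
utilities = {
    "option_A": [10.0, 1.0, 1.0],  # great for one person, poor for the rest
    "option_B": [4.0, 4.0, 3.0],   # reasonably good for everyone
}

def utilitarian(us):
    """Sum of utilities: maximizes the total, can favor lopsided outcomes."""
    return sum(us)

def nash_welfare(us):
    """Product of utilities: favors outcomes that leave no one badly off."""
    return prod(us)

for rule in (utilitarian, nash_welfare):
    scores = {option: rule(us) for option, us in utilities.items()}
    best = max(scores, key=scores.get)
    print(f"{rule.__name__:>12}: scores={scores}  ->  picks {best}")
```
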
12:20
Let's have a look and see what happens when you get that wrong. So you can have a conversation, for example, with your intelligent personal assistant that might be available in a few years' time. Think of a Siri on steroids.

12:33
So Siri says, "Your wife called to remind you about dinner tonight." And of course, you've forgotten. "What? What dinner? What are you talking about?" "Uh, your 20th anniversary at 7pm." "I can't do that. I'm meeting with the secretary-general at 7:30. How could this have happened?" "Well, I did warn you, but you overrode my recommendation." "Well, what am I going to do? I can't just tell him I'm too busy." "Don't worry. I arranged for his plane to be delayed." (Laughter) "Some kind of computer malfunction." (Laughter) "Really? You can do that?" "He sends his profound apologies and looks forward to meeting you for lunch tomorrow." (Laughter)

13:22
So the values here -- there's a slight mistake going on. This is clearly following my wife's values, which is "Happy wife, happy life." (Laughter)

13:33
It could go the other way. You could come home after a hard day's work, and the computer says, "Long day?" "Yes, I didn't even have time for lunch." "You must be very hungry." "Starving, yeah. Could you make some dinner?" "There's something I need to tell you." (Laughter) "There are humans in South Sudan who are in more urgent need than you." (Laughter) "So I'm leaving. Make your own dinner." (Laughter)
14:02
So we have to solve these problems, and I'm looking forward to working on them. There are reasons for optimism. One reason is, there is a massive amount of data. Because remember -- I said they're going to read everything the human race has ever written. Most of what we write about is human beings doing things and other people getting upset about it. So there's a massive amount of data to learn from.

14:23
There's also a very strong economic incentive to get this right. So imagine your domestic robot's at home. You're late from work again and the robot has to feed the kids, and the kids are hungry and there's nothing in the fridge. And the robot sees the cat. (Laughter) And the robot hasn't quite learned the human value function properly, so it doesn't understand that the sentimental value of the cat outweighs the nutritional value of the cat. (Laughter) So then what happens? Well, it happens like this: "Deranged robot cooks kitty for family dinner." That one incident would be the end of the domestic robot industry. So there's a huge incentive to get this right long before we reach superintelligent machines.

15:11
So to summarize: I'm actually trying to change the definition of AI so that we have provably beneficial machines. And the principles are: machines that are altruistic, that want to achieve only our objectives, but that are uncertain about what those objectives are, and will watch all of us to learn more about what it is that we really want. And hopefully in the process, we will learn to be better people. Thank you very much.

15:38
(Applause)
15:42
Chris Anderson: So interesting, Stuart. We're going to stand here a bit because I think they're setting up for our next speaker. A couple of questions. So the idea of programming in ignorance seems intuitively really powerful. As you get to superintelligence, what's going to stop a robot reading literature and discovering this idea that knowledge is actually better than ignorance and still just shifting its own goals and rewriting that programming?

16:09
Stuart Russell: Yes, so we want it to learn more, as I said, about our objectives. It'll only become more certain as it becomes more correct, so the evidence is there and it's going to be designed to interpret it correctly. It will understand, for example, that books are very biased in the evidence they contain. They only talk about kings and princes and elite white male people doing stuff. So it's a complicated problem, but as it learns more about our objectives it will become more and more useful to us.

16:46
CA: And you couldn't just boil it down to one law, you know, hardwired in: "If any human ever tries to switch me off, I comply. I comply."

16:55
SR: Absolutely not. That would be a terrible idea. So imagine that you have a self-driving car and you want to send your five-year-old off to preschool. Do you want your five-year-old to be able to switch off the car while it's driving along? Probably not. So it needs to understand how rational and sensible the person is. The more rational the person, the more willing you are to be switched off. If the person is completely random or even malicious, then you're less willing to be switched off.
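
That dependence can be added to the earlier off-switch sketch. Below, the human's switch-off decision is wrong with some probability (an illustrative parameter, not from the talk); as that error rate rises toward a coin flip, deferring becomes worth less than simply acting.

```python
# Sketch: how much deferring to the human is worth when the human's
# switch-off decisions are fallible. "error_rate" is an illustrative
# parameter: the probability the human decides incorrectly.

import random

random.seed(0)

def value_of_deferring(mean, std, error_rate, samples=200_000):
    """Monte Carlo estimate of expected utility when a fallible human decides."""
    total = 0.0
    for _ in range(samples):
        u = random.gauss(mean, std)      # true (hidden) utility of the action
        should_allow = u >= 0.0          # what a correct decision would be
        err = random.random() < error_rate
        allows = (not should_allow) if err else should_allow
        total += u if allows else 0.0
    return total / samples

mean, std = 0.5, 2.0
print(f"value of just acting: {mean:+.3f}")
for error_rate in (0.0, 0.2, 0.5):
    v = value_of_deferring(mean, std, error_rate)
    print(f"human error rate {error_rate:.1f}: value of deferring = {v:+.3f}")
```
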
17:24
CA: All right. Stuart, can I just say, I really, really hope you figure this out for us. Thank you so much for that talk. That was amazing.

17:30
SR: Thank you.

17:31
(Applause)