How to get better at video games, according to babies - Brian Christian

542,613 views ・ 2021-11-02

TED-Ed



Translator: psjmz mz  Reviewer: Helen Chang
00:08
In 2013, a group of researchers at DeepMind in London had set their sights on a grand challenge. They wanted to create an AI system that could beat, not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN, and less than two years later, it was superhuman. DQN was getting scores 13 times better than professional human games testers at “Breakout,” 17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception. When playing “Montezuma’s Revenge,” DQN couldn’t score a single point, even after playing for weeks. What was it that made this particular game so vexingly difficult for AI? And what would it take to solve it? Spoiler alert: babies. We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning, where the system is designed to maximize some kind of numerical reward. In this case, those rewards were simply the game’s points. This underlying goal drives the system to learn which buttons to press and when to press them to get the most points. Some systems use model-based approaches, where they have a model of the environment that they can use to predict what will happen next once they take a certain action. DQN, however, is model-free. Instead of explicitly modeling its environment, it just learns to predict, based on the images on screen, how many future points it can expect to earn by pressing different buttons. For instance, “if the ball is here and I move left, more points, but if I move right, no more points.”
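To make the “images in, expected points out” idea concrete, here is a minimal sketch, not DeepMind’s code, of a model-free Q-network in PyTorch: it takes a stack of recent screen frames and returns one estimated future score per button, with no model of the game’s rules anywhere. The 84x84 four-frame input and the layer sizes follow the commonly published DQN setup; all names here are illustrative.

```python
# Minimal sketch of a model-free Q-value network (illustrative, not DeepMind's exact code).
# It maps a stack of recent screen frames to one estimated future score per button.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_buttons: int, n_frames: int = 4):
        super().__init__()
        # Convolutions read raw pixels; there is no model of the game's rules anywhere.
        self.features = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.value = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_buttons),  # one expected-points estimate per button
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.value(self.features(frames))

# "If the ball is here and I move left, more points; if I move right, no more points"
# corresponds to comparing the entries of q below and picking the largest one.
net = QNetwork(n_buttons=6)
frames = torch.zeros(1, 4, 84, 84)   # a batch of one 84x84, four-frame observation
q = net(frames)                      # shape (1, 6): estimated future points per button
best_button = q.argmax(dim=1)
```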
02:12
But learning these connections requires a lot of trial and error. The DQN system would start by mashing buttons randomly, and then slowly piece together which buttons to mash when in order to maximize its score. But in playing “Montezuma’s Revenge,” this approach of random button-mashing fell flat on its face. A player would have to perform this entire sequence just to score their first points at the very end. A mistake? Game over. So how could DQN even know it was on the right track?
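The “mash randomly at first, then slowly lean on what you have learned” behaviour is usually implemented as epsilon-greedy action selection. A small illustrative sketch follows; the schedule numbers are assumptions, not DeepMind’s published settings.

```python
# Sketch of the trial-and-error side: epsilon-greedy action selection.
# Early on the agent mashes buttons at random; as training proceeds it relies
# more and more on its learned value estimates. Numbers are illustrative.
import random

def choose_button(q_values, step, n_buttons,
                  eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    # Linearly anneal the exploration rate from eps_start down to eps_end.
    frac = min(step / decay_steps, 1.0)
    epsilon = eps_start + frac * (eps_end - eps_start)
    if random.random() < epsilon:
        return random.randrange(n_buttons)                        # random button mash
    return max(range(n_buttons), key=lambda a: q_values[a])       # best known button
```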
02:47
This is where babies come in. In studies, infants consistently look longer at pictures they haven’t seen before than ones they have. There just seems to be something intrinsically rewarding about novelty. This behavior has been essential in understanding the infant mind. It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way to plug this preference for novelty into reinforcement learning. They made it so that unusual or new images appearing on the screen were every bit as rewarding as real in-game points. Suddenly, DQN was behaving totally differently from before. It wanted to explore the room it was in, to grab the key and escape through the locked door, not because it was worth 100 points, but for the same reason we would: to see what was on the other side. With this new drive, DQN not only managed to grab that first key; it explored all the way through 15 of the temple’s 24 chambers.
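One way to read “new images are every bit as rewarding as real points” in code is to add a novelty bonus to the game’s score before the agent learns from it. DeepMind’s system scored novelty with a learned density model over screens; the count-based stand-in below is a simplified sketch, and every name in it is illustrative.

```python
# Sketch of folding novelty into the reward signal. DeepMind's work used a learned
# density model over screens; here a simple visit count over hashed screens stands in.
from collections import defaultdict

visit_counts = defaultdict(int)

def total_reward(game_points, screen_bytes, bonus_scale=1.0):
    # Hash the raw screen so "have I seen this before?" becomes a dictionary lookup.
    key = hash(screen_bytes)
    visit_counts[key] += 1
    novelty_bonus = bonus_scale / (visit_counts[key] ** 0.5)  # new screens pay the most
    # Unusual images become "every bit as rewarding as real in-game points."
    return game_points + novelty_bonus
```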
03:58
But emphasizing novelty-based rewards can sometimes create more problems than it solves. A novelty-seeking system that’s played a game too long will eventually lose motivation. If it’s seen it all before, why go anywhere? Alternately, if it encounters, say, a television, it will freeze. The constant novel images are essentially paralyzing.
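Continuing the count-based sketch above, a bit of arithmetic shows both failure modes: a bonus like 1/sqrt(count) shrinks toward zero once every screen has been seen thousands of times, while a television’s never-repeating frames keep paying the maximum bonus forever. The function and numbers below are purely illustrative.

```python
# Continuing the illustrative count-based bonus from the earlier sketch.
def bonus(count: int) -> float:
    return 1.0 / (count ** 0.5)

# A room the agent has exhausted: the same screens, each seen thousands of times.
print(bonus(10_000))   # 0.01 -- barely any reason to keep moving

# A television: an endless stream of new screens, each seen exactly once.
print(bonus(1))        # 1.0 on every frame, forever -- so the agent just stands and watches
```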
04:23
The ideas and inspiration here go in both directions. AI researchers stuck on a practical problem, like how to get DQN to beat a difficult game, are turning increasingly to experts in human intelligence for ideas. At the same time, AI is giving us new insights into the ways we get stuck and unstuck: into boredom, depression, and addiction, along with curiosity, creativity, and play.