How to get better at video games, according to babies - Brian Christian

559,494 views ・ 2021-11-02

TED-Ed



Translator: psjmz mz   Reviewer: Helen Chang
00:08
In 2013, a group of researchers at DeepMind in London had set their sights on a grand challenge. They wanted to create an AI system that could beat not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN, and less than two years later, it was superhuman. DQN was getting scores 13 times better than professional human games testers at “Breakout,” 17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception. When playing “Montezuma’s Revenge,” DQN couldn’t score a single point, even after playing for weeks.
01:01
What was it that made this particular game so vexingly difficult for AI? And what would it take to solve it? Spoiler alert: babies. We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning, where the system is designed to maximize some kind of numerical reward. In this case, those rewards were simply the game’s points. This underlying goal drives the system to learn which buttons to press, and when to press them, to get the most points.
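To make that loop concrete, here is a minimal sketch of the reward-maximizing setup in Python, assuming the Gymnasium package with the Atari extras (ale-py and the ROMs) is installed. The "agent" here is just a stand-in that presses random buttons; the only signal it ever receives is the game's points.

    import gymnasium as gym  # assumes: pip install "gymnasium[atari]" plus the Atari ROMs

    env = gym.make("ALE/Breakout-v5")        # the Atari game
    observation, info = env.reset()

    total_points = 0.0
    done = False
    while not done:
        action = env.action_space.sample()   # "which button to press" (random stand-in agent)
        observation, reward, terminated, truncated, info = env.step(action)
        total_points += reward               # the only learning signal: the game's points
        done = terminated or truncated

    print("Points this episode:", total_points)

A learning system replaces that random choice with one informed by everything it has seen so far, which is what DQN does next.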
01:38
Some systems use model-based approaches, where they have a model of the environment that they can use to predict what will happen next once they take a certain action. DQN, however, is model-free. Instead of explicitly modeling its environment, it just learns to predict, based on the images on screen, how many future points it can expect to earn by pressing different buttons.
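As a rough illustration of that model-free idea, the sketch below (PyTorch, with illustrative layer sizes rather than DeepMind's exact architecture) maps a stack of screen images to one "expected future points" estimate per button.

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """Looks at the screen, outputs one expected-score estimate per button."""
        def __init__(self, num_buttons: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),  # 4 stacked 84x84 frames in
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.LazyLinear(512), nn.ReLU(),
                nn.Linear(512, num_buttons),   # one value per button: "how many points can I expect?"
            )

        def forward(self, frames):             # frames: (batch, 4, 84, 84)
            return self.net(frames)

    q = QNetwork(num_buttons=18)               # Atari joysticks have up to 18 distinct actions
    values = q(torch.zeros(1, 4, 84, 84))      # compare, say, the estimates for LEFT vs. RIGHT
    best_button = values.argmax(dim=1)         # press whichever button promises the most points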
02:03
For instance, “if the ball is here and I move left, more points, but if I move right, no more points.” But learning these connections requires a lot of trial and error. The DQN system would start by mashing buttons randomly, and then slowly piece together which buttons to mash, and when, in order to maximize its score.
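That trial-and-error rule can be shown on a toy, self-contained example: tabular Q-learning on a made-up one-dimensional "move toward the ball" task, not the real Atari setup. It starts by choosing moves at random, nudges each estimate toward "points received plus the best estimate of what comes next," and relies on random mashing less and less as the estimates firm up.

    import random

    positions, actions = range(5), ["left", "right"]    # toy states and two buttons
    Q = {(s, a): 0.0 for s in positions for a in actions}
    alpha, gamma, epsilon = 0.1, 0.99, 1.0               # learning rate, discount, exploration rate

    def step(s, a):                                      # toy game: points only at position 4
        s2 = max(0, s - 1) if a == "left" else min(4, s + 1)
        return s2, (1.0 if s2 == 4 else 0.0)

    s = 0
    for _ in range(10_000):
        if random.random() < epsilon:
            a = random.choice(actions)                   # mash a random button
        else:
            a = max(actions, key=lambda b: Q[(s, b)])    # press the most promising button
        s2, reward = step(s, a)
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])  # Q-learning update
        s = s2
        epsilon = max(0.05, epsilon * 0.999)             # rely less on random mashing over time

    print(max(actions, key=lambda b: Q[(0, b)]))         # learned answer: "right" earns more points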
02:26
But in playing “Montezuma’s Revenge,” this approach of random button-mashing fell flat on its face. A player would have to perform this entire sequence just to score their first points at the very end. A mistake? Game over. So how could DQN even know it was on the right track?
02:47
This is where babies come in. In studies, infants consistently look longer at pictures they haven’t seen before than ones they have. There just seems to be something intrinsically rewarding about novelty. This behavior has been essential in understanding the infant mind. It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way to plug this preference for novelty into reinforcement learning. They made it so that unusual or new images appearing on the screen were every bit as rewarding as real in-game points.
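One simple way to approximate that trick is sketched below. The published approach used "pseudo-counts" from a learned density model over screens; this stand-in just hashes a downscaled, quantized screen and pays a bonus that is large for images it has rarely seen and fades with familiarity. The function names and the bonus scale are illustrative.

    from collections import Counter
    import numpy as np

    seen = Counter()
    BONUS_SCALE = 0.01                              # how much a brand-new screen is "worth" (illustrative)

    def novelty_bonus(frame: np.ndarray) -> float:
        small = frame[::8, ::8] // 32               # downscale & quantize: near-identical screens collide
        key = small.tobytes()
        seen[key] += 1
        return BONUS_SCALE / np.sqrt(seen[key])     # big for new screens, shrinks as they become familiar

    def shaped_reward(game_points: float, frame: np.ndarray) -> float:
        # New or unusual screens are rewarding in their own right, alongside real points.
        return game_points + novelty_bonus(frame)

The agent then maximizes shaped_reward instead of raw points, so reaching a never-before-seen chamber pays off even before the first real point arrives.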
03:29
Suddenly, DQN was behaving totally differently from before. It wanted to explore the room it was in, to grab the key and escape through the locked door, not because it was worth 100 points, but for the same reason we would: to see what was on the other side. With this new drive, DQN not only managed to grab that first key; it explored all the way through 15 of the temple’s 24 chambers.
03:58
But emphasizing novelty-based rewards can sometimes create more problems than it solves. A novelty-seeking system that’s played a game too long will eventually lose motivation. If it’s seen it all before, why go anywhere? Alternatively, if it encounters, say, a television, it will freeze. The constant stream of novel images is essentially paralyzing.
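Continuing the count-based sketch above, a few lines make both failure modes visible: a room revisited thousands of times pays out almost nothing (boredom), while a source of endlessly changing random screens keeps paying the full bonus forever (the "television"). The frame sizes and visit counts are illustrative.

    from collections import Counter
    import numpy as np

    seen = Counter()

    def novelty_bonus(frame, scale=0.01):
        key = (frame[::8, ::8] // 32).tobytes()
        seen[key] += 1
        return scale / np.sqrt(seen[key])

    rng = np.random.default_rng(0)
    familiar_room = np.zeros((210, 160), dtype=np.uint8)     # a screen the agent keeps coming back to

    for _ in range(10_000):                                   # after thousands of visits...
        bored = novelty_bonus(familiar_room)
    tv_static = novelty_bonus(rng.integers(0, 256, (210, 160), dtype=np.uint8))

    print(f"familiar room: {bored:.5f}   TV static: {tv_static:.5f}")   # nearly zero versus the full bonus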
04:23
The ideas and inspiration here go in both directions. AI researchers stuck on a practical problem, like how to get DQN to beat a difficult game, are turning increasingly to experts in human intelligence for ideas. At the same time, AI is giving us new insights into the ways we get stuck and unstuck: into boredom, depression, and addiction, along with curiosity, creativity, and play.