How to get better at video games, according to babies - Brian Christian

542,613 views ・ 2021-11-02

TED-Ed



Translator: psjmz mz  Reviewer: Helen Chang
00:08
In 2013, a group of researchers at DeepMind in London had set their sights on a grand challenge. They wanted to create an AI system that could beat, not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN, and less than two years later, it was superhuman. DQN was getting scores 13 times better than professional human games testers at “Breakout,” 17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception. When playing “Montezuma’s Revenge,” DQN couldn’t score a single point, even after playing for weeks. What was it that made this particular game so vexingly difficult for AI? And what would it take to solve it? Spoiler alert: babies. We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning, where the system is designed to maximize some kind of numerical reward. In this case, those rewards were simply the game’s points. This underlying goal drives the system to learn which buttons to press and when to press them to get the most points. Some systems use model-based approaches, where they have a model of the environment that they can use to predict what will happen next once they take a certain action. DQN, however, is model-free. Instead of explicitly modeling its environment, it just learns to predict, based on the images on screen, how many future points it can expect to earn by pressing different buttons. For instance, “if the ball is here and I move left, more points, but if I move right, no more points.”
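To make the “images in, expected points out” idea concrete, here is a minimal sketch, not DeepMind’s code, of a model-free Q-network in PyTorch: it takes a stack of recent screen frames and returns one estimated future score per button, with no model of the game’s rules anywhere. The 84x84 four-frame input and the layer sizes follow the commonly published DQN setup; all names here are illustrative.

```python
# Minimal sketch of a model-free Q-value network (illustrative, not DeepMind's exact code).
# It maps a stack of recent screen frames to one estimated future score per button.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_buttons: int, n_frames: int = 4):
        super().__init__()
        # Convolutions read raw pixels; there is no model of the game's rules anywhere.
        self.features = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.value = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_buttons),  # one expected-points estimate per button
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.value(self.features(frames))

# "If the ball is here and I move left, more points; if I move right, no more points"
# corresponds to comparing the entries of q below and picking the largest one.
net = QNetwork(n_buttons=6)
frames = torch.zeros(1, 4, 84, 84)   # a batch of one 84x84, four-frame observation
q = net(frames)                      # shape (1, 6): estimated future points per button
best_button = q.argmax(dim=1)
```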
02:12
But learning these connections requires a lot of trial and error. The DQN system would start by mashing buttons randomly, and then slowly piece together which buttons to mash when in order to maximize its score. But in playing “Montezuma’s Revenge,” this approach of random button-mashing fell flat on its face. A player would have to perform this entire sequence just to score their first points at the very end. A mistake? Game over. So how could DQN even know it was on the right track?
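The “mash randomly at first, then slowly lean on what you have learned” behaviour is usually implemented as epsilon-greedy action selection. A small illustrative sketch follows; the schedule numbers are assumptions, not DeepMind’s published settings.

```python
# Sketch of the trial-and-error side: epsilon-greedy action selection.
# Early on the agent mashes buttons at random; as training proceeds it relies
# more and more on its learned value estimates. Numbers are illustrative.
import random

def choose_button(q_values, step, n_buttons,
                  eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    # Linearly anneal the exploration rate from eps_start down to eps_end.
    frac = min(step / decay_steps, 1.0)
    epsilon = eps_start + frac * (eps_end - eps_start)
    if random.random() < epsilon:
        return random.randrange(n_buttons)                        # random button mash
    return max(range(n_buttons), key=lambda a: q_values[a])       # best known button
```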
02:47
This is where babies come in. In studies, infants consistently look longer at pictures they haven’t seen before than ones they have. There just seems to be something intrinsically rewarding about novelty. This behavior has been essential in understanding the infant mind. It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way to plug this preference for novelty into reinforcement learning. They made it so that unusual or new images appearing on the screen were every bit as rewarding as real in-game points. Suddenly, DQN was behaving totally differently from before. It wanted to explore the room it was in, to grab the key and escape through the locked door, not because it was worth 100 points, but for the same reason we would: to see what was on the other side. With this new drive, DQN not only managed to grab that first key; it explored all the way through 15 of the temple’s 24 chambers.
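One way to read “new images are every bit as rewarding as real points” in code is to add a novelty bonus to the game’s score before the agent learns from it. DeepMind’s system scored novelty with a learned density model over screens; the count-based stand-in below is a simplified sketch, and every name in it is illustrative.

```python
# Sketch of folding novelty into the reward signal. DeepMind's work used a learned
# density model over screens; here a simple visit count over hashed screens stands in.
from collections import defaultdict

visit_counts = defaultdict(int)

def total_reward(game_points, screen_bytes, bonus_scale=1.0):
    # Hash the raw screen so "have I seen this before?" becomes a dictionary lookup.
    key = hash(screen_bytes)
    visit_counts[key] += 1
    novelty_bonus = bonus_scale / (visit_counts[key] ** 0.5)  # new screens pay the most
    # Unusual images become "every bit as rewarding as real in-game points."
    return game_points + novelty_bonus
```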
03:58
But emphasizing novelty-based rewards can sometimes create more problems than it solves. A novelty-seeking system that’s played a game too long will eventually lose motivation. If it’s seen it all before, why go anywhere? Alternately, if it encounters, say, a television, it will freeze. The constant novel images are essentially paralyzing.
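Continuing the count-based sketch above, a bit of arithmetic shows both failure modes: a bonus like 1/sqrt(count) shrinks toward zero once every screen has been seen thousands of times, while a television’s never-repeating frames keep paying the maximum bonus forever. The function and numbers below are purely illustrative.

```python
# Continuing the illustrative count-based bonus from the earlier sketch.
def bonus(count: int) -> float:
    return 1.0 / (count ** 0.5)

# A room the agent has exhausted: the same screens, each seen thousands of times.
print(bonus(10_000))   # 0.01 -- barely any reason to keep moving

# A television: an endless stream of new screens, each seen exactly once.
print(bonus(1))        # 1.0 on every frame, forever -- so the agent just stands and watches
```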
04:23
The ideas and inspiration here go in both directions. AI researchers stuck on a practical problem, like how to get DQN to beat a difficult game, are turning increasingly to experts in human intelligence for ideas. At the same time, AI is giving us new insights into the ways we get stuck and unstuck: into boredom, depression, and addiction, along with curiosity, creativity, and play.