How to get better at video games, according to babies - Brian Christian
552,809 views ・ 2021-11-02
Translator: Lilian Chiu
Reviewer: Helen Chang
00:08
In 2013, a group of researchers at DeepMind in London
00:13
had set their sights on a grand challenge.
00:15
They wanted to create an AI system that could beat,
00:19
not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN,
00:29
and less than two years later, it was superhuman.
00:33
DQN was getting scores 13 times better
00:38
than professional human games testers at “Breakout,”
00:41
17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception.
00:52
When playing “Montezuma’s Revenge,” DQN couldn’t score a single point,
00:58
even after playing for weeks.
01:01
What was it that made this particular game so vexingly difficult for AI?
01:07
And what would it take to solve it?
01:10
Spoiler alert: babies.
01:13
We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning,
01:21
where the system is designed to maximize some kind of numerical rewards.
01:26
In this case, those rewards were simply the game’s points.
01:30
This underlying goal drives the system to learn which buttons to press
01:35
and when to press them to get the most points.
01:38
Some systems use model-based approaches, where they have a model of the environment
01:43
that they can use to predict what will happen next
01:46
once they take a certain action.
01:49
DQN, however, is model-free.
01:52
Instead of explicitly modeling its environment,
01:55
it just learns to predict, based on the images on screen,
01:58
how many future points it can expect to earn by pressing different buttons.
02:03
For instance, “if the ball is here and I move left, more points,
02:08
but if I move right, no more points.”
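That model-free prediction can be pictured as a function Q(state, action) that estimates the future points each button is worth. A minimal sketch, assuming toy states and a lookup table in place of DQN’s actual neural network (which maps raw screen pixels to one value per button):

```python
# Sketch of a model-free action-value estimate:
# Q(state, action) ~ expected future points, learned from experience.
# The state names and values below are illustrative, not the real Atari setup.

ACTIONS = ["noop", "left", "right", "fire"]

# A lookup table stands in for DQN's network over screen images.
q_values = {
    ("ball_left", "left"): 8.0,   # moving toward the ball: more points expected
    ("ball_left", "right"): 1.5,  # moving away: fewer points expected
    ("ball_left", "noop"): 3.0,
    ("ball_left", "fire"): 3.0,
}

def best_action(state):
    """Pick the button with the highest predicted future score."""
    return max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))

print(best_action("ball_left"))  # "left": more points predicted on that side
```

No model of the game is involved anywhere: the table never says what the next screen will look like, only how many points each button is expected to earn.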
02:12
But learning these connections requires a lot of trial and error.
02:16
The DQN system would start by mashing buttons randomly,
02:20
and then slowly piece together which buttons to mash when
02:24
in order to maximize its score.
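The trial-and-error loop described above is standard Q-learning with an epsilon-greedy policy: act randomly at first, then lean more and more on the learned value estimates. A hedged sketch, with illustrative states and constants rather than DQN’s real training setup:

```python
import random

ACTIONS = ["noop", "left", "right", "fire"]
q = {}  # (state, action) -> estimated future points

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Nudge Q(state, action) toward reward + discounted best next value."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def choose_action(state, epsilon):
    """Epsilon-greedy: mostly exploit learned values, sometimes mash randomly."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

Early on, epsilon is high and the agent is pure button-mashing; as scores accumulate into `q`, epsilon is decayed and the policy pieces together which buttons pay off when.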
02:26
But in playing “Montezuma’s Revenge,”
02:29
this approach of random button-mashing fell flat on its face.
02:34
A player would have to perform this entire sequence
02:37
just to score their first points at the very end.
02:40
A mistake? Game over.
02:43
So how could DQN even know it was on the right track?
02:47
This is where babies come in.
02:50
In studies, infants consistently look longer at pictures
02:54
they haven’t seen before than ones they have.
02:57
There just seems to be something intrinsically rewarding about novelty.
03:02
This behavior has been essential in understanding the infant mind.
03:06
It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way
03:15
to plug this preference for novelty into reinforcement learning.
03:20
They made it so that unusual or new images appearing on the screen
03:25
were every bit as rewarding as real in-game points.
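One simple way to make new screens pay out like points is a count-based novelty bonus that shrinks as a screen grows familiar. This is a sketch of the idea only; the formula, scale, and use of exact screen matches are assumptions here, not DeepMind’s actual method, which measured novelty over learned features of the screen:

```python
from collections import Counter
from math import sqrt

visit_counts = Counter()

def total_reward(game_points, screen, bonus_scale=1.0):
    """Game points plus an intrinsic bonus that decays with familiarity."""
    visit_counts[screen] += 1
    novelty_bonus = bonus_scale / sqrt(visit_counts[screen])
    return game_points + novelty_bonus

print(total_reward(0, "new_room"))  # 1.0: a never-seen screen pays a full point
print(total_reward(0, "new_room"))  # ~0.707: already less interesting
```

Because the bonus feeds into the same reward the Q-values maximize, reaching an unfamiliar room is now “worth something” even when the game’s own score stays at zero.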
03:29
Suddenly, DQN was behaving totally differently from before.
03:34
It wanted to explore the room it was in,
03:36
to grab the key and escape through the locked door—
03:39
not because it was worth 100 points,
03:42
but for the same reason we would: to see what was on the other side.
03:48
With this new drive, DQN not only managed to grab that first key—
03:53
it explored all the way through 15 of the temple’s 24 chambers.
03:58
But emphasizing novelty-based rewards can sometimes create more problems
04:02
than it solves.
04:03
A novelty-seeking system that’s played a game too long
04:07
will eventually lose motivation.
04:09
If it’s seen it all before, why go anywhere?
04:13
Alternately, if it encounters, say, a television, it will freeze.
04:18
The constant novel images are essentially paralyzing.
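This “noisy TV” failure mode falls straight out of a count-based bonus like the one sketched above. A repeated screen stops paying, but a source of endless unique frames never does, so staring at it out-rewards the rest of the game (screen names and the 1/sqrt(count) bonus are illustrative assumptions):

```python
from collections import Counter
from math import sqrt

visit_counts = Counter()

def novelty_bonus(screen):
    visit_counts[screen] += 1
    return 1.0 / sqrt(visit_counts[screen])

# A corridor repeats the same screen, so its bonus decays and the agent moves on.
corridor = sum(novelty_bonus("same_hallway") for _ in range(100))

# A TV shows static: every frame is unique, so the bonus never shrinks.
tv = sum(novelty_bonus(f"static_frame_{i}") for i in range(100))

print(corridor < tv)  # True: watching static beats exploring the corridor
```

A purely novelty-driven agent facing this choice keeps collecting the full bonus from the television forever, which is exactly the paralysis the transcript describes.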
04:23
The ideas and inspiration here go in both directions.
04:27
AI researchers stuck on a practical problem,
04:30
like how to get DQN to beat a difficult game,
04:33
are turning increasingly to experts in human intelligence for ideas.
04:38
At the same time,
04:39
AI is giving us new insights into the ways we get stuck and unstuck:
04:45
into boredom, depression, and addiction,
04:48
along with curiosity, creativity, and play.