How to get better at video games, according to babies - Brian Christian
552,809 views ・ 2021-11-02
Translator: Lilian Chiu
Reviewer: Helen Chang
00:08
In 2013, a group of researchers at DeepMind in London
00:13
had set their sights on a grand challenge.
00:15
They wanted to create an AI system that could beat,
00:19
not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN,
00:29
and less than two years later, it was superhuman.
00:33
DQN was getting scores 13 times better
00:38
than professional human games testers at “Breakout,”
00:41
17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception.
00:52
When playing “Montezuma’s Revenge,” DQN couldn’t score a single point,
00:58
even after playing for weeks.
01:01
What was it that made this particular game so vexingly difficult for AI?
01:07
And what would it take to solve it?
01:10
Spoiler alert: babies.
01:13
We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning,
01:21
where the system is designed to maximize some kind of numerical rewards.
01:26
In this case, those rewards were simply the game’s points.
01:30
This underlying goal drives the system to learn which buttons to press
01:35
and when to press them to get the most points.
01:38
Some systems use model-based approaches, where they have a model of the environment
01:43
that they can use to predict what will happen next
01:46
once they take a certain action.
01:49
DQN, however, is model-free.
01:52
Instead of explicitly modeling its environment,
01:55
it just learns to predict, based on the images on screen,
01:58
how many future points it can expect to earn by pressing different buttons.
02:03
For instance, “if the ball is here and I move left, more points,
02:08
but if I move right, no more points.”
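That model-free prediction can be pictured as a function Q(state, action) that estimates the future points each button is worth. A minimal sketch, assuming toy states and a lookup table in place of DQN’s actual neural network (which maps raw screen pixels to one value per button):

```python
# Sketch of a model-free action-value estimate:
# Q(state, action) ~ expected future points, learned from experience.
# The state names and values below are illustrative, not the real Atari setup.

ACTIONS = ["noop", "left", "right", "fire"]

# A lookup table stands in for DQN's network over screen images.
q_values = {
    ("ball_left", "left"): 8.0,   # moving toward the ball: more points expected
    ("ball_left", "right"): 1.5,  # moving away: fewer points expected
    ("ball_left", "noop"): 3.0,
    ("ball_left", "fire"): 3.0,
}

def best_action(state):
    """Pick the button with the highest predicted future score."""
    return max(ACTIONS, key=lambda a: q_values.get((state, a), 0.0))

print(best_action("ball_left"))  # "left": more points predicted on that side
```

No model of the game is involved anywhere: the table never says what the next screen will look like, only how many points each button is expected to earn.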
02:12
But learning these connections requires a lot of trial and error.
02:16
The DQN system would start by mashing buttons randomly,
02:20
and then slowly piece together which buttons to mash when
02:24
in order to maximize its score.
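The trial-and-error loop described above is standard Q-learning with an epsilon-greedy policy: act randomly at first, then lean more and more on the learned value estimates. A hedged sketch, with illustrative states and constants rather than DQN’s real training setup:

```python
import random

ACTIONS = ["noop", "left", "right", "fire"]
q = {}  # (state, action) -> estimated future points

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """Nudge Q(state, action) toward reward + discounted best next value."""
    best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

def choose_action(state, epsilon):
    """Epsilon-greedy: mostly exploit learned values, sometimes mash randomly."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
```

Early on, epsilon is high and the agent is pure button-mashing; as scores accumulate into `q`, epsilon is decayed and the policy pieces together which buttons pay off when.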
02:26
But in playing “Montezuma’s Revenge,”
02:29
this approach of random button-mashing fell flat on its face.
02:34
A player would have to perform this entire sequence
02:37
just to score their first points at the very end.
02:40
A mistake? Game over.
02:43
So how could DQN even know it was on the right track?
02:47
This is where babies come in.
02:50
In studies, infants consistently look longer at pictures
02:54
they haven’t seen before than ones they have.
02:57
There just seems to be something intrinsically rewarding about novelty.
03:02
This behavior has been essential in understanding the infant mind.
03:06
It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way
03:15
to plug this preference for novelty into reinforcement learning.
03:20
They made it so that unusual or new images appearing on the screen
03:25
were every bit as rewarding as real in-game points.
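One simple way to make new screens pay out like points is a count-based novelty bonus that shrinks as a screen grows familiar. This is a sketch of the idea only; the formula, scale, and use of exact screen matches are assumptions here, not DeepMind’s actual method, which measured novelty over learned features of the screen:

```python
from collections import Counter
from math import sqrt

visit_counts = Counter()

def total_reward(game_points, screen, bonus_scale=1.0):
    """Game points plus an intrinsic bonus that decays with familiarity."""
    visit_counts[screen] += 1
    novelty_bonus = bonus_scale / sqrt(visit_counts[screen])
    return game_points + novelty_bonus

print(total_reward(0, "new_room"))  # 1.0: a never-seen screen pays a full point
print(total_reward(0, "new_room"))  # ~0.707: already less interesting
```

Because the bonus feeds into the same reward the Q-values maximize, reaching an unfamiliar room is now “worth something” even when the game’s own score stays at zero.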
03:29
Suddenly, DQN was behaving totally differently from before.
03:34
It wanted to explore the room it was in,
03:36
to grab the key and escape through the locked door—
03:39
not because it was worth 100 points,
03:42
but for the same reason we would: to see what was on the other side.
03:48
With this new drive, DQN not only managed to grab that first key—
03:53
it explored all the way through 15 of the temple’s 24 chambers.
03:58
But emphasizing novelty-based rewards can sometimes create more problems
04:02
than it solves.
04:03
A novelty-seeking system that’s played a game too long
04:07
will eventually lose motivation.
04:09
If it’s seen it all before, why go anywhere?
04:13
Alternately, if it encounters, say, a television, it will freeze.
04:18
The constant novel images are essentially paralyzing.
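This “noisy TV” failure mode falls straight out of a count-based bonus like the one sketched above. A repeated screen stops paying, but a source of endless unique frames never does, so staring at it out-rewards the rest of the game (screen names and the 1/sqrt(count) bonus are illustrative assumptions):

```python
from collections import Counter
from math import sqrt

visit_counts = Counter()

def novelty_bonus(screen):
    visit_counts[screen] += 1
    return 1.0 / sqrt(visit_counts[screen])

# A corridor repeats the same screen, so its bonus decays and the agent moves on.
corridor = sum(novelty_bonus("same_hallway") for _ in range(100))

# A TV shows static: every frame is unique, so the bonus never shrinks.
tv = sum(novelty_bonus(f"static_frame_{i}") for i in range(100))

print(corridor < tv)  # True: watching static beats exploring the corridor
```

A purely novelty-driven agent facing this choice keeps collecting the full bonus from the television forever, which is exactly the paralysis the transcript describes.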
04:23
The ideas and inspiration here go in both directions.
04:27
AI researchers stuck on a practical problem,
04:30
like how to get DQN to beat a difficult game,
04:33
are turning increasingly to experts in human intelligence for ideas.
04:38
At the same time,
04:39
AI is giving us new insights into the ways we get stuck and unstuck:
04:45
into boredom, depression, and addiction,
04:48
along with curiosity, creativity, and play.