How to get better at video games, according to babies - Brian Christian

559,494 views ・ 2021-11-02

TED-Ed



Translator: Lilian Chiu · Reviewer: Helen Chang
00:08
In 2013, a group of researchers at DeepMind in London
00:13
had set their sights on a grand challenge.
00:15
They wanted to create an AI system that could beat,
00:19
not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN,
00:29
and less than two years later, it was superhuman.
00:33
DQN was getting scores 13 times better
00:38
than professional human games testers at “Breakout,”
00:41
17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception.
00:52
When playing “Montezuma’s Revenge,” DQN couldn’t score a single point,
00:58
even after playing for weeks.
01:01
What was it that made this particular game so vexingly difficult for AI?
01:07
And what would it take to solve it?
01:10
Spoiler alert: babies.
01:13
We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning,
01:21
where the system is designed to maximize some kind of numerical reward.
01:26
In this case, that reward was simply the game's points.
01:30
This underlying goal drives the system to learn which buttons to press
01:35
and when to press them to get the most points.
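In code, that reward-chasing loop is small. A minimal Python sketch follows, where `env` and `agent` are hypothetical stand-ins for an Atari-style game and a learning system like DQN, not DeepMind's actual interfaces:

```python
# A minimal sketch of the reinforcement-learning loop described above.
# Hypothetical interfaces: `env` resets to a first screen and steps one
# frame at a time; `agent` only ever optimizes the scalar reward (points).

def play_episode(env, agent):
    screen = env.reset()                         # initial screen image
    total_points, done = 0, False
    while not done:
        action = agent.choose(screen)            # which button to press, and when
        screen, reward, done = env.step(action)  # reward = points just earned
        agent.learn(screen, action, reward)      # adjust future button choices
        total_points += reward
    return total_points
```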
01:38
Some systems use model-based approaches, where they have a model of the environment
01:43
that they can use to predict what will happen next
01:46
once they take a certain action.
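To make that concrete, a model-based system consults its learned model of the game before acting. A minimal sketch, where `model.predict` is a hypothetical one-step simulator, not any real library call:

```python
# A minimal sketch of the model-based idea above: before pressing anything,
# ask a learned model "what happens next?" for each button and pick the
# button whose imagined outcome scores best. `model.predict` is hypothetical.

def plan_one_step(model, screen, num_buttons):
    def predicted_points(action):
        next_screen, predicted_reward = model.predict(screen, action)
        return predicted_reward
    return max(range(num_buttons), key=predicted_points)
```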
01:49
DQN, however, is model-free.
01:52
Instead of explicitly modeling its environment,
01:55
it just learns to predict, based on the images on screen,
01:58
how many future points it can expect to earn by pressing different buttons.
02:03
For instance, “if the ball is here and I move left, more points,
02:08
but if I move right, no more points.”
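That pixels-to-predicted-points mapping is what a deep Q-network computes: raw screen images in, one "expected future points" estimate per button out. Here is a minimal PyTorch sketch; the layer shapes echo the published DQN architecture, but this is an illustration, not DeepMind's code:

```python
import torch
import torch.nn as nn

# A minimal, model-free DQN-style value network: it never simulates the
# game, it only maps raw pixels to one Q-value (expected future points)
# per button. Sizes follow the published DQN setup but are illustrative.

class QNetwork(nn.Module):
    def __init__(self, num_buttons: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),  # 4 stacked 84x84 frames
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_buttons),   # one estimate per possible button
        )

    def forward(self, screens: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(screens))  # shape: (batch, num_buttons)
```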
02:12
But learning these connections requires a lot of trial and error.
02:16
The DQN system would start by mashing buttons randomly,
02:20
and then slowly piece together which buttons to mash when
02:24
in order to maximize its score.
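That mash-randomly-then-improve schedule is usually implemented as epsilon-greedy exploration: press a random button with probability epsilon, and let epsilon decay as learning proceeds. A minimal sketch reusing the hypothetical QNetwork above; the decay numbers are illustrative, not DeepMind's exact settings:

```python
import random
import torch

# Epsilon-greedy action selection: pure button-mashing at first, then an
# increasing reliance on the learned Q-values. `screen` is assumed to be a
# (1, 4, 84, 84) float tensor matching the QNetwork sketched earlier.

def choose_action(q_network, screen, step, num_buttons,
                  eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    fraction = min(step / decay_steps, 1.0)
    epsilon = eps_start + fraction * (eps_end - eps_start)  # linear decay
    if random.random() < epsilon:
        return random.randrange(num_buttons)   # explore: random button mash
    with torch.no_grad():
        q_values = q_network(screen)           # exploit: highest predicted score
        return q_values.argmax(dim=1).item()
```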
02:26
But in playing “Montezuma’s Revenge,”
02:29
this approach of random button-mashing fell flat on its face.
02:34
A player would have to perform this entire sequence
02:37
just to score their first points at the very end.
02:40
A mistake? Game over.
02:43
So how could DQN even know it was on the right track?
02:47
This is where babies come in.
02:50
In studies, infants consistently look longer at pictures
02:54
they haven’t seen before than ones they have.
02:57
There just seems to be something intrinsically rewarding about novelty.
03:02
This behavior has been essential in understanding the infant mind.
03:06
It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way
03:15
to plug this preference for novelty into reinforcement learning.
03:20
They made it so that unusual or new images appearing on the screen
03:25
were every bit as rewarding as real in-game points.
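One simple way to wire that in is to add an intrinsic bonus to the game's score whenever the current screen looks unfamiliar. The count-based sketch below is a simplified stand-in for the density-model "pseudo-count" bonus the researchers used; all names in it are illustrative:

```python
import hashlib
from collections import Counter

# A simplified, count-based stand-in for the novelty bonus described above:
# rarely seen screens pay out extra "points" on top of the real game score.
# (DeepMind derived pseudo-counts from a learned density model instead.)

seen = Counter()

def reward_with_novelty(screen_bytes, game_points, bonus_scale=1.0):
    key = hashlib.md5(screen_bytes).hexdigest()  # fingerprint the screen image
    seen[key] += 1
    bonus = bonus_scale / seen[key] ** 0.5       # shrinks as a screen repeats
    return game_points + bonus                   # new sights count like points
```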
03:29
Suddenly, DQN was behaving totally differently from before.
03:34
It wanted to explore the room it was in,
03:36
to grab the key and escape through the locked door—
03:39
not because it was worth 100 points,
03:42
but for the same reason we would: to see what was on the other side.
03:48
With this new drive, DQN not only managed to grab that first key—
03:53
it explored all the way through 15 of the temple’s 24 chambers.
03:58
But emphasizing novelty-based rewards can sometimes create more problems
04:02
than it solves.
04:03
A novelty-seeking system that’s played a game too long
04:07
will eventually lose motivation.
04:09
If it’s seen it all before, why go anywhere?
04:13
Alternatively, if it encounters, say, a television, it will freeze.
04:18
The constant novel images are essentially paralyzing.
04:23
The ideas and inspiration here go in both directions.
04:27
AI researchers stuck on a practical problem,
04:30
like how to get DQN to beat a difficult game,
04:33
are turning increasingly to experts in human intelligence for ideas.
04:38
At the same time,
04:39
AI is giving us new insights into the ways we get stuck and unstuck:
04:45
into boredom, depression, and addiction,
04:48
along with curiosity, creativity, and play.
