How to get better at video games, according to babies - Brian Christian

552,809 views · 2021-11-02

TED-Ed

Hebrew translation: Ido Dekkers; editing: zeeva livshitz
00:08
In 2013, a group of researchers at DeepMind in London
00:13
had set their sights on a grand challenge.
00:15
They wanted to create an AI system that could beat,
00:19
not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN,
00:29
and less than two years later, it was superhuman.
00:33
DQN was getting scores 13 times better
00:38
than professional human games testers at “Breakout,”
00:41
17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception.
00:52
When playing “Montezuma’s Revenge,” DQN couldn’t score a single point,
00:58
even after playing for weeks.
01:01
What was it that made this particular game so vexingly difficult for AI?
01:07
And what would it take to solve it?
01:10
Spoiler alert: babies.
01:13
We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning,
01:21
where the system is designed to maximize some kind of numerical rewards.
01:26
In this case, those rewards were simply the game's points.
01:30
This underlying goal drives the system to learn which buttons to press
01:35
and when to press them to get the most points.
01:38
Some systems use model-based approaches, where they have a model of the environment
01:43
that they can use to predict what will happen next
01:46
once they take a certain action.
01:49
DQN, however, is model free.
01:52
Instead of explicitly modeling its environment,
01:55
it just learns to predict, based on the images on screen,
01:58
how many future points it can expect to earn by pressing different buttons.
02:03
For instance, “if the ball is here and I move left, more points,
02:08
but if I move right, no more points.”
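
[Editor's sketch] The "predict future points per button" idea can be illustrated with a tabular Q-learning update. This is a simplified stand-in, not DeepMind's implementation: DQN estimates these values with a neural network over screen images, whereas the table, the state names, the three-button `ACTIONS` list, and the `alpha` and `gamma` values here are all assumptions made for illustration.

```python
from collections import defaultdict

ACTIONS = ["left", "right", "stay"]  # hypothetical button set


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One Q-learning step: move the estimate toward
    reward + discounted best estimate from the next state."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Every unseen (state, action) pair starts at 0 expected future points.
q = defaultdict(float)

# "If the ball is here and I move left, more points,
#  but if I move right, no more points."
for _ in range(50):
    q_update(q, "ball_here", "left", reward=1.0, next_state="end")
    q_update(q, "ball_here", "right", reward=0.0, next_state="end")

best = max(ACTIONS, key=lambda a: q[("ball_here", a)])
print(best, round(q[("ball_here", "left")], 3))
```

After repeated experience, the estimate for moving left approaches the reward it keeps producing, so the greedy choice in that state becomes "left".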
02:12
But learning these connections requires a lot of trial and error.
02:16
The DQN system would start by mashing buttons randomly,
02:20
and then slowly piece together which buttons to mash when
02:24
in order to maximize its score.
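
[Editor's sketch] "Mash randomly at first, then gradually favor what works" is commonly implemented as epsilon-greedy action selection with a decaying epsilon. The sketch below assumes that scheme; the function names, the linear schedule, and the numbers in `q_values` are illustrative and are not taken from the talk.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon press a random button,
    otherwise press the button with the highest estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_at(step, start=1.0, end=0.1, decay_steps=1000):
    """Linearly anneal epsilon from start to end (illustrative values)."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


rng = random.Random(0)
q_values = [0.2, 1.5, 0.7]  # made-up estimates for three buttons

early = [epsilon_greedy(q_values, epsilon_at(0), rng) for _ in range(1000)]
late = [epsilon_greedy(q_values, epsilon_at(1000), rng) for _ in range(1000)]

# Share of presses that went to the best-looking button (index 1).
print(early.count(1) / 1000, late.count(1) / 1000)
```

Early on, every press is random, so the best button is chosen only about a third of the time; once epsilon has decayed, it is chosen on most steps.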
02:26
But in playing “Montezuma’s Revenge,”
02:29
this approach of random button-mashing fell flat on its face.
02:34
A player would have to perform this entire sequence
02:37
just to score their first points at the very end.
02:40
A mistake? Game over.
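
[Editor's sketch] A rough calculation shows why random play stalls when the first points come only after a long exact sequence. It assumes, purely for illustration, that each step has exactly one correct button and that presses are uniformly random; 18 is the size of the full Atari action set, and the sequence lengths are hypothetical.

```python
def chance_of_exact_sequence(num_actions: int, sequence_length: int) -> float:
    """Probability that uniformly random presses reproduce one specific sequence."""
    return (1.0 / num_actions) ** sequence_length


# Hypothetical sequence lengths; the talk does not state the real one.
for k in (5, 10, 20):
    print(k, chance_of_exact_sequence(18, k))
```

Under these assumptions the chance shrinks exponentially with sequence length, so a system that only learns from points almost never sees a reward to learn from.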
02:43
So how could DQN even know it was on the right track?
02:47
This is where babies come in.
02:50
In studies, infants consistently look longer at pictures
02:54
they haven’t seen before than ones they have.
02:57
There just seems to be something intrinsically rewarding about novelty.
03:02
This behavior has been essential in understanding the infant mind.
03:06
It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way
03:15
to plug this preference for novelty into reinforcement learning.
03:20
They made it so that unusual or new images appearing on the screen
03:25
were every bit as rewarding as real in-game points.
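
[Editor's sketch] One simple way to make new screens rewarding is a count-based bonus added to the game's points. The talk does not say how DeepMind measured novelty, so this is a stand-in rather than their method: the `NoveltyBonus` class, the `beta / sqrt(count)` formula, and the string `screen_key` (a hypothetical hashable summary of a screen) are assumptions.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Count-based stand-in for a novelty reward:
    the less often a screen has been seen, the larger the bonus."""

    def __init__(self, beta=1.0):
        self.beta = beta          # how much novelty is worth relative to points
        self.counts = Counter()   # visits per screen key

    def __call__(self, screen_key):
        self.counts[screen_key] += 1
        return self.beta / math.sqrt(self.counts[screen_key])


def shaped_reward(game_points, screen_key, bonus):
    """Reward the learner actually sees: real points plus novelty bonus."""
    return game_points + bonus(screen_key)


bonus = NoveltyBonus(beta=1.0)
first = shaped_reward(0.0, "room_1_ladder", bonus)
fourth = [shaped_reward(0.0, "room_1_ladder", bonus) for _ in range(3)][-1]
print(first, fourth)
```

Even with zero game points, the first visit to a screen pays a full bonus, and the bonus halves by the fourth visit, which pushes the learner toward screens it has not seen yet.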
03:29
Suddenly, DQN was behaving totally differently from before.
03:34
It wanted to explore the room it was in,
03:36
to grab the key and escape through the locked door,
03:39
not because it was worth 100 points,
03:42
but for the same reason we would: to see what was on the other side.
03:48
With this new drive, DQN not only managed to grab that first key,
03:53
it explored all the way through 15 of the temple’s 24 chambers.
03:58
But emphasizing novelty-based rewards can sometimes create more problems
04:02
than it solves.
04:03
A novelty-seeking system that’s played a game too long
04:07
will eventually lose motivation.
04:09
If it’s seen it all before, why go anywhere?
04:13
Alternately, if it encounters, say, a television, it will freeze.
04:18
The constant novel images are essentially paralyzing.
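
[Editor's sketch] Both failure modes can be reproduced with a toy count-based novelty bonus. This is an illustration under assumptions, not the researchers' setup: the `beta / sqrt(count)` bonus, the single repeated `"room_1"` screen, and the random 64-bit integers standing in for ever-changing television frames are all hypothetical.

```python
import math
import random
from collections import Counter


def total_bonus(screens, beta=1.0):
    """Sum a count-based novelty bonus over a sequence of observed screens."""
    counts = Counter()
    total = 0.0
    for s in screens:
        counts[s] += 1
        total += beta / math.sqrt(counts[s])
    return total


rng = random.Random(0)
steps = 1000

# Seen it all before: the same screen over and over.
familiar_room = ["room_1"] * steps
# A television: every frame is a fresh, never-repeated image.
static_tv = [rng.getrandbits(64) for _ in range(steps)]

# Average novelty reward per step in each situation.
print(total_bonus(familiar_room) / steps, total_bonus(static_tv) / steps)
```

In the familiar room the average bonus falls toward zero, so nothing motivates the system to move. In front of the television the bonus stays at its maximum on every step, so staying put looks like the best possible choice.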
04:23
The ideas and inspiration here go in both directions.
04:27
AI researchers stuck on a practical problem,
04:30
like how to get DQN to beat a difficult game,
04:33
are turning increasingly to experts in human intelligence for ideas.
04:38
At the same time,
04:39
AI is giving us new insights into the ways we get stuck and unstuck:
04:45
into boredom, depression, and addiction,
04:48
along with curiosity, creativity, and play.