How to get better at video games, according to babies - Brian Christian

542,613 views ・ 2021-11-02

TED-Ed


μ•„λž˜ μ˜λ¬Έμžλ§‰μ„ λ”λΈ”ν΄λ¦­ν•˜μ‹œλ©΄ μ˜μƒμ΄ μž¬μƒλ©λ‹ˆλ‹€.

λ²ˆμ—­: Sohee Park κ²€ν† : DK Kim
00:08
In 2013, a group of researchers at DeepMind in London
00:13
had set their sights on a grand challenge.
00:15
They wanted to create an AI system that could beat,
00:19
not just a single Atari game, but every Atari game.
00:24
They developed a system they called Deep Q Networks, or DQN,
00:29
and less than two years later, it was superhuman.
00:33
DQN was getting scores 13 times better
00:38
than professional human games testers at “Breakout,”
00:41
17 times better at “Boxing,” and 25 times better at “Video Pinball.”
00:48
But there was one notable, and glaring, exception.
00:52
When playing “Montezuma’s Revenge,” DQN couldn’t score a single point,
00:58
even after playing for weeks.
01:01
What was it that made this particular game so vexingly difficult for AI?
01:07
And what would it take to solve it?
01:10
Spoiler alert: babies.
01:13
We’ll come back to that in a minute.
01:16
Playing Atari games with AI involves what’s called reinforcement learning,
01:21
where the system is designed to maximize some kind of numerical rewards.
01:26
In this case, those rewards were simply the game's points.
01:30
This underlying goal drives the system to learn which buttons to press
01:35
and when to press them to get the most points.
01:38
Some systems use model-based approaches, where they have a model of the environment
01:43
that they can use to predict what will happen next
01:46
once they take a certain action.
01:49
DQN, however, is model-free.
01:52
Instead of explicitly modeling its environment,
01:55
it just learns to predict, based on the images on screen,
01:58
how many future points it can expect to earn by pressing different buttons.
02:03
For instance, “if the ball is here and I move left, more points,
02:08
but if I move right, no more points.”
02:12
But learning these connections requires a lot of trial and error.
02:16
The DQN system would start by mashing buttons randomly,
02:20
and then slowly piece together which buttons to mash when
02:24
in order to maximize its score.
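
This "mash randomly at first, then gradually exploit what you've learned" pattern is commonly implemented as epsilon-greedy action selection. A sketch, with all names illustrative:

```python
import random

def choose_action(q, state, actions, epsilon):
    """With probability epsilon, mash a random button; otherwise press
    the button with the highest estimated future score in this state."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

# Training typically starts with epsilon near 1.0 (pure random mashing)
# and anneals it toward a small value as the estimates firm up.
q = {("ball_left", "LEFT"): 1.0, ("ball_left", "RIGHT"): 0.0}
greedy = choose_action(q, "ball_left", ["LEFT", "RIGHT"], epsilon=0.0)
# greedy == "LEFT": with epsilon at zero, the best-known button wins.
```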
02:26
But in playing “Montezuma’s Revenge,”
02:29
this approach of random button-mashing fell flat on its face.
02:34
A player would have to perform this entire sequence
02:37
just to score their first points at the very end.
02:40
A mistake? Game over.
02:43
So how could DQN even know it was on the right track?
02:47
This is where babies come in.
02:50
In studies, infants consistently look longer at pictures
02:54
they haven’t seen before than ones they have.
02:57
There just seems to be something intrinsically rewarding about novelty.
03:02
This behavior has been essential in understanding the infant mind.
03:06
It also turned out to be the secret to beating “Montezuma’s Revenge.”
03:12
The DeepMind researchers worked out an ingenious way
03:15
to plug this preference for novelty into reinforcement learning.
03:20
They made it so that unusual or new images appearing on the screen
03:25
were every bit as rewarding as real in-game points.
03:29
Suddenly, DQN was behaving totally differently from before.
03:34
It wanted to explore the room it was in,
03:36
to grab the key and escape through the locked door—
03:39
not because it was worth 100 points,
03:42
but for the same reason we would: to see what was on the other side.
03:48
With this new drive, DQN not only managed to grab that first key—
03:53
it explored all the way through 15 of the temple’s 24 chambers.
03:58
But emphasizing novelty-based rewards can sometimes create more problems
04:02
than it solves.
04:03
A novelty-seeking system that’s played a game too long
04:07
will eventually lose motivation.
04:09
If it’s seen it all before, why go anywhere?
04:13
Alternately, if it encounters, say, a television, it will freeze.
04:18
The constant novel images are essentially paralyzing.
04:23
The ideas and inspiration here go in both directions.
04:27
AI researchers stuck on a practical problem,
04:30
like how to get DQN to beat a difficult game,
04:33
are turning increasingly to experts in human intelligence for ideas.
04:38
At the same time,
04:39
AI is giving us new insights into the ways we get stuck and unstuck:
04:45
into boredom, depression, and addiction,
04:48
along with curiosity, creativity, and play.
이 μ›Ήμ‚¬μ΄νŠΈ 정보

이 μ‚¬μ΄νŠΈλŠ” μ˜μ–΄ ν•™μŠ΅μ— μœ μš©ν•œ YouTube λ™μ˜μƒμ„ μ†Œκ°œν•©λ‹ˆλ‹€. μ „ 세계 졜고의 μ„ μƒλ‹˜λ“€μ΄ κ°€λ₯΄μΉ˜λŠ” μ˜μ–΄ μˆ˜μ—…μ„ 보게 될 κ²ƒμž…λ‹ˆλ‹€. 각 λ™μ˜μƒ νŽ˜μ΄μ§€μ— ν‘œμ‹œλ˜λŠ” μ˜μ–΄ μžλ§‰μ„ 더블 ν΄λ¦­ν•˜λ©΄ κ·Έκ³³μ—μ„œ λ™μ˜μƒμ΄ μž¬μƒλ©λ‹ˆλ‹€. λΉ„λ””μ˜€ μž¬μƒμ— 맞좰 μžλ§‰μ΄ μŠ€ν¬λ‘€λ©λ‹ˆλ‹€. μ˜κ²¬μ΄λ‚˜ μš”μ²­μ΄ μžˆλŠ” 경우 이 문의 양식을 μ‚¬μš©ν•˜μ—¬ λ¬Έμ˜ν•˜μ‹­μ‹œμ˜€.

https://forms.gle/WvT1wiN1qDtmnspy7