How to get better at video games, according to babies - Brian Christian

559,494 views ・ 2021-11-02

TED-Ed

Korean translation: Sohee Park; review: DK Kim

00:08

In 2013, a group of researchers at DeepMind in London had set their sights on a grand challenge. They wanted to create an AI system that could beat not just a single Atari game, but every Atari game. They developed a system they called Deep Q Networks, or DQN, and less than two years later, it was superhuman.

00:33

DQN was getting scores 13 times better than professional human games testers at “Breakout,” 17 times better at “Boxing,” and 25 times better at “Video Pinball.” But there was one notable, and glaring, exception. When playing “Montezuma’s Revenge,” DQN couldn’t score a single point, even after playing for weeks.

01:01

What was it that made this particular game so vexingly difficult for AI? And what would it take to solve it? Spoiler alert: babies. We’ll come back to that in a minute.

01:16

Playing Atari games with AI involves what’s called reinforcement learning, where the system is designed to maximize some kind of numerical reward. In this case, those rewards were simply the game’s points. This underlying goal drives the system to learn which buttons to press, and when to press them, to get the most points.

01:38

Some systems use model-based approaches, where they have a model of the environment that they can use to predict what will happen next once they take a certain action.
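To make the contrast concrete, here is a minimal Python sketch of a model-based agent, written for illustration and not taken from DeepMind. It assumes a hypothetical five-square corridor with treasure at the far end, a hand-written model `predict(state, action)` that returns the next position and the points earned, and an assumed discount `GAMMA = 0.9` so that nearer treasure counts slightly more. The agent does not act until it has imagined each option with its model.

```python
# Hypothetical toy "game": positions 0..4 in a corridor, treasure at position 4.
GOAL = 4
ACTIONS = ("left", "right")
GAMMA = 0.9  # assumed discount: points sooner count slightly more than points later


def predict(state, action):
    """The agent's model of the environment: predicted next state and points."""
    step = -1 if action == "left" else 1
    next_state = min(max(state + step, 0), GOAL)
    points = 1.0 if next_state == GOAL else 0.0
    return next_state, points


def plan(state, depth):
    """Best discounted points reachable within `depth` moves, found purely by simulation."""
    if depth == 0 or state == GOAL:
        return 0.0
    best = 0.0
    for action in ACTIONS:
        next_state, points = predict(state, action)
        best = max(best, points + GAMMA * plan(next_state, depth - 1))
    return best


def choose_action(state, depth=5):
    """Pick the action whose imagined future earns the most points."""
    def imagined_value(action):
        next_state, points = predict(state, action)
        return points + GAMMA * plan(next_state, depth - 1)
    return max(ACTIONS, key=imagined_value)


print(choose_action(2))  # right
```

Because every decision comes from simulating the model, an agent like this is only as good as its `predict` function; for a real Atari screen, writing or learning that model is the hard part.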

01:49

DQN, however, is model-free. Instead of explicitly modeling its environment, it just learns to predict, based on the images on screen, how many future points it can expect to earn by pressing different buttons. For instance, “if the ball is here and I move left, more points, but if I move right, no more points.”
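The model-free idea can be sketched with tabular Q-learning, a simplified relative of DQN: DQN replaces the lookup table below with a convolutional neural network that reads screen pixels. The corridor game, `step`, and the parameter values (`alpha=0.5`, `gamma=0.9`, 500 episodes) are hypothetical. The agent only ever calls `step` and records how many future points each press seemed to lead to; it never predicts the next screen.

```python
import random
from collections import defaultdict

GOAL = 4
ACTIONS = ("left", "right")


def step(state, action):
    """The game itself: the agent can play it but has no model of how it works."""
    next_state = min(max(state + (-1 if action == "left" else 1), 0), GOAL)
    points = 1.0 if next_state == GOAL else 0.0
    return next_state, points, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning: q[(state, action)] estimates future points for a press."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random presses; Q-learning can still learn from them
            next_state, points, done = step(state, action)
            # Target: points now plus the discounted best estimate of points afterwards.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (points + gamma * future - q[(state, action)])
            state = next_state
    return q


q = q_learning()
print(max(ACTIONS, key=lambda a: q[(2, a)]))  # right
```

Each update nudges an estimate toward the points just received plus the discounted best estimate from the next square, so credit for the treasure gradually flows backward to earlier squares.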

02:12

But learning these connections requires a lot of trial and error. The DQN system would start by mashing buttons randomly, and then slowly piece together which buttons to mash, and when, in order to maximize its score.
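Starting with random presses and gradually relying on learned estimates is commonly done with an ε-greedy policy. The 2015 DQN work reportedly annealed ε linearly from 1.0 to 0.1 over its first million frames; the sketch below follows that shape, but the function names, the three-button `q_values`, and the sample sizes are illustrative assumptions.

```python
import random


def epsilon(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    """Probability of a random press: falls linearly from `start` to `end`, then stays flat."""
    fraction = min(step / anneal_steps, 1.0)
    return start + fraction * (end - start)


def choose_button(q_values, step, rng):
    """Epsilon-greedy choice over buttons, given estimated future points per button."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_values))  # explore: mash a random button
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit: best estimate so far


rng = random.Random(0)
q_values = [0.0, 2.0, 0.5]  # hypothetical estimates for three buttons; button 1 looks best
early = sum(choose_button(q_values, 0, rng) == 1 for _ in range(10_000)) / 10_000
late = sum(choose_button(q_values, 2_000_000, rng) == 1 for _ in range(10_000)) / 10_000
print(f"best button chosen: {early:.0%} early, {late:.0%} late")
```

Early on, every press is random, so the best-looking button is picked only about a third of the time; once ε bottoms out at 0.1, it is picked a little over 90% of the time (0.9 + 0.1/3 in expectation).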

02:26

But in playing “Montezuma’s Revenge,” this approach of random button-mashing fell flat on its face. A player would have to perform this entire sequence just to score their first points at the very end. A mistake? Game over. So how could DQN even know it was on the right track?
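A back-of-the-envelope calculation shows why random presses rarely stumble onto a distant reward. Assume, as a deliberate simplification, that at each step exactly one of the 18 actions in the full Atari 2600 joystick set is correct and any other press ends the attempt. A random agent then completes a k-press sequence with probability (1/18)^k. The sequence lengths below are hypothetical; the real game is more forgiving at some steps and harsher at others.

```python
from fractions import Fraction

NUM_ACTIONS = 18  # size of the full Atari 2600 joystick action set


def chance_of_first_points(sequence_length, num_actions=NUM_ACTIONS):
    """Chance that uniformly random presses pick the single correct press at every
    step of a sequence in which any wrong press ends the attempt (a simplifying assumption)."""
    return Fraction(1, num_actions) ** sequence_length


for k in (1, 5, 10):
    print(f"{k:>2}-press sequence: 1 chance in {chance_of_first_points(k).denominator:,}")
```

A 10-press sequence already comes out to 1 chance in 3,570,467,226,624, and until the agent gets there, every attempt returns zero points, leaving nothing for a points-driven learner to learn from.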

02:47

This is where babies come in. In studies, infants consistently look longer at pictures they haven’t seen before than at ones they have. There just seems to be something intrinsically rewarding about novelty. This behavior has been essential in understanding the infant mind.

03:06

It also turned out to be the secret to beating “Montezuma’s Revenge.” The DeepMind researchers worked out an ingenious way to plug this preference for novelty into reinforcement learning. They made it so that unusual or new images appearing on the screen were every bit as rewarding as real in-game points.
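One simple way to make new screens “every bit as rewarding” as points is a count-based novelty bonus: add β/√N(s) to the game’s points, where N(s) is how many times screen s has been seen. This is a swapped-in simplification, not the researchers’ exact method; treating whole screens as hashable keys and setting β = 1.0 are assumptions, and the class name `NoveltyBonus` is hypothetical.

```python
import math
from collections import Counter


class NoveltyBonus:
    """Adds an intrinsic reward that shrinks as a screen becomes familiar."""

    def __init__(self, beta=1.0):
        self.beta = beta         # assumed weight: how much novelty counts relative to points
        self.visits = Counter()  # screen -> number of times it has been seen

    def reward(self, screen, game_points):
        """Total reward = real game points + beta / sqrt(visit count of this screen)."""
        self.visits[screen] += 1
        return game_points + self.beta / math.sqrt(self.visits[screen])


novelty = NoveltyBonus(beta=1.0)
first = novelty.reward("room_1", game_points=0)     # 1.0: never seen before
second = novelty.reward("room_1", game_points=0)    # about 0.707: already familiar
new_room = novelty.reward("room_2", game_points=0)  # 1.0: a new room pays like a point
```

Because the bonus shrinks with each repeat visit, the agent is pushed toward rooms it has not yet seen, even when none of them pays any points.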

03:29

Suddenly, DQN was behaving totally differently from before. It wanted to explore the room it was in, to grab the key and escape through the locked door, not because it was worth 100 points, but for the same reason we would: to see what was on the other side.

03:48

With this new drive, DQN not only managed to grab that first key, but also explored all the way through 15 of the temple’s 24 chambers.

03:58

But emphasizing novelty-based rewards can sometimes create more problems than it solves. A novelty-seeking system that’s played a game too long will eventually lose motivation. If it’s seen it all before, why go anywhere? Alternatively, if it encounters, say, a television, it will freeze. The constant novel images are essentially paralyzing.
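Both failure modes fall out of a count-based bonus of the form β/√N(s), where N(s) counts how often screen s has been seen. In this sketch (a simplification, not DeepMind’s method; the screen labels and the 64-bit random “static” frames are assumptions), a room seen 100 times pays a tenth of its first-visit bonus, while a television showing fresh static every frame pays the full bonus forever.

```python
import math
import random
from collections import Counter


def novelty_bonus(visits, screen, beta=1.0):
    """Count the visit, then return beta / sqrt(times this exact screen has been seen)."""
    visits[screen] += 1
    return beta / math.sqrt(visits[screen])


visits = Counter()
rng = random.Random(0)

# A familiar room: after 100 visits the bonus has shrunk tenfold, so motivation fades.
room = [novelty_bonus(visits, "room_1") for _ in range(100)]

# A "television": each frame is random static, so every screen is brand new and fully rewarding.
tv = [novelty_bonus(visits, ("static", rng.getrandbits(64))) for _ in range(100)]
```

An agent maximizing this bonus would find its familiar rooms no longer worth visiting and would rather stay in front of the television than do anything else.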

04:23

The ideas and inspiration here go in both directions. AI researchers stuck on a practical problem, like how to get DQN to beat a difficult game, are turning increasingly to experts in human intelligence for ideas.

04:38

At the same time, AI is giving us new insights into the ways we get stuck and unstuck: into boredom, depression, and addiction, along with curiosity, creativity, and play.
