How computers learn to recognize objects instantly | Joseph Redmon

1,132,480 views ・ 2017-08-18

TED

Translation: 혜련 장 / Review: Taz B K

00:12

Ten years ago,

00:13

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:47

So let's just see what Darknet thinks

00:50

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:57

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:04

And it's correct.

01:06

My dog is in fact a malamute.

01:08

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:18

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:36

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.
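
A detector's output for an image like this can be pictured as a list of labeled boxes. The sketch below is illustrative only: the `Detection` class, its field names, and the coordinates are assumptions made up for this example, not Darknet's actual data structures or the numbers from the demo.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Detection:
    """One detected object: a label plus a bounding box in pixels (hypothetical layout)."""
    label: str
    x: float       # left edge of the box
    y: float       # top edge of the box
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

    def center(self) -> Tuple[float, float]:
        return (self.x + self.width / 2, self.y + self.height / 2)


# Made-up boxes for an image containing a dog, a cat, and a book.
detections = [
    Detection("dog", x=50, y=80, width=300, height=260),
    Detection("cat", x=400, y=150, width=180, height=170),
    Detection("book", x=520, y=20, width=60, height=40),
]

# Size: which object covers the most pixels?
largest = max(detections, key=Detection.area)

# Relative location: compare box centers along the horizontal axis.
dog, cat = detections[0], detections[1]
dog_is_left_of_cat = dog.center()[0] < cat.center()[0]

print(largest.label, dog_is_left_of_cat)
```

With boxes in hand, questions about size and relative position reduce to simple geometry on the coordinates, which a single class label cannot answer.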

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:15

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection,

02:24

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:32

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:37

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:46

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:58

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:04

I wouldn't want a system like this driving my car.

03:08

This is our detection system running in real time on my laptop.

03:12

So it smoothly tracks me as I move around the frame,

03:15

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:24

This is great.

03:26

This is what we really need

03:27

if we're going to build systems on top of computer vision.

03:30

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:40

to 20 milliseconds per image, a thousand times faster.
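
The speeds quoted in the talk can be checked with a few lines of arithmetic. This is only a worked calculation on the numbers mentioned above (20 seconds, 2 seconds, five frames per second, 20 milliseconds); the function name and the dictionary labels are made up for the example.

```python
def frames_per_second(seconds_per_image: float) -> float:
    """Convert a per-image processing time into a frame rate."""
    return 1.0 / seconds_per_image


# Per-image processing times, in seconds, as quoted in the talk.
timings = {
    "early detector": 20.0,   # 20 seconds per image
    "10x faster": 2.0,        # 2 seconds per image
    "100x faster": 0.2,       # five frames per second
    "real time": 0.020,       # 20 milliseconds per image
}

for name, seconds in timings.items():
    speedup = timings["early detector"] / seconds
    print(f"{name}: {frames_per_second(seconds):.2f} fps, {speedup:.0f}x speedup")
```

At 20 milliseconds per image the detector runs at roughly 50 frames per second, which is why it can keep up with live video.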

03:44

How did we get there?

03:45

Well, in the past, object detection systems

03:49

would take an image like this

03:50

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.
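
To see why the older approach was slow, here is a minimal sketch of region-based detection. It assumes a simple sliding window as the way of splitting the image into regions, an arbitrary 640x480 image, and a stand-in `classify_region` function; all of these are illustrative assumptions, since the talk doesn't say how the regions were chosen.

```python
from typing import Iterator, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(image_w: int, image_h: int, window: int, stride: int) -> Iterator[Region]:
    """Yield square regions that sweep across the image with the given stride."""
    for y in range(0, image_h - window + 1, stride):
        for x in range(0, image_w - window + 1, stride):
            yield (x, y, window, window)


def classify_region(region: Region) -> float:
    """Stand-in for an expensive neural network call; returns a dummy score."""
    return 0.0


evaluations = 0
for window in (64, 128, 256):            # several window sizes (assumed)
    for region in sliding_windows(640, 480, window, stride=16):
        classify_region(region)          # one network evaluation per region
        evaluations += 1

print(evaluations)
```

Even with these modest settings the classifier is called more than two thousand times for one image, so the cost of a single network evaluation is multiplied accordingly.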

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.
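
A rough way to picture "everything at once" is a single function call whose output already contains every box and every class probability. The sketch below is not the real network: `fake_network` returns random numbers, and the grid size, the per-cell layout, and the two class names are assumptions chosen only to show the shape of the idea.

```python
import random

GRID = 3                   # assumed: predictions are laid out on a GRID x GRID grid
CLASSES = ["cat", "dog"]   # assumed: two classes, for illustration only


def fake_network(image) -> list:
    """Stand-in for one forward pass: returns every cell's prediction at once.

    Each cell holds [x, y, w, h, confidence, p(cat), p(dog)] (hypothetical layout).
    """
    rng = random.Random(0)
    return [[rng.random() for _ in range(5 + len(CLASSES))] for _ in range(GRID * GRID)]


def decode(output: list) -> list:
    """Turn the raw output into (label, confidence, box) tuples."""
    detections = []
    for cell in output:
        x, y, w, h, confidence = cell[:5]
        class_probs = cell[5:]
        best = max(range(len(CLASSES)), key=lambda i: class_probs[i])
        detections.append((CLASSES[best], confidence, (x, y, w, h)))
    return detections


output = fake_network(image=None)   # a single call covers the whole image
boxes = decode(output)
print(len(boxes))
```

The point of the design is that the expensive step runs once per image; turning its output into labeled boxes is cheap bookkeeping.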

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:52

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:17

There are some teddy bears out there.

05:21

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.
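
Turning down the threshold simply means accepting detections the system is less sure about. Here is a minimal sketch, assuming each detection carries a confidence score between 0 and 1; the labels, scores, and the `filter_detections` helper are made up for illustration.

```python
# Hypothetical detections as (label, confidence) pairs.
detections = [
    ("person", 0.92),
    ("person", 0.61),
    ("teddy bear", 0.48),
    ("backpack", 0.33),
    ("person", 0.27),
]


def filter_detections(detections, threshold):
    """Keep only detections whose confidence is at least `threshold`."""
    return [d for d in detections if d[1] >= threshold]


strict = filter_detections(detections, threshold=0.5)
relaxed = filter_detections(detections, threshold=0.25)
print(len(strict), len(relaxed))
```

A lower threshold surfaces more objects, at the cost of letting through more uncertain (and possibly wrong) detections.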

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.

05:48

And it's important to remember

05:50

that this is a general purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:22

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.

07:04

(Applause)

07:10

And I'm really excited because now we have a pretty powerful solution

07:15

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:31

Thank you.

07:33

(Applause)

Original video on YouTube.com

How computers learn to recognize objects instantly | Joseph Redmon - YouTube
