How computers learn to recognize objects instantly | Joseph Redmon

1,132,480 views ・ 2017-08-18

TED

Japanese translation: Yasushi Aoki. Review: Claire Ghyselen

00:12

Ten years ago, computer vision researchers thought that getting a computer to tell the difference between a cat and a dog would be almost impossible, even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification: give it an image, put a label to that image, and computers know thousands of other categories as well.

00:38

I'm a graduate student at the University of Washington, and I work on a project called Darknet, which is a neural network framework for training and testing computer vision models.

00:47

So let's just see what Darknet thinks of this image that we have.

00:54

When we run our classifier on this image, we see we don't just get a prediction of dog or cat, we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:04

And it's correct. My dog is in fact a malamute. [On screen: malamute 37%, husky 15%, Eskimo dog 12%]
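The breed predictions shown on screen are a ranked list of class probabilities. The following is a minimal sketch of how such a list might be produced from a classifier's raw scores, assuming a softmax over the scores. The labels, score values, and the `top_k` helper are hypothetical and are not taken from Darknet.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, scores, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical raw scores for a single image (illustrative values only).
labels = ["malamute", "husky", "Eskimo dog", "tabby cat"]
scores = [2.0, 1.1, 0.9, -1.0]

for label, prob in top_k(labels, scores):
    print(f"{label}: {prob:.0%}")
```

The classifier returns one ranked list for the whole image, which is why a single label says little about a scene that contains several objects.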

01:08

So we've made amazing strides in image classification, but what happens when we run our classifier on an image that looks like this?

01:18

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction. [On screen: malamute 7%, Eskimo dog 6%, husky 6%]

01:28

And it's correct, there is a malamute in the image, but just given this label, we don't actually know that much about what's going on in the image.

01:36

We need something more powerful.

01:39

I work on a problem called object detection, where we look at an image and try to find all of the objects, put bounding boxes around them and say what those objects are.
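A detection, as described here, pairs a label with a bounding box. The sketch below shows one way such a result could be represented, assuming boxes are stored as a top-left corner plus width and height in pixels. The `Detection` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str         # what the object is
    confidence: float  # detector score between 0 and 1
    x: float           # left edge of the box, in pixels
    y: float           # top edge of the box, in pixels
    w: float           # box width, in pixels
    h: float           # box height, in pixels

    def area(self):
        """Size of the box in square pixels."""
        return self.w * self.h

    def center(self):
        """Center point of the box as (cx, cy)."""
        return (self.x + self.w / 2, self.y + self.h / 2)

def is_left_of(a, b):
    """True if the center of box a lies to the left of the center of box b."""
    return a.center()[0] < b.center()[0]

# Hypothetical detector output for one image (illustrative values only).
detections = [
    Detection("dog", 0.92, 40, 60, 200, 260),
    Detection("cat", 0.88, 300, 150, 120, 140),
]

dog, cat = detections
print(dog.label, "area:", dog.area())
print("dog is left of cat:", is_left_of(dog, cat))
```

Unlike a single classification label, a list like this carries what each object is, where it is, and how large it appears.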

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result, we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations, their size.

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.

02:09

And if you want to build a system on top of computer vision, say a self-driving vehicle or a robotic system, this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection, it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain, here's an example of an object detector that takes two seconds to process an image.

02:37

So this is 10 times faster than the 20-seconds-per-image detector, and you can see that by the time it makes predictions, the entire state of the world has changed, and this wouldn't be very useful for an application.

02:53

If we speed this up by another factor of 10, this is a detector running at five frames per second.

02:58

This is a lot better, but for example, if there's any significant movement, I wouldn't want a system like this driving my car.

03:08

This is our detection system running in real time on my laptop.

03:12

So it smoothly tracks me as I move around the frame, and it's robust to a wide variety of changes in size, pose, forward, backward.

03:24

This is great.

03:26

This is what we really need if we're going to build systems on top of computer vision.

03:30

(Applause)

03:36

So in just a few years, we've gone from 20 seconds per image to 20 milliseconds per image, a thousand times faster.
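The speeds quoted in the talk can be checked with a little arithmetic. The sketch below converts each per-image latency mentioned (20 seconds, 2 seconds, the 0.2 seconds implied by five frames per second, and 20 milliseconds) into frames per second. The function and variable names are illustrative.

```python
def frames_per_second(ms_per_image):
    """Throughput of a detector that needs ms_per_image milliseconds per frame."""
    return 1000 / ms_per_image

# Per-image latencies mentioned in the talk, in milliseconds.
stages_ms = {
    "early detector": 20_000,
    "10x faster": 2_000,
    "another 10x": 200,
    "real-time system": 20,
}

for name, ms in stages_ms.items():
    print(f"{name}: {ms} ms per image = {frames_per_second(ms):g} frames per second")

speedup = stages_ms["early detector"] // stages_ms["real-time system"]
print(f"overall speedup: {speedup}x")
```

At 20 milliseconds per image the detector runs at 50 frames per second, faster than typical video frame rates, which is what makes the live demo possible.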

03:44

How did we get there?

03:45

Well, in the past, object detection systems would take an image like this and split it into a bunch of regions and then run a classifier on each of these regions, and high scores for that classifier would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image, thousands of neural network evaluations to produce detection.
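To see how region-by-region classification reaches thousands of evaluations, here is a rough count under assumed settings. The talk does not say how the regions were chosen, so the sliding-window scheme, image size, window sizes, and stride below are all assumptions made for illustration.

```python
def count_windows(image_w, image_h, window, stride):
    """Number of square window positions that fit inside the image."""
    if window > image_w or window > image_h:
        return 0
    across = (image_w - window) // stride + 1
    down = (image_h - window) // stride + 1
    return across * down

# Assumed settings (not from the talk): a 448x448 image scanned with
# three square window sizes, moving 8 pixels at a time.
image_w, image_h = 448, 448
window_sizes = [64, 128, 256]
stride = 8

total = sum(count_windows(image_w, image_h, size, stride) for size in window_sizes)
print(f"classifier evaluations per image: {total}")
```

Each of those evaluations is a separate run of the classifier, so the cost per image grows with the number of regions considered.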

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times to produce detection, you only look once, and that's why we call it the YOLO method of object detection.

04:31

So with this speed, we're not just limited to images; we can process video in real time.

04:37

And now, instead of just seeing that cat and dog, we can see them move around and interact with each other.

04:46

This is a detector that we trained on 80 different classes in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl, common objects like that.

05:02

It has a variety of more exotic things: animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun. We're just going to go out into the audience and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:17

There are some teddy bears out there.

05:21

And we can turn down our threshold for detection a little bit, so we can find more of you guys out in the audience.
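Turning down the detection threshold means keeping candidates with lower confidence scores. A minimal sketch of that filtering step follows, assuming each candidate is a (label, confidence) pair. The candidate list and both threshold values are hypothetical.

```python
def filter_detections(candidates, threshold):
    """Keep only the candidates whose confidence is at or above the threshold."""
    return [c for c in candidates if c[1] >= threshold]

# Hypothetical candidates as (label, confidence) pairs, illustrative values only.
candidates = [
    ("teddy bear", 0.81),
    ("person", 0.42),
    ("person", 0.27),
    ("backpack", 0.18),
]

for threshold in (0.5, 0.25):
    kept = filter_detections(candidates, threshold)
    labels = [label for label, _ in kept]
    print(f"threshold {threshold}: {len(kept)} detections {labels}")
```

A lower threshold finds more objects, at the cost of admitting more uncertain and possibly wrong detections.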

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time on the laptop.

05:48

And it's important to remember that this is a general purpose object detection system, so we can train this for any image domain.

06:00

The same code that we use to find stop signs or pedestrians, bicycles in a self-driving vehicle, can be used to find cancer cells in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology for advances in things like medicine, robotics.

06:21

This morning, I read a paper where they were taking a census of animals in Nairobi National Park with YOLO as part of this detection system.

06:30

And that's because Darknet is open source and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable, so through a combination of model optimization, network binarization and approximation, we actually have object detection running on a phone.
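The talk names network binarization without describing the scheme used. One common form, assumed here purely for illustration, replaces each real-valued weight with its sign and keeps a single scale factor equal to the mean absolute weight. The function names and sample numbers below are hypothetical.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w), with alpha the mean absolute weight."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def exact_dot(weights, inputs):
    """Full-precision dot product, for comparison."""
    return sum(w * x for w, x in zip(weights, inputs))

def binary_dot(alpha, signs, inputs):
    """Dot product using only additions and subtractions plus one multiply."""
    return alpha * sum(s * x for s, x in zip(signs, inputs))

# Hypothetical weights and inputs (illustrative values only).
weights = [0.5, -0.25, 0.75, -0.5]
inputs = [1.0, 2.0, 3.0, 4.0]

alpha, signs = binarize(weights)
print("scale factor:", alpha)
print("signs:", signs)
print("full precision:", exact_dot(weights, inputs))
print("binarized:", binary_dot(alpha, signs, inputs))
```

Each sign needs one bit instead of a 32-bit float, which is the kind of saving that matters on a phone. The result is only an approximation, and on a vector this small the error can be large.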

07:04

(Applause)

07:10

And I'm really excited because now we have a pretty powerful solution to this low-level computer vision problem, and anyone can take it and build something with it.

07:22

So now the rest is up to all of you and people around the world with access to this software, and I can't wait to see what people will build with this technology.

07:31

Thank you.

07:33

(Applause)

Original video on YouTube.com

How computers learn to recognize objects instantly | Joseph Redmon - YouTube
