How computers learn to recognize objects instantly | Joseph Redmon

1,132,480 views ・ 2017-08-18

TED

Chinese (Traditional) subtitles translated by 易帆余, reviewed by Wilde Luo

00:12

Ten years ago,

00:13

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.
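
To make "put a label to that image" concrete: a classifier typically outputs one raw score per category, and a softmax turns those scores into probabilities from which the most likely labels are reported. The sketch below assumes that setup; the label list and scores are invented for illustration and are not Darknet's output or API.

```python
import math

def softmax(logits):
    """Turn raw classifier scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max score for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical category names and raw scores, for illustration only.
labels = ["malamute", "Siberian husky", "Eskimo dog", "tabby cat"]
logits = [4.0, 2.5, 2.0, -1.0]
for label, p in classify(logits, labels):
    print(f"{label}: {p:.2f}")
```

Fine-grained answers such as a specific dog breed come from the same mechanism; the classifier simply has one output per breed rather than one for "dog".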

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:47

So let's just see what Darknet thinks

00:50

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:57

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:04

And it's correct.

01:06

My dog is in fact a malamute.

01:08

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:18

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:36

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.
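
A detector's output can be pictured as a list of labeled boxes with confidence scores, and that is what lets it report where objects are relative to each other and how big they are. The `Detection` class and `describe` helper below are hypothetical, a minimal sketch assuming boxes given in pixel coordinates from the top-left corner (x1, y1) to the bottom-right corner (x2, y2); they are not Darknet's data structures.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: class label, confidence, and a bounding box
    in pixel coordinates (x1, y1) top-left to (x2, y2) bottom-right."""
    label: str
    score: float
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    @property
    def area(self):
        # Clamp at zero so a degenerate box never reports negative size.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

def describe(a, b):
    """State whether b is left or right of a, and which of the two is larger."""
    ax, _ = a.center
    bx, _ = b.center
    side = "right of" if bx > ax else "left of"
    bigger = b.label if b.area > a.area else a.label
    return f"{b.label} is {side} {a.label}; {bigger} is larger"

# Hypothetical detections for an image containing a cat and a dog.
dog = Detection("dog", 0.92, 300, 120, 620, 470)
cat = Detection("cat", 0.88, 40, 200, 260, 450)
print(describe(dog, cat))
```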

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:15

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection,

02:24

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:32

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:37

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:46

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:58

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:04

I wouldn't want a system like this driving my car.

03:08

This is our detection system running in real time on my laptop.

03:12

So it smoothly tracks me as I move around the frame,

03:15

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:24

This is great.

03:26

This is what we really need

03:27

if we're going to build systems on top of computer vision.

03:30

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:40

to 20 milliseconds per image, a thousand times faster.
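
The speed figures in the talk can be checked with simple arithmetic: frames per second is the reciprocal of seconds per image. The 0.2 s stage below is derived from "another factor of 10" (five frames per second); the other latencies are the numbers stated in the talk, and the stage names are just labels for this sketch.

```python
def fps(seconds_per_image):
    """Frames per second achievable at a given per-image latency."""
    return 1.0 / seconds_per_image

# (stage, seconds per image) as described in the talk.
stages = [
    ("first detector", 20.0),
    ("10x faster", 2.0),
    ("another 10x", 0.2),
    ("real-time YOLO", 0.020),
]
for name, latency in stages:
    print(f"{name:>15}: {latency * 1000:8.0f} ms/image, {fps(latency):6.2f} fps")

# Overall improvement from the first detector to the real-time one.
speedup = stages[0][1] / stages[-1][1]
print(f"overall speedup: {speedup:.0f}x")
```

At 20 ms per image the detector can keep up with roughly 50 frames per second, which is why it can track motion smoothly in the live demo.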

03:44

How did we get there?

03:45

Well, in the past, object detection systems

03:49

would take an image like this

03:50

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.
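
The older approach described above can be sketched as a sliding-window loop: crop many regions, classify each one, and keep the high scorers. This is a minimal illustration with assumed window sizes, stride, and threshold, and a toy stand-in for the classifier; older systems chose their regions in various ways (fixed grids, multiple scales, region proposals), so treat the numbers as illustrative. The point is how many classifier evaluations a single image needs.

```python
def sliding_windows(width, height, window, stride):
    """Yield (x, y, w, h) square regions tiling the image on a regular grid."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)

def detect_by_regions(image_size, classify_region, windows=(64, 128, 256),
                      stride=16, threshold=0.5):
    """Run a classifier on every region and keep regions that score highly.

    classify_region(region) -> (label, score) is a hypothetical stand-in
    for a full neural-network evaluation on the cropped region."""
    width, height = image_size
    detections, evaluations = [], 0
    for window in windows:
        for region in sliding_windows(width, height, window, stride):
            label, score = classify_region(region)
            evaluations += 1
            if score >= threshold:
                detections.append((label, score, region))
    return detections, evaluations

def toy_classifier(region):
    """Toy classifier that 'sees' a dog in exactly one 128-pixel region."""
    x, y, w, _ = region
    return ("dog", 0.9) if (x, y, w) == (256, 128, 128) else ("background", 0.1)

dets, n = detect_by_regions((640, 480), toy_classifier)
print(f"{n} classifier evaluations, {len(dets)} detection(s): {dets}")
```

With these assumed settings a 640x480 image needs 2,133 classifier calls; in a real system each one is a full network evaluation, which is where the 20 seconds per image went.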

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.
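
By contrast, a single-network detector produces one output tensor per image, and every box and class probability is read out of it. The sketch below decodes a toy output laid out loosely after the scheme in the original YOLO paper: an S x S grid where each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities, and a box's score for a class is its confidence times that class probability. The function name, the tiny grid, and the threshold are assumptions for illustration, not Darknet's code, and the step that merges overlapping duplicate boxes is omitted.

```python
def decode_grid(output, S, B, C, threshold=0.25):
    """Decode one forward pass laid out as
    output[row][col] = [x, y, w, h, conf] * B + [p_0, ..., p_(C-1)].

    x, y: box center relative to its grid cell (0..1).
    w, h: box size relative to the whole image (0..1).
    Returns (class_index, score, (cx, cy, w, h)) in image-relative units."""
    detections = []
    for row in range(S):
        for col in range(S):
            cell = output[row][col]
            class_probs = cell[5 * B:]
            for b in range(B):
                x, y, w, h, conf = cell[5 * b:5 * b + 5]
                for c in range(C):
                    score = conf * class_probs[c]
                    if score >= threshold:
                        cx = (col + x) / S  # cell offset -> fraction of image
                        cy = (row + y) / S
                        detections.append((c, score, (cx, cy, w, h)))
    return detections

# Toy output: 2x2 grid, 1 box per cell, 2 classes; only one cell is "on".
S, B, C = 2, 1, 2
empty = [0.0] * (5 * B + C)
output = [[list(empty) for _ in range(S)] for _ in range(S)]
output[1][0] = [0.5, 0.5, 0.4, 0.6, 0.9, 0.2, 0.8]
for class_index, score, box in decode_grid(output, S, B, C):
    print(class_index, round(score, 2), box)
```

The loops here are cheap bookkeeping; the expensive part, the network evaluation that fills `output`, happens once per image instead of thousands of times.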

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:52

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:17

There are some teddy bears out there.

05:21

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.
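
Turning down the detection threshold, as in the demo, just means keeping detections with lower confidence: more objects are found, at the cost of more false positives. A minimal sketch with invented detections and thresholds:

```python
def filter_detections(detections, threshold):
    """Keep only (label, confidence) pairs whose confidence meets the threshold."""
    return [d for d in detections if d[1] >= threshold]

# Hypothetical raw detector output for one frame: (label, confidence).
raw = [("person", 0.91), ("teddy bear", 0.64), ("backpack", 0.38),
       ("stop sign", 0.27), ("person", 0.19)]

for threshold in (0.5, 0.25):
    kept = filter_detections(raw, threshold)
    print(f"threshold {threshold}: {[label for label, _ in kept]}")
```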

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.

05:48

And it's important to remember

05:50

that this is a general purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:22

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.
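
The talk does not say which binarization scheme the phone version uses. As one illustrative example of the idea, the sketch below approximates a weight vector by a single scale times the signs of its weights (the binary-weight approximation described in the XNOR-Net paper), so a dot product needs only additions and subtractions plus one multiplication. Function names and numbers are hypothetical.

```python
def binarize(weights):
    """Approximate real-valued weights w by alpha * sign(w).

    For the sign pattern sign(w), alpha = mean(|w|) minimizes the squared
    error, so each weight needs 1 bit plus one shared float per filter."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def dot_binary(alpha, signs, inputs):
    """Dot product with binarized weights: add or subtract each input,
    then multiply once by alpha."""
    acc = 0.0
    for s, x in zip(signs, inputs):
        acc += x if s > 0 else -x
    return alpha * acc

weights = [0.5, -0.4, 0.6, -0.5]
inputs = [1.0, 2.0, -1.0, 0.5]
alpha, signs = binarize(weights)
exact = sum(w * x for w, x in zip(weights, inputs))
approx = dot_binary(alpha, signs, inputs)
print(f"alpha={alpha:.3f}, signs={signs}, exact={exact:.3f}, approx={approx:.3f}")
```

The approximation trades some accuracy for much smaller storage and cheaper arithmetic, which is the kind of trade-off that makes running on a phone feasible.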

07:04

(Applause)

07:10

And I'm really excited because now we have a pretty powerful solution

07:15

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:31

Thank you.

07:33

(Applause)
