How computers learn to recognize objects instantly | Joseph Redmon

1,132,480 views ・ 2017-08-18

TED

Translator: Zeeva Livshitz Reviewer: Ido Dekkers

00:12

Ten years ago,

12645

1151

לפני עשר שנים,

00:13

computer vision researchers thought that getting a computer

13820

2776

מדעני ראייה ממוחשבת חשבו שלגרום למחשב

00:16

to tell the difference between a cat and a dog

16620

2696

להבדיל בין חתול לכלב

00:19

would be almost impossible,

19340

1976

יהיה כמעט בלתי אפשרי,

00:21

even with the significant advance in the state of artificial intelligence.

21340

3696

אפילו עם התקדמות משמעותית במצב של הבינה המלאכותית.

00:25

Now we can do it at a level greater than 99 percent accuracy.

25060

3560

עכשיו אנחנו יכולים לעשות זאת ברמת דיוק של למעלה מ 99 אחוז.

00:29

This is called image classification --

29500

1856

זה נקרא סיווג תמונה --

00:31

give it an image, put a label to that image --

31380

3096

מעלים תמונה ושמים עליה תווית --

00:34

and computers know thousands of other categories as well.

34500

3040

ומחשבים מכירים אלפי קטגוריות אחרות גם כן.
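
[Editor's sketch] The "give it an image, put a label to that image" step can be pictured as picking the most probable class from a vector of scores. The talk does not show how Darknet does this internally, so the softmax step, the label list and the score values below are all illustrative assumptions.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical labels and scores, for illustration only.
labels = ["cat", "dog", "malamute"]
label, prob = classify([0.5, 1.0, 3.0], labels)
print(label, round(prob, 3))
```

A real classifier would produce the scores from the image with a neural network; only the final "scores to label" step is shown here.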

00:38

I'm a graduate student at the University of Washington,

38500

2896

אני סטודנט לתואר שני באוניברסיטת וושינגטון,

00:41

and I work on a project called Darknet,

41420

1896

ואני עובד על פרויקט שנקרא "דארקנט",

00:43

which is a neural network framework

43340

1696

שהוא מסגרת של רשת עצבית

00:45

for training and testing computer vision models.

45060

2816

להכשרה ובדיקת מודלים של ראייה ממוחשבת.

00:47

So let's just see what Darknet thinks

47900

2976

אז בואו ונראה מה "דארקנט" חושבת

00:50

of this image that we have.

50900

1760

על תמונה זו שיש לנו.

00:54

When we run our classifier

54340

2336

כאשר אנו מפעילים את המסווג שלנו

00:56

on this image,

56700

1216

על התמונה הזו,

00:57

we see we don't just get a prediction of dog or cat,

57940

2456

רואים שלא רק מקבלים חיזוי של כלב או חתול,

01:00

we actually get specific breed predictions.

60420

2336

אנחנו למעשה מקבלים תחזיות של גזע ספציפי.

01:02

That's the level of granularity we have now.

62780

2176

זוהי רמת הפירוט שיש לנו עכשיו.

01:04

And it's correct.

64980

1616

והיא נכונה.

01:06

My dog is in fact a malamute.

66620

1840

הכלב שלי למעשה הוא מלמוט.

01:08

So we've made amazing strides in image classification,

68860

4336

אז עשינו צעדים מדהימים בסיווג תמונות,

01:13

but what happens when we run our classifier

73220

2000

אבל מה קורה כשאנו מפעילים את המסווג

01:15

on an image that looks like this?

75244

1960

על תמונה שנראית כמו זו?

01:18

Well ...

78900

1200

טוב ...

01:24

We see that the classifier comes back with a pretty similar prediction.

84460

3896

אנו רואים שהמסווג נותן תחזית די דומה.

01:28

And it's correct, there is a malamute in the image,

88380

3096

וזה נכון. יש מלמוט בתמונה.

01:31

but just given this label, we don't actually know that much

91500

3696

אבל רק בהתחשב בתווית זו, איננו ממש יודעים כל כך הרבה

01:35

about what's going on in the image.

95220

1667

על מה שקורה בתמונה.

01:36

We need something more powerful.

96911

1560

אנחנו צריכים משהו חזק יותר.

01:39

I work on a problem called object detection,

99060

2616

אני עובד על בעיה שנקראת זיהוי אובייקט,

01:41

where we look at an image and try to find all of the objects,

101700

2936

שבה אנו מסתכלים על תמונה ומנסים למצוא את כל האובייקטים,

01:44

put bounding boxes around them

104660

1456

שמים קופסאות תוחמות סביבם

01:46

and say what those objects are.

106140

1520

ואומרים מה הם אובייקטים אלה:
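
[Editor's sketch] A detector's output, as described here, is a list of labelled bounding boxes rather than a single label. A minimal way to represent that, assuming pixel coordinates with the origin at the top-left corner, might look like the following; the `Detection` class, its field names and the sample values are hypothetical and not taken from Darknet.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # what the object is
    x: float      # left edge of the box, in pixels
    y: float      # top edge of the box, in pixels
    w: float      # box width
    h: float      # box height
    score: float  # detector confidence in [0, 1]

    def area(self):
        return self.w * self.h

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

def left_to_right(detections):
    """Order detections by the horizontal position of their centers."""
    return sorted(detections, key=lambda d: d.center()[0])

# Made-up detections for an image containing a dog, a cat and a book.
detections = [
    Detection("dog", 220.0, 80.0, 300.0, 260.0, 0.92),
    Detection("cat", 40.0, 150.0, 160.0, 180.0, 0.88),
    Detection("book", 500.0, 30.0, 60.0, 40.0, 0.55),
]
order = [d.label for d in left_to_right(detections)]
print(order)
```

Because each box carries a position and a size, relative location and scale can be read off directly, which a single image-level label cannot provide.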

01:48

So here's what happens when we run a detector on this image.

108220

3280

אז זה מה שקורה כשאנו מפעילים גלאי על התמונה הזאת.

01:53

Now, with this kind of result,

113060

2256

עכשיו, עם סוג זה של תוצאה,

01:55

we can do a lot more with our computer vision algorithms.

115340

2696

נוכל לעשות הרבה יותר עם האלגוריתמים של הראייה הממוחשבת.

01:58

We see that it knows that there's a cat and a dog.

118060

2976

אנחנו רואים שהוא מזהה שיש חתול וכלב.

02:01

It knows their relative locations,

121060

2256

הוא יודע את המקומות היחסיים שלהם,

02:03

their size.

123340

1216

את גודלם.

02:04

It may even know some extra information.

124580

1936

הוא אולי אפילו יודע עוד מידע נוסף כלשהו.

02:06

There's a book sitting in the background.

126540

1960

יש ספר שמונח ברקע.

02:09

And if you want to build a system on top of computer vision,

129100

3256

ואם רוצים לבנות שיטה על גבי ראייה ממוחשבת,

02:12

say a self-driving vehicle or a robotic system,

132380

3456

למשל, רכב נהיגה עצמית או מערכת רובוטית,

02:15

this is the kind of information that you want.

135860

2456

זה סוג המידע שמעונינים בו.

02:18

You want something so that you can interact with the physical world.

138340

3239

רוצים משהו שיאפשר לתקשר עם העולם הפיזי.

02:22

Now, when I started working on object detection,

142579

2257

עכשיו, כשהתחלתי לעבוד על זיהוי אובייקט,

02:24

it took 20 seconds to process a single image.

144860

3296

לקח 20 שניות כדי לעבד תמונה בודדת.

02:28

And to get a feel for why speed is so important in this domain,

148180

3880

וכדי לקבל תחושה לסיבה שמהירות כה חשובה בתחום זה,

02:32

here's an example of an object detector

152940

2536

הנה דוגמה של גלאי אובייקט

02:35

that takes two seconds to process an image.

155500

2416

שלוקח לו שתי שניות לעבד תמונה.

02:37

So this is 10 times faster

157940

2616

אז זה פי 10 מהר יותר

02:40

than the 20-seconds-per-image detector,

160580

3536

מה20 שניות לתמונה של גלאי תמונה,

02:44

and you can see that by the time it makes predictions,

164140

2656

ואתם יכולים לראות שעד שזה עושה תחזיות,

02:46

the entire state of the world has changed,

166820

2040

המצב כולו של העולם השתנה,

02:49

and this wouldn't be very useful

169700

2416

וזה לא יהיה מאוד שימושי

02:52

for an application.

172140

1416

עבור יישום.

02:53

If we speed this up by another factor of 10,

173580

2496

אם נאיץ את זה לפי מקדם נוסף של 10,

02:56

this is a detector running at five frames per second.

176100

2816

זה יהיה גלאי שרץ בחמש מסגרות לשנייה.

02:58

This is a lot better,

178940

1536

זה הרבה יותר טוב,

03:00

but for example,

180500

1976

אבל לדוגמה,

03:02

if there's any significant movement,

182500

2296

אם יש תנועה משמעותית,

03:04

I wouldn't want a system like this driving my car.

184820

2560

לא הייתי רוצה שמערכת כזו תנהג במכונית שלי.

03:08

This is our detection system running in real time on my laptop.

188940

3240

זוהי מערכת האיתור שלנו שרצה בזמן אמת על המחשב הנייד שלי.

03:12

So it smoothly tracks me as I move around the frame,

192820

3136

כך היא עוקבת אחרי בצורה חלקה כשאני זז סביב המסגרת,

03:15

and it's robust to a wide variety of changes in size,

195980

3720

והיא חסינה למגוון רחב של שינויים בגודל,

03:21

pose,

201260

1200

העמדה,

03:23

forward, backward.

203100

1856

קדימה, אחורה.

03:24

This is great.

204980

1216

זה נהדר.

03:26

This is what we really need

206220

1736

זה מה שאנחנו באמת צריכים

03:27

if we're going to build systems on top of computer vision.

207980

2896

אם אנחנו הולכים לבנות מערכות על גבי ראייה ממוחשבת.

03:30

(Applause)

210900

4000

(מחיאות כפיים)

03:36

So in just a few years,

216100

2176

אז תוך שנים אחדות,

03:38

we've gone from 20 seconds per image

218300

2656

עברנו מ -20 שניות לתמונה

03:40

to 20 milliseconds per image, a thousand times faster.

220980

3536

ל 20 אלפיות השנייה, פי אלף יותר מהר.
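
[Editor's sketch] The speed figures quoted in the talk are linked by simple arithmetic: frames per second is the reciprocal of seconds per image, and speedup is the ratio of two latencies. The helper names and stage labels below are made up for illustration; the latencies are the ones mentioned in the talk.

```python
def frames_per_second(seconds_per_image):
    """Throughput is the reciprocal of per-image latency."""
    return 1.0 / seconds_per_image

def speedup(old_seconds, new_seconds):
    """How many times faster the new latency is than the old one."""
    return old_seconds / new_seconds

# Latencies mentioned in the talk, in seconds per image.
stages = [
    ("early detector", 20.0),
    ("10 times faster", 2.0),
    ("another factor of 10", 0.2),
    ("real time on a laptop", 0.020),
]
for name, latency in stages:
    print(f"{name}: {frames_per_second(latency):.2f} frames per second")
print(f"overall speedup: {speedup(20.0, 0.020):.0f}x")
```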

03:44

How did we get there?

224540

1416

איך הגענו לזה?

03:45

Well, in the past, object detection systems

225980

3016

בעבר, מערכות לאיתור אובייקטים

03:49

would take an image like this

229020

1936

היו לוקחות תמונה כמו זו

03:50

and split it into a bunch of regions

230980

2456

ומפצלות אותה לקבוצה של אזורים

03:53

and then run a classifier on each of these regions,

233460

3256

ולאחר מכן מפעילות מסווג על כל אחד מאזורים אלה,

03:56

and high scores for that classifier

236740

2536

וציונים גבוהים עבור מסווג זה

03:59

would be considered detections in the image.

239300

3136

ייחשבו זיהויים בתמונה.

04:02

But this involved running a classifier thousands of times over an image,

242460

4056

אבל זה כרוך בהפעלת מסווג אלפי פעמים על תמונה,

04:06

thousands of neural network evaluations to produce detection.

246540

2920

אלפי הערכות של רשת עצבית כדי לייצר זיהוי.
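
[Editor's sketch] To see why the region-by-region approach is expensive, it helps to count classifier calls. The sketch below uses a plain sliding window as a stand-in for "a bunch of regions"; the window sizes, stride, image size and the `dummy_classifier` stub are assumptions chosen for illustration, and earlier systems may have chosen their regions differently.

```python
def sliding_windows(width, height, window, stride):
    """Yield (x, y, w, h) for every square window that fits in the image."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)

def detect_by_regions(width, height, classify_region,
                      sizes=(64, 128, 256), stride=16, threshold=0.5):
    """Run the classifier once per region and keep the high-scoring ones."""
    detections = []
    evaluations = 0
    for size in sizes:
        for box in sliding_windows(width, height, size, stride):
            label, score = classify_region(box)
            evaluations += 1
            if score >= threshold:
                detections.append((box, label, score))
    return detections, evaluations

def dummy_classifier(box):
    """Stand-in for a neural network: fires on exactly one region."""
    if box == (64, 64, 128, 128):
        return ("dog", 0.9)
    return ("background", 0.1)

detections, evaluations = detect_by_regions(640, 480, dummy_classifier)
print(len(detections), "detection(s) from", evaluations, "classifier evaluations")
```

Even on a 640 by 480 image with only three window sizes, the classifier runs a couple of thousand times, and each of those runs would be a full network evaluation in a real system.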

04:11

Instead, we trained a single network to do all of detection for us.

251060

4536

במקום זה, הכשרנו רשת אחת לעשות את כל הזיהוי עבורנו.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

255620

4280

היא מייצרת את כל תיבות התחימה ואת סוג ההסתברויות בו זמנית.

04:20

With our system, instead of looking at an image thousands of times

260500

3496

עם המערכת שלנו, במקום להסתכל על תמונה אלפי פעמים

04:24

to produce detection,

264020

1456

כדי לייצר זיהוי,

04:25

you only look once,

265500

1256

מסתכלים רק פעם אחת,

04:26

and that's why we call it the YOLO method of object detection.

266780

2920

ולכן אנחנו קוראים לזה שיטת YOLO לזיהוי אובייקט.
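
[Editor's sketch] By contrast, a single-pass detector evaluates the network once and then only has to decode its output. The talk does not describe the output layout, so the grid-of-cells format below (each cell carrying one box, an objectness value and per-class probabilities) is an assumed, simplified layout for illustration, not Darknet's actual tensor format.

```python
def decode_grid(output, labels, threshold=0.25):
    """Decode one network output into detections.

    output[row][col] is assumed to be a tuple
    (cx, cy, w, h, objectness, class_probs), where cx and cy are offsets
    inside the cell and w and h are fractions of the image size.
    """
    rows, cols = len(output), len(output[0])
    detections = []
    for r in range(rows):
        for c in range(cols):
            cx, cy, w, h, objectness, class_probs = output[r][c]
            best = max(range(len(labels)), key=class_probs.__getitem__)
            score = objectness * class_probs[best]
            if score >= threshold:
                # Convert the in-cell offset to image-relative coordinates.
                x = (c + cx) / cols
                y = (r + cy) / rows
                detections.append((labels[best], score, (x, y, w, h)))
    return detections

# A made-up 2x2 output in which only the top-right cell contains an object.
labels = ["cat", "dog"]
empty = (0.5, 0.5, 0.0, 0.0, 0.0, [0.5, 0.5])
output = [
    [empty, (0.5, 0.5, 0.4, 0.6, 0.9, [0.1, 0.9])],
    [empty, empty],
]
found = decode_grid(output, labels)
print(found)
```

The decoding loop is cheap bookkeeping; the expensive part, the network itself, runs only once per image.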

04:31

So with this speed, we're not just limited to images;

271180

3976

אז עם מהירות זו, איננו מוגבלים רק לתמונות;

04:35

we can process video in real time.

275180

2416

אנו יכולים לעבד וידאו בזמן אמת.

04:37

And now, instead of just seeing that cat and dog,

277620

3096

ועכשיו, במקום לראות רק את החתול והכלב האלה,

04:40

we can see them move around and interact with each other.

280740

2960

אנחנו יכולים לראות אותם נעים סביב ומתקשרים אחד עם השני.

04:46

This is a detector that we trained

286380

2056

זהו גלאי שאימנו

04:48

on 80 different classes

288460

4376

על 80 סוגים שונים

04:52

in Microsoft's COCO dataset.

292860

3256

במערך הנתונים COCO של מיקרוסופט.

04:56

It has all sorts of things like spoon and fork, bowl,

296140

3336

יש בו כל מיני דברים כמו כף ומזלג, קערה,

04:59

common objects like that.

299500

1800

חפצים רגילים כאלה.

05:02

It has a variety of more exotic things:

302180

3096

יש לו מגוון של דברים אקזוטיים יותר:

05:05

animals, cars, zebras, giraffes.

305300

3256

חיות, מכוניות, זברות, ג'ירפות.

05:08

And now we're going to do something fun.

308580

1936

ועכשיו אנחנו הולכים לעשות משהו מהנה.

05:10

We're just going to go out into the audience

310540

2096

אנחנו פשוט יוצאים אל הקהל

05:12

and see what kind of things we can detect.

312660

2016

כדי לראות איזה סוג של דברים נוכל לזהות.

05:14

Does anyone want a stuffed animal?

314700

1620

האם מישהו רוצה בובת חיה?

05:17

There are some teddy bears out there.

317820

1762

יש כמה בובות דובי שם.

05:21

And we can turn down our threshold for detection a little bit,

321860

4536

ואנחנו יכול להנמיך מעט את סף הזיהוי שלנו,

05:26

so we can find more of you guys out in the audience.

326420

3400

כדי שנוכל למצוא יותר אנשים מביניכם, בקהל.
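
[Editor's sketch] "Turning down the threshold" means keeping detections with lower confidence scores, which surfaces more objects at the cost of more false positives. The function name and the scores below are invented for illustration.

```python
def keep_confident(detections, threshold):
    """Keep only detections whose score reaches the threshold."""
    return [(label, score) for label, score in detections if score >= threshold]

# Made-up (label, score) pairs, as a detector might emit for one frame.
detections = [
    ("person", 0.91),
    ("teddy bear", 0.62),
    ("person", 0.34),
    ("backpack", 0.27),
    ("stop sign", 0.12),
]
strict = keep_confident(detections, 0.5)
relaxed = keep_confident(detections, 0.25)
print(len(strict), "kept at 0.5;", len(relaxed), "kept at 0.25")
```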

05:31

Let's see if we can get these stop signs.

331380

2336

בואו ונראה אם נוכל לתפוס תמרורי עצור אלה.

05:33

We find some backpacks.

333740

1880

אנחנו מוצאים כמה תרמילי גב.

05:37

Let's just zoom in a little bit.

337700

1840

בואו פשוט נגדיל קצת.

05:42

And this is great.

342140

1256

וזה נהדר.

05:43

And all of the processing is happening in real time

343420

3176

וכל העיבוד קורה בזמן אמת

05:46

on the laptop.

346620

1200

על המחשב הנייד.

05:48

And it's important to remember

348900

1456

וחשוב לזכור

05:50

that this is a general purpose object detection system,

350380

3216

שזוהי מערכת זיהוי אובייקט למטרה כללית,

05:53

so we can train this for any image domain.

353620

5000

כך שנוכל להכשיר אותה עבור תמונה מכל תחום.

06:00

The same code that we use

360140

2536

אותו קוד שבו אנו משתמשים

06:02

to find stop signs or pedestrians,

362700

2456

כדי למצוא שלטי עצור או הולכי רגל,

06:05

bicycles in a self-driving vehicle,

365180

1976

אופניים ברכב לנהיגה עצמית,

06:07

can be used to find cancer cells

367180

2856

יכול לשמש כדי למצוא תאים סרטניים

06:10

in a tissue biopsy.

370060

3016

בביופסיה של רקמה.

06:13

And there are researchers around the globe already using this technology

373100

4040

ויש חוקרים ברחבי העולם שכבר משתמשים בטכנולוגיה זו

06:18

for advances in things like medicine, robotics.

378060

3416

לקדם תחומים כמו רפואה, ורובוטיקה.

06:21

This morning, I read a paper

381500

1376

הבוקר קראתי עיתון

06:22

where they were taking a census of animals in Nairobi National Park

382900

4576

שבו ערכו מפקד של בעלי חיים בפארק הלאומי של ניירובי

06:27

with YOLO as part of this detection system.

387500

3136

עם YOLO כחלק של מערכת זיהוי זו.

06:30

And that's because Darknet is open source

390660

3096

וזה בגלל ש "דארקנט" הוא קוד פתוח

06:33

and in the public domain, free for anyone to use.

393780

2520

עבור רשות הרבים, וללא תשלום, לכל מי שרוצה להשתמש,

06:37

(Applause)

397420

5696

(מחיאות כפיים)

06:43

But we wanted to make detection even more accessible and usable,

403140

4936

אבל רצינו לעשות את הזיהוי לאפילו יותר נגיש ושמיש,

06:48

so through a combination of model optimization,

408100

4056

כך שבאמצעות שילוב של אופטימיזציה של המודל,

06:52

network binarization and approximation,

412180

2296

בינאריזציה ואומדנות של רשת,

06:54

we actually have object detection running on a phone.

414500

3920

יש לנו למעשה זיהוי אובייקט שרץ בטלפון.
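
[Editor's sketch] The talk names network binarization without explaining it. One common scheme, assumed here rather than confirmed by the talk, replaces each real-valued weight with its sign and keeps a single scale factor equal to the mean absolute weight, so that multiplications reduce to additions and subtractions. The function names and numbers below are illustrative.

```python
def binarize(weights):
    """Approximate weights by alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def dot(weights, inputs):
    """Full-precision dot product, for comparison."""
    return sum(w * x for w, x in zip(weights, inputs))

def dot_binarized(alpha, signs, inputs):
    """Dot product using only sign flips plus one final scaling."""
    return alpha * sum(s * x for s, x in zip(signs, inputs))

# Made-up weights and inputs for a single neuron.
weights = [0.5, -0.25, 0.75, -0.5]
inputs = [2.0, 1.0, 2.0, 1.0]
alpha, signs = binarize(weights)
exact = dot(weights, inputs)
approx = dot_binarized(alpha, signs, inputs)
print(exact, approx)
```

The binarized result differs from the exact one, which is the trade-off: less memory and cheaper arithmetic in exchange for an approximation of the original weights.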

07:04

(Applause)

424620

5320

(מחיאות כפיים)

07:10

And I'm really excited because now we have a pretty powerful solution

430780

5056

ואני באמת מתרגש כי עכשיו יש לנו פתרון די חזק

07:15

to this low-level computer vision problem,

435860

2296

לבעיית ראייה ממוחשבת ברמה נמוכה זו.

07:18

and anyone can take it and build something with it.

438180

3856

וכל אחד יכול לקחת את זה ולבנות עם זה משהו.

07:22

So now the rest is up to all of you

442060

3176

אז עכשיו כל השאר תלוי בכם

07:25

and people around the world with access to this software,

445260

2936

ובאנשים ברחבי העולם עם גישה לתוכנה זו,

07:28

and I can't wait to see what people will build with this technology.

448220

3656

ואני לא יכול לחכות לראות מה אנשים יבנו עם טכנולוגיה זו.

07:31

Thank you.

451900

1216

תודה רבה.

07:33

(Applause)

453140

3440

(מחיאות כפיים)
