How computers learn to recognize objects instantly | Joseph Redmon

1,132,480 views ・ 2017-08-18

TED

Italian subtitles: translator Elisabetta Siagri, reviewer Maria Carmina Distratto

00:12

Ten years ago,

00:13

computer vision researchers thought that getting a computer

00:16

to tell the difference between a cat and a dog

00:19

would be almost impossible,

00:21

even with the significant advance in the state of artificial intelligence.

00:25

Now we can do it at a level greater than 99 percent accuracy.

00:29

This is called image classification --

00:31

give it an image, put a label to that image --

00:34

and computers know thousands of other categories as well.
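
Image classification, as described here, turns one image into one label. A minimal Python sketch of that last step, assuming a classifier has already produced one raw score (logit) per category: softmax turns the scores into probabilities and the highest one becomes the label. The labels and scores are invented for illustration; this is the generic technique, not a description of Darknet's code.

```python
import math

def softmax(logits):
    # Subtract the largest logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return (label, probability) for the highest-scoring category."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical scores a classifier might emit for a single image.
labels = ["cat", "malamute", "husky"]
logits = [0.5, 3.0, 1.2]
label, p = classify(logits, labels)
print(label, round(p, 3))
```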

00:38

I'm a graduate student at the University of Washington,

00:41

and I work on a project called Darknet,

00:43

which is a neural network framework

00:45

for training and testing computer vision models.

00:47

So let's just see what Darknet thinks

00:50

of this image that we have.

00:54

When we run our classifier

00:56

on this image,

00:57

we see we don't just get a prediction of dog or cat,

01:00

we actually get specific breed predictions.

01:02

That's the level of granularity we have now.

01:04

And it's correct.

01:06

My dog is in fact a malamute.

01:08

So we've made amazing strides in image classification,

01:13

but what happens when we run our classifier

01:15

on an image that looks like this?

01:18

Well ...

01:24

We see that the classifier comes back with a pretty similar prediction.

01:28

And it's correct, there is a malamute in the image,

01:31

but just given this label, we don't actually know that much

01:35

about what's going on in the image.

01:36

We need something more powerful.

01:39

I work on a problem called object detection,

01:41

where we look at an image and try to find all of the objects,

01:44

put bounding boxes around them

01:46

and say what those objects are.
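
Where a classifier returns one label per image, a detector returns a list of objects, each with a label, a confidence score and a bounding box. A minimal sketch of such a record, assuming boxes are stored as pixel corners (x1, y1) to (x2, y2); the `Detection` class and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a class label, a confidence score and an
    axis-aligned bounding box in pixel coordinates (x1, y1) to (x2, y2)."""
    label: str
    score: float
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1

# Hypothetical detector output for an image containing a dog and a cat.
detections = [
    Detection("dog", 0.91, 40, 60, 280, 300),
    Detection("cat", 0.87, 300, 120, 420, 280),
]
for d in detections:
    print(f"{d.label}: {d.score:.2f} at ({d.x1}, {d.y1})-({d.x2}, {d.y2})")
```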

01:48

So here's what happens when we run a detector on this image.

01:53

Now, with this kind of result,

01:55

we can do a lot more with our computer vision algorithms.

01:58

We see that it knows that there's a cat and a dog.

02:01

It knows their relative locations,

02:03

their size.
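
Because each detection carries a box, relative position and size fall out of simple geometry. A small sketch, assuming boxes are (x1, y1, x2, y2) tuples in pixel coordinates; the helper names and the dog and cat boxes are made up for illustration.

```python
def center(box):
    """Center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def area(box):
    """Area of a box, treating inverted corners as empty."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def describe_pair(name_a, box_a, name_b, box_b):
    """Horizontal position and relative size of two detected objects."""
    side = "left of" if center(box_a)[0] < center(box_b)[0] else "right of"
    ratio = area(box_a) / area(box_b)
    return f"{name_a} is {side} {name_b} and {ratio:.1f}x its area"

# Hypothetical boxes in pixel coordinates.
dog = (40, 60, 280, 300)
cat = (300, 120, 420, 280)
print(describe_pair("dog", dog, "cat", cat))
```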

02:04

It may even know some extra information.

02:06

There's a book sitting in the background.

02:09

And if you want to build a system on top of computer vision,

02:12

say a self-driving vehicle or a robotic system,

02:15

this is the kind of information that you want.

02:18

You want something so that you can interact with the physical world.

02:22

Now, when I started working on object detection,

02:24

it took 20 seconds to process a single image.

02:28

And to get a feel for why speed is so important in this domain,

02:32

here's an example of an object detector

02:35

that takes two seconds to process an image.

02:37

So this is 10 times faster

02:40

than the 20-seconds-per-image detector,

02:44

and you can see that by the time it makes predictions,

02:46

the entire state of the world has changed,

02:49

and this wouldn't be very useful

02:52

for an application.

02:53

If we speed this up by another factor of 10,

02:56

this is a detector running at five frames per second.

02:58

This is a lot better,

03:00

but for example,

03:02

if there's any significant movement,

03:04

I wouldn't want a system like this driving my car.

03:08

This is our detection system running in real time on my laptop.

03:12

So it smoothly tracks me as I move around the frame,

03:15

and it's robust to a wide variety of changes in size,

03:21

pose,

03:23

forward, backward.

03:24

This is great.

03:26

This is what we really need

03:27

if we're going to build systems on top of computer vision.

03:30

(Applause)

03:36

So in just a few years,

03:38

we've gone from 20 seconds per image

03:40

to 20 milliseconds per image, a thousand times faster.
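
The speed figures in the talk are easiest to compare as frames per second, which is simply the reciprocal of the time per image. A quick check of the numbers quoted in the talk (20 s, 2 s, a further 10x to five frames per second, then 20 ms); the dictionary keys are just descriptive labels.

```python
def fps(seconds_per_image: float) -> float:
    """Throughput in frames per second for a given per-image latency."""
    return 1.0 / seconds_per_image

# Per-image latencies mentioned in the talk, in seconds.
latencies = {
    "early detector": 20.0,
    "10x faster": 2.0,
    "another 10x": 0.2,
    "real time": 0.020,
}
for name, s in latencies.items():
    print(f"{name:>14}: {s * 1000:>7.0f} ms/image = {fps(s):6.2f} fps")

speedup = latencies["early detector"] / latencies["real time"]
print(f"overall speedup: {speedup:.0f}x")
```

At 20 ms per image the detector processes 50 images per second, which is what lets it keep up with live video.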

03:44

How did we get there?

03:45

Well, in the past, object detection systems

03:49

would take an image like this

03:50

and split it into a bunch of regions

03:53

and then run a classifier on each of these regions,

03:56

and high scores for that classifier

03:59

would be considered detections in the image.

04:02

But this involved running a classifier thousands of times over an image,

04:06

thousands of neural network evaluations to produce detection.
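
A minimal single-scale sketch of the older approach, assuming a hypothetical `classify_region` function that stands in for one neural-network evaluation per region. Systems of that era also varied window size and shape or used region proposals; a single square window is used here only so the evaluation count is easy to see.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

def sliding_window_detect(width: int, height: int,
                          classify_region: Callable[[Box], float],
                          window: int, stride: int,
                          threshold: float) -> Tuple[List[Box], int]:
    """Run a region classifier at every window position.

    Returns the windows whose score clears `threshold`, plus the number of
    classifier evaluations, which is what made this approach slow.
    """
    hits, evaluations = [], 0
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            box = (x, y, x + window, y + window)
            evaluations += 1  # one full classifier run per region
            if classify_region(box) >= threshold:
                hits.append(box)
    return hits, evaluations

def toy_classifier(box: Box) -> float:
    # Hypothetical stand-in for a neural network: "fires" on any window
    # that contains the pixel (100, 100).
    x1, y1, x2, y2 = box
    return 1.0 if x1 <= 100 < x2 and y1 <= 100 < y2 else 0.0

hits, n = sliding_window_detect(448, 448, toy_classifier,
                                window=64, stride=16, threshold=0.5)
print(n, "evaluations,", len(hits), "windows above threshold")
```

With a 64-pixel window moving 16 pixels at a time over a 448 by 448 image, that is already 625 classifier calls at a single scale, and several overlapping windows fire on the same object.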

04:11

Instead, we trained a single network to do all of detection for us.

04:15

It produces all of the bounding boxes and class probabilities simultaneously.

04:20

With our system, instead of looking at an image thousands of times

04:24

to produce detection,

04:25

you only look once,

04:26

and that's why we call it the YOLO method of object detection.
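
The single network's output can be pictured the way the original YOLO paper frames it: the image is divided into an S by S grid, and each cell predicts B boxes (position, size and a confidence) plus C class probabilities, all in one forward pass. The paper used S = 7, B = 2 and C = 20 for PASCAL VOC, giving a 7 × 7 × 30 output. The decoder below is a simplified sketch under assumed conventions: the flat memory layout is hypothetical, width and height are taken as plain fractions of the image, only the best class per cell is kept, and the non-maximum suppression step that removes duplicate boxes is omitted.

```python
from typing import List, Tuple

def decode_grid(output: List[float], S: int, B: int, C: int,
                img_w: float, img_h: float,
                threshold: float) -> List[Tuple[int, float, float, float, float, float]]:
    """Decode one flat S*S*(B*5 + C) prediction vector into boxes.

    Assumed per-cell layout (hypothetical): B blocks of (x, y, w, h, conf),
    then C class probabilities. x, y are offsets inside the cell in [0, 1];
    w, h are fractions of the whole image.
    Returns (class_index, score, x1, y1, x2, y2) tuples in pixels.
    """
    per_cell = B * 5 + C
    assert len(output) == S * S * per_cell
    detections = []
    for row in range(S):
        for col in range(S):
            start = (row * S + col) * per_cell
            cell = output[start:start + per_cell]
            class_probs = cell[B * 5:]
            best_class = max(range(C), key=class_probs.__getitem__)
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                # Class-specific score = box confidence * class probability.
                score = conf * class_probs[best_class]
                if score < threshold:
                    continue
                cx = (col + x) / S * img_w
                cy = (row + y) / S * img_h
                bw, bh = w * img_w, h * img_h
                detections.append((best_class, score,
                                    cx - bw / 2, cy - bh / 2,
                                    cx + bw / 2, cy + bh / 2))
    return detections

# Toy 2x2 grid, one box per cell, two classes; only cell (row 0, col 1)
# contains a confident prediction.
S, B, C = 2, 1, 2
out = [0.0] * (S * S * (B * 5 + C))
offset = (0 * S + 1) * (B * 5 + C)
out[offset:offset + 7] = [0.5, 0.5, 0.25, 0.5, 0.9, 0.2, 0.8]
print(decode_grid(out, S, B, C, 448, 448, threshold=0.5))
```

Every box and class score comes out of one pass over one tensor, which is the "look once" idea; the sliding-window approach would instead call a classifier once per region.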

04:31

So with this speed, we're not just limited to images;

04:35

we can process video in real time.

04:37

And now, instead of just seeing that cat and dog,

04:40

we can see them move around and interact with each other.

04:46

This is a detector that we trained

04:48

on 80 different classes

04:52

in Microsoft's COCO dataset.

04:56

It has all sorts of things like spoon and fork, bowl,

04:59

common objects like that.

05:02

It has a variety of more exotic things:

05:05

animals, cars, zebras, giraffes.

05:08

And now we're going to do something fun.

05:10

We're just going to go out into the audience

05:12

and see what kind of things we can detect.

05:14

Does anyone want a stuffed animal?

05:17

There are some teddy bears out there.

05:21

And we can turn down our threshold for detection a little bit,

05:26

so we can find more of you guys out in the audience.
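
Lowering the detection threshold keeps lower-confidence predictions, so more real objects are found at the cost of more false positives. A minimal sketch, assuming detections arrive as hypothetical (label, score) pairs; the scores are invented.

```python
def filter_by_threshold(detections, threshold):
    """Keep (label, score) detections whose score is at least `threshold`."""
    return [d for d in detections if d[1] >= threshold]

# Hypothetical (label, score) pairs from one frame of audience video.
frame = [("person", 0.92), ("person", 0.41), ("teddy bear", 0.63),
         ("backpack", 0.28), ("person", 0.19)]

for t in (0.5, 0.25):
    kept = filter_by_threshold(frame, t)
    print(t, [label for label, _ in kept])
```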

05:31

Let's see if we can get these stop signs.

05:33

We find some backpacks.

05:37

Let's just zoom in a little bit.

05:42

And this is great.

05:43

And all of the processing is happening in real time

05:46

on the laptop.

05:48

And it's important to remember

05:50

that this is a general purpose object detection system,

05:53

so we can train this for any image domain.

06:00

The same code that we use

06:02

to find stop signs or pedestrians,

06:05

bicycles in a self-driving vehicle,

06:07

can be used to find cancer cells

06:10

in a tissue biopsy.

06:13

And there are researchers around the globe already using this technology

06:18

for advances in things like medicine, robotics.

06:21

This morning, I read a paper

06:22

where they were taking a census of animals in Nairobi National Park

06:27

with YOLO as part of this detection system.

06:30

And that's because Darknet is open source

06:33

and in the public domain, free for anyone to use.

06:37

(Applause)

06:43

But we wanted to make detection even more accessible and usable,

06:48

so through a combination of model optimization,

06:52

network binarization and approximation,

06:54

we actually have object detection running on a phone.
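
The talk does not say which binarization scheme the phone version uses. As one published example, the binary-weight approximation in the XNOR-Net paper replaces a layer's real-valued weights W with α·sign(W), where α is the mean of |W|. A minimal sketch on a single weight vector, with made-up values; the function names are illustrative.

```python
def binarize(weights):
    """Approximate real weights W by alpha * sign(W), alpha = mean(|W|).

    Returns (alpha, signs). Each sign fits in one bit, so storage drops
    roughly 32x versus 32-bit floats (ignoring the single scale factor).
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def binary_dot(alpha, signs, inputs):
    # With +/-1 weights the dot product needs only additions and
    # subtractions, followed by one multiplication by alpha.
    return alpha * sum(x if s > 0 else -x for s, x in zip(signs, inputs))

w = [0.4, -0.2, 0.1, -0.5]
x = [1.0, 2.0, 3.0, 4.0]
alpha, signs = binarize(w)
exact = sum(wi * xi for wi, xi in zip(w, x))
approx = binary_dot(alpha, signs, x)
print(f"alpha={alpha:.2f} exact={exact:.2f} approx={approx:.2f}")
```

The gap between the exact and approximate results is the accuracy cost that such approximations trade for much smaller, cheaper models.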

07:04

(Applause)

07:10

And I'm really excited because now we have a pretty powerful solution

07:15

to this low-level computer vision problem,

07:18

and anyone can take it and build something with it.

07:22

So now the rest is up to all of you

07:25

and people around the world with access to this software,

07:28

and I can't wait to see what people will build with this technology.

07:31

Thank you.

07:33

(Applause)

Original video on YouTube.com