How computers learn to recognize objects instantly | Joseph Redmon

1,119,896 views

2017-08-18 ・ TED



Translator: Ivan Nekić; Reviewer: Sanda L
00:12
Ten years ago,
00:13
computer vision researchers thought that getting a computer
00:16
to tell the difference between a cat and a dog
00:19
would be almost impossible,
00:21
even with the significant advance in the state of artificial intelligence.
00:25
Now we can do it at a level greater than 99 percent accuracy.
00:29
This is called image classification --
00:31
give it an image, put a label to that image --
00:34
and computers know thousands of other categories as well.
00:38
I'm a graduate student at the University of Washington,
00:41
and I work on a project called Darknet,
00:43
which is a neural network framework
00:45
for training and testing computer vision models.
00:47
So let's just see what Darknet thinks
00:50
of this image that we have.
00:54
When we run our classifier
00:56
on this image,
00:57
we see we don't just get a prediction of dog or cat,
01:00
we actually get specific breed predictions.
01:02
That's the level of granularity we have now.
01:04
And it's correct.
01:06
My dog is in fact a malamute.
01:08
So we've made amazing strides in image classification,
01:13
but what happens when we run our classifier
01:15
on an image that looks like this?
01:18
Well ...
01:24
We see that the classifier comes back with a pretty similar prediction.
01:28
And it's correct, there is a malamute in the image,
01:31
but just given this label, we don't actually know that much
01:35
about what's going on in the image.
01:36
We need something more powerful.
01:39
I work on a problem called object detection,
01:41
where we look at an image and try to find all of the objects,
01:44
put bounding boxes around them
01:46
and say what those objects are.
01:48
So here's what happens when we run a detector on this image.
01:53
Now, with this kind of result,
01:55
we can do a lot more with our computer vision algorithms.
01:58
We see that it knows that there's a cat and a dog.
02:01
It knows their relative locations,
02:03
their size.
02:04
It may even know some extra information.
02:06
There's a book sitting in the background.
02:09
And if you want to build a system on top of computer vision,
02:12
say a self-driving vehicle or a robotic system,
02:15
this is the kind of information that you want.
02:18
You want something so that you can interact with the physical world.
02:22
Now, when I started working on object detection,
02:24
it took 20 seconds to process a single image.
02:28
And to get a feel for why speed is so important in this domain,
02:32
here's an example of an object detector
02:35
that takes two seconds to process an image.
02:37
So this is 10 times faster
02:40
than the 20-seconds-per-image detector,
02:44
and you can see that by the time it makes predictions,
02:46
the entire state of the world has changed,
02:49
and this wouldn't be very useful
02:52
for an application.
02:53
If we speed this up by another factor of 10,
02:56
this is a detector running at five frames per second.
02:58
This is a lot better,
03:00
but for example,
03:02
if there's any significant movement,
03:04
I wouldn't want a system like this driving my car.
03:08
This is our detection system running in real time on my laptop.
03:12
So it smoothly tracks me as I move around the frame,
03:15
and it's robust to a wide variety of changes in size,
03:21
pose,
03:23
forward, backward.
03:24
This is great.
03:26
This is what we really need
03:27
if we're going to build systems on top of computer vision.
03:30
(Applause)
03:36
So in just a few years,
03:38
we've gone from 20 seconds per image
03:40
to 20 milliseconds per image, a thousand times faster.
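The per-image times quoted in the talk map directly onto frame rates. The short Python sketch below is only an illustration of that arithmetic: the function name and the dictionary labels are made up here, and the 50 frames per second figure for 20 milliseconds is derived from the quoted time rather than stated by the speaker.

```python
def frames_per_second(milliseconds_per_image: int) -> float:
    """Convert a per-image processing time (in ms) into a frame rate."""
    return 1000 / milliseconds_per_image


# Per-image processing times mentioned in the talk, in milliseconds.
# The labels are illustrative, not names used by the speaker.
latencies_ms = {
    "early detector": 20_000,   # 20 seconds per image
    "10x faster": 2_000,        # 2 seconds per image
    "100x faster": 200,         # described as five frames per second
    "real-time system": 20,     # 20 milliseconds per image
}

for name, ms in latencies_ms.items():
    print(f"{name}: {ms} ms/image -> {frames_per_second(ms):g} fps")

# Overall speedup between the first and last detector.
speedup = latencies_ms["early detector"] // latencies_ms["real-time system"]
print(f"speedup: {speedup}x")
```

At 20 seconds per image the detector manages 0.05 frames per second, which is why the scene has changed long before a prediction arrives.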
03:44
How did we get there?
03:45
Well, in the past, object detection systems
03:49
would take an image like this
03:50
and split it into a bunch of regions
03:53
and then run a classifier on each of these regions,
03:56
and high scores for that classifier
03:59
would be considered detections in the image.
04:02
But this involved running a classifier thousands of times over an image,
04:06
thousands of neural network evaluations to produce detection.
04:11
Instead, we trained a single network to do all of detection for us.
04:15
It produces all of the bounding boxes and class probabilities simultaneously.
04:20
With our system, instead of looking at an image thousands of times
04:24
to produce detection,
04:25
you only look once,
04:26
and that's why we call it the YOLO method of object detection.
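To get a feel for where "thousands" of classifier evaluations come from, the Python sketch below counts the regions produced by a plain sliding window. This is only an assumed stand-in: the talk does not say how earlier systems chose their regions, and the image size, window sizes, and stride are hypothetical values picked for illustration. It does not implement YOLO itself; it only contrasts the number of network evaluations.

```python
def sliding_windows(width, height, window, stride):
    """Yield (x, y, w, h) square regions covering the image at one window size."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)


def count_region_evaluations(width, height, window_sizes, stride):
    """Count classifier calls for a detector that scores every region separately."""
    return sum(
        1
        for size in window_sizes
        for _ in sliding_windows(width, height, size, stride)
    )


# Hypothetical settings: a 640x480 image, three window sizes, 16-pixel stride.
region_evaluations = count_region_evaluations(
    640, 480, window_sizes=(64, 128, 256), stride=16
)

# A single-network detector, as described in the talk, evaluates the image once.
single_pass_evaluations = 1

print(f"region-by-region: {region_evaluations} classifier evaluations")
print(f"single network:   {single_pass_evaluations} evaluation")
```

Even this coarse grid needs a couple of thousand evaluations per image, while a detector that predicts all boxes and class probabilities together needs one.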
04:31
So with this speed, we're not just limited to images;
04:35
we can process video in real time.
04:37
And now, instead of just seeing that cat and dog,
04:40
we can see them move around and interact with each other.
04:46
This is a detector that we trained
04:48
on 80 different classes
04:52
in Microsoft's COCO dataset.
04:56
It has all sorts of things like spoon and fork, bowl,
04:59
common objects like that.
05:02
It has a variety of more exotic things:
05:05
animals, cars, zebras, giraffes.
05:08
And now we're going to do something fun.
05:10
We're just going to go out into the audience
05:12
and see what kind of things we can detect.
05:14
Does anyone want a stuffed animal?
05:17
There are some teddy bears out there.
05:21
And we can turn down our threshold for detection a little bit,
05:26
so we can find more of you guys out in the audience.
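Turning down the detection threshold amounts to keeping lower-confidence boxes. A minimal Python sketch of that filtering step follows; the `Detection` class, the `filter_detections` helper, and every score and box below are hypothetical and are not taken from Darknet or from the demo.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float   # score between 0 and 1
    box: tuple          # (x, y, width, height) in pixels


def filter_detections(detections, threshold):
    """Keep only detections whose confidence is at least the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up detector output for a single frame.
detections = [
    Detection("person", 0.92, (40, 60, 120, 300)),
    Detection("teddy bear", 0.71, (200, 180, 80, 90)),
    Detection("backpack", 0.38, (330, 220, 60, 70)),
    Detection("person", 0.27, (450, 90, 50, 130)),
    Detection("stop sign", 0.18, (520, 40, 40, 40)),
]

strict = filter_detections(detections, threshold=0.5)
relaxed = filter_detections(detections, threshold=0.25)

print([d.label for d in strict])
print([d.label for d in relaxed])
```

The lower threshold surfaces more objects, at the cost of admitting detections the network is less sure about.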
05:31
Let's see if we can get these stop signs.
05:33
We find some backpacks.
05:37
Let's just zoom in a little bit.
05:42
And this is great.
05:43
And all of the processing is happening in real time
05:46
on the laptop.
05:48
And it's important to remember
05:50
that this is a general purpose object detection system,
05:53
so we can train this for any image domain.
06:00
The same code that we use
06:02
to find stop signs or pedestrians,
06:05
bicycles in a self-driving vehicle,
06:07
can be used to find cancer cells
06:10
in a tissue biopsy.
06:13
And there are researchers around the globe already using this technology
06:18
for advances in things like medicine, robotics.
06:21
This morning, I read a paper
06:22
where they were taking a census of animals in Nairobi National Park
06:27
with YOLO as part of this detection system.
06:30
And that's because Darknet is open source
06:33
and in the public domain, free for anyone to use.
06:37
(Applause)
06:43
But we wanted to make detection even more accessible and usable,
06:48
so through a combination of model optimization,
06:52
network binarization and approximation,
06:54
we actually have object detection running on a phone.
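The talk does not spell out how the network was binarized. One common idea, assumed here purely for illustration, is to replace each real-valued weight with its sign and keep a single scale factor per group of weights, so that most multiplications become sign flips. The Python sketch below shows that approximation on a tiny made-up weight vector; the function names and numbers are hypothetical.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w), with alpha the mean absolute value."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def exact_dot(weights, inputs):
    """Full-precision dot product."""
    return sum(w * x for w, x in zip(weights, inputs))


def binary_dot(alpha, signs, inputs):
    """Dot product using only sign flips, additions, and one final scaling."""
    return alpha * sum(s * x for s, x in zip(signs, inputs))


# Made-up weights and inputs, chosen to be exactly representable as floats.
weights = [0.5, -0.25, 0.75, -0.5]
inputs = [1.0, -2.0, 3.0, -1.0]

alpha, signs = binarize(weights)
exact = exact_dot(weights, inputs)
approx = binary_dot(alpha, signs, inputs)

print(f"alpha={alpha}, signs={signs}")
print(f"exact={exact}, binarized approximation={approx}")
```

Each weight now needs one bit instead of a full floating-point number, which is the kind of saving that matters on a phone; the price is an approximation error, visible here as the gap between the two results.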
07:04
(Applause)
07:10
And I'm really excited because now we have a pretty powerful solution
07:15
to this low-level computer vision problem,
07:18
and anyone can take it and build something with it.
07:22
So now the rest is up to all of you
07:25
and people around the world with access to this software,
07:28
and I can't wait to see what people will build with this technology.
07:31
Thank you.
07:33
(Applause)