How computers learn to recognize objects instantly | Joseph Redmon

1,123,328 views · 2017-08-18

TED


Korean subtitles translated by 혜련 장, reviewed by Taz B K
00:12
Ten years ago,
00:13
computer vision researchers thought that getting a computer
00:16
to tell the difference between a cat and a dog
00:19
would be almost impossible,
00:21
even with the significant advance in the state of artificial intelligence.
00:25
Now we can do it at a level greater than 99 percent accuracy.
00:29
This is called image classification --
00:31
give it an image, put a label to that image --
00:34
and computers know thousands of other categories as well.
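[Editor's sketch] As a rough illustration of "give it an image, put a label to that image", the snippet below picks the highest-scoring category from a set of class scores. The scores and the `classify` helper are invented for illustration; they are not Darknet output or part of Darknet's API.

```python
def classify(scores):
    """Return the (label, score) pair with the highest score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Hypothetical class scores that a classifier might assign to one image.
scores = {"malamute": 0.82, "husky": 0.11, "tabby cat": 0.04, "book": 0.03}

label, score = classify(scores)
print(f"predicted label: {label} (score {score})")
```

A classifier of this kind returns one label for the whole image, however many objects the image contains.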
00:38
I'm a graduate student at the University of Washington,
00:41
and I work on a project called Darknet,
00:43
which is a neural network framework
00:45
for training and testing computer vision models.
00:47
So let's just see what Darknet thinks
00:50
of this image that we have.
00:54
When we run our classifier
00:56
on this image,
00:57
we see we don't just get a prediction of dog or cat,
01:00
we actually get specific breed predictions.
01:02
That's the level of granularity we have now.
01:04
And it's correct.
01:06
My dog is in fact a malamute.
01:08
So we've made amazing strides in image classification,
01:13
but what happens when we run our classifier
01:15
on an image that looks like this?
01:18
Well ...
01:24
We see that the classifier comes back with a pretty similar prediction.
01:28
And it's correct, there is a malamute in the image,
01:31
but just given this label, we don't actually know that much
01:35
about what's going on in the image.
01:36
We need something more powerful.
01:39
I work on a problem called object detection,
01:41
where we look at an image and try to find all of the objects,
01:44
put bounding boxes around them
01:46
and say what those objects are.
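[Editor's sketch] One way to picture a detector's output is as a list of records, each holding a label, a confidence score and a bounding box. The `Detection` class, its field names and all the numbers below are hypothetical, chosen only to show how location and size fall out of the box coordinates.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # what the object is
    score: float  # confidence in [0, 1]
    x: float      # left edge of the bounding box, in pixels
    y: float      # top edge of the bounding box, in pixels
    w: float      # box width
    h: float      # box height

    def area(self):
        return self.w * self.h

    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


# Invented detections for a single image.
detections = [
    Detection("dog", 0.93, 40, 60, 300, 260),
    Detection("cat", 0.88, 380, 120, 180, 200),
    Detection("book", 0.41, 500, 20, 90, 60),
]

for d in detections:
    print(d.label, d.score, "center:", d.center(), "area:", d.area())
```

Unlike a single image-level label, each record says where an object is and how large it appears.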
01:48
So here's what happens when we run a detector on this image.
01:53
Now, with this kind of result,
01:55
we can do a lot more with our computer vision algorithms.
01:58
We see that it knows that there's a cat and a dog.
02:01
It knows their relative locations,
02:03
their size.
02:04
It may even know some extra information.
02:06
There's a book sitting in the background.
02:09
And if you want to build a system on top of computer vision,
02:12
say a self-driving vehicle or a robotic system,
02:15
this is the kind of information that you want.
02:18
You want something so that you can interact with the physical world.
02:22
Now, when I started working on object detection,
02:24
it took 20 seconds to process a single image.
02:28
And to get a feel for why speed is so important in this domain,
02:32
here's an example of an object detector
02:35
that takes two seconds to process an image.
02:37
So this is 10 times faster
02:40
than the 20-seconds-per-image detector,
02:44
and you can see that by the time it makes predictions,
02:46
the entire state of the world has changed,
02:49
and this wouldn't be very useful
02:52
for an application.
02:53
If we speed this up by another factor of 10,
02:56
this is a detector running at five frames per second.
02:58
This is a lot better,
03:00
but for example,
03:02
if there's any significant movement,
03:04
I wouldn't want a system like this driving my car.
03:08
This is our detection system running in real time on my laptop.
03:12
So it smoothly tracks me as I move around the frame,
03:15
and it's robust to a wide variety of changes in size,
03:21
pose,
03:23
forward, backward.
03:24
This is great.
03:26
This is what we really need
03:27
if we're going to build systems on top of computer vision.
03:30
(Applause)
03:36
So in just a few years,
03:38
we've gone from 20 seconds per image
03:40
to 20 milliseconds per image, a thousand times faster.
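[Editor's sketch] The speed figures quoted in the talk can be checked with a little arithmetic. The per-image times come from the talk; the stage names and the `frames_per_second` helper are illustrative only.

```python
def frames_per_second(ms_per_image):
    """Convert a per-image processing time in milliseconds to frames per second."""
    return 1000 / ms_per_image


# Per-image processing times mentioned in the talk, in milliseconds.
stages = [
    ("early detector", 20_000),
    ("10x faster", 2_000),
    ("another 10x faster", 200),
    ("real-time system", 20),
]

for name, ms in stages:
    print(f"{name}: {ms} ms per image = {frames_per_second(ms):g} frames per second")

# Overall speedup from 20 seconds to 20 milliseconds per image.
speedup = 20_000 // 20
print("speedup:", speedup)
```

The 200 ms stage works out to the five frames per second mentioned earlier, and 20 seconds divided by 20 milliseconds gives the factor of one thousand.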
03:44
How did we get there?
03:45
Well, in the past, object detection systems
03:49
would take an image like this
03:50
and split it into a bunch of regions
03:53
and then run a classifier on each of these regions,
03:56
and high scores for that classifier
03:59
would be considered detections in the image.
04:02
But this involved running a classifier thousands of times over an image,
04:06
thousands of neural network evaluations to produce detection.
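[Editor's sketch] The region-based approach described above can be sketched as follows. The talk does not say how regions were chosen, so the sliding windows here are just one simple assumption; `toy_classifier`, the window sizes, the stride and the image size are likewise invented stand-ins rather than any real system's settings.

```python
def sliding_windows(width, height, sizes, stride):
    """Yield square (x, y, w, h) regions covering the image at several sizes."""
    for size in sizes:
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield (x, y, size, size)


def detect_by_regions(width, height, classify_region, threshold=0.5):
    """Run the classifier once per region; keep high-scoring regions."""
    detections, evaluations = [], 0
    for region in sliding_windows(width, height, sizes=(64, 128, 256), stride=16):
        label, score = classify_region(region)  # one classifier run per region
        evaluations += 1
        if score >= threshold:
            detections.append((label, score, region))
    return detections, evaluations


def toy_classifier(region):
    # Stand-in for a neural network: scores one specific window as a dog.
    if region == (128, 64, 128, 128):
        return ("dog", 0.9)
    return ("background", 0.1)


detections, evaluations = detect_by_regions(448, 448, toy_classifier)
print(detections)
print("classifier evaluations:", evaluations)
```

Even this small 448 x 448 example needs more than a thousand classifier runs to produce a single detection, which is where the time goes.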
04:11
Instead, we trained a single network to do all of detection for us.
04:15
It produces all of the bounding boxes and class probabilities simultaneously.
04:20
With our system, instead of looking at an image thousands of times
04:24
to produce detection,
04:25
you only look once,
04:26
and that's why we call it the YOLO method of object detection.
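[Editor's sketch] By contrast, a single-pass detector calls the network once and reads every box and its class probabilities out of that one result. The sketch below is a heavily simplified assumption: a 3 x 3 grid with one box per cell, an invented `toy_network`, and a score formed by multiplying an objectness value by the best class probability. It is not Darknet's actual output format.

```python
def toy_network(image):
    """Stand-in for one forward pass: returns one prediction per grid cell.

    Each prediction is (box, objectness, class_probs); all values are invented.
    """
    empty = ((0, 0, 0, 0), 0.0, {"dog": 0.5, "cat": 0.5})
    grid = [[empty for _ in range(3)] for _ in range(3)]
    grid[1][0] = ((40, 60, 300, 260), 0.95, {"dog": 0.9, "cat": 0.1})
    grid[1][2] = ((380, 120, 180, 200), 0.90, {"dog": 0.2, "cat": 0.8})
    return grid


def detect_once(image, network, threshold=0.5):
    """Decode every cell of a single network output into detections."""
    grid = network(image)  # the only network evaluation
    detections = []
    for row in grid:
        for box, objectness, class_probs in row:
            label = max(class_probs, key=class_probs.get)
            score = objectness * class_probs[label]
            if score >= threshold:
                detections.append((label, round(score, 3), box))
    return detections


found = detect_once(image=None, network=toy_network)
for label, score, box in found:
    print(label, score, box)
```

All boxes and class probabilities come from the same call, so the cost per image is one network evaluation instead of thousands.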
04:31
So with this speed, we're not just limited to images;
04:35
we can process video in real time.
04:37
And now, instead of just seeing that cat and dog,
04:40
we can see them move around and interact with each other.
04:46
This is a detector that we trained
04:48
on 80 different classes
04:52
in Microsoft's COCO dataset.
04:56
It has all sorts of things like spoon and fork, bowl,
04:59
common objects like that.
05:02
It has a variety of more exotic things:
05:05
animals, cars, zebras, giraffes.
05:08
And now we're going to do something fun.
05:10
We're just going to go out into the audience
05:12
and see what kind of things we can detect.
05:14
Does anyone want a stuffed animal?
05:17
There are some teddy bears out there.
05:21
And we can turn down our threshold for detection a little bit,
05:26
so we can find more of you guys out in the audience.
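[Editor's sketch] Turning down the detection threshold simply means keeping candidates with lower confidence scores. The candidate list and the `filter_detections` helper below are made up for illustration.

```python
def filter_detections(candidates, threshold):
    """Keep only the candidates whose score is at or above the threshold."""
    return [c for c in candidates if c[1] >= threshold]


# Hypothetical (label, score) candidates from a single frame.
candidates = [
    ("person", 0.91),
    ("teddy bear", 0.64),
    ("person", 0.38),
    ("backpack", 0.27),
    ("stop sign", 0.12),
]

print(len(filter_detections(candidates, 0.5)), "detections at threshold 0.5")
print(len(filter_detections(candidates, 0.25)), "detections at threshold 0.25")
```

A lower threshold finds more objects, but it can also let through more false positives.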
05:31
Let's see if we can get these stop signs.
05:33
We find some backpacks.
05:37
Let's just zoom in a little bit.
05:42
And this is great.
05:43
And all of the processing is happening in real time
05:46
on the laptop.
05:48
And it's important to remember
05:50
that this is a general purpose object detection system,
05:53
so we can train this for any image domain.
06:00
The same code that we use
06:02
to find stop signs or pedestrians,
06:05
bicycles in a self-driving vehicle,
06:07
can be used to find cancer cells
06:10
in a tissue biopsy.
06:13
And there are researchers around the globe already using this technology
06:18
for advances in things like medicine, robotics.
06:21
This morning, I read a paper
06:22
where they were taking a census of animals in Nairobi National Park
06:27
with YOLO as part of this detection system.
06:30
And that's because Darknet is open source
06:33
and in the public domain, free for anyone to use.
06:37
(Applause)
06:43
But we wanted to make detection even more accessible and usable,
06:48
so through a combination of model optimization,
06:52
network binarization and approximation,
06:54
we actually have object detection running on a phone.
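[Editor's sketch] The talk does not describe how the network was binarized. One common formulation, assumed here purely for illustration, approximates each weight by a shared scale factor times its sign, so that only one bit per weight has to be stored. The `binarize` helper and the example weights are hypothetical.

```python
def binarize(weights):
    """Approximate weights by alpha * sign(w), with alpha = mean(|w|)."""
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


# Invented full-precision weights from one small filter.
weights = [0.5, -0.25, 0.75, -0.5]

alpha, signs = binarize(weights)
approx = [alpha * s for s in signs]

print("scale factor:", alpha)
print("signs:", signs)
print("approximated weights:", approx)
```

Storing a sign per weight plus one scale factor takes far less memory than a full-precision number per weight, at the cost of some approximation error.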
07:04
(Applause)
07:10
And I'm really excited because now we have a pretty powerful solution
07:15
to this low-level computer vision problem,
07:18
and anyone can take it and build something with it.
07:22
So now the rest is up to all of you
07:25
and people around the world with access to this software,
07:28
and I can't wait to see what people will build with this technology.
07:31
Thank you.
07:33
(Applause)