How we teach computers to understand pictures | Fei Fei Li

1,175,905 views ・ 2015-03-23

TED



Chinese subtitles: translated by Sailin Lu, reviewed by angie chen
00:14
Let me show you something.
00:18
(Video) Girl: Okay, that's a cat sitting in a bed. The boy is petting the elephant. Those are people that are going on an airplane. That's a big airplane.
00:33
Fei-Fei Li: This is a three-year-old child describing what she sees in a series of photos. She might still have a lot to learn about this world, but she's already an expert at one very important task: to make sense of what she sees.
00:50
Our society is more technologically advanced than ever. We send people to the moon, we make phones that talk to us or customize radio stations that can play only music we like. Yet, our most advanced machines and computers still struggle at this task.
01:09
So I'm here today to give you a progress report on the latest advances in our research in computer vision, one of the most frontier and potentially revolutionary technologies in computer science.
01:24
Yes, we have prototyped cars that can drive by themselves, but without smart vision, they cannot really tell the difference between a crumpled paper bag on the road, which can be run over, and a rock that size, which should be avoided. We have made fabulous megapixel cameras, but we have not delivered sight to the blind. Drones can fly over massive land, but don't have enough vision technology to help us to track the changes of the rainforests. Security cameras are everywhere, but they do not alert us when a child is drowning in a swimming pool.
02:06
Photos and videos are becoming an integral part of global life. They're being generated at a pace that's far beyond what any human, or teams of humans, could hope to view, and you and I are contributing to that at this TED. Yet our most advanced software is still struggling at understanding and managing this enormous content. So in other words, collectively as a society, we're very much blind, because our smartest machines are still blind.
02:43
"Why is this so hard?" you may ask. Cameras can take pictures like this one by converting lights into a two-dimensional array of numbers known as pixels, but these are just lifeless numbers. They do not carry meaning in themselves. Just like to hear is not the same as to listen, to take pictures is not the same as to see, and by seeing, we really mean understanding.
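To make the "two-dimensional array of numbers" concrete, here is a minimal Python sketch. It assumes 8-bit grayscale pixels (0 = black, 255 = white); the tiny image and the helper names `shape` and `mean_brightness` are hypothetical, chosen only to show that the array stores brightness values and nothing about what they depict.

```python
# A tiny hypothetical 4x4 grayscale "photo": each entry is one pixel's
# brightness on an assumed 8-bit scale (0 = black, 255 = white).
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [128, 128, 64, 64],
    [128, 128, 64, 64],
]

def shape(img):
    """Return (rows, columns) of a 2D list of pixels."""
    return len(img), len(img[0])

def mean_brightness(img):
    """Average pixel value: a statement about light, not about meaning."""
    total = sum(sum(row) for row in img)
    rows, cols = shape(img)
    return total / (rows * cols)

print(shape(image))            # (4, 4)
print(mean_brightness(image))  # 111.75
```

Nothing in these numbers says "cat" or "paper bag"; attaching that kind of label is the job of a vision model.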
03:13
In fact, it took Mother Nature 540 million years of hard work to do this task, and much of that effort went into developing the visual processing apparatus of our brains, not the eyes themselves. So vision begins with the eyes, but it truly takes place in the brain.
03:38
So for 15 years now, starting from my Ph.D. at Caltech and then leading Stanford's Vision Lab, I've been working with my mentors, collaborators and students to teach computers to see. Our research field is called computer vision and machine learning. It's part of the general field of artificial intelligence.
04:03
So ultimately, we want to teach the machines to see just like we do: naming objects, identifying people, inferring 3D geometry of things, understanding relations, emotions, actions and intentions. You and I weave together entire stories of people, places and things the moment we lay our gaze on them.
04:28
The first step towards this goal is to teach a computer to see objects, the building block of the visual world. In its simplest terms, imagine this teaching process as showing the computers some training images of a particular object, let's say cats, and designing a model that learns from these training images.
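One very simple way to picture "a model that learns from these training images" is a nearest-centroid classifier, sketched below in Python. This is a stand-in chosen for illustration, not the model described in the talk. It assumes each image has already been reduced to a short feature vector; the feature values and labels are made up.

```python
from math import dist

def train_centroids(examples):
    """Average the feature vectors of each label.

    examples: list of (feature_vector, label) pairs.
    Returns {label: centroid_vector}.
    """
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical two-number "features" for a handful of training images.
training = [
    ([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),
    ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat"),
]
model = train_centroids(training)
print(predict(model, [0.85, 0.7]))  # cat
```

Nothing about cats is written into the code: the behavior comes entirely from the labeled examples, which is why more, and more varied, examples matter.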
04:53
How hard can this be? After all, a cat is just a collection of shapes and colors, and this is what we did in the early days of object modeling. We'd tell the computer algorithm in a mathematical language that a cat has a round face, a chubby body, two pointy ears, and a long tail, and that looked all fine.
05:14
But what about this cat? (Laughter) It's all curled up. Now you have to add another shape and viewpoint to the object model. But what if cats are hidden? What about these silly cats? Now you get my point. Even something as simple as a household pet can present an infinite number of variations to the object model, and that's just one object.
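The brittleness of hand-written object models can be sketched in a few lines of Python. The attribute names below are hypothetical stand-ins for "a round face, a chubby body, two pointy ears, and a long tail"; an early system would have described shapes mathematically rather than as simple flags.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Hypothetical shape attributes extracted from one photo."""
    round_face: bool
    chubby_body: bool
    pointy_ears: int   # number of pointy ears visible
    long_tail: bool

def is_cat_by_rule(obs: Observation) -> bool:
    """Hand-coded rule: every listed cat attribute must be visible."""
    return (obs.round_face and obs.chubby_body
            and obs.pointy_ears == 2 and obs.long_tail)

upright_cat = Observation(round_face=True, chubby_body=True,
                          pointy_ears=2, long_tail=True)
# A curled-up cat hides its face and tail, so the rule rejects it.
curled_cat = Observation(round_face=False, chubby_body=True,
                         pointy_ears=2, long_tail=False)

print(is_cat_by_rule(upright_cat))  # True
print(is_cat_by_rule(curled_cat))   # False
```

Each new pose, viewpoint, or occlusion needs another hand-written rule, which is the explosion of variations the talk describes.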
05:44
So about eight years ago, a very simple and profound observation changed my thinking. No one tells a child how to see, especially in the early years. They learn this through real-world experiences and examples. If you consider a child's eyes as a pair of biological cameras, they take one picture about every 200 milliseconds, the average time an eye movement is made. So by age three, a child would have seen hundreds of millions of pictures of the real world. That's a lot of training examples.
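The "hundreds of millions" figure follows from the 200-millisecond estimate (five pictures per second). The back-of-the-envelope calculation below is a sketch; the 12 waking hours per day is an assumption added here, not a number from the talk, and leap days are ignored.

```python
SECONDS_PER_HOUR = 3600
SNAPSHOTS_PER_SECOND = 5   # one eye movement about every 200 ms
DAYS = 3 * 365             # three years, ignoring leap days

def pictures_seen(waking_hours_per_day: int) -> int:
    """Eye-movement 'snapshots' taken over the first three years."""
    seconds_awake = DAYS * waking_hours_per_day * SECONDS_PER_HOUR
    return seconds_awake * SNAPSHOTS_PER_SECOND

print(pictures_seen(12))  # 236520000, assuming 12 waking hours a day
print(pictures_seen(24))  # 473040000, an upper bound
```

Either way the count lands in the hundreds of millions, consistent with the claim.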
06:26
So instead of focusing solely on better and better algorithms, my insight was to give the algorithms the kind of training data that a child was given through experiences in both quantity and quality. Once we knew this, we knew we needed to collect a data set that has far more images than we have ever had before, perhaps thousands of times more, and together with Professor Kai Li at Princeton University, we launched the ImageNet project in 2007.
07:05
Luckily, we didn't have to mount a camera on our head and wait for many years. We went to the Internet, the biggest treasure trove of pictures that humans have ever created. We downloaded nearly a billion images and used crowdsourcing technology like the Amazon Mechanical Turk platform to help us to label these images. At its peak, ImageNet was one of the biggest employers of the Amazon Mechanical Turk workers: together, almost 50,000 workers from 167 countries around the world helped us to clean, sort and label nearly a billion candidate images. That was how much effort it took to capture even a fraction of the imagery a child's mind takes in in the early developmental years.
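Crowdsourced labels are noisy, so a candidate image is typically shown to several workers and their answers are combined. The talk does not say how ImageNet combined votes; the Python sketch below uses a plain majority vote with a hypothetical agreement threshold (`min_agreement`), one common approach among several.

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.6):
    """Return the majority answer if enough workers agree, else None.

    votes: list of answers from different workers for one image.
    min_agreement: assumed fraction of votes the top answer needs.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

print(consensus_label(["cat", "cat", "cat", "dog", "cat"]))  # cat
print(consensus_label(["cat", "dog", "fox"]))                # None
```

Images that miss the threshold could be sent to more workers or discarded; that policy is also an assumption of this sketch.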
08:04
In hindsight, this idea of using big data to train computer algorithms may seem obvious now, but back in 2007, it was not so obvious. We were fairly alone on this journey for quite a while. Some very friendly colleagues advised me to do something more useful for my tenure, and we were constantly struggling for research funding. Once, I even joked to my graduate students that I would just reopen my dry cleaner's shop to fund ImageNet. After all, that's how I funded my college years. So we carried on.
08:43
In 2009, the ImageNet project delivered a database of 15 million images across 22,000 classes of objects and things organized by everyday English words. In both quantity and quality, this was an unprecedented scale. As an example, in the case of cats, we have more than 62,000 cats of all kinds of looks and poses and across all species of domestic and wild cats. We were thrilled to have put together ImageNet, and we wanted the whole research world to benefit from it, so in the TED fashion, we opened up the entire data set to the worldwide research community for free.
09:36
(Applause)
09:41
Now that we have the data to nourish our computer brain, we're ready to come back to the algorithms themselves. As it turned out, the wealth of information provided by ImageNet was a perfect match to a particular class of machine learning algorithms called convolutional neural network, pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun back in the 1970s and '80s.
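The "convolutional" in convolutional neural network refers to sliding a small grid of weights (a kernel, or filter) across the image and taking a weighted sum at each position. Below is a minimal Python sketch with a hand-picked edge-detecting kernel for illustration; in a trained network the kernel values are learned from data. Like most deep-learning libraries, it computes cross-correlation (no kernel flip), with no padding and stride 1.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, the 'convolution' used in CNN layers.

    image, kernel: 2D lists of numbers. No padding, stride 1.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out

# A dark-to-bright vertical edge, and a kernel that responds to it.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, vertical_edge))  # [[0, 18, 0], [0, 18, 0]]
```

The same small kernel is reused at every position, so the filter responds to an edge wherever it appears in the picture.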
10:10
Just like the brain consists of billions of highly connected neurons, a basic operating unit in a neural network is a neuron-like node. It takes input from other nodes and sends output to others. Moreover, these hundreds of thousands or even millions of nodes are organized in hierarchical layers, also similar to the brain.
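A "neuron-like node" can be written as a weighted sum of its inputs followed by a simple nonlinearity, and a layer is many such nodes reading the same inputs. The Python sketch below assumes a ReLU nonlinearity and hand-picked weights; real networks learn their weights and use far larger layers.

```python
def node(inputs, weights, bias):
    """One neuron-like node: weighted sum of inputs, then ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)

def layer(inputs, weight_rows, biases):
    """A layer: each node reads all inputs and sends one output onward."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Two stacked layers form a tiny hierarchy: 3 inputs -> 2 nodes -> 1 node.
x = [1.0, 2.0, 3.0]
hidden = layer(x, [[0.5, -1.0, 1.0], [1.0, 1.0, -1.0]], [0.0, 0.5])
output = layer(hidden, [[2.0, 3.0]], [-1.0])
print(hidden, output)  # [1.5, 0.5] [3.5]
```

The outputs of the first layer become the inputs of the second, which is the "input from other nodes, output to others" pattern stacked into a hierarchy.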
10:38
In a typical neural network we use to train our object recognition model, it has 24 million nodes, 140 million parameters, and 15 billion connections. That's an enormous model.
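Parameter counts such as the "140 million" quoted here are sums, over all layers, of weights plus biases. The Python sketch below shows the standard per-layer formulas; the layer sizes are illustrative and are not the architecture of the network described in the talk.

```python
def conv_params(kernel_size, in_channels, out_channels):
    """Conv layer: k * k * C_in weights per filter, plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels

def dense_params(in_features, out_features):
    """Fully connected layer: one weight per input-output pair, plus biases."""
    return in_features * out_features + out_features

print(conv_params(3, 64, 128))   # 73856
print(dense_params(4096, 4096))  # 16781312
```

A single large fully connected layer already contributes millions of parameters. A convolutional filter's weights, by contrast, are shared across every image position, which is one reason a network can have far more connections than parameters.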
10:55
Powered by the massive data from ImageNet and the modern CPUs and GPUs to train such a humongous model, the convolutional neural network blossomed in a way that no one expected. It became the winning architecture to generate exciting new results in object recognition.
11:18
This is a computer telling us this picture contains a cat and where the cat is. Of course there are more things than cats, so here's a computer algorithm telling us the picture contains a boy and a teddy bear; a dog, a person, and a small kite in the background; or a picture of very busy things like a man, a skateboard, railings, a lamppost, and so on.
11:45
Sometimes, when the computer is not so confident about what it sees, we have taught it to be smart enough to give us a safe answer instead of committing too much, just like we would do, but other times our computer algorithm is remarkable at telling us what exactly the objects are, like the make, model, year of the cars.
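One way to give a "safe answer" is to back off to a broader category when the model's top fine-grained guess is not confident enough. The Python sketch below assumes the model outputs one probability per fine-grained class, and uses a hypothetical two-level hierarchy (`PARENT`) and threshold; the talk does not specify how its system does this.

```python
# Hypothetical hierarchy: fine-grained class -> broader category.
PARENT = {
    "tabby cat": "cat",
    "siamese cat": "cat",
    "beagle": "dog",
    "poodle": "dog",
}

def safe_answer(probs, threshold=0.7):
    """Return the top fine class if confident, else the best broader class.

    probs: {fine_class: probability}, assumed to sum to 1.
    """
    best = max(probs, key=probs.get)
    if probs[best] >= threshold:
        return best
    # Pool probability mass by parent category and answer with the best one.
    pooled = {}
    for cls, p in probs.items():
        pooled[PARENT[cls]] = pooled.get(PARENT[cls], 0.0) + p
    return max(pooled, key=pooled.get)

print(safe_answer({"tabby cat": 0.9, "siamese cat": 0.05, "beagle": 0.05}))  # tabby cat
print(safe_answer({"tabby cat": 0.45, "siamese cat": 0.4, "beagle": 0.15}))  # cat
```

A fuller version could keep climbing a deeper hierarchy (for example to "animal") until the pooled probability clears the threshold.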
12:10
We applied this algorithm to millions of Google Street View images across hundreds of American cities, and we have learned something really interesting: first, it confirmed our common wisdom that car prices correlate very well with household incomes. But surprisingly, car prices also correlate well with crime rates in cities, or voting patterns by zip codes.
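Statements such as "car prices correlate very well with household incomes" are usually quantified with a correlation coefficient. The Python sketch below computes Pearson's r. The talk does not say which statistic was used, and the example inputs are synthetic, not data from the Street View study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Synthetic values only: y rises in lockstep with x, so r is 1.
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))
```

A value near 1 or -1 says two quantities move together; on its own it does not say why.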
12:44
So wait a minute. Is that it? Has the computer already matched or even surpassed human capabilities? Not so fast.
12:53
So far, we have just taught the computer to see objects. This is like a small child learning to utter a few nouns. It's an incredible accomplishment, but it's only the first step. Soon, another developmental milestone will be hit, and children begin to communicate in sentences. So instead of saying this is a cat in the picture, you already heard the little girl telling us this is a cat lying on a bed. So to teach a computer to see a picture and generate sentences, the marriage between big data and machine learning algorithm has to take another step.
13:36
Now, the computer has to learn from both pictures as well as natural language sentences generated by humans. Just like the brain integrates vision and language, we developed a model that connects parts of visual things like visual snippets with words and phrases in sentences.
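Connecting "visual snippets" with "words and phrases" can be pictured as scoring every region-word pair and letting each word pick its best-matching region. The Python sketch below assumes regions and words have already been embedded as vectors in a shared space and uses a dot product as the similarity; the vectors and names are toy values, and this illustrates the idea rather than reproducing the model from the talk.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def align(regions, words):
    """For each word, pick the image region with the highest dot-product score.

    regions: {region_name: vector}; words: {word: vector}.
    Returns ({word: best_region}, summed image-sentence score).
    """
    alignment, score = {}, 0.0
    for word, w_vec in words.items():
        best = max(regions, key=lambda r: dot(regions[r], w_vec))
        alignment[word] = best
        score += dot(regions[best], w_vec)
    return alignment, score

# Toy 2D embeddings: axis 0 ~ "cat-like", axis 1 ~ "bed-like".
regions = {"region_1": [1.0, 0.1], "region_2": [0.2, 1.0]}
words = {"cat": [0.9, 0.0], "bed": [0.0, 0.8]}
print(align(regions, words))
```

Here "cat" aligns to `region_1` and "bed" to `region_2`; a higher summed score means the sentence fits the picture better under these assumed embeddings.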
14:02
About four months ago, we finally tied all this together and produced one of the first computer vision models that is capable of generating a human-like sentence when it sees a picture for the first time. Now, I'm ready to show you what the computer says when it sees the picture that the little girl saw at the beginning of this talk.
14:31
(Video) Computer: A man is standing next to an elephant.
14:36
A large airplane sitting on top of an airport runway.
14:41
FFL: Of course, we're still working hard to improve our algorithms, and it still has a lot to learn.
14:47
(Applause)
14:51
And the computer still makes mistakes.
14:54
(Video) Computer: A cat lying on a bed in a blanket.
14:58
FFL: So of course, when it sees too many cats, it thinks everything might look like a cat.
15:05
(Video) Computer: A young boy is holding a baseball bat.
15:08
(Laughter)
15:09
FFL: Or, if it hasn't seen a toothbrush, it confuses it with a baseball bat.
15:15
(Video) Computer: A man riding a horse down a street next to a building.
15:18
(Laughter)
15:20
FFL: We haven't taught Art 101 to the computers.
15:25
(Video) Computer: A zebra standing in a field of grass.
15:28
FFL: And it hasn't learned to appreciate the stunning beauty of nature like you and I do.
15:34
So it has been a long journey. To get from age zero to three was hard. The real challenge is to go from three to 13 and far beyond.
15:47
Let me remind you with this picture of the boy and the cake again. So far, we have taught the computer to see objects or even tell us a simple story when seeing a picture.
15:59
(Video) Computer: A person sitting at a table with a cake.
16:03
FFL: But there's so much more to this picture than just a person and a cake. What the computer doesn't see is that this is a special Italian cake that's only served during Easter time. The boy is wearing his favorite t-shirt given to him as a gift by his father after a trip to Sydney, and you and I can all tell how happy he is and what's exactly on his mind at that moment.
16:31
This is my son Leo. On my quest for visual intelligence, I think of Leo constantly and the future world he will live in. When machines can see, doctors and nurses will have extra pairs of tireless eyes to help them to diagnose and take care of patients. Cars will run smarter and safer on the road. Robots, not just humans, will help us to brave the disaster zones to save the trapped and wounded. We will discover new species, better materials, and explore unseen frontiers with the help of the machines.
17:15
Little by little, we're giving sight to the machines. First, we teach them to see. Then, they help us to see better. For the first time, human eyes won't be the only ones pondering and exploring our world. We will not only use the machines for their intelligence, we will also collaborate with them in ways that we cannot even imagine.
17:41
This is my quest: to give computers visual intelligence and to create a better future for Leo and for the world.
17:51
Thank you.
17:53
(Applause)

This website was created in October 2020 and last updated on June 12, 2025.

It is now archived and preserved as an English learning resource.

Some information may be out of date.

eng.lish.video
