How we teach computers to understand pictures | Fei-Fei Li

1,162,036 views ・ 2015-03-23

TED



Chinese translation: Twisted Meadows; reviewer: Min WANG
00:14
Let me show you something.
0
14366
3738
我先来给你们看点东西。
(视频)女孩: 好吧,这是只猫,坐在床上。
00:18
(Video) Girl: Okay, that's a cat sitting in a bed.
1
18104
4156
一个男孩摸着一头大象。
00:22
The boy is petting the elephant.
2
22260
4040
00:26
Those are people that are going on an airplane.
3
26300
4354
那些人正准备登机。
00:30
That's a big airplane.
4
30654
2810
那是架大飞机。
00:33
Fei-Fei Li: This is a three-year-old child
5
33464
2206
李飞飞: 这是一个三岁的小孩
00:35
describing what she sees in a series of photos.
6
35670
3679
在讲述她从一系列照片里看到的东西。
00:39
She might still have a lot to learn about this world,
7
39349
2845
对这个世界, 她也许还有很多要学的东西,
00:42
but she's already an expert at one very important task:
8
42194
4549
但在一个重要的任务上, 她已经是专家了:
00:46
to make sense of what she sees.
9
46743
2846
去理解她所看到的东西。
00:50
Our society is more technologically advanced than ever.
10
50229
4226
我们的社会已经在科技上 取得了前所未有的进步。
00:54
We send people to the moon, we make phones that talk to us
11
54455
3629
我们把人送上月球, 我们制造出可以与我们对话的手机,
00:58
or customize radio stations that can play only music we like.
12
58084
4946
或者订制一个音乐电台, 播放的全是我们喜欢的音乐。
01:03
Yet, our most advanced machines and computers
13
63030
4055
然而,哪怕是我们最先进的机器和电脑
01:07
still struggle at this task.
14
67085
2903
也会在这个问题上犯难。
01:09
So I'm here today to give you a progress report
15
69988
3459
所以今天我在这里, 向大家做个进度汇报:
01:13
on the latest advances in our research in computer vision,
16
73447
4047
关于我们在计算机 视觉方面最新的研究进展。
01:17
one of the most frontier and potentially revolutionary
17
77494
4161
这是计算机科学领域最前沿的、
01:21
technologies in computer science.
18
81655
3206
具有革命性潜力的科技。
01:24
Yes, we have prototyped cars that can drive by themselves,
19
84861
4551
是的,我们现在已经有了 具备自动驾驶功能的原型车,
01:29
but without smart vision, they cannot really tell the difference
20
89412
3853
但是如果没有敏锐的视觉, 它们就不能真正区分出
01:33
between a crumpled paper bag on the road, which can be run over,
21
93265
3970
地上摆着的是一个压扁的纸袋, 可以被轻易压过,
01:37
and a rock that size, which should be avoided.
22
97235
3340
还是一块相同体积的石头, 应该避开。
01:41
We have made fabulous megapixel cameras,
23
101415
3390
我们已经造出了超高清的相机,
01:44
but we have not delivered sight to the blind.
24
104805
3135
但我们仍然无法把 这些画面传递给盲人。
01:48
Drones can fly over massive land,
25
108420
3305
我们的无人机可以飞跃广阔的土地,
01:51
but don't have enough vision technology
26
111725
2134
却没有足够的视觉技术
01:53
to help us to track the changes of the rainforests.
27
113859
3461
去帮我们追踪热带雨林的变化。
01:57
Security cameras are everywhere,
28
117320
2950
安全摄像头到处都是,
02:00
but they do not alert us when a child is drowning in a swimming pool.
29
120270
5067
但当有孩子在泳池里溺水时 它们无法向我们报警。
02:06
Photos and videos are becoming an integral part of global life.
30
126167
5595
照片和视频,已经成为 全人类生活里不可缺少的部分。
02:11
They're being generated at a pace that's far beyond what any human,
31
131762
4087
它们以极快的速度被创造出来, 以至于没有任何人,或者团体,
02:15
or teams of humans, could hope to view,
32
135849
2783
能够完全浏览这些内容,
02:18
and you and I are contributing to that at this TED.
33
138632
3921
而你我正参与其中的这场TED, 也为之添砖加瓦。
02:22
Yet our most advanced software is still struggling at understanding
34
142553
5232
直到现在,我们最先进的 软件也依然为之犯难:
02:27
and managing this enormous content.
35
147785
3876
该怎么理解和处理 这些数量庞大的内容?
02:31
So in other words, collectively as a society,
36
151661
5272
所以换句话说, 在作为集体的这个社会里,
02:36
we're very much blind,
37
156933
1746
我们依然非常茫然,因为我们最智能的机器 依然有视觉上的缺陷。
02:38
because our smartest machines are still blind.
38
158679
3387
02:43
"Why is this so hard?" you may ask.
39
163526
2926
”为什么这么困难?“你也许会问。
02:46
Cameras can take pictures like this one
40
166452
2693
照相机可以像这样获得照片:
02:49
by converting light into a two-dimensional array of numbers
41
169145
3994
它把采集到的光线转换成 二维数字矩阵来存储
——也就是“像素”,
02:53
known as pixels,
42
173139
1650
02:54
but these are just lifeless numbers.
43
174789
2251
但这些仍然是死板的数字。
02:57
They do not carry meaning in themselves.
44
177040
3111
它们自身并不携带任何意义。
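To make the "two-dimensional array of numbers" concrete, here is a minimal sketch in Python. The 4x4 grayscale values are invented for illustration, and `mean_brightness` is a hypothetical helper, not something from the talk. It shows that simple arithmetic on pixels is easy, while nothing in the numbers says what the picture depicts.

```python
# A tiny grayscale "image": a 2D array of brightness values
# (0 = black, 255 = white). The values are made up for illustration.
image = [
    [0,   0,   255, 255],
    [0,   0,   255, 255],
    [255, 255, 0,   0],
    [255, 255, 0,   0],
]

height = len(image)      # number of rows
width = len(image[0])    # number of columns


def mean_brightness(img):
    """Average pixel value: easy to compute, but it says nothing about content."""
    total = sum(sum(row) for row in img)
    return total / (len(img) * len(img[0]))


print(height, width)            # size of the array
print(image[0][2])              # a single pixel is just one number
print(mean_brightness(image))   # a statistic, not an understanding
```

The same average brightness could come from a checkerboard, a cat, or random noise, which is the sense in which the numbers carry no meaning in themselves.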
03:00
Just like to hear is not the same as to listen,
45
180151
4343
就像”听到“和”听“完全不同,
03:04
to take pictures is not the same as to see,
46
184494
4040
”拍照“和”看“也完全不同。
03:08
and by seeing, we really mean understanding.
47
188534
3829
通过“看”, 我们实际上是“理解”了这个画面。
03:13
In fact, it took Mother Nature 540 million years of hard work
48
193293
6177
事实上,大自然经过了5亿4千万年的努力
03:19
to do this task,
49
199470
1973
才完成了这个工作,
03:21
and much of that effort
50
201443
1881
而这努力中更多的部分
03:23
went into developing the visual processing apparatus of our brains,
51
203324
5271
是用在进化我们的大脑内 用于视觉处理的器官,
03:28
not the eyes themselves.
52
208595
2647
而不是眼睛本身。
03:31
So vision begins with the eyes,
53
211242
2747
所以"视觉”从眼睛采集信息开始,
03:33
but it truly takes place in the brain.
54
213989
3518
但大脑才是它真正呈现意义的地方。
03:38
So for 15 years now, starting from my Ph.D. at Caltech
55
218287
5060
所以15年来, 从我进入加州理工学院攻读Ph.D.
03:43
and then leading Stanford's Vision Lab,
56
223347
2926
到后来领导 斯坦福大学的视觉实验室,
03:46
I've been working with my mentors, collaborators and students
57
226273
4396
我一直在和我的导师、 合作者和学生们一起
03:50
to teach computers to see.
58
230669
2889
教计算机如何去“看”。
03:54
Our research field is called computer vision and machine learning.
59
234658
3294
我们的研究领域叫做 "计算机视觉与机器学习"。
03:57
It's part of the general field of artificial intelligence.
60
237952
3878
这是AI(人工智能)领域的一个分支。
04:03
So ultimately, we want to teach the machines to see just like we do:
61
243000
5493
最终,我们希望能教会机器 像我们一样看见事物:
04:08
naming objects, identifying people, inferring 3D geometry of things,
62
248493
5387
识别物品、辨别不同的人、 推断物体的立体形状、
04:13
understanding relations, emotions, actions and intentions.
63
253880
5688
理解事物的关联、 人的情绪、动作和意图。
04:19
You and I weave together entire stories of people, places and things
64
259568
6153
像你我一样,只凝视一个画面一眼 就能理清整个故事中的人物、地点、事件。
04:25
the moment we lay our gaze on them.
65
265721
2164
04:28
The first step towards this goal is to teach a computer to see objects,
66
268955
5583
实现这一目标的第一步是 教计算机看到“对象”(物品),
04:34
the building block of the visual world.
67
274538
3368
这是建造视觉世界的基石。
04:37
In its simplest terms, imagine this teaching process
68
277906
4434
在这个最简单的任务里, 想象一下这个教学过程:
04:42
as showing the computers some training images
69
282340
2995
给计算机看一些特定物品的训练图片, 比如说猫,
04:45
of a particular object, let's say cats,
70
285335
3321
04:48
and designing a model that learns from these training images.
71
288656
4737
并让它从这些训练图片中, 学习建立出一个模型来。
04:53
How hard can this be?
72
293393
2044
这有多难呢?
04:55
After all, a cat is just a collection of shapes and colors,
73
295437
4052
不管怎么说,一只猫只是一些 形状和颜色拼凑起来的图案罢了,
04:59
and this is what we did in the early days of object modeling.
74
299489
4086
比如这个就是我们 最初设计的抽象模型。
05:03
We'd tell the computer algorithm in a mathematical language
75
303575
3622
我们用数学的语言, 告诉计算机这种算法:
05:07
that a cat has a round face, a chubby body,
76
307197
3343
“猫”有着圆脸、胖身子、
05:10
two pointy ears, and a long tail,
77
310540
2299
两个尖尖的耳朵,还有一条长尾巴,
05:12
and that looked all fine.
78
312839
1410
这(算法)看上去挺好的。
05:14
But what about this cat?
79
314859
2113
但如果遇到这样的猫呢?
05:16
(Laughter)
80
316972
1091
(笑)
它整个蜷缩起来了。
05:18
It's all curled up.
81
318063
1626
05:19
Now you have to add another shape and viewpoint to the object model.
82
319689
4719
现在你不得不加入一些别的形状和视角 来描述这个物品模型。
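The hand-written object model described above can be sketched as a simple rule. This is only an illustration: the feature names (`face`, `body`, `pointy_ears`, `tail`) and the two example descriptions are hypothetical, and early object models were expressed mathematically rather than as dictionaries.

```python
def looks_like_cat(shape):
    """Hand-written rule: round face, chubby body, two pointy ears, long tail."""
    return (
        shape.get("face") == "round"
        and shape.get("body") == "chubby"
        and shape.get("pointy_ears") == 2
        and shape.get("tail") == "long"
    )


# Hypothetical descriptions of what is visible in two photos.
sitting_cat = {"face": "round", "body": "chubby", "pointy_ears": 2, "tail": "long"}
curled_up_cat = {"face": "hidden", "body": "round", "pointy_ears": 1, "tail": "hidden"}

print(looks_like_cat(sitting_cat))    # the rule matches the textbook pose
print(looks_like_cat(curled_up_cat))  # the rule misses a cat that is curled up
```

Every new pose or viewpoint needs another hand-written rule, which is why this approach does not scale.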
05:24
But what if cats are hidden?
83
324408
1715
但如果猫是藏起来的呢?
05:27
What about these silly cats?
84
327143
2219
再看看这些傻猫呢?
05:31
Now you get my point.
85
331112
2417
你现在知道了吧。
05:33
Even something as simple as a household pet
86
333529
3367
即使那些事物简单到 只是一只家养的宠物,
05:36
can present an infinite number of variations to the object model,
87
336896
4504
都可以出呈现出无限种变化的外观模型,
05:41
and that's just one object.
88
341400
2233
而这还只是“一个”对象的模型。
05:44
So about eight years ago,
89
344573
2492
所以大概在8年前,
05:47
a very simple and profound observation changed my thinking.
90
347065
5030
一个非常简单、有冲击力的 观察改变了我的想法。
05:53
No one tells a child how to see,
91
353425
2685
没有人教过婴儿怎么“看”,
05:56
especially in the early years.
92
356110
2261
尤其是在他们还很小的时候。
05:58
They learn this through real-world experiences and examples.
93
358371
5000
他们是从真实世界的经验 和例子中学到这个的。
06:03
If you consider a child's eyes
94
363371
2740
如果你把孩子的眼睛
都看作是生物照相机,
06:06
as a pair of biological cameras,
95
366111
2554
06:08
they take one picture about every 200 milliseconds,
96
368665
4180
那他们每200毫秒就拍一张照。
06:12
the average time an eye movement is made.
97
372845
3134
——这是眼球转动一次的平均时间。
06:15
So by age three, a child would have seen hundreds of millions of pictures
98
375979
5550
所以到3岁大的时候,一个孩子已经看过了 上亿张的真实世界照片。
06:21
of the real world.
99
381529
1834
06:23
That's a lot of training examples.
100
383363
2280
这种“训练照片”的数量是非常大的。
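The "hundreds of millions" figure can be checked with quick arithmetic. The 200 milliseconds and the age of three come from the talk; the number of waking hours per day is an assumption added here, so two values are shown as rough bounds.

```python
MS_PER_SNAPSHOT = 200  # from the talk: one "picture" per eye movement
YEARS = 3              # from the talk: by age three


def snapshots(years, waking_hours_per_day, ms_per_snapshot=MS_PER_SNAPSHOT):
    """Rough count of eye-movement 'pictures' taken while awake."""
    seconds_awake = years * 365 * waking_hours_per_day * 3600
    return seconds_awake * 1000 // ms_per_snapshot


# Assumption: about 12 waking hours per day (not stated in the talk).
print(snapshots(YEARS, 12))  # 236520000
# Upper bound: eyes open around the clock.
print(snapshots(YEARS, 24))  # 473040000
```

Under either assumption the total lands in the hundreds of millions.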
06:26
So instead of focusing solely on better and better algorithms,
101
386383
5989
所以,与其孤立地关注于 算法的优化、再优化,
06:32
my insight was to give the algorithms the kind of training data
102
392372
5272
我的关注点放在了给算法 提供像那样的训练数据
06:37
that a child was given through experiences
103
397644
3319
——那些,婴儿们从经验中获得的 质量和数量都极其惊人的训练照片。
06:40
in both quantity and quality.
104
400963
3878
06:44
Once we knew this,
105
404841
1858
一旦我们知道了这个,
06:46
we knew we needed to collect a data set
106
406699
2971
我们就明白自己需要收集的数据集,
06:49
that has far more images than we have ever had before,
107
409670
4459
必须比我们曾有过的任何数据库都丰富
06:54
perhaps thousands of times more,
108
414129
2577
——可能要丰富数千倍。
06:56
and together with Professor Kai Li at Princeton University,
109
416706
4111
因此,通过与普林斯顿大学的 Kai Li教授合作,
07:00
we launched the ImageNet project in 2007.
110
420817
4752
我们在2007年发起了 ImageNet(图片网络)计划。
07:05
Luckily, we didn't have to mount a camera on our head
111
425569
3838
幸运的是,我们不必在自己脑子里 装上一台照相机,然后等它拍很多年。
07:09
and wait for many years.
112
429407
1764
我们运用了互联网,
07:11
We went to the Internet,
113
431171
1463
07:12
the biggest treasure trove of pictures that humans have ever created.
114
432634
4436
这个由人类创造的 最大的图片宝库。
07:17
We downloaded nearly a billion images
115
437070
3041
我们下载了接近10亿张图片
07:20
and used crowdsourcing technology like the Amazon Mechanical Turk platform
116
440111
5880
并利用众包技术(利用互联网分配工作、发现创意或 解决技术问题),像“亚马逊土耳其机器人”这样的平台
07:25
to help us to label these images.
117
445991
2339
来帮我们标记这些图片。
07:28
At its peak, ImageNet was one of the biggest employers
118
448330
4900
在高峰期时,ImageNet是「亚马逊土耳其机器人」 这个平台上最大的雇主之一:
07:33
of the Amazon Mechanical Turk workers:
119
453230
2996
07:36
together, almost 50,000 workers
120
456226
3854
来自世界上167个国家的 接近5万个工作者,在一起工作
07:40
from 167 countries around the world
121
460080
4040
帮我们筛选、排序、标记了 接近10亿张备选照片。
07:44
helped us to clean, sort and label
122
464120
3947
07:48
nearly a billion candidate images.
123
468067
3575
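The talk does not say how answers from many workers were combined into one label. One common approach, shown here purely as an assumption, is a majority vote with a minimum agreement level. The function name `majority_label` and the 60% threshold are hypothetical.

```python
from collections import Counter


def majority_label(votes, min_agreement=0.6):
    """Return the most common label if enough workers agree, otherwise None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Five hypothetical workers label the same candidate image.
print(majority_label(["cat", "cat", "dog", "cat", "cat"]))  # 4 of 5 agree
# No clear agreement: the image would be rejected or sent out again.
print(majority_label(["cat", "dog", "fox"]))
```

Images without enough agreement can be discarded or shown to more workers, which is one way a crowd can "clean" a candidate set.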
07:52
That was how much effort it took
124
472612
2653
这就是我们为这个计划投入的精力,
07:55
to capture even a fraction of the imagery
125
475265
3900
去捕捉,一个婴儿可能在他早期发育阶段 获取的”一小部分“图像。
07:59
a child's mind takes in in the early developmental years.
126
479165
4171
事后我们再来看,这个利用大数据来训练 计算机算法的思路,也许现在看起来很普通,
08:04
In hindsight, this idea of using big data
127
484148
3902
08:08
to train computer algorithms may seem obvious now,
128
488050
4550
08:12
but back in 2007, it was not so obvious.
129
492600
4110
但回到2007年时,它就不那么寻常了。
08:16
We were fairly alone on this journey for quite a while.
130
496710
3878
我们在这段旅程上孤独地前行了很久。
08:20
Some very friendly colleagues advised me to do something more useful for my tenure,
131
500588
5003
一些很友善的同事建议我 做一些更有用的事来获得终身教职,
08:25
and we were constantly struggling for research funding.
132
505591
4342
而且我们也不断地为项目的研究经费发愁。
08:29
Once, I even joked to my graduate students
133
509933
2485
有一次,我甚至对 我的研究生学生开玩笑说:
08:32
that I would just reopen my dry cleaner's shop to fund ImageNet.
134
512418
4063
我要重新回去开我的干洗店 来赚钱资助ImageNet了。
08:36
After all, that's how I funded my college years.
135
516481
4761
——毕竟,我的大学时光 就是靠这个资助的。
所以我们仍然在继续着。
08:41
So we carried on.
136
521242
1856
在2009年,ImageNet项目诞生了——
08:43
In 2009, the ImageNet project delivered
137
523098
3715
08:46
a database of 15 million images
138
526813
4042
一个含有1500万张照片的数据库, 涵盖了22000种物品。
08:50
across 22,000 classes of objects and things
139
530855
4805
08:55
organized by everyday English words.
140
535660
3320
这些物品是根据日常英语单词 进行分类组织的。
08:58
In both quantity and quality,
141
538980
2926
无论是在质量上还是数量上,
09:01
this was an unprecedented scale.
142
541906
2972
这都是一个规模空前的数据库。
09:04
As an example, in the case of cats,
143
544878
3461
举个例子,在"猫"这个对象中,
09:08
we have more than 62,000 cats
144
548339
2809
我们有超过62000只猫
09:11
of all kinds of looks and poses
145
551148
4110
长相各异,姿势五花八门,
09:15
and across all species of domestic and wild cats.
146
555258
5223
而且涵盖了各种品种的家猫和野猫。
09:20
We were thrilled to have put together ImageNet,
147
560481
3344
我们对ImageNet收集到的图片 感到异常兴奋,
09:23
and we wanted the whole research world to benefit from it,
148
563825
3738
而且我们希望整个研究界能从中受益,
09:27
so in the TED fashion, we opened up the entire data set
149
567563
4041
所以以一种和TED一样的方式,
我们公开了整个数据库, 免费提供给全世界的研究团体。
09:31
to the worldwide research community for free.
150
571604
3592
(掌声)
09:36
(Applause)
151
576636
4000
09:41
Now that we have the data to nourish our computer brain,
152
581416
4538
那么现在,我们有了用来 培育计算机大脑的数据库,
09:45
we're ready to come back to the algorithms themselves.
153
585954
3737
我们可以回到”算法“本身上来了。
09:49
As it turned out, the wealth of information provided by ImageNet
154
589691
5178
因为ImageNet的横空出世,它提供的信息财富 完美地适用于一些特定类别的机器学习算法,
09:54
was a perfect match to a particular class of machine learning algorithms
155
594869
4806
09:59
called convolutional neural network,
156
599675
2415
称作“卷积神经网络”,
10:02
pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun
157
602090
5248
最早由Kunihiko Fukushima,Geoff Hinton, 和Yann LeCun在上世纪七八十年代开创。
10:07
back in the 1970s and '80s.
158
607338
3645
10:10
Just like the brain consists of billions of highly connected neurons,
159
610983
5619
就像大脑是由上十亿的 紧密联结的神经元组成,
10:16
a basic operating unit in a neural network
160
616602
3854
神经网络里最基础的运算单元 也是一个“神经元式”的节点。
10:20
is a neuron-like node.
161
620456
2415
10:22
It takes input from other nodes
162
622871
2554
每个节点从其它节点处获取输入信息, 然后把自己的输出信息再交给另外的节点。
10:25
and sends output to others.
163
625425
2718
此外,这些成千上万、甚至上百万的节点
10:28
Moreover, these hundreds of thousands or even millions of nodes
164
628143
4713
10:32
are organized in hierarchical layers,
165
632856
3227
都被按等级分布于不同层次,
就像大脑一样。
10:36
also similar to the brain.
166
636083
2554
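A neuron-like node and the layering just described can be sketched in a few lines. This is a generic illustration, not the network from the talk: the sigmoid activation, the weights, and the layer sizes are all assumptions.

```python
import math


def node(inputs, weights, bias):
    """Neuron-like node: weighted sum of its inputs, squashed by a sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a group of nodes that all read the same inputs."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Hypothetical weights: two inputs -> two hidden nodes -> one output node.
hidden = layer([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = layer(hidden, [[1.0, 1.0]], [-1.0])

print(hidden)  # outputs of the first layer become inputs to the next
print(output)
```

Stacking layers this way is what "hierarchical" means here: each layer only sees the outputs of the layer before it.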
10:38
In a typical neural network we use to train our object recognition model,
167
638637
4783
在一个我们用来训练“对象识别模型”的 典型神经网络里,
10:43
it has 24 million nodes,
168
643420
3181
有着2400万个节点,1亿4千万个参数, 和150亿个联结。
10:46
140 million parameters,
169
646601
3297
10:49
and 15 billion connections.
170
649898
2763
10:52
That's an enormous model.
171
652661
2415
这是一个庞大的模型。
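The connection count can be far larger than the parameter count because a convolutional layer reuses the same small set of weights at every image position. The sketch below counts both for a single layer. The layer sizes are hypothetical and are not the model from the talk, and "connection" is taken here to mean one input value feeding one output node.

```python
def conv_layer_counts(in_channels, out_channels, kernel, out_height, out_width):
    """Count parameters, output nodes and connections for one convolutional layer."""
    weights = kernel * kernel * in_channels * out_channels
    params = weights + out_channels  # one bias per output channel
    nodes = out_channels * out_height * out_width
    # Each output node reads one kernel-sized patch across all input channels.
    connections = nodes * kernel * kernel * in_channels
    return params, nodes, connections


# Hypothetical layer: RGB input, 64 filters of size 3x3, 224x224 output map.
params, nodes, connections = conv_layer_counts(3, 64, 3, 224, 224)
print(params)       # 1792
print(nodes)        # 3211264
print(connections)  # 86704128
```

In this toy layer fewer than two thousand parameters drive tens of millions of connections, which is the same kind of gap as the figures quoted above.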
10:55
Powered by the massive data from ImageNet
172
655076
3901
借助ImageNet提供的巨大规模数据支持,
10:58
and the modern CPUs and GPUs to train such a humongous model,
173
658977
5433
通过大量最先进的CPU和GPU, 来训练这些堆积如山的模型,
11:04
the convolutional neural network
174
664410
2369
“卷积神经网络” 以难以想象的方式蓬勃发展起来。
11:06
blossomed in a way that no one expected.
175
666779
3436
它成为了一个成功体系,
11:10
It became the winning architecture
176
670215
2508
11:12
to generate exciting new results in object recognition.
177
672723
5340
在对象识别领域, 产生了激动人心的新成果。
11:18
This is a computer telling us
178
678063
2810
这张图,是计算机在告诉我们:
11:20
this picture contains a cat
179
680873
2300
照片里有一只猫、
11:23
and where the cat is.
180
683173
1903
还有猫所在的位置。
当然不止有猫了,
11:25
Of course there are more things than cats,
181
685076
2112
11:27
so here's a computer algorithm telling us
182
687188
2438
所以这是计算机算法在告诉我们
11:29
the picture contains a boy and a teddy bear;
183
689626
3274
照片里有一个男孩,和一个泰迪熊;
11:32
a dog, a person, and a small kite in the background;
184
692900
4366
一只狗,一个人,和背景里的小风筝;
或者是一张拍摄于闹市的照片 比如人、滑板、栏杆、灯柱…等等。
11:37
or a picture of very busy things
185
697266
3135
11:40
like a man, a skateboard, railings, a lamppost, and so on.
186
700401
4644
有时候,如果计算机 不是很确定它看到的是什么,
11:45
Sometimes, when the computer is not so confident about what it sees,
187
705045
5293
11:51
we have taught it to be smart enough
188
711498
2276
我们还教它用足够聪明的方式 给出一个“安全”的答案,而不是“言多必失”
11:53
to give us a safe answer instead of committing too much,
189
713774
3878
11:57
just like we would do,
190
717652
2811
——就像人类面对这类问题时一样。
12:00
but other times our computer algorithm is remarkable at telling us
191
720463
4666
但在其他时候,我们的计算机 算法厉害到可以告诉我们
12:05
what exactly the objects are,
192
725129
2253
关于对象的更确切的信息, 比如汽车的品牌、型号、年份。
12:07
like the make, model, year of the cars.
193
727382
3436
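The talk does not describe how the "safe answer" is produced. One way to get that behavior, shown here only as an assumption, is to back off to a more general label when no specific label is confident enough. The label hierarchy, the probabilities, and the 0.8 threshold are all hypothetical.

```python
# Hypothetical label hierarchy: child -> parent.
PARENT = {
    "corgi": "dog", "beagle": "dog", "tabby": "cat",
    "dog": "animal", "cat": "animal",
}


def depth(label):
    """Number of steps from a label up to the root of the hierarchy."""
    steps = 0
    while label in PARENT:
        label = PARENT[label]
        steps += 1
    return steps


def hedged_label(leaf_probs, threshold=0.8):
    """Pick the most specific label whose total probability reaches the threshold."""
    totals = {}
    for leaf, prob in leaf_probs.items():
        label = leaf
        while label is not None:  # add the leaf's probability to every ancestor
            totals[label] = totals.get(label, 0.0) + prob
            label = PARENT.get(label)
    confident = [label for label, prob in totals.items() if prob >= threshold]
    if not confident:
        return None
    return max(confident, key=lambda label: (depth(label), totals[label]))


print(hedged_label({"corgi": 0.9, "beagle": 0.05, "tabby": 0.05}))  # specific answer
print(hedged_label({"corgi": 0.5, "beagle": 0.4, "tabby": 0.1}))    # backs off to "dog"
print(hedged_label({"corgi": 0.4, "beagle": 0.2, "tabby": 0.4}))    # backs off to "animal"
```

The trade-off is between being informative and being correct: a lower threshold commits to specific labels more often, a higher one retreats to general labels.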
12:10
We applied this algorithm to millions of Google Street View images
194
730818
5386
我们在上百万张谷歌街景照片中 应用了这一算法,
12:16
across hundreds of American cities,
195
736204
3135
那些照片涵盖了上百个美国城市。
12:19
and we have learned something really interesting:
196
739339
2926
我们从中发现一些有趣的事:
12:22
first, it confirmed our common wisdom
197
742265
3320
首先,它证实了我们的一些常识:
12:25
that car prices correlate very well
198
745585
3290
汽车的价格,与家庭收入 呈现出明显的正相关。
12:28
with household incomes.
199
748875
2345
12:31
But surprisingly, car prices also correlate well
200
751220
4527
但令人惊奇的是,汽车价格与犯罪率 也呈现出明显的正相关性,
12:35
with crime rates in cities,
201
755747
2300
以上结论是基于城市、或投票的 邮编区域进行分析的结果。
12:39
or voting patterns by zip codes.
202
759007
3963
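"Correlate" here can be read as a correlation coefficient computed across neighborhoods. The sketch below computes a Pearson coefficient by hand. The five data points are invented toy numbers used only to show the calculation; they are not results from the Street View study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy numbers, one entry per hypothetical zip code (thousands of dollars).
avg_car_price = [12, 18, 25, 31, 40]
household_income = [35, 48, 60, 82, 95]

r = pearson(avg_car_price, household_income)
print(round(r, 3))  # close to 1 means the two rise together
```

A coefficient near 1 or -1 indicates a strong linear relationship; it does not by itself say that one quantity causes the other.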
那么等一下,这就是全部成果了吗?
12:44
So wait a minute. Is that it?
203
764060
2206
计算机是不是已经达到, 或者甚至超过了人类的能力?
12:46
Has the computer already matched or even surpassed human capabilities?
204
766266
5153
12:51
Not so fast.
205
771419
2138
——还没有那么快。
12:53
So far, we have just taught the computer to see objects.
206
773557
4923
目前为止,我们还只是 教会了计算机去看对象。
12:58
This is like a small child learning to utter a few nouns.
207
778480
4644
这就像是一个小宝宝学会说出几个名词。
这是一项难以置信的成就,
13:03
It's an incredible accomplishment,
208
783124
2670
13:05
but it's only the first step.
209
785794
2460
但这还只是第一步。
13:08
Soon, another developmental milestone will be hit,
210
788254
3762
很快,我们就会到达 发展历程的另一个里程碑:
这个小孩会开始用“句子”进行交流。
13:12
and children begin to communicate in sentences.
211
792016
3461
13:15
So instead of saying this is a cat in the picture,
212
795477
4224
所以不止是说这张图里有只“猫”,
13:19
you already heard the little girl telling us this is a cat lying on a bed.
213
799701
5202
你在开头已经听到小妹妹 告诉我们“这只猫是坐在床上的”。
13:24
So to teach a computer to see a picture and generate sentences,
214
804903
5595
为了教计算机看懂图片并生成句子,
13:30
the marriage between big data and machine learning algorithm
215
810498
3948
“大数据”和“机器学习算法”的结合 需要更进一步。
13:34
has to take another step.
216
814446
2275
13:36
Now, the computer has to learn from both pictures
217
816721
4156
现在,计算机需要从图片和人类创造的 自然语言句子中同时进行学习。
13:40
as well as natural language sentences
218
820877
2856
13:43
generated by humans.
219
823733
3322
就像我们的大脑, 把视觉现象和语言融合在一起,
13:47
Just like the brain integrates vision and language,
220
827055
3853
13:50
we developed a model that connects parts of visual things
221
830908
5201
我们开发了一个模型,
可以把一部分视觉信息,像视觉片段, 与语句中的文字、短语联系起来。
13:56
like visual snippets
222
836109
1904
13:58
with words and phrases in sentences.
223
838013
4203
14:02
About four months ago,
224
842216
2763
大约4个月前, 我们最终把所有技术结合在了一起,
14:04
we finally tied all this together
225
844979
2647
14:07
and produced one of the first computer vision models
226
847626
3784
创造了第一个“计算机视觉模型”,
14:11
that is capable of generating a human-like sentence
227
851410
3994
它在看到图片的第一时间,就有能力生成 类似人类语言的句子。
14:15
when it sees a picture for the first time.
228
855404
3506
14:18
Now, I'm ready to show you what the computer says
229
858910
4644
现在,我准备给你们看看 计算机看到图片时会说些什么
14:23
when it sees the picture
230
863554
1975
14:25
that the little girl saw at the beginning of this talk.
231
865529
3830
——还是那些在演讲开头给小女孩看的图片。
(视频)计算机: “一个男人站在一头大象旁边。”
14:31
(Video) Computer: A man is standing next to an elephant.
232
871519
3344
14:36
A large airplane sitting on top of an airport runway.
233
876393
3634
“一架大飞机停在机场跑道一端。”
14:41
FFL: Of course, we're still working hard to improve our algorithms,
234
881057
4212
李飞飞: 当然,我们还在努力改善我们的算法,
14:45
and it still has a lot to learn.
235
885269
2596
它还有很多要学的东西。
14:47
(Applause)
236
887865
2291
(掌声)
14:51
And the computer still makes mistakes.
237
891556
3321
计算机还是会犯很多错误的。
14:54
(Video) Computer: A cat lying on a bed in a blanket.
238
894877
3391
(视频)计算机: “一只猫躺在床上的毯子上。”
李飞飞:所以…当然——如果它看过太多种的猫, 它就会觉得什么东西都长得像猫……
14:58
FFL: So of course, when it sees too many cats,
239
898268
2553
15:00
it thinks everything might look like a cat.
240
900821
2926
(视频)计算机: “一个小男孩拿着一根棒球棍。”
15:05
(Video) Computer: A young boy is holding a baseball bat.
241
905317
2864
15:08
(Laughter)
242
908181
1765
(笑声)
15:09
FFL: Or, if it hasn't seen a toothbrush, it confuses it with a baseball bat.
243
909946
4583
李飞飞:或者…如果它从没见过牙刷, 它就分不清牙刷和棒球棍的区别。
15:15
(Video) Computer: A man riding a horse down a street next to a building.
244
915309
3434
(视频)计算机: “建筑旁的街道上有一个男人骑马经过。”
15:18
(Laughter)
245
918743
2023
(笑声)
15:20
FFL: We haven't taught Art 101 to the computers.
246
920766
3552
李飞飞:我们还没教它Art 101 (美国大学艺术基础课)。
15:25
(Video) Computer: A zebra standing in a field of grass.
247
925768
2884
(视频)计算机: “一只斑马站在一片草原上。”
15:28
FFL: And it hasn't learned to appreciate the stunning beauty of nature
248
928652
3367
李飞飞:它还没学会像你我一样 欣赏大自然里的绝美景色。
15:32
like you and I do.
249
932019
2438
15:34
So it has been a long journey.
250
934457
2832
所以,这是一条漫长的道路。
15:37
To get from age zero to three was hard.
251
937289
4226
将一个孩子从出生培养到3岁是很辛苦的。
15:41
The real challenge is to go from three to 13 and far beyond.
252
941515
5596
而真正的挑战是从3岁到13岁的过程中, 而且远远不止于此。
让我再给你们看看这张 关于小男孩和蛋糕的图。
15:47
Let me remind you with this picture of the boy and the cake again.
253
947111
4365
15:51
So far, we have taught the computer to see objects
254
951476
4064
目前为止, 我们已经教会计算机“看”对象,
15:55
or even tell us a simple story when seeing a picture.
255
955540
4458
或者甚至基于图片, 告诉我们一个简单的故事。
15:59
(Video) Computer: A person sitting at a table with a cake.
256
959998
3576
(视频)计算机: ”一个人坐在放蛋糕的桌子旁。“
16:03
FFL: But there's so much more to this picture
257
963574
2630
李飞飞:但图片里还有更多信息 ——远不止一个人和一个蛋糕。
16:06
than just a person and a cake.
258
966204
2270
16:08
What the computer doesn't see is that this is a special Italian cake
259
968474
4467
计算机无法理解的是: 这是一个特殊的意大利蛋糕,
16:12
that's only served during Easter time.
260
972941
3217
它只在复活节限时供应。
而这个男孩穿着的 是他最喜欢的T恤衫,
16:16
The boy is wearing his favorite t-shirt
261
976158
3205
16:19
given to him as a gift by his father after a trip to Sydney,
262
979363
3970
那是他父亲去悉尼旅行时 带给他的礼物。
16:23
and you and I can all tell how happy he is
263
983333
3808
另外,你和我都能清楚地看出, 这个小孩有多高兴,以及这一刻在想什么。
16:27
and what's exactly on his mind at that moment.
264
987141
3203
这是我的儿子Leo。
16:31
This is my son Leo.
265
991214
3125
在我探索视觉智能的道路上,
16:34
On my quest for visual intelligence,
266
994339
2624
16:36
I think of Leo constantly
267
996963
2391
我不断地想到Leo 和他未来将要生活的那个世界。
16:39
and the future world he will live in.
268
999354
2903
当机器可以“看到”的时候,
16:42
When machines can see,
269
1002257
2021
16:44
doctors and nurses will have extra pairs of tireless eyes
270
1004278
4712
医生和护士会获得一双额外的、 不知疲倦的眼睛,
16:48
to help them to diagnose and take care of patients.
271
1008990
4092
帮他们诊断病情、照顾病人。
16:53
Cars will run smarter and safer on the road.
272
1013082
4383
汽车可以在道路上行驶得 更智能、更安全。
16:57
Robots, not just humans,
273
1017465
2694
机器人,而不只是人类,
会帮我们救助灾区被困和受伤的人员。
17:00
will help us to brave the disaster zones to save the trapped and wounded.
274
1020159
4849
17:05
We will discover new species, better materials,
275
1025798
3796
我们会发现新的物种、更好的材料,
17:09
and explore unseen frontiers with the help of the machines.
276
1029594
4509
还可以在机器的帮助下 探索从未见到过的前沿地带。
一点一点地, 我们正在赋予机器以视力。
17:15
Little by little, we're giving sight to the machines.
277
1035113
4167
17:19
First, we teach them to see.
278
1039280
2798
首先,我们教它们去“看”。
然后,它们反过来也帮助我们, 让我们看得更清楚。
17:22
Then, they help us to see better.
279
1042078
2763
17:24
For the first time, human eyes won't be the only ones
280
1044841
4165
这是第一次,人类的眼睛不再 独自地思考和探索我们的世界。
17:29
pondering and exploring our world.
281
1049006
2934
17:31
We will not only use the machines for their intelligence,
282
1051940
3460
我们将不止是“使用”机器的智力,
17:35
we will also collaborate with them in ways that we cannot even imagine.
283
1055400
6179
我们还要以一种从未想象过的方式, 与它们“合作”。
17:41
This is my quest:
284
1061579
2161
我所追求的是:
17:43
to give computers visual intelligence
285
1063740
2712
赋予计算机视觉智能,
17:46
and to create a better future for Leo and for the world.
286
1066452
5131
并为Leo和这个世界, 创造出更美好的未来。
17:51
Thank you.
287
1071583
1811
谢谢。
(掌声)
17:53
(Applause)
288
1073394
3785