How we teach computers to understand pictures | Fei Fei Li

1,175,905 views ・ 2015-03-23

TED



Translator: Twisted Meadows. Reviewer: Min WANG
00:14
Let me show you something.
00:18
(Video) Girl: Okay, that's a cat sitting in a bed.
00:22
The boy is petting the elephant.
00:26
Those are people that are going on an airplane.
00:30
That's a big airplane.
00:33
Fei-Fei Li: This is a three-year-old child
00:35
describing what she sees in a series of photos.
00:39
She might still have a lot to learn about this world,
00:42
but she's already an expert at one very important task:
00:46
to make sense of what she sees.
00:50
Our society is more technologically advanced than ever.
00:54
We send people to the moon, we make phones that talk to us
00:58
or customize radio stations that can play only music we like.
01:03
Yet, our most advanced machines and computers
01:07
still struggle at this task.
01:09
So I'm here today to give you a progress report
01:13
on the latest advances in our research in computer vision,
01:17
one of the most frontier and potentially revolutionary
01:21
technologies in computer science.
01:24
Yes, we have prototyped cars that can drive by themselves,
01:29
but without smart vision, they cannot really tell the difference
01:33
between a crumpled paper bag on the road, which can be run over,
01:37
and a rock that size, which should be avoided.
01:41
We have made fabulous megapixel cameras,
01:44
but we have not delivered sight to the blind.
01:48
Drones can fly over massive land,
01:51
but don't have enough vision technology
01:53
to help us to track the changes of the rainforests.
01:57
Security cameras are everywhere,
02:00
but they do not alert us when a child is drowning in a swimming pool.
02:06
Photos and videos are becoming an integral part of global life.
02:11
They're being generated at a pace that's far beyond what any human,
02:15
or teams of humans, could hope to view,
02:18
and you and I are contributing to that at this TED.
02:22
Yet our most advanced software is still struggling at understanding
02:27
and managing this enormous content.
02:31
So in other words, collectively as a society,
02:36
we're very much blind,
02:38
because our smartest machines are still blind.
02:43
"Why is this so hard?" you may ask.
02:46
Cameras can take pictures like this one
02:49
by converting lights into a two-dimensional array of numbers
02:53
known as pixels,
02:54
but these are just lifeless numbers.
02:57
They do not carry meaning in themselves.
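
Editor's illustration (not part of the talk): a minimal Python sketch of what "a two-dimensional array of numbers" looks like to a program. The 5x5 grayscale grid and its values are invented for the example; real photos have millions of pixels and usually three color channels.

```python
# A tiny grayscale "photo": each number is one pixel's brightness
# (0 = black, 255 = white). The values are made up for illustration.
image = [
    [0,   0,   255, 0,   0],
    [0,   255, 255, 255, 0],
    [255, 255, 255, 255, 255],
    [0,   255, 255, 255, 0],
    [0,   0,   255, 0,   0],
]

height = len(image)
width = len(image[0])

# The program can do arithmetic on the numbers...
mean_brightness = sum(sum(row) for row in image) / (height * width)

# ...but nothing in the array itself says what the picture shows.
print(f"{height}x{width} pixels, mean brightness {mean_brightness:.1f}")
```

The program can average the numbers, but nothing in the array says that the bright pixels form a diamond. That interpretation is the part the talk calls seeing.
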
03:00
Just like to hear is not the same as to listen,
03:04
to take pictures is not the same as to see,
03:08
and by seeing, we really mean understanding.
03:13
In fact, it took Mother Nature 540 million years of hard work
03:19
to do this task,
03:21
and much of that effort
03:23
went into developing the visual processing apparatus of our brains,
03:28
not the eyes themselves.
03:31
So vision begins with the eyes,
03:33
but it truly takes place in the brain.
03:38
So for 15 years now, starting from my Ph.D. at Caltech
03:43
and then leading Stanford's Vision Lab,
03:46
I've been working with my mentors, collaborators and students
03:50
to teach computers to see.
03:54
Our research field is called computer vision and machine learning.
03:57
It's part of the general field of artificial intelligence.
04:03
So ultimately, we want to teach the machines to see just like we do:
04:08
naming objects, identifying people, inferring 3D geometry of things,
04:13
understanding relations, emotions, actions and intentions.
04:19
You and I weave together entire stories of people, places and things
04:25
the moment we lay our gaze on them.
04:28
The first step towards this goal is to teach a computer to see objects,
04:34
the building block of the visual world.
04:37
In its simplest terms, imagine this teaching process
04:42
as showing the computers some training images
04:45
of a particular object, let's say cats,
04:48
and designing a model that learns from these training images.
04:53
How hard can this be?
04:55
After all, a cat is just a collection of shapes and colors,
04:59
and this is what we did in the early days of object modeling.
05:03
We'd tell the computer algorithm in a mathematical language
05:07
that a cat has a round face, a chubby body,
05:10
two pointy ears, and a long tail,
05:12
and that looked all fine.
05:14
But what about this cat?
05:16
(Laughter)
05:18
It's all curled up.
05:19
Now you have to add another shape and viewpoint to the object model.
05:24
But what if cats are hidden?
05:27
What about these silly cats?
05:31
Now you get my point.
05:33
Even something as simple as a household pet
05:36
can present an infinite number of variations to the object model,
05:41
and that's just one object.
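
Editor's illustration (not part of the talk): a rough Python sketch of why a hand-written object model is brittle. The `Shape` fields and the `looks_like_cat` rule are hypothetical stand-ins for the "round face, chubby body, two pointy ears, long tail" description; the early systems the speaker mentions expressed such rules mathematically, not as booleans.

```python
from dataclasses import dataclass


@dataclass
class Shape:
    """Hypothetical hand-measured features of an object in a photo."""
    round_face: bool
    chubby_body: bool
    pointy_ears: int
    long_tail: bool


def looks_like_cat(s: Shape) -> bool:
    # The hand-written "object model": every rule must hold.
    return s.round_face and s.chubby_body and s.pointy_ears == 2 and s.long_tail


sitting_cat = Shape(round_face=True, chubby_body=True, pointy_ears=2, long_tail=True)
# Curled up: the tail is tucked away and only one ear is visible.
curled_up_cat = Shape(round_face=True, chubby_body=True, pointy_ears=1, long_tail=False)

print(looks_like_cat(sitting_cat))    # True
print(looks_like_cat(curled_up_cat))  # False
```

Each new pose or occlusion needs another special case, which is the open-ended growth of the object model that the speaker describes.
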
05:44
So about eight years ago,
05:47
a very simple and profound observation changed my thinking.
05:53
No one tells a child how to see,
05:56
especially in the early years.
05:58
They learn this through real-world experiences and examples.
06:03
If you consider a child's eyes
06:06
as a pair of biological cameras,
06:08
they take one picture about every 200 milliseconds,
06:12
the average time an eye movement is made.
06:15
So by age three, a child would have seen hundreds of millions of pictures
06:21
of the real world.
06:23
That's a lot of training examples.
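
Editor's illustration (not part of the talk): the "hundreds of millions" figure can be checked with back-of-the-envelope arithmetic. The 200 ms interval comes from the talk; the 12 waking hours per day is an assumption added here.

```python
MS_PER_PICTURE = 200        # from the talk: one "picture" per eye movement
WAKING_HOURS_PER_DAY = 12   # assumption, not stated in the talk
DAYS_PER_YEAR = 365
YEARS = 3

pictures_per_second = 1000 // MS_PER_PICTURE
pictures_per_day = pictures_per_second * 3600 * WAKING_HOURS_PER_DAY
total_pictures = pictures_per_day * DAYS_PER_YEAR * YEARS

print(f"{total_pictures:,} pictures by age three")  # 236,520,000
```

Under these assumptions the total is about 237 million, consistent with the speaker's order of magnitude.
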
06:26
So instead of focusing solely on better and better algorithms,
06:32
my insight was to give the algorithms the kind of training data
06:37
that a child was given through experiences
06:40
in both quantity and quality.
06:44
Once we know this,
06:46
we knew we needed to collect a data set
06:49
that has far more images than we have ever had before,
06:54
perhaps thousands of times more,
06:56
and together with Professor Kai Li at Princeton University,
07:00
we launched the ImageNet project in 2007.
07:05
Luckily, we didn't have to mount a camera on our head
07:09
and wait for many years.
07:11
We went to the Internet,
07:12
the biggest treasure trove of pictures that humans have ever created.
07:17
We downloaded nearly a billion images
07:20
and used crowdsourcing technology like the Amazon Mechanical Turk platform
07:25
to help us to label these images.
07:28
At its peak, ImageNet was one of the biggest employers
07:33
of the Amazon Mechanical Turk workers:
07:36
together, almost 50,000 workers
07:40
from 167 countries around the world
07:44
helped us to clean, sort and label
07:48
nearly a billion candidate images.
07:52
That was how much effort it took
07:55
to capture even a fraction of the imagery
07:59
a child's mind takes in in the early developmental years.
08:04
In hindsight, this idea of using big data
08:08
to train computer algorithms may seem obvious now,
08:12
but back in 2007, it was not so obvious.
08:16
We were fairly alone on this journey for quite a while.
08:20
Some very friendly colleagues advised me to do something more useful for my tenure,
08:25
and we were constantly struggling for research funding.
08:29
Once, I even joked to my graduate students
08:32
that I would just reopen my dry cleaner's shop to fund ImageNet.
08:36
After all, that's how I funded my college years.
08:41
So we carried on.
08:43
In 2009, the ImageNet project delivered
08:46
a database of 15 million images
08:50
across 22,000 classes of objects and things
08:55
organized by everyday English words.
08:58
In both quantity and quality,
09:01
this was an unprecedented scale.
09:04
As an example, in the case of cats,
09:08
we have more than 62,000 cats
09:11
of all kinds of looks and poses
09:15
and across all species of domestic and wild cats.
09:20
We were thrilled to have put together ImageNet,
09:23
and we wanted the whole research world to benefit from it,
09:27
so in the TED fashion, we opened up the entire data set
09:31
to the worldwide research community for free.
09:36
(Applause)
09:41
Now that we have the data to nourish our computer brain,
09:45
we're ready to come back to the algorithms themselves.
09:49
As it turned out, the wealth of information provided by ImageNet
09:54
was a perfect match to a particular class of machine learning algorithms
09:59
called convolutional neural network,
10:02
pioneered by Kunihiko Fukushima, Geoff Hinton, and Yann LeCun
10:07
back in the 1970s and '80s.
10:10
Just like the brain consists of billions of highly connected neurons,
10:16
a basic operating unit in a neural network
10:20
is a neuron-like node.
10:22
It takes input from other nodes
10:25
and sends output to others.
10:28
Moreover, these hundreds of thousands or even millions of nodes
10:32
are organized in hierarchical layers,
10:36
also similar to the brain.
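
Editor's illustration (not part of the talk): a minimal Python sketch of a neuron-like node and of nodes arranged in layers. The sigmoid squashing function, the weights, and the layer sizes are assumptions chosen for the example; the talk does not specify them, and this toy network is fully connected, not convolutional.

```python
import math


def node(inputs, weights, bias):
    """One neuron-like node: weighted sum of its inputs, squashed to (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    """A layer is a group of nodes that all read the same inputs."""
    return [node(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Made-up weights: 3 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

pixels = [0.0, 0.5, 1.0]
hidden = layer(pixels, hidden_weights, hidden_biases)   # first layer reads pixels
output = layer(hidden, output_weights, output_biases)   # next layer reads the first
print(hidden, output)
```

Each layer's outputs become the next layer's inputs, which is the hierarchical arrangement the speaker refers to. Training would adjust the weights and biases, which are fixed here.
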
10:38
In a typical neural network we use to train our object recognition model,
10:43
it has 24 million nodes,
10:46
140 million parameters,
10:49
and 15 billion connections.
10:52
That's an enormous model.
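
Editor's illustration (not part of the talk): one reason a convolutional network can have far more connections than parameters is weight sharing, where the same small filter is reused at every image position. The sketch below counts both for a single hypothetical layer; the layer sizes are invented and are not those of the network described in the talk.

```python
def conv_layer_counts(height, width, in_channels, out_channels, kernel):
    """Count parameters, output nodes and connections for a stride-1,
    'same'-padded convolutional layer. Biases are ignored, and reads of
    padding at the image border are counted as connections for simplicity."""
    weights_per_filter = kernel * kernel * in_channels
    parameters = weights_per_filter * out_channels
    output_nodes = height * width * out_channels
    # Every output node reads weights_per_filter inputs, but all nodes in
    # the same output channel reuse the same filter weights.
    connections = output_nodes * weights_per_filter
    return parameters, output_nodes, connections


# Hypothetical layer sizes chosen only for illustration.
params, nodes, conns = conv_layer_counts(
    height=32, width=32, in_channels=3, out_channels=16, kernel=3
)
print(params, nodes, conns)  # 432 16384 442368
```

In this toy layer each parameter is reused at 1,024 positions. The talk's figures (140 million parameters, 15 billion connections) imply roughly a hundred connections per parameter on average.
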
10:55
Powered by the massive data from ImageNet
10:58
and the modern CPUs and GPUs to train such a humongous model,
11:04
the convolutional neural network
11:06
blossomed in a way that no one expected.
11:10
It became the winning architecture
11:12
to generate exciting new results in object recognition.
11:18
This is a computer telling us
11:20
this picture contains a cat
11:23
and where the cat is.
11:25
Of course there are more things than cats,
11:27
so here's a computer algorithm telling us
11:29
the picture contains a boy and a teddy bear;
11:32
a dog, a person, and a small kite in the background;
11:37
or a picture of very busy things
11:40
like a man, a skateboard, railings, a lamppost, and so on.
11:45
Sometimes, when the computer is not so confident about what it sees,
11:51
we have taught it to be smart enough
11:53
to give us a safe answer instead of committing too much,
11:57
just like we would do,
12:00
but other times our computer algorithm is remarkable at telling us
12:05
what exactly the objects are,
12:07
like the make, model, year of the cars.
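
Editor's illustration (not part of the talk): one way to give a "safe answer instead of committing too much" is to back off to a more general label when no specific label is confident enough. The label hierarchy, the scores, and the threshold below are hypothetical, and this is only a sketch of the idea, not necessarily how the speaker's system works.

```python
# Hypothetical label hierarchy: each label points to its more general parent.
PARENT = {
    "golden retriever": "dog",
    "poodle": "dog",
    "dog": "animal",
    "tabby cat": "cat",
    "cat": "animal",
    "animal": "thing",
}


def depth(label):
    """Number of steps from a label up to the root of the hierarchy."""
    d = 0
    while label in PARENT:
        label = PARENT[label]
        d += 1
    return d


def safe_label(leaf_scores, threshold):
    """Return the most specific label whose total confidence reaches threshold.

    leaf_scores maps fine-grained labels to probabilities; a general label's
    confidence is the sum over the fine-grained labels beneath it.
    """
    totals = {}
    for leaf, score in leaf_scores.items():
        label = leaf
        while label is not None:
            totals[label] = totals.get(label, 0.0) + score
            label = PARENT.get(label)
    confident = [label for label, total in totals.items() if total >= threshold]
    return max(confident, key=depth)


sure = {"golden retriever": 0.9, "poodle": 0.05, "tabby cat": 0.05}
unsure = {"golden retriever": 0.5, "poodle": 0.4, "tabby cat": 0.1}
print(safe_label(sure, 0.8))    # golden retriever
print(safe_label(unsure, 0.8))  # dog
```

When the breed is uncertain, the sketch answers "dog", trading specificity for a lower chance of being wrong. It assumes the scores sum to at least the threshold, so that the root label always qualifies.
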
12:10
We applied this algorithm to millions of Google Street View images
12:16
across hundreds of American cities,
12:19
and we have learned something really interesting:
12:22
first, it confirmed our common wisdom
12:25
that car prices correlate very well
12:28
with household incomes.
12:31
But surprisingly, car prices also correlate well
12:35
with crime rates in cities,
12:39
or voting patterns by zip codes.
12:44
So wait a minute. Is that it?
12:46
Has the computer already matched or even surpassed human capabilities?
12:51
Not so fast.
12:53
So far, we have just taught the computer to see objects.
12:58
This is like a small child learning to utter a few nouns.
13:03
It's an incredible accomplishment,
13:05
but it's only the first step.
13:08
Soon, another developmental milestone will be hit,
13:12
and children begin to communicate in sentences.
13:15
So instead of saying this is a cat in the picture,
13:19
you already heard the little girl telling us this is a cat lying on a bed.
13:24
So to teach a computer to see a picture and generate sentences,
13:30
the marriage between big data and machine learning algorithm
13:34
has to take another step.
13:36
Now, the computer has to learn from both pictures
13:40
as well as natural language sentences
13:43
generated by humans.
13:47
Just like the brain integrates vision and language,
13:50
we developed a model that connects parts of visual things
13:56
like visual snippets
13:58
with words and phrases in sentences.
14:02
About four months ago,
14:04
we finally tied all this together
14:07
and produced one of the first computer vision models
14:11
that is capable of generating a human-like sentence
14:15
when it sees a picture for the first time.
14:18
Now, I'm ready to show you what the computer says
14:23
when it sees the picture
14:25
that the little girl saw at the beginning of this talk.
14:31
(Video) Computer: A man is standing next to an elephant.
14:36
A large airplane sitting on top of an airport runway.
14:41
FFL: Of course, we're still working hard to improve our algorithms,
14:45
and it still has a lot to learn.
14:47
(Applause)
14:51
And the computer still makes mistakes.
14:54
(Video) Computer: A cat lying on a bed in a blanket.
14:58
FFL: So of course, when it sees too many cats,
15:00
it thinks everything might look like a cat.
15:05
(Video) Computer: A young boy is holding a baseball bat.
15:08
(Laughter)
15:09
FFL: Or, if it hasn't seen a toothbrush, it confuses it with a baseball bat.
15:15
(Video) Computer: A man riding a horse down a street next to a building.
15:18
(Laughter)
15:20
FFL: We haven't taught Art 101 to the computers.
15:25
(Video) Computer: A zebra standing in a field of grass.
15:28
FFL: And it hasn't learned to appreciate the stunning beauty of nature
15:32
like you and I do.
15:34
So it has been a long journey.
15:37
To get from age zero to three was hard.
15:41
The real challenge is to go from three to 13 and far beyond.
15:47
Let me remind you with this picture of the boy and the cake again.
15:51
So far, we have taught the computer to see objects
15:55
or even tell us a simple story when seeing a picture.
15:59
(Video) Computer: A person sitting at a table with a cake.
16:03
FFL: But there's so much more to this picture
16:06
than just a person and a cake.
16:08
What the computer doesn't see is that this is a special Italian cake
16:12
that's only served during Easter time.
16:16
The boy is wearing his favorite t-shirt
16:19
given to him as a gift by his father after a trip to Sydney,
16:23
and you and I can all tell how happy he is
16:27
and what's exactly on his mind at that moment.
16:31
This is my son Leo.
16:34
On my quest for visual intelligence,
16:36
I think of Leo constantly
16:39
and the future world he will live in.
16:42
When machines can see,
16:44
doctors and nurses will have extra pairs of tireless eyes
16:48
to help them to diagnose and take care of patients.
16:53
Cars will run smarter and safer on the road.
16:57
Robots, not just humans,
17:00
will help us to brave the disaster zones to save the trapped and wounded.
17:05
We will discover new species, better materials,
17:09
and explore unseen frontiers with the help of the machines.
17:15
Little by little, we're giving sight to the machines.
17:19
First, we teach them to see.
17:22
Then, they help us to see better.
17:24
For the first time, human eyes won't be the only ones
17:29
pondering and exploring our world.
17:31
We will not only use the machines for their intelligence,
17:35
we will also collaborate with them in ways that we cannot even imagine.
17:41
This is my quest:
17:43
to give computers visual intelligence
17:46
and to create a better future for Leo and for the world.
17:51
Thank you.
17:53
(Applause)