With Spatial Intelligence, AI Will Understand the Real World | Fei-Fei Li | TED

621,822 views ・ 2024-05-16

TED


00:04
Let me show you something. To be precise, I'm going to show you nothing. This was the world 540 million years ago. Pure, endless darkness. It wasn't dark due to a lack of light. It was dark because of a lack of sight. Although sunshine did filter 1,000 meters beneath the surface of the ocean, and light permeated from hydrothermal vents to the seafloor, brimming with life, there was not a single eye to be found in these ancient waters. No retinas, no corneas, no lenses. So all this light, all this life went unseen. There was a time when the very idea of seeing didn't exist. It [had] simply never been done before. Until it was.

01:09
So for reasons we're only beginning to understand, trilobites, the first organisms that could sense light, emerged. They're the first inhabitants of this reality that we take for granted. First to discover that there is something other than oneself. A world of many selves. The ability to see is thought to have ushered in the Cambrian explosion, a period in which a huge variety of animal species entered the fossil record. What began as a passive experience, the simple act of letting light in, soon became far more active. The nervous system began to evolve. Sight turning to insight. Seeing became understanding. Understanding led to actions. And all these gave rise to intelligence.

02:10
Today, we're no longer satisfied with just nature's gift of visual intelligence. Curiosity urges us to create machines to see just as intelligently as we can, if not better.

02:25
Nine years ago, on this stage, I delivered an early progress report on computer vision, a subfield of artificial intelligence. Three powerful forces converged for the first time. A family of algorithms called neural networks. Fast, specialized hardware called graphics processing units, or GPUs. And big data, like the 15 million images that my lab spent years curating, called ImageNet. Together, they ushered in the age of modern AI.

03:02
We've come a long way. Back then, just putting labels on images was a big breakthrough. But the speed and accuracy of these algorithms improved rapidly. The annual ImageNet challenge, led by my lab, gauged this progress. And on this plot, you're seeing the annual improvement and milestone models. We went a step further and created algorithms that can segment objects or predict the dynamic relationships among them, in these works done by my students and collaborators. And there's more.

03:43
Recall that last time I showed you the first computer-vision algorithm that could describe a photo in human natural language. That was work done with my brilliant former student, Andrej Karpathy. At that time, I pushed my luck and said, "Andrej, can we make computers do the reverse?" And Andrej said, "Ha ha, that's impossible." Well, as you can see from this post, recently the impossible has become possible.

04:11
That's thanks to a family of diffusion models that power today's generative AI algorithms, which can take human-prompted sentences and turn them into photos and videos of something that's entirely new.

04:28
Many of you have seen the recent impressive results of Sora by OpenAI. But even without the enormous number of GPUs, my student and our collaborators developed a generative video model called W.A.L.T months before Sora. And you're seeing some of these results. There is room for improvement. I mean, look at that cat's eye and the way it goes under the wave without ever getting wet. What a cat-astrophe. (Laughter)

05:04
And if past is prologue, we will learn from these mistakes and create a future we imagine. And in this future, we want AI to do everything it can for us, or to help us.

05:19
For years I have been saying that taking a picture is not the same as seeing and understanding. Today, I would like to add to that. Simply seeing is not enough. Seeing is for doing and learning. When we act upon this world in 3D space and time, we learn, and we learn to see and do better. Nature has created this virtuous cycle of seeing and doing, powered by “spatial intelligence.”

05:54
To illustrate to you what your spatial intelligence is doing constantly, look at this picture. Raise your hand if you feel like you want to do something. (Laughter)

06:04
In the last split second, your brain looked at the geometry of this glass, its place in 3D space, its relationship with the table, the cat and everything else. And you can predict what's going to happen next. The urge to act is innate to all beings with spatial intelligence, which links perception with action.

06:30
And if we want to advance AI beyond its current capabilities, we want more than AI that can see and talk. We want AI that can do.

06:42
Indeed, we're making exciting progress. The recent milestones in spatial intelligence are teaching computers to see, learn, do, and learn to see and do better. This is not easy. It took nature millions of years to evolve spatial intelligence, which depends on the eye taking in light and projecting 2D images onto the retina, and on the brain translating this data into 3D information.

07:14
Only recently, a group of researchers from Google were able to develop an algorithm that takes a bunch of photos and translates them into 3D space, like the examples we're showing here. My student and our collaborators have taken a step further and created an algorithm that takes one input image and turns it into a 3D shape. Here are more examples.

07:43
Recall, we talked about computer programs that can take a human sentence and turn it into videos. A group of researchers at the University of Michigan have figured out a way to translate such a sentence into a 3D room layout, like the one shown here. And my colleagues at Stanford and their students have developed an algorithm that takes one image and generates infinitely many plausible spaces for viewers to explore.

08:17
These prototypes are the first budding signs of a future possibility. One in which the human race can take our entire world, translate it into digital forms and model its richness and nuances. What nature did for us implicitly in our individual minds, spatial intelligence technology can hope to do for our collective consciousness.

08:47
As the progress of spatial intelligence accelerates, a new era in this virtuous cycle is taking place in front of our eyes. This back and forth is catalyzing robotic learning, a key component for any embodied intelligence system that needs to understand and interact with the 3D world.

09:12
A decade ago, ImageNet from my lab provided a database of millions of high-quality photos to help train computers to see. Today, we're doing the same with behaviors and actions to train computers and robots how to act in the 3D world. But instead of collecting static images, we develop simulation environments powered by 3D spatial models so that the computers can have infinite varieties of possibilities to learn to act. And you're just seeing a small number of the examples used to teach our robots, in a project led by my lab called Behavior.

10:00
We're also making exciting progress in robotic language intelligence. Using large language model-based input, my students and our collaborators are among the first teams to show a robotic arm performing a variety of tasks based on verbal instructions, like opening this drawer or unplugging a charged phone. Or making sandwiches, using bread, lettuce, tomatoes, and even putting down a napkin for the user. Typically I would like a little more for my sandwich, but this is a good start. (Laughter)

10:40
In that primordial ocean, in our ancient times, the ability to see and perceive one's environment kicked off the Cambrian explosion of interactions with other life forms. Today, that light is reaching the digital minds. Spatial intelligence is allowing machines to interact not only with one another, but with humans, and with 3D worlds, real or virtual.

11:12
And as that future is taking shape, it will have a profound impact on many lives. Let's take health care as an example. For the past decade, my lab has been taking some of the first steps in applying AI to tackle challenges that impact patient outcomes and medical staff burnout. Together with our collaborators from Stanford School of Medicine and partnering hospitals, we're piloting smart sensors that can detect clinicians going into patient rooms without properly washing their hands, keep track of surgical instruments, or alert care teams when a patient is at physical risk, such as falling. We consider these techniques a form of ambient intelligence, like extra pairs of eyes that do make a difference.

12:08
But I would like more interactive help for our patients, clinicians and caretakers, who also desperately need an extra pair of hands. Imagine an autonomous robot transporting medical supplies while caretakers focus on our patients, or augmented reality guiding surgeons to perform safer, faster and less invasive operations. Or imagine patients with severe paralysis controlling robots with their thoughts. That's right, brainwaves, to perform everyday tasks that you and I take for granted. You're seeing a glimpse of that future in this recent pilot study from my lab. In this video, the robotic arm is cooking a Japanese sukiyaki meal, controlled only by the brain's electrical signals, non-invasively collected through an EEG cap.

13:10
(Applause)

13:13
Thank you.

13:16
The emergence of vision half a billion years ago turned a world of darkness upside down. It set off the most profound evolutionary process: the development of intelligence in the animal world. AI's breathtaking progress in the last decade is just as astounding. But I believe the full potential of this digital Cambrian explosion won't be fully realized until we power our computers and robots with spatial intelligence, just as nature did for all of us. It's an exciting time to teach our digital companions to learn to reason and to interact with this beautiful 3D space we call home, and also to create many more new worlds that we can all explore.

14:11
Realizing this future won't be easy. It requires all of us to take thoughtful steps and develop technologies that always put humans at the center. But if we do this right, the computers and robots powered by spatial intelligence will not only be useful tools but also trusted partners to enhance and augment our productivity and humanity while respecting our individual dignity and lifting our collective prosperity.

14:45
What excites me the most is a future in which AI grows more perceptive, insightful and spatially aware, and joins us on our quest to always pursue a better way to make a better world.

15:05
Thank you.

15:06
(Applause)