Welcome to the World of Audio Computers | Jason Rugolo | TED

118,930 views ・ 2024-05-08

TED



00:04
You know, when I was a kid,
00:06
dreaming about the amazing future that computers could bring,
00:10
I never thought it would look like this.
00:13
I snapped this photo in line at a Chipotle, thinking,
00:15
"Man, what has the world come to?"
00:19
You know, everyone's stuck in their phones all the time.
00:21
And then I almost doubled over laughing at myself because there I am,
00:25
stuck in my phone being judgy about these people, stuck in theirs.
00:30
The truth is that we could all benefit from a little less screen time.
00:34
And so how do we push back
00:36
and create a healthier relationship to our technology?
00:40
I've been trying to figure out what comes next,
00:43
what's the technology that we want to be using?
00:46
I spent three years funding deep tech at ARPA-E,
00:49
and then I moved to Google X,
00:51
Google's moonshot factory,
00:53
before creating a spin-out called iyo.
00:56
The last 10 years, diligently, and some may say obsessively,
01:01
trying to peek beyond the curve.
01:03
What I think is next is that we need an entirely new kind of computer.
01:08
One that speaks our language
01:10
instead of forcing us to speak their language of swipes and clicks.
01:15
A computer that we can talk to.
01:18
And not in the way that you speak at Siri
01:21
with loud, robotic voice commands,
01:24
but in the way that we talk with each other.
01:27
So genuine, engaging conversation.
01:31
So a new kind of natural language computing
01:34
built on a new kind of compute hardware.
01:38
So we've been building this new kind of computer in secret for six years now,
01:42
and today is the first time that we're talking about it publicly.
01:46
So this is a prototype “audio computer,”
01:49
as we call them.
01:51
And the first thing that you'll see is they don't have a screen.
01:54
We've got perfectly good screens in our pockets,
01:56
and we're hoping that we could leave them there or even at home
02:00
a little bit more often.
02:01
So these are not just fancy-looking earbuds.
02:05
They're an entire computer,
02:07
and we think they're going to replace some of what we do
02:09
with the visual computers that we're used to.
02:12
We've created a new audio-based user interface,
02:15
which relies on natural language as the input
02:18
and what we call “auditory space” as the output.
02:21
And we're hoping that these innovations will help us create
02:24
a new kind of relationship with computers,
02:26
where the iyo is more natural, more intuitive.
02:30
So you can just talk,
02:32
have a natural-language conversation
02:35
with all kinds of different powerful audio apps.
02:38
So here's a really simple example.
02:41
Hey, Q.
02:43
Voice: Hi, Jason. How are you doing?
02:45
Jason Rugolo: Hey, I'm good, I'm doing the TED talk right now.
02:48
Voice: You got this, Jason, how can I support you?
02:51
JR: I don't know, you know I'm nervous talking in front of this many people.
02:55
Voice: Yeah, I know you don't like public speaking.
02:57
How can I help?
02:58
JR: I guess I could use a little bit of encouragement.
03:01
Voice: Do you want me to share a positive message from a loved one?
03:05
JR: Yeah, that'd be great.
03:06
Maybe just one, I don't want to bore people.
03:08
Voice: Your dad texted,
03:09
"Good luck today, Jason. I'm so proud of you."
03:13
JR: That's awesome, thanks, Q.
03:14
Can you tell him I said thanks and tell him I'm up on stage right now?
03:17
He'll get a kick out of that.
03:19
Voice: No problem.
03:21
JR: So any suggestions for how I could loosen up a little bit?
03:25
Voice: Do you want to hear a joke to lighten the mood?
03:28
JR: OK, yeah, maybe if it's on theme.
03:30
Do you know any tech start-up jokes?
03:33
Voice: How many start-up founders does it take to change a light bulb?
03:37
JR: I don't know, tell me.
03:39
Voice: Only one.
03:40
He just holds it in place, and the world revolves around him.
03:43
(Laughter)
03:47
JR: OK, sick burn.
03:48
I'm going to get back to the talk here.
03:50
So do you see the difference between those robotic voice commands
03:54
and just a conversation?
03:57
Natural language is just more natural.
04:01
It's intuitive, it's better.
04:03
So what makes this kind of conversation possible?
04:09
It's the single most revolutionary breakthrough
04:12
in the history of information technology.
04:16
And don't worry, I'm not talking about our devices,
04:18
although they're pretty cool.
04:20
And I don't mean LLMs either, which are obviously a building block here.
04:25
What I'm talking about is something much older.
04:29
It's the first uniquely human form of communication,
04:34
the one that we naturally learn as children
04:36
and has its structure built into our brain.
04:39
It's the very thing I'm doing right now.
04:41
Talking.
04:43
Spoken language emerged in tandem with the evolution of human consciousness,
04:48
and to this day, it remains our most efficient
04:51
and emotionally robust form of communication.
04:54
Conversation is not just transmitting ideas from one person to another.
04:58
It's more like thinking together.
05:02
Modern neuroscientists have pioneered a whole new approach to the brain.
05:06
It's called second-person neuroscience,
05:08
and it’s built on the notion that how we think is not isolated.
05:12
It’s collective, and it happens out loud.
05:15
Not just through words,
05:16
but through subtle signals of tone and prosody,
05:19
your timbre and your pitch and intensity.
05:22
And neuroscience is just not complete
05:24
until you add a second person into this full social dynamic.
05:29
So why can't we have a computer that we can talk with in that way?
05:34
With that kind of natural language.
05:37
A computer that has superhuman processing speed.
05:41
And it has access to the internet.
05:44
And it’s been trained on the entire written record of human thought.
05:48
But engages with you like a person would,
05:52
that understands your intention
05:55
and that taps into the superpower of human natural language understanding.
06:01
That's the promise of audio computing.
06:04
So think about not just how it can replace many of the things
06:07
that you do on your phone
06:08
but actually make them better.
06:10
So take email, for example.
06:12
We pull out our phones, we swipe, we scroll,
06:14
we furiously type with our thumbs.
06:16
Wouldn't it be better to just sit back with a cup of coffee
06:19
and to be briefed in a conversation?
06:22
Or search, search is a big one.
06:25
It's an incredible technology
06:26
that made the world a radically better place.
06:29
But with these audio computers,
06:31
you can just talk out loud about anything that you want to know.
06:36
It just feels so normal.
06:39
So there's a big difference between giving a voice command
06:42
to one of the big five voice assistants,
06:45
which are these structured, predefined choose-your-own-adventure dialogue models
06:50
that we all have to learn,
06:52
and I'm sure have all felt that frustration,
06:55
and just having a real conversation.
06:57
These natural-language applications can get to know you
07:00
in the same way that we get to know each other.
07:02
They build context about our lives just through us talking over time.
07:07
So later, take out your phone, look at all those apps,
07:10
all those candy-colored icons,
07:11
and think about how could you accomplish the same thing
07:15
but through conversation?
07:17
Or how could you make it better?
07:19
You won't be able to do Instagram or TikTok,
07:22
those apps whose content is mostly visual.
07:25
But wouldn't it be better to spend a little bit less time in those apps,
07:29
or just to need your screen a little bit less?
07:32
So our goal is to be heads-up and hands-free
07:34
for a little bit more of the day.
07:36
You know, just get back into the world.
07:38
Of course, if the auditory user interface or the AUI, as we call it,
07:42
is going to really integrate into your life,
07:44
it has to feel private and convenient to use.
07:48
So that's why we built it as an all-day wearable for the ear.
07:51
But your ears are for hearing first and foremost.
07:55
And so if you're going to wear a computer on them all day,
07:58
we can't mess that up.
08:00
In fact, we should probably make that better too.
08:03
So these audio computers,
08:05
over the last six grueling years of R&D,
08:08
became a sort of mixed-reality device.
08:11
It's like the Apple Vision Pro, but for audio,
08:14
where we can pass through and we can modify your ambient acoustics,
08:18
giving you an unprecedented control over your personal soundscape.
08:22
In order to do mixed audio reality,
08:25
we sort of had to hack the auditory system
08:28
to be able to display sound in ultra-high fidelity, spatially,
08:32
as if it's all around you.
08:34
So there's this whole field of research.
08:36
It’s called psychoacoustics, which we’ve led on for years.
08:39
We built this giant audio structure.
08:42
It's a dome with 128 custom speakers coming from all directions,
08:46
so we could create virtual soundscapes.
08:48
It's sort of like the Star Trek holodeck, but for audio.
08:53
And if you're standing in the middle of this and you close your eyes,
08:56
we can transport you auditorily to anywhere that we want.
09:00
So we can render a virtual football game,
09:03
and you feel the energy.
09:05
Or we can make it sound like you’re in the middle of a bustling city street.
09:10
And if you’re me, you feel the anxiety.
09:12
Or standing on a beach with the crashing waves,
09:16
and you feel the peace.
09:19
And so it's super cool.
09:20
I wish everyone could be inside there.
09:22
Then we ran countless experiments to figure out all the complicated ways
09:26
that your brain positions sounds in space.
09:29
All so we could reverse-engineer those neural algorithms
09:33
and code them into our software.
09:35
So our goal has been to create this experience
09:39
but right here.
09:41
Us psychoacousticians call this “virtual auditory space”
09:44
to distinguish from our real auditory space,
09:47
which is, you know, the sounds that are all around us.
09:50
And this is what's necessary to create a compelling mixed-audio-reality device.
09:55
So it's actually impossible to demonstrate this experience
09:58
until you hear it with your ears yourself.
10:01
But to give you an idea, we have tried to simulate it for you.
10:05
So imagine that you’re sitting in a noisy restaurant,
10:08
and you're having trouble hearing your friends across the table.
10:12
(Overlapping voices, music and noise)
10:19
Hey, can you enhance the sounds that are right in front of me?
10:23
(People speaking)
10:26
(Baby crying)
10:30
And can you turn that baby down?
10:32
(People talking)
10:36
That’s better.
10:38
I'm still having a little trouble hearing Pedro.
10:40
Can you isolate Pedro for me?
10:43
Pedro: (Speaking in Spanish)
10:46
JR: That's perfect.
10:47
And, you know, my Spanish is a little rusty.
10:49
Can I hear Pedro but in English?
10:54
Pedro: And at the end of the trip,
10:56
we came back to the city to visit the historic center.
10:59
JR: Hey Shell, close all programs.
11:01
(Noise enhances)
11:03
Ah, it’s so much worse.
11:06
That's pretty cool, right?
11:07
It's pretty cool.
11:08
(Applause)
11:11
So what you just heard was a beamforming app,
11:14
a computational auditory scene analysis app,
11:16
a machine-learning denoising app,
11:19
an AI transcription and translation
11:20
and text-to-speech with style transfer app.
11:23
The point is that all those audio transformations are done by software.
11:27
So we think the possibilities for these audio computers
11:30
are pretty much endless,
11:32
and we can't wait to see
11:33
what the world's developers are going to do here.
11:36
Like imagine an education app that knows your personal learning style
11:40
and can teach you with the quality of a world-class professor,
11:43
on call anytime.
11:45
Or like a fitness coach you can summon all day
11:47
about your diet and exercise,
11:49
who can also motivate you through conversation
11:52
and even gamify your workout with some auditory cues.
11:56
Or, hey, K?
11:58
Voice: Hi, Jason. What's up?
12:00
JR: Hey, if you were going to make an audio app that could be anything,
12:04
what would it be?
12:05
Voice: How about a whoopee cushion
12:07
that plays a fart sound whenever you sit down?
12:09
(Laughter)
12:11
JR: Hey, K, if you were going to make an audio app
12:13
that didn't have anything to do with farts, what would it be?
12:16
Voice: Maybe an app that generates personalized soundscapes
12:19
for relaxation and focus?
12:21
JR: That's much better.
12:22
Alright, it looks like we still have a little fine-tuning to do here.
12:26
So the point is,
12:28
imagination is the only limit to what you can do here.
12:32
Our goal is not just to create the world's first audio computer,
12:36
it's to create a truly intuitive computing experience
12:40
where we're not monetizing your attention
12:43
or making you captive to a new kind of device,
12:47
but instead interfacing machines with us
12:50
in the way that we were born to.
12:53
So I think it's time for a computer that speaks our language.
12:58
Thank you.
12:59
(Applause)