Rupal Patel: Synthetic voices, as unique as fingerprints

2014-02-13

TED



00:12
I'd like to talk today
0
12719
1490
00:14
about a powerful and fundamental aspect
1
14209
2927
00:17
of who we are: our voice.
2
17136
3598
00:20
Each one of us has a unique voiceprint
3
20734
2746
00:23
that reflects our age, our size,
4
23480
2289
00:25
even our lifestyle and personality.
5
25769
3237
00:29
In the words of the poet Longfellow,
6
29006
2142
00:31
"the human voice is the organ of the soul."
7
31148
3870
00:35
As a speech scientist, I'm fascinated
8
35018
2747
00:37
by how the voice is produced,
9
37765
1829
00:39
and I have an idea for how it can be engineered.
10
39594
3658
00:43
That's what I'd like to share with you.
11
43252
2210
00:45
I'm going to start by playing you a sample
12
45462
1814
00:47
of a voice that you may recognize.
13
47276
1871
00:49
(Recording) Stephen Hawking: "I would have thought
14
49147
1304
00:50
it was fairly obvious what I meant."
15
50451
2749
00:53
Rupal Patel: That was the voice
16
53200
1280
00:54
of Professor Stephen Hawking.
17
54480
2086
00:56
What you may not know is that same voice
18
56566
3849
01:00
may also be used by this little girl
19
60415
2478
01:02
who is unable to speak
20
62893
1697
01:04
because of a neurological condition.
21
64590
2597
01:07
In fact, all of these individuals
22
67187
2068
01:09
may be using the same voice,
23
69255
2012
01:11
and that's because there's only a few options available.
24
71267
3557
01:14
In the U.S. alone, there are 2.5 million Americans
25
74824
4317
01:19
who are unable to speak,
26
79141
1610
01:20
and many of whom use computerized devices
27
80751
2622
01:23
to communicate.
28
83373
1522
01:24
Now that's millions of people worldwide
29
84895
3479
01:28
who are using generic voices,
30
88374
1652
01:30
including Professor Hawking,
31
90026
1446
01:31
who uses an American-accented voice.
32
91472
4833
01:36
This lack of individuation of the synthetic voice
33
96305
3328
01:39
really hit home
34
99633
1416
01:41
when I was at an assistive technology conference
35
101049
2472
01:43
a few years ago,
36
103521
1850
01:45
and I recall walking into an exhibit hall
37
105371
3604
01:48
and seeing a little girl and a grown man
38
108975
3044
01:52
having a conversation using their devices,
39
112019
2916
01:54
different devices, but the same voice.
40
114935
4284
01:59
And I looked around and I saw this happening
41
119219
1909
02:01
all around me, literally hundreds of individuals
42
121128
4190
02:05
using a handful of voices,
43
125318
2738
02:08
voices that didn't fit their bodies
44
128056
3091
02:11
or their personalities.
45
131147
2082
02:13
We wouldn't dream of fitting a little girl
46
133229
2727
02:15
with the prosthetic limb of a grown man.
47
135956
3396
02:19
So why then the same prosthetic voice?
48
139352
3304
02:22
It really struck me,
49
142656
1291
02:23
and I wanted to do something about this.
50
143947
3151
02:27
I'm going to play you now a sample
51
147098
1953
02:29
of someone who has, two people actually,
52
149051
3288
02:32
who have severe speech disorders.
53
152339
1768
02:34
I want you to take a listen to how they sound.
54
154107
3230
02:37
They're saying the same utterance.
55
157337
2357
02:39
(First voice)
56
159694
2432
02:42
(Second voice)
57
162126
3617
02:45
You probably didn't understand what they said,
58
165743
2412
02:48
but I hope that you heard
59
168155
1854
02:50
their unique vocal identities.
60
170009
4283
02:54
So what I wanted to do next is,
61
174292
2813
02:57
I wanted to find out how we could harness
62
177105
2384
02:59
these residual vocal abilities
63
179489
1821
03:01
and build a technology
64
181310
2016
03:03
that could be customized for them,
65
183326
2143
03:05
voices that could be customized for them.
66
185469
2429
03:07
So I reached out to my collaborator, Tim Bunnell.
67
187898
2685
03:10
Dr. Bunnell is an expert in speech synthesis,
68
190583
3063
03:13
and what he'd been doing is building
69
193646
2033
03:15
personalized voices for people
70
195679
1881
03:17
by putting together
71
197560
2097
03:19
pre-recorded samples of their voice
72
199657
2150
03:21
and reconstructing a voice for them.
73
201807
2879
03:24
These are people who had lost their voice
74
204686
1712
03:26
later in life.
75
206398
1911
03:28
We didn't have the luxury
76
208309
1394
03:29
of pre-recorded samples of speech
77
209703
1774
03:31
for those born with speech disorder.
78
211477
2292
03:33
But I thought, there had to be a way
79
213769
2537
03:36
to reverse engineer a voice
80
216306
1944
03:38
from whatever little is left over.
81
218250
2291
03:40
So we decided to do exactly that.
82
220541
2714
03:43
We set out with a little bit of funding from the National Science Foundation,
83
223255
3403
03:46
to create custom-crafted voices that captured
84
226658
3565
03:50
their unique vocal identities.
85
230223
1536
03:51
We call this project VocaliD, or vocal I.D.,
86
231759
3203
03:54
for vocal identity.
87
234962
2033
03:56
Now before I get into the details of how
88
236995
2674
03:59
the voice is made and let you listen to it,
89
239669
2048
04:01
I need to give you a real quick speech science lesson. Okay?
90
241717
3350
04:05
So first, we know that the voice is changing
91
245067
3159
04:08
dramatically over the course of development.
92
248226
2854
04:11
Children sound different from teens
93
251080
2090
04:13
who sound different from adults.
94
253170
1463
04:14
We've all experienced this.
95
254633
2642
04:17
Fact number two is that speech
96
257275
3363
04:20
is a combination of the source,
97
260638
2553
04:23
which is the vibrations generated by your voice box,
98
263191
3479
04:26
which are then pushed through
99
266670
1939
04:28
the rest of the vocal tract.
100
268609
2437
04:31
These are the chambers of your head and neck
101
271046
2484
04:33
that vibrate,
102
273530
1239
04:34
and they actually filter that source sound
103
274769
2110
04:36
to produce consonants and vowels.
104
276879
2537
04:39
So the combination of source and filter
105
279416
3860
04:43
is how we produce speech.
106
283276
2630
04:45
And that happens in one individual.
107
285906
3026
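A minimal sketch of the source-filter idea described above, not the speaker's own implementation. It assumes a train of unit pulses as a stand-in for the voice-box vibrations (the source) and a single two-pole resonance as a stand-in for one vocal-tract chamber (the filter). The function names, the 8000 Hz sample rate, the 200 Hz pitch and the 700 Hz resonance are all illustrative assumptions.

```python
import math


def impulse_train(f0_hz, sample_rate, n_samples):
    """Crude 'source': one unit pulse per pitch period."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Crude 'filter': a two-pole resonance standing in for one formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


SAMPLE_RATE = 8000  # assumed, chosen only to keep the numbers small
source = impulse_train(f0_hz=200, sample_rate=SAMPLE_RATE, n_samples=800)
speech_like = resonator(source, center_hz=700, bandwidth_hz=100,
                        sample_rate=SAMPLE_RATE)
```

Changing `f0_hz` changes only the pitch of the result, while changing `center_hz` changes only its vowel-like color. That independence is what makes it plausible to take the two parts from different talkers.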
04:48
Now I told you earlier that I'd spent
108
288932
2626
04:51
a good part of my career
109
291558
2025
04:53
understanding and studying
110
293583
2453
04:56
the source characteristics of people
111
296036
1958
04:57
with severe speech disorder,
112
297994
2301
05:00
and what I've found
113
300295
1465
05:01
is that even though their filters were impaired,
114
301760
3366
05:05
they were able to modulate their source:
115
305126
2961
05:08
the pitch, the loudness, the tempo of their voice.
116
308087
3262
05:11
These are called prosody, and I've been documenting for years
117
311349
3368
05:14
that the prosodic abilities of these individuals
118
314717
2277
05:16
are preserved.
119
316994
1575
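As a rough illustration of two of the prosodic cues just named, pitch and loudness, the sketch below measures both on a sustained vowel-like signal. It assumes a plain autocorrelation peak search for pitch and root-mean-square amplitude for loudness. These are common textbook choices, not necessarily the measures used in the research described, and the function names, the 75 to 400 Hz search range and the synthetic test tone are illustrative assumptions.

```python
import math


def rms_loudness(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, min_hz=75.0, max_hz=400.0):
    """Pick the lag whose autocorrelation is largest and convert it to Hz."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


SAMPLE_RATE = 8000  # assumed
# Synthetic stand-in for a sustained vowel: a 200 Hz tone with amplitude 0.5.
vowel = [0.5 * math.sin(2.0 * math.pi * 200.0 * i / SAMPLE_RATE)
         for i in range(400)]
pitch_hz = estimate_pitch_hz(vowel, SAMPLE_RATE)
loudness = rms_loudness(vowel)
```

On this synthetic tone the search recovers the 200 Hz pitch. Real recordings are noisier and usually need windowing and voicing detection, which this sketch leaves out.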
05:18
So when I realized that those same cues
120
318569
4087
05:22
are also important for speaker identity,
121
322656
2769
05:25
I had this idea.
122
325425
2015
05:27
Why don't we take the source
123
327440
2516
05:29
from the person we want the voice to sound like,
124
329956
2213
05:32
because it's preserved,
125
332169
1463
05:33
and borrow the filter
126
333632
2135
05:35
from someone about the same age and size,
127
335767
3229
05:39
because they can articulate speech,
128
339011
2407
05:41
and then mix them?
129
341418
1791
05:43
Because when we mix them,
130
343209
1787
05:44
we can get a voice that's as clear
131
344996
1698
05:46
as our surrogate talker --
132
346694
1754
05:48
that's the person we borrowed the filter from --
133
348448
2595
05:51
and is similar in identity to our target talker.
134
351043
4649
05:55
It's that simple.
135
355692
1427
05:57
That's the science behind what we're doing.
136
357119
2934
06:00
So once you have that in mind,
137
360053
3533
06:03
how do you go about building this voice?
138
363586
2258
06:05
Well, you have to find someone
139
365844
1480
06:07
who is willing to be a surrogate.
140
367324
2400
06:09
It's not such an ominous thing.
141
369724
2264
06:11
Being a surrogate donor
142
371988
1523
06:13
only requires you to say a few hundred
143
373511
2788
06:16
to a few thousand utterances.
144
376299
2242
06:18
The process goes something like this.
145
378541
2003
06:20
(Video) Voice: Things happen in pairs.
146
380544
2190
06:22
I love to sleep.
147
382734
1925
06:24
The sky is blue without clouds.
148
384659
3882
06:28
RP: Now she's going to go on like this
149
388541
2002
06:30
for about three to four hours,
150
390543
1919
06:32
and the idea is not for her to say everything
151
392462
3005
06:35
that the target is going to want to say,
152
395467
2045
06:37
but the idea is to cover all the different combinations
153
397512
3395
06:40
of the sounds that occur in the language.
154
400907
3271
06:44
The more speech you have,
155
404178
1638
06:45
the better sounding voice you're going to have.
156
405816
2305
06:48
Once you have those recordings,
157
408121
1673
06:49
what we need to do
158
409794
1413
06:51
is we have to parse these recordings
159
411207
2718
06:53
into little snippets of speech,
160
413925
2449
06:56
one- or two-sound combinations,
161
416374
2337
06:58
sometimes even whole words
162
418711
1883
07:00
that start populating a dataset or a database.
163
420594
4516
07:05
We're going to call this database a voice bank.
164
425110
3717
07:08
Now the power of the voice bank
165
428827
2096
07:10
is that from this voice bank,
166
430923
2014
07:12
we can now say any new utterance,
167
432937
2011
07:14
like, "I love chocolate" --
168
434948
1424
07:16
everyone needs to be able to say that --
169
436372
1739
07:18
fish through that database
170
438111
1831
07:19
and find all the segments necessary
171
439942
1940
07:21
to say that utterance.
172
441882
1929
07:23
(Video) Voice: I love chocolate.
173
443811
1789
07:25
RP: So that's speech synthesis.
174
445600
1391
07:26
It's called concatenative synthesis, and that's what we're using.
175
446991
2573
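The voice bank lookup described above can be sketched as a toy, with heavy simplifications. Whole words and word pairs stand in for the one- and two-sound snippets, and a snippet is recorded only as a position (utterance number, start, length) rather than as audio. The function names and the greedy longest-match search are illustrative assumptions, not the project's actual method.

```python
def build_voice_bank(recorded_utterances):
    """Index every one- and two-word snippet by where it was recorded."""
    bank = {}
    for utt_id, utterance in enumerate(recorded_utterances):
        words = utterance.lower().split()
        for size in (1, 2):
            for start in range(len(words) - size + 1):
                unit = tuple(words[start:start + size])
                # Keep the first recording found for each unit.
                bank.setdefault(unit, (utt_id, start, size))
    return bank


def synthesize(bank, new_utterance):
    """Cover a new utterance with banked snippets, preferring longer ones."""
    words = new_utterance.lower().split()
    plan = []
    i = 0
    while i < len(words):
        for size in (2, 1):
            unit = tuple(words[i:i + size])
            if len(unit) == size and unit in bank:
                plan.append((unit, bank[unit]))
                i += size
                break
        else:
            raise KeyError(f"no recorded snippet covers {words[i]!r}")
    return plan


# The three sentences read aloud in the recording session shown in the talk.
recordings = [
    "Things happen in pairs",
    "I love to sleep",
    "The sky is blue without clouds",
]
bank = build_voice_bank(recordings)
plan = synthesize(bank, "I love the blue sky")
```

With only these three sentences banked, a new utterance such as "I love the blue sky" can be assembled, but "I love chocolate" raises `KeyError` because nothing recorded covers "chocolate". A bank built from sound-level snippets avoids that gap, which is why the recordings aim to cover the sound combinations of the language rather than every sentence.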
07:29
That's not the novel part.
176
449564
1533
07:31
What's novel is how we make it sound
177
451097
2221
07:33
like this young woman.
178
453318
1457
07:34
This is Samantha.
179
454775
1524
07:36
I met her when she was nine,
180
456299
2346
07:38
and since then, my team and I
181
458645
1897
07:40
have been trying to build her a personalized voice.
182
460542
2714
07:43
We first had to find a surrogate donor,
183
463256
3099
07:46
and then we had to have Samantha
184
466355
1818
07:48
produce some utterances.
185
468173
1929
07:50
What she can produce are mostly vowel-like sounds,
186
470102
2379
07:52
but that's enough for us to extract
187
472481
2479
07:54
her source characteristics.
188
474960
2285
07:57
What happens next is best described
189
477245
3271
08:00
by my daughter's analogy. She's six.
190
480516
2767
08:03
She calls it mixing colors to paint voices.
191
483283
5422
08:08
It's beautiful. It's exactly that.
192
488705
2555
08:11
Samantha's voice is like a concentrated sample
193
491260
2860
08:14
of red food dye which we can infuse
194
494120
2609
08:16
into the recordings of her surrogate
195
496729
2540
08:19
to get a pink voice just like this.
196
499269
4387
08:23
(Video) Samantha: Aaaaaah.
197
503656
4491
08:28
RP: So now, Samantha can say this.
198
508147
2808
08:30
(Video) Samantha: This voice is only for me.
199
510955
3069
08:34
I can't wait to use my new voice with my friends.
200
514024
6305
08:40
RP: Thank you. (Applause)
201
520329
6417
08:46
I'll never forget the gentle smile
202
526746
2333
08:49
that spread across her face
203
529079
1902
08:50
when she heard that voice for the first time.
204
530981
3649
08:54
Now there's millions of people
205
534630
1882
08:56
around the world like Samantha, millions,
206
536512
2833
08:59
and we've only begun to scratch the surface.
207
539345
3440
09:02
What we've done so far is we have
208
542785
1642
09:04
a few surrogate talkers from around the U.S.
209
544427
3859
09:08
who have donated their voices,
210
548286
1507
09:09
and we have been using those
211
549793
1928
09:11
to build our first few personalized voices.
212
551721
4472
09:16
But there's so much more work to be done.
213
556193
1756
09:17
For Samantha, her surrogate
214
557949
2188
09:20
came from somewhere in the Midwest, a stranger
215
560137
3046
09:23
who gave her the gift of voice.
216
563183
3841
09:27
And as a scientist, I'm so excited
217
567024
2153
09:29
to take this work out of the laboratory
218
569177
1935
09:31
and finally into the real world
219
571112
1800
09:32
so it can have real-world impact.
220
572912
3165
09:36
What I want to share with you next
221
576077
1582
09:37
is how I envision taking this work
222
577659
2175
09:39
to that next level.
223
579834
2711
09:42
I imagine a whole world of surrogate donors
224
582545
3887
09:46
from all walks of life, different sizes, different ages,
225
586432
3260
09:49
coming together in this voice drive
226
589692
3058
09:52
to give people voices
227
592750
2270
09:55
that are as colorful as their personalities.
228
595020
3799
09:58
To do that as a first step,
229
598819
2300
10:01
we've put together this website, VocaliD.org,
230
601119
3275
10:04
as a way to bring together those
231
604394
1624
10:06
who want to join us as voice donors,
232
606018
2675
10:08
as expertise donors,
233
608693
1772
10:10
in whatever way to make this vision a reality.
234
610465
5339
10:15
They say that giving blood can save lives.
235
615804
4153
10:19
Well, giving your voice can change lives.
236
619957
4982
10:24
All we need is a few hours of speech
237
624939
3050
10:27
from our surrogate talker,
238
627989
1491
10:29
and as little as a vowel from our target talker,
239
629480
4733
10:34
to create a unique vocal identity.
240
634213
3711
10:37
So that's the science behind what we're doing.
241
637924
2626
10:40
I want to end by circling back to the human side
242
640550
4455
10:45
that is really the inspiration for this work.
243
645005
4102
10:49
About five years ago, we built our very first voice
244
649107
3699
10:52
for a little boy named William.
245
652806
2501
10:55
When his mom first heard this voice,
246
655307
2357
10:57
she said, "This is what William
247
657664
2345
11:00
would have sounded like
248
660009
1546
11:01
had he been able to speak."
249
661555
2449
11:04
And then I saw William typing a message
250
664004
2418
11:06
on his device.
251
666422
1362
11:07
I wondered, what was he thinking?
252
667784
3293
11:11
Imagine carrying around someone else's voice
253
671077
3590
11:14
for nine years
254
674667
2193
11:16
and finally finding your own voice.
255
676860
4844
11:21
Imagine that.
256
681704
1377
11:23
This is what William said:
257
683081
2797
11:25
"Never heard me before."
258
685878
4463
11:32
Thank you.
259
692417
1619
11:34
(Applause)
260
694036
4724