Rupal Patel: Synthetic voices, as unique as fingerprints

108,568 views ・ 2014-02-13

TED



Translator: Peipei Xiang; Reviewer: Ying Wang
00:12
I'd like to talk today
00:14
about a powerful and fundamental aspect
00:17
of who we are: our voice.
00:20
Each one of us has a unique voiceprint
00:23
that reflects our age, our size,
00:25
even our lifestyle and personality.
00:29
In the words of the poet Longfellow,
00:31
"the human voice is the organ of the soul."
00:35
As a speech scientist, I'm fascinated
00:37
by how the voice is produced,
00:39
and I have an idea for how it can be engineered.
00:43
That's what I'd like to share with you.
00:45
I'm going to start by playing you a sample
00:47
of a voice that you may recognize.
00:49
(Recording) Stephen Hawking: "I would have thought
00:50
it was fairly obvious what I meant."
00:53
Rupal Patel: That was the voice
00:54
of Professor Stephen Hawking.
00:56
What you may not know is that same voice
01:00
may also be used by this little girl
01:02
who is unable to speak
01:04
because of a neurological condition.
01:07
In fact, all of these individuals
01:09
may be using the same voice,
01:11
and that's because there's only a few options available.
01:14
In the U.S. alone, there are 2.5 million Americans
01:19
who are unable to speak,
01:20
and many of whom use computerized devices
01:23
to communicate.
01:24
Now that's millions of people worldwide
01:28
who are using generic voices,
01:30
including Professor Hawking,
01:31
who uses an American-accented voice.
01:36
This lack of individuation of the synthetic voice
01:39
really hit home
01:41
when I was at an assistive technology conference
01:43
a few years ago,
01:45
and I recall walking into an exhibit hall
01:48
and seeing a little girl and a grown man
01:52
having a conversation using their devices,
01:54
different devices, but the same voice.
01:59
And I looked around and I saw this happening
02:01
all around me, literally hundreds of individuals
02:05
using a handful of voices,
02:08
voices that didn't fit their bodies
02:11
or their personalities.
02:13
We wouldn't dream of fitting a little girl
02:15
with the prosthetic limb of a grown man.
02:19
So why then the same prosthetic voice?
02:22
It really struck me,
02:23
and I wanted to do something about this.
02:27
I'm going to play you now a sample
02:29
of someone who has, two people actually,
02:32
who have severe speech disorders.
02:34
I want you to take a listen to how they sound.
02:37
They're saying the same utterance.
02:39
(First voice)
02:42
(Second voice)
02:45
You probably didn't understand what they said,
02:48
but I hope that you heard
02:50
their unique vocal identities.
02:54
So what I wanted to do next is,
02:57
I wanted to find out how we could harness
02:59
these residual vocal abilities
03:01
and build a technology
03:03
that could be customized for them,
03:05
voices that could be customized for them.
03:07
So I reached out to my collaborator, Tim Bunnell.
03:10
Dr. Bunnell is an expert in speech synthesis,
03:13
and what he'd been doing is building
03:15
personalized voices for people
03:17
by putting together
03:19
pre-recorded samples of their voice
03:21
and reconstructing a voice for them.
03:24
These are people who had lost their voice
03:26
later in life.
03:28
We didn't have the luxury
03:29
of pre-recorded samples of speech
03:31
for those born with speech disorder.
03:33
But I thought, there had to be a way
03:36
to reverse engineer a voice
03:38
from whatever little is left over.
03:40
So we decided to do exactly that.
03:43
We set out with a little bit of funding from the National Science Foundation,
03:46
to create custom-crafted voices that captured
03:50
their unique vocal identities.
03:51
We call this project VocaliD, or vocal I.D.,
03:54
for vocal identity.
03:56
Now before I get into the details of how
03:59
the voice is made and let you listen to it,
04:01
I need to give you a real quick speech science lesson. Okay?
04:05
So first, we know that the voice is changing
04:08
dramatically over the course of development.
04:11
Children sound different from teens
04:13
who sound different from adults.
04:14
We've all experienced this.
04:17
Fact number two is that speech
04:20
is a combination of the source,
04:23
which is the vibrations generated by your voice box,
04:26
which are then pushed through
04:28
the rest of the vocal tract.
04:31
These are the chambers of your head and neck
04:33
that vibrate,
04:34
and they actually filter that source sound
04:36
to produce consonants and vowels.
04:39
So the combination of source and filter
04:43
is how we produce speech.
04:45
And that happens in one individual.
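The source-and-filter idea described here can be illustrated with a small sketch. It is not the VocaliD implementation: the pulse-train source, the two-pole resonator, the function names, and the formant values are all assumptions chosen only to show a "source" signal being shaped by a "filter".

```python
import math

def impulse_train(f0_hz, duration_s, sample_rate=16000):
    """Source: one pulse per pitch period, a crude stand-in for voice-box vibration."""
    n = round(duration_s * sample_rate)
    period = max(1, round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Filter: a two-pole resonance, a crude stand-in for one vocal-tract chamber."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so it is stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * center_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0_hz, formants, duration_s=0.05, sample_rate=16000):
    """Pass the source through each resonance in turn (source, then filter)."""
    wave = impulse_train(f0_hz, duration_s, sample_rate)
    for center_hz, bandwidth_hz in formants:
        wave = resonator(wave, center_hz, bandwidth_hz, sample_rate)
    return wave

# Illustrative values only: a 120 Hz source and two made-up resonances.
ah = synthesize_vowel(120.0, [(700.0, 80.0), (1200.0, 90.0)])
```

Changing `f0_hz` alters the source while leaving the resonances alone, and changing `formants` does the reverse, which is the separation the talk relies on.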
04:48
Now I told you earlier that I'd spent
04:51
a good part of my career
04:53
understanding and studying
04:56
the source characteristics of people
04:57
with severe speech disorder,
05:00
and what I've found
05:01
is that even though their filters were impaired,
05:05
they were able to modulate their source:
05:08
the pitch, the loudness, the tempo of their voice.
05:11
These are called prosody, and I've been documenting for years
05:14
that the prosodic abilities of these individuals
05:16
are preserved.
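As a rough illustration of what measuring two of these prosodic cues could look like, the sketch below estimates loudness as root-mean-square amplitude and pitch by a plain autocorrelation search. Both methods, the function names, and the test tone are assumptions for illustration; the talk does not say how these cues were actually measured, and tempo is left out.

```python
import math

def rms_loudness(samples):
    """Loudness proxy: root-mean-square amplitude of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_pitch_hz(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Pitch proxy: the lag at which the signal best matches a shifted copy of itself."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# A synthetic 200 Hz tone stands in for a sustained vowel-like sound.
sample_rate = 8000
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * i / sample_rate) for i in range(800)]
pitch = estimate_pitch_hz(tone, sample_rate)
loudness = rms_loudness(tone)
```

Real recordings are noisier than a pure tone, so a practical pitch tracker would need windowing and voicing decisions that this sketch omits.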
05:18
So when I realized that those same cues
05:22
are also important for speaker identity,
05:25
I had this idea.
05:27
Why don't we take the source
05:29
from the person we want the voice to sound like,
05:32
because it's preserved,
05:33
and borrow the filter
05:35
from someone about the same age and size,
05:39
because they can articulate speech,
05:41
and then mix them?
05:43
Because when we mix them,
05:44
we can get a voice that's as clear
05:46
as our surrogate talker --
05:48
that's the person we borrowed the filter from --
05:51
and is similar in identity to our target talker.
05:55
It's that simple.
05:57
That's the science behind what we're doing.
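The mixing step can be pictured as a simple data model: keep the target talker's source, borrow the surrogate's filter. The classes, field names, and numbers below are hypothetical and describe only the bookkeeping of the idea, not the signal processing behind it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    pitch_hz: float   # illustrative summary of the talker's pitch
    loudness: float   # illustrative summary of the talker's loudness

@dataclass(frozen=True)
class Filter:
    formants_hz: tuple  # illustrative vocal-tract resonances

@dataclass(frozen=True)
class Talker:
    name: str
    source: Source
    filter: Filter
    intelligible: bool

def blend_voice(target, surrogate):
    """Combine the target's source (identity) with the surrogate's filter (clarity)."""
    if not surrogate.intelligible:
        raise ValueError("the surrogate must be able to articulate speech")
    return Talker(
        name=f"{target.name} (personalized)",
        source=target.source,
        filter=surrogate.filter,
        intelligible=True,
    )

# Made-up example values for a target talker and a surrogate talker.
target = Talker("target", Source(250.0, 0.6), Filter((900.0, 1500.0)), False)
surrogate = Talker("surrogate", Source(230.0, 0.7), Filter((750.0, 1300.0)), True)
voice = blend_voice(target, surrogate)
```

The resulting `voice` carries the target's pitch and loudness and the surrogate's resonances, matching the "as clear as the surrogate, similar in identity to the target" description.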
06:00
So once you have that in mind,
06:03
how do you go about building this voice?
06:05
Well, you have to find someone
06:07
who is willing to be a surrogate.
06:09
It's not such an ominous thing.
06:11
Being a surrogate donor
06:13
only requires you to say a few hundred
06:16
to a few thousand utterances.
06:18
The process goes something like this.
06:20
(Video) Voice: Things happen in pairs.
06:22
I love to sleep.
06:24
The sky is blue without clouds.
06:28
RP: Now she's going to go on like this
06:30
for about three to four hours,
06:32
and the idea is not for her to say everything
06:35
that the target is going to want to say,
06:37
but the idea is to cover all the different combinations
06:40
of the sounds that occur in the language.
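One way to think about covering all the sound combinations is as a coverage problem: pick prompts until every needed pair of adjacent sounds has been spoken at least once. The greedy selection below is an assumption for illustration, as are the phone symbols; the talk does not describe how the recording prompts were actually chosen.

```python
def sound_pairs(phones):
    """All adjacent two-sound combinations in one prompt."""
    return {(a, b) for a, b in zip(phones, phones[1:])}

def pick_prompts(candidates, needed):
    """Greedily choose prompts until the needed pairs are covered or no prompt helps."""
    remaining = set(needed)
    chosen = []
    while remaining:
        best = max(candidates,
                   key=lambda text: len(sound_pairs(candidates[text]) & remaining))
        gained = sound_pairs(candidates[best]) & remaining
        if not gained:
            break  # some pairs cannot be covered by any candidate prompt
        chosen.append(best)
        remaining -= gained
    return chosen, remaining

# Hypothetical prompts with rough, hand-written sound sequences.
candidates = {
    "I love to sleep": ["ay", "l", "ah", "v", "t", "uw", "s", "l", "iy", "p"],
    "I love": ["ay", "l", "ah", "v"],
    "to sleep": ["t", "uw", "s", "l", "iy", "p"],
}
needed = set().union(*(sound_pairs(p) for p in candidates.values()))
chosen, uncovered = pick_prompts(candidates, needed)
```

Here a single prompt already covers every pair, so the two shorter prompts are never selected.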
06:44
The more speech you have,
06:45
the better sounding voice you're going to have.
06:48
Once you have those recordings,
06:49
what we need to do
06:51
is we have to parse these recordings
06:53
into little snippets of speech,
06:56
one- or two-sound combinations,
06:58
sometimes even whole words
07:00
that start populating a dataset or a database.
07:05
We're going to call this database a voice bank.
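A voice bank of this kind can be sketched as a lookup table from sound combinations to the places in the recordings where they occur. The sketch assumes each recording has already been segmented into labeled sounds with start and end times; that alignment step, the record layout, and all names are hypothetical.

```python
from collections import defaultdict

def build_voice_bank(recordings):
    """Index one-sound and two-sound snippets by the sounds they contain.

    recordings maps a recording id to a list of (sound, start_s, end_s) segments.
    """
    bank = defaultdict(list)
    for rec_id, segments in recordings.items():
        # Single-sound snippets.
        for sound, start, end in segments:
            bank[(sound,)].append((rec_id, start, end))
        # Two-sound snippets spanning each adjacent pair.
        for (s1, start1, _), (s2, _, end2) in zip(segments, segments[1:]):
            bank[(s1, s2)].append((rec_id, start1, end2))
    return dict(bank)

# Made-up alignment for one short recording.
recordings = {
    "rec001": [("ay", 0.00, 0.12), ("l", 0.12, 0.20), ("ah", 0.20, 0.31)],
}
bank = build_voice_bank(recordings)
```

Each key can hold several entries once more recordings are added, which gives a later synthesis step more than one snippet to choose from.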
07:08
Now the power of the voice bank
07:10
is that from this voice bank,
07:12
we can now say any new utterance,
07:14
like, "I love chocolate" --
07:16
everyone needs to be able to say that --
07:18
fish through that database
07:19
and find all the segments necessary
07:21
to say that utterance.
07:23
(Video) Voice: I love chocolate.
07:25
RP: So that's speech synthesis.
07:26
It's called concatenative synthesis, and that's what we're using.
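"Fishing through the database" for a new utterance can be sketched as a lookup that prefers the longest stored snippet at each position. This greedy longest-match rule is a simplification assumed for illustration; real concatenative synthesizers also weigh how well neighboring snippets join. The bank contents and clip names are made up.

```python
def select_units(target, bank, max_unit=3):
    """Cover the target sound sequence with stored snippets, longest match first."""
    plan = []
    i = 0
    while i < len(target):
        for size in range(min(max_unit, len(target) - i), 0, -1):
            key = tuple(target[i:i + size])
            if key in bank:
                plan.append((key, bank[key][0]))  # take the first stored snippet
                i += size
                break
        else:
            raise KeyError(f"no snippet in the voice bank covers {target[i]!r}")
    return plan

# A tiny made-up bank: keys are sound combinations, values are clip names.
bank = {
    ("ay",): ["clip_a"],
    ("l", "ah"): ["clip_b"],
    ("v",): ["clip_c"],
}
plan = select_units(["ay", "l", "ah", "v"], bank)
```

Playing the selected clips back to back is the concatenation step; a sound missing from the bank raises an error, which reflects why more recorded speech gives a better voice.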
07:29
That's not the novel part.
07:31
What's novel is how we make it sound
07:33
like this young woman.
07:34
This is Samantha.
07:36
I met her when she was nine,
07:38
and since then, my team and I
07:40
have been trying to build her a personalized voice.
07:43
We first had to find a surrogate donor,
07:46
and then we had to have Samantha
07:48
produce some utterances.
07:50
What she can produce are mostly vowel-like sounds,
07:52
but that's enough for us to extract
07:54
her source characteristics.
07:57
What happens next is best described
08:00
by my daughter's analogy. She's six.
08:03
She calls it mixing colors to paint voices.
08:08
It's beautiful. It's exactly that.
08:11
Samantha's voice is like a concentrated sample
08:14
of red food dye which we can infuse
08:16
into the recordings of her surrogate
08:19
to get a pink voice just like this.
08:23
(Video) Samantha: Aaaaaah.
08:28
RP: So now, Samantha can say this.
08:30
(Video) Samantha: This voice is only for me.
08:34
I can't wait to use my new voice with my friends.
08:40
RP: Thank you. (Applause)
08:46
I'll never forget the gentle smile
08:49
that spread across her face
08:50
when she heard that voice for the first time.
08:54
Now there's millions of people
08:56
around the world like Samantha, millions,
08:59
and we've only begun to scratch the surface.
09:02
What we've done so far is we have
09:04
a few surrogate talkers from around the U.S.
09:08
who have donated their voices,
09:09
and we have been using those
09:11
to build our first few personalized voices.
09:16
But there's so much more work to be done.
09:17
For Samantha, her surrogate
09:20
came from somewhere in the Midwest, a stranger
09:23
who gave her the gift of voice.
09:27
And as a scientist, I'm so excited
09:29
to take this work out of the laboratory
09:31
and finally into the real world
09:32
so it can have real-world impact.
09:36
What I want to share with you next
09:37
is how I envision taking this work
09:39
to that next level.
09:42
I imagine a whole world of surrogate donors
09:46
from all walks of life, different sizes, different ages,
09:49
coming together in this voice drive
09:52
to give people voices
09:55
that are as colorful as their personalities.
09:58
To do that as a first step,
10:01
we've put together this website, VocaliD.org,
10:04
as a way to bring together those
10:06
who want to join us as voice donors,
10:08
as expertise donors,
10:10
in whatever way to make this vision a reality.
10:15
They say that giving blood can save lives.
10:19
Well, giving your voice can change lives.
10:24
All we need is a few hours of speech
10:27
from our surrogate talker,
10:29
and as little as a vowel from our target talker,
10:34
to create a unique vocal identity.
10:37
So that's the science behind what we're doing.
10:40
I want to end by circling back to the human side
10:45
that is really the inspiration for this work.
10:49
About five years ago, we built our very first voice
10:52
for a little boy named William.
10:55
When his mom first heard this voice,
10:57
she said, "This is what William
11:00
would have sounded like
11:01
had he been able to speak."
11:04
And then I saw William typing a message
11:06
on his device.
11:07
I wondered, what was he thinking?
11:11
Imagine carrying around someone else's voice
11:14
for nine years
11:16
and finally finding your own voice.
11:21
Imagine that.
11:23
This is what William said:
11:25
"Never heard me before."
11:32
Thank you.
11:34
(Applause)