Rupal Patel: Synthetic voices, as unique as fingerprints

113,269 views ・ 2014-02-13

TED



Translator: Chunda Zeng; Reviewer: Xuwen Zhu
00:12
I'd like to talk today
0
12719
1490
我今天想給大家介紹
00:14
about a powerful and fundamental aspect
1
14209
2927
一個對我們身份有重要影響的因素
00:17
of who we are: our voice.
2
17136
3598
那就是:聲音
00:20
Each one of us has a unique voiceprint
3
20734
2746
我們每一個人都有獨特的音印
00:23
that reflects our age, our size,
4
23480
2289
它反映了我們的年紀, 體型,
00:25
even our lifestyle and personality.
5
25769
3237
甚至我們的性格與生活習慣
00:29
In the words of the poet Longfellow,
6
29006
2142
以詩人亨利·沃茲沃思·朗費羅的話說:
00:31
"the human voice is the organ of the soul."
7
31148
3870
"人類的聲音就是靈魂的器官."
00:35
As a speech scientist, I'm fascinated
8
35018
2747
做為一個語言科學家, 我對聲音產生的過程
00:37
by how the voice is produced,
9
37765
1829
有著濃厚的興趣,
00:39
and I have an idea for how it can be engineered.
10
39594
3658
我對如何來設計與建造聲音 有一個新的看法
00:43
That's what I'd like to share with you.
11
43252
2210
我想和大家分享的這個看法
00:45
I'm going to start by playing you a sample
12
45462
1814
先給大家放一個實例
00:47
of a voice that you may recognize.
13
47276
1871
你們也許認得這個聲音
00:49
(Recording) Stephen Hawking: "I would have thought
14
49147
1304
(錄音) 史蒂芬‧霍金:"我以為我說的話
00:50
it was fairly obvious what I meant."
15
50451
2749
還是比較清楚的"
00:53
Rupal Patel: That was the voice
16
53200
1280
這個錄音裡的聲音
00:54
of Professor Stephen Hawking.
17
54480
2086
是來自史蒂芬‧霍金教授
00:56
What you may not know is that same voice
18
56566
3849
但是你也許不知道同一個聲音
01:00
may also be used by this little girl
19
60415
2478
也可能被這個小女孩使用
01:02
who is unable to speak
20
62893
1697
她因為神經的問題
01:04
because of a neurological condition.
21
64590
2597
而無法說話
01:07
In fact, all of these individuals
22
67187
2068
事實上, 所有這些人
01:09
may be using the same voice,
23
69255
2012
都可能用著同一個聲音,
01:11
and that's because there's only a few options available.
24
71267
3557
因為目前可用的聲音只有幾個
01:14
In the U.S. alone, there are 2.5 million Americans
25
74824
4317
僅在美國就有250萬人
01:19
who are unable to speak,
26
79141
1610
無法通過語言溝通,
01:20
and many of whom use computerized devices
27
80751
2622
他們大多數
01:23
to communicate.
28
83373
1522
使用電子設備來溝通
01:24
Now that's millions of people worldwide
29
84895
3479
這意味著全世界有數百萬的人
01:28
who are using generic voices,
30
88374
1652
都用著同樣的聲音,
01:30
including Professor Hawking,
31
90026
1446
其中包括了霍金教授,
01:31
who uses an American-accented voice.
32
91472
4833
他用的是帶有美式口音的聲音
01:36
This lack of individuation of the synthetic voice
33
96305
3328
這種人工聲音缺少的個體性
01:39
really hit home
34
99633
1416
讓我非常的驚訝,
01:41
when I was at an assistive technology conference
35
101049
2472
當我幾年前
01:43
a few years ago,
36
103521
1850
在一個輔具科技會議上,
01:45
and I recall walking into an exhibit hall
37
105371
3604
我記得走進一個展覽廳
01:48
and seeing a little girl and a grown man
38
108975
3044
看見一個小女孩和一個成年男子
01:52
having a conversation using their devices,
39
112019
2916
通過他們的設備談話,
01:54
different devices, but the same voice.
40
114935
4284
雖然設備不同, 但聲音卻是一樣的
01:59
And I looked around and I saw this happening
41
119219
1909
我望了望四周,發現
02:01
all around me, literally hundreds of individuals
42
121128
4190
周圍有幾百個人
02:05
using a handful of voices,
43
125318
2738
使用的聲音却只有幾種
02:08
voices that didn't fit their bodies
44
128056
3091
都不符合他們的身體
02:11
or their personalities.
45
131147
2082
或是性格.
02:13
We wouldn't dream of fitting a little girl
46
133229
2727
我們不會考慮給一個小女孩裝上
02:15
with the prosthetic limb of a grown man.
47
135956
3396
一個成年男子的假肢
02:19
So why then the same prosthetic voice?
48
139352
3304
那為甚麼要給她一個 不屬於自己的聲音呢?
02:22
It really struck me,
49
142656
1291
我因為感觸很深,
02:23
and I wanted to do something about this.
50
143947
3151
所以決定對此做些甚麼
02:27
I'm going to play you now a sample
51
147098
1953
接下來我要播放的例子
02:29
of someone who has, two people actually,
52
149051
3288
是兩個人,
02:32
who have severe speech disorders.
53
152339
1768
他們都有嚴重的語言障礙
02:34
I want you to take a listen to how they sound.
54
154107
3230
我希望大家聽聽看他們的聲音
02:37
They're saying the same utterance.
55
157337
2357
二人說的是一樣的話
02:39
(First voice)
56
159694
2432
(聲音一)
02:42
(Second voice)
57
162126
3617
(聲音二)
02:45
You probably didn't understand what they said,
58
165743
2412
你們也許沒聽懂他們的話,
02:48
but I hope that you heard
59
168155
1854
但我希望你們注意到了
02:50
their unique vocal identities.
60
170009
4283
他們聲音中的獨特性
02:54
So what I wanted to do next is,
61
174292
2813
我接下來要做的是,
02:57
I wanted to find out how we could harness
62
177105
2384
找到一個方法來
02:59
these residual vocal abilities
63
179489
1821
利用這些剩餘的聲音特性
03:01
and build a technology
64
181310
2016
來發明一套科技
03:03
that could be customized for them,
65
183326
2143
專為他們設計
03:05
voices that could be customized for them.
66
185469
2429
將他們的聲音個性化,
03:07
So I reached out to my collaborator, Tim Bunnell.
67
187898
2685
我找到了我的合作人, 蒂姆·布涅爾
03:10
Dr. Bunnell is an expert in speech synthesis,
68
190583
3063
布涅爾博士是智能語音方面的專家,
03:13
and what he'd been doing is building
69
193646
2033
他一直都在為
03:15
personalized voices for people
70
195679
1881
他人設計個性化的語音
03:17
by putting together
71
197560
2097
方法是通過收集
03:19
pre-recorded samples of their voice
72
199657
2150
這些人之前的聲音錄音
03:21
and reconstructing a voice for them.
73
201807
2879
然後再為他們重建一種聲音
03:24
These are people who had lost their voice
74
204686
1712
但是布涅爾博士的這些研究對象
03:26
later in life.
75
206398
1911
遇到的問題是後天性語言障礙
03:28
We didn't have the luxury
76
208309
1394
我們這次的研究沒有這個福利
03:29
of pre-recorded samples of speech
77
209703
1774
對這些先天帶有語言障礙的人
03:31
for those born with speech disorder.
78
211477
2292
我們沒有事先錄製好的聲音樣品
03:33
But I thought, there had to be a way
79
213769
2537
但是我想了想, 一定有一個方法
03:36
to reverse engineer a voice
80
216306
1944
可以從僅有的所剩中
03:38
from whatever little is left over.
81
218250
2291
將聲音逆向製作出來
03:40
So we decided to do exactly that.
82
220541
2714
所以我們決定就這樣做
03:43
We set out with a little bit of funding from the National Science Foundation,
83
223255
3403
我們從國家科學基金會獲得了一些資金,
03:46
to create custom-crafted voices that captured
84
226658
3565
用以建造一套可以抓住他們
03:50
their unique vocal identities.
85
230223
1536
聲音特性的個體化語音
03:51
We call this project VocaliD, or vocal I.D.,
86
231759
3203
我們將該專案稱作VocaliD, 或是vocal I.D.,
03:54
for vocal identity.
87
234962
2033
作為語音身份(Vocal Identity)的簡寫
03:56
Now before I get into the details of how
88
236995
2674
在我向大家播放
03:59
the voice is made and let you listen to it,
89
239669
2048
和介紹如何製作這個聲音之前,
04:01
I need to give you a real quick speech science lesson. Okay?
90
241717
3350
我需要先給大家上一堂 語言科學課, 好嗎?
04:05
So first, we know that the voice is changing
91
245067
3159
首先,我們需要了解聲音
04:08
dramatically over the course of development.
92
248226
2854
在成長的過程中會發生巨大的變化
04:11
Children sound different from teens
93
251080
2090
兒童和青少年聽起來會不同
04:13
who sound different from adults.
94
253170
1463
而青少年和成年人之間也是
04:14
We've all experienced this.
95
254633
2642
我們都曾經歷過這些語言變化階段
04:17
Fact number two is that speech
96
257275
3363
事實二,是語言的產生
04:20
is a combination of the source,
97
260638
2553
是由多個來源組成,
04:23
which is the vibrations generated by your voice box,
98
263191
3479
其中包括了你喉頭產生的顫動,
04:26
which are then pushed through
99
266670
1939
這種顫動接著
04:28
the rest of the vocal tract.
100
268609
2437
會貫穿整個聲腔
04:31
These are the chambers of your head and neck
101
271046
2484
圖像顯示的是頭和脖子的內部
04:33
that vibrate,
102
273530
1239
它們會顫動,
04:34
and they actually filter that source sound
103
274769
2110
其實它們是將來源聲音過濾掉
04:36
to produce consonants and vowels.
104
276879
2537
來產生子音和母音
04:39
So the combination of source and filter
105
279416
3860
所以聲音的來源和過濾過程加在一起
04:43
is how we produce speech.
106
283276
2630
就是我們產生聲音的方法
04:45
And that happens in one individual.
107
285906
3026
這是一個人身上發生的過程
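
The source-and-filter picture described above can be sketched in a few lines of Python. This is only an illustrative toy, not the method used in the talk: it assumes an impulse train as the "source" and a chain of simple two-pole resonators as the "filter", and the function names, the 16 kHz sample rate, and the formant values are all invented for the example.

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate=16000):
    """Impulse train with one pulse per pitch period.

    A crude stand-in for the vibrations generated by the voice box.
    """
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonant filter, a crude stand-in for one vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius (< 1, so stable)
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0_hz, formants, duration_s=0.25, sample_rate=16000):
    """Push the source through each resonance in turn (source, then filter)."""
    signal = glottal_source(f0_hz, duration_s, sample_rate)
    for center_hz, bandwidth_hz in formants:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal

# Illustrative (center, bandwidth) pairs for an "ah"-like vowel; not from the talk.
AH_FORMANTS = [(700, 110), (1200, 120), (2600, 160)]
samples = synthesize_vowel(220.0, AH_FORMANTS)
```

Changing only `f0_hz` changes the source (the pitch) while keeping the same vowel, and changing only the formant list changes the filter (the vowel) while keeping the same pitch.
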
04:48
Now I told you earlier that I'd spent
108
288932
2626
我之前告訴過大家
04:51
a good part of my career
109
291558
2025
我職業生涯的大部分時間
04:53
understanding and studying
110
293583
2453
都用來研究和學習
04:56
the source characteristics of people
111
296036
1958
有嚴重語音障礙人士的
04:57
with severe speech disorder,
112
297994
2301
聲音源的特徵,
05:00
and what I've found
113
300295
1465
我發現
05:01
is that even though their filters were impaired,
114
301760
3366
雖然他們的過濾器官已遭到損壞,
05:05
they were able to modulate their source:
115
305126
2961
他們可以調製自己的聲音來源:
05:08
the pitch, the loudness, the tempo of their voice.
116
308087
3262
包括高低度, 大小, 以及速度
05:11
These are called prosody, and I've been documenting for years
117
311349
3368
這些被稱之為音律,
05:14
that the prosodic abilities of these individuals
118
314717
2277
我用了多年的時間 來紀錄這些人是如何
05:16
are preserved.
119
316994
1575
維持自己音律的能力
05:18
So when I realized that those same cues
120
318569
4087
當我認識到同樣的線索
05:22
are also important for speaker identity,
121
322656
2769
對說話人的身份同樣重要的時候,
05:25
I had this idea.
122
325425
2015
我有了一個想法
05:27
Why don't we take the source
123
327440
2516
為什麼我們不找一個 聲音是我們所需要的人,
05:29
from the person we want the voice to sound like,
124
329956
2213
從他那採集聲音源
05:32
because it's preserved,
125
332169
1463
因為它已被保留,
05:33
and borrow the filter
126
333632
2135
然後再找一個有著相似年紀和體型的人
05:35
from someone about the same age and size,
127
335767
3229
從他那借用過濾器,
05:39
because they can articulate speech,
128
339011
2407
因為他們能清晰地說話,
05:41
and then mix them?
129
341418
1791
然後將二者混合?
05:43
Because when we mix them,
130
343209
1787
因為當我們將它們混合的時候,
05:44
we can get a voice that's as clear
131
344996
1698
我們得到的聲音將會和
05:46
as our surrogate talker --
132
346694
1754
那個代替說話者一樣清楚
05:48
that's the person we borrowed the filter from --
133
348448
2595
代替說話者就是我們借用過濾器的人
05:51
and is similar in identity to our target talker.
134
351043
4649
而產生的語音和我們 目標說話者有相似的辨認度
05:55
It's that simple.
135
355692
1427
就這麼簡單
05:57
That's the science behind what we're doing.
136
357119
2934
這就我們該項研究的科學性
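
As a way to picture the mixing idea, here is a minimal Python sketch. It is a bookkeeping model only and an assumption of this write-up, not the actual VocaliD processing: the class names, fields, and all numbers are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class VoiceSource:
    """Source characteristics: the pitch, loudness, and tempo of a voice."""
    pitch_hz: float
    loudness_db: float
    tempo_syllables_per_s: float

@dataclass(frozen=True)
class VocalTractFilter:
    """Filter characteristics, summarized here as resonance frequencies."""
    formants_hz: Tuple[float, ...]

@dataclass(frozen=True)
class Voice:
    source: VoiceSource
    tract: VocalTractFilter

def blend(target: Voice, surrogate: Voice) -> Voice:
    """Keep the target talker's source and borrow the surrogate talker's filter."""
    return Voice(source=target.source, tract=surrogate.tract)

# Placeholder numbers, purely for illustration.
target = Voice(VoiceSource(260.0, -20.0, 2.5), VocalTractFilter((900.0, 1500.0)))
surrogate = Voice(VoiceSource(230.0, -18.0, 4.0), VocalTractFilter((750.0, 1300.0, 2800.0)))
custom = blend(target, surrogate)
```

The blended voice carries the identity cues of the target talker and the articulation of the surrogate talker, which is the division of labor described in the talk.
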
06:00
So once you have that in mind,
137
360053
3533
有了這個想法以後,
06:03
how do you go about building this voice?
138
363586
2258
應該怎麼來製造這個聲音呢?
06:05
Well, you have to find someone
139
365844
1480
首先,你必須找一個
06:07
who is willing to be a surrogate.
140
367324
2400
願意當這個代替者的人
06:09
It's not such an ominous thing.
141
369724
2264
這個任務也不是太糟糕
06:11
Being a surrogate donor
142
371988
1523
當一個聲音捐贈者
06:13
only requires you to say a few hundred
143
373511
2788
只要求你閱讀幾百
06:16
to a few thousand utterances.
144
376299
2242
到幾千句話.
06:18
The process goes something like this.
145
378541
2003
以下是過程
06:20
(Video) Voice: Things happen in pairs.
146
380544
2190
(錄影)聲音: 事情成雙成對地發生
06:22
I love to sleep.
147
382734
1925
我愛睡覺
06:24
The sky is blue without clouds.
148
384659
3882
天空藍色無雲
06:28
RP: Now she's going to go on like this
149
388541
2002
演講者: 她接下來的3-4個小時
06:30
for about three to four hours,
150
390543
1919
都會繼續閱讀,
06:32
and the idea is not for her to say everything
151
392462
3005
目的是不要讓她說
06:35
that the target is going to want to say,
152
395467
2045
所有目標說話者要說的話
06:37
but the idea is to cover all the different combinations
153
397512
3395
真正的目的是要概擴所有
06:40
of the sounds that occur in the language.
154
400907
3271
在語言中可能發生的組合
06:44
The more speech you have,
155
404178
1638
你說的話越多,
06:45
the better sounding voice you're going to have.
156
405816
2305
你的聲音就會聽起來更好
06:48
Once you have those recordings,
157
408121
1673
當錄音完成後,
06:49
what we need to do
158
409794
1413
我們接下來
06:51
is we have to parse these recordings
159
411207
2718
要對這些錄音做語法分析
06:53
into little snippets of speech,
160
413925
2449
將它們分段,
06:56
one- or two-sound combinations,
161
416374
2337
大概1-2個音的組合,
06:58
sometimes even whole words
162
418711
1883
有時候也會是那些
07:00
that start populating a dataset or a database.
163
420594
4516
填入數據集或是數據庫的完整單字
07:05
We're going to call this database a voice bank.
164
425110
3717
我們將這個數據庫稱之為聲音銀行
07:08
Now the power of the voice bank
165
428827
2096
聲音銀行的力量
07:10
is that from this voice bank,
166
430923
2014
使我們通過它
07:12
we can now say any new utterance,
167
432937
2011
可以說出任何新的語句,
07:14
like, "I love chocolate" --
168
434948
1424
比如說, "我喜歡巧克力"
07:16
everyone needs to be able to say that --
169
436372
1739
所有人都需要說這類的話的能力
07:18
fish through that database
170
438111
1831
搜尋數據庫
07:19
and find all the segments necessary
171
439942
1940
找到必須的部分
07:21
to say that utterance.
172
441882
1929
來完成這個語句
07:23
(Video) Voice: I love chocolate.
173
443811
1789
(錄影)聲音: 我喜歡巧克力
07:25
RP: So that's speech synthesis.
174
445600
1391
演講人: 這是一個人工聲音
07:26
It's called concatenative synthesis, and that's what we're using.
175
446991
2573
我們將其稱之為連環整合 我們使用的就是這個方法
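
The voice-bank lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the system used in the talk: the bank is a plain dictionary, snippets are short lists of numbers standing in for audio, the sound labels are hypothetical, and real concatenative synthesizers also smooth the joins between snippets.

```python
def synthesize(sounds, voice_bank, max_unit_len=3):
    """Build an utterance by chaining the longest stored snippets that match.

    `sounds` is a list of sound labels; `voice_bank` maps tuples of labels
    to recorded snippets (here, lists of sample values).
    """
    audio = []
    i = 0
    while i < len(sounds):
        # Prefer longer units (e.g. two-sound combinations) over single sounds.
        for size in range(min(max_unit_len, len(sounds) - i), 0, -1):
            unit = tuple(sounds[i:i + size])
            if unit in voice_bank:
                audio.extend(voice_bank[unit])
                i += size
                break
        else:
            raise KeyError(f"voice bank has no snippet covering {sounds[i]!r}")
    return audio

# Toy voice bank with made-up labels and made-up "audio".
voice_bank = {
    ("AY",): [0.1, 0.2],
    ("L", "AH"): [0.3],
    ("L",): [0.9],
    ("AH",): [0.8],
    ("V",): [0.4, 0.5],
}
utterance = synthesize(["AY", "L", "AH", "V"], voice_bank)
# utterance == [0.1, 0.2, 0.3, 0.4, 0.5]
```

The two-sound unit `("L", "AH")` is chosen over the separate `("L",)` and `("AH",)` snippets, which mirrors the idea that the bank holds combinations of sounds rather than only isolated ones.
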
07:29
That's not the novel part.
176
449564
1533
這不是新奇的部分
07:31
What's novel is how we make it sound
177
451097
2221
它新奇之處是我們使它
07:33
like this young woman.
178
453318
1457
聽起來就像是這個年輕女士的聲音
07:34
This is Samantha.
179
454775
1524
她是珊曼莎
07:36
I met her when she was nine,
180
456299
2346
在她9歲時, 我第一次見到她
07:38
and since then, my team and I
181
458645
1897
在那之後, 我和我的團隊
07:40
have been trying to build her a personalized voice.
182
460542
2714
一直設法為她製造一款個性化的聲音
07:43
We first had to find a surrogate donor,
183
463256
3099
我們首先需要一個捐贈者,
07:46
and then we had to have Samantha
184
466355
1818
然後我們會讓珊曼莎
07:48
produce some utterances.
185
468173
1929
發一些音
07:50
What she can produce are mostly vowel-like sounds,
186
470102
2379
雖然她所發出的音大部分都類似母音,
07:52
but that's enough for us to extract
187
472481
2479
但我們用這些已足夠
07:54
her source characteristics.
188
474960
2285
來取得她聲音根源的特性
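
To make "extracting source characteristics" from a vowel-like sound more concrete, here is a minimal Python sketch. It assumes a textbook autocorrelation pitch estimate and an RMS loudness measure, which are stand-ins chosen for illustration and not necessarily what the team used; the function names and the synthetic "recording" are hypothetical.

```python
import math

def estimate_pitch_hz(samples, sample_rate, min_hz=75.0, max_hz=500.0):
    """Estimate pitch as the lag where the signal best matches a shifted copy of itself."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def loudness_db(samples):
    """Root-mean-square level in decibels relative to an amplitude of 1.0."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")

# Stand-in for a recorded sustained vowel: a pure 200 Hz tone, 0.1 s long.
sample_rate = 16000
vowel = [math.sin(2.0 * math.pi * 200.0 * i / sample_rate) for i in range(1600)]

pitch = estimate_pitch_hz(vowel, sample_rate)
level = loudness_db(vowel)
```

A real sustained vowel is noisier than a pure tone, so a practical system would analyze many short frames and smooth the results rather than rely on a single estimate.
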
07:57
What happens next is best described
189
477245
3271
接下來所發生的事
08:00
by my daughter's analogy. She's six.
190
480516
2767
用我女兒的比喻來描述再合適不過, 她6歲
08:03
She calls it mixing colors to paint voices.
191
483283
5422
她說這是混合顏色來畫聲音
08:08
It's beautiful. It's exactly that.
192
488705
2555
很漂亮, 就是這樣
08:11
Samantha's voice is like a concentrated sample
193
491260
2860
珊曼莎的聲音就像是紅色食用色素
08:14
of red food dye which we can infuse
194
494120
2609
的濃縮樣品
08:16
into the recordings of her surrogate
195
496729
2540
我們可以將它注入到她代替者的錄音裡
08:19
to get a pink voice just like this.
196
499269
4387
然後取得一個像這樣的粉色聲音
08:23
(Video) Samantha: Aaaaaah.
197
503656
4491
(錄影)珊曼莎:啊.....
08:28
RP: So now, Samantha can say this.
198
508147
2808
現在, 珊曼莎可以說這個
08:30
(Video) Samantha: This voice is only for me.
199
510955
3069
(錄影)珊曼莎: 這個聲音是我的專屬
08:34
I can't wait to use my new voice with my friends.
200
514024
6305
我等不及與我朋友們分享我的聲音
08:40
RP: Thank you. (Applause)
201
520329
6417
謝謝
08:46
I'll never forget the gentle smile
202
526746
2333
我永遠都不會忘記
08:49
that spread across her face
203
529079
1902
當她第一次聽到自己的聲音時
08:50
when she heard that voice for the first time.
204
530981
3649
佈滿在她臉上那輕柔的微笑
08:54
Now there's millions of people
205
534630
1882
目前世界上
08:56
around the world like Samantha, millions,
206
536512
2833
有好幾百萬像珊曼莎的人, 幾百萬,
08:59
and we've only begun to scratch the surface.
207
539345
3440
而我們的工作才剛剛開始
09:02
What we've done so far is we have
208
542785
1642
我們目前只有
09:04
a few surrogate talkers from around the U.S.
209
544427
3859
幾個來自美國的語言代替者
09:08
who have donated their voices,
210
548286
1507
捐贈了他們的聲音,
09:09
and we have been using those
211
549793
1928
我們使用了他們的捐贈
09:11
to build our first few personalized voices.
212
551721
4472
來建造我們第一批個性化的聲音
09:16
But there's so much more work to be done.
213
556193
1756
但還有更多的工作要完成
09:17
For Samantha, her surrogate
214
557949
2188
對珊曼莎而言, 她的代替者
09:20
came from somewhere in the Midwest, a stranger
215
560137
3046
是來自美國中西部, 一個陌生人
09:23
who gave her the gift of voice.
216
563183
3841
送給了她一個聲音禮物
09:27
And as a scientist, I'm so excited
217
567024
2153
作為一個科學家, 我很開心
09:29
to take this work out of the laboratory
218
569177
1935
能將這個研究從實驗室
09:31
and finally into the real world
219
571112
1800
帶到現實的世界
09:32
so it can have real-world impact.
220
572912
3165
讓它產生一個實際的影響
09:36
What I want to share with you next
221
576077
1582
我接下來想跟大家分享
09:37
is how I envision taking this work
222
577659
2175
我如何想像讓這項研究
09:39
to that next level.
223
579834
2711
進入下一個階段
09:42
I imagine a whole world of surrogate donors
224
582545
3887
我想像著一個充滿了聲音捐贈者的世界
09:46
from all walks of life, different sizes, different ages,
225
586432
3260
他們來自各行各業, 有著不同的體型和年齡,
09:49
coming together in this voice drive
226
589692
3058
一起聚集到這個聲音活動
09:52
to give people voices
227
592750
2270
給其他人提供的聲音
09:55
that are as colorful as their personalities.
228
595020
3799
就像他們個性一樣多姿多采
09:58
To do that as a first step,
229
598819
2300
我們的第一個步驟,
10:01
we've put together this website, VocaliD.org,
230
601119
3275
是建立這個網站, VocaliD.org,
10:04
as a way to bring together those
231
604394
1624
通過這個網站將
10:06
who want to join us as voice donors,
232
606018
2675
那些願意捐贈聲音的,
10:08
as expertise donors,
233
608693
1772
願意提供意見的,
10:10
in whatever way to make this vision a reality.
234
610465
5339
還有想提供其它幫助的人聚集到一起
10:15
They say that giving blood can save lives.
235
615804
4153
有人說捐血可以救人
10:19
Well, giving your voice can change lives.
236
619957
4982
那麼捐聲音就可以改變他人的生活
10:24
All we need is a few hours of speech
237
624939
3050
從我們的代替說話者那裡
10:27
from our surrogate talker,
238
627989
1491
我們只需要幾個小時的語音,
10:29
and as little as a vowel from our target talker,
239
629480
4733
然後再從我們的目標說話者那裡取得幾個母音,
10:34
to create a unique vocal identity.
240
634213
3711
就可以建立出一個獨特的聲音身份
10:37
So that's the science behind what we're doing.
241
637924
2626
這就是我們研究背後的科學
10:40
I want to end by circling back to the human side
242
640550
4455
結尾我想再次強調人為因素
10:45
that is really the inspiration for this work.
243
645005
4102
因為它才是這項研究的啟發
10:49
About five years ago, we built our very first voice
244
649107
3699
大約在5年前, 我們為一個名為威廉的小男孩
10:52
for a little boy named William.
245
652806
2501
製造了第一個聲音
10:55
When his mom first heard this voice,
246
655307
2357
當他的媽媽第一次聽到兒子的聲音時,
10:57
she said, "This is what William
247
657664
2345
她說, "如果威廉可以說話,
11:00
would have sounded like
248
660009
1546
那他的聲音
11:01
had he been able to speak."
249
661555
2449
一定和這個一模一樣."
11:04
And then I saw William typing a message
250
664004
2418
我們然後看到威廉在他的設備上
11:06
on his device.
251
666422
1362
打一條訊息
11:07
I wondered, what was he thinking?
252
667784
3293
我猜想他在想什麼?
11:11
Imagine carrying around someone else's voice
253
671077
3590
試想一下借用了他人的聲音
11:14
for nine years
254
674667
2193
9年之後
11:16
and finally finding your own voice.
255
676860
4844
終於有了自己聲音的感覺
11:21
Imagine that.
256
681704
1377
試想一下
11:23
This is what William said:
257
683081
2797
這就是威廉說的話:
11:25
"Never heard me before."
258
685878
4463
"在這之前從來沒聽過我說話"
11:32
Thank you.
259
692417
1619
謝謝大家
11:34
(Applause)
260
694036
4724
掌聲