Rupal Patel: Synthetic voices, as unique as fingerprints

114,553 views ・ 2014-02-13

TED


Translator: Chunda Zeng Reviewer: Xuwen Zhu
00:12
I'd like to talk today
0
12719
1490
我今天想給大家介紹
00:14
about a powerful and fundamental aspect
1
14209
2927
一個對我們身份有重要影響的因素
00:17
of who we are: our voice.
2
17136
3598
那就是:聲音
00:20
Each one of us has a unique voiceprint
3
20734
2746
我們每一個人都有獨特的音印
00:23
that reflects our age, our size,
4
23480
2289
它反映了我們的年紀, 體型,
00:25
even our lifestyle and personality.
5
25769
3237
甚至我們的性格與生活習慣
00:29
In the words of the poet Longfellow,
6
29006
2142
以詩人亨利·沃茲沃思·朗費羅的話說:
00:31
"the human voice is the organ of the soul."
7
31148
3870
"人類的聲音就是靈魂的器官."
00:35
As a speech scientist, I'm fascinated
8
35018
2747
做為一個語言科學家, 我對聲音產生的過程
00:37
by how the voice is produced,
9
37765
1829
有著濃厚的興趣,
00:39
and I have an idea for how it can be engineered.
10
39594
3658
我對如何來設計與建造聲音 有一個新的看法
00:43
That's what I'd like to share with you.
11
43252
2210
我想和大家分享的這個看法
00:45
I'm going to start by playing you a sample
12
45462
1814
先給大家放一個實例
00:47
of a voice that you may recognize.
13
47276
1871
你們也許認得這個聲音
00:49
(Recording) Stephen Hawking: "I would have thought
14
49147
1304
(錄音) 史蒂芬‧霍金:"我以為我說的話
00:50
it was fairly obvious what I meant."
15
50451
2749
還是比較清楚的"
00:53
Rupal Patel: That was the voice
16
53200
1280
這個錄音裡的聲音
00:54
of Professor Stephen Hawking.
17
54480
2086
是來自史蒂芬‧霍金教授
00:56
What you may not know is that same voice
18
56566
3849
但是你也許不知道同一個聲音
01:00
may also be used by this little girl
19
60415
2478
也可能被這個小女孩使用
01:02
who is unable to speak
20
62893
1697
她因為神經的問題
01:04
because of a neurological condition.
21
64590
2597
而無法說話
01:07
In fact, all of these individuals
22
67187
2068
事實上, 所有這些人
01:09
may be using the same voice,
23
69255
2012
都可能用著同一個聲音,
01:11
and that's because there's only a few options available.
24
71267
3557
因為目前可用的聲音只有幾個
01:14
In the U.S. alone, there are 2.5 million Americans
25
74824
4317
僅在美國就有250萬人
01:19
who are unable to speak,
26
79141
1610
無法通過語言溝通,
01:20
and many of whom use computerized devices
27
80751
2622
他們大多數
01:23
to communicate.
28
83373
1522
使用電子設備來溝通
01:24
Now that's millions of people worldwide
29
84895
3479
這意味著全世界有數百萬的人
01:28
who are using generic voices,
30
88374
1652
都用著同樣的聲音,
01:30
including Professor Hawking,
31
90026
1446
其中包括了霍金教授,
01:31
who uses an American-accented voice.
32
91472
4833
他用的是帶有美式口音的聲音
01:36
This lack of individuation of the synthetic voice
33
96305
3328
這種人工聲音缺少的個體性
01:39
really hit home
34
99633
1416
讓我非常的驚訝,
01:41
when I was at an assistive technology conference
35
101049
2472
當我幾年前
01:43
a few years ago,
36
103521
1850
在一個輔具科技會議上,
01:45
and I recall walking into an exhibit hall
37
105371
3604
我記得走進一個展覽廳
01:48
and seeing a little girl and a grown man
38
108975
3044
看見一個小女孩和一個成年男子
01:52
having a conversation using their devices,
39
112019
2916
通過他們的設備談話,
01:54
different devices, but the same voice.
40
114935
4284
雖然設備不同, 但聲音卻是一樣的
01:59
And I looked around and I saw this happening
41
119219
1909
我望了望四周,發現
02:01
all around me, literally hundreds of individuals
42
121128
4190
周圍有幾百個人
02:05
using a handful of voices,
43
125318
2738
使用的聲音却只有幾種
02:08
voices that didn't fit their bodies
44
128056
3091
都不符合他們的身體
02:11
or their personalities.
45
131147
2082
或是性格.
02:13
We wouldn't dream of fitting a little girl
46
133229
2727
我們不會考慮給一個小女孩裝上
02:15
with the prosthetic limb of a grown man.
47
135956
3396
一個成年男子的假肢
02:19
So why then the same prosthetic voice?
48
139352
3304
那為甚麼要給她一個 不屬於自己的聲音呢?
02:22
It really struck me,
49
142656
1291
我因為感觸很深,
02:23
and I wanted to do something about this.
50
143947
3151
所以決定對此做些甚麼
02:27
I'm going to play you now a sample
51
147098
1953
接下來我要播放的例子
02:29
of someone who has, two people actually,
52
149051
3288
是兩個人,
02:32
who have severe speech disorders.
53
152339
1768
他們都有嚴重的語言障礙
02:34
I want you to take a listen to how they sound.
54
154107
3230
我希望大家聽聽看他們的聲音
02:37
They're saying the same utterance.
55
157337
2357
二人說的是一樣的話
02:39
(First voice)
56
159694
2432
(聲音一)
02:42
(Second voice)
57
162126
3617
(聲音二)
02:45
You probably didn't understand what they said,
58
165743
2412
你們也許沒聽懂他們的話,
02:48
but I hope that you heard
59
168155
1854
但我希望你們注意到了
02:50
their unique vocal identities.
60
170009
4283
他們聲音中的獨特性
02:54
So what I wanted to do next is,
61
174292
2813
我接下來要做的是,
02:57
I wanted to find out how we could harness
62
177105
2384
找到一個方法來
02:59
these residual vocal abilities
63
179489
1821
利用這些剩餘的聲音特性
03:01
and build a technology
64
181310
2016
來發明一套科技
03:03
that could be customized for them,
65
183326
2143
專為他們設計
03:05
voices that could be customized for them.
66
185469
2429
將他們的聲音個性化,
03:07
So I reached out to my collaborator, Tim Bunnell.
67
187898
2685
我找到了我的合作人, 蒂姆·布涅爾
03:10
Dr. Bunnell is an expert in speech synthesis,
68
190583
3063
布涅爾博士是智能語音方面的專家,
03:13
and what he'd been doing is building
69
193646
2033
他一直都在為
03:15
personalized voices for people
70
195679
1881
他人設計個性化的語音
03:17
by putting together
71
197560
2097
方法是通過收集
03:19
pre-recorded samples of their voice
72
199657
2150
這些人之前的聲音錄音
03:21
and reconstructing a voice for them.
73
201807
2879
然後再為他們重建一種聲音
03:24
These are people who had lost their voice
74
204686
1712
但是布涅爾博士的這些研究對象
03:26
later in life.
75
206398
1911
遇到的問題是後天性語言障礙
03:28
We didn't have the luxury
76
208309
1394
我們這次的研究沒有這個福利
03:29
of pre-recorded samples of speech
77
209703
1774
對這些先天帶有語言障礙的人
03:31
for those born with speech disorder.
78
211477
2292
我們沒有事先錄製好的聲音樣品
03:33
But I thought, there had to be a way
79
213769
2537
但是我想了想, 一定有一個方法
03:36
to reverse engineer a voice
80
216306
1944
可以從僅有的所剩中
03:38
from whatever little is left over.
81
218250
2291
將聲音逆向製作出來
03:40
So we decided to do exactly that.
82
220541
2714
所以我們決定就這樣做
03:43
We set out with a little bit of funding from the National Science Foundation,
83
223255
3403
我們從國家科學基金會獲得了一些資金,
03:46
to create custom-crafted voices that captured
84
226658
3565
用以建造一套可以抓住他們
03:50
their unique vocal identities.
85
230223
1536
聲音特性的個體化語音
03:51
We call this project VocaliD, or vocal I.D.,
86
231759
3203
我們將該專案稱作VocaliD, 或是vocal I.D.,
03:54
for vocal identity.
87
234962
2033
作為語音身份(Vocal Identity)的簡寫
03:56
Now before I get into the details of how
88
236995
2674
在我向大家播放
03:59
the voice is made and let you listen to it,
89
239669
2048
和介紹如何製作這個聲音之前,
04:01
I need to give you a real quick speech science lesson. Okay?
90
241717
3350
我需要先給大家上一堂 語言科學課, 好嗎?
04:05
So first, we know that the voice is changing
91
245067
3159
首先,我們需要了解聲音
04:08
dramatically over the course of development.
92
248226
2854
在成長的過程中會發生巨大的變化
04:11
Children sound different from teens
93
251080
2090
兒童和青少年聽起來會不同
04:13
who sound different from adults.
94
253170
1463
而青少年和成年人之間也是
04:14
We've all experienced this.
95
254633
2642
我們都曾經歷過這些語言變化階段
04:17
Fact number two is that speech
96
257275
3363
事實二,是語言的產生
04:20
is a combination of the source,
97
260638
2553
是由多個來源組成,
04:23
which is the vibrations generated by your voice box,
98
263191
3479
其中包括了你喉頭產生的顫動,
04:26
which are then pushed through
99
266670
1939
這種顫動接著
04:28
the rest of the vocal tract.
100
268609
2437
會貫穿整個聲腔
04:31
These are the chambers of your head and neck
101
271046
2484
圖像顯示的是頭和脖子的內部
04:33
that vibrate,
102
273530
1239
它們會顫動,
04:34
and they actually filter that source sound
103
274769
2110
其實它們是將來源聲音過濾掉
04:36
to produce consonants and vowels.
104
276879
2537
來產生子音和母音
04:39
So the combination of source and filter
105
279416
3860
所以聲音的來源和過濾過程加在一起
04:43
is how we produce speech.
106
283276
2630
就是我們產生聲音的方法
04:45
And that happens in one individual.
107
285906
3026
這是一個人身上發生的過程
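
The source-and-filter idea described above can be sketched in a few lines of Python. This is only a toy illustration, not the speaker's method: the "source" is a bare impulse train at the pitch frequency, the "filter" is a cascade of simple two-pole resonators, and the function names, the 16 kHz sample rate, and the resonance values are all assumptions chosen for the example.

```python
import math

def glottal_source(f0_hz, sample_rate, n_samples):
    """Impulse train: one pulse per pitch period (crude stand-in for voice-box vibration)."""
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, centre_hz, bandwidth_hz, sample_rate):
    """Two-pole resonator: crude stand-in for one resonance of the vocal tract."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius, < 1 so the filter is stable
    theta = 2.0 * math.pi * centre_hz / sample_rate       # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0_hz, resonances, sample_rate=16000, n_samples=1600):
    """Push the source through each resonance in turn (source, then filter)."""
    signal = glottal_source(f0_hz, sample_rate, n_samples)
    for centre_hz, bandwidth_hz in resonances:
        signal = resonator(signal, centre_hz, bandwidth_hz, sample_rate)
    return signal

# Illustrative values only: a 200 Hz pitch and two made-up resonances.
vowel = synthesize_vowel(200.0, [(700.0, 130.0), (1200.0, 70.0)])
```

Changing `f0_hz` alters the source while leaving the resonances alone, and changing `resonances` alters the filter while leaving the pitch alone, which is the separation the talk relies on.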
04:48
Now I told you earlier that I'd spent
108
288932
2626
我之前告訴過大家
04:51
a good part of my career
109
291558
2025
我職業生涯的大部分時間
04:53
understanding and studying
110
293583
2453
都用來研究和學習
04:56
the source characteristics of people
111
296036
1958
有嚴重語音障礙人士的
04:57
with severe speech disorder,
112
297994
2301
聲音源的特徵,
05:00
and what I've found
113
300295
1465
我發現
05:01
is that even though their filters were impaired,
114
301760
3366
雖然他們的過濾器官已遭到損壞,
05:05
they were able to modulate their source:
115
305126
2961
他們可以調製自己的聲音來源:
05:08
the pitch, the loudness, the tempo of their voice.
116
308087
3262
包括高低度, 大小, 以及速度
05:11
These are called prosody, and I've been documenting for years
117
311349
3368
這些被稱之為音律,
05:14
that the prosodic abilities of these individuals
118
314717
2277
我用了多年的時間 來紀錄這些人是如何
05:16
are preserved.
119
316994
1575
維持自己音律的能力
05:18
So when I realized that those same cues
120
318569
4087
當我認識到同樣的線索
05:22
are also important for speaker identity,
121
322656
2769
對說話人的身份同樣重要的時候,
05:25
I had this idea.
122
325425
2015
我有了一個想法
05:27
Why don't we take the source
123
327440
2516
為什麼我們不找一個 聲音是我們所需要的人,
05:29
from the person we want the voice to sound like,
124
329956
2213
從他那採集聲音源
05:32
because it's preserved,
125
332169
1463
因為它已被保留,
05:33
and borrow the filter
126
333632
2135
然後再找一個有著相似年紀和體型的人
05:35
from someone about the same age and size,
127
335767
3229
從他那借用過濾器,
05:39
because they can articulate speech,
128
339011
2407
因為他們能清晰地說話,
05:41
and then mix them?
129
341418
1791
然後將二者混合?
05:43
Because when we mix them,
130
343209
1787
因為當我們將它們混合的時候,
05:44
we can get a voice that's as clear
131
344996
1698
我們得到的聲音將會和
05:46
as our surrogate talker --
132
346694
1754
那個代替說話者一樣清楚
05:48
that's the person we borrowed the filter from --
133
348448
2595
代替說話者就是我們借用過濾器的人
05:51
and is similar in identity to our target talker.
134
351043
4649
而產生的語音和我們 目標說話者有相似的辨認度
05:55
It's that simple.
135
355692
1427
就這麼簡單
05:57
That's the science behind what we're doing.
136
357119
2934
這就我們該項研究的科學性
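
One way to picture the mixing step is to keep the surrogate's recordings for articulation and move their pitch toward the target talker's. The sketch below is a deliberately simplified assumption, not the VocaliD algorithm: it treats the target's source as a single mean pitch and rescales a surrogate pitch contour to match it. The function name and the numbers are hypothetical.

```python
from statistics import mean

def transplant_pitch(surrogate_contour_hz, target_mean_hz):
    """Rescale the surrogate's pitch contour so its mean matches the target's mean pitch.

    The shape of the contour (rises and falls) is kept from the surrogate;
    only the overall pitch level is borrowed from the target talker.
    """
    scale = target_mean_hz / mean(surrogate_contour_hz)
    return [f * scale for f in surrogate_contour_hz]

# Hypothetical values: a surrogate contour centred on 200 Hz, a target mean of 250 Hz.
blended = transplant_pitch([180.0, 200.0, 220.0], 250.0)
print(blended)  # [225.0, 250.0, 275.0]
```

A fuller treatment would carry over loudness and tempo as well, since the talk names all three as source characteristics.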
06:00
So once you have that in mind,
137
360053
3533
有了這個想法以後,
06:03
how do you go about building this voice?
138
363586
2258
應該怎麼來製造這個聲音呢?
06:05
Well, you have to find someone
139
365844
1480
首先,你必須找一個
06:07
who is willing to be a surrogate.
140
367324
2400
願意當這個代替者的人
06:09
It's not such an ominous thing.
141
369724
2264
這個任務也不是太糟糕
06:11
Being a surrogate donor
142
371988
1523
當一個聲音捐贈者
06:13
only requires you to say a few hundred
143
373511
2788
只要求你閱讀幾百
06:16
to a few thousand utterances.
144
376299
2242
到幾千句話.
06:18
The process goes something like this.
145
378541
2003
以下是過程
06:20
(Video) Voice: Things happen in pairs.
146
380544
2190
(錄影)聲音: 事情成雙成對地發生
06:22
I love to sleep.
147
382734
1925
我愛睡覺
06:24
The sky is blue without clouds.
148
384659
3882
天空藍色無雲
06:28
RP: Now she's going to go on like this
149
388541
2002
演講者: 她接下來的3-4個小時
06:30
for about three to four hours,
150
390543
1919
都會繼續閱讀,
06:32
and the idea is not for her to say everything
151
392462
3005
目的是不要讓她說
06:35
that the target is going to want to say,
152
395467
2045
所有目標說話者要說的話
06:37
but the idea is to cover all the different combinations
153
397512
3395
真正的目的是要概擴所有
06:40
of the sounds that occur in the language.
154
400907
3271
在語言中可能發生的組合
06:44
The more speech you have,
155
404178
1638
你說的話越多,
06:45
the better sounding voice you're going to have.
156
405816
2305
你的聲音就會聽起來更好
06:48
Once you have those recordings,
157
408121
1673
當錄音完成後,
06:49
what we need to do
158
409794
1413
我們接下來
06:51
is we have to parse these recordings
159
411207
2718
要對這些錄音做語法分析
06:53
into little snippets of speech,
160
413925
2449
將它們分段,
06:56
one- or two-sound combinations,
161
416374
2337
大概1-2個音的組合,
06:58
sometimes even whole words
162
418711
1883
有時候也會是那些
07:00
that start populating a dataset or a database.
163
420594
4516
填入數據集或是數據庫的完整單字
07:05
We're going to call this database a voice bank.
164
425110
3717
我們將這個數據庫稱之為聲音銀行
07:08
Now the power of the voice bank
165
428827
2096
聲音銀行的力量
07:10
is that from this voice bank,
166
430923
2014
使我們通過它
07:12
we can now say any new utterance,
167
432937
2011
可以說出任何新的語句,
07:14
like, "I love chocolate" --
168
434948
1424
比如說, "我喜歡巧克力"
07:16
everyone needs to be able to say that --
169
436372
1739
所有人都需要說這類的話的能力
07:18
fish through that database
170
438111
1831
搜尋數據庫
07:19
and find all the segments necessary
171
439942
1940
找到必須的部分
07:21
to say that utterance.
172
441882
1929
來完成這個語句
07:23
(Video) Voice: I love chocolate.
173
443811
1789
(錄影)聲音: 我喜歡巧克力
07:25
RP: So that's speech synthesis.
174
445600
1391
演講人: 這是一個人工聲音
07:26
It's called concatenative synthesis, and that's what we're using.
175
446991
2573
我們將其稱之為連環整合 我們使用的就是這個方法
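
The lookup described above, fishing through the voice bank for the segments needed to say a new utterance, can be sketched as a greedy longest-match search. This is a minimal sketch under stated assumptions: the sound labels, the clip names, and the choice of greedy matching are all hypothetical, and a real concatenative synthesizer would also smooth the joins between snippets.

```python
def build_voice_bank(recordings):
    """Map each unit label (a tuple of sounds) to its recorded snippet."""
    bank = {}
    for label, snippet in recordings:
        bank.setdefault(tuple(label), snippet)  # keep the first snippet seen for each unit
    return bank

def select_units(sounds, bank, max_unit_len=3):
    """At each position, prefer the longest stored unit that matches the upcoming sounds."""
    units = []
    i = 0
    while i < len(sounds):
        for length in range(min(max_unit_len, len(sounds) - i), 0, -1):
            label = tuple(sounds[i:i + length])
            if label in bank:
                units.append(bank[label])
                i += length
                break
        else:
            raise KeyError(f"no unit in the voice bank covers sound {sounds[i]!r}")
    return units

# Hypothetical bank: one two-sound unit and two single-sound units.
bank = build_voice_bank([
    (("l", "uh"), "clip_l_uh"),
    (("l",), "clip_l"),
    (("v",), "clip_v"),
])
print(select_units(["l", "uh", "v"], bank))  # ['clip_l_uh', 'clip_v']
```

Preferring longer units means fewer joins, which is one reason more recorded speech tends to give a better-sounding voice.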
07:29
That's not the novel part.
176
449564
1533
這不是新奇的部分
07:31
What's novel is how we make it sound
177
451097
2221
它新奇之處是我們使它
07:33
like this young woman.
178
453318
1457
聽起來就像是這個年輕女士的聲音
07:34
This is Samantha.
179
454775
1524
她是珊曼莎
07:36
I met her when she was nine,
180
456299
2346
在她9歲時, 我第一次見到她
07:38
and since then, my team and I
181
458645
1897
在那之後, 我和我的團隊
07:40
have been trying to build her a personalized voice.
182
460542
2714
一直設法為她製造一款個性化的聲音
07:43
We first had to find a surrogate donor,
183
463256
3099
我們首先需要一個捐贈者,
07:46
and then we had to have Samantha
184
466355
1818
然後我們會讓珊曼莎
07:48
produce some utterances.
185
468173
1929
發一些音
07:50
What she can produce are mostly vowel-like sounds,
186
470102
2379
雖然她所發出的音大部分都類似母音,
07:52
but that's enough for us to extract
187
472481
2479
但我們用這些已足夠
07:54
her source characteristics.
188
474960
2285
來取得她聲音根源的特性
07:57
What happens next is best described
189
477245
3271
接下來所發生的事
08:00
by my daughter's analogy. She's six.
190
480516
2767
用我女兒的比喻來描述再合適不過, 她6歲
08:03
She calls it mixing colors to paint voices.
191
483283
5422
她說這是混合顏色來畫聲音
08:08
It's beautiful. It's exactly that.
192
488705
2555
很漂亮, 就是這樣
08:11
Samantha's voice is like a concentrated sample
193
491260
2860
珊曼莎的聲音就像是紅色食用色素
08:14
of red food dye which we can infuse
194
494120
2609
的濃縮樣品
08:16
into the recordings of her surrogate
195
496729
2540
我們可以將它注入到她代替者的錄音裡
08:19
to get a pink voice just like this.
196
499269
4387
然後取得一個像這樣的粉色聲音
08:23
(Video) Samantha: Aaaaaah.
197
503656
4491
(錄影)珊曼莎:啊.....
08:28
RP: So now, Samantha can say this.
198
508147
2808
現在, 珊曼莎可以說這個
08:30
(Video) Samantha: This voice is only for me.
199
510955
3069
(錄影)珊曼莎: 這個聲音是我的專屬
08:34
I can't wait to use my new voice with my friends.
200
514024
6305
我等不及與我朋友們分享我的聲音
08:40
RP: Thank you. (Applause)
201
520329
6417
謝謝
08:46
I'll never forget the gentle smile
202
526746
2333
我永遠都不會忘記
08:49
that spread across her face
203
529079
1902
當她第一次聽到自己的聲音時
08:50
when she heard that voice for the first time.
204
530981
3649
佈滿在她臉上那輕柔的微笑
08:54
Now there's millions of people
205
534630
1882
目前世界上
08:56
around the world like Samantha, millions,
206
536512
2833
有好幾百萬像珊曼莎的人, 幾百萬,
08:59
and we've only begun to scratch the surface.
207
539345
3440
而我們的工作才剛剛開始
09:02
What we've done so far is we have
208
542785
1642
我們目前只有
09:04
a few surrogate talkers from around the U.S.
209
544427
3859
幾個來自美國的語言代替者
09:08
who have donated their voices,
210
548286
1507
捐贈了他們的聲音,
09:09
and we have been using those
211
549793
1928
我們使用了他們的捐贈
09:11
to build our first few personalized voices.
212
551721
4472
來建造我們第一批個性化的聲音
09:16
But there's so much more work to be done.
213
556193
1756
但還有更多的工作要完成
09:17
For Samantha, her surrogate
214
557949
2188
對珊曼莎而言, 她的代替者
09:20
came from somewhere in the Midwest, a stranger
215
560137
3046
是來自美國中西部, 一個陌生人
09:23
who gave her the gift of voice.
216
563183
3841
送給了她一個聲音禮物
09:27
And as a scientist, I'm so excited
217
567024
2153
作為一個科學家, 我很開心
09:29
to take this work out of the laboratory
218
569177
1935
能將這個研究從實驗室
09:31
and finally into the real world
219
571112
1800
帶到現實的世界
09:32
so it can have real-world impact.
220
572912
3165
讓它產生一個實際的影響
09:36
What I want to share with you next
221
576077
1582
我接下來想跟大家分享
09:37
is how I envision taking this work
222
577659
2175
我如何想像讓這項研究
09:39
to that next level.
223
579834
2711
進入下一個階段
09:42
I imagine a whole world of surrogate donors
224
582545
3887
我想像著一個充滿了聲音捐贈者的世界
09:46
from all walks of life, different sizes, different ages,
225
586432
3260
他們來自各行各業, 有著不同的體型和年齡,
09:49
coming together in this voice drive
226
589692
3058
一起聚集到這個聲音活動
09:52
to give people voices
227
592750
2270
給其他人提供的聲音
09:55
that are as colorful as their personalities.
228
595020
3799
就像他們個性一樣多姿多采
09:58
To do that as a first step,
229
598819
2300
我們的第一個步驟,
10:01
we've put together this website, VocaliD.org,
230
601119
3275
是建立這個網站, VocaliD.org,
10:04
as a way to bring together those
231
604394
1624
通過這個網站將
10:06
who want to join us as voice donors,
232
606018
2675
那些願意捐贈聲音的,
10:08
as expertise donors,
233
608693
1772
願意提供意見的,
10:10
in whatever way to make this vision a reality.
234
610465
5339
還有想提供其它幫助的人聚集到一起
10:15
They say that giving blood can save lives.
235
615804
4153
有人說捐血可以救人
10:19
Well, giving your voice can change lives.
236
619957
4982
那麼捐聲音就可以改變他人的生活
10:24
All we need is a few hours of speech
237
624939
3050
從我們的代替說話者那裡
10:27
from our surrogate talker,
238
627989
1491
我們只需要幾個小時的語音,
10:29
and as little as a vowel from our target talker,
239
629480
4733
然後再從我們的目標說話者那裡取得幾個母音,
10:34
to create a unique vocal identity.
240
634213
3711
就可以建立出一個獨特的聲音身份
10:37
So that's the science behind what we're doing.
241
637924
2626
這就是我們研究背後的科學
10:40
I want to end by circling back to the human side
242
640550
4455
結尾我想再次強調人為因素
10:45
that is really the inspiration for this work.
243
645005
4102
因為它才是這項研究的啟發
10:49
About five years ago, we built our very first voice
244
649107
3699
大約在5年前, 我們為一個名為威廉的小男孩
10:52
for a little boy named William.
245
652806
2501
製造了第一個聲音
10:55
When his mom first heard this voice,
246
655307
2357
當他的媽媽第一次聽到兒子的聲音時,
10:57
she said, "This is what William
247
657664
2345
她說, "如果威廉可以說話,
11:00
would have sounded like
248
660009
1546
那他的聲音
11:01
had he been able to speak."
249
661555
2449
一定和這個一模一樣."
11:04
And then I saw William typing a message
250
664004
2418
我們然後看到威廉在他的設備上
11:06
on his device.
251
666422
1362
打一條訊息
11:07
I wondered, what was he thinking?
252
667784
3293
我猜想他在想什麼?
11:11
Imagine carrying around someone else's voice
253
671077
3590
試想一下借用了他人的聲音
11:14
for nine years
254
674667
2193
9年之後
11:16
and finally finding your own voice.
255
676860
4844
終於有了自己聲音的感覺
11:21
Imagine that.
256
681704
1377
試想一下
11:23
This is what William said:
257
683081
2797
這就是威廉說的話:
11:25
"Never heard me before."
258
685878
4463
"在這之前從來沒聽過我說話"
11:32
Thank you.
259
692417
1619
謝謝大家
11:34
(Applause)
260
694036
4724
掌聲