Rupal Patel: Synthetic voices, as unique as fingerprints

114,553 views ・ 2014-02-13

TED

Translator: Chunda Zeng  Reviewer: Xuwen Zhu

00:12

I'd like to talk today

12719

1490

我今天想給大家介紹

00:14

about a powerful and fundamental aspect

14209

2927

一個對我們身份有重要影響的因素

00:17

of who we are: our voice.

17136

3598

那就是:聲音

00:20

Each one of us has a unique voiceprint

20734

2746

我們每一個人都有獨特的音印

00:23

that reflects our age, our size,

23480

2289

它反映了我們的年紀, 體型,

00:25

even our lifestyle and personality.

25769

3237

甚至我們的性格與生活習慣

00:29

In the words of the poet Longfellow,

29006

2142

以詩人亨利·沃茲沃思·朗費羅的話說:

00:31

"the human voice is the organ of the soul."

31148

3870

"人類的聲音就是靈魂的器官."

00:35

As a speech scientist, I'm fascinated

35018

2747

做為一個語言科學家, 我對聲音產生的過程

00:37

by how the voice is produced,

37765

1829

有著濃厚的興趣,

00:39

and I have an idea for how it can be engineered.

39594

3658

我對如何來設計與建造聲音有一個新的看法

00:43

That's what I'd like to share with you.

43252

2210

我想和大家分享的這個看法

00:45

I'm going to start by playing you a sample

45462

1814

先給大家放一個實例

00:47

of a voice that you may recognize.

47276

1871

你們也許認得這個聲音

00:49

(Recording) Stephen Hawking: "I would have thought

49147

1304

(錄音) 史蒂芬‧霍金:"我以為我說的話

00:50

it was fairly obvious what I meant."

50451

2749

還是比較清楚的"

00:53

Rupal Patel: That was the voice

53200

1280

這個錄音裡的聲音

00:54

of Professor Stephen Hawking.

54480

2086

是來自史蒂芬‧霍金教授

00:56

What you may not know is that same voice

56566

3849

但是你也許不知道同一個聲音

01:00

may also be used by this little girl

60415

2478

也可能被這個小女孩使用

01:02

who is unable to speak

62893

1697

她因為神經的問題

01:04

because of a neurological condition.

64590

2597

而無法說話

01:07

In fact, all of these individuals

67187

2068

事實上, 所有這些人

01:09

may be using the same voice,

69255

2012

都可能用著同一個聲音,

01:11

and that's because there's only a few options available.

71267

3557

因為目前可用的聲音只有幾個

01:14

In the U.S. alone, there are 2.5 million Americans

74824

4317

僅在美國就有250萬人

01:19

who are unable to speak,

79141

1610

無法通過語言溝通,

01:20

and many of whom use computerized devices

80751

2622

他們大多數

01:23

to communicate.

83373

1522

使用電子設備來溝通

01:24

Now that's millions of people worldwide

84895

3479

這意味著全世界有數百萬的人

01:28

who are using generic voices,

88374

1652

都用著同樣的聲音,

01:30

including Professor Hawking,

90026

1446

其中包括了霍金教授,

01:31

who uses an American-accented voice.

91472

4833

他用的是帶有美式口音的聲音

01:36

This lack of individuation of the synthetic voice

96305

3328

這種人工聲音缺少的個體性

01:39

really hit home

99633

1416

讓我非常的驚訝,

01:41

when I was at an assistive technology conference

101049

2472

當我幾年前

01:43

a few years ago,

103521

1850

在一個輔具科技會議上,

01:45

and I recall walking into an exhibit hall

105371

3604

我記得走進一個展覽廳

01:48

and seeing a little girl and a grown man

108975

3044

看見一個小女孩和一個成年男子

01:52

having a conversation using their devices,

112019

2916

通過他們的設備談話,

01:54

different devices, but the same voice.

114935

4284

雖然設備不同, 但聲音卻是一樣的

01:59

And I looked around and I saw this happening

119219

1909

我望了望四周,發現

02:01

all around me, literally hundreds of individuals

121128

4190

周圍有幾百個人

02:05

using a handful of voices,

125318

2738

使用的聲音却只有幾種

02:08

voices that didn't fit their bodies

128056

3091

都不符合他們的身體

02:11

or their personalities.

131147

2082

或是性格.

02:13

We wouldn't dream of fitting a little girl

133229

2727

我們不會考慮給一個小女孩裝上

02:15

with the prosthetic limb of a grown man.

135956

3396

一個成年男子的假肢

02:19

So why then the same prosthetic voice?

139352

3304

那為甚麼要給她一個不屬於自己的聲音呢?

02:22

It really struck me,

142656

1291

我因為感觸很深,

02:23

and I wanted to do something about this.

143947

3151

所以決定對此做些甚麼

02:27

I'm going to play you now a sample

147098

1953

接下來我要播放的例子

02:29

of someone who has, two people actually,

149051

3288

是兩個人,

02:32

who have severe speech disorders.

152339

1768

他們都有嚴重的語言障礙

02:34

I want you to take a listen to how they sound.

154107

3230

我希望大家聽聽看他們的聲音

02:37

They're saying the same utterance.

157337

2357

二人說的是一樣的話

02:39

(First voice)

159694

2432

（聲音一）

02:42

(Second voice)

162126

3617

（聲音二）

02:45

You probably didn't understand what they said,

165743

2412

你們也許沒聽懂他們的話,

02:48

but I hope that you heard

168155

1854

但我希望你們注意到了

02:50

their unique vocal identities.

170009

4283

他們聲音中的獨特性

02:54

So what I wanted to do next is,

174292

2813

我接下來要做的是,

02:57

I wanted to find out how we could harness

177105

2384

找到一個方法來

02:59

these residual vocal abilities

179489

1821

利用這些剩餘的聲音特性

03:01

and build a technology

181310

2016

來發明一套科技

03:03

that could be customized for them,

183326

2143

專為他們設計

03:05

voices that could be customized for them.

185469

2429

將他們的聲音個性化,

03:07

So I reached out to my collaborator, Tim Bunnell.

187898

2685

我找到了我的合作人, 蒂姆·布涅爾

03:10

Dr. Bunnell is an expert in speech synthesis,

190583

3063

布涅爾博士是智能語音方面的專家,

03:13

and what he'd been doing is building

193646

2033

他一直都在為

03:15

personalized voices for people

195679

1881

他人設計個性化的語音

03:17

by putting together

197560

2097

方法是通過收集

03:19

pre-recorded samples of their voice

199657

2150

這些人之前的聲音錄音

03:21

and reconstructing a voice for them.

201807

2879

然後再為他們重建一種聲音

03:24

These are people who had lost their voice

204686

1712

但是布涅爾博士的這些研究對象

03:26

later in life.

206398

1911

遇到的問題是後天性語言障礙

03:28

We didn't have the luxury

208309

1394

我們這次的研究沒有這個福利

03:29

of pre-recorded samples of speech

209703

1774

對這些先天帶有語言障礙的人

03:31

for those born with speech disorder.

211477

2292

我們沒有事先錄製好的聲音樣品

03:33

But I thought, there had to be a way

213769

2537

但是我想了想, 一定有一個方法

03:36

to reverse engineer a voice

216306

1944

可以從僅有的所剩中

03:38

from whatever little is left over.

218250

2291

將聲音逆向製作出來

03:40

So we decided to do exactly that.

220541

2714

所以我們決定就這樣做

03:43

We set out with a little bit of funding from the National Science Foundation,

223255

3403

我們從國家科學基金會獲得了一些資金,

03:46

to create custom-crafted voices that captured

226658

3565

用以建造一套可以抓住他們

03:50

their unique vocal identities.

230223

1536

聲音特性的個體化語音

03:51

We call this project VocaliD, or vocal I.D.,

231759

3203

我們將該專案稱作VocaliD, 或是vocal I.D.,

03:54

for vocal identity.

234962

2033

作為語音身份(Vocal Identity)的簡寫

03:56

Now before I get into the details of how

236995

2674

在我向大家播放

03:59

the voice is made and let you listen to it,

239669

2048

和介紹如何製作這個聲音之前,

04:01

I need to give you a real quick speech science lesson. Okay?

241717

3350

我需要先給大家上一堂語言科學課, 好嗎?

04:05

So first, we know that the voice is changing

245067

3159

首先,我們需要了解聲音

04:08

dramatically over the course of development.

248226

2854

在成長的過程中會發生巨大的變化

04:11

Children sound different from teens

251080

2090

兒童和青少年聽起來會不同

04:13

who sound different from adults.

253170

1463

而青少年和成年人之間也是

04:14

We've all experienced this.

254633

2642

我們都曾經歷過這些語言變化階段

04:17

Fact number two is that speech

257275

3363

事實二，是語言的產生

04:20

is a combination of the source,

260638

2553

是由多個來源組成,

04:23

which is the vibrations generated by your voice box,

263191

3479

其中包括了你喉頭產生的顫動,

04:26

which are then pushed through

266670

1939

這種顫動接著

04:28

the rest of the vocal tract.

268609

2437

會貫穿整個聲腔

04:31

These are the chambers of your head and neck

271046

2484

圖像顯示的是頭和脖子的內部

04:33

that vibrate,

273530

1239

它們會顫動,

04:34

and they actually filter that source sound

274769

2110

其實它們是將來源聲音過濾掉

04:36

to produce consonants and vowels.

276879

2537

來產生子音和母音

04:39

So the combination of source and filter

279416

3860

所以聲音的來源和過濾過程加在一起

04:43

is how we produce speech.

283276

2630

就是我們產生聲音的方法
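
The source-filter idea described here can be sketched in a few lines. This is only an illustration, not the speaker's method: the impulse-train source, the two-pole resonators, and the formant values below are all assumptions chosen to make a rough vowel-like signal.

```python
import math

def glottal_source(f0_hz, duration_s, sample_rate=16000):
    """Impulse train standing in for the voice-box vibrations (the 'source')."""
    n = int(duration_s * sample_rate)
    period = int(round(sample_rate / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, centre_hz, bandwidth_hz, sample_rate=16000):
    """Two-pole resonator standing in for one vocal-tract resonance (the 'filter')."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2 * math.pi * centre_hz / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def synthesize_vowel(f0_hz, formants, duration_s=0.5, sample_rate=16000):
    """Push the source through each resonance in turn, then normalize the peak to 1."""
    signal = glottal_source(f0_hz, duration_s, sample_rate)
    for centre_hz, bandwidth_hz in formants:
        signal = resonator(signal, centre_hz, bandwidth_hz, sample_rate)
    peak = max(abs(s) for s in signal)
    return [s / peak for s in signal]

# Hypothetical values: a 120 Hz source and three illustrative resonances.
vowel_a = synthesize_vowel(120, [(700, 80), (1200, 90), (2600, 120)])
```

Changing `f0_hz` alters the source while leaving the filter alone, and changing `formants` does the reverse, which is the separation the talk relies on.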

04:45

And that happens in one individual.

285906

3026

這是一個人身上發生的過程

04:48

Now I told you earlier that I'd spent

288932

2626

我之前告訴過大家

04:51

a good part of my career

291558

2025

我職業生涯的大部分時間

04:53

understanding and studying

293583

2453

都用來研究和學習

04:56

the source characteristics of people

296036

1958

有嚴重語音障礙人士的

04:57

with severe speech disorder,

297994

2301

聲音源的特徵,

05:00

and what I've found

300295

1465

我發現

05:01

is that even though their filters were impaired,

301760

3366

雖然他們的過濾器官已遭到損壞,

05:05

they were able to modulate their source:

305126

2961

他們可以調製自己的聲音來源:

05:08

the pitch, the loudness, the tempo of their voice.

308087

3262

包括高低度, 大小, 以及速度

05:11

These are called prosody, and I've been documenting for years

311349

3368

這些被稱之為音律,

05:14

that the prosodic abilities of these individuals

314717

2277

我用了多年的時間來紀錄這些人是如何

05:16

are preserved.

316994

1575

維持自己音律的能力
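
Two of the prosodic cues named here, pitch and loudness, can be measured from a short stretch of sound. The sketch below is an assumption-laden illustration rather than the speaker's analysis pipeline: it estimates pitch with a plain autocorrelation search and loudness as root-mean-square amplitude, and it leaves tempo out.

```python
import math

def rms_loudness(samples):
    """Root-mean-square amplitude as a simple loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def estimate_pitch_hz(samples, sample_rate, min_hz=75.0, max_hz=400.0):
    """Pick the lag with the strongest autocorrelation inside a plausible pitch range."""
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)

    def autocorr(lag):
        return sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag

# Synthetic stand-in for a sustained vowel: a 200 Hz tone with amplitude 0.5.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(400)]
pitch = estimate_pitch_hz(tone, sample_rate)
loudness = rms_loudness(tone)
```

The `min_hz` and `max_hz` bounds are hypothetical defaults; a real recording of disordered speech would likely need a more robust pitch tracker.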

05:18

So when I realized that those same cues

318569

4087

當我認識到同樣的線索

05:22

are also important for speaker identity,

322656

2769

對說話人的身份同樣重要的時候,

05:25

I had this idea.

325425

2015

我有了一個想法

05:27

Why don't we take the source

327440

2516

為什麼我們不找一個聲音是我們所需要的人,

05:29

from the person we want the voice to sound like,

329956

2213

從他那採集聲音源

05:32

because it's preserved,

332169

1463

因為它已被保留,

05:33

and borrow the filter

333632

2135

然後再找一個有著相似年紀和體型的人

05:35

from someone about the same age and size,

335767

3229

從他那借用過濾器,

05:39

because they can articulate speech,

339011

2407

因為他們能清晰地說話,

05:41

and then mix them?

341418

1791

然後將二者混合?

05:43

Because when we mix them,

343209

1787

因為當我們將它們混合的時候,

05:44

we can get a voice that's as clear

344996

1698

我們得到的聲音將會和

05:46

as our surrogate talker --

346694

1754

那個代替說話者一樣清楚

05:48

that's the person we borrowed the filter from --

348448

2595

代替說話者就是我們借用過濾器的人

05:51

and is similar in identity to our target talker.

351043

4649

而產生的語音和我們目標說話者有相似的辨認度

05:55

It's that simple.

355692

1427

就這麼簡單

05:57

That's the science behind what we're doing.

357119

2934

這就我們該項研究的科學性
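
As a data-level illustration of the mixing step, a voice can be modeled as a source plus a filter, with the blended voice taking its source from the target talker and its filter from the surrogate. The class names, fields, and numbers below are hypothetical, picked only to make the idea concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    """Prosodic cues the target talker can still control."""
    pitch_hz: float
    loudness_db: float
    tempo_syllables_per_s: float

@dataclass(frozen=True)
class Filter:
    """Vocal-tract resonances that shape consonants and vowels."""
    formants_hz: tuple

@dataclass(frozen=True)
class Voice:
    source: Source
    filter: Filter

def blend(target: Voice, surrogate: Voice) -> Voice:
    """Keep the target's source (identity) and borrow the surrogate's filter (clarity)."""
    return Voice(source=target.source, filter=surrogate.filter)

# Hypothetical talkers: the numbers are illustrative, not measurements.
target = Voice(Source(260.0, 62.0, 2.5), Filter((900.0, 1500.0, 2900.0)))
surrogate = Voice(Source(240.0, 65.0, 4.0), Filter((850.0, 1400.0, 3000.0)))
blended = blend(target, surrogate)
```

In practice the blend happens on audio signals rather than on a handful of numbers, so this only shows which talker contributes which part.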

06:00

So once you have that in mind,

360053

3533

有了這個想法以後,

06:03

how do you go about building this voice?

363586

2258

應該怎麼來製造這個聲音呢?

06:05

Well, you have to find someone

365844

1480

首先,你必須找一個

06:07

who is willing to be a surrogate.

367324

2400

願意當這個代替者的人

06:09

It's not such an ominous thing.

369724

2264

這個任務也不是太糟糕

06:11

Being a surrogate donor

371988

1523

當一個聲音捐贈者

06:13

only requires you to say a few hundred

373511

2788

只要求你閱讀幾百

06:16

to a few thousand utterances.

376299

2242

到幾千句話.

06:18

The process goes something like this.

378541

2003

以下是過程

06:20

(Video) Voice: Things happen in pairs.

380544

2190

(錄影)聲音: 事情成雙成對地發生

06:22

I love to sleep.

382734

1925

我愛睡覺

06:24

The sky is blue without clouds.

384659

3882

天空藍色無雲

06:28

RP: Now she's going to go on like this

388541

2002

演講者: 她接下來的3-4個小時

06:30

for about three to four hours,

390543

1919

都會繼續閱讀,

06:32

and the idea is not for her to say everything

392462

3005

目的是不要讓她說

06:35

that the target is going to want to say,

395467

2045

所有目標說話者要說的話

06:37

but the idea is to cover all the different combinations

397512

3395

真正的目的是要概擴所有

06:40

of the sounds that occur in the language.

400907

3271

在語言中可能發生的組合

06:44

The more speech you have,

404178

1638

你說的話越多,

06:45

the better sounding voice you're going to have.

405816

2305

你的聲音就會聽起來更好

06:48

Once you have those recordings,

408121

1673

當錄音完成後,

06:49

what we need to do

409794

1413

我們接下來

06:51

is we have to parse these recordings

411207

2718

要對這些錄音做語法分析

06:53

into little snippets of speech,

413925

2449

將它們分段,

06:56

one- or two-sound combinations,

416374

2337

大概1-2個音的組合,

06:58

sometimes even whole words

418711

1883

有時候也會是那些

07:00

that start populating a dataset or a database.

420594

4516

填入數據集或是數據庫的完整單字

07:05

We're going to call this database a voice bank.

425110

3717

我們將這個數據庫稱之為聲音銀行
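
Parsing recordings into one- and two-sound snippets might look roughly like this. The sketch assumes each recording has already been split into labeled sounds; the label set, the sample values, and the function name `build_voice_bank` are all hypothetical.

```python
from collections import defaultdict

def build_voice_bank(utterances):
    """Index every single sound and every adjacent pair of sounds by its labels.

    utterances: list of utterances, each a list of (sound_label, samples) pairs.
    """
    bank = defaultdict(list)
    for segments in utterances:
        for i, (label, samples) in enumerate(segments):
            # One-sound snippet.
            bank[(label,)].append(list(samples))
            # Two-sound snippet, when a following sound exists.
            if i + 1 < len(segments):
                next_label, next_samples = segments[i + 1]
                bank[(label, next_label)].append(list(samples) + list(next_samples))
    return dict(bank)

# Toy recording of "I love", with made-up sample values per sound.
recordings = [
    [("AY", [0.1, 0.2]), ("L", [0.3]), ("AH", [0.4, 0.5]), ("V", [0.6])],
]
bank = build_voice_bank(recordings)
```

Keeping two-sound snippets alongside single sounds preserves the transitions between sounds, which is one reason more recorded speech tends to give a better-sounding voice.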

07:08

Now the power of the voice bank

428827

2096

聲音銀行的力量

07:10

is that from this voice bank,

430923

2014

使我們通過它

07:12

we can now say any new utterance,

432937

2011

可以說出任何新的語句,

07:14

like, "I love chocolate" --

434948

1424

比如說, "我喜歡巧克力"

07:16

everyone needs to be able to say that --

436372

1739

所有人都需要說這類的話的能力

07:18

fish through that database

438111

1831

搜尋數據庫

07:19

and find all the segments necessary

439942

1940

找到必須的部分

07:21

to say that utterance.

441882

1929

來完成這個語句

07:23

(Video) Voice: I love chocolate.

443811

1789

(錄影)聲音: 我喜歡巧克力

07:25

RP: So that's speech synthesis.

445600

1391

演講人: 這是一個人工聲音

07:26

It's called concatenative synthesis, and that's what we're using.

446991

2573

我們將其稱之為連環整合我們使用的就是這個方法
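
A minimal sketch of concatenative synthesis, assuming a voice bank keyed by one- or two-sound label tuples: walk through the sounds of the new utterance, prefer a stored two-sound snippet, fall back to a single sound, and join the pieces. The toy bank, the labels, and the greedy matching rule are illustrative assumptions, not the project's actual selection algorithm.

```python
def synthesize(target_sounds, bank):
    """Greedily cover the target sound sequence with the longest stored snippets."""
    output, i = [], 0
    while i < len(target_sounds):
        pair = tuple(target_sounds[i:i + 2])
        single = (target_sounds[i],)
        if len(pair) == 2 and pair in bank:
            output.extend(bank[pair][0])   # use the first stored two-sound snippet
            i += 2
        elif single in bank:
            output.extend(bank[single][0])  # fall back to a one-sound snippet
            i += 1
        else:
            raise KeyError(f"no snippet in the voice bank for {target_sounds[i]!r}")
    return output

# Toy bank with made-up sample values; real snippets would be audio waveforms.
toy_bank = {
    ("AY",): [[1]],
    ("L", "AH"): [[2, 3]],
    ("V",): [[4]],
}
new_utterance = synthesize(["AY", "L", "AH", "V"], toy_bank)
```

Production systems typically also smooth the joins and choose among several candidate snippets; this sketch simply takes the first match.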

07:29

That's not the novel part.

449564

1533

這不是新奇的部分

07:31

What's novel is how we make it sound

451097

2221

它新奇之處是我們使它

07:33

like this young woman.

453318

1457

聽起來就像是這個年輕女士的聲音

07:34

This is Samantha.

454775

1524

她是珊曼莎

07:36

I met her when she was nine,

456299

2346

在她9歲時, 我第一次見到她

07:38

and since then, my team and I

458645

1897

在那之後, 我和我的團隊

07:40

have been trying to build her a personalized voice.

460542

2714

一直設法為她製造一款個性化的聲音

07:43

We first had to find a surrogate donor,

463256

3099

我們首先需要一個捐贈者,

07:46

and then we had to have Samantha

466355

1818

然後我們會讓珊曼莎

07:48

produce some utterances.

468173

1929

發一些音

07:50

What she can produce are mostly vowel-like sounds,

470102

2379

雖然她所發出的音大部分都類似母音,

07:52

but that's enough for us to extract

472481

2479

但我們用這些已足夠

07:54

her source characteristics.

474960

2285

來取得她聲音根源的特性

07:57

What happens next is best described

477245

3271

接下來所發生的事

08:00

by my daughter's analogy. She's six.

480516

2767

用我女兒的比喻來描述再合適不過, 她6歲

08:03

She calls it mixing colors to paint voices.

483283

5422

她說這是混合顏色來畫聲音

08:08

It's beautiful. It's exactly that.

488705

2555

很漂亮, 就是這樣

08:11

Samantha's voice is like a concentrated sample

491260

2860

珊曼莎的聲音就像是紅色食用色素

08:14

of red food dye which we can infuse

494120

2609

的濃縮樣品

08:16

into the recordings of her surrogate

496729

2540

我們可以將它注入到她代替者的錄音裡

08:19

to get a pink voice just like this.

499269

4387

然後取得一個像這樣的粉色聲音

08:23

(Video) Samantha: Aaaaaah.

503656

4491

(錄影)珊曼莎:啊.....

08:28

RP: So now, Samantha can say this.

508147

2808

現在, 珊曼莎可以說這個

08:30

(Video) Samantha: This voice is only for me.

510955

3069

(錄影)珊曼莎: 這個聲音是我的專屬

08:34

I can't wait to use my new voice with my friends.

514024

6305

我等不及與我朋友們分享我的聲音

08:40

RP: Thank you. (Applause)

520329

6417

謝謝

08:46

I'll never forget the gentle smile

526746

2333

我永遠都不會忘記

08:49

that spread across her face

529079

1902

當她第一次聽到自己的聲音時

08:50

when she heard that voice for the first time.

530981

3649

佈滿在她臉上那輕柔的微笑

08:54

Now there's millions of people

534630

1882

目前世界上

08:56

around the world like Samantha, millions,

536512

2833

有好幾百萬像珊曼莎的人, 幾百萬,

08:59

and we've only begun to scratch the surface.

539345

3440

而我們的工作才剛剛開始

09:02

What we've done so far is we have

542785

1642

我們目前只有

09:04

a few surrogate talkers from around the U.S.

544427

3859

幾個來自美國的語言代替者

09:08

who have donated their voices,

548286

1507

捐贈了他們的聲音,

09:09

and we have been using those

549793

1928

我們使用了他們的捐贈

09:11

to build our first few personalized voices.

551721

4472

來建造我們第一批個性化的聲音

09:16

But there's so much more work to be done.

556193

1756

但還有更多的工作要完成

09:17

For Samantha, her surrogate

557949

2188

對珊曼莎而言, 她的代替者

09:20

came from somewhere in the Midwest, a stranger

560137

3046

是來自美國中西部, 一個陌生人

09:23

who gave her the gift of voice.

563183

3841

送給了她一個聲音禮物

09:27

And as a scientist, I'm so excited

567024

2153

作為一個科學家, 我很開心

09:29

to take this work out of the laboratory

569177

1935

能將這個研究從實驗室

09:31

and finally into the real world

571112

1800

帶到現實的世界

09:32

so it can have real-world impact.

572912

3165

讓它產生一個實際的影響

09:36

What I want to share with you next

576077

1582

我接下來想跟大家分享

09:37

is how I envision taking this work

577659

2175

我如何想像讓這項研究

09:39

to that next level.

579834

2711

進入下一個階段

09:42

I imagine a whole world of surrogate donors

582545

3887

我想像著一個充滿了聲音捐贈者的世界

09:46

from all walks of life, different sizes, different ages,

586432

3260

他們來自各行各業, 有著不同的體型和年齡,

09:49

coming together in this voice drive

589692

3058

一起聚集到這個聲音活動

09:52

to give people voices

592750

2270

給其他人提供的聲音

09:55

that are as colorful as their personalities.

595020

3799

就像他們個性一樣多姿多采

09:58

To do that as a first step,

598819

2300

我們的第一個步驟,

10:01

we've put together this website, VocaliD.org,

601119

3275

是建立這個網站, VocaliD.org,

10:04

as a way to bring together those

604394

1624

通過這個網站將

10:06

who want to join us as voice donors,

606018

2675

那些願意捐贈聲音的,

10:08

as expertise donors,

608693

1772

願意提供意見的,

10:10

in whatever way to make this vision a reality.

610465

5339

還有想提供其它幫助的人聚集到一起

10:15

They say that giving blood can save lives.

615804

4153

有人說捐血可以救人

10:19

Well, giving your voice can change lives.

619957

4982

那麼捐聲音就可以改變他人的生活

10:24

All we need is a few hours of speech

624939

3050

從我們的代替說話者那裡

10:27

from our surrogate talker,

627989

1491

我們只需要幾個小時的語音,

10:29

and as little as a vowel from our target talker,

629480

4733

然後再從我們的目標說話者那裡取得幾個母音,

10:34

to create a unique vocal identity.

634213

3711

就可以建立出一個獨特的聲音身份

10:37

So that's the science behind what we're doing.

637924

2626

這就是我們研究背後的科學

10:40

I want to end by circling back to the human side

640550

4455

結尾我想再次強調人為因素

10:45

that is really the inspiration for this work.

645005

4102

因為它才是這項研究的啟發

10:49

About five years ago, we built our very first voice

649107

3699

大約在5年前, 我們為一個名為威廉的小男孩

10:52

for a little boy named William.

652806

2501

製造了第一個聲音

10:55

When his mom first heard this voice,

655307

2357

當他的媽媽第一次聽到兒子的聲音時,

10:57

she said, "This is what William

657664

2345

她說, "如果威廉可以說話,

11:00

would have sounded like

660009

1546

那他的聲音

11:01

had he been able to speak."

661555

2449

一定和這個一模一樣."

11:04

And then I saw William typing a message

664004

2418

我們然後看到威廉在他的設備上

11:06

on his device.

666422

1362

打一條訊息

11:07

I wondered, what was he thinking?

667784

3293

我猜想他在想什麼?

11:11

Imagine carrying around someone else's voice

671077

3590

試想一下借用了他人的聲音

11:14

for nine years

674667

2193

9年之後

11:16

and finally finding your own voice.

676860

4844

終於有了自己聲音的感覺

11:21

Imagine that.

681704

1377

試想一下

11:23

This is what William said:

683081

2797

這就是威廉說的話:

11:25

"Never heard me before."

685878

4463

"在這之前從來沒聽過我說話"

11:32

Thank you.

692417

1619

謝謝大家

11:34

(Applause)

694036

4724

掌聲
