Rupal Patel: Synthetic voices, as unique as fingerprints

114,553 views ・ 2014-02-13

TED

Translator: Peipei Xiang. Reviewer: Ying Wang

00:12

I'd like to talk today

12719

1490

我今天要和大家讲述的是

00:14

about a powerful and fundamental aspect

14209

2927

关于我们自身的一个非常强大

00:17

of who we are: our voice.

17136

3598

非常重要的方面：我们的声音，

00:20

Each one of us has a unique voiceprint

20734

2746

每一个人的声音都带有独特的标记，

00:23

that reflects our age, our size,

23480

2289

这个声音的标记能反映出我们的年龄，我们的胖瘦高矮，

00:25

even our lifestyle and personality.

25769

3237

甚至是我们的生活方式和性格。

00:29

In the words of the poet Longfellow,

29006

2142

用诗人朗费罗的话来说，

00:31

"the human voice is the organ of the soul."

31148

3870

“人类的声音是灵魂的重要器官。”

00:35

As a speech scientist, I'm fascinated

35018

2747

身为一个语音科学家，我非常热衷于研究

00:37

by how the voice is produced,

37765

1829

声音的产生，

00:39

and I have an idea for how it can be engineered.

39594

3658

而且我有一个如何制造声音的想法。

00:43

That's what I'd like to share with you.

43252

2210

这就是我今天想和大家分享的东西。

00:45

I'm going to start by playing you a sample

45462

1814

首先，我想为大家播放一个声音样本，

00:47

of a voice that you may recognize.

47276

1871

这个声音你们可能听过。

00:49

(Recording) Stephen Hawking: "I would have thought

49147

1304

（录音）史蒂芬·霍金：“我本来以为，

00:50

it was fairly obvious what I meant."

50451

2749

我想说的意思很显而易见。”

00:53

Rupal Patel: That was the voice

53200

1280

卢帕尔·帕特尔：那是

00:54

of Professor Stephen Hawking.

54480

2086

史蒂芬·霍金教授的声音。

00:56

What you may not know is that same voice

56566

3849

你们可能不知道的是，同样的声音

01:00

may also be used by this little girl

60415

2478

也被用于这个小女孩身上，

01:02

who is unable to speak

62893

1697

她因为大脑神经系统缺陷

01:04

because of a neurological condition.

64590

2597

而不能讲话。

01:07

In fact, all of these individuals

67187

2068

事实上，很多不能说话的人

01:09

may be using the same voice,

69255

2012

都可能在使用同样的声音

01:11

and that's because there's only a few options available.

71267

3557

那是因为可以使用的声音样本太少了。

01:14

In the U.S. alone, there are 2.5 million Americans

74824

4317

单单在美国，就有250万人

01:19

who are unable to speak,

79141

1610

不能说话，

01:20

and many of whom use computerized devices

80751

2622

而且在这些人中很多都是使用电脑化的设备

01:23

to communicate.

83373

1522

进行交流。

01:24

Now that's millions of people worldwide

84895

3479

也就是全世界数百万的人

01:28

who are using generic voices,

88374

1652

都在使用一些毫无个性的声音，

01:30

including Professor Hawking,

90026

1446

其中就包括史蒂芬·霍金教授，

01:31

who uses an American-accented voice.

91472

4833

他使用的声音是带有美国口音的。

01:36

This lack of individuation of the synthetic voice

96305

3328

我真正开始意识到

01:39

really hit home

99633

1416

合成声音缺乏个性

01:41

when I was at an assistive technology conference

101049

2472

是我在几年前参加一个

01:43

a few years ago,

103521

1850

辅助技术会议的时候，

01:45

and I recall walking into an exhibit hall

105371

3604

我记得走进一个展厅，

01:48

and seeing a little girl and a grown man

108975

3044

看到一个小女孩和一个成年男子

01:52

having a conversation using their devices,

112019

2916

正在用他们的设备进行对话，

01:54

different devices, but the same voice.

114935

4284

不同的设备，却是同样的声音。

01:59

And I looked around and I saw this happening

119219

1909

我看向四周，发现身边这种情况很多，

02:01

all around me, literally hundreds of individuals

121128

4190

几乎是上百个人

02:05

using a handful of voices,

125318

2738

却只用着为数不多的几种声音，

02:08

voices that didn't fit their bodies

128056

3091

这些声音跟他们的身体特征

02:11

or their personalities.

131147

2082

和性格都很不匹配。

02:13

We wouldn't dream of fitting a little girl

133229

2727

我们肯定做梦也不会想到把一个成年男子的假肢

02:15

with the prosthetic limb of a grown man.

135956

3396

装在一个小女孩身上。

02:19

So why then the same prosthetic voice?

139352

3304

那为什么他们要用同样的合成声音呢？

02:22

It really struck me,

142656

1291

这深深的触动了我，

02:23

and I wanted to do something about this.

143947

3151

我想做些什么。

02:27

I'm going to play you now a sample

147098

1953

现在我想为大家播放一个人的录音——

02:29

of someone who has, two people actually,

149051

3288

不对，其实是两个人，

02:32

who have severe speech disorders.

152339

1768

他们都有很严重的言语障碍。

02:34

I want you to take a listen to how they sound.

154107

3230

我想让大家听听他们的声音。

02:37

They're saying the same utterance.

157337

2357

他们在发出同样一个音。

02:39

(First voice)

159694

2432

（第一个声音）

02:42

(Second voice)

162126

3617

（第二个声音）

02:45

You probably didn't understand what they said,

165743

2412

大家可能并不明白他们说了什么，

02:48

but I hope that you heard

168155

1854

但我希望大家听到了

02:50

their unique vocal identities.

170009

4283

他们独特的声音标志。

02:54

So what I wanted to do next is,

174292

2813

所以接下来我想要做的事情就是，

02:57

I wanted to find out how we could harness

177105

2384

我想要找出如何可以利用

02:59

these residual vocal abilities

179489

1821

他们残留的发声能力，

03:01

and build a technology

181310

2016

并发明一项技术，

03:03

that could be customized for them,

183326

2143

这项技术能为他们创造出个性化的声音，

03:05

voices that could be customized for them.

185469

2429

就是专门为他们定制的声音。

03:07

So I reached out to my collaborator, Tim Bunnell.

187898

2685

所以我联系了我的合作伙伴，蒂姆·邦内尔。

03:10

Dr. Bunnell is an expert in speech synthesis,

190583

3063

邦内尔博士是一位语言合成方面的专家，

03:13

and what he'd been doing is building

193646

2033

他一直在为需要帮助的人合成

03:15

personalized voices for people

195679

1881

个性化的声音，

03:17

by putting together

197560

2097

他把这些人

03:19

pre-recorded samples of their voice

199657

2150

预先录制好的声音样本组合在一起，

03:21

and reconstructing a voice for them.

201807

2879

并重新建立他们的声音。

03:24

These are people who had lost their voice

204686

1712

这些人都是在人生后来的某个阶段

03:26

later in life.

206398

1911

才失去了语言能力。

03:28

We didn't have the luxury

208309

1394

可是我们没有

03:29

of pre-recorded samples of speech

209703

1774

那些生来就有言语障碍的人的

03:31

for those born with speech disorder.

211477

2292

预先录制好的声音样本。

03:33

But I thought, there had to be a way

213769

2537

但我想，肯定有一个办法

03:36

to reverse engineer a voice

216306

1944

可以利用仅存的不管剩下多少的语言能力

03:38

from whatever little is left over.

218250

2291

来逆向重组声音。

03:40

So we decided to do exactly that.

220541

2714

于是我们决定去做这样的工作。

03:43

We set out with a little bit of funding from the National Science Foundation,

223255

3403

我们从国家科学基金会的一小笔资金开始，

03:46

to create custom-crafted voices that captured

226658

3565

努力打造反映了他们的独特声印的

03:50

their unique vocal identities.

230223

1536

定制的声音。

03:51

We call this project VocaliD, or vocal I.D.,

231759

3203

我们称之为VocaliD计划，即声音ID，

03:54

for vocal identity.

234962

2033

用于区别不同的声音。

03:56

Now before I get into the details of how

236995

2674

那么，在我开始讲述

03:59

the voice is made and let you listen to it,

239669

2048

声音是如何制作的，以及让大家听这些声音之前，

04:01

I need to give you a real quick speech science lesson. Okay?

241717

3350

我需要先给大家上一堂关于语音学的快速入门课，可以么？

04:05

So first, we know that the voice is changing

245067

3159

首先，我们知道声音

04:08

dramatically over the course of development.

248226

2854

在其发展过程中会发生巨大的改变。

04:11

Children sound different from teens

251080

2090

儿童的声音与青少年的声音不同，

04:13

who sound different from adults.

253170

1463

而青少年的声音则与成人的声音不同。

04:14

We've all experienced this.

254633

2642

我们都经历过这样的改变。

04:17

Fact number two is that speech

257275

3363

第二，语音是

04:20

is a combination of the source,

260638

2553

声源的组合，

04:23

which is the vibrations generated by your voice box,

263191

3479

也就是你的喉部产生的震动

04:26

which are then pushed through

266670

1939

通过声道

04:28

the rest of the vocal tract.

100

268609

2437

传出来。

04:31

These are the chambers of your head and neck

101

271046

2484

这些是你的头部和颈部

04:33

that vibrate,

102

273530

1239

会震动的腔室，

04:34

and they actually filter that source sound

103

274769

2110

他们会过滤声源

04:36

to produce consonants and vowels.

104

276879

2537

并产生辅音和元音。

04:39

So the combination of source and filter

105

279416

3860

所以声源和过滤器的组合

04:43

is how we produce speech.

106

283276

2630

使得我们能够制造语言。
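
The source-and-filter idea described above can be sketched in a few lines. This is only an illustrative toy, not the speaker's method: it assumes the source is a bare impulse train at the speaker's pitch and that each vocal-tract chamber can be approximated by one two-pole resonance. The function names, the 16 kHz sample rate, and the formant values in the usage note are all assumptions made for the example.

```python
import math


def impulse_train(f0_hz, sample_rate, n_samples):
    """Source: one unit pulse per pitch period (a crude stand-in for voice-box vibration)."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, freq_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonance approximating a single vocal-tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, < 1 so the filter is stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0_hz, formants, sample_rate=16000, n_samples=1600):
    """Speech = source pushed through the filter: pulses shaped by each formant in turn."""
    signal = impulse_train(f0_hz, sample_rate, n_samples)
    for freq_hz, bandwidth_hz in formants:
        signal = resonator(signal, freq_hz, bandwidth_hz, sample_rate)
    return signal
```

For instance, `synthesize_vowel(200, [(700, 80), (1200, 90)])` returns 1,600 samples of a vowel-like buzz. Changing the first argument changes the source, and changing the formant list changes the filter, which is the separation the talk relies on.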

04:45

And that happens in one individual.

107

285906

3026

而这发生在一个个体身上。

04:48

Now I told you earlier that I'd spent

108

288932

2626

早先我告诉过你们

04:51

a good part of my career

109

291558

2025

我花了我职业生涯中的很大一部分时间

04:53

understanding and studying

110

293583

2453

来了解和学习

04:56

the source characteristics of people

111

296036

1958

那些有着严重言语障碍的人的

04:57

with severe speech disorder,

112

297994

2301

声源的特征，

05:00

and what I've found

113

300295

1465

我发现

05:01

is that even though their filters were impaired,

114

301760

3366

虽然他们的过滤器受损，

05:05

they were able to modulate their source:

115

305126

2961

他们仍然能够控制他们的声源，

05:08

the pitch, the loudness, the tempo of their voice.

116

308087

3262

包括音高、响度和声音的节奏。

05:11

These are called prosody, and I've been documenting for years

117

311349

3368

我们称这些为韵律，而我多年的记录表明

05:14

that the prosodic abilities of these individuals

118

314717

2277

这些人的韵律能力

05:16

are preserved.

119

316994

1575

被保留了下来。
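
Two of the source cues named above, pitch and loudness, can be measured from a short recording. The sketch below is a hedged illustration rather than the analysis used in the research: it assumes a clean, steady, vowel-like signal, estimates pitch with a plain autocorrelation search, and uses root-mean-square amplitude as a stand-in for loudness. The function names and the 75 to 500 Hz search range are assumptions.

```python
import math


def rms_loudness(samples):
    """Root-mean-square amplitude, a simple proxy for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples, sample_rate, min_hz=75.0, max_hz=500.0):
    """Pick the lag at which the signal best matches a shifted copy of itself."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Usage on a synthetic 200 Hz tone (0.1 s at 8 kHz) standing in for a sustained vowel.
SAMPLE_RATE = 8000
tone = [0.5 * math.sin(2.0 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(800)]
pitch = estimate_pitch_hz(tone, SAMPLE_RATE)
loudness = rms_loudness(tone)
```

Real disordered speech is far noisier than a pure tone, so a practical system would need a more robust pitch tracker. The point here is only that these cues are measurable from vowel-like sounds alone.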

05:18

So when I realized that those same cues

120

318569

4087

所以当我意识到这些同样的线索

05:22

are also important for speaker identity,

121

322656

2769

对讲者身份也是非常重要的时候，

05:25

I had this idea.

122

325425

2015

我有了这样一个想法。

05:27

Why don't we take the source

123

327440

2516

为什么不利用那些

05:29

from the person we want the voice to sound like,

124

329956

2213

我们希望听到的声音的声源，

05:32

because it's preserved,

125

332169

1463

因为这个声源是好的，

05:33

and borrow the filter

126

333632

2135

再借助一个

05:35

from someone about the same age and size,

127

335767

3229

差不多年龄和体型的人的过滤器，

05:39

because they can articulate speech,

128

339011

2407

因为他们可以清晰地发声，

05:41

and then mix them?

129

341418

1791

然后把他们组合在一起？

05:43

Because when we mix them,

130

343209

1787

因为当我们把它们组合在一起的时候，

05:44

we can get a voice that's as clear

131

344996

1698

我们就可以获得一个

05:46

as our surrogate talker --

132

346694

1754

像代理说话者一样清晰的声音，

05:48

that's the person we borrowed the filter from --

133

348448

2595

代理说话者就是我们向其借了过滤器的那个人，

05:51

and is similar in identity to our target talker.

134

351043

4649

而这个声音又跟我们的目标说话者的身份一致。
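
The mixing step described above can be pictured as a simple data operation: keep the target talker's source characteristics and take the filter from the surrogate. The sketch below is a conceptual illustration only. The `SourceProfile` and `Talker` types, their fields, and every numeric value are hypothetical, and real voice blending operates on audio signals rather than on a handful of summary numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Prosodic cues the talk says are preserved: pitch, loudness, tempo."""
    pitch_hz: float
    loudness: float
    tempo_syllables_per_sec: float


@dataclass(frozen=True)
class Talker:
    name: str
    source: SourceProfile
    formants_hz: tuple  # stand-in for the vocal-tract filter


def blend_voice(target, surrogate):
    """Identity comes from the target's source; clarity comes from the surrogate's filter."""
    return Talker(
        name=f"{target.name} (personalized)",
        source=target.source,
        formants_hz=surrogate.formants_hz,
    )


# Hypothetical values, chosen only to make the example concrete.
target = Talker("target", SourceProfile(260.0, 0.4, 2.5), formants_hz=())
surrogate = Talker("surrogate", SourceProfile(240.0, 0.6, 4.0), formants_hz=(850.0, 1600.0, 2900.0))
blended = blend_voice(target, surrogate)
```

The surrogate's own pitch, loudness, and tempo are discarded in this sketch, which mirrors the idea that the result should sound like the target while being as clear as the surrogate.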

05:55

It's that simple.

135

355692

1427

就这么简单。

05:57

That's the science behind what we're doing.

136

357119

2934

这就是我们在做的研究背后的科学。

06:00

So once you have that in mind,

137

360053

3533

有了这样的想法以后，

06:03

how do you go about building this voice?

138

363586

2258

我们又该如何真正去打造这样的声音呢？

06:05

Well, you have to find someone

139

365844

1480

嗯，你必须找到

06:07

who is willing to be a surrogate.

140

367324

2400

愿意做代理说话者的人。

06:09

It's not such an ominous thing.

141

369724

2264

这并不是什么有着不祥之兆的事情。

06:11

Being a surrogate donor

142

371988

1523

作为一个代理说话者，

06:13

only requires you to say a few hundred

143

373511

2788

你只需要说上几百个

06:16

to a few thousand utterances.

144

376299

2242

到几千个话语。

06:18

The process goes something like this.

145

378541

2003

过程大致是这样的。

06:20

(Video) Voice: Things happen in pairs.

146

380544

2190

（视频）声音：事情成对发生。

06:22

I love to sleep.

147

382734

1925

我爱睡觉。

06:24

The sky is blue without clouds.

148

384659

3882

天空很蓝，无云。

06:28

RP: Now she's going to go on like this

149

388541

2002

卢帕尔·帕特尔：她就这样继续说上

06:30

for about three to four hours,

150

390543

1919

大约三到四个小时，

06:32

and the idea is not for her to say everything

151

392462

3005

当然她并不需要说出

06:35

that the target is going to want to say,

152

395467

2045

目标说话者会说的所有东西，

06:37

but the idea is to cover all the different combinations

153

397512

3395

而只需覆盖到一门语言中的

06:40

of the sounds that occur in the language.

154

400907

3271

所有发音的不同组合。
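
One way to think about "covering the combinations" is to count which two-sound sequences a recording script contains. The sketch below assumes each prompt has already been converted to a list of sound labels. The helper names and the tiny inventory are hypothetical, and it treats every ordered pair as required, although a real language permits only some of them.

```python
def sound_pairs(sounds):
    """All adjacent two-sound combinations in one prompt."""
    return set(zip(sounds, sounds[1:]))


def pair_coverage(prompts, inventory):
    """Return (fraction of pairs covered, set of pairs still missing)."""
    needed = {(a, b) for a in inventory for b in inventory}
    covered = set()
    for sounds in prompts:
        covered |= sound_pairs(sounds) & needed
    return len(covered) / len(needed), needed - covered


# Toy inventory of two sounds and a single three-sound prompt.
fraction, missing = pair_coverage([["a", "b", "a"]], ["a", "b"])
```

In this toy case half of the four possible pairs are covered, and the repeated pairs `("a", "a")` and `("b", "b")` are still missing, so a script designer would add prompts containing them.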

06:44

The more speech you have,

155

404178

1638

越多的语音样本

06:45

the better sounding voice you're going to have.

156

405816

2305

就意味着越好的声音质量。

06:48

Once you have those recordings,

157

408121

1673

一旦有了这些录音之后，

06:49

what we need to do

158

409794

1413

我们需要做的就是

06:51

is we have to parse these recordings

159

411207

2718

将这些录音

06:53

into little snippets of speech,

160

413925

2449

解析成语音的小片段，

06:56

one- or two-sound combinations,

161

416374

2337

一两个发声的组合，

06:58

sometimes even whole words

162

418711

1883

有的时候甚至整个的词语

07:00

that start populating a dataset or a database.

163

420594

4516

也会出现在数据库里边。

07:05

We're going to call this database a voice bank.

164

425110

3717

我们就将这个数据库称为声音银行。

07:08

Now the power of the voice bank

165

428827

2096

这个声音银行的作用在于：

07:10

is that from this voice bank,

166

430923

2014

基于这个声音银行，

07:12

we can now say any new utterance,

167

432937

2011

我们现在可以说出任何新的话语，

07:14

like, "I love chocolate" --

168

434948

1424

比如：“我爱巧克力”——

07:16

everyone needs to be able to say that --

169

436372

1739

每个人都应该有可以说出这句话的能力——

07:18

fish through that database

170

438111

1831

从这个数据库中寻找

07:19

and find all the segments necessary

171

439942

1940

并找到说这句话需要的

07:21

to say that utterance.

172

441882

1929

所有必要的片段。

07:23

(Video) Voice: I love chocolate.

173

443811

1789

（视频）声音：我爱巧克力。

07:25

RP: So that's speech synthesis.

174

445600

1391

卢帕尔·帕特尔：这就是语音合成。

07:26

It's called concatenative synthesis, and that's what we're using.

175

446991

2573

这个被称之为衔接合成，而我们用的就是它。
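
The voice-bank lookup described above can be sketched as a dictionary search followed by concatenation. This is a minimal illustration under stated assumptions, not the project's implementation: the bank maps tuples of sound labels to short lists of audio samples, the lookup greedily prefers the longest stored unit, and the labels and sample values are made up for the example.

```python
def synthesize(target_sounds, voice_bank, max_unit_len=2):
    """Fish through the bank for the segments needed and join them end to end."""
    audio = []
    i = 0
    while i < len(target_sounds):
        # Prefer the longest stored unit (e.g. a two-sound combination) that matches here.
        for length in range(min(max_unit_len, len(target_sounds) - i), 0, -1):
            unit = tuple(target_sounds[i:i + length])
            if unit in voice_bank:
                audio.extend(voice_bank[unit])
                i += length
                break
        else:
            raise KeyError(f"no recorded segment covers {target_sounds[i]!r}")
    return audio


# Hypothetical bank: keys are sound labels, values are placeholder audio samples.
bank = {
    ("ay",): [0.1],
    ("l", "ah"): [0.2, 0.3],
    ("v",): [0.4],
}
utterance = synthesize(["ay", "l", "ah", "v"], bank)
```

A production concatenative synthesizer would also smooth the joins between segments and choose among several candidate recordings of each unit. The sketch leaves those steps out to keep the lookup idea visible.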

07:29

That's not the novel part.

176

449564

1533

其实这部分并不新奇。

07:31

What's novel is how we make it sound

177

451097

2221

新奇的部分是我们如何制作出听起来

07:33

like this young woman.

178

453318

1457

像是这个年轻女性的声音。

07:34

This is Samantha.

179

454775

1524

这是萨曼莎。

07:36

I met her when she was nine,

180

456299

2346

我第一次见到她的时候，她九岁，

07:38

and since then, my team and I

181

458645

1897

从那时候起，我和我的团队

07:40

have been trying to build her a personalized voice.

182

460542

2714

就一直在努力给她打造一个属于她自己的声音。

07:43

We first had to find a surrogate donor,

183

463256

3099

我们首先要找到一个代理说话者，

07:46

and then we had to have Samantha

184

466355

1818

然后我们让萨曼莎

07:48

produce some utterances.

185

468173

1929

发出一些声音。

07:50

What she can produce are mostly vowel-like sounds,

186

470102

2379

她能做的就是发出一些类似元音的声音，

07:52

but that's enough for us to extract

187

472481

2479

但这对于我们提取她的声源特征

07:54

her source characteristics.

188

474960

2285

已经足够了。

07:57

What happens next is best described

189

477245

3271

接下来发生的事情最好可以

08:00

by my daughter's analogy. She's six.

190

480516

2767

用我女儿的比喻来描述。她六岁。

08:03

She calls it mixing colors to paint voices.

191

483283

5422

她称其为“用不同的颜色画声音”。

08:08

It's beautiful. It's exactly that.

192

488705

2555

美极了。正是这样。

08:11

Samantha's voice is like a concentrated sample

193

491260

2860

萨曼莎的声音就好比是

08:14

of red food dye which we can infuse

194

494120

2609

浓缩的红色食用色素注入了

08:16

into the recordings of her surrogate

195

496729

2540

她的代理说话者的录音里面，

08:19

to get a pink voice just like this.

196

499269

4387

而产生了这样的粉红色的声音。

08:23

(Video) Samantha: Aaaaaah.

197

503656

4491

（视频）萨曼莎：啊……

08:28

RP: So now, Samantha can say this.

198

508147

2808

卢帕尔·帕特尔：那么现在，萨曼莎可以说这样的话。

08:30

(Video) Samantha: This voice is only for me.

199

510955

3069

（视频）萨曼莎：这是只属于我的声音。

08:34

I can't wait to use my new voice with my friends.

200

514024

6305

我迫不及待地想跟我的朋友用我的新声音交流。

08:40

RP: Thank you. (Applause)

201

520329

6417

卢帕尔·帕特尔：谢谢。（掌声）

08:46

I'll never forget the gentle smile

202

526746

2333

我永远不会忘记

08:49

that spread across her face

203

529079

1902

当她第一次听到自己的声音的时候，

08:50

when she heard that voice for the first time.

204

530981

3649

那个绽放在她脸上的温柔的笑脸。

08:54

Now there's millions of people

205

534630

1882

这个世界有上百万

08:56

around the world like Samantha, millions,

206

536512

2833

和萨曼莎一样的人，上百万，

08:59

and we've only begun to scratch the surface.

207

539345

3440

而我们其实才刚刚开始。

09:02

What we've done so far is we have

208

542785

1642

我们到目前为止所做的就是，

09:04

a few surrogate talkers from around the U.S.

209

544427

3859

我们有来自美国的几个代理说话者，

09:08

who have donated their voices,

210

548286

1507

他们捐献了自己的声音，

09:09

and we have been using those

211

549793

1928

而我们正在用这些声音

09:11

to build our first few personalized voices.

212

551721

4472

来打造最初的一些个性化的声音。

09:16

But there's so much more work to be done.

213

556193

1756

但是接下来的任务还很重。

09:17

For Samantha, her surrogate

214

557949

2188

就萨曼莎，她的代理说话者

09:20

came from somewhere in the Midwest, a stranger

215

560137

3046

来自中西部的一个地方，

09:23

who gave her the gift of voice.

216

563183

3841

一个将声音赠送给她的陌生人。

09:27

And as a scientist, I'm so excited

217

567024

2153

作为一名科学家，我很期待

09:29

to take this work out of the laboratory

218

569177

1935

将这项工作搬到实验室之外，

09:31

and finally into the real world

219

571112

1800

最终搬进现实世界

09:32

so it can have real-world impact.

220

572912

3165

并产生真正的影响。

09:36

What I want to share with you next

221

576077

1582

我接下来想跟你们分享的是

09:37

is how I envision taking this work

222

577659

2175

我对如何将这项工作

09:39

to that next level.

223

579834

2711

推进到下一个层次的展望。

09:42

I imagine a whole world of surrogate donors

224

582545

3887

我想象到一个充满了代理说话者的世界，

09:46

from all walks of life, different sizes, different ages,

225

586432

3260

他们来自不同的行业，有着不同的体型和年龄，

09:49

coming together in this voice drive

226

589692

3058

他们为这个声音计划走到一起，

09:52

to give people voices

227

592750

2270

希望赋予人们

09:55

that are as colorful as their personalities.

228

595020

3799

和他们的性格一样丰富多彩的声音。

09:58

To do that as a first step,

229

598819

2300

实现这个目标的第一步，

10:01

we've put together this website, VocaliD.org,

230

601119

3275

我们建立了一个网站：VocaliD.org，

10:04

as a way to bring together those

231

604394

1624

通过这个网站，我们把

10:06

who want to join us as voice donors,

232

606018

2675

愿意以声音捐献者或专业知识捐献者的身份

10:08

as expertise donors,

233

608693

1772

加入到我们的人们团结在一起，

10:10

in whatever way to make this vision a reality.

234

610465

5339

不管以何种方式，来一起实现这个愿景。

10:15

They say that giving blood can save lives.

235

615804

4153

人们说献血可以拯救生命。

10:19

Well, giving your voice can change lives.

236

619957

4982

那么，捐献您的声音可以改变生命。

10:24

All we need is a few hours of speech

237

624939

3050

我们需要的仅仅是几小时的

10:27

from our surrogate talker,

238

627989

1491

代理说话者的话语，

10:29

and as little as a vowel from our target talker,

239

629480

4733

以及目标说话者的一个小小的元音，

10:34

to create a unique vocal identity.

240

634213

3711

就可以打造一个独特的声音。

10:37

So that's the science behind what we're doing.

241

637924

2626

这就是我们所做的研究背后的科学。

10:40

I want to end by circling back to the human side

242

640550

4455

作为结尾，我还是想回到人的主题，

10:45

that is really the inspiration for this work.

243

645005

4102

这也是这项工作的真正灵感来源。

10:49

About five years ago, we built our very first voice

244

649107

3699

大约五年前，我们第一次给一个名为威廉的男孩

10:52

for a little boy named William.

245

652806

2501

打造了他的声音。

10:55

When his mom first heard this voice,

246

655307

2357

当他的妈妈第一次听到这个声音的时候，

10:57

she said, "This is what William

247

657664

2345

她说：“如果威廉

11:00

would have sounded like

248

660009

1546

可以讲话，

11:01

had he been able to speak."

249

661555

2449

他的声音就该是这样的。”

11:04

And then I saw William typing a message

250

664004

2418

然后我看到威廉在他的设备上

11:06

on his device.

251

666422

1362

打出一条消息。

11:07

I wondered, what was he thinking?

252

667784

3293

我在想，他在想什么？

11:11

Imagine carrying around someone else's voice

253

671077

3590

想象一下九年来一直用着

11:14

for nine years

254

674667

2193

别人的声音，

11:16

and finally finding your own voice.

255

676860

4844

然后最终找到了你自己的声音。

11:21

Imagine that.

256

681704

1377

想象一下。

11:23

This is what William said:

257

683081

2797

威廉说的是：

11:25

"Never heard me before."

258

685878

4463

“我从来没有听过我自己的声音。”

11:32

Thank you.

259

692417

1619

谢谢。

11:34

(Applause)

260

694036

4724

（掌声）

Original video on YouTube.com

Rupal Patel: Synthetic voices, as unique as fingerprints - YouTube
