Rupal Patel: Synthetic voices, as unique as fingerprints

114,553 views ・ 2014-02-13

TED


00:12

I'd like to talk today

12719

1490

00:14

about a powerful and fundamental aspect

14209

2927

00:17

of who we are: our voice.

17136

3598

00:20

Each one of us has a unique voiceprint

20734

2746

00:23

that reflects our age, our size,

23480

2289

00:25

even our lifestyle and personality.

25769

3237

00:29

In the words of the poet Longfellow,

29006

2142

00:31

"the human voice is the organ of the soul."

31148

3870

00:35

As a speech scientist, I'm fascinated

35018

2747

00:37

by how the voice is produced,

37765

1829

00:39

and I have an idea for how it can be engineered.

39594

3658

00:43

That's what I'd like to share with you.

43252

2210

00:45

I'm going to start by playing you a sample

45462

1814

00:47

of a voice that you may recognize.

47276

1871

00:49

(Recording) Stephen Hawking: "I would have thought

49147

1304

00:50

it was fairly obvious what I meant."

50451

2749

00:53

Rupal Patel: That was the voice

53200

1280

00:54

of Professor Stephen Hawking.

54480

2086

00:56

What you may not know is that same voice

56566

3849

01:00

may also be used by this little girl

60415

2478

01:02

who is unable to speak

62893

1697

01:04

because of a neurological condition.

64590

2597

01:07

In fact, all of these individuals

67187

2068

01:09

may be using the same voice,

69255

2012

01:11

and that's because there's only a few options available.

71267

3557

01:14

In the U.S. alone, there are 2.5 million Americans

74824

4317

01:19

who are unable to speak,

79141

1610

01:20

and many of whom use computerized devices

80751

2622

01:23

to communicate.

83373

1522

01:24

Now that's millions of people worldwide

84895

3479

01:28

who are using generic voices,

88374

1652

01:30

including Professor Hawking,

90026

1446

01:31

who uses an American-accented voice.

91472

4833

01:36

This lack of individuation of the synthetic voice

96305

3328

01:39

really hit home

99633

1416

01:41

when I was at an assistive technology conference

101049

2472

01:43

a few years ago,

103521

1850

01:45

and I recall walking into an exhibit hall

105371

3604

01:48

and seeing a little girl and a grown man

108975

3044

01:52

having a conversation using their devices,

112019

2916

01:54

different devices, but the same voice.

114935

4284

01:59

And I looked around and I saw this happening

119219

1909

02:01

all around me, literally hundreds of individuals

121128

4190

02:05

using a handful of voices,

125318

2738

02:08

voices that didn't fit their bodies

128056

3091

02:11

or their personalities.

131147

2082

02:13

We wouldn't dream of fitting a little girl

133229

2727

02:15

with the prosthetic limb of a grown man.

135956

3396

02:19

So why then the same prosthetic voice?

139352

3304

02:22

It really struck me,

142656

1291

02:23

and I wanted to do something about this.

143947

3151

02:27

I'm going to play you now a sample

147098

1953

02:29

of someone who has, two people actually,

149051

3288

02:32

who have severe speech disorders.

152339

1768

02:34

I want you to take a listen to how they sound.

154107

3230

02:37

They're saying the same utterance.

157337

2357

02:39

(First voice)

159694

2432

02:42

(Second voice)

162126

3617

02:45

You probably didn't understand what they said,

165743

2412

02:48

but I hope that you heard

168155

1854

02:50

their unique vocal identities.

170009

4283

02:54

So what I wanted to do next is,

174292

2813

02:57

I wanted to find out how we could harness

177105

2384

02:59

these residual vocal abilities

179489

1821

03:01

and build a technology

181310

2016

03:03

that could be customized for them,

183326

2143

03:05

voices that could be customized for them.

185469

2429

03:07

So I reached out to my collaborator, Tim Bunnell.

187898

2685

03:10

Dr. Bunnell is an expert in speech synthesis,

190583

3063

03:13

and what he'd been doing is building

193646

2033

03:15

personalized voices for people

195679

1881

03:17

by putting together

197560

2097

03:19

pre-recorded samples of their voice

199657

2150

03:21

and reconstructing a voice for them.

201807

2879

03:24

These are people who had lost their voice

204686

1712

03:26

later in life.

206398

1911

03:28

We didn't have the luxury

208309

1394

03:29

of pre-recorded samples of speech

209703

1774

03:31

for those born with speech disorder.

211477

2292

03:33

But I thought, there had to be a way

213769

2537

03:36

to reverse engineer a voice

216306

1944

03:38

from whatever little is left over.

218250

2291

03:40

So we decided to do exactly that.

220541

2714

03:43

We set out with a little bit of funding from the National Science Foundation,

223255

3403

03:46

to create custom-crafted voices that captured

226658

3565

03:50

their unique vocal identities.

230223

1536

03:51

We call this project VocaliD, or vocal I.D.,

231759

3203

03:54

for vocal identity.

234962

2033

03:56

Now before I get into the details of how

236995

2674

03:59

the voice is made and let you listen to it,

239669

2048

04:01

I need to give you a real quick speech science lesson. Okay?

241717

3350

04:05

So first, we know that the voice is changing

245067

3159

04:08

dramatically over the course of development.

248226

2854

04:11

Children sound different from teens

251080

2090

04:13

who sound different from adults.

253170

1463

04:14

We've all experienced this.

254633

2642

04:17

Fact number two is that speech

257275

3363

04:20

is a combination of the source,

260638

2553

04:23

which is the vibrations generated by your voice box,

263191

3479

04:26

which are then pushed through

266670

1939

04:28

the rest of the vocal tract.

100

268609

2437

04:31

These are the chambers of your head and neck

101

271046

2484

04:33

that vibrate,

102

273530

1239

04:34

and they actually filter that source sound

103

274769

2110

04:36

to produce consonants and vowels.

104

276879

2537

04:39

So the combination of source and filter

105

279416

3860

04:43

is how we produce speech.

106

283276

2630

04:45

And that happens in one individual.

107

285906

3026
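The source-and-filter description above can be sketched in a few lines of code. This is a minimal illustration, not anything taken from the talk: the source is a bare impulse train standing in for voice-box vibrations, and the filter is a chain of two-pole resonators standing in for vocal-tract resonances. The function names, the sample rate, and the resonance values (roughly those of an "ah"-like vowel) are all assumptions chosen for the example.

```python
import math


def glottal_source(f0_hz, sample_rate, n_samples):
    """Source: an impulse train, one pulse per pitch period (a crude stand-in)."""
    period = round(sample_rate / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonance, y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * center_hz / sample_rate
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(f0_hz, resonances, sample_rate=16000, n_samples=1600):
    """Push the source through each resonance in turn (source, then filter)."""
    signal = glottal_source(f0_hz, sample_rate, n_samples)
    for center_hz, bandwidth_hz in resonances:
        signal = resonator(signal, center_hz, bandwidth_hz, sample_rate)
    return signal


# Illustrative values only: a 200 Hz source and two assumed resonances.
vowel = synthesize_vowel(200.0, [(700.0, 80.0), (1200.0, 90.0)])
```

Changing `f0_hz` alters the pitch without touching the resonances, and changing the resonances alters the vowel quality without touching the pitch, which is the separation the talk relies on.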

04:48

Now I told you earlier that I'd spent

108

288932

2626

04:51

a good part of my career

109

291558

2025

04:53

understanding and studying

110

293583

2453

04:56

the source characteristics of people

111

296036

1958

04:57

with severe speech disorder,

112

297994

2301

05:00

and what I've found

113

300295

1465

05:01

is that even though their filters were impaired,

114

301760

3366

05:05

they were able to modulate their source:

115

305126

2961

05:08

the pitch, the loudness, the tempo of their voice.

116

308087

3262

05:11

These are called prosody, and I've been documenting for years

117

311349

3368

05:14

that the prosodic abilities of these individuals

118

314717

2277

05:16

are preserved.

119

316994

1575

05:18

So when I realized that those same cues

120

318569

4087

05:22

are also important for speaker identity,

121

322656

2769

05:25

I had this idea.

122

325425

2015

05:27

Why don't we take the source

123

327440

2516

05:29

from the person we want the voice to sound like,

124

329956

2213

05:32

because it's preserved,

125

332169

1463

05:33

and borrow the filter

126

333632

2135

05:35

from someone about the same age and size,

127

335767

3229

05:39

because they can articulate speech,

128

339011

2407

05:41

and then mix them?

129

341418

1791

05:43

Because when we mix them,

130

343209

1787

05:44

we can get a voice that's as clear

131

344996

1698

05:46

as our surrogate talker --

132

346694

1754

05:48

that's the person we borrowed the filter from --

133

348448

2595

05:51

and is similar in identity to our target talker.

134

351043

4649

05:55

It's that simple.

135

355692

1427

05:57

That's the science behind what we're doing.

136

357119

2934
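The mixing idea above can be illustrated with a toy data model. This is a sketch under loud assumptions, not the VocaliD method: each talker is reduced to a pitch contour and a loudness value (source cues) plus a tuple of resonance frequencies (filter), and "mixing" simply rescales the surrogate's pitch contour to the target's average pitch while keeping the surrogate's filter. The `Talker` class, the `blend` function, and every number below are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Talker:
    name: str
    pitch_contour_hz: list  # source cue: pitch over time
    loudness_db: float      # source cue: overall loudness
    formants_hz: tuple      # filter: vocal-tract resonances


def blend(target, surrogate):
    """Keep the surrogate's filter; move its source cues to the target's."""
    ratio = mean(target.pitch_contour_hz) / mean(surrogate.pitch_contour_hz)
    shifted = [f * ratio for f in surrogate.pitch_contour_hz]
    return Talker(
        name=f"{target.name}+{surrogate.name}",
        pitch_contour_hz=shifted,
        loudness_db=target.loudness_db,
        formants_hz=surrogate.formants_hz,
    )


# Hypothetical values: the target's filter is left empty to mark it unusable.
target = Talker("target", [250.0, 260.0, 240.0], 62.0, ())
surrogate = Talker("surrogate", [200.0, 220.0, 180.0], 70.0, (850.0, 1400.0, 2800.0))
voice = blend(target, surrogate)
```

The blended voice keeps the shape of the surrogate's pitch movement but sits at the target's pitch level and loudness. A real system would work on the audio itself rather than on summary numbers.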

06:00

So once you have that in mind,

137

360053

3533

06:03

how do you go about building this voice?

138

363586

2258

06:05

Well, you have to find someone

139

365844

1480

06:07

who is willing to be a surrogate.

140

367324

2400

06:09

It's not such an ominous thing.

141

369724

2264

06:11

Being a surrogate donor

142

371988

1523

06:13

only requires you to say a few hundred

143

373511

2788

06:16

to a few thousand utterances.

144

376299

2242

06:18

The process goes something like this.

145

378541

2003

06:20

(Video) Voice: Things happen in pairs.

146

380544

2190

06:22

I love to sleep.

147

382734

1925

06:24

The sky is blue without clouds.

148

384659

3882

06:28

RP: Now she's going to go on like this

149

388541

2002

06:30

for about three to four hours,

150

390543

1919

06:32

and the idea is not for her to say everything

151

392462

3005

06:35

that the target is going to want to say,

152

395467

2045

06:37

but the idea is to cover all the different combinations

153

397512

3395

06:40

of the sounds that occur in the language.

154

400907

3271

06:44

The more speech you have,

155

404178

1638

06:45

the better sounding voice you're going to have.

156

405816

2305

06:48

Once you have those recordings,

157

408121

1673

06:49

what we need to do

158

409794

1413

06:51

is we have to parse these recordings

159

411207

2718

06:53

into little snippets of speech,

160

413925

2449

06:56

one- or two-sound combinations,

161

416374

2337

06:58

sometimes even whole words

162

418711

1883

07:00

that start populating a dataset or a database.

163

420594

4516

07:05

We're going to call this database a voice bank.

164

425110

3717

07:08

Now the power of the voice bank

165

428827

2096

07:10

is that from this voice bank,

166

430923

2014

07:12

we can now say any new utterance,

167

432937

2011

07:14

like, "I love chocolate" --

168

434948

1424

07:16

everyone needs to be able to say that --

169

436372

1739

07:18

fish through that database

170

438111

1831

07:19

and find all the segments necessary

171

439942

1940

07:21

to say that utterance.

172

441882

1929

07:23

(Video) Voice: I love chocolate.

173

443811

1789

07:25

RP: So that's speech synthesis.

174

445600

1391

07:26

It's called concatenative synthesis, and that's what we're using.

175

446991

2573
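The voice bank lookup described above can be sketched as follows. This is a simplified illustration, assuming each recording has already been split into a list of sound labels: the bank indexes every one- and two-sound snippet, and a new utterance is covered greedily, preferring two-sound snippets. The sound labels, the recording names, and both function names are hand-written assumptions for the example. Practical concatenative synthesizers typically also weigh how smoothly neighboring snippets join, which this sketch ignores.

```python
def build_voice_bank(recordings):
    """Index every one- and two-sound snippet by (recording id, start position)."""
    bank = {}
    for rec_id, sounds in recordings.items():
        for size in (1, 2):
            for start in range(len(sounds) - size + 1):
                unit = tuple(sounds[start:start + size])
                bank.setdefault(unit, []).append((rec_id, start))
    return bank


def select_units(bank, target_sounds):
    """Greedy cover: try a two-sound snippet first, then fall back to one sound."""
    plan, i = [], 0
    while i < len(target_sounds):
        for size in (2, 1):
            unit = tuple(target_sounds[i:i + size])
            if len(unit) == size and unit in bank:
                plan.append((unit, bank[unit][0]))  # take the first stored snippet
                i += size
                break
        else:
            raise KeyError(f"no snippet covers sound {target_sounds[i]!r}")
    return plan


# Toy recordings with rough, hand-written sound labels (assumptions).
recordings = {
    "love_sleep": ["AY", "L", "AH", "V", "T", "UW", "S", "L", "IY", "P"],
    "chalk": ["CH", "AO", "K"],
    "clouds": ["K", "L", "AW", "D", "Z"],
}
bank = build_voice_bank(recordings)

# "I love chocolate", written with the same rough labels.
plan = select_units(bank, ["AY", "L", "AH", "V", "CH", "AO", "K", "L", "AH", "T"])
```

Each entry in `plan` pairs a snippet with the recording and position it came from, so playing those snippets back to back produces an utterance the donor never actually said.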

07:29

That's not the novel part.

176

449564

1533

07:31

What's novel is how we make it sound

177

451097

2221

07:33

like this young woman.

178

453318

1457

07:34

This is Samantha.

179

454775

1524

07:36

I met her when she was nine,

180

456299

2346

07:38

and since then, my team and I

181

458645

1897

07:40

have been trying to build her a personalized voice.

182

460542

2714

07:43

We first had to find a surrogate donor,

183

463256

3099

07:46

and then we had to have Samantha

184

466355

1818

07:48

produce some utterances.

185

468173

1929

07:50

What she can produce are mostly vowel-like sounds,

186

470102

2379

07:52

but that's enough for us to extract

187

472481

2479

07:54

her source characteristics.

188

474960

2285

07:57

What happens next is best described

189

477245

3271

08:00

by my daughter's analogy. She's six.

190

480516

2767

08:03

She calls it mixing colors to paint voices.

191

483283

5422

08:08

It's beautiful. It's exactly that.

192

488705

2555

08:11

Samantha's voice is like a concentrated sample

193

491260

2860

08:14

of red food dye which we can infuse

194

494120

2609

08:16

into the recordings of her surrogate

195

496729

2540

08:19

to get a pink voice just like this.

196

499269

4387

08:23

(Video) Samantha: Aaaaaah.

197

503656

4491

08:28

RP: So now, Samantha can say this.

198

508147

2808

08:30

(Video) Samantha: This voice is only for me.

199

510955

3069

08:34

I can't wait to use my new voice with my friends.

200

514024

6305

08:40

RP: Thank you. (Applause)

201

520329

6417

08:46

I'll never forget the gentle smile

202

526746

2333

08:49

that spread across her face

203

529079

1902

08:50

when she heard that voice for the first time.

204

530981

3649

08:54

Now there's millions of people

205

534630

1882

08:56

around the world like Samantha, millions,

206

536512

2833

08:59

and we've only begun to scratch the surface.

207

539345

3440

09:02

What we've done so far is we have

208

542785

1642

09:04

a few surrogate talkers from around the U.S.

209

544427

3859

09:08

who have donated their voices,

210

548286

1507

09:09

and we have been using those

211

549793

1928

09:11

to build our first few personalized voices.

212

551721

4472

09:16

But there's so much more work to be done.

213

556193

1756

09:17

For Samantha, her surrogate

214

557949

2188

09:20

came from somewhere in the Midwest, a stranger

215

560137

3046

09:23

who gave her the gift of voice.

216

563183

3841

09:27

And as a scientist, I'm so excited

217

567024

2153

09:29

to take this work out of the laboratory

218

569177

1935

09:31

and finally into the real world

219

571112

1800

09:32

so it can have real-world impact.

220

572912

3165

09:36

What I want to share with you next

221

576077

1582

09:37

is how I envision taking this work

222

577659

2175

09:39

to that next level.

223

579834

2711

09:42

I imagine a whole world of surrogate donors

224

582545

3887

09:46

from all walks of life, different sizes, different ages,

225

586432

3260

09:49

coming together in this voice drive

226

589692

3058

09:52

to give people voices

227

592750

2270

09:55

that are as colorful as their personalities.

228

595020

3799

09:58

To do that as a first step,

229

598819

2300

10:01

we've put together this website, VocaliD.org,

230

601119

3275

10:04

as a way to bring together those

231

604394

1624

10:06

who want to join us as voice donors,

232

606018

2675

10:08

as expertise donors,

233

608693

1772

10:10

in whatever way to make this vision a reality.

234

610465

5339

10:15

They say that giving blood can save lives.

235

615804

4153

10:19

Well, giving your voice can change lives.

236

619957

4982

10:24

All we need is a few hours of speech

237

624939

3050

10:27

from our surrogate talker,

238

627989

1491

10:29

and as little as a vowel from our target talker,

239

629480

4733

10:34

to create a unique vocal identity.

240

634213

3711

10:37

So that's the science behind what we're doing.

241

637924

2626

10:40

I want to end by circling back to the human side

242

640550

4455

10:45

that is really the inspiration for this work.

243

645005

4102

10:49

About five years ago, we built our very first voice

244

649107

3699

10:52

for a little boy named William.

245

652806

2501

10:55

When his mom first heard this voice,

246

655307

2357

10:57

she said, "This is what William

247

657664

2345

11:00

would have sounded like

248

660009

1546

11:01

had he been able to speak."

249

661555

2449

11:04

And then I saw William typing a message

250

664004

2418

11:06

on his device.

251

666422

1362

11:07

I wondered, what was he thinking?

252

667784

3293

11:11

Imagine carrying around someone else's voice

253

671077

3590

11:14

for nine years

254

674667

2193

11:16

and finally finding your own voice.

255

676860

4844

11:21

Imagine that.

256

681704

1377

11:23

This is what William said:

257

683081

2797

11:25

"Never heard me before."

258

685878

4463

11:32

Thank you.

259

692417

1619

11:34

(Applause)

260

694036

4724

Original video on YouTube.com

Rupal Patel: Synthetic voices, as unique as fingerprints - YouTube
