How to read the genome and build a human being | Riccardo Sabatini

320,260 views ・ 2016-05-24

TED


00:12
For the next 16 minutes, I'm going to take you on a journey
00:15
that is probably the biggest dream of humanity:
00:18
to understand the code of life.
00:21
So for me, everything started many, many years ago
00:23
when I met the first 3D printer.
00:26
The concept was fascinating.
00:28
A 3D printer needs three elements:
00:30
a bit of information, some raw material, some energy,
00:34
and it can produce any object that was not there before.
00:38
I was doing physics, I was coming back home
00:40
and I realized that I actually always knew a 3D printer.
00:44
And everyone does.
00:45
It was my mom.
00:46
(Laughter)
00:47
My mom takes three elements:
00:50
a bit of information, which is between my father and my mom in this case,
00:54
raw elements and energy in the same media, that is food,
00:58
and after several months, produces me.
01:00
And I was not existent before.
01:02
So apart from the shock of my mom discovering that she was a 3D printer,
01:06
I immediately got mesmerized by that piece,
01:11
the first one, the information.
01:12
What amount of information does it take
01:15
to build and assemble a human?
01:17
Is it much? Is it little?
01:18
How many thumb drives can you fill?
01:21
Well, I was studying physics at the beginning
01:23
and I took this approximation of a human as a gigantic Lego piece.
01:29
So, imagine that the building blocks are little atoms
01:33
and there is a hydrogen here, a carbon here, a nitrogen here.
01:37
So in the first approximation,
01:39
if I can list the number of atoms that compose a human being,
01:43
I can build it.
01:45
Now, you can run some numbers
01:47
and that happens to be quite an astonishing number.
01:50
So the number of atoms,
01:53
the file that I will save in my thumb drive to assemble a little baby,
01:58
will actually fill an entire Titanic of thumb drives --
02:02
multiplied 2,000 times.
02:05
This is the miracle of life.
02:09
Every time you see from now on a pregnant lady,
02:12
she's assembling the biggest amount of information
02:14
that you will ever encounter.
02:16
Forget big data, forget anything you heard of.
02:19
This is the biggest amount of information that exists.
02:22
(Applause)
02:26
But nature, fortunately, is much smarter than a young physicist,
02:30
and in four billion years, managed to pack this information
02:34
in a small crystal we call DNA.
02:37
We met it for the first time in 1950 when Rosalind Franklin,
02:41
an amazing scientist, a woman,
02:43
took a picture of it.
02:44
But it took us more than 40 years to finally poke inside a human cell,
02:50
take out this crystal,
02:51
unroll it, and read it for the first time.
02:55
The code comes out to be a fairly simple alphabet,
02:58
four letters: A, T, C and G.
03:02
And to build a human, you need three billion of them.
03:06
Three billion.
03:08
How many are three billion?
03:09
It doesn't really make any sense as a number, right?
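One way to get a feel for three billion letters is to ask how much storage they would need. The sketch below is a back-of-the-envelope estimate, not anything shown in the talk: it assumes each of the four letters is stored in 2 bits, with no compression and no extra metadata.

```python
# Rough storage estimate for the genome described in the talk.
LETTERS = 3_000_000_000   # "three billion" letters, from the talk
BITS_PER_LETTER = 2       # assumption: 4 letters (A, T, C, G) need log2(4) = 2 bits

total_bits = LETTERS * BITS_PER_LETTER
total_bytes = total_bits // 8
megabytes = total_bytes / 1_000_000   # decimal megabytes

print(f"{total_bytes:,} bytes, about {megabytes:.0f} MB")
```

Under these assumptions the whole sequence comes to roughly 750 MB, which would fit on a single ordinary thumb drive.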
03:12
So I was thinking how I could explain myself better
03:16
about how big and enormous this code is.
03:19
But there is -- I mean, I'm going to have some help,
03:22
and the best person to help me introduce the code
03:26
is actually the first man to sequence it, Dr. Craig Venter.
03:29
So welcome onstage, Dr. Craig Venter.
03:32
(Applause)
03:39
Not the man in the flesh,
03:43
but for the first time in history,
03:45
this is the genome of a specific human,
03:49
printed page-by-page, letter-by-letter:
03:53
262,000 pages of information,
03:57
450 kilograms, shipped from the United States to Canada
04:01
thanks to Bruno Bowden, Lulu.com, a start-up, did everything.
04:06
It was an amazing feat.
04:07
But this is the visual perception of what is the code of life.
04:12
And now, for the first time, I can do something fun.
04:14
I can actually poke inside it and read.
04:17
So let me take an interesting book ... like this one.
04:25
I have an annotation; it's a fairly big book.
04:27
So just to let you see what is the code of life.
04:32
Thousands and thousands and thousands
04:35
and millions of letters.
04:38
And they apparently make sense.
04:41
Let's get to a specific part.
04:43
Let me read it to you:
04:44
(Laughter)
04:46
"AAG, AAT, ATA."
04:50
To you it sounds like mute letters,
04:53
but this sequence gives the color of the eyes to Craig.
04:57
I'll show you another part of the book.
04:59
This is actually a little more complicated.
05:02
Chromosome 14, book 132:
05:05
(Laughter)
05:07
As you might expect.
05:09
(Laughter)
05:14
"ATT, CTT, GATT."
05:20
This human is lucky,
05:22
because if you miss just two letters in this position --
05:26
two letters of our three billion --
05:28
he will be condemned to a terrible disease:
05:30
cystic fibrosis.
05:31
We have no cure for it, we don't know how to solve it,
05:35
and it's just two letters of difference from what we are.
05:39
A wonderful book, a mighty book,
05:43
a mighty book that helped me understand
05:45
and show you something quite remarkable.
05:48
Every one of you -- what makes me, me and you, you --
05:52
is just about five million of these,
05:55
half a book.
05:58
For the rest,
05:59
we are all absolutely identical.
06:03
Five hundred pages is the miracle of life that you are.
06:07
The rest, we all share it.
06:09
So think about that again when we think that we are different.
06:12
This is the amount that we share.
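The figures quoted so far can be checked against each other with a little arithmetic. This is only a sketch using the numbers stated in the talk (three billion letters, 262,000 printed pages, about five million differing letters); it assumes the letters are spread evenly across the pages.

```python
# Consistency check on the numbers quoted in the talk.
GENOME_LETTERS = 3_000_000_000   # "three billion", from the talk
PRINTED_PAGES = 262_000          # "262,000 pages", from the talk
DIFFERING_LETTERS = 5_000_000    # "about five million", from the talk

# Assumption: every page holds the same number of letters.
letters_per_page = GENOME_LETTERS / PRINTED_PAGES
fraction_different = DIFFERING_LETTERS / GENOME_LETTERS
pages_different = DIFFERING_LETTERS / letters_per_page

print(f"letters per page:   {letters_per_page:,.0f}")
print(f"share that differs: {fraction_different:.2%}")
print(f"pages that differ:  {pages_different:.0f}")
```

The differing share works out to about 0.17 percent of the genome, or a little over 430 pages, which is in line with the "five hundred pages" and "half a book" mentioned above.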
06:15
So now that I have your attention,
06:18
the next question is:
06:20
How do I read it?
06:21
How do I make sense out of it?
06:23
Well, for however good you can be at assembling Swedish furniture,
06:27
this instruction manual is nothing you can crack in your life.
06:31
(Laughter)
06:32
And so, in 2014, two famous TEDsters,
06:36
Peter Diamandis and Craig Venter himself,
06:38
decided to assemble a new company.
06:40
Human Longevity was born,
06:41
with one mission:
06:43
trying everything we can try
06:45
and learning everything we can learn from these books,
06:48
with one target --
06:50
making real the dream of personalized medicine,
06:53
understanding what things should be done to have better health
06:57
and what are the secrets in these books.
07:00
An amazing team, 40 data scientists and many, many more people,
07:04
a pleasure to work with.
07:05
The concept is actually very simple.
07:08
We're going to use a technology called machine learning.
07:11
On one side, we have genomes -- thousands of them.
07:15
On the other side, we collected the biggest database of human beings:
07:20
phenotypes, 3D scan, NMR -- everything you can think of.
07:24
Inside there, on these two opposite sides,
07:27
there is the secret of translation.
07:29
And in the middle, we build a machine.
07:32
We build a machine and we train a machine --
07:35
well, not exactly one machine, many, many machines --
07:38
to try to understand and translate the genome in a phenotype.
07:43
What are those letters, and what do they do?
07:46
It's an approach that can be used for everything,
07:49
but using it in genomics is particularly complicated.
07:52
Little by little we grew and we wanted to build different challenges.
07:55
We started from the beginning, from common traits.
07:58
Common traits are comfortable because they are common,
08:01
everyone has them.
08:02
So we started to ask our questions:
08:04
Can we predict height?
08:06
Can we read the books and predict your height?
08:09
Well, we actually can,
08:10
with five centimeters of precision.
08:12
BMI is fairly connected to your lifestyle,
08:15
but we still can, we get in the ballpark, eight kilograms of precision.
08:19
Can we predict eye color?
08:20
Yeah, we can.
08:21
Eighty percent accuracy.
08:23
Can we predict skin color?
08:25
Yeah we can, 80 percent accuracy.
08:27
Can we predict age?
08:30
We can, because apparently, the code changes during your life.
08:33
It gets shorter, you lose pieces, it gets insertions.
08:37
We read the signals, and we make a model.
08:40
Now, an interesting challenge:
08:41
Can we predict a human face?
08:45
It's a little complicated,
08:46
because a human face is scattered among millions of these letters.
08:49
And a human face is not a very well-defined object.
08:52
So, we had to build an entire tier of it
08:54
to learn and teach a machine what a face is,
08:56
and embed and compress it.
08:59
And if you're comfortable with machine learning,
09:01
you understand what the challenge is here.
09:04
Now, after 15 years -- 15 years after we read the first sequence --
09:10
this October, we started to see some signals.
09:13
And it was a very emotional moment.
09:15
What you see here is a subject coming in our lab.
09:19
This is a face for us.
09:21
So we take the real face of a subject, we reduce the complexity,
09:25
because not everything is in your face --
09:27
lots of features and defects and asymmetries come from your life.
09:31
We symmetrize the face, and we run our algorithm.
09:35
The results that I show you right now,
09:37
this is the prediction we have from the blood.
09:41
(Applause)
09:43
Wait a second.
09:44
In these seconds, your eyes are watching, left and right, left and right,
09:49
and your brain wants those pictures to be identical.
09:53
So I ask you to do another exercise, to be honest.
09:55
Please search for the differences,
09:58
which are many.
09:59
The biggest amount of signal comes from gender,
10:02
then there is age, BMI, the ethnicity component of a human.
10:07
And scaling up over that signal is much more complicated.
10:11
But what you see here, even in the differences,
10:14
lets you understand that we are in the right ballpark,
10:17
that we are getting closer.
10:19
And it's already giving you some emotions.
10:21
This is another subject that comes in place,
10:24
and this is a prediction.
10:25
A little smaller face, we didn't get the complete cranial structure,
10:30
but still, it's in the ballpark.
10:33
This is a subject that comes in our lab,
10:35
and this is the prediction.
10:38
So these people have never been seen in the training of the machine.
10:42
These are the so-called "held-out" set.
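The idea of a held-out set can be illustrated with a toy example. The sketch below is not the model described in the talk: it invents a tiny synthetic dataset (five genome sites coded 0, 1 or 2, with made-up effects on a trait), fits a simple linear model by gradient descent on 150 people, and then measures the error on 50 people the model never saw during training. All names and numbers in it are hypothetical.

```python
import random

random.seed(0)

N_PEOPLE, N_SITES = 200, 5
# Hypothetical effect of each site on the trait; invented for this toy example.
TRUE_EFFECTS = [2.0, -1.0, 0.5, 0.0, 1.5]

def make_person():
    # Assumed encoding: each site is a count 0, 1 or 2.
    genotype = [random.randint(0, 2) for _ in range(N_SITES)]
    trait = sum(w * g for w, g in zip(TRUE_EFFECTS, genotype))
    trait += random.gauss(0, 0.5)   # noise standing in for lifestyle, measurement, etc.
    return genotype, trait

people = [make_person() for _ in range(N_PEOPLE)]
train, held_out = people[:150], people[150:]   # the last 50 are never used for fitting

def predict(weights, bias, genotype):
    return bias + sum(w * g for w, g in zip(weights, genotype))

def fit(data, steps=1000, lr=0.1):
    # Plain batch gradient descent on mean squared error.
    weights, bias = [0.0] * N_SITES, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w, grad_b = [0.0] * N_SITES, 0.0
        for genotype, trait in data:
            err = predict(weights, bias, genotype) - trait
            for i in range(N_SITES):
                grad_w[i] += err * genotype[i]
            grad_b += err
        weights = [w - lr * gw / n for w, gw in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def mean_abs_error(weights, bias, data):
    return sum(abs(predict(weights, bias, g) - y) for g, y in data) / len(data)

weights, bias = fit(train)
held_out_error = mean_abs_error(weights, bias, held_out)

# Baseline: ignore the genome and always guess the average training trait.
train_mean = sum(y for _, y in train) / len(train)
baseline_error = sum(abs(train_mean - y) for _, y in held_out) / len(held_out)

print(f"held-out error with model:    {held_out_error:.2f}")
print(f"held-out error with baseline: {baseline_error:.2f}")
```

Because the held-out people played no part in fitting, a low error on them suggests the model learned something that generalizes rather than memorizing its training examples.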
10:45
But these are people that you will probably never believe.
10:49
We're publishing everything in a scientific publication,
10:52
you can read it.
10:53
But since we are onstage, Chris challenged me.
10:55
I probably exposed myself and tried to predict
10:59
someone that you might recognize.
11:02
So, in this vial of blood -- and believe me, you have no idea
11:06
what we had to do to have this blood now, here --
11:09
in this vial of blood is the amount of biological information
11:13
that we need to do a full genome sequence.
11:16
We just need this amount.
11:18
We ran this sequence, and I'm going to do it with you.
11:21
And we start to layer up all the understanding we have.
11:25
In the vial of blood, we predicted he's a male.
11:29
And the subject is a male.
11:30
We predict that he's a meter and 76 cm.
11:33
The subject is a meter and 77 cm.
11:35
So, we predicted that he's 76; the subject is 82.
11:40
We predict his age, 38.
11:43
The subject is 35.
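The numeric predictions just listed can be set against the precision figures quoted earlier in the talk (five centimeters for height, eight kilograms for the BMI-related prediction). The sketch below only does that subtraction. It assumes the "76" versus "82" pair refers to weight in kilograms, which the transcript does not state explicitly, and no precision figure is given for age.

```python
# (predicted, actual, stated precision or None), using the numbers from the talk.
predictions = {
    "height_cm": (176, 177, 5),
    "weight_kg": (76, 82, 8),      # assumption: this pair is weight in kilograms
    "age_years": (38, 35, None),   # no precision figure is quoted for age
}

errors = {}
for name, (predicted, actual, precision) in predictions.items():
    errors[name] = abs(predicted - actual)
    if precision is None:
        verdict = "no stated precision"
    elif errors[name] <= precision:
        verdict = f"within the stated {precision}"
    else:
        verdict = f"outside the stated {precision}"
    print(f"{name}: off by {errors[name]} ({verdict})")
```

Under that reading, the height and weight predictions both fall inside the margins quoted earlier, and the age prediction is off by three years.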
11:45
We predict his eye color.
11:48
Too dark.
11:50
We predict his skin color.
11:52
We are almost there.
11:53
That's his face.
11:57
Now, the reveal moment:
12:00
the subject is this person.
12:02
(Laughter)
12:04
And I did it intentionally.
12:06
I am a very particular and peculiar ethnicity.
12:10
Southern European, Italians -- they never fit in models.
12:12
And it's particular -- that ethnicity is a complex corner case for our model.
12:18
But there is another point.
12:19
So, one of the things that we use a lot to recognize people
12:23
will never be written in the genome.
12:24
It's our free will, it's how I look.
12:27
Not my haircut in this case, but my beard cut.
12:30
So I'm going to show you, I'm going to, in this case, transfer it --
12:34
and this is nothing more than Photoshop, no modeling --
12:36
the beard on the subject.
12:38
And immediately, we get much, much better in the feeling.
12:42
So, why do we do this?
12:47
We certainly don't do it for predicting height
12:53
or taking a beautiful picture out of your blood.
12:56
We do it because the same technology and the same approach,
13:00
the machine learning of this code,
13:02
is helping us to understand how we work,
13:06
how your body works,
13:07
how your body ages,
13:09
how disease generates in your body,
13:12
how your cancer grows and develops,
13:15
how drugs work
13:16
and if they work on your body.
13:19
This is a huge challenge.
13:21
This is a challenge that we share
13:23
with thousands of other researchers around the world.
13:26
It's called personalized medicine.
13:29
It's the ability to move from a statistical approach
13:32
where you're a dot in the ocean,
13:34
to a personalized approach,
13:36
where we read all these books
13:38
and we get an understanding of exactly how you are.
13:42
But it is a particularly complicated challenge,
13:45
because of all these books, as of today,
13:49
we just know probably two percent:
13:53
four books of more than 175.
13:58
And this is not the topic of my talk,
14:02
because we will learn more.
14:05
There are the best minds in the world on this topic.
14:09
The prediction will get better,
14:10
the model will get more precise.
14:13
And the more we learn,
14:15
the more we will be confronted with decisions
14:19
that we never had to face before
14:22
about life,
14:24
about death,
14:26
about parenting.
14:32
So, we are touching the very inner detail on how life works.
14:38
And it's a revolution that cannot be confined
14:41
in the domain of science or technology.
14:44
This must be a global conversation.
14:47
We must start to think of the future we're building as a humanity.
14:53
We need to interact with creatives, with artists, with philosophers,
14:57
with politicians.
14:58
Everyone is involved,
14:59
because it's the future of our species.
15:03
Without fear, but with the understanding
15:07
that the decisions that we make in the next year
15:11
will change the course of history forever.
15:15
Thank you.
15:16
(Applause)