How we can store digital data in DNA | Dina Zielinski

127,538 views ・ 2019-03-21

TED


00:12
I could fit all movies ever made inside of this tube.
0
12652
5196
00:17
If you can't see it, that's kind of the point.
1
17872
2253
00:20
(Laughter)
2
20149
1016
00:21
Before we understand how this is possible,
3
21189
3243
00:24
it's important to understand the value of this feat.
4
24456
3746
00:29
All of our thoughts and actions these days,
5
29075
2266
00:31
through photos and videos --
6
31365
1986
00:33
even our fitness activities --
7
33375
1879
00:35
are stored as digital data.
8
35278
2133
00:38
Aside from running out of space
9
38109
1517
00:39
on our phones,
10
39650
1151
00:40
we rarely think about our digital footprint.
11
40825
2314
00:43
But humanity has collectively generated more data
12
43536
3528
00:47
in the last few years
13
47088
1873
00:48
than all of preceding human history.
14
48985
2530
00:51
Big data has become a big problem.
15
51902
2898
00:55
Digital storage is really expensive,
16
55229
2817
00:58
and none of these devices that we have really stand the test of time.
17
58070
3723
01:03
There's this nonprofit website called the Internet Archive.
18
63256
3750
01:07
In addition to free books and movies,
19
67030
2645
01:09
you can access web pages as far back as 1996.
20
69699
4364
01:14
Now, this is very tempting,
21
74087
1684
01:15
but I decided to go back and look at the TED website's very humble beginnings.
22
75795
5989
01:21
As you can see, it's changed quite a bit in the last 30 years.
23
81808
3912
01:26
So this led me to the first-ever TED,
24
86720
2824
01:29
back in 1984,
25
89568
2180
01:31
and it just so happened to be a Sony executive
26
91772
2525
01:34
explaining how a compact disc works.
27
94321
3058
01:37
(Laughter)
28
97403
1079
01:38
Now, it's really incredible to be able to go back in time
29
98506
4264
01:42
and access this moment.
30
102794
2286
01:45
It's also really fascinating that after 30 years, after that first TED,
31
105548
5363
01:50
we're still talking about digital storage.
32
110935
2779
01:54
Now, if we look back another 30 years,
33
114827
2787
01:57
IBM released the first-ever hard drive
34
117638
3185
02:00
back in 1956.
35
120847
2127
02:02
Here it is being loaded for shipping in front of a small audience.
36
122998
4197
02:07
It held the equivalent of one MP3 song
37
127569
3110
02:11
and weighed over one ton.
38
131354
2004
02:14
At 10,000 dollars a megabyte,
39
134100
2651
02:16
I don't think anyone in this room would be interested in buying this thing,
40
136775
3587
02:20
except maybe as a collector's item.
41
140386
1760
02:22
But it's the best we could do at the time.
42
142817
2988
02:26
We've come such a long way in data storage.
43
146832
3116
02:29
Devices have evolved dramatically.
44
149972
2898
02:32
But all media eventually wear out or become obsolete.
45
152894
4024
02:37
If someone handed you a floppy drive today to back up your presentation,
46
157401
4417
02:41
you'd probably look at them kind of strange, maybe laugh,
47
161842
2940
02:44
but you'd have no way to use the damn thing.
48
164806
2415
02:47
These devices can no longer meet our storage needs,
49
167854
3141
02:51
although some of them can be repurposed.
50
171019
2702
02:54
All technology eventually dies or is lost,
51
174682
3109
02:57
along with our data,
52
177815
1851
02:59
all of our memories.
53
179690
1540
03:02
There's this illusion that the storage problem has been solved,
54
182210
4116
03:06
but really, we all just externalize it.
55
186350
2493
03:08
We don't worry about storing our emails and our photos.
56
188867
3477
03:12
They're just in the cloud.
57
192368
1723
03:15
But behind the scenes, storage is problematic.
58
195231
2937
03:18
After all, the cloud is just a lot of hard drives.
59
198192
3980
03:23
Now, most digital data, we could argue, is not really critical.
60
203156
4040
03:27
Surely, we could just delete it.
61
207220
2123
03:29
But how can we really know what's important today?
62
209957
3535
03:34
We've learned so much about human history
63
214132
2536
03:36
from drawings and writings in caves,
64
216692
2826
03:39
from stone tablets.
65
219542
1614
03:41
We've deciphered languages from the Rosetta Stone.
66
221180
3397
03:45
You know, we'll never really have the whole story, though.
67
225841
3609
03:49
Our data is our story,
68
229474
1894
03:51
even more so today.
69
231392
1735
03:53
We won't have our record recorded on stone tablets.
70
233508
3261
03:57
But we don't have to choose what is important now.
71
237692
2698
04:00
There's a way to store it all.
72
240847
1893
04:03
It turns out that there's a solution that's been around
73
243519
2598
04:06
for a few billion years,
74
246141
2443
04:08
and it's actually in this tube.
75
248608
1840
04:12
DNA is nature's oldest storage device.
76
252044
3722
04:15
After all, it contains all the information necessary
77
255790
3371
04:19
to build and maintain a human being.
78
259185
2830
04:22
But what makes DNA so great?
79
262583
2204
04:25
Well, let's take our own genome
80
265493
1756
04:27
as an example.
81
267273
1560
04:28
If we were to print out all three billion A's, T's, C's and G's
82
268857
4770
04:33
on a standard font, standard format,
83
273651
3631
04:37
and then we were to stack all of those papers,
84
277306
2740
04:40
it would be about 130 meters high,
85
280070
2660
04:42
somewhere between the Statue of Liberty and the Washington Monument.
86
282754
3659
04:46
Now, if we converted all those A's, T's, C's and G's
87
286437
2447
04:48
to digital data, to zeroes and ones,
88
288898
2556
04:51
it would total a few gigs.
89
291478
1769
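
The "few gigs" figure can be checked with quick arithmetic. The sketch below is a rough estimate under stated assumptions, not a calculation from the talk: it takes the three billion letters quoted above and compares two hypothetical encodings, one text character (one byte) per letter and a packed two bits per letter.

```python
# Back-of-the-envelope size of a genome as digital data.
# Assumptions (not from the talk): two encodings are compared,
# one text character (1 byte) per base versus a packed 2 bits per base.

BASES = 3_000_000_000  # "three billion A's, T's, C's and G's" from the talk


def size_in_gigabytes(num_bases: int, bits_per_base: int) -> float:
    """Return the storage size in gigabytes (1 GB = 10**9 bytes)."""
    return num_bases * bits_per_base / 8 / 10**9


as_text = size_in_gigabytes(BASES, 8)    # one ASCII letter per base
as_packed = size_in_gigabytes(BASES, 2)  # four letters need only 2 bits

print(f"one byte per base: {as_text:.2f} GB")
print(f"two bits per base: {as_packed:.2f} GB")
```

Stored as plain text, the letters come to about 3 GB. Packed at two bits per letter, they fit in under 1 GB.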
04:53
And that's in each cell of our body.
90
293786
2339
04:56
We have more than 30 trillion cells.
91
296516
2838
04:59
You get the idea:
92
299757
1500
05:01
DNA can store a ton of information in a minuscule space.
93
301281
4675
05:07
DNA is also very durable,
94
307620
1825
05:09
and it doesn't even require electricity to store it.
95
309469
2834
05:12
We know this because scientists have recovered DNA from ancient humans
96
312327
4276
05:16
that lived hundreds of thousands of years ago.
97
316627
2752
05:19
One of those is Ötzi the Iceman.
98
319739
2627
05:22
Turns out, he's Austrian.
99
322390
1683
05:24
(Laughter)
100
324097
1600
05:25
He was found high, well-preserved,
101
325721
1630
05:27
in the mountains between Italy and Austria,
102
327375
2814
05:30
and it turns out that he has living genetic relatives here in Austria today.
103
330213
3984
05:34
So one of you could be a cousin of Ötzi.
104
334221
2342
05:36
(Laughter)
105
336587
1055
05:38
The point is that we have a better chance of recovering information
106
338043
3853
05:41
from an ancient human
107
341920
1225
05:43
than we do from an old phone.
108
343169
2042
05:45
It's also much less likely that we'll lose the ability to read DNA
109
345783
4645
05:50
than any single man-made device.
110
350452
2434
05:53
Every single new storage format requires a new way to read it.
111
353567
4112
05:57
We'll always be able to read DNA.
112
357703
2133
05:59
If we can no longer sequence, we have bigger problems
113
359860
3068
06:02
than worrying about data storage.
114
362952
2281
06:05
Storing data on DNA is not new.
115
365725
3071
06:08
Nature's been doing it for several billion years.
116
368820
3099
06:11
In fact, every living thing is a DNA storage device.
117
371943
3892
06:16
But how do we store data on DNA?
118
376397
2786
06:19
This is Photo 51.
119
379725
1791
06:21
It's the first-ever photo of DNA,
120
381540
2627
06:24
taken about 60 years ago.
121
384191
2252
06:26
This is around the time that that same hard drive was released by IBM.
122
386467
4382
06:31
So really, our understanding of digital storage and of DNA has coevolved.
123
391246
5492
06:37
We first learned to sequence, or read DNA,
124
397600
3316
06:40
and very soon after, how to write it,
125
400940
2012
06:42
or synthesize it.
126
402976
1559
06:44
This is much like how we learn a new language.
127
404559
3564
06:48
And now we have the ability to read, write and copy DNA.
128
408812
4613
06:53
We do it in the lab all the time.
129
413449
2080
06:56
So anything, really anything, that can be stored as zeroes and ones
130
416283
3882
07:00
can be stored in DNA.
131
420189
1719
07:02
To store something digitally, like this photo,
132
422579
3195
07:05
we convert it to bits, or binary digits.
133
425798
3306
07:09
Each pixel in a black-and-white photo is simply a zero or a one.
134
429128
4211
07:13
And we can write DNA much like an inkjet printer can print letters on a page.
135
433849
4824
07:18
We just have to convert our data, all of those zeroes and ones,
136
438697
3824
07:22
to A's, T's, C's and G's,
137
442545
2138
07:24
and then we send this to a synthesis company.
138
444707
2258
07:26
So we write it, we can store it,
139
446989
1947
07:28
and when we want to recover our data, we just sequence it.
140
448960
3234
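
The conversion from zeroes and ones to A's, T's, C's and G's can be sketched in a few lines. This is a minimal illustration: the mapping table (00 to A, 01 to C, 10 to G, 11 to T), the function names `encode` and `decode`, and the tiny 4x4 image are all assumptions made for the example. The talk does not say which mapping was actually used.

```python
# Toy mapping between bits and DNA letters: 2 bits per nucleotide.
# The specific table (00->A, 01->C, 10->G, 11->T) is an assumption for
# illustration; the talk does not say which mapping was used.

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of A/C/G/T back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# A tiny 4x4 black-and-white "photo": each pixel is a 0 or a 1.
pixels = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
flat = "".join(str(p) for row in pixels for p in row)       # 16 bits
image_bytes = int(flat, 2).to_bytes(len(flat) // 8, "big")  # 2 bytes

strand = encode(image_bytes)  # "writing": the letters to synthesize
print(strand)
assert decode(strand) == image_bytes  # "reading": sequencing in reverse
```

A mapping this simple ignores the practical constraints of real synthesis and sequencing. It only shows that the round trip from bits to letters and back loses nothing.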
07:32
Now, the fun part of all of this is deciding what files to include.
141
452218
4081
07:36
We're serious scientists, so we had to include a manuscript
142
456323
3377
07:39
for good posterity.
143
459724
1743
07:41
We also included a $50 Amazon gift card --
144
461491
2833
07:44
don't get too excited, it's already been spent, someone decoded it --
145
464348
3531
07:47
as well as an operating system,
146
467903
2210
07:50
one of the first movies ever made
147
470137
2371
07:52
and a Pioneer plaque.
148
472532
1738
07:54
Some of you might have seen this.
149
474294
1669
07:55
It has a depiction of a typical -- apparently -- male and female,
150
475987
3456
07:59
and our approximate location in the Solar System,
151
479467
2562
08:02
in case the Pioneer spacecraft ever encounters extraterrestrials.
152
482053
4002
08:06
So once we decided what sort of files we want to encode,
153
486861
2929
08:09
we package up the data,
154
489814
1468
08:11
convert those zeroes and ones to A's, T's, C's and G's,
155
491306
3654
08:14
and then we just send this file off to a synthesis company.
156
494984
3277
08:18
And this is what we got back.
157
498285
1770
08:20
Our files were in this tube.
158
500079
1919
08:22
All we had to do was sequence it.
159
502022
2098
08:24
This all sounds pretty straightforward,
160
504525
2531
08:27
but the difference between a really cool, fun idea
161
507080
2978
08:30
and something we can actually use
162
510082
2155
08:32
is overcoming these practical challenges.
163
512261
2496
08:35
Now, while DNA is more robust than any man-made device,
164
515453
3972
08:39
it's not perfect.
165
519449
1285
08:40
It does have some weaknesses.
166
520758
1950
08:43
We recover our message by sequencing the DNA,
167
523364
3431
08:46
and every time data is retrieved,
168
526819
2013
08:48
we lose the DNA.
169
528856
1786
08:50
That's just part of the sequencing process.
170
530666
2414
08:53
We don't want to run out of data,
171
533104
1935
08:55
but luckily, there's a way to copy the DNA
172
535063
3096
08:58
that's even cheaper and easier than synthesizing it.
173
538183
4585
09:03
We actually tested a way to make 200 trillion copies of our files,
174
543275
4858
09:08
and we recovered all the data without error.
175
548157
2732
09:11
So sequencing also introduces errors into our DNA,
176
551556
3867
09:15
into the A's, T's, C's and G's.
177
555447
2307
09:18
Nature has a way to deal with this in our cells.
178
558135
2978
09:21
But our data is stored in synthetic DNA in a tube,
179
561137
5890
09:27
so we had to find our own way to overcome this problem.
180
567051
3252
09:30
We decided to use an algorithm that was used to stream videos.
181
570724
4243
09:35
When you're streaming a video,
182
575452
1453
09:36
you're essentially trying to recover the original video, the original file.
183
576929
4461
09:41
When we're trying to recover our original files,
184
581414
2909
09:44
we're simply sequencing.
185
584347
1848
09:46
But really, both of these processes are about recovering enough zeroes and ones
186
586219
4088
09:50
to put our data back together.
187
590331
1793
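
The idea of "recovering enough zeroes and ones" can be illustrated with a fountain-style code, a family of codes used for streaming over lossy networks. The sketch below is a simplified stand-in, not the speaker's actual algorithm: the droplet format, the cap of three chunks per droplet, and the names `make_droplet` and `decode` are all assumptions. Each droplet is the XOR of a pseudo-random subset of file chunks. The receiver can lose droplets and still rebuild the file once enough have arrived.

```python
import random


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pick_indices(seed: int, num_chunks: int) -> list:
    """Derive the subset of chunk indices from the seed alone."""
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, num_chunks))  # hypothetical degree cap
    return rng.sample(range(num_chunks), degree)


def make_droplet(chunks: list, seed: int) -> tuple:
    """XOR the chunks selected by the seed into one payload."""
    payload = bytes(len(chunks[0]))
    for i in pick_indices(seed, len(chunks)):
        payload = xor(payload, chunks[i])
    return seed, payload


def decode(droplets: list, num_chunks: int):
    """Peeling decoder: solve any droplet with exactly one unknown chunk."""
    recovered = {}
    pending = [(pick_indices(s, num_chunks), p) for s, p in droplets]
    progress = True
    while progress and len(recovered) < num_chunks:
        progress = False
        for indices, payload in pending:
            unknown = [i for i in indices if i not in recovered]
            if len(unknown) == 1:
                for i in indices:
                    if i in recovered:
                        payload = xor(payload, recovered[i])
                recovered[unknown[0]] = payload
                progress = True
    if len(recovered) < num_chunks:
        return None  # not enough droplets yet
    return [recovered[i] for i in range(num_chunks)]


message = b"All movies ever made, in a tube!"
chunks = [message[i:i + 4] for i in range(0, len(message), 4)]

droplets, result, seed = [], None, 0
while result is None and seed < 10_000:
    if seed % 3 != 0:  # every third droplet is "lost" on the way
        droplets.append(make_droplet(chunks, seed))
    seed += 1
    result = decode(droplets, len(chunks))

assert result is not None and b"".join(result) == message
```

No particular droplet is required. The decoder only needs enough of them, which is why a scheme like this tolerates both lost packets in streaming and lost or damaged molecules in a tube.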
09:52
And so, because of our coding strategy,
188
592711
2041
09:54
we were able to package up all of our data
189
594776
2551
09:57
in a way that allowed us to make millions and trillions of copies
190
597351
3772
10:01
and still always recover all of our files back.
191
601147
2976
10:04
This is the movie we encoded.
192
604708
1750
10:06
It's one of the first movies ever made,
193
606482
2580
10:09
and now the first to be copied more than 200 trillion times on DNA.
194
609086
4759
10:14
Soon after our work was published,
195
614377
2130
10:16
we participated in an "Ask Me Anything" on the website Reddit.
196
616531
3747
10:20
If you're a fellow nerd, you're very familiar with this website.
197
620302
3175
10:23
Most questions were thoughtful.
198
623501
1945
10:25
Some were comical.
199
625470
1872
10:27
For example, one user wanted to know when we would have a literal thumb drive.
200
627366
4128
10:32
Now, the thing is,
201
632091
2276
10:34
our DNA already stores everything needed to make us who we are.
202
634391
4142
10:38
It's a lot safer to store data
203
638557
3818
10:42
in synthetic DNA in a tube.
204
642399
2821
10:46
Writing and reading data from DNA is obviously a lot more time-consuming
205
646704
5426
10:52
than just saving all your files on a hard drive --
206
652154
3095
10:55
for now.
207
655273
1291
10:57
So initially, we should focus on long-term storage.
208
657159
3781
11:02
Most data are ephemeral.
209
662630
2310
11:04
It's really hard to grasp what's important today,
210
664964
2588
11:07
or what will be important for future generations.
211
667576
3252
11:10
But the point is, we don't have to decide today.
212
670852
2563
11:14
There's this great program by UNESCO called the "Memory of the World" program.
213
674065
4988
11:19
It's been created to preserve historical materials
214
679077
3267
11:22
that are considered of value to all of humanity.
215
682368
3127
11:26
Items are nominated to be added to the collection,
216
686210
2977
11:29
including that film that we encoded.
217
689211
2255
11:32
While a wonderful way to preserve human heritage,
218
692188
3582
11:35
it doesn't have to be a choice.
219
695794
1912
11:38
Instead of asking the current generation -- us --
220
698088
3454
11:41
what might be important in the future,
221
701566
2222
11:43
we could store everything in DNA.
222
703812
2334
11:47
Storage is not just about how many bytes
223
707543
2440
11:50
but how well we can actually store the data and recover it.
224
710007
3501
11:53
There's always been this tension between how much data we can generate
225
713940
3431
11:57
and how much we can recover
226
717395
1715
11:59
and how much we can store.
227
719134
1769
12:01
Every advance in writing data has required a new way to read it.
228
721841
4039
12:05
We can no longer read old media.
229
725904
2343
12:08
How many of you even have a disk drive in your laptop,
230
728271
3741
12:12
never mind a floppy drive?
231
732036
1724
12:14
This will never be the case with DNA.
232
734151
2552
12:16
As long as we're around, DNA is around,
233
736727
3177
12:19
and we'll find a way to sequence it.
234
739928
2180
12:23
Archiving the world around us is part of human nature.
235
743214
3459
12:27
This is the progress we've made in digital storage in 60 years,
236
747172
4624
12:31
at a time when we were only beginning to understand DNA.
237
751820
3376
12:35
Yet, we've made similar progress in half that time with DNA sequencers,
238
755725
4845
12:40
and as long as we're around, DNA will never be obsolete.
239
760594
4943
12:46
Thank you.
240
766107
1181
12:47
(Applause)
241
767312
4981