Frederic Kaplan: How I built an information time machine

78,548 views ・ 2014-01-09

TED



00:12
This is an image of the planet Earth.
0
12285
2893
00:15
It looks very much like the Apollo pictures
1
15178
3093
00:18
that are very well known.
2
18271
1611
00:19
There is something different;
3
19882
2070
00:21
you can click on it,
4
21952
1447
00:23
and if you click on it,
5
23399
1198
00:24
you can zoom in on almost any place on the Earth.
6
24597
3072
00:27
For instance, this is a bird's-eye view
7
27669
1999
00:29
of the EPFL campus.
8
29668
2666
00:32
In many cases, you can also see
9
32334
2108
00:34
how a building looks from a nearby street.
10
34442
3740
00:38
This is pretty amazing.
11
38182
1422
00:39
But there's something missing in this wonderful tour:
12
39604
3427
00:43
It's time.
13
43031
2188
00:45
I'm not really sure when this picture was taken.
14
45219
3070
00:48
I'm not even sure it was taken
15
48289
1412
00:49
at the same moment as the bird's-eye view.
16
49701
6083
00:55
In my lab, we develop tools
17
55784
2209
00:57
to travel not only in space
18
57993
1764
00:59
but also through time.
19
59757
2558
01:02
The kind of question we're asking is:
20
62315
1870
01:04
Is it possible to build something
21
64185
1393
01:05
like Google Maps of the past?
22
65578
2178
01:07
Can I add a slider on top of Google Maps
23
67756
3310
01:11
and just change the year,
24
71066
1803
01:12
seeing how it was 100 years before,
25
72869
1791
01:14
1,000 years before?
26
74660
1669
01:16
Is that possible?
27
76329
2123
01:18
Can I reconstruct social networks of the past?
28
78452
2252
01:20
Can I make a Facebook of the Middle Ages?
29
80704
3049
01:23
So, can I build time machines?
30
83753
3776
01:27
Maybe we can just say, "No, it's not possible."
31
87529
2565
01:30
Or, maybe, we can think of it from an information point of view.
32
90094
3810
01:33
This is what I call the information mushroom.
33
93904
3190
01:37
Vertically, you have time,
34
97094
1583
01:38
and horizontally, the amount of digital information available.
35
98677
2740
01:41
Obviously, in the last 10 years, we have much information.
36
101417
3482
01:44
And obviously the more we go in the past, the less information we have.
37
104899
3548
01:48
If we want to build something like Google Maps of the past,
38
108447
2318
01:50
or Facebook of the past,
39
110765
1494
01:52
we need to enlarge this space,
40
112259
1574
01:53
we need to make that like a rectangle.
41
113833
1938
01:55
How do we do that?
42
115771
1510
01:57
One way is digitization.
43
117281
2098
01:59
There's a lot of material available --
44
119395
1779
02:01
newspapers, printed books, thousands of printed books.
45
121190
6270
02:07
I can digitize all these.
46
127460
1768
02:09
I can extract information from these.
47
129228
2737
02:11
Of course, the more you go in the past, the less information you will have.
48
131965
3543
02:15
So, it might not be enough.
49
135508
2646
02:18
So, I can do what historians do.
50
138154
2408
02:20
I can extrapolate.
51
140562
1524
02:22
This is what we call, in computer science, simulation.
52
142086
4470
02:26
If I take a log book,
53
146556
1751
02:28
I can consider, it's not just a log book
54
148307
2404
02:30
of a Venetian captain going on a particular journey.
55
150711
2972
02:33
I can consider it is actually a log book
56
153683
1643
02:35
which is representative of many journeys of that period.
57
155326
2582
02:37
I'm extrapolating.
58
157908
2245
02:40
If I have a painting of a facade,
59
160153
2038
02:42
I can consider it's not just that particular building,
60
162191
2751
02:44
but probably it also shares the same grammar
61
164942
3932
02:48
of buildings for which we have lost all information.
62
168874
4041
02:52
So if we want to construct a time machine,
63
172915
2858
02:55
we need two things.
64
175773
1339
02:57
We need very large archives,
65
177112
2234
02:59
and we need excellent specialists.
66
179346
2742
03:02
The Venice Time Machine,
67
182088
1874
03:03
the project I'm going to talk to you about,
68
183962
1805
03:05
is a joint project between the EPFL
69
185767
3020
03:08
and the University of Venice Ca' Foscari.
70
188787
2978
03:11
There's something very peculiar about Venice,
71
191765
2165
03:13
that its administration has been
72
193930
2674
03:16
very, very bureaucratic.
73
196604
2194
03:18
They've been keeping track of everything,
74
198798
2193
03:20
almost like Google today.
75
200991
2915
03:23
At the Archivio di Stato,
76
203906
1514
03:25
you have 80 kilometers of archives
77
205420
1764
03:27
documenting every aspect
78
207184
2009
03:29
of the life of Venice over more than 1,000 years.
79
209193
2246
03:31
You have every boat that goes out,
80
211439
1920
03:33
every boat that comes in.
81
213359
1076
03:34
You have every change that was made in the city.
82
214435
2797
03:37
This is all there.
83
217232
3291
03:40
We are setting up a 10-year digitization program
84
220523
3908
03:44
which has the objective of transforming
85
224431
1677
03:46
this immense archive
86
226108
1384
03:47
into a giant information system.
87
227492
2426
03:49
The type of objective we want to reach
88
229918
1857
03:51
is 450 books a day that can be digitized.
89
231775
4726
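To give a sense of scale, the stated target can be turned into a rough total. This is a back-of-the-envelope sketch only: the 450 books a day and the 10-year duration come from the talk, while the number of working days per year is an assumption made here for illustration.

```python
BOOKS_PER_DAY = 450           # target stated in the talk
PROGRAM_YEARS = 10            # program length stated in the talk
WORKING_DAYS_PER_YEAR = 250   # assumption: the talk gives no figure


def total_books(books_per_day: int, years: int, days_per_year: int) -> int:
    """Books digitized over the whole program at a constant daily rate."""
    return books_per_day * years * days_per_year


if __name__ == "__main__":
    print(total_books(BOOKS_PER_DAY, PROGRAM_YEARS, WORKING_DAYS_PER_YEAR))
```

Changing the assumed number of working days scales the total linearly, so the result should be read as an order of magnitude rather than a forecast.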
03:56
Of course, when you digitize, that's not enough,
90
236501
2247
03:58
because these documents,
91
238748
1287
04:00
most of them are in Latin, in Tuscan,
92
240035
2639
04:02
in Venetian dialect,
93
242689
1515
04:04
so you need to transcribe them,
94
244204
1675
04:05
to translate them in some cases,
95
245879
1681
04:07
to index them,
96
247560
1120
04:08
and this is obviously not easy.
97
248680
2164
04:10
In particular, traditional optical character recognition methods
98
250844
3844
04:14
that can be used for printed manuscripts,
99
254688
1424
04:16
they do not work well on handwritten documents.
100
256112
4004
04:20
So the solution is actually to take inspiration
101
260116
2130
04:22
from another domain: speech recognition.
102
262246
2901
04:25
This is a domain of something that seems impossible,
103
265147
2055
04:27
which can actually be done,
104
267202
2537
04:29
simply by putting additional constraints.
105
269739
2194
04:31
If you have a very good model
106
271933
1586
04:33
of a language which is used,
107
273519
1526
04:35
if you have a very good model of a document,
108
275045
2086
04:37
how well they are structured.
109
277131
1432
04:38
And these are administrative documents.
110
278563
1353
04:39
They are well structured in many cases.
111
279931
2132
04:42
If you divide this huge archive into smaller subsets
112
282063
3308
04:45
where a smaller subset actually shares similar features,
113
285371
2877
04:48
then there's a chance of success.
114
288248
4031
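The idea of adding constraints, as in speech recognition, can be sketched as rescoring: the image recognizer proposes several readings of a word, and a language model for the document type favors the readings that are plausible words. The sketch below is a minimal illustration, not the project's actual method. The candidate readings, their scores, the tiny lexicon, and the names `best_transcription`, `lm_weight` and `UNKNOWN_PROB` are all hypothetical.

```python
import math

# Hypothetical recognizer output for one handwritten word: each candidate
# reading mapped to the log-score the image model assigns to it.
candidates = {
    "ducato": math.log(0.30),
    "ducalo": math.log(0.45),
    "dvcato": math.log(0.25),
}

# Hypothetical language model for this document type: relative word
# frequencies. Words not listed get a small floor probability.
lexicon = {"ducato": 0.02, "nave": 0.01}
UNKNOWN_PROB = 1e-6


def best_transcription(candidates, lexicon, lm_weight=1.0):
    """Pick the reading that maximizes image score plus weighted LM score."""
    def score(word):
        lm_logprob = math.log(lexicon.get(word, UNKNOWN_PROB))
        return candidates[word] + lm_weight * lm_logprob

    return max(candidates, key=score)


if __name__ == "__main__":
    print(best_transcription(candidates, lexicon))                 # language model applied
    print(best_transcription(candidates, lexicon, lm_weight=0.0))  # image score only
```

With the language model switched off, the visually most likely reading wins even though it is not a word in the lexicon. With it switched on, the known word is preferred. Splitting the archive into subsets with similar features would correspond to using a different lexicon per subset.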
04:54
If we reach that stage, then there's something else:
115
294761
2435
04:57
we can extract events from these documents.
116
297196
3522
05:00
Actually probably 10 billion events
117
300718
2298
05:03
can be extracted from this archive.
118
303016
1931
05:04
And this giant information system
119
304947
1724
05:06
can be searched in many ways.
120
306671
1816
05:08
You can ask questions like,
121
308487
1368
05:09
"Who lived in this palazzo in 1323?"
122
309855
2760
05:12
"How much did a sea bream cost at the Rialto market
123
312615
2222
05:14
in 1434?"
124
314837
1724
05:16
"What was the salary
125
316561
1460
05:18
of a glass maker in Murano
126
318021
2045
05:20
maybe over a decade?"
127
320066
1406
05:21
You can ask even bigger questions
128
321472
1422
05:22
because it will be semantically coded.
129
322894
2738
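Questions like the ones above become simple filters once events are stored with explicit fields. The following is a minimal sketch under stated assumptions: the `Event` fields, the `find` helper, and the sample records are hypothetical placeholders, not the project's schema or real archival data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "residence", "price", "salary"
    subject: str  # who or what the event is about
    place: str
    year: int
    value: str    # free-form payload


def find(events, *, kind, place=None, year=None):
    """Return events of the given kind, optionally filtered by place and year."""
    return [
        e for e in events
        if e.kind == kind
        and (place is None or e.place == place)
        and (year is None or e.year == year)
    ]


# Placeholder records for illustration only, not real archival data.
events = [
    Event("residence", "Person A", "Palazzo X", 1323, "tenant"),
    Event("residence", "Person B", "Palazzo X", 1324, "tenant"),
    Event("price", "sea bream", "Rialto market", 1434, "price not recorded"),
]

if __name__ == "__main__":
    for e in find(events, kind="residence", place="Palazzo X", year=1323):
        print(e.subject)
```

Because place is an explicit field, the same records can later be joined to map coordinates, which is what putting the events "in space" would require.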
05:25
And then what you can do is put that in space,
130
325632
2140
05:27
because much of this information is spatial.
131
327772
2173
05:29
And from that, you can do things like
132
329945
1935
05:31
reconstructing this extraordinary journey
133
331880
2113
05:33
of that city that managed to have a sustainable development
134
333993
3356
05:37
over a thousand years,
135
337349
2126
05:39
managing to have all the time
136
339475
1620
05:41
a form of equilibrium with its environment.
137
341095
2861
05:43
You can reconstruct that journey,
138
343956
1248
05:45
visualize it in many different ways.
139
345204
2896
05:48
But of course, you cannot understand Venice if you just look at the city.
140
348100
2699
05:50
You have to put it in a larger European context.
141
350799
2396
05:53
So the idea is also to document all the things
142
353195
2821
05:56
that worked at the European level.
143
356016
2423
05:58
We can reconstruct also the journey
144
358439
1964
06:00
of the Venetian maritime empire,
145
360403
1990
06:02
how it progressively controlled the Adriatic Sea,
146
362393
3166
06:05
how it became the most powerful medieval empire
147
365559
3746
06:09
of its time,
148
369305
1561
06:10
controlling most of the sea routes
149
370866
2172
06:13
from the east to the south.
150
373038
2933
06:17
But you can even do other things,
151
377305
2316
06:19
because in these maritime routes,
152
379621
2277
06:21
there are regular patterns.
153
381898
1975
06:23
You can go one step beyond
154
383889
2493
06:26
and actually create a simulation system,
155
386382
2120
06:28
create a Mediterranean simulator
156
388502
2815
06:31
which is capable actually of reconstructing
157
391317
2593
06:33
even the information we are missing,
158
393910
2202
06:36
which would enable us to have questions you could ask
159
396112
2988
06:39
like if you were using a route planner.
160
399100
2988
06:42
"If I am in Corfu in June 1323
161
402088
3071
06:45
and want to go to Constantinople,
162
405159
2526
06:47
where can I take a boat?"
163
407685
2143
06:49
Probably we can answer this question
164
409828
1367
06:51
with one or two or three days' precision.
165
411195
4473
06:55
"How much will it cost?"
166
415668
1607
06:57
"What are the chances of encountering pirates?"
167
417275
3592
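A route-planner query of this kind can be sketched as a shortest-path search over a graph of ports. The example below uses Dijkstra's algorithm as a stand-in for whatever the simulator would actually do. The port connections and sailing times in `ROUTES` are invented placeholders, and `shortest_route` is a hypothetical helper.

```python
import heapq

# Placeholder sailing times in days between ports. These numbers are
# invented for illustration and are not historical data.
ROUTES = {
    "Corfu": {"Modon": 4, "Negroponte": 9},
    "Modon": {"Negroponte": 4, "Candia": 5},
    "Negroponte": {"Constantinople": 6},
    "Candia": {"Constantinople": 10},
    "Constantinople": {},
}


def shortest_route(routes, start, goal):
    """Dijkstra's algorithm: return (total_days, path), or None if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        days, port, path = heapq.heappop(queue)
        if port == goal:
            return days, path
        if port in visited:
            continue
        visited.add(port)
        for next_port, leg_days in routes.get(port, {}).items():
            if next_port not in visited:
                heapq.heappush(queue, (days + leg_days, next_port, path + [next_port]))
    return None


if __name__ == "__main__":
    print(shortest_route(ROUTES, "Corfu", "Constantinople"))
```

Answering the cost and piracy questions would mean attaching further weights to each leg, and answering with "one or two or three days' precision" would mean replacing the fixed durations with distributions estimated from the log books.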
07:00
Of course, you understand,
168
420867
1811
07:02
the central scientific challenge of a project like this one
169
422678
2609
07:05
is qualifying, quantifying and representing
170
425287
3729
07:09
uncertainty and inconsistency at each step of this process.
171
429016
3330
07:12
There are errors everywhere,
172
432346
2712
07:15
errors in the document, it's the wrong name of the captain,
173
435058
2489
07:17
some of the boats never actually took to sea.
174
437547
3213
07:20
There are errors in translation, interpretative biases,
175
440760
4857
07:25
and on top of that, if you add algorithmic processes,
176
445624
3466
07:29
you're going to have errors in recognition,
177
449090
2949
07:32
errors in extraction,
178
452039
1961
07:34
so you have very, very uncertain data.
179
454000
4481
07:38
So how can we detect and correct these inconsistencies?
180
458481
3757
07:42
How can we represent that form of uncertainty?
181
462238
3660
07:45
It's difficult. One thing you can do
182
465898
2097
07:47
is document each step of the process,
183
467995
2226
07:50
not only coding the historical information
184
470221
2448
07:52
but what we call the meta-historical information,
185
472669
2679
07:55
how is historical knowledge constructed,
186
475348
2663
07:58
documenting each step.
187
478011
1998
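Documenting each step can be sketched as attaching a provenance chain to every claim, so that the uncertainty introduced by scanning, transcription, translation and extraction stays visible. This is a minimal sketch, not the project's data model. The `Step` and `Claim` classes are hypothetical, the confidence values are placeholders, and multiplying them assumes the steps fail independently, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str          # e.g. "scan", "transcription", "translation"
    confidence: float  # estimated probability that this step added no error


@dataclass
class Claim:
    statement: str
    steps: list = field(default_factory=list)

    def record(self, name: str, confidence: float) -> "Claim":
        """Append one processing step to the claim's provenance chain."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self.steps.append(Step(name, confidence))
        return self

    def overall_confidence(self) -> float:
        """Product of step confidences (assumes independent steps)."""
        result = 1.0
        for step in self.steps:
            result *= step.confidence
        return result


if __name__ == "__main__":
    claim = Claim("Ship S left port P in year Y")  # placeholder statement
    claim.record("transcription", 0.9).record("extraction", 0.8)
    print([s.name for s in claim.steps], claim.overall_confidence())
```

Keeping the chain rather than only the final number is what lets two conflicting claims coexist, each with its own documented history.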
08:00
That will not guarantee that we actually converge
188
480009
1645
08:01
toward a single story of Venice,
189
481654
2450
08:04
but probably we can actually reconstruct
190
484104
2138
08:06
a fully documented potential story of Venice.
191
486242
3048
08:09
Maybe there's not a single map.
192
489290
1459
08:10
Maybe there are several maps.
193
490749
2120
08:12
The system should allow for that,
194
492869
2216
08:15
because we have to deal with a new form of uncertainty,
195
495085
2859
08:17
which is really new for this type of giant database.
196
497944
4641
08:22
And how should we communicate
197
502585
2190
08:24
this new research to a large audience?
198
504790
3979
08:28
Again, Venice is extraordinary for that.
199
508769
2663
08:31
With the millions of visitors that come every year,
200
511432
2171
08:33
it's actually one of the best places
201
513603
1763
08:35
to try to invent the museum of the future.
202
515366
2988
08:38
Imagine, horizontally you see the reconstructed map
203
518354
3304
08:41
of a given year,
204
521658
1286
08:42
and vertically, you see the document
205
522944
2958
08:45
that served the reconstruction,
206
525902
1511
08:47
paintings, for instance.
207
527413
3400
08:50
Imagine an immersive system that permits you
208
530813
2580
08:53
to go and dive and reconstruct the Venice of a given year,
209
533393
3502
08:56
some experience you could share within a group.
210
536895
2715
08:59
On the contrary, imagine actually that you start
211
539610
2246
09:01
from a document, a Venetian manuscript,
212
541856
2207
09:04
and you show, actually, what you can construct out of it,
213
544063
3049
09:07
how it is decoded,
214
547112
1772
09:08
how the context of that document can be recreated.
215
548884
2415
09:11
This is an image from an exhibit
216
551299
1885
09:13
which is currently conducted in Geneva
217
553184
2276
09:15
with that type of system.
218
555460
2354
09:17
So to conclude, we can say that
219
557814
2175
09:19
research in the humanities is about to undergo
220
559989
3079
09:23
an evolution which is maybe similar
221
563068
1802
09:24
to what happened to life sciences 30 years ago.
222
564870
4582
09:29
It's really a question of scale.
223
569452
4676
09:34
We see projects which are
224
574130
3303
09:37
far beyond what any single research team can do,
225
577433
3843
09:41
and this is really new for the humanities,
226
581276
2243
09:43
which very often have the habit of working
227
583519
3869
09:47
in small groups or only with a couple of researchers.
228
587388
4008
09:51
When you visit the Archivio di Stato,
229
591396
2118
09:53
you feel this is beyond what any single team can do,
230
593514
2822
09:56
and that should be a joint and common effort.
231
596336
3834
10:00
So what we must do for this paradigm shift
232
600170
3106
10:03
is actually foster a new generation
233
603276
1902
10:05
of "digital humanists"
234
605178
1537
10:06
that are going to be ready for this shift.
235
606715
2090
10:08
I thank you very much.
236
608805
1959
10:10
(Applause)
237
610764
4000