Frederic Kaplan: How I built an information time machine

78,570 views ・ 2014-01-09

TED



Chinese translation: Li Yang. Review: Neo Liu.
00:12
This is an image of the planet Earth. It looks very much like the Apollo pictures that are very well known. There is something different: you can click on it, and if you click on it, you can zoom in on almost any place on the Earth.
00:27
For instance, this is a bird's-eye view of the EPFL campus. In many cases, you can also see how a building looks from a nearby street. This is pretty amazing.
00:39
But there's something missing in this wonderful tour: time. I'm not really sure when this picture was taken. I'm not even sure it was taken at the same moment as the bird's-eye view.
00:55
In my lab, we develop tools to travel not only in space but also through time. The kind of question we're asking is: is it possible to build something like Google Maps of the past? Can I add a slider on top of Google Maps and just change the year, seeing how it was 100 years before, 1,000 years before? Is that possible? Can I reconstruct social networks of the past? Can I make a Facebook of the Middle Ages? So, can I build time machines?
01:27
Maybe we can just say, "No, it's not possible." Or maybe we can think of it from an information point of view. This is what I call the information mushroom. Vertically, you have time, and horizontally, the amount of digital information available. Obviously, for the last 10 years, we have a lot of information. And obviously, the further we go into the past, the less information we have. If we want to build something like Google Maps of the past, or Facebook of the past, we need to enlarge this space; we need to make it into something like a rectangle. How do we do that?
01:57
One way is digitization. There's a lot of material available: newspapers, printed books, thousands of printed books. I can digitize all of these. I can extract information from them. Of course, the further you go into the past, the less information you will have. So it might not be enough.
02:18
So I can do what historians do: I can extrapolate. This is what we call, in computer science, simulation. If I take a log book, I can consider that it's not just the log book of a Venetian captain on one particular journey. I can consider that it is actually a log book representative of many journeys of that period. I'm extrapolating. If I have a painting of a facade, I can consider that it's not just that particular building; it probably also shares the same grammar as buildings about which we have lost all information.
02:52
So if we want to construct a time machine, we need two things. We need very large archives, and we need excellent specialists.
03:02
The Venice Time Machine, the project I'm going to talk to you about, is a joint project between the EPFL and the University of Venice Ca' Foscari.
03:11
There's something very peculiar about Venice: its administration has been very, very bureaucratic. They have been keeping track of everything, almost like Google today. At the Archivio di Stato, you have 80 kilometers of archives documenting every aspect of the life of Venice over more than 1,000 years. You have every boat that went out, every boat that came in. You have every change that was made in the city. It is all there.
03:40
We are setting up a 10-year digitization program whose objective is to transform this immense archive into a giant information system. The kind of objective we want to reach is 450 books digitized per day.
03:56
Of course, digitizing is not enough, because most of these documents are in Latin, in Tuscan, in Venetian dialect, so you need to transcribe them, in some cases translate them, and index them, and this is obviously not easy. In particular, traditional optical character recognition methods, which can be used for printed documents, do not work well on handwritten documents.
04:20
So the solution is actually to take inspiration from another domain: speech recognition. This is a domain where something that seems impossible can actually be done, simply by adding constraints. If you have a very good model of the language being used; if you have a very good model of the documents, of how well they are structured (and these are administrative documents, which in many cases are well structured); and if you divide this huge archive into smaller subsets whose documents share similar features, then there's a chance of success.
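The speech-recognition idea of "adding constraints" can be illustrated with noisy-channel decoding: an optical model proposes candidate readings for each handwritten word, and a language model trained on one homogeneous class of documents picks the most plausible sequence. The Python below is a minimal Viterbi sketch of that general technique, not the project's actual models; the candidate words, their optical probabilities and the bigram table are all invented placeholders.

```python
import math

# Hypothetical optical-model output: for each handwritten word image,
# candidate readings with invented probabilities P(image | word).
RECOGNIZER = [
    {"nave": 0.40, "naue": 0.35, "name": 0.25},
    {"partita": 0.30, "partida": 0.45, "parlita": 0.25},
    {"da": 0.5, "dc": 0.5},
    {"Corfu": 0.6, "Corsu": 0.4},
]

# Hypothetical bigram language model for one document type (e.g. port registers).
BIGRAMS = {
    ("<s>", "nave"): 0.5,
    ("nave", "partita"): 0.6,
    ("partita", "da"): 0.7,
    ("da", "Corfu"): 0.3,
}
FLOOR = 1e-4  # probability assigned to any bigram not in the table


def lm_logprob(prev: str, word: str) -> float:
    """log P(word | previous word) under the toy bigram model."""
    return math.log(BIGRAMS.get((prev, word), FLOOR))


def decode(lattice: list[dict[str, float]]) -> list[str]:
    """Viterbi search maximising log P(image | word) + log P(word | previous word)."""
    best = {"<s>": (0.0, [])}  # word -> (best score, best path ending in that word)
    for candidates in lattice:
        new_best = {}
        for word, p_obs in candidates.items():
            # Choose the predecessor that gives this candidate its highest total score.
            score, path = max(
                (s + lm_logprob(prev, word) + math.log(p_obs), pth)
                for prev, (s, pth) in best.items()
            )
            new_best[word] = (score, path + [word])
        best = new_best
    return max(best.values())[1]


if __name__ == "__main__":
    print(decode(RECOGNIZER))
```

On its own, the optical model would prefer "partida" for the second word; the language model's strong preference for "nave partita" overrides it. Dividing the archive into subsets with similar features would correspond, in this picture, to fitting a separate, tighter language model per subset.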
04:54
If we reach that stage, then there's something else: we can extract events from these documents. In fact, probably 10 billion events can be extracted from this archive. And this giant information system can be searched in many ways. You can ask questions like, "Who lived in this palazzo in 1323?" "How much did a sea bream cost at the Rialto market in 1434?" "What was the salary of a glass maker in Murano, maybe over a decade?" You can ask even bigger questions, because the information will be semantically coded.
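One way to picture "semantically coded" events is as typed assertions (subject, predicate, object) with a validity interval and a pointer to the source document. The Python sketch below assumes such a schema; the field names, the predicates `resides_in` and `works_as`, and every record are hypothetical placeholders, not data from the Archivio di Stato.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A semantically coded assertion extracted from one document (hypothetical schema)."""
    subject: str
    predicate: str
    obj: str
    start: int   # first year the assertion holds
    end: int     # last year the assertion holds (inclusive)
    source: str  # identifier of the document it was extracted from


# Invented placeholder records, not real archival data.
EVENTS = [
    Event("Person A", "resides_in", "Palazzo X", 1310, 1330, "doc-001"),
    Event("Person B", "resides_in", "Palazzo X", 1331, 1350, "doc-002"),
    Event("Person C", "resides_in", "Palazzo Y", 1320, 1325, "doc-003"),
    Event("Person C", "works_as", "glass maker", 1326, 1340, "doc-004"),
]


def query(predicate, obj=None, year=None):
    """Return events with the given predicate, optionally filtered by object and year."""
    return [
        e for e in EVENTS
        if e.predicate == predicate
        and (obj is None or e.obj == obj)
        and (year is None or e.start <= year <= e.end)
    ]


# "Who lived in this palazzo in 1323?"
print([e.subject for e in query("resides_in", obj="Palazzo X", year=1323)])

# A "bigger" question over the same coded data: recorded residents per building in 1323.
print(Counter(e.obj for e in query("resides_in", year=1323)))
```

Because every answer is an `Event`, each result can be traced back through its `source` field to the document it was extracted from.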
05:25
And then what you can do is put that in space, because much of this information is spatial. And from that, you can do things like reconstructing the extraordinary journey of that city, which managed to sustain its development over a thousand years while maintaining, all the time, a form of equilibrium with its environment. You can reconstruct that journey and visualize it in many different ways.
05:48
But of course, you cannot understand Venice if you just look at the city. You have to put it in a larger European context. So the idea is also to document all the things that worked at the European level. We can also reconstruct the journey of the Venetian maritime empire: how it progressively controlled the Adriatic Sea, how it became the most powerful medieval empire of its time, controlling most of the sea routes from the east to the south.
06:17
But you can do even more, because in these maritime routes there are regular patterns. You can go one step further and actually create a simulation system, a Mediterranean simulator capable of reconstructing even the information we are missing, which would let you ask questions as if you were using a route planner: "If I am in Corfu in June 1323 and want to go to Constantinople, where can I take a boat?" We can probably answer this question with a precision of one, two or three days. "How much will it cost?" "What are the chances of encountering pirates?"
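A route-planner query over a simulated sailing schedule is an earliest-arrival problem. The sketch below uses the connection scan algorithm, a standard timetable-routing technique chosen here for illustration; the talk does not say how the Mediterranean simulator works. The `Sailing` records are hypothetical simulator output: the ports are real, but every date is invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sailing:
    """One simulated voyage leg (hypothetical simulator output)."""
    origin: str
    destination: str
    departs: date
    arrives: date


# Real ports, invented dates. datetime.date uses the proleptic Gregorian
# calendar, a simplification for medieval (Julian-calendar) sources.
SAILINGS = [
    Sailing("Corfu", "Modon", date(1323, 6, 3), date(1323, 6, 7)),
    Sailing("Corfu", "Constantinople", date(1323, 6, 20), date(1323, 7, 8)),
    Sailing("Modon", "Negroponte", date(1323, 6, 9), date(1323, 6, 13)),
    Sailing("Negroponte", "Constantinople", date(1323, 6, 15), date(1323, 6, 24)),
]


def earliest_arrival(sailings, start, target, ready):
    """Connection scan: visit legs in departure order, keeping the earliest arrival per port."""
    best = {start: ready}  # earliest day we can be in each reachable port
    used = {}              # leg that achieved that arrival, for rebuilding the route
    for leg in sorted(sailings, key=lambda s: s.departs):
        at_origin = best.get(leg.origin)
        if at_origin is None or at_origin > leg.departs:
            continue  # we cannot be in the origin port in time to board this leg
        if leg.destination not in best or leg.arrives < best[leg.destination]:
            best[leg.destination] = leg.arrives
            used[leg.destination] = leg
    if target not in best:
        return None
    route, port = [], target
    while port != start:  # walk back from the target along the legs used
        leg = used[port]
        route.append(leg)
        port = leg.origin
    return route[::-1]


# "If I am in Corfu in June 1323 and want to go to Constantinople, where can I take a boat?"
for leg in earliest_arrival(SAILINGS, "Corfu", "Constantinople", date(1323, 6, 1)):
    print(f"{leg.departs}: {leg.origin} -> {leg.destination}, arriving {leg.arrives}")
```

With these invented dates, a traveller ready on 1 June chains three legs via Modon and Negroponte and arrives on 24 June, earlier than the direct sailing; one ready only on 5 June misses the first leg and is routed onto the direct boat. A real simulator would presumably attach uncertainty to each date, in line with the talk's "one, two or three days" of precision, rather than return fixed days.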
07:00
Of course, you understand that the central scientific challenge of a project like this one is qualifying, quantifying and representing uncertainty and inconsistency at each step of this process. There are errors everywhere: errors in the documents (the wrong name for the captain, boats that never actually took to sea), errors in translation, interpretative biases. And on top of that, if you add algorithmic processes, you will have errors in recognition and errors in extraction, so you have very, very uncertain data.
07:38
So how can we detect and correct these inconsistencies? How can we represent that form of uncertainty? It's difficult. One thing you can do is document each step of the process, coding not only the historical information but also what we call the meta-historical information: how historical knowledge is constructed, documenting each step.
08:00
That will not guarantee that we actually converge toward a single story of Venice, but we can probably reconstruct a fully documented potential story of Venice. Maybe there's not a single map. Maybe there are several maps. The system should allow for that, because we have to deal with a new form of uncertainty, which is really new for this type of giant database.
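Documenting each step and allowing for several maps can be pictured as storing competing hypotheses side by side, each carrying the chain of steps that produced it. The Python sketch below is one possible representation, not the project's: it assumes each step can be given a probability of being error-free and, naively, that steps fail independently. The names, steps and numbers are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Step:
    """One documented processing step: meta-historical record of how a claim was built."""
    name: str          # e.g. "source reliability", "transcription", "extraction"
    confidence: float  # assumed probability that this step introduced no error


@dataclass
class Hypothesis:
    """A candidate historical claim plus the steps that produced it."""
    claim: str
    steps: list[Step] = field(default_factory=list)

    def confidence(self) -> float:
        # Naive simplifying assumption: steps fail independently, so confidences multiply.
        return prod(step.confidence for step in self.steps)


def rank(hypotheses: list[Hypothesis]) -> list[tuple[str, float]]:
    """Keep every competing hypothesis (several "maps"), ordered by relative weight."""
    total = sum(h.confidence() for h in hypotheses)
    weighted = [(h.claim, h.confidence() / total) for h in hypotheses]
    return sorted(weighted, key=lambda pair: pair[1], reverse=True)


# Hypothetical example: two readings of a captain's name in the same register entry.
CANDIDATES = [
    Hypothesis("Captain of voyage V is Person A",
               [Step("source reliability", 0.9), Step("transcription", 0.7), Step("extraction", 0.9)]),
    Hypothesis("Captain of voyage V is Person B",
               [Step("source reliability", 0.9), Step("transcription", 0.3), Step("extraction", 0.9)]),
]

for claim, weight in rank(CANDIDATES):
    print(f"{weight:.2f}  {claim}")
```

Neither reading is discarded: both remain queryable with their relative weights (about 0.7 and 0.3 here), and each weight can be explained by inspecting the `steps` list behind it.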
08:22
And how should we communicate this new research to a large audience? Again, Venice is extraordinary for that. With the millions of visitors who come every year, it's actually one of the best places to try to invent the museum of the future.
08:38
Imagine: horizontally, you see the reconstructed map of a given year, and vertically, you see the documents that served the reconstruction, paintings for instance. Imagine an immersive system that lets you dive into and reconstruct the Venice of a given year, an experience you could share within a group. Conversely, imagine that you start from a document, a Venetian manuscript, and you show what you can construct out of it, how it is decoded, how the context of that document can be recreated.
09:11
This is an image from an exhibit, currently running in Geneva, that uses this type of system.
09:17
So to conclude, we can say that research in the humanities is about to undergo an evolution that may be similar to what happened in the life sciences 30 years ago. It's really a question of scale. We see projects that go far beyond what any single research team can do, and this is really new for the humanities, which are very often used to working in small groups or with only a couple of researchers.
09:51
When you visit the Archivio di Stato, you feel this is beyond what any single team can do, and that it should be a joint and common effort. So what we must do for this paradigm shift is to foster a new generation of "digital humanists" who are going to be ready for this shift.
10:08
I thank you very much.
10:10
(Applause)