Big Data - Tim Smith

Exploring the Frontier of "Big Data" - Tim Smith

589,741 views ・ 2013-05-03

TED-Ed



00:00
Translator: Andrea McDonough Reviewer: Jessica Ruby
Chinese translation: Hanlin Xu Reviewer: Bighead Ge
00:31
Big data is an elusive concept.
00:35
It represents an amount of digital information,
00:38
which is uncomfortable to store,
00:40
transport,
00:41
or analyze.
00:43
Big data is so voluminous
00:45
that it overwhelms the technologies of the day
00:48
and challenges us to create the next generation
00:50
of data storage tools and techniques.
00:59
So, big data isn't new.
01:01
In fact, physicists at CERN have been wrangling
01:03
with the challenge of their ever-expanding big data for decades.
01:09
Fifty years ago, CERN's data could be stored
01:11
in a single computer.
01:13
OK, so it wasn't your usual computer,
01:15
this was a mainframe computer
01:17
that filled an entire building.
01:21
To analyze the data,
01:22
physicists from around the world traveled to CERN
01:25
to connect to the enormous machine.
01:31
In the 1970s, our ever-growing big data
01:33
was distributed across different sets of computers,
01:36
which mushroomed at CERN.
01:38
Each set was joined together
01:40
in dedicated, homegrown networks.
01:42
But physicists collaborated without regard
01:44
for the boundaries between sets,
01:46
hence needed to access data on all of these.
01:49
So, we bridged the independent networks together
01:51
in our own CERNET.
01:54
In the 1980s, islands of similar networks
01:57
speaking different dialects
01:58
sprung up all over Europe and the States,
02:01
making remote access possible but torturous.
02:04
To make it easy for our physicists across the world
02:06
to access the ever-expanding big data
02:08
stored at CERN without traveling,
02:10
the networks needed to be talking
02:12
with the same language.
02:13
We adopted the fledgling internetworking standard from the States,
02:17
followed by the rest of Europe,
02:18
and we established the principal link at CERN
02:20
between Europe and the States in 1989,
02:23
and the truly global internet took off!
02:28
Physicists could easily then access
02:30
the terabytes of big data
02:32
remotely from around the world,
02:33
generate results,
02:35
and write papers in their home institutes.
02:37
Then, they wanted to share their findings
02:39
with all their colleagues.
02:40
To make this information sharing easy,
02:42
we created the web in the early 1990s.
02:45
Physicists no longer needed to know
02:47
where the information was stored
02:48
in order to find it and access it on the web,
02:51
an idea which caught on across the world
02:53
and has transformed the way we communicate
02:55
in our daily lives.
03:00
During the early 2000s,
03:01
the continued growth of our big data
03:03
outstripped our capability to analyze it at CERN,
03:06
despite having buildings full of computers.
03:10
We had to start distributing the petabytes of data
[Translator's note: a petabyte (PB) is a unit of digital information, now commonly used to state the total capacity of network drives or other high-capacity storage media.]
03:12
to our collaborating partners
03:14
in order to employ local computing and storage
03:17
at hundreds of different institutes.
03:19
In order to orchestrate these interconnected resources
03:22
with their diverse technologies,
03:24
we developed a computing grid,
03:26
enabling the seamless sharing
03:27
of computing resources around the globe.
03:30
This relies on trust relationships and mutual exchange.
03:34
But this grid model could not be transferred
03:36
out of our community so easily,
03:39
where not everyone has resources to share
03:41
nor could companies be expected
03:43
to have the same level of trust.
03:45
Instead, an alternative, more business-like approach
03:48
for accessing on-demand resources
03:50
has been flourishing recently,
03:51
called cloud computing,
03:53
which other communities are now exploiting
03:55
to analyze their big data.
03:57
It might seem paradoxical for a place like CERN,
04:00
a lab focused on the study
04:01
of the unimaginably small building blocks of matter,
04:05
to be the source of something as big as big data.
04:08
But the way we study the fundamental particles,
04:10
as well as the forces by which they interact,
04:13
involves creating them fleetingly,
04:15
colliding protons in our accelerators
04:17
and capturing a trace of them
04:19
as they zoom off near light speed.
04:21
To see those traces,
04:22
our detector, with 150 million sensors,
04:25
acts like a really massive 3-D camera,
04:28
taking a picture of each collision event -
04:30
that's up to 14 million times per second.
04:32
That makes a lot of data.
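Editor's sketch: the two figures quoted just above (150 million sensors, up to 14 million collision events per second) can be multiplied into a rough upper bound on the raw data rate. The size of one sensor reading is not given in the talk, so `BYTES_PER_READING` is an assumption chosen only for illustration, and the function name is hypothetical.

```python
# Back-of-envelope estimate of the detector's raw data rate.
# SENSORS and EVENTS_PER_SECOND come from the talk; BYTES_PER_READING is an
# assumption made only for illustration.

SENSORS = 150_000_000            # "150 million sensors"
EVENTS_PER_SECOND = 14_000_000   # "up to 14 million times per second"
BYTES_PER_READING = 1            # assumed: one byte per sensor per event


def raw_rate_bytes_per_second(sensors: int, events_per_second: int,
                              bytes_per_reading: int) -> int:
    """Upper bound on the raw rate if every sensor were read out for every event."""
    return sensors * events_per_second * bytes_per_reading


rate = raw_rate_bytes_per_second(SENSORS, EVENTS_PER_SECOND, BYTES_PER_READING)
print(f"{rate / 1e15:.1f} petabytes per second")  # prints "2.1 petabytes per second"
```

Under that one-byte assumption the bound is about 2.1 petabytes per second. It describes what the sensors could produce, not what is kept: real experiments filter and compress events heavily before anything is stored.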
04:37
But if big data has been around for so long,
04:39
why do we suddenly keep hearing about it now?
04:41
Well, as the old metaphor explains,
04:43
the whole is greater than the sum of its parts,
04:46
and this is no longer just science that is exploiting this.
04:50
The fact that we can derive more knowledge
04:51
by joining related information together
04:54
and spotting correlations
04:55
can inform and enrich numerous aspects of everyday life,
04:59
either in real time,
05:00
such as traffic or financial conditions,
05:02
in short-term evolutions,
05:04
such as medical or meteorological,
05:06
or in predictive situations,
05:08
such as business, crime, or disease trends.
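Editor's sketch: "joining related information together and spotting correlations" can be illustrated in a few lines. The example below is a minimal sketch, not anything described in the talk: the two datasets, their values, and the function names are all made up for illustration. It joins two series on a shared date key and computes a Pearson correlation coefficient over the matching days.

```python
from math import sqrt

# Two made-up daily series keyed by date; the values are illustrative only.
rainfall_mm = {"2013-05-01": 0.0, "2013-05-02": 12.0,
               "2013-05-03": 30.0, "2013-05-04": 4.0}
traffic_delay_min = {"2013-05-02": 9.0, "2013-05-03": 21.0,
                     "2013-05-04": 5.0, "2013-05-05": 3.0}


def join_on_key(left, right):
    """Inner join: keep only the keys present in both datasets."""
    return [(left[k], right[k]) for k in sorted(left.keys() & right.keys())]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


joined = join_on_key(rainfall_mm, traffic_delay_min)
print(len(joined), "matching days, correlation:", round(pearson(joined), 3))
```

Neither series says much alone; the relationship only becomes visible after the join, which is the sense in which the whole is greater than the sum of its parts. A high coefficient on three made-up points proves nothing, of course, and correlation by itself does not establish cause.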
05:13
Virtually every field is turning to gathering big data,
05:16
with mobile sensor networks spanning the globe,
05:18
cameras on the ground and in the air,
05:21
archives storing information published on the web,
05:24
and loggers capturing the activities
05:26
of Internet citizens the world over.
05:28
The challenge is on to invent new tools and techniques
05:31
to mine these vast stores,
05:33
to inform decision making,
05:35
to improve medical diagnosis,
05:37
and otherwise to answer needs and desires
05:39
of tomorrow's society in ways that are unimagined today.