Big Data - Tim Smith

探索巨量資料這新領域 - Tim Smith

589,741 views ・ 2013-05-03

TED-Ed


00:00
Translator: Andrea McDonough Reviewer: Jessica Ruby
0
0
7000
譯者: Jephian Lin 審譯者: Jui-Hsin Chen
00:31
Big data is an elusive concept.
1
31085
2762
巨量資料是一種難以理解的觀念。 (譯註:又稱大數據。)
00:35
It represents an amount of digital information,
2
35987
2688
它代表數位資料的量,
00:38
which is uncomfortable to store,
3
38675
2170
它大到難以儲存、
00:40
transport,
4
40845
1128
傳輸、
00:41
or analyze.
5
41973
1878
或分析。
00:43
Big data is so voluminous
6
43851
1915
巨量數據非常龐大,
00:45
that it overwhelms the technologies of the day
7
45766
2708
以至於今日的科技無法處理它
00:48
and challenges us to create the next generation
8
48474
2425
並促使我們來研發新一代的
00:50
of data storage tools and techniques.
9
50899
3105
資料儲存設備以及科技。
00:59
So, big data isn't new.
10
59557
1779
所以,巨量資料並不是什麼新東西。
01:01
In fact, physicists at CERN have been wrangling
11
61336
2358
事實上,CERN 的物理學家已經面對這個
01:03
with the challenge of their ever-expanding big data for decades.
12
63694
4399
資料不斷擴張的挑戰好幾十年了。
01:09
Fifty years ago, CERN's data could be stored
13
69431
2323
五十年前,CERN 的資料可以儲存在
01:11
in a single computer.
14
71754
1752
單一一臺電腦裡。
01:13
OK, so it wasn't your usual computer,
15
73506
2154
當然,它不是我們一般的電腦,
01:15
this was a mainframe computer
16
75660
1417
而是一臺大型電腦,
01:17
that filled an entire building.
17
77077
2310
它塞滿了一整棟房子。
01:21
To analyze the data,
18
81494
1169
如果要分析資料,
01:22
physicists from around the world traveled to CERN
19
82663
2948
物理學家們會從世界各地飛到 CERN
01:25
to connect to the enormous machine.
20
85611
3026
來使用這臺巨大的機器。
01:31
In the 1970's, our ever-growing big data
21
91075
2853
在 1970 年代,我們那不斷擴張的資料
01:33
was distributed across different sets of computers,
22
93928
2750
被分配在好幾組不同的電腦中,
01:36
which mushroomed at CERN.
23
96678
2030
這些電腦在 CERN 裡,如雨後春筍般地出現。
01:38
Each set was joined together
24
98708
1442
每組電腦只用自製的專用網路相連結。
01:40
in dedicated, homegrown networks.
25
100150
2528
每組電腦只用自製的專用網路相連結。
01:42
But physicists collaborated without regard
26
102678
1786
但是科學家的合作關係
01:44
for the boundaries between sets,
27
104464
1949
並不侷限在單一組電腦中,
01:46
hence needed to access data on all of these.
28
106413
2889
所以他們必須能夠在 所有電腦上運用這些資料。
01:49
So, we bridged the independent networks together
29
109302
1985
所以我們把各獨立的網路橋接在一起,
01:51
in our own CERNET.
30
111287
3092
成了我們的 CERNET。
01:54
In the 1980's, islands of similar networks
31
114379
2848
在 1980 年代,一群一群類似這樣的網路
01:57
speaking different dialects
32
117227
1544
在歐洲及美國各地湧現,
01:58
sprang up all over Europe and the States,
33
118771
2540
它們都用不同的方言,
02:01
making remote access possible but torturous.
34
121311
3091
這讓遠端連接變為可能,卻也令人折騰。
02:04
To make it easy for our physicists across the world
35
124402
2144
為了讓我們散佈在世界各地的物理學家
02:06
to access the ever-expanding big data
36
126546
2405
不用四處奔波就能得到存取在 CERN
02:08
stored at CERN without traveling,
37
128951
1793
那不斷更新的資料,
02:10
the networks needed to be talking
38
130744
1299
這個網路系統就必須使用
02:12
with the same language.
39
132043
1370
同一種語言。
02:13
We adopted the fledgling internetworking standard from the States,
40
133413
3795
我們採用了美國那不成熟的標準系統,
02:17
followed by the rest of Europe,
41
137208
1376
之後歐洲其餘單位也接受了,
02:18
and we established the principal link at CERN
42
138584
2168
接著在 1989 年,我們在 CERN 建立了
02:20
between Europe and the States in 1989,
43
140752
2503
歐洲和美國的主要連線,
02:23
and the truly global internet took off!
44
143255
2786
這個正式的全球網路終於起飛了。
02:28
Physicists could easily then access
45
148580
1791
物理學家可以輕鬆地
02:30
the terabytes of big data
46
150371
1812
從世界各地
02:32
remotely from around the world,
47
152183
1663
存取到好幾 TB 的巨量資料,
02:33
generate results,
48
153846
1379
產生結果,
02:35
and write papers in their home institutes.
49
155225
2295
然後在自家的研究機構中撰寫論文。
02:37
Then, they wanted to share their findings
50
157520
1501
接著,他們想要和他們的同事
02:39
with all their colleagues.
51
159021
1792
分享他們的結果。
02:40
To make this information sharing easy,
52
160813
1603
為了讓資料分享更容易,
02:42
we created the web in the early 1990's.
53
162416
2942
我們在 1990 年代早期建構了一個網路。
02:45
Physicists no longer needed to know
54
165358
1838
物理學家不再須要先知道
02:47
where the information was stored
55
167196
1637
資料是儲存在哪裡
02:48
in order to find it and access it on the web,
56
168833
2569
然後才能存取資料,
02:51
an idea which caught on across the world
57
171402
2134
一個傳遍世界的想法
02:53
and has transformed the way we communicate
58
173536
2376
改變了我們日常通訊的方式。
02:55
in our daily lives.
59
175912
1668
改變了我們日常通訊的方式。
03:00
During the early 2000's,
60
180226
1407
在 2000 年代早期,
03:01
the continued growth of our big data
61
181633
1990
我們這個愈變愈大的巨量資料
03:03
outstripped our capability to analyze it at CERN,
62
183623
3291
超過了我們 CERN 能夠處理的能力,
03:06
despite having buildings full of computers.
63
186914
3585
除非所有空間都塞滿電腦。
03:10
We had to start distributing the petabytes of data
64
190499
2306
我們必須開始將這好幾 PB 的資料 (譯註:PB = 1,024 TB。)
03:12
to our collaborating partners
65
192805
1582
分配儲存在我們的合作伙伴那,
03:14
in order to employ local computing and storage
66
194387
2752
這樣才有辦法利用各地上百個不同機構的
03:17
at hundreds of different institutes.
67
197139
2835
計算儲存資源。
03:19
In order to orchestrate these interconnected resources
68
199974
2295
為了要讓這些錯綜複雜的資源 在各地不同的系統中
03:22
with their diverse technologies,
69
202269
2044
能協調運作,
03:24
we developed a computing grid,
70
204313
1751
我們發展了一套計算網格,
03:26
enabling the seamless sharing
71
206064
1576
讓世界各地的計算資源
03:27
of computing resources around the globe.
72
207640
2428
得以無縫地分享。
03:30
This relies on trust relationships and mutual exchange.
73
210068
4391
這要依靠彼此的信賴關係以及資源交換。
03:34
But this grid model could not be transferred
74
214459
2293
但這個網格模型沒辦法簡單地
03:36
out of our community so easily,
75
216752
2284
移轉出我們這個群體,
03:39
where not everyone has resources to share
76
219036
2294
因為不是所有人都有資源可以分享
03:41
nor could companies be expected
77
221330
1876
而各公司之間也沒辦法
03:43
to have the same level of trust.
78
223206
2753
被期望能有相同層級的信賴。
03:45
Instead, an alternative, more business-like approach
79
225959
2295
取而代之的是,針對存取須求的資源,
03:48
for accessing on-demand resources
80
228254
1836
有一個商業取向的替代方案
03:50
has been flourishing recently,
81
230090
1708
近期正在蓬勃發展,
03:51
called cloud computing,
82
231798
1668
它叫做雲端計算,
03:53
which other communities are now exploiting
83
233466
1876
有些其它的群體正利用它
03:55
to analyze their big data.
84
235342
2000
來分析它們的巨量資料。
03:57
It might seem paradoxical for a place like CERN,
85
237342
2987
這對於 CERN 這個地方來說, 聽起來可能有點衝突,
04:00
a lab focused on the study
86
240329
1571
一個專注於研究物質的極小構成要素的實驗室,
04:01
of the unimaginably small building blocks of matter,
87
241900
3171
一個專注於研究物質的 極小構成要素的實驗室
04:05
to be the source of something as big as big data.
88
245071
3377
竟然是這樣巨量資料的來源。
04:08
But the way we study the fundamental particles,
89
248448
2082
但是我們研究基本粒子
04:10
as well as the forces by which they interact,
90
250530
2613
以及它們的交互作用力的方法,
04:13
involves creating them fleetingly,
91
253143
2103
包含了在瞬間產生這些粒子、
04:15
colliding protons in our accelerators
92
255246
2368
在我們的加速器中碰撞質子、
04:17
and capturing a trace of them
93
257614
1427
以及在它們以近光速運動時
04:19
as they zoom off near light speed.
94
259041
2273
捕捉他們的軌跡。
04:21
To see those traces,
95
261314
994
要見到這些軌跡,
04:22
our detector, with 150 million sensors,
96
262308
3448
我們的偵測器, 包含了一億五千萬個感應器,
04:25
acts like a really massive 3-D camera,
97
265756
2475
像是一個非常巨大的 3-D 攝影機,
04:28
taking a picture of each collision event -
98
268231
2110
記錄每一次碰撞
04:30
that's up to 14 million times per second.
99
270341
2550
──這可能會高到每秒一千四百萬次。
04:32
That makes a lot of data.
100
272891
2533
這會產生大量的數據。
04:37
But if big data has been around for so long,
101
277194
2159
但是如果巨量資料已經存在這麼久了,
04:39
why do we suddenly keep hearing about it now?
102
279353
2627
為什麼我們現在才不斷聽到它?
04:41
Well, as the old metaphor explains,
103
281980
1711
這個嘛,就像一個古老的比喻所說的,
04:43
the whole is greater than the sum of its parts,
104
283691
2788
整體強過它所有部份的總和,
04:46
and this is no longer just science that is exploiting this.
105
286479
3777
而已經不再只有科學在開發這塊。
04:50
The fact that we can derive more knowledge
106
290256
1604
我們可以藉由連結相關的資訊
04:51
by joining related information together
107
291860
2330
以及開發合作關係來增長知識,
04:54
and spotting correlations
108
294190
1551
而這項事實
04:55
can inform and enrich numerous aspects of everyday life,
109
295741
3391
可以滋潤並強化 日常生活中的許多部份,
04:59
either in real time,
110
299132
1028
無論是在即時資訊中,
05:00
such as traffic or financial conditions,
111
300160
2291
比如交通或是財政狀況;
05:02
in short-term evolutions,
112
302451
1755
或在短期的演化上,
05:04
such as medical or meteorological,
113
304206
2127
比如醫學或是天氣學;
05:06
or in predictive situations,
114
306333
1725
或是在預測情勢上,
05:08
such as business, crime, or disease trends.
115
308058
3020
有商業、犯罪、或是疾病趨勢。
05:13
Virtually every field is turning to gathering big data,
116
313369
3063
實際上每個領域都 漸漸開始搜集巨量資料,
05:16
with mobile sensor networks spanning the globe,
117
316432
2337
像是跨越全球的行動裝置網路、
05:18
cameras on the ground and in the air,
118
318769
2287
地面及空中的攝影機、
05:21
archives storing information published on the web,
119
321056
3011
儲存發表在網路上的資訊的資料庫、
05:24
and loggers capturing the activities
120
324067
2129
以及記載各地網民活動
05:26
of Internet citizens the world over.
121
326196
2699
的記錄器。
05:28
The challenge is on to invent new tools and techniques
122
328895
2591
這個挑戰在於要 發明一項新的工具以及技術
05:31
to mine these vast stores,
123
331486
1953
來儲存這大量的資料、
05:33
to inform decision making,
124
333439
1801
來為決策提供資訊,
05:35
to improve medical diagnosis,
125
335240
2256
來改進醫學診斷、
05:37
and otherwise to answer needs and desires
126
337496
2210
以及回應一些今日沒想過的
05:39
of tomorrow's society in ways that are unimagined today.
127
339706
3957
明日社會的需求與渴望。