Kenneth Cukier: Big data is better data

2014-09-23

TED


Chinese translation: Yesbydefault 倪文娟; reviewed by Rocky LIANG
00:12
America's favorite pie is?
00:16
Audience: Apple. Kenneth Cukier: Apple. Of course it is.
00:20
How do we know it?
00:21
Because of data.
00:24
You look at supermarket sales.
00:26
You look at supermarket sales of 30-centimeter pies
00:29
that are frozen, and apple wins, no contest.
00:33
The majority of the sales are apple.
00:38
But then supermarkets started selling
00:41
smaller, 11-centimeter pies,
00:43
and suddenly, apple fell to fourth or fifth place.
00:48
Why? What happened?
00:50
Okay, think about it.
00:53
When you buy a 30-centimeter pie,
00:57
the whole family has to agree,
00:59
and apple is everyone's second favorite.
01:03
(Laughter)
01:05
But when you buy an individual 11-centimeter pie,
01:09
you can buy the one that you want.
01:12
You can get your first choice.
01:16
You have more data.
01:18
You can see something
01:20
that you couldn't see
01:21
when you only had smaller amounts of it.
01:25
Now, the point here is that more data
01:27
doesn't just let us see more,
01:29
more of the same thing we were looking at.
01:31
More data allows us to see new.
01:35
It allows us to see better.
01:38
It allows us to see different.
01:42
In this case, it allows us to see
01:45
what America's favorite pie is:
01:48
not apple.
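The pie example depends on how a purchase aggregates preferences. Here is a minimal Python sketch in which everything is hypothetical: the flavors, the families in FAMILIES, and the rule in family_pick. That rule has a family buy the flavor with the best worst-case rank, which stands in for "the whole family has to agree". The sketch shows how the same preferences produce different sales counts depending on pie size.

```python
from collections import Counter

# Hypothetical rankings, most preferred first. Apple is deliberately every
# member's second choice, as in the talk; the other flavors are made up.
FAMILIES = [
    [["cherry", "apple", "pecan"], ["pumpkin", "apple", "cherry"], ["pecan", "apple", "pumpkin"]],
    [["pecan", "apple", "cherry"], ["pumpkin", "apple", "cherry"]],
    [["pumpkin", "apple", "pecan"], ["pumpkin", "apple", "cherry"], ["cherry", "apple", "pecan"]],
]


def rank(member, flavor):
    """Position of a flavor in one member's list; unlisted flavors rank last."""
    return member.index(flavor) if flavor in member else len(member)


def family_pick(members):
    """One 30 cm pie: the flavor whose worst rank across the family is best (ties by name)."""
    flavors = {flavor for member in members for flavor in member}
    return min(flavors, key=lambda f: (max(rank(m, f) for m in members), f))


def large_pie_sales(families):
    """Each family buys one shared pie."""
    return Counter(family_pick(members) for members in families)


def small_pie_sales(families):
    """Each person buys an individual 11 cm pie of their own first choice."""
    return Counter(member[0] for members in families for member in members)


if __name__ == "__main__":
    print("30 cm:", large_pie_sales(FAMILIES).most_common())
    print("11 cm:", small_pie_sales(FAMILIES).most_common())
```

With these made-up rankings, apple wins all three family purchases but gets no individual sales, and pumpkin leads the 11 cm counts. The smaller pies record each person's first choice rather than the family's compromise.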
01:50
Now, you probably all have heard the term big data.
01:54
In fact, you're probably sick of hearing the term
01:56
big data.
01:58
It is true that there is a lot of hype around the term,
02:01
and that is very unfortunate,
02:03
because big data is an extremely important tool
02:06
by which society is going to advance.
02:10
In the past, we used to look at small data
02:14
and think about what it would mean
02:15
to try to understand the world,
02:17
and now we have a lot more of it,
02:19
more than we ever could before.
02:22
What we find is that when we have
02:23
a large body of data, we can fundamentally do things
02:26
that we couldn't do when we only had smaller amounts.
02:29
Big data is important, and big data is new,
02:32
and when you think about it,
02:34
the only way this planet is going to deal
02:36
with its global challenges —
02:38
to feed people, supply them with medical care,
02:41
supply them with energy, electricity,
02:44
and to make sure they're not burnt to a crisp
02:46
because of global warming —
02:47
is because of the effective use of data.
02:51
So what is new about big data? What is the big deal?
02:55
Well, to answer that question, let's think about
02:58
what information looked like,
03:00
physically looked like in the past.
03:03
In 1908, on the island of Crete,
03:06
archaeologists discovered a clay disc.
03:11
They dated it from 2000 B.C., so it's 4,000 years old.
03:15
Now, there's inscriptions on this disc,
03:17
but we actually don't know what it means.
03:18
It's a complete mystery, but the point is that
03:21
this is what information used to look like
03:22
4,000 years ago.
03:25
This is how society stored
03:27
and transmitted information.
03:31
Now, society hasn't advanced all that much.
03:35
We still store information on discs,
03:38
but now we can store a lot more information,
03:41
more than ever before.
03:43
Searching it is easier. Copying it easier.
03:46
Sharing it is easier. Processing it is easier.
03:49
And what we can do is we can reuse this information
03:52
for uses that we never even imagined
03:54
when we first collected the data.
03:57
In this respect, the data has gone
03:59
from a stock to a flow,
04:03
from something that is stationary and static
04:07
to something that is fluid and dynamic.
04:10
There is, if you will, a liquidity to information.
04:14
The disc that was discovered off of Crete
04:18
that's 4,000 years old, is heavy,
04:22
it doesn't store a lot of information,
04:24
and that information is unchangeable.
04:27
By contrast, all of the files
04:31
that Edward Snowden took
04:33
from the National Security Agency in the United States
04:35
fits on a memory stick
04:38
the size of a fingernail,
04:41
and it can be shared at the speed of light.
04:45
More data. More.
04:51
Now, one reason why we have so much data in the world today
04:53
is we are collecting things
04:54
that we've always collected information on,
04:57
but another reason why is we're taking things
05:00
that have always been informational
05:03
but have never been rendered into a data format
05:05
and we are putting it into data.
05:08
Think, for example, the question of location.
05:11
Take, for example, Martin Luther.
05:13
If we wanted to know in the 1500s
05:15
where Martin Luther was,
05:18
we would have to follow him at all times,
05:20
maybe with a feathery quill and an inkwell,
05:22
and record it,
05:23
but now think about what it looks like today.
05:26
You know that somewhere,
05:28
probably in a telecommunications carrier's database,
05:30
there is a spreadsheet or at least a database entry
05:33
that records your information
05:35
of where you've been at all times.
05:37
If you have a cell phone,
05:39
and that cell phone has GPS, but even if it doesn't have GPS,
05:42
it can record your information.
05:44
In this respect, location has been datafied.
05:48
Now think, for example, of the issue of posture,
05:53
the way that you are all sitting right now,
05:54
the way that you sit,
05:56
the way that you sit, the way that you sit.
05:59
It's all different, and it's a function of your leg length
06:01
and your back and the contours of your back,
06:03
and if I were to put sensors, maybe 100 sensors
06:05
into all of your chairs right now,
06:07
I could create an index that's fairly unique to you,
06:11
sort of like a fingerprint, but it's not your finger.
06:15
So what could we do with this?
06:18
Researchers in Tokyo are using it
06:21
as a potential anti-theft device in cars.
06:25
The idea is that the carjacker sits behind the wheel,
06:28
tries to stream off, but the car recognizes
06:30
that a non-approved driver is behind the wheel,
06:32
and maybe the engine just stops, unless you
06:35
type in a password into the dashboard
06:38
to say, "Hey, I have authorization to drive." Great.
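The seat-sensor idea can be sketched as enrollment followed by a distance check. The talk does not say how the Tokyo researchers match postures. This sketch assumes a simple nearest-profile rule: compute the Euclidean distance from the current reading to each enrolled driver's averaged readings. The sensor model in simulate_reading, the noise level and the threshold are all hypothetical. The password_ok flag mirrors the dashboard override described above.

```python
import math
import random

NUM_SENSORS = 100  # "maybe 100 sensors" per seat; layout and units are hypothetical


def enroll(samples):
    """Average several seat readings into one posture profile (one value per sensor)."""
    return [sum(values) / len(values) for values in zip(*samples)]


def is_recognized(reading, profiles, threshold):
    """True if the reading is within `threshold` (Euclidean) of any enrolled profile."""
    return any(math.dist(reading, profile) <= threshold for profile in profiles)


def engine_may_run(reading, profiles, threshold, password_ok=False):
    """Let the engine run for a recognized driver, or after a dashboard password."""
    return password_ok or is_recognized(reading, profiles, threshold)


def simulate_reading(base, noise, rng):
    """Hypothetical sensor model: a person's base pressure pattern plus Gaussian noise."""
    return [b + rng.gauss(0.0, noise) for b in base]


if __name__ == "__main__":
    rng = random.Random(0)
    owner = [rng.uniform(0.0, 1.0) for _ in range(NUM_SENSORS)]
    stranger = [rng.uniform(0.0, 1.0) for _ in range(NUM_SENSORS)]
    profiles = [enroll([simulate_reading(owner, 0.02, rng) for _ in range(5)])]
    print("owner:", engine_may_run(simulate_reading(owner, 0.02, rng), profiles, 0.5))
    print("stranger:", engine_may_run(simulate_reading(stranger, 0.02, rng), profiles, 0.5))
    print("stranger + password:",
          engine_may_run(simulate_reading(stranger, 0.02, rng), profiles, 0.5, password_ok=True))
```

Averaging several enrollment readings smooths out sensor noise. The threshold trades false alarms for the owner against missed strangers, and a real system would have to tune it on measured data.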
06:42
What if every single car in Europe
06:45
had this technology in it?
06:46
What could we do then?
06:50
Maybe, if we aggregated the data,
06:52
maybe we could identify telltale signs
06:56
that best predict that a car accident
06:58
is going to take place in the next five seconds.
07:04
And then what we will have datafied
07:07
is driver fatigue,
07:09
and the service would be when the car senses
07:11
that the person slumps into that position,
07:14
automatically knows, hey, set an internal alarm
07:18
that would vibrate the steering wheel, honk inside
07:20
to say, "Hey, wake up,
07:22
pay more attention to the road."
07:24
These are the sorts of things we can do
07:26
when we datafy more aspects of our lives.
07:29
So what is the value of big data?
07:32
Well, think about it.
07:35
You have more information.
07:37
You can do things that you couldn't do before.
07:40
One of the most impressive areas
07:42
where this concept is taking place
07:44
is in the area of machine learning.
07:47
Machine learning is a branch of artificial intelligence,
07:50
which itself is a branch of computer science.
07:53
The general idea is that instead of
07:55
instructing a computer what to do,
07:57
we are going to simply throw data at the problem
08:00
and tell the computer to figure it out for itself.
08:03
And it will help you understand it
08:05
by seeing its origins.
08:08
In the 1950s, a computer scientist
08:11
at IBM named Arthur Samuel liked to play checkers,
08:14
so he wrote a computer program
08:16
so he could play against the computer.
08:18
He played. He won.
08:21
He played. He won.
08:23
He played. He won,
08:26
because the computer only knew
08:28
what a legal move was.
08:30
Arthur Samuel knew something else.
08:32
Arthur Samuel knew strategy.
08:37
So he wrote a small sub-program alongside it
08:39
operating in the background, and all it did
08:41
was score the probability
08:43
that a given board configuration would likely lead
08:46
to a winning board versus a losing board
08:49
after every move.
08:51
He plays the computer. He wins.
08:54
He plays the computer. He wins.
08:57
He plays the computer. He wins.
09:01
And then Arthur Samuel leaves the computer
09:03
to play itself.
09:05
It plays itself. It collects more data.
09:09
It collects more data. It increases the accuracy of its prediction.
09:13
And then Arthur Samuel goes back to the computer
09:15
and he plays it, and he loses,
09:17
and he plays it, and he loses,
09:19
and he plays it, and he loses,
09:21
and Arthur Samuel has created a machine
09:24
that surpasses his ability in a task that he taught it.
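Samuel's scoring sub-program and the self-play loop can be illustrated on a much smaller game. This sketch is a toy stand-in, not Samuel's checkers program. The game, the tabular Evaluator, and the exploration rate are all assumptions chosen to keep the example short. In the game, players share a pile of 10 stones, each move takes 1 to 3 stones, and whoever takes the last stone wins. Each position is scored by the estimated probability that the player who produced it goes on to win. Playing against itself, the program keeps refining those estimates.

```python
import random

START = 10     # stones at the start; a toy stand-in for a checkers board
MAX_TAKE = 3   # each move removes 1 to 3 stones; whoever takes the last stone wins


def legal_moves(stones):
    """All legal move sizes from a pile of `stones`."""
    return list(range(1, min(MAX_TAKE, stones) + 1))


class Evaluator:
    """Scores a position by the estimated probability that the player who
    produced it goes on to win, learned from game outcomes."""

    def __init__(self):
        self.wins = [0] * (START + 1)
        self.visits = [0] * (START + 1)

    def score(self, stones):
        # Laplace smoothing: a position never seen before scores 0.5.
        return (self.wins[stones] + 1) / (self.visits[stones] + 2)

    def record(self, positions, won):
        for p in positions:
            self.visits[p] += 1
            self.wins[p] += 1 if won else 0


def choose(stones, evaluator, rng, epsilon):
    """Pick the move leading to the best-scored position; explore at rate epsilon."""
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)
    # Random tie-breaking so equally scored moves are tried evenly.
    return max(moves, key=lambda k: (evaluator.score(stones - k), rng.random()))


def self_play(evaluator, games, rng, epsilon=0.1):
    """The program plays itself and updates its scores after every game."""
    for _ in range(games):
        stones, player = START, 0
        produced = ([], [])  # positions created by player 0 and by player 1
        while stones > 0:
            stones -= choose(stones, evaluator, rng, epsilon)
            produced[player].append(stones)
            player = 1 - player
        winner = 1 - player  # the player who just took the last stone
        evaluator.record(produced[winner], won=True)
        evaluator.record(produced[1 - winner], won=False)


def win_rate_vs_random(evaluator, games, rng):
    """Fraction of games the greedy learned player (moving first) wins against random moves."""
    wins = 0
    for _ in range(games):
        stones, player = START, 0
        while stones > 0:
            if player == 0:
                stones -= choose(stones, evaluator, rng, epsilon=0.0)
            else:
                stones -= rng.choice(legal_moves(stones))
            player = 1 - player
        if player == 1:  # player 0 made the last move
            wins += 1
    return wins / games


if __name__ == "__main__":
    rng = random.Random(0)
    evaluator = Evaluator()
    print("before self-play:", win_rate_vs_random(evaluator, 1000, rng))
    self_play(evaluator, 20000, rng)
    print("after self-play:", win_rate_vs_random(evaluator, 1000, rng))
```

Before training, every position scores 0.5, so the player moves essentially at random. After self-play, the greedy player tends to leave its opponent a multiple of four stones, which is the winning strategy for this particular game.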
09:30
And this idea of machine learning
09:33
is going everywhere.
09:37
How do you think we have self-driving cars?
09:40
Are we any better off as a society
09:42
enshrining all the rules of the road into software?
09:45
No. Memory is cheaper? No.
09:48
Algorithms are faster? No. Processors are better? No.
09:52
All of those things matter, but that's not why.
09:55
It's because we changed the nature of the problem.
09:58
We changed the nature of the problem from one
09:59
in which we tried to overtly and explicitly
10:02
explain to the computer how to drive
10:04
to one in which we say,
10:05
"Here's a lot of data around the vehicle.
10:07
You figure it out.
10:09
You figure it out that that is a traffic light,
10:11
that that traffic light is red and not green,
10:13
that that means that you need to stop
10:15
and not go forward."
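The shift from explicit rules to learned behavior can be shown in miniature. This sketch assumes hypothetical training data in TRAINING, where each example pairs an observed light color (RGB) with the action a human driver took. It never codes "red means stop". Instead, train_centroids learns one average color per action, and predict returns the action whose average is closest. This nearest-centroid approach is chosen here only for brevity and is not a description of any real vehicle's software.

```python
import math

# Hypothetical examples: (observed light colour as RGB, action a human driver took).
TRAINING = [
    ((250, 30, 20), "stop"),
    ((230, 50, 40), "stop"),
    ((20, 220, 60), "go"),
    ((40, 200, 90), "go"),
]


def train_centroids(examples):
    """Learn the average colour observed for each action; no colour rules are written."""
    sums, counts = {}, {}
    for rgb, action in examples:
        acc = sums.setdefault(action, [0.0, 0.0, 0.0])
        for i, channel in enumerate(rgb):
            acc[i] += channel
        counts[action] = counts.get(action, 0) + 1
    return {action: [s / counts[action] for s in acc] for action, acc in sums.items()}


def predict(rgb, centroids):
    """Choose the action whose learned average colour is nearest to the observation."""
    return min(centroids, key=lambda action: math.dist(rgb, centroids[action]))


if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(predict((240, 45, 35), model))
    print(predict((35, 205, 80), model))
```

Supporting a new case, such as a yellow light, means adding labeled examples rather than editing a rule.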
10:18
Machine learning is at the basis
10:19
of many of the things that we do online:
10:21
search engines,
10:23
Amazon's personalization algorithm,
10:27
computer translation,
10:29
voice recognition systems.
10:34
Researchers recently have looked at
10:36
the question of biopsies,
10:40
cancerous biopsies,
10:42
and they've asked the computer to identify
10:45
by looking at the data and survival rates
10:47
to determine whether cells are actually
10:52
cancerous or not,
10:54
and sure enough, when you throw the data at it,
10:56
through a machine-learning algorithm,
10:58
the machine was able to identify
11:00
the 12 telltale signs that best predict
11:02
that this biopsy of the breast cancer cells
11:06
are indeed cancerous.
11:09
The problem: The medical literature
11:11
only knew nine of them.
11:14
Three of the traits were ones
11:16
that people didn't need to look for,
11:19
but that the machine spotted.
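The talk does not say which algorithm the researchers used. This sketch is a hedged illustration of "throwing the data at it". It scores every candidate feature independently by the standardized difference in means between outcome groups, then keeps the top k. The data from synthetic_biopsies are invented. A binary outcome stands in for the survival information, and the generator alone decides which features carry signal. So the example only shows that such a ranking can surface informative features without being told which ones they are.

```python
import random
import statistics


def effect_size(values, outcomes):
    """How far apart the two outcome groups sit on one feature, in units of its spread."""
    pos = [v for v, y in zip(values, outcomes) if y]
    neg = [v for v, y in zip(values, outcomes) if not y]
    spread = statistics.pstdev(values) or 1.0  # guard against a constant feature
    return abs(statistics.mean(pos) - statistics.mean(neg)) / spread


def top_features(rows, outcomes, names, k):
    """Score every candidate feature against the outcome and keep the k strongest."""
    columns = zip(*rows)
    scored = sorted(
        ((effect_size(col, outcomes), name) for col, name in zip(columns, names)),
        reverse=True,
    )
    return [name for _, name in scored[:k]]


def synthetic_biopsies(n, informative, uninformative, rng):
    """Hypothetical data: informative features shift upward when the outcome is positive."""
    names = [f"signal_{i}" for i in range(informative)]
    names += [f"noise_{i}" for i in range(uninformative)]
    rows, outcomes = [], []
    for _ in range(n):
        y = rng.random() < 0.5
        row = [rng.gauss(1.0 if y else 0.0, 1.0) for _ in range(informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(uninformative)]
        rows.append(row)
        outcomes.append(y)
    return rows, outcomes, names


if __name__ == "__main__":
    rng = random.Random(0)
    rows, outcomes, names = synthetic_biopsies(1000, 12, 40, rng)
    print(top_features(rows, outcomes, names, 12))
```

Scoring features one at a time is the simplest possible choice. It misses signals that only show up in combination, which real machine-learning models are designed to capture.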
11:24
Now, there are dark sides to big data as well.
11:30
It will improve our lives, but there are problems
11:32
that we need to be conscious of,
11:35
and the first one is the idea
11:38
that we may be punished for predictions,
11:40
that the police may use big data for their purposes,
11:44
a little bit like "Minority Report."
11:47
Now, it's a term called predictive policing,
11:49
or algorithmic criminology,
11:51
and the idea is that if we take a lot of data,
11:53
for example where past crimes have been,
11:56
we know where to send the patrols.
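In its simplest form, the location-based version described here is a counting exercise. This minimal sketch makes two hypothetical choices: incidents are recorded as planar coordinates in metres, and the city is divided into square cells of a chosen size. It ranks cells by past incident counts and returns the busiest ones as patrol targets.

```python
from collections import Counter


def grid_cell(x, y, cell_size):
    """Map a location (e.g., metres east and north of a reference point) to a square cell."""
    return (int(x // cell_size), int(y // cell_size))


def patrol_targets(incidents, cell_size, patrols):
    """Rank cells by number of past incidents and return the `patrols` busiest cells."""
    counts = Counter(grid_cell(x, y, cell_size) for x, y in incidents)
    return [cell for cell, _ in counts.most_common(patrols)]


if __name__ == "__main__":
    # Hypothetical incident log: (x, y) coordinates in metres.
    past_incidents = [(10, 10), (20, 30), (40, 5), (260, 270), (255, 290), (900, 900)]
    print(patrol_targets(past_incidents, cell_size=100, patrols=2))
```

With this made-up log, the demo prints [(0, 0), (2, 2)]. Nothing in the counting is specific to places. Keying the same counter by a person instead of a grid cell is the individual-level shift the talk warns about next.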
11:58
That makes sense, but the problem, of course,
12:00
is that it's not simply going to stop on location data,
12:05
it's going to go down to the level of the individual.
12:08
Why don't we use data about the person's
12:10
high school transcript?
12:12
Maybe we should use the fact that
12:14
they're unemployed or not, their credit score,
12:16
their web-surfing behavior,
12:17
whether they're up late at night.
12:19
Their Fitbit, when it's able to identify biochemistries,
12:22
will show that they have aggressive thoughts.
12:27
We may have algorithms that are likely to predict
12:29
what we are about to do,
12:31
and we may be held accountable
12:32
before we've actually acted.
12:34
Privacy was the central challenge
12:36
in a small data era.
12:39
In the big data age,
12:41
the challenge will be safeguarding free will,
12:46
moral choice, human volition,
12:49
human agency.
12:54
There is another problem:
12:56
Big data is going to steal our jobs.
13:00
Big data and algorithms are going to challenge
13:03
white collar, professional knowledge work
13:06
in the 21st century
13:08
in the same way that factory automation
13:10
and the assembly line
13:13
challenged blue collar labor in the 20th century.
13:16
Think about a lab technician
13:18
who is looking through a microscope
13:19
at a cancer biopsy
13:21
and determining whether it's cancerous or not.
13:23
The person went to university.
13:25
The person buys property.
13:27
He or she votes.
13:29
He or she is a stakeholder in society.
13:32
And that person's job,
13:34
as well as an entire fleet
13:35
of professionals like that person,
13:37
is going to find that their jobs are radically changed
13:40
or actually completely eliminated.
13:43
Now, we like to think
13:44
that technology creates jobs over a period of time
13:47
after a short, temporary period of dislocation,
13:51
and that is true for the frame of reference
13:53
with which we all live, the Industrial Revolution,
13:55
because that's precisely what happened.
13:57
But we forget something in that analysis:
13:59
There are some categories of jobs
14:01
that simply get eliminated and never come back.
14:05
The Industrial Revolution wasn't very good
14:07
if you were a horse.
14:11
So we're going to need to be careful
14:13
and take big data and adjust it for our needs,
14:16
our very human needs.
14:19
We have to be the master of this technology,
14:21
not its servant.
14:23
We are just at the outset of the big data era,
14:26
and honestly, we are not very good
14:29
at handling all the data that we can now collect.
14:33
It's not just a problem for the National Security Agency.
14:37
Businesses collect lots of data, and they misuse it too,
14:40
and we need to get better at this, and this will take time.
14:43
It's a little bit like the challenge that was faced
14:45
by primitive man and fire.
14:48
This is a tool, but this is a tool that,
14:50
unless we're careful, will burn us.
14:56
Big data is going to transform how we live,
14:59
how we work and how we think.
15:01
It is going to help us manage our careers
15:03
and lead lives of satisfaction and hope
15:07
and happiness and health,
15:10
but in the past, we've often looked at information technology
15:13
and our eyes have only seen the T,
15:15
the technology, the hardware,
15:17
because that's what was physical.
15:19
We now need to recast our gaze at the I,
15:22
the information,
15:24
which is less apparent,
15:25
but in some ways a lot more important.
15:29
Humanity can finally learn from the information
15:33
that it can collect,
15:35
as part of our timeless quest
15:37
to understand the world and our place in it,
15:40
and that's why big data is a big deal.
15:46
(Applause)