The human insights missing from big data | Tricia Wang

246,282 views ・ 2017-08-02

TED



00:12
In ancient Greece,
0
12705
1545
00:15
when anyone from slaves to soldiers, poets and politicians,
1
15256
3943
00:19
needed to make a big decision on life's most important questions,
2
19223
4004
00:23
like, "Should I get married?"
3
23251
1391
00:24
or "Should we embark on this voyage?"
4
24666
1857
00:26
or "Should our army advance into this territory?"
5
26547
2928
00:29
they all consulted the oracle.
6
29499
2579
00:32
So this is how it worked:
7
32840
1440
00:34
you would bring her a question and you would get on your knees,
8
34304
3112
00:37
and then she would go into this trance.
9
37440
1871
00:39
It would take a couple of days,
10
39335
1549
00:40
and then eventually she would come out of it,
11
40908
2163
00:43
giving you her predictions as your answer.
12
43095
2536
00:46
From the oracle bones of ancient China
13
46730
2566
00:49
to ancient Greece to Mayan calendars,
14
49320
2345
00:51
people have craved for prophecy
15
51689
2296
00:54
in order to find out what's going to happen next.
16
54009
3137
00:58
And that's because we all want to make the right decision.
17
58336
3239
01:01
We don't want to miss something.
18
61599
1545
01:03
The future is scary,
19
63712
1743
01:05
so it's much nicer knowing that we can make a decision
20
65479
2717
01:08
with some assurance of the outcome.
21
68220
1982
01:10
Well, we have a new oracle,
22
70899
1611
01:12
and its name is big data,
23
72534
2145
01:14
or we call it "Watson" or "deep learning" or "neural net."
24
74703
3939
01:19
And these are the kinds of questions we ask of our oracle now,
25
79160
4012
01:23
like, "What's the most efficient way to ship these phones
26
83196
3922
01:27
from China to Sweden?"
27
87142
1823
01:28
Or, "What are the odds
28
88989
1800
01:30
of my child being born with a genetic disorder?"
29
90813
3363
01:34
Or, "What is the sales volume we can predict for this product?"
30
94772
3244
01:39
I have a dog. Her name is Elle, and she hates the rain.
31
99928
4047
01:43
And I have tried everything to untrain her.
32
103999
3306
01:47
But because I have failed at this,
33
107329
2771
01:50
I also have to consult an oracle, called Dark Sky,
34
110124
3286
01:53
every time before we go on a walk,
35
113434
1635
01:55
for very accurate weather predictions in the next 10 minutes.
36
115093
3577
02:01
She's so sweet.
37
121355
1303
02:03
So because of all of this, our oracle is a $122 billion industry.
38
123647
5707
02:09
Now, despite the size of this industry,
39
129826
3376
02:13
the returns are surprisingly low.
40
133226
2456
02:16
Investing in big data is easy,
41
136162
2494
02:18
but using it is hard.
42
138680
1933
02:21
Over 73 percent of big data projects aren't even profitable,
43
141801
4040
02:25
and I have executives coming up to me saying,
44
145865
2431
02:28
"We're experiencing the same thing.
45
148320
1789
02:30
We invested in some big data system,
46
150133
1753
02:31
and our employees aren't making better decisions.
47
151910
2968
02:34
And they're certainly not coming up with more breakthrough ideas."
48
154902
3162
02:38
So this is all really interesting to me,
49
158734
3184
02:41
because I'm a technology ethnographer.
50
161942
2010
02:44
I study and I advise companies
51
164450
2564
02:47
on the patterns of how people use technology,
52
167038
2483
02:49
and one of my interest areas is data.
53
169545
2678
02:52
So why is having more data not helping us make better decisions,
54
172247
5193
02:57
especially for companies who have all these resources
55
177464
2783
03:00
to invest in these big data systems?
56
180271
1736
03:02
Why isn't it getting any easier for them?
57
182031
2398
03:05
So, I've witnessed the struggle firsthand.
58
185810
2634
03:09
In 2009, I started a research position with Nokia.
59
189194
3484
03:13
And at the time,
60
193052
1158
03:14
Nokia was one of the largest cell phone companies in the world,
61
194234
3158
03:17
dominating emerging markets like China, Mexico and India --
62
197416
3202
03:20
all places where I had done a lot of research
63
200642
2502
03:23
on how low-income people use technology.
64
203168
2676
03:25
And I spent a lot of extra time in China
65
205868
2330
03:28
getting to know the informal economy.
66
208222
2592
03:30
So I did things like working as a street vendor
67
210838
2401
03:33
selling dumplings to construction workers.
68
213263
2574
03:35
Or I did fieldwork,
69
215861
1358
03:37
spending nights and days in internet cafés,
70
217243
2958
03:40
hanging out with Chinese youth, so I could understand
71
220225
2546
03:42
how they were using games and mobile phones
72
222795
2284
03:45
and using them while moving from the rural areas to the cities.
73
225103
3370
03:50
Through all of this qualitative evidence that I was gathering,
74
230155
3927
03:54
I was starting to see so clearly
75
234106
2824
03:56
that a big change was about to happen among low-income Chinese people.
76
236954
4472
04:02
Even though they were surrounded by advertisements for luxury products
77
242840
4367
04:07
like fancy toilets -- who wouldn't want one? --
78
247231
3495
04:10
and apartments and cars,
79
250750
2890
04:13
through my conversations with them,
80
253664
1820
04:15
I found out that the ads that actually enticed them the most
81
255508
3841
04:19
were the ones for iPhones,
82
259373
1996
04:21
promising them this entry into this high-tech life.
83
261393
3052
04:25
And even when I was living with them in urban slums like this one,
84
265289
3163
04:28
I saw people investing over half of their monthly income
85
268476
2996
04:31
into buying a phone,
86
271496
1623
04:33
and increasingly, they were "shanzhai,"
87
273143
2302
04:35
which are affordable knock-offs of iPhones and other brands.
88
275469
3388
04:40
They're very usable.
89
280123
1625
04:42
Does the job.
90
282710
1322
04:44
And after years of living with migrants and working with them
91
284570
5789
04:50
and just really doing everything that they were doing,
92
290383
3434
04:53
I started piecing all these data points together --
93
293841
3597
04:57
from the things that seem random, like me selling dumplings,
94
297462
3123
05:00
to the things that were more obvious,
95
300609
1804
05:02
like tracking how much they were spending on their cell phone bills.
96
302437
3232
05:05
And I was able to create this much more holistic picture
97
305693
2639
05:08
of what was happening.
98
308356
1156
05:09
And that's when I started to realize
99
309536
1722
05:11
that even the poorest in China would want a smartphone,
100
311282
3509
05:14
and that they would do almost anything to get their hands on one.
101
314815
4985
05:20
You have to keep in mind,
102
320893
2404
05:23
iPhones had just come out, it was 2009,
103
323321
3084
05:26
so this was, like, eight years ago,
104
326429
1885
05:28
and Androids had just started looking like iPhones.
105
328338
2437
05:30
And a lot of very smart and realistic people said,
106
330799
2507
05:33
"Those smartphones -- that's just a fad.
107
333330
2207
05:36
Who wants to carry around these heavy things
108
336063
2996
05:39
where batteries drain quickly and they break every time you drop them?"
109
339083
3487
05:44
But I had a lot of data,
110
344613
1201
05:45
and I was very confident about my insights,
111
345838
2260
05:48
so I was very excited to share them with Nokia.
112
348122
2829
05:53
But Nokia was not convinced,
113
353152
2517
05:55
because it wasn't big data.
114
355693
2335
05:58
They said, "We have millions of data points,
115
358842
2404
06:01
and we don't see any indicators of anyone wanting to buy a smartphone,
116
361270
4247
06:05
and your data set of 100, as diverse as it is, is too weak
117
365541
4388
06:09
for us to even take seriously."
118
369953
1714
06:12
And I said, "Nokia, you're right.
119
372728
1605
06:14
Of course you wouldn't see this,
120
374357
1560
06:15
because you're sending out surveys assuming that people don't know
121
375941
3371
06:19
what a smartphone is,
122
379336
1159
06:20
so of course you're not going to get any data back
123
380519
2366
06:22
about people wanting to buy a smartphone in two years.
124
382909
2572
06:25
Your surveys, your methods have been designed
125
385505
2118
06:27
to optimize an existing business model,
126
387647
2022
06:29
and I'm looking at these emergent human dynamics
127
389693
2608
06:32
that haven't happened yet.
128
392325
1354
06:33
We're looking outside of market dynamics
129
393703
2438
06:36
so that we can get ahead of it."
130
396165
1631
06:39
Well, you know what happened to Nokia?
131
399193
2244
06:41
Their business fell off a cliff.
132
401461
2365
06:44
This -- this is the cost of missing something.
133
404611
3727
06:48
It was unfathomable.
134
408983
1999
06:51
But Nokia's not alone.
135
411823
1651
06:54
I see organizations throwing out data all the time
136
414078
2581
06:56
because it didn't come from a quant model
137
416683
2561
06:59
or it doesn't fit in one.
138
419268
1768
07:02
But it's not big data's fault.
139
422039
2048
07:04
It's the way we use big data; it's our responsibility.
140
424762
3907
07:09
Big data's reputation for success
141
429550
1911
07:11
comes from quantifying very specific environments,
142
431485
3759
07:15
like electricity power grids or delivery logistics or genetic code,
143
435268
4913
07:20
when we're quantifying in systems that are more or less contained.
144
440205
4318
07:24
But not all systems are as neatly contained.
145
444547
2969
07:27
When you're quantifying and systems are more dynamic,
146
447540
3258
07:30
especially systems that involve human beings,
147
450822
3799
07:34
forces are complex and unpredictable,
148
454645
2426
07:37
and these are things that we don't know how to model so well.
149
457095
3486
07:41
Once you predict something about human behavior,
150
461024
2813
07:43
new factors emerge,
151
463861
1855
07:45
because conditions are constantly changing.
152
465740
2365
07:48
That's why it's a never-ending cycle.
153
468129
1803
07:49
You think you know something,
154
469956
1464
07:51
and then something unknown enters the picture.
155
471444
2242
07:53
And that's why just relying on big data alone
156
473710
3322
07:57
increases the chance that we'll miss something,
157
477056
2849
07:59
while giving us this illusion that we already know everything.
158
479929
3777
08:04
And what makes it really hard to see this paradox
159
484226
3856
08:08
and even wrap our brains around it
160
488106
2659
08:10
is that we have this thing that I call the quantification bias,
161
490789
3691
08:14
which is the unconscious belief of valuing the measurable
162
494504
3922
08:18
over the immeasurable.
163
498450
1594
08:21
And we often experience this at our work.
164
501042
3284
08:24
Maybe we work alongside colleagues who are like this,
165
504350
2650
08:27
or even our whole entire company may be like this,
166
507024
2428
08:29
where people become so fixated on that number,
167
509476
2546
08:32
that they can't see anything outside of it,
168
512046
2067
08:34
even when you present them evidence right in front of their face.
169
514137
3948
08:38
And this is a very appealing message,
170
518943
3371
08:42
because there's nothing wrong with quantifying;
171
522338
2343
08:44
it's actually very satisfying.
172
524705
1430
08:46
I get a great sense of comfort from looking at an Excel spreadsheet,
173
526159
4362
08:50
even very simple ones.
174
530545
1401
08:51
(Laughter)
175
531970
1014
08:53
It's just kind of like,
176
533008
1152
08:54
"Yes! The formula worked. It's all OK. Everything is under control."
177
534184
3504
08:58
But the problem is
178
538612
2390
09:01
that quantifying is addictive.
179
541026
2661
09:03
And when we forget that
180
543711
1382
09:05
and when we don't have something to kind of keep that in check,
181
545117
3038
09:08
it's very easy to just throw out data
182
548179
2118
09:10
because it can't be expressed as a numerical value.
183
550321
2718
09:13
It's very easy just to slip into silver-bullet thinking,
184
553063
2921
09:16
as if some simple solution existed.
185
556008
2579
09:19
Because this is a great moment of danger for any organization,
186
559420
4062
09:23
because oftentimes, the future we need to predict --
187
563506
2634
09:26
it isn't in that haystack,
188
566164
2166
09:28
but it's that tornado that's bearing down on us
189
568354
2538
09:30
outside of the barn.
190
570916
1488
09:34
There is no greater risk
191
574780
2326
09:37
than being blind to the unknown.
192
577130
1666
09:38
It can cause you to make the wrong decisions.
193
578820
2149
09:40
It can cause you to miss something big.
194
580993
1974
09:43
But we don't have to go down this path.
195
583554
3101
09:47
It turns out that the oracle of ancient Greece
196
587273
3195
09:50
holds the secret key that shows us the path forward.
197
590492
3966
09:55
Now, recent geological research has shown
198
595474
2595
09:58
that the Temple of Apollo, where the most famous oracle sat,
199
598093
3564
10:01
was actually built over two earthquake faults.
200
601681
3084
10:04
And these faults would release these petrochemical fumes
201
604789
2886
10:07
from underneath the Earth's crust,
202
607699
1685
10:09
and the oracle literally sat right above these faults,
203
609408
3866
10:13
inhaling enormous amounts of ethylene gas, these fissures.
204
613298
3588
10:16
(Laughter)
205
616910
1008
10:17
It's true.
206
617942
1173
10:19
(Laughter)
207
619139
1017
10:20
It's all true, and that's what made her babble and hallucinate
208
620180
3509
10:23
and go into this trance-like state.
209
623713
1724
10:25
She was high as a kite!
210
625461
1770
10:27
(Laughter)
211
627255
4461
10:31
So how did anyone --
212
631740
2779
10:34
How did anyone get any useful advice out of her
213
634543
3030
10:37
in this state?
214
637597
1190
10:39
Well, you see those people surrounding the oracle?
215
639317
2381
10:41
You see those people holding her up,
216
641722
1879
10:43
because she's, like, a little woozy?
217
643625
1717
10:45
And you see that guy on your left-hand side
218
645366
2308
10:47
holding the orange notebook?
219
647698
1598
10:49
Well, those were the temple guides,
220
649925
1730
10:51
and they worked hand in hand with the oracle.
221
651679
3016
10:55
When inquisitors would come and get on their knees,
222
655904
2516
10:58
that's when the temple guides would get to work,
223
658444
2340
11:00
because after they asked her questions,
224
660808
1864
11:02
they would observe their emotional state,
225
662696
2001
11:04
and then they would ask them follow-up questions,
226
664721
2324
11:07
like, "Why do you want to know this prophecy? Who are you?
227
667069
2834
11:09
What are you going to do with this information?"
228
669927
2264
11:12
And then the temple guides would take this more ethnographic,
229
672215
3182
11:15
this more qualitative information,
230
675421
2156
11:17
and interpret the oracle's babblings.
231
677601
2075
11:21
So the oracle didn't stand alone,
232
681248
2292
11:23
and neither should our big data systems.
233
683564
2148
11:26
Now to be clear,
234
686450
1161
11:27
I'm not saying that big data systems are huffing ethylene gas,
235
687635
3459
11:31
or that they're even giving invalid predictions.
236
691118
2353
11:33
The total opposite.
237
693495
1161
11:34
But what I am saying
238
694680
2068
11:36
is that in the same way that the oracle needed her temple guides,
239
696772
3832
11:40
our big data systems need them, too.
240
700628
2288
11:42
They need people like ethnographers and user researchers
241
702940
4109
11:47
who can gather what I call thick data.
242
707073
2506
11:50
This is precious data from humans,
243
710322
2991
11:53
like stories, emotions and interactions that cannot be quantified.
244
713337
4102
11:57
It's the kind of data that I collected for Nokia
245
717463
2322
11:59
that comes in the form of a very small sample size,
246
719809
2669
12:02
but delivers incredible depth of meaning.
247
722502
2955
12:05
And what makes it so thick and meaty
248
725481
3680
12:10
is the experience of understanding the human narrative.
249
730265
4029
12:14
And that's what helps to see what's missing in our models.
250
734318
3639
12:18
Thick data grounds our business questions in human questions,
251
738671
4045
12:22
and that's why integrating big and thick data
252
742740
3562
12:26
forms a more complete picture.
253
746326
1689
12:28
Big data is able to offer insights at scale
254
748592
2881
12:31
and leverage the best of machine intelligence,
255
751497
2647
12:34
whereas thick data can help us rescue the context loss
256
754168
3572
12:37
that comes from making big data usable,
257
757764
2098
12:39
and leverage the best of human intelligence.
258
759886
2181
12:42
And when you actually integrate the two, that's when things get really fun,
259
762091
3552
12:45
because then you're no longer just working with data
260
765667
2436
12:48
you've already collected.
261
768127
1196
12:49
You get to also work with data that hasn't been collected.
262
769347
2737
12:52
You get to ask questions about why:
263
772108
1719
12:53
Why is this happening?
264
773851
1317
12:55
Now, when Netflix did this,
265
775598
1379
12:57
they unlocked a whole new way to transform their business.
266
777001
3035
13:01
Netflix is known for their really great recommendation algorithm,
267
781226
3956
13:05
and they had this $1 million prize for anyone who could improve it.
268
785206
4797
13:10
And there were winners.
269
790027
1314
13:12
But Netflix discovered the improvements were only incremental.
270
792075
4323
13:17
So to really find out what was going on,
271
797224
1964
13:19
they hired an ethnographer, Grant McCracken,
272
799212
3741
13:22
to gather thick data insights.
273
802977
1546
13:24
And what he discovered was something that they hadn't seen initially
274
804547
3924
13:28
in the quantitative data.
275
808495
1355
13:30
He discovered that people loved to binge-watch.
276
810892
2728
13:33
In fact, people didn't even feel guilty about it.
277
813644
2353
13:36
They enjoyed it.
278
816021
1255
13:37
(Laughter)
279
817300
1026
13:38
So Netflix was like, "Oh. This is a new insight."
280
818350
2356
13:40
So they went to their data science team,
281
820730
1938
13:42
and they were able to scale this big data insight
282
822692
2318
13:45
in with their quantitative data.
283
825034
2587
13:47
And once they verified it and validated it,
284
827645
3170
13:50
Netflix decided to do something very simple but impactful.
285
830839
4761
13:56
They said, instead of offering the same show from different genres
286
836654
6492
14:03
or more of the different shows from similar users,
287
843170
3888
14:07
we'll just offer more of the same show.
288
847082
2554
14:09
We'll make it easier for you to binge-watch.
289
849660
2105
14:11
And they didn't stop there.
290
851789
1486
14:13
They did all these things
291
853299
1474
14:14
to redesign their entire viewer experience,
292
854797
2959
14:17
to really encourage binge-watching.
293
857780
1758
14:20
It's why people and friends disappear for whole weekends at a time,
294
860050
3241
14:23
catching up on shows like "Master of None."
295
863315
2343
14:25
By integrating big data and thick data, they not only improved their business,
296
865682
4173
14:29
but they transformed how we consume media.
297
869879
2812
14:32
And now their stocks are projected to double in the next few years.
298
872715
4552
14:38
But this isn't just about watching more videos
299
878100
3830
14:41
or selling more smartphones.
300
881954
1620
14:43
For some, integrating thick data insights into the algorithm
301
883963
4050
14:48
could mean life or death,
302
888037
2263
14:50
especially for the marginalized.
303
890324
2146
14:53
All around the country, police departments are using big data
304
893558
3434
14:57
for predictive policing,
305
897016
1963
14:59
to set bond amounts and sentencing recommendations
306
899003
3084
15:02
in ways that reinforce existing biases.
307
902111
3147
15:06
NSA's Skynet machine learning algorithm
308
906116
2423
15:08
has possibly aided in the deaths of thousands of civilians in Pakistan
309
908563
5444
15:14
from misreading cellular device metadata.
310
914031
2721
15:18
As all of our lives become more automated,
311
918951
3403
15:22
from automobiles to health insurance or to employment,
312
922378
3080
15:25
it is likely that all of us
313
925482
2350
15:27
will be impacted by the quantification bias.
314
927856
2989
15:32
Now, the good news is that we've come a long way
315
932792
2621
15:35
from huffing ethylene gas to make predictions.
316
935437
2450
15:37
We have better tools, so let's just use them better.
317
937911
3070
15:41
Let's integrate the big data with the thick data.
318
941005
2323
15:43
Let's bring our temple guides with the oracles,
319
943352
2261
15:45
and whether this work happens in companies or nonprofits
320
945637
3376
15:49
or government or even in the software,
321
949037
2469
15:51
all of it matters,
322
951530
1792
15:53
because that means we're collectively committed
323
953346
3023
15:56
to making better data,
324
956393
2191
15:58
better algorithms, better outputs
325
958608
1836
16:00
and better decisions.
326
960468
1643
16:02
This is how we'll avoid missing that something.
327
962135
3558
16:07
(Applause)
328
967042
3948