The era of blind faith in big data must end | Cathy O'Neil

240,040 views ・ 2017-09-07

TED



00:12
Algorithms are everywhere.
00:15
They sort and separate the winners from the losers.
00:19
The winners get the job
00:22
or a good credit card offer.
00:23
The losers don't even get an interview
00:27
or they pay more for insurance.
00:30
We're being scored with secret formulas that we don't understand
00:34
that often don't have systems of appeal.
00:39
That begs the question:
00:40
What if the algorithms are wrong?
00:44
To build an algorithm you need two things:
00:46
you need data, what happened in the past,
00:48
and a definition of success,
00:50
the thing you're looking for and often hoping for.
00:53
You train an algorithm by looking, figuring out.
00:58
The algorithm figures out what is associated with success.
01:01
What situation leads to success?
01:04
Actually, everyone uses algorithms.
01:06
They just don't formalize them in written code.
01:09
Let me give you an example.
01:10
I use an algorithm every day to make a meal for my family.
01:13
The data I use
01:16
is the ingredients in my kitchen,
01:17
the time I have,
01:19
the ambition I have,
01:20
and I curate that data.
01:22
I don't count those little packages of ramen noodles as food.
01:26
(Laughter)
01:28
My definition of success is:
01:30
a meal is successful if my kids eat vegetables.
01:34
It's very different from if my youngest son were in charge.
01:36
He'd say success is if he gets to eat lots of Nutella.
01:40
But I get to choose success.
01:43
I am in charge. My opinion matters.
01:45
That's the first rule of algorithms.
01:48
Algorithms are opinions embedded in code.
01:53
It's really different from what you think most people think of algorithms.
01:57
They think algorithms are objective and true and scientific.
02:02
That's a marketing trick.
02:05
It's also a marketing trick
02:07
to intimidate you with algorithms,
02:10
to make you trust and fear algorithms
02:14
because you trust and fear mathematics.
02:17
A lot can go wrong when we put blind faith in big data.
02:23
This is Kiri Soares. She's a high school principal in Brooklyn.
02:26
In 2011, she told me her teachers were being scored
02:29
with a complex, secret algorithm
02:32
called the "value-added model."
02:34
I told her, "Well, figure out what the formula is, show it to me.
02:37
I'm going to explain it to you."
02:39
She said, "Well, I tried to get the formula,
02:41
but my Department of Education contact told me it was math
02:43
and I wouldn't understand it."
02:47
It gets worse.
02:48
The New York Post filed a Freedom of Information Act request,
02:52
got all the teachers' names and all their scores
02:54
and they published them as an act of teacher-shaming.
02:58
When I tried to get the formulas, the source code, through the same means,
03:02
I was told I couldn't.
03:04
I was denied.
03:06
I later found out
03:07
that nobody in New York City had access to that formula.
03:10
No one understood it.
03:13
Then someone really smart got involved, Gary Rubinstein.
03:16
He found 665 teachers from that New York Post data
03:20
that actually had two scores.
03:22
That could happen if they were teaching
03:24
seventh grade math and eighth grade math.
03:26
He decided to plot them.
03:28
Each dot represents a teacher.
03:30
(Laughter)
03:33
What is that?
03:34
(Laughter)
03:36
That should never have been used for individual assessment.
03:39
It's almost a random number generator.
03:41
(Applause)
03:44
But it was.
03:45
This is Sarah Wysocki.
03:46
She got fired, along with 205 other teachers,
03:49
from the Washington, DC school district,
03:51
even though she had great recommendations from her principal
03:54
and the parents of her kids.
03:57
I know what a lot of you guys are thinking,
03:59
especially the data scientists, the AI experts here.
04:01
You're thinking, "Well, I would never make an algorithm that inconsistent."
04:06
But algorithms can go wrong,
04:08
even have deeply destructive effects with good intentions.
04:14
And whereas an airplane that's designed badly
04:16
crashes to the earth and everyone sees it,
04:18
an algorithm designed badly
04:22
can go on for a long time, silently wreaking havoc.
04:27
This is Roger Ailes.
04:29
(Laughter)
04:32
He founded Fox News in 1996.
04:35
More than 20 women complained about sexual harassment.
04:37
They said they weren't allowed to succeed at Fox News.
04:41
He was ousted last year, but we've seen recently
04:43
that the problems have persisted.
04:47
That begs the question:
04:48
What should Fox News do to turn over another leaf?
04:53
Well, what if they replaced their hiring process
04:56
with a machine-learning algorithm?
04:57
That sounds good, right?
04:59
Think about it.
05:00
The data, what would the data be?
05:02
A reasonable choice would be the last 21 years of applications to Fox News.
05:07
Reasonable.
05:09
What about the definition of success?
05:11
Reasonable choice would be,
05:13
well, who is successful at Fox News?
05:14
I guess someone who, say, stayed there for four years
05:18
and was promoted at least once.
05:20
Sounds reasonable.
05:22
And then the algorithm would be trained.
05:24
It would be trained to look for people to learn what led to success,
05:29
what kind of applications historically led to success
05:33
by that definition.
05:36
Now think about what would happen
05:37
if we applied that to a current pool of applicants.
05:40
It would filter out women
05:43
because they do not look like people who were successful in the past.
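[Editor's illustration, not part of the talk] The hiring thought experiment can be sketched in a few lines of Python. This is a deliberately crude stand-in for a machine-learning model, under loud assumptions: the historical counts are invented, the names `HISTORY`, `train` and `screen` are hypothetical, and the "model" simply memorizes past success rates per group. A real system would more likely pick up gender through proxy features, but the mechanism is the same: the definition of success plus biased history yields a biased filter.

```python
from collections import defaultdict

# Hypothetical historical records: (gender, was_successful).
# "Successful" follows the talk's definition: stayed four years, promoted once.
# The counts are invented for illustration; they are not real data.
HISTORY = (
    [("man", True)] * 60 + [("man", False)] * 40
    + [("woman", True)] * 5 + [("woman", False)] * 45
)

def train(history):
    """Learn the historical success rate for each group."""
    totals, wins = defaultdict(int), defaultdict(int)
    for group, success in history:
        totals[group] += 1
        wins[group] += int(success)
    return {group: wins[group] / totals[group] for group in totals}

def screen(applicants, model, threshold=0.5):
    """Keep only applicants whose group 'succeeded' often enough in the past."""
    return [a for a in applicants if model[a["gender"]] >= threshold]

model = train(HISTORY)
applicants = [
    {"name": "A", "gender": "woman"},
    {"name": "B", "gender": "man"},
    {"name": "C", "gender": "woman"},
]
shortlist = screen(applicants, model)
print(model)      # {'man': 0.6, 'woman': 0.1}
print(shortlist)  # only applicant B passes the filter
```

Nothing in the code mentions qualifications; the filter reproduces whatever the historical outcomes encoded.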
05:51
Algorithms don't make things fair
05:54
if you just blithely, blindly apply algorithms.
05:56
They don't make things fair.
05:58
They repeat our past practices,
06:00
our patterns.
06:01
They automate the status quo.
06:04
That would be great if we had a perfect world,
06:07
but we don't.
06:09
And I'll add that most companies don't have embarrassing lawsuits,
06:14
but the data scientists in those companies
06:16
are told to follow the data,
06:19
to focus on accuracy.
06:22
Think about what that means.
06:23
Because we all have bias, it means they could be codifying sexism
06:27
or any other kind of bigotry.
06:31
Thought experiment,
06:32
because I like them:
06:35
an entirely segregated society --
06:40
racially segregated, all towns, all neighborhoods
06:43
and where we send the police only to the minority neighborhoods
06:46
to look for crime.
06:48
The arrest data would be very biased.
06:51
What if, on top of that, we found the data scientists
06:54
and paid the data scientists to predict where the next crime would occur?
06:59
Minority neighborhood.
07:01
Or to predict who the next criminal would be?
07:04
A minority.
07:07
The data scientists would brag about how great and how accurate
07:11
their model would be,
07:12
and they'd be right.
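[Editor's illustration, not part of the talk] A minimal simulation of the segregated-society thought experiment, assuming an identical true crime rate in both neighborhoods and police presence in only one. The rate, the neighborhood labels and the function name `recorded_arrests` are all hypothetical. The point it tries to show is why the model's accuracy claim holds: accuracy is measured against arrest records, and the records are produced by the same patrol pattern.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

TRUE_CRIME_RATE = 0.05  # assumed identical in both neighborhoods
PATROLLED = {"minority": True, "majority": False}  # police go to one side only

def recorded_arrests(days=1000):
    """An arrest is recorded only where a crime occurs AND police are present."""
    arrests = {name: 0 for name in PATROLLED}
    for _ in range(days):
        for name, patrolled in PATROLLED.items():
            crime_happened = random.random() < TRUE_CRIME_RATE
            if crime_happened and patrolled:
                arrests[name] += 1
    return arrests

history = recorded_arrests()
# "Model": predict the next arrest wherever most past arrests were recorded.
prediction = max(history, key=history.get)

# Score the model against fresh records generated by the same policing.
future = recorded_arrests()
accuracy = future[prediction] / sum(future.values())
print(history, prediction, accuracy)
```

The unpatrolled neighborhood has the same underlying crime rate but zero recorded arrests, so the prediction looks perfectly accurate while saying nothing about where crime actually happens.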
07:15
Now, reality isn't that drastic, but we do have severe segregations
07:20
in many cities and towns,
07:21
and we have plenty of evidence
07:23
of biased policing and justice system data.
07:27
And we actually do predict hotspots,
07:30
places where crimes will occur.
07:32
And we do predict, in fact, the individual criminality,
07:36
the criminality of individuals.
07:38
The news organization ProPublica recently looked into
07:42
one of those "recidivism risk" algorithms,
07:44
as they're called,
07:46
being used in Florida during sentencing by judges.
07:50
Bernard, on the left, the black man, was scored a 10 out of 10.
07:54
Dylan, on the right, 3 out of 10.
07:57
10 out of 10, high risk. 3 out of 10, low risk.
08:00
They were both brought in for drug possession.
08:02
They both had records,
08:04
but Dylan had a felony
08:06
but Bernard didn't.
08:09
This matters, because the higher score you are,
08:12
the more likely you're being given a longer sentence.
08:18
What's going on?
08:20
Data laundering.
08:22
It's a process by which technologists hide ugly truths
08:27
inside black box algorithms
08:29
and call them objective;
08:31
call them meritocratic.
08:34
When they're secret, important and destructive,
08:37
I've coined a term for these algorithms:
08:39
"weapons of math destruction."
08:41
(Laughter)
08:43
(Applause)
08:46
They're everywhere, and it's not a mistake.
08:49
These are private companies building private algorithms
08:53
for private ends.
08:55
Even the ones I talked about for teachers and the public police,
08:58
those were built by private companies
09:00
and sold to the government institutions.
09:02
They call it their "secret sauce" --
09:04
that's why they can't tell us about it.
09:06
It's also private power.
09:09
They are profiting for wielding the authority of the inscrutable.
09:16
Now you might think, since all this stuff is private
09:19
and there's competition,
09:21
maybe the free market will solve this problem.
09:23
It won't.
09:24
There's a lot of money to be made in unfairness.
09:28
Also, we're not economic rational agents.
09:32
We all are biased.
09:34
We're all racist and bigoted in ways that we wish we weren't,
09:38
in ways that we don't even know.
09:41
We know this, though, in aggregate,
09:44
because sociologists have consistently demonstrated this
09:47
with these experiments they build,
09:49
where they send a bunch of applications to jobs out,
09:51
equally qualified but some have white-sounding names
09:54
and some have black-sounding names,
09:56
and it's always disappointing, the results -- always.
09:59
So we are the ones that are biased,
10:01
and we are injecting those biases into the algorithms
10:04
by choosing what data to collect,
10:06
like I chose not to think about ramen noodles --
10:09
I decided it was irrelevant.
10:10
But by trusting the data that's actually picking up on past practices
10:16
and by choosing the definition of success,
10:18
how can we expect the algorithms to emerge unscathed?
10:22
We can't. We have to check them.
10:25
We have to check them for fairness.
10:27
The good news is, we can check them for fairness.
10:30
Algorithms can be interrogated,
10:33
and they will tell us the truth every time.
10:35
And we can fix them. We can make them better.
10:38
I call this an algorithmic audit,
10:40
and I'll walk you through it.
10:42
First, data integrity check.
10:45
For the recidivism risk algorithm I talked about,
10:49
a data integrity check would mean we'd have to come to terms with the fact
10:52
that in the US, whites and blacks smoke pot at the same rate
10:56
but blacks are far more likely to be arrested --
10:59
four or five times more likely, depending on the area.
11:03
What is that bias looking like in other crime categories,
11:05
and how do we account for it?
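[Editor's illustration, not part of the talk] One way a data integrity check of this kind might be expressed: compare arrests against an independent estimate of how many people in each group commit the offense. The counts, the 12% offense rate and the helper name `arrests_per_offender` are hypothetical; only the "same rate, four times the arrests" shape comes from the talk.

```python
def arrests_per_offender(arrests, population, offense_rate):
    """Arrests divided by the estimated number of people committing the offense."""
    return arrests / (population * offense_rate)

# Hypothetical counts for one area; only the structure of the check matters.
groups = {
    "white": {"arrests": 100, "population": 10_000, "offense_rate": 0.12},
    "black": {"arrests": 400, "population": 10_000, "offense_rate": 0.12},
}

rates = {name: arrests_per_offender(**g) for name, g in groups.items()}
disparity = rates["black"] / rates["white"]
print(f"arrest disparity at equal offense rates: {disparity:.1f}x")
```

A disparity well above 1 at equal offense rates suggests the arrest column measures policing as much as behavior, so a model trained on it inherits that skew.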
11:07
Second, we should think about the definition of success,
11:11
audit that.
11:12
Remember -- with the hiring algorithm? We talked about it.
11:15
Someone who stays for four years and is promoted once?
11:18
Well, that is a successful employee,
11:20
but it's also an employee that is supported by their culture.
11:23
That said, also it can be quite biased.
11:25
We need to separate those two things.
11:27
We should look to the blind orchestra audition
11:30
as an example.
11:31
That's where the people auditioning are behind a sheet.
11:34
What I want to think about there
11:36
is the people who are listening have decided what's important
11:40
and they've decided what's not important,
11:42
and they're not getting distracted by that.
11:44
When the blind orchestra auditions started,
11:47
the number of women in orchestras went up by a factor of five.
11:52
Next, we have to consider accuracy.
11:55
This is where the value-added model for teachers would fail immediately.
11:59
No algorithm is perfect, of course,
12:02
so we have to consider the errors of every algorithm.
12:06
How often are there errors, and for whom does this model fail?
12:11
What is the cost of that failure?
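[Editor's illustration, not part of the talk] The question "for whom does this model fail?" can be made concrete by splitting an error rate by group. The sketch below computes a false positive rate per group for a risk score; the audit sample, the group labels "a" and "b", and the function name `false_positive_rates` are hypothetical, and false positive rate is only one of several error measures an audit might choose.

```python
from collections import defaultdict

def false_positive_rates(records):
    """Per group: share of people who did not reoffend but were flagged high risk."""
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            flagged[group] += int(predicted_high_risk)
    return {group: flagged[group] / negatives[group] for group in negatives}

# Hypothetical audit sample: (group, predicted_high_risk, reoffended).
records = (
    [("a", True, False)] * 4 + [("a", False, False)] * 6 + [("a", True, True)] * 5
    + [("b", True, False)] * 1 + [("b", False, False)] * 9 + [("b", True, True)] * 5
)
fpr = false_positive_rates(records)
print(fpr)  # {'a': 0.4, 'b': 0.1}
```

Both groups here have the same number of correctly flagged reoffenders, yet people in group "a" who did not reoffend are wrongly flagged four times as often, which an overall accuracy figure would hide.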
12:14
And finally, we have to consider
12:17
the long-term effects of algorithms,
12:20
the feedback loops that are engendering.
12:23
That sounds abstract,
12:24
but imagine if Facebook engineers had considered that
12:28
before they decided to show us only things that our friends had posted.
12:33
I have two more messages, one for the data scientists out there.
12:37
Data scientists: we should not be the arbiters of truth.
12:41
We should be translators of ethical discussions that happen
12:45
in larger society.
12:47
(Applause)
12:49
And the rest of you,
12:51
the non-data scientists:
12:53
this is not a math test.
12:55
This is a political fight.
12:58
We need to demand accountability for our algorithmic overlords.
13:03
(Applause)
13:05
The era of blind faith in big data must end.
13:09
Thank you very much.
13:10
(Applause)