3 ways to spot a bad statistic | Mona Chalabi

249,321 views ・ 2017-04-17

TED



00:12
I'm going to be talking about statistics today.
0
12704
2763
00:15
If that makes you immediately feel a little bit wary, that's OK,
1
15491
3138
00:18
that doesn't make you some kind of crazy conspiracy theorist,
2
18653
2859
00:21
it makes you skeptical.
3
21536
1296
00:22
And when it comes to numbers, especially now, you should be skeptical.
4
22856
3886
00:26
But you should also be able to tell which numbers are reliable
5
26766
3011
00:29
and which ones aren't.
6
29801
1160
00:30
So today I want to try to give you some tools to be able to do that.
7
30985
3206
00:34
But before I do,
8
34215
1169
00:35
I just want to clarify which numbers I'm talking about here.
9
35408
2839
00:38
I'm not talking about claims like,
10
38271
1635
00:39
"9 out of 10 women recommend this anti-aging cream."
11
39930
2449
00:42
I think a lot of us always roll our eyes at numbers like that.
12
42403
2972
00:45
What's different now is people are questioning statistics like,
13
45399
2984
00:48
"The US unemployment rate is five percent."
14
48407
2014
00:50
What makes this claim different is it doesn't come from a private company,
15
50445
3516
00:53
it comes from the government.
16
53985
1388
00:55
About 4 out of 10 Americans distrust the economic data
17
55397
3336
00:58
that gets reported by government.
18
58757
1573
01:00
Among supporters of President Trump it's even higher;
19
60354
2491
01:02
it's about 7 out of 10.
20
62869
1633
01:04
I don't need to tell anyone here
21
64526
1804
01:06
that there are a lot of dividing lines in our society right now,
22
66354
3011
01:09
and a lot of them start to make sense,
23
69389
1825
01:11
once you understand people's relationships with these government numbers.
24
71238
3687
01:14
On the one hand, there are those who say these statistics are crucial,
25
74949
3336
01:18
that we need them to make sense of society as a whole
26
78309
2630
01:20
in order to move beyond emotional anecdotes
27
80963
2164
01:23
and measure progress in an [objective] way.
28
83151
2410
01:25
And then there are the others,
29
85585
1467
01:27
who say that these statistics are elitist,
30
87076
2156
01:29
maybe even rigged;
31
89256
1208
01:30
they don't make sense and they don't really reflect
32
90488
2394
01:32
what's happening in people's everyday lives.
33
92906
2296
01:35
It kind of feels like that second group is winning the argument right now.
34
95226
3487
01:38
We're living in a world of alternative facts,
35
98737
2108
01:40
where people don't find statistics this kind of common ground,
36
100869
2935
01:43
this starting point for debate.
37
103828
1636
01:45
This is a problem.
38
105488
1286
01:46
There are actually moves in the US right now
39
106798
2067
01:48
to get rid of some government statistics altogether.
40
108889
2861
01:51
Right now there's a bill in congress about measuring racial inequality.
41
111774
3387
01:55
The draft law says that government money should not be used
42
115185
2801
01:58
to collect data on racial segregation.
43
118010
1902
01:59
This is a total disaster.
44
119936
1885
02:01
If we don't have this data,
45
121845
1748
02:03
how can we observe discrimination,
46
123617
1778
02:05
let alone fix it?
47
125419
1278
02:06
In other words:
48
126721
1188
02:07
How can a government create fair policies
49
127933
2059
02:10
if they can't measure current levels of unfairness?
50
130016
2771
02:12
This isn't just about discrimination,
51
132811
1794
02:14
it's everything -- think about it.
52
134629
1670
02:16
How can we legislate on health care
53
136323
1690
02:18
if we don't have good data on health or poverty?
54
138037
2271
02:20
How can we have public debate about immigration
55
140332
2198
02:22
if we can't at least agree
56
142554
1250
02:23
on how many people are entering and leaving the country?
57
143828
2643
02:26
Statistics come from the state; that's where they got their name.
58
146495
3058
02:29
The point was to better measure the population
59
149577
2157
02:31
in order to better serve it.
60
151758
1357
02:33
So we need these government numbers,
61
153139
1725
02:34
but we also have to move beyond either blindly accepting
62
154888
2647
02:37
or blindly rejecting them.
63
157559
1268
02:38
We need to learn the skills to be able to spot bad statistics.
64
158851
2997
02:41
I started to learn some of these
65
161872
1528
02:43
when I was working in a statistical department
66
163424
2166
02:45
that's part of the United Nations.
67
165614
1643
02:47
Our job was to find out how many Iraqis had been forced from their homes
68
167281
3406
02:50
as a result of the war,
69
170711
1158
02:51
and what they needed.
70
171893
1158
02:53
It was really important work, but it was also incredibly difficult.
71
173075
3178
02:56
Every single day, we were making decisions
72
176277
2018
02:58
that affected the accuracy of our numbers --
73
178319
2157
03:00
decisions like which parts of the country we should go to,
74
180500
2744
03:03
who we should speak to,
75
183268
1156
03:04
which questions we should ask.
76
184448
1568
03:06
And I started to feel really disillusioned with our work,
77
186040
2680
03:08
because we thought we were doing a really good job,
78
188744
2518
03:11
but the one group of people who could really tell us were the Iraqis,
79
191286
3278
03:14
and they rarely got the chance to find our analysis, let alone question it.
80
194588
3540
03:18
So I started to feel really determined
81
198152
1831
03:20
that the one way to make numbers more accurate
82
200007
2311
03:22
is to have as many people as possible be able to question them.
83
202342
3053
03:25
So I became a data journalist.
84
205419
1434
03:26
My job is finding these data sets and sharing them with the public.
85
206877
3904
03:30
Anyone can do this, you don't have to be a geek or a nerd.
86
210805
3173
03:34
You can ignore those words; they're used by people
87
214002
2355
03:36
trying to say they're smart while pretending they're humble.
88
216381
2822
03:39
Absolutely anyone can do this.
89
219227
1589
03:40
I want to give you guys three questions
90
220840
2067
03:42
that will help you be able to spot some bad statistics.
91
222931
3005
03:45
So, question number one is: Can you see uncertainty?
92
225960
3507
03:49
One of the things that's really changed people's relationship with numbers,
93
229491
3364
03:52
and even their trust in the media,
94
232879
1641
03:54
has been the use of political polls.
95
234544
2258
03:56
I personally have a lot of issues with political polls
96
236826
2538
03:59
because I think the role of journalists is actually to report the facts
97
239388
3376
04:02
and not attempt to predict them,
98
242788
1553
04:04
especially when those predictions can actually damage democracy
99
244365
2996
04:07
by signaling to people: don't bother to vote for that guy,
100
247385
2732
04:10
he doesn't have a chance.
101
250141
1205
04:11
Let's set that aside for now and talk about the accuracy of this endeavor.
102
251370
3654
04:15
Based on national elections in the UK, Italy, Israel
103
255048
4608
04:19
and of course, the most recent US presidential election,
104
259680
2764
04:22
using polls to predict electoral outcomes
105
262468
2137
04:24
is about as accurate as using the moon to predict hospital admissions.
106
264629
3812
04:28
No, seriously, I used actual data from an academic study to draw this.
107
268465
4200
04:32
There are a lot of reasons why polling has become so inaccurate.
108
272689
3727
04:36
Our societies have become really diverse,
109
276440
1970
04:38
which makes it difficult for pollsters to get a really nice representative sample
110
278434
3821
04:42
of the population for their polls.
111
282279
1627
04:43
People are really reluctant to answer their phones to pollsters,
112
283930
3006
04:46
and also, shockingly enough, people might lie.
113
286960
2276
04:49
But you wouldn't necessarily know that to look at the media.
114
289260
2811
04:52
For one thing, the probability of a Hillary Clinton win
115
292095
2761
04:54
was communicated with decimal places.
116
294880
2791
04:57
We don't use decimal places to describe the temperature.
117
297695
2621
05:00
How on earth can predicting the behavior of 230 million voters in this country
118
300340
4228
05:04
be that precise?
119
304592
1829
05:06
And then there were those sleek charts.
120
306445
2002
05:08
See, a lot of data visualizations will overstate certainty, and it works --
121
308471
3973
05:12
these charts can numb our brains to criticism.
122
312468
2620
05:15
When you hear a statistic, you might feel skeptical.
123
315112
2558
05:17
As soon as it's buried in a chart,
124
317694
1635
05:19
it feels like some kind of objective science,
125
319353
2129
05:21
and it's not.
126
321506
1249
05:22
So I was trying to find ways to better communicate this to people,
127
322779
3103
05:25
to show people the uncertainty in our numbers.
128
325906
2504
05:28
What I did was I started taking real data sets,
129
328434
2246
05:30
and turning them into hand-drawn visualizations,
130
330704
2652
05:33
so that people can see how imprecise the data is;
131
333380
2672
05:36
so people can see that a human did this,
132
336076
1996
05:38
a human found the data and visualized it.
133
338096
1972
05:40
For example, instead of finding out the probability
134
340092
2672
05:42
of getting the flu in any given month,
135
342788
2126
05:44
you can see the rough distribution of flu season.
136
344938
2792
05:47
This is --
137
347754
1167
05:48
(Laughter)
138
348945
1018
05:49
a bad shot to show in February.
139
349987
1486
05:51
But it's also more responsible data visualization,
140
351497
2455
05:53
because if you were to show the exact probabilities,
141
353976
2455
05:56
maybe that would encourage people to get their flu jabs
142
356455
2592
05:59
at the wrong time.
143
359071
1456
06:00
The point of these shaky lines
144
360983
1693
06:02
is so that people remember these imprecisions,
145
362700
2911
06:05
but also so they don't necessarily walk away with a specific number,
146
365635
3227
06:08
but they can remember important facts.
147
368886
1866
06:10
Facts like injustice and inequality leave a huge mark on our lives.
148
370776
4024
06:14
Facts like Black Americans and Native Americans have shorter life expectancies
149
374824
4189
06:19
than those of other races,
150
379037
1400
06:20
and that isn't changing anytime soon.
151
380461
2138
06:22
Facts like prisoners in the US can be kept in solitary confinement cells
152
382623
3901
06:26
that are smaller than the size of an average parking space.
153
386548
3342
06:30
The point of these visualizations is also to remind people
154
390355
3335
06:33
of some really important statistical concepts,
155
393714
2350
06:36
concepts like averages.
156
396088
1636
06:37
So let's say you hear a claim like,
157
397748
1668
06:39
"The average swimming pool in the US contains 6.23 fecal accidents."
158
399440
4434
06:43
That doesn't mean every single swimming pool in the country
159
403898
2797
06:46
contains exactly 6.23 turds.
160
406719
2194
06:48
So in order to show that,
161
408937
1417
06:50
I went back to the original data, which comes from the CDC,
162
410378
2841
06:53
who surveyed 47 swimming facilities.
163
413243
2065
06:55
And I just spent one evening redistributing poop.
164
415332
2391
06:57
So you can kind of see how misleading averages can be.
165
417747
2682
07:00
(Laughter)
166
420453
1282
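The point about averages can be sketched in a few lines of Python. The counts below are hypothetical, invented purely for illustration; they are not the CDC figures mentioned in the talk, only a made-up set of 47 facilities. The sketch shows how a few extreme values can pull the mean well above what a typical facility looks like.

```python
from statistics import mean, median

# Hypothetical incident counts for 47 facilities.
# These numbers are invented for illustration and are NOT the CDC data.
counts = [0] * 30 + [2] * 10 + [20] * 5 + [40] * 2

average = mean(counts)        # pulled upward by a handful of large values
middle = median(counts)       # the value for the "typical" facility
share_zero = counts.count(0) / len(counts)

print(f"facilities surveyed: {len(counts)}")
print(f"mean count:          {average:.2f}")
print(f"median count:        {middle}")
print(f"share with zero:     {share_zero:.0%}")
```

In this made-up data set most facilities have a count of zero, yet the mean is above four, which is why a single average says little about any individual pool.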
07:01
OK, so the second question that you guys should be asking yourselves
167
421759
3901
07:05
to spot bad numbers is:
168
425684
1501
07:07
Can I see myself in the data?
169
427209
1967
07:09
This question is also about averages in a way,
170
429200
2913
07:12
because part of the reason why people are so frustrated
171
432137
2605
07:14
with these national statistics,
172
434766
1495
07:16
is they don't really tell the story of who's winning and who's losing
173
436285
3273
07:19
from national policy.
174
439582
1156
07:20
It's easy to understand why people are frustrated with global averages
175
440762
3318
07:24
when they don't match up with their personal experiences.
176
444104
2679
07:26
I wanted to show people the way data relates to their everyday lives.
177
446807
3263
07:30
I started this advice column called "Dear Mona,"
178
450094
2246
07:32
where people would write to me with questions and concerns
179
452364
2726
07:35
and I'd try to answer them with data.
180
455114
1784
07:36
People asked me anything.
181
456922
1200
07:38
Questions like, "Is it normal to sleep in a separate bed to my wife?"
182
458146
3261
07:41
"Do people regret their tattoos?"
183
461431
1591
07:43
"What does it mean to die of natural causes?"
184
463046
2164
07:45
All of these questions are great, because they make you think
185
465234
2966
07:48
about ways to find and communicate these numbers.
186
468224
2336
07:50
If someone asks you, "How much pee is a lot of pee?"
187
470584
2503
07:53
which is a question that I got asked,
188
473111
2458
07:55
you really want to make sure that the visualization makes sense
189
475593
2980
07:58
to as many people as possible.
190
478597
1747
08:00
These numbers aren't unavailable.
191
480368
1575
08:01
Sometimes they're just buried in the appendix of an academic study.
192
481967
3507
08:05
And they're certainly not inscrutable;
193
485498
1839
08:07
if you really wanted to test these numbers on urination volume,
194
487361
2975
08:10
you could grab a bottle and try it for yourself.
195
490360
2257
08:12
(Laughter)
196
492641
1008
08:13
The point of this isn't necessarily
197
493673
1694
08:15
that every single data set has to relate specifically to you.
198
495391
2877
08:18
I'm interested in how many women were issued fines in France
199
498292
2880
08:21
for wearing the face veil, or the niqab,
200
501196
1959
08:23
even if I don't live in France or wear the face veil.
201
503179
2618
08:25
The point of asking where you fit in is to get as much context as possible.
202
505821
3835
08:29
So it's about zooming out from one data point,
203
509680
2191
08:31
like the unemployment rate is five percent,
204
511895
2104
08:34
and seeing how it changes over time,
205
514023
1757
08:35
or seeing how it changes by educational status --
206
515804
2650
08:38
this is why your parents always wanted you to go to college --
207
518478
3104
08:41
or seeing how it varies by gender.
208
521606
2032
08:43
Nowadays, the male unemployment rate is higher
209
523662
2127
08:45
than the female unemployment rate.
210
525813
1700
08:47
Up until the early '80s, it was the other way around.
211
527537
2695
08:50
This is a story of one of the biggest changes
212
530256
2117
08:52
that's happened in American society,
213
532397
1720
08:54
and it's all there in that chart, once you look beyond the averages.
214
534141
3276
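As a rough illustration of "looking beyond the averages", the Python sketch below breaks one headline rate into subgroups. The group names and counts are hypothetical, chosen only so that the overall rate comes out at five percent; they are not Bureau of Labor Statistics figures.

```python
# Hypothetical labor-force data: group -> (labor force, unemployed).
# Invented numbers for illustration only, not official statistics.
groups = {
    "no college degree": (1000, 90),
    "college degree": (3000, 110),
}

total_force = sum(force for force, _ in groups.values())
total_unemployed = sum(unemployed for _, unemployed in groups.values())

# The headline figure is a single number for everyone combined.
overall_rate = 100 * total_unemployed / total_force

# The same data, split by group, tells a different story.
group_rates = {
    name: 100 * unemployed / force
    for name, (force, unemployed) in groups.items()
}

print(f"overall: {overall_rate:.1f}%")
for name, rate in group_rates.items():
    print(f"{name}: {rate:.1f}%")
```

Here the combined rate is five percent, but one group sits well above it and the other well below, so neither group would "see itself" in the headline number.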
08:57
The axes are everything;
215
537441
1165
08:58
once you change the scale, you can change the story.
216
538630
2669
09:01
OK, so the third and final question that I want you guys to think about
217
541323
3380
09:04
when you're looking at statistics is:
218
544727
1819
09:06
How was the data collected?
219
546570
1873
09:09
So far, I've only talked about the way data is communicated,
220
549487
2939
09:12
but the way it's collected matters just as much.
221
552450
2276
09:14
I know this is tough,
222
554750
1167
09:15
because methodologies can be opaque and actually kind of boring,
223
555941
3081
09:19
but there are some simple steps you can take to check this.
224
559046
2873
09:21
I'll use one last example here.
225
561943
1839
09:24
One poll found that 41 percent of Muslims in this country support jihad,
226
564129
3887
09:28
which is obviously pretty scary,
227
568040
1525
09:29
and it was reported everywhere in 2015.
228
569589
2642
09:32
When I want to check a number like that,
229
572255
2615
09:34
I'll start off by finding the original questionnaire.
230
574894
2501
09:37
It turns out that journalists who reported on that statistic
231
577419
2926
09:40
ignored a question lower down on the survey
232
580369
2231
09:42
that asked respondents how they defined "jihad."
233
582624
2346
09:44
And most of them defined it as,
234
584994
1981
09:46
"Muslims' personal, peaceful struggle to be more religious."
235
586999
3942
09:50
Only 16 percent defined it as, "violent holy war against unbelievers."
236
590965
4194
09:55
This is the really important point:
237
595183
2430
09:57
based on those numbers, it's totally possible
238
597637
2155
09:59
that no one in the survey who defined it as violent holy war
239
599816
3105
10:02
also said they support it.
240
602945
1332
10:04
Those two groups might not overlap at all.
241
604301
2208
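The claim that the two groups might not overlap at all follows from simple arithmetic, sketched below in Python. The function name `overlap_bounds` is made up for this example, and the sketch assumes both percentages are measured over the same set of respondents, which the talk does not state explicitly.

```python
def overlap_bounds(pct_a, pct_b):
    """Return the (smallest, largest) possible overlap, in percent,
    between two groups that make up pct_a and pct_b of the same sample."""
    # The groups are forced to overlap only if together they exceed 100%.
    smallest = max(0, pct_a + pct_b - 100)
    # The overlap can never be bigger than the smaller group.
    largest = min(pct_a, pct_b)
    return smallest, largest

# Figures quoted in the talk: 41% said they support jihad,
# 16% defined it as violent holy war.
low, high = overlap_bounds(41, 16)
print(f"overlap is somewhere between {low}% and {high}%")
```

Because 41 plus 16 is less than 100, the smallest possible overlap is zero, so the survey numbers alone cannot show that anyone who defined the term as violent also supported it.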
10:06
It's also worth asking how the survey was carried out.
242
606942
2637
10:09
This was something called an opt-in poll,
243
609603
1998
10:11
which means anyone could have found it on the internet and completed it.
244
611625
3402
10:15
There's no way of knowing if those people even identified as Muslim.
245
615051
3339
10:18
And finally, there were 600 respondents in that poll.
246
618414
2612
10:21
There are roughly three million Muslims in this country,
247
621050
2654
10:23
according to Pew Research Center.
248
623728
1607
10:25
That means the poll spoke to roughly one in every 5,000 Muslims
249
625359
2993
10:28
in this country.
250
628376
1168
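The "one in every 5,000" figure is easy to check. The short Python sketch below uses only the two numbers quoted in the talk (600 respondents and roughly three million people); the variable names are illustrative.

```python
respondents = 600          # poll size quoted in the talk
population = 3_000_000     # rough population estimate quoted in the talk

# How many people each respondent is standing in for.
people_per_respondent = population // respondents

# The same relationship expressed as a share of the population.
share_polled = respondents / population

print(f"one respondent per {people_per_respondent:,} people")
print(f"share of population polled: {share_polled:.2%}")
```

Running the same two-line check on any poll's sample size and target population gives a quick sense of how thin the sample is before looking at how the respondents were chosen.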
10:29
This is one of the reasons
251
629568
1266
10:30
why government statistics are often better than private statistics.
252
630858
3607
10:34
A poll might speak to a couple hundred people, maybe a thousand,
253
634489
3035
10:37
or if you're L'Oreal, trying to sell skin care products in 2005,
254
637548
3058
10:40
then you spoke to 48 women to claim that they work.
255
640630
2417
10:43
(Laughter)
256
643071
1026
10:44
Private companies don't have a huge interest in getting the numbers right,
257
644121
3556
10:47
they just need the right numbers.
258
647701
1755
10:49
Government statisticians aren't like that.
259
649480
2020
10:51
In theory, at least, they're totally impartial,
260
651524
2447
10:53
not least because most of them do their jobs regardless of who's in power.
261
653995
3501
10:57
They're civil servants.
262
657520
1162
10:58
And to do their jobs properly,
263
658706
1964
11:00
they don't just speak to a couple hundred people.
264
660694
2363
11:03
Those unemployment numbers I keep on referencing
265
663081
2318
11:05
come from the Bureau of Labor Statistics,
266
665423
2004
11:07
and to make their estimates,
267
667451
1335
11:08
they speak to over 140,000 businesses in this country.
268
668810
3489
11:12
I get it, it's frustrating.
269
672323
1725
11:14
If you want to test a statistic that comes from a private company,
270
674072
3115
11:17
you can buy the face cream for you and a bunch of friends, test it out,
271
677211
3361
11:20
if it doesn't work, you can say the numbers were wrong.
272
680596
2591
11:23
But how do you question government statistics?
273
683211
2146
11:25
You just keep checking everything.
274
685381
1630
11:27
Find out how they collected the numbers.
275
687035
1913
11:28
Find out if you're seeing everything on the chart you need to see.
276
688972
3125
11:32
But don't give up on the numbers altogether, because if you do,
277
692121
2965
11:35
we'll be making public policy decisions in the dark,
278
695110
2439
11:37
using nothing but private interests to guide us.
279
697573
2262
11:39
Thank you.
280
699859
1166
11:41
(Applause)
281
701049
2461