How statistics can be misleading - Mark Liddell

1,473,327 views ・ 2016-01-14

TED-Ed


μ•„λž˜ μ˜λ¬Έμžλ§‰μ„ λ”λΈ”ν΄λ¦­ν•˜μ‹œλ©΄ μ˜μƒμ΄ μž¬μƒλ©λ‹ˆλ‹€.

Translated by suhyeon sim · Reviewed by Seon-Gyu Choi
00:06
Statistics are persuasive.
00:09
So much so that people, organizations, and whole countries
00:12
base some of their most important decisions on organized data.
00:17
But there's a problem with that.
00:19
Any set of statistics might have something lurking inside it,
00:23
something that can turn the results completely upside down.
00:27
For example, imagine you need to choose between two hospitals
00:30
for an elderly relative's surgery.
00:33
Out of each hospital's last 1,000 patients,
00:36
900 survived at Hospital A,
00:39
while only 800 survived at Hospital B.
00:43
So it looks like Hospital A is the better choice.
00:46
But before you make your decision,
00:47
remember that not all patients arrive at the hospital
00:51
with the same level of health.
00:53
And if we divide each hospital's last 1,000 patients
00:56
into those who arrived in good health and those who arrived in poor health,
01:01
the picture starts to look very different.
01:03
Hospital A had only 100 patients who arrived in poor health,
01:07
of which 30 survived.
01:10
But Hospital B had 400, and they were able to save 210.
01:14
So Hospital B is the better choice
01:17
for patients who arrive at hospital in poor health,
01:20
with a survival rate of 52.5%.
01:24
And what if your relative's health is good when she arrives at the hospital?
01:28
Strangely enough, Hospital B is still the better choice,
01:32
with a survival rate of over 98%.
01:35
So how can Hospital A have a better overall survival rate
01:38
if Hospital B has better survival rates for patients in each of the two groups?
01:44
What we've stumbled upon is a case of Simpson's paradox,
01:48
where the same set of data can appear to show opposite trends
01:51
depending on how it's grouped.
01:54
This often occurs when aggregated data hides a conditional variable,
01:58
sometimes known as a lurking variable,
02:01
which is a hidden additional factor that significantly influences results.
02:06
Here, the hidden factor is the relative proportion of patients
02:10
who arrive in good or poor health.
02:13
Simpson's paradox isn't just a hypothetical scenario.
02:16
It pops up from time to time in the real world,
02:18
sometimes in important contexts.
02:22
One study in the UK appeared to show
02:24
that smokers had a higher survival rate than nonsmokers
02:27
over a twenty-year time period.
02:29
That is, until dividing the participants by age group
02:33
showed that the nonsmokers were significantly older on average,
02:37
and thus, more likely to die during the trial period,
02:40
precisely because they were living longer in general.
02:44
Here, the age groups are the lurking variable,
02:47
and are vital to correctly interpret the data.
02:50
In another example,
02:51
an analysis of Florida's death penalty cases
02:54
seemed to reveal no racial disparity in sentencing
02:58
between black and white defendants convicted of murder.
03:01
But dividing the cases by the race of the victim told a different story.
03:06
In either situation,
03:07
black defendants were more likely to be sentenced to death.
03:11
The slightly higher overall sentencing rate for white defendants
03:15
was due to the fact that cases with white victims
03:18
were more likely to elicit a death sentence
03:21
than cases where the victim was black,
03:24
and most murders occurred between people of the same race.
03:28
So how do we avoid falling for the paradox?
03:31
Unfortunately, there's no one-size-fits-all answer.
03:34
Data can be grouped and divided in any number of ways,
03:38
and overall numbers may sometimes give a more accurate picture
03:42
than data divided into misleading or arbitrary categories.
03:46
All we can do is carefully study the actual situations the statistics describe
03:52
and consider whether lurking variables may be present.
03:55
Otherwise, we leave ourselves vulnerable to those who would use data
03:59
to manipulate others and promote their own agendas.
이 μ›Ήμ‚¬μ΄νŠΈ 정보

이 μ‚¬μ΄νŠΈλŠ” μ˜μ–΄ ν•™μŠ΅μ— μœ μš©ν•œ YouTube λ™μ˜μƒμ„ μ†Œκ°œν•©λ‹ˆλ‹€. μ „ 세계 졜고의 μ„ μƒλ‹˜λ“€μ΄ κ°€λ₯΄μΉ˜λŠ” μ˜μ–΄ μˆ˜μ—…μ„ 보게 될 κ²ƒμž…λ‹ˆλ‹€. 각 λ™μ˜μƒ νŽ˜μ΄μ§€μ— ν‘œμ‹œλ˜λŠ” μ˜μ–΄ μžλ§‰μ„ 더블 ν΄λ¦­ν•˜λ©΄ κ·Έκ³³μ—μ„œ λ™μ˜μƒμ΄ μž¬μƒλ©λ‹ˆλ‹€. λΉ„λ””μ˜€ μž¬μƒμ— 맞좰 μžλ§‰μ΄ μŠ€ν¬λ‘€λ©λ‹ˆλ‹€. μ˜κ²¬μ΄λ‚˜ μš”μ²­μ΄ μžˆλŠ” 경우 이 문의 양식을 μ‚¬μš©ν•˜μ—¬ λ¬Έμ˜ν•˜μ‹­μ‹œμ˜€.

https://forms.gle/WvT1wiN1qDtmnspy7