The method that can "prove" almost anything - James A. Smith

836,535 views ・ 2021-08-05

TED-Ed


ืชืจื’ื•ื: Ido Dekkers ืขืจื™ื›ื”: Sigal Tifferet
00:06
In 2011, a group of researchers conducted a scientific study
00:10
to find an impossible result:
00:12
that listening to certain songs can make you younger.
00:16
Their study involved real people, truthfully reported data,
00:20
and commonplace statistical analyses.
00:23
So how did they do it?
00:24
The answer lies in a statistical method scientists often use
00:28
to try to figure out whether their results mean something or if they’re random noise.
00:33
In fact, the whole point of the music study
00:36
was to point out ways this method can be misused.
00:40
A famous thought experiment explains the method:
00:43
there are eight cups of tea,
00:45
four with the milk added first, and four with the tea added first.
00:50
A participant must determine which are which according to taste.
00:53
There are 70 different ways the cups can be sorted into two groups of four,
00:58
and only one is correct.
01:00
So, can she taste the difference?
01:03
That’s our research question.
01:04
To analyze her choices, we define what’s called a null hypothesis:
01:09
that she can’t distinguish the teas.
01:11
If she can’t distinguish the teas,
01:13
she’ll still get the right answer 1 in 70 times by chance.
01:19
1 in 70 is roughly .014.
01:22
That single number is called a p-value.
01:26
In many fields, a p-value of .05 or below is considered statistically significant,
01:32
meaning thereโ€™s enough evidence to reject the null hypothesis.
01:36
Based on a p-value of .014,
01:40
they’d rule out the null hypothesis that she can’t distinguish the teas.
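Strictly, a p-value is the probability of a result at least as extreme as the one observed, assuming the null hypothesis is true. A perfect sort is the most extreme outcome possible in the tea test, so its p-value is exactly 1/70. A short Python sketch of that calculation and the .05 decision rule described above; `ALPHA` is only the conventional threshold the video mentions, not a universal rule.

```python
from fractions import Fraction
from math import comb

ALPHA = 0.05  # conventional significance threshold in many fields

# Under the null hypothesis each of the 70 sortings is equally likely,
# and only one of them is completely correct.
p_value = Fraction(1, comb(8, 4))

print(p_value, round(float(p_value), 3))  # 1/70 0.014
if p_value <= ALPHA:
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```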
01:44
Though p-values are commonly used by both researchers and journals
01:48
to evaluate scientific results,
01:50
they’re really confusing, even for many scientists.
01:54
That’s partly because all a p-value actually tells us
01:58
is the probability of getting a result at least as extreme as the one observed,
02:01
assuming the null hypothesis is true.
02:04
So if she correctly sorts the teas,
02:07
the p-value is the probability of her doing so
02:10
assuming she can’t tell the difference.
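The phrase "assuming she can’t tell the difference" can be made concrete by simulating a taster who is only guessing. This Python sketch assumes that, under the null hypothesis, her four "milk first" picks are a uniformly random choice of 4 cups out of 8; the cup labelling, seed, and trial count are arbitrary choices for illustration.

```python
import random

def perfect_sort_rate(trials: int, seed: int = 0) -> float:
    """Estimate how often a pure guesser sorts all eight cups correctly.

    Under the null hypothesis her choice of four "milk first" cups is
    modelled as a uniformly random pick of 4 cups out of 8.
    """
    rng = random.Random(seed)
    truth = {0, 1, 2, 3}  # hypothetical labelling of the milk-first cups
    hits = 0
    for _ in range(trials):
        guess = set(rng.sample(range(8), 4))
        hits += guess == truth
    return hits / trials

print(perfect_sort_rate(100_000))
```

The estimate should land close to 1/70 ≈ 0.0143, the p-value for a perfect sort: it is a statement about her results given the null hypothesis, not about the null hypothesis itself.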
02:13
But the reverse isn’t true:
02:15
the p-value doesn’t tell us the probability
02:18
that she can taste the difference,
02:19
which is what we’re trying to find out.
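One way to see why the reverse doesn’t hold: going from P(perfect sort | she can’t tell) to P(she can tell | perfect sort) requires Bayes’ rule, which needs extra ingredients the p-value does not contain. This Python sketch uses two hypothetical inputs that are not from the video: a prior probability that she can taste the difference, and a 0.9 chance that a genuine taster sorts perfectly.

```python
def prob_can_taste(prior: float, p_perfect_if_can: float,
                   p_perfect_if_cannot: float = 1 / 70) -> float:
    """Posterior P(she can taste the difference | perfect sort), via Bayes' rule.

    `prior` and `p_perfect_if_can` are hypothetical inputs; only the
    1/70 chance of a perfect sort under the null comes from the tea example.
    """
    numerator = prior * p_perfect_if_can
    evidence = numerator + (1 - prior) * p_perfect_if_cannot
    return numerator / evidence

# Same data, same p-value (1/70); only the hypothetical prior differs.
for prior in (0.5, 0.01):
    print(prior, round(prob_can_taste(prior, p_perfect_if_can=0.9), 3))
```

With a prior of 0.5 the answer works out to 63/64 (about 0.98); with a prior of 0.01 it is 7/18 (about 0.39), even though the p-value is identical in both cases.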
02:22
So if a p-value doesn’t answer the research question,
02:25
why does the scientific community use it?
02:28
Well, because even though a p-value doesn’t directly state the probability
02:33
that the results are due to random chance,
02:35
it usually gives a pretty reliable indication.
02:39
At least, it does when used correctly.
02:41
And that’s where many researchers, and even whole fields,
02:45
have run into trouble.
02:47
Most real studies are more complex than the tea experiment.
02:51
Scientists can test their research question in multiple ways,
02:54
and some of these tests might produce a statistically significant result,
02:59
while others don’t.
03:00
It might seem like a good idea to test every possibility.
03:03
But it’s not, because with each additional test,
03:07
the chance of a false positive increases.
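How fast the chance of a false positive grows can be sketched with a simple formula: if every null hypothesis is true and the tests are independent, the probability that at least one of k tests gives p ≤ α is 1 − (1 − α)^k. The independence assumption is a simplification; real analyses of the same data are usually correlated, so treat this Python sketch as a rough guide rather than an exact figure.

```python
ALPHA = 0.05

def chance_of_any_false_positive(num_tests: int, alpha: float = ALPHA) -> float:
    """P(at least one p <= alpha) when every null is true and tests are independent."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 10, 20):
    print(k, round(chance_of_any_false_positive(k), 3))
```

Under these assumptions the chance is 0.05 for one test, about 0.23 for 5, 0.40 for 10, and 0.64 for 20.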
03:10
Searching for a low p-value, and then presenting only that analysis,
03:15
is often called p-hacking.
03:18
It’s like throwing darts until you hit a bullseye
03:20
and then saying you only threw the dart that hit the bullseye.
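The dart analogy can be simulated: measure many unrelated outcomes in two groups where nothing is really going on, test each one, and report only the smallest p-value. This Python sketch is a simplified illustration, not the music study’s actual analysis; it assumes normally distributed data with a known standard deviation of 1 (so a two-sample z-test applies), and the group size, number of outcomes, and seed are arbitrary.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and SD 1

def z_test_p(a, b):
    """Two-sided p-value for a difference in group means, assuming a known SD of 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def hacked_study(rng, n_per_group=20, n_outcomes=10):
    """Test many unrelated outcomes with no real effect and keep only the smallest p."""
    p_values = []
    for _ in range(n_outcomes):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(z_test_p(group_a, group_b))
    return min(p_values)  # the "bullseye" that gets reported

rng = random.Random(1)
studies = 2_000
rate = sum(hacked_study(rng) <= 0.05 for _ in range(studies)) / studies
print(rate)
```

Even though no study has a real effect, the share of "significant" studies should come out far above 0.05, near the 1 − 0.95^10 ≈ 0.40 expected for ten independent tests.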
03:24
This is exactly what the music researchers did.
03:28
They played three groups of participants each a different song
03:31
and collected lots of information about them.
03:34
The analysis they published included only two out of the three groups.
03:38
Of all the information they collected,
03:40
their analysis only used participants’ fathers’ age,
03:44
to “control for variation in baseline age across participants.”
03:49
They also paused their experiment after every ten participants,
03:53
and continued if the p-value was above .05,
03:57
but stopped when it dipped below .05.
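Checking the p-value after every batch of participants and stopping as soon as it dips below .05 is often called optional stopping. This Python sketch simulates it under a true null hypothesis; it is not a reconstruction of the music study’s procedure. It assumes two groups drawn from the same normal distribution with a known SD of 1 (so a z-test applies), batches of 10 participants split evenly between the groups, a hypothetical cap of 100 participants, and an arbitrary seed.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def z_test_p(a, b):
    """Two-sided p-value for a difference in group means, assuming a known SD of 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def peeking_study(rng, batch=10, max_n=100, alpha=0.05):
    """Add participants in batches and stop as soon as p dips to alpha or below.

    There is no real effect: both groups come from the same distribution.
    Returns True if the study ever reported a "significant" result.
    """
    a, b = [], []
    while len(a) + len(b) < max_n:
        for i in range(batch):  # alternate new participants between the groups
            (a if i % 2 == 0 else b).append(rng.gauss(0, 1))
        if z_test_p(a, b) <= alpha:
            return True  # stop early and report the result
    return False  # reached the hypothetical cap without significance

rng = random.Random(2)
studies = 4_000
rate = sum(peeking_study(rng) for _ in range(studies)) / studies
print(rate)
```

With ten looks at the data, the false positive rate should come out several times higher than the nominal 0.05 (on the order of 0.2 with these settings).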
04:01
They found that participants who heard one song were 1.5 years younger
04:06
than those who heard the other song, with a p-value of .04.
04:12
Usually it’s much tougher to spot p-hacking,
04:14
because we don’t know the results are impossible:
04:17
the whole point of doing experiments is to learn something new.
04:21
Fortunately, there’s a simple way to make p-values more reliable:
04:25
pre-registering a detailed plan for the experiment and analysis
04:30
beforehand that others can check,
04:33
so researchers canโ€™t keep trying different analyses
04:36
until they find a significant result.
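For contrast with peeking or trying many analyses, here is a Python sketch of what a fixed, pre-registered plan buys statistically: the sample size, test, and threshold are set in advance, and the single planned test is run once. The `PLAN` values, the known-SD z-test, and the seed are hypothetical choices for illustration; real pre-registrations are written documents filed with a registry, not code.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

# Hypothetical plan, fixed before any data are collected.
PLAN = {"n_per_group": 50, "alpha": 0.05}

def z_test_p(a, b):
    """Two-sided p-value for a difference in group means, assuming a known SD of 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def preregistered_study(rng, plan=PLAN):
    """Collect exactly the planned sample and run the one planned test once."""
    n = plan["n_per_group"]
    a = [rng.gauss(0, 1) for _ in range(n)]  # no real effect in either group
    b = [rng.gauss(0, 1) for _ in range(n)]
    return z_test_p(a, b) <= plan["alpha"]

rng = random.Random(3)
studies = 4_000
rate = sum(preregistered_study(rng) for _ in range(studies)) / studies
print(rate)
```

Because nothing is chosen after seeing the data, the false positive rate should stay close to the planned 0.05.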
04:38
And, in the true spirit of scientific inquiry,
04:41
there’s even a new field that’s basically science doing science on itself:
04:46
studying scientific practices in order to improve them.
ืขืœ ืืชืจ ื–ื”

ืืชืจ ื–ื” ื™ืฆื™ื’ ื‘ืคื ื™ื›ื ืกืจื˜ื•ื ื™ YouTube ื”ืžื•ืขื™ืœื™ื ืœืœื™ืžื•ื“ ืื ื’ืœื™ืช. ืชื•ื›ืœื• ืœืจืื•ืช ืฉื™ืขื•ืจื™ ืื ื’ืœื™ืช ื”ืžื•ืขื‘ืจื™ื ืขืœ ื™ื“ื™ ืžื•ืจื™ื ืžื”ืฉื•ืจื” ื”ืจืืฉื•ื ื” ืžืจื—ื‘ื™ ื”ืขื•ืœื. ืœื—ืฅ ืคืขืžื™ื™ื ืขืœ ื”ื›ืชื•ื‘ื™ื•ืช ื‘ืื ื’ืœื™ืช ื”ืžื•ืฆื’ื•ืช ื‘ื›ืœ ื“ืฃ ื•ื™ื“ืื• ื›ื“ื™ ืœื”ืคืขื™ืœ ืืช ื”ืกืจื˜ื•ืŸ ืžืฉื. ื”ื›ืชื•ื‘ื™ื•ืช ื’ื•ืœืœื•ืช ื‘ืกื ื›ืจื•ืŸ ืขื ื”ืคืขืœืช ื”ื•ื•ื™ื“ืื•. ืื ื™ืฉ ืœืš ื”ืขืจื•ืช ืื• ื‘ืงืฉื•ืช, ืื ื ืฆื•ืจ ืื™ืชื ื• ืงืฉืจ ื‘ืืžืฆืขื•ืช ื˜ื•ืคืก ื™ืฆื™ืจืช ืงืฉืจ ื–ื”.

https://forms.gle/WvT1wiN1qDtmnspy7