The method that can "prove" almost anything - James A. Smith

816,442 views ・ 2021-08-05

TED-Ed


Translator: Xiwen Zhu. Reviewer: Yanyan Hong.
00:06
In 2011, a group of researchers conducted a scientific study
00:10
to find an impossible result:
00:12
that listening to certain songs can make you younger.
00:16
Their study involved real people, truthfully reported data,
00:20
and commonplace statistical analyses.
00:23
So how did they do it?
00:24
The answer lies in a statistical method scientists often use
00:28
to try to figure out whether their results mean something or if they’re random noise.
00:33
In fact, the whole point of the music study
00:36
was to point out ways this method can be misused.
00:40
A famous thought experiment explains the method:
00:43
there are eight cups of tea,
00:45
four with the milk added first, and four with the tea added first.
00:50
A participant must determine which are which according to taste.
00:53
There are 70 different ways the cups can be sorted into two groups of four,
00:58
and only one is correct.
01:00
So, can she taste the difference?
01:03
That’s our research question.
01:04
To analyze her choices, we define what’s called a null hypothesis:
01:09
that she can’t distinguish the teas.
01:11
If she can’t distinguish the teas,
01:13
she’ll still get the right answer 1 in 70 times by chance.
01:19
1 in 70 is roughly .014.
01:22
That single number is called a p-value.
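The count of 70 and the p-value of roughly .014 can be checked directly. The sketch below is only an illustration: it numbers the cups 0 to 7 (an arbitrary labeling, not something from the video), lists every way to pick four of them as "milk first", and treats each grouping as equally likely under the null hypothesis.

```python
from itertools import combinations
from math import comb

cups = range(8)  # illustrative labels for the eight cups

# Every way to choose the four cups the taster calls "milk first".
groupings = list(combinations(cups, 4))
assert len(groupings) == comb(8, 4)

# If she cannot taste a difference, each grouping is equally likely,
# so the chance of a fully correct sort is 1 out of the total count.
p_value = 1 / len(groupings)

print(len(groupings))     # 70
print(round(p_value, 3))  # 0.014
```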
01:26
In many fields, a p-value of .05 or below is considered statistically significant,
01:32
meaning there’s enough evidence to reject the null hypothesis.
01:36
Based on a p-value of .014,
01:40
they’d rule out the null hypothesis that she can’t distinguish the teas.
01:44
Though p-values are commonly used by both researchers and journals
01:48
to evaluate scientific results,
01:50
they’re really confusing, even for many scientists.
01:54
That’s partly because all a p-value actually tells us
01:58
is the probability of getting a certain result,
02:01
assuming the null hypothesis is true.
02:04
So if she correctly sorts the teas,
02:07
the p-value is the probability of her doing so
02:10
assuming she can’t tell the difference.
02:13
But the reverse isn’t true:
02:15
the p-value doesn’t tell us the probability
02:18
that she can taste the difference,
02:19
which is what we’re trying to find out.
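One way to see why the reverse does not hold is Bayes' rule. The sketch below is a hypothetical illustration, not part of the original study: the prior probability that she can taste the difference and the assumption that a real taster always sorts correctly are both made-up inputs. The same p-value of 1/70 leads to very different answers depending on them.

```python
def posterior_can_taste(prior, p_correct_if_can_taste=1.0, p_value=1 / 70):
    """Bayes' rule for P(can taste | sorted all eight cups correctly).

    prior and p_correct_if_can_taste are hypothetical inputs;
    p_value is P(correct sort | cannot taste) from the tea example.
    """
    evidence_if_can = p_correct_if_can_taste * prior
    evidence_if_cannot = p_value * (1 - prior)
    return evidence_if_can / (evidence_if_can + evidence_if_cannot)

# Same data and same p-value, but two different assumed priors.
for prior in (0.01, 0.5):
    print(prior, round(posterior_can_taste(prior), 3))
```

With a skeptical prior the probability that she can taste the difference stays below one half, even though the p-value is well under .05.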
02:22
So if a p-value doesn’t answer the research question,
02:25
why does the scientific community use it?
02:28
Well, because even though a p-value doesn’t directly state the probability
02:33
that the results are due to random chance,
02:35
it usually gives a pretty reliable indication.
02:39
At least, it does when used correctly.
02:41
And that’s where many researchers, and even whole fields,
02:45
have run into trouble.
02:47
Most real studies are more complex than the tea experiment.
02:51
Scientists can test their research question in multiple ways,
02:54
and some of these tests might produce a statistically significant result,
02:59
while others don’t.
03:00
It might seem like a good idea to test every possibility.
03:03
But it’s not, because with each additional test,
03:07
the chance of a false positive increases.
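How quickly the false-positive chance grows can be estimated with a simple formula. The sketch below assumes the tests are independent and each uses the .05 threshold. Real analyses of the same data are usually correlated, so treat the numbers as a rough guide rather than an exact rate.

```python
ALPHA = 0.05  # significance threshold applied to each individual test

def chance_of_any_false_positive(num_tests, alpha=ALPHA):
    """P(at least one test has p < alpha) when no real effect exists.

    Assumes the tests are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** num_tests

for num_tests in (1, 3, 10, 20):
    print(num_tests, round(chance_of_any_false_positive(num_tests), 3))
```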
03:10
Searching for a low p-value, and then presenting only that analysis,
03:15
is often called p-hacking.
03:18
It’s like throwing darts until you hit a bullseye
03:20
and then saying you only threw the dart that hit the bullseye.
03:24
This is exactly what the music researchers did.
03:28
They played three groups of participants each a different song
03:31
and collected lots of information about them.
03:34
The analysis they published included only two out of the three groups.
03:38
Of all the information they collected,
03:40
their analysis only used participants’ fathers’ age—
03:44
to “control for variation in baseline age across participants.”
03:49
They also paused their experiment after every ten participants,
03:53
and continued if the p-value was above .05,
03:57
but stopped when it dipped below .05.
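The effect of this stop-when-significant rule can be shown with a small simulation. The sketch below is not the researchers' actual procedure. It assumes two groups drawn from the same normal distribution (so there is no real effect), adds ten simulated participants per group between checks, caps each group at 100, and uses a simple z-test with known variance. All of those choices are illustrative assumptions.

```python
import random
from statistics import NormalDist

ALPHA = 0.05
STD_NORMAL = NormalDist()

def two_sided_p(group_a, group_b):
    """Two-sided z-test p-value for a difference in means.

    Assumes equal group sizes and a known variance of 1 in both groups.
    """
    n = len(group_a)
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / (2 / n) ** 0.5
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def run_study(rng, peek, batch=10, max_n=100):
    """Simulate one study with no real effect; True means a false positive."""
    group_a, group_b = [], []
    while len(group_a) < max_n:
        for _ in range(batch):
            group_a.append(rng.gauss(0, 1))
            group_b.append(rng.gauss(0, 1))
        # Peeking: check the p-value after each batch and stop once it dips below ALPHA.
        if peek and two_sided_p(group_a, group_b) < ALPHA:
            return True
    # Reached only when no early stop happened: one test at the full sample size.
    return two_sided_p(group_a, group_b) < ALPHA

def false_positive_rate(peek, num_studies=2000, seed=0):
    rng = random.Random(seed)
    hits = sum(run_study(rng, peek) for _ in range(num_studies))
    return hits / num_studies

print("fixed sample size:", false_positive_rate(peek=False))
print("peeking every 10: ", false_positive_rate(peek=True))
```

With a fixed sample size the false-positive rate should stay near .05, while checking after every batch should push it well above that, even though every individual test is a standard one.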
04:01
They found that participants who heard one song were 1.5 years younger
04:06
than those who heard the other song, with a p-value of .04.
04:12
Usually it’s much tougher to spot p-hacking,
04:14
because we don’t know the results are impossible:
04:17
the whole point of doing experiments is to learn something new.
04:21
Fortunately, there’s a simple way to make p-values more reliable:
04:25
pre-registering a detailed plan for the experiment and analysis
04:30
beforehand that others can check,
04:33
so researchers can’t keep trying different analyses
04:36
until they find a significant result.
04:38
And, in the true spirit of scientific inquiry,
04:41
there’s even a new field that’s basically science doing science on itself:
04:46
studying scientific practices in order to improve them.