The method that can "prove" almost anything - James A. Smith

816,442 views ・ 2021-08-05

TED-Ed



Chinese subtitles translated by Lilian Chiu, reviewed by Amanda Zhu
00:06
In 2011, a group of researchers conducted a scientific study
00:10
to find an impossible result:
00:12
that listening to certain songs can make you younger.
00:16
Their study involved real people, truthfully reported data,
00:20
and commonplace statistical analyses.
00:23
So how did they do it?
00:24
The answer lies in a statistical method scientists often use
00:28
to try to figure out whether their results mean something or if they’re random noise.
00:33
In fact, the whole point of the music study
00:36
was to point out ways this method can be misused.
00:40
A famous thought experiment explains the method:
00:43
there are eight cups of tea,
00:45
four with the milk added first, and four with the tea added first.
00:50
A participant must determine which are which according to taste.
00:53
There are 70 different ways the cups can be sorted into two groups of four,
00:58
and only one is correct.
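The figure of 70 is the number of ways to choose which four of the eight cups to call "milk first", the binomial coefficient C(8, 4) = 8!/(4!·4!). A minimal Python sketch that counts the splits by brute force, assuming (purely for illustration) that cups 0 to 3 are the milk-first ones:

```python
from itertools import combinations
from math import comb

# Hypothetical labeling: cups 0-7, with cups 0-3 the ones that had milk added first.
CUPS = range(8)
TRUE_MILK_FIRST = frozenset({0, 1, 2, 3})

# Each split is fully determined by the four cups the taster calls "milk first";
# the remaining four are then "tea first".
all_splits = [frozenset(c) for c in combinations(CUPS, 4)]

n_ways = len(all_splits)
n_correct = sum(1 for s in all_splits if s == TRUE_MILK_FIRST)

assert n_ways == comb(8, 4)  # closed form: 8! / (4! * 4!)
print(n_ways, n_correct)  # 70 1
```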
01:00
So, can she taste the difference?
01:03
That’s our research question.
01:04
To analyze her choices, we define what’s called a null hypothesis:
01:09
that she can’t distinguish the teas.
01:11
If she can’t distinguish the teas,
01:13
she’ll still get the right answer 1 in 70 times by chance.
01:19
1 in 70 is roughly .014.
01:22
That single number is called a p-value.
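Under the null hypothesis, a taster who can't tell the difference is effectively picking one of the 70 splits at random. Because a perfect sort is the most extreme possible outcome, the p-value here is simply the chance of hitting that one split. The sketch below computes it exactly and cross-checks it with a seeded simulation of a random guesser; the cup labeling and the seed are arbitrary assumptions.

```python
import random
from fractions import Fraction
from itertools import combinations

CUPS = list(range(8))
TRUE_MILK_FIRST = frozenset({0, 1, 2, 3})  # hypothetical labeling

# Exact p-value: under the null, all 70 splits are equally likely,
# and only one of them is a perfect sort.
splits = [frozenset(c) for c in combinations(CUPS, 4)]
p_exact = Fraction(sum(s == TRUE_MILK_FIRST for s in splits), len(splits))

def guess_at_random(rng: random.Random) -> frozenset:
    """A taster who can't tell the teas apart: any four cups, uniformly."""
    return frozenset(rng.sample(CUPS, 4))

rng = random.Random(2011)  # arbitrary fixed seed for reproducibility
trials = 100_000
hits = sum(guess_at_random(rng) == TRUE_MILK_FIRST for _ in range(trials))
p_simulated = hits / trials

print(p_exact, round(float(p_exact), 3))  # 1/70 0.014
print(p_simulated)  # close to 0.014; the exact value depends on the seed
```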
01:26
In many fields, a p-value of .05 or below is considered statistically significant,
01:32
meaning there’s enough evidence to reject the null hypothesis.
01:36
Based on a p-value of .014,
01:40
researchers would rule out the null hypothesis that she can’t distinguish the teas.
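The significance rule itself is a single comparison. A sketch, assuming the .05 threshold mentioned above (other fields use other cutoffs):

```python
from fractions import Fraction

ALPHA = 0.05  # the threshold used in many fields, per the text; not universal

def is_significant(p_value: float, alpha: float = ALPHA) -> bool:
    """Reject the null hypothesis when p is at or below alpha."""
    return p_value <= alpha

p_tea = float(Fraction(1, 70))  # a perfect sort of the eight cups, ~0.014

print(is_significant(p_tea))  # True: reject "she can't distinguish the teas"
print(is_significant(0.20))   # False: not enough evidence to reject
```

A result above the threshold only means the null hypothesis was not rejected; it does not show that the null hypothesis is true.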
01:44
Though p-values are commonly used by both researchers and journals
01:48
to evaluate scientific results,
01:50
they’re really confusing, even for many scientists.
01:54
That’s partly because all a p-value actually tells us
01:58
is the probability of getting a certain result,
02:01
assuming the null hypothesis is true.
02:04
So if she correctly sorts the teas,
02:07
the p-value is the probability of her doing so
02:10
assuming she can’t tell the difference.
02:13
But the reverse isn’t true:
02:15
the p-value doesn’t tell us the probability
02:18
that she can taste the difference,
02:19
which is what we’re trying to find out.
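One way to see why the p-value is not the probability that she can taste the difference is Bayes' rule. The video doesn't use it, but it shows what else that probability depends on: how plausible the ability was beforehand (the prior) and how likely a perfect sort is if she really can taste it. Both of those inputs below are made-up assumptions; the p-value of 1/70 is the same in every case.

```python
def posterior_can_taste(prior: float, p_perfect_if_can: float,
                        p_perfect_if_cannot: float = 1 / 70) -> float:
    """P(she can taste the difference | she sorted all eight cups correctly).

    Bayes' rule. The p-value supplies only p_perfect_if_cannot; the answer
    also needs the prior and the chance of a perfect sort if she can taste it.
    """
    evidence = prior * p_perfect_if_can + (1 - prior) * p_perfect_if_cannot
    return prior * p_perfect_if_can / evidence

# Hypothetical inputs: a genuine taster sorts perfectly 90% of the time,
# and two different priors for how plausible the ability was to begin with.
for prior in (0.01, 0.5):
    print(prior, round(posterior_can_taste(prior, p_perfect_if_can=0.9), 3))
# 0.01 0.389
# 0.5 0.984
```

With these hypothetical inputs, the same p-value of about .014 corresponds to roughly a 39% chance that she can taste the difference under the skeptical prior, and about 98% under the even-odds prior.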
02:22
So if a p-value doesn’t answer the research question,
02:25
why does the scientific community use it?
02:28
Well, because even though a p-value doesn’t directly state the probability
02:33
that the results are due to random chance,
02:35
it usually gives a pretty reliable indication.
02:39
At least, it does when used correctly.
02:41
And that’s where many researchers, and even whole fields,
02:45
have run into trouble.
02:47
Most real studies are more complex than the tea experiment.
02:51
Scientists can test their research question in multiple ways,
02:54
and some of these tests might produce a statistically significant result,
02:59
while others don’t.
03:00
It might seem like a good idea to test every possibility.
03:03
But it’s not, because with each additional test,
03:07
the chance of a false positive increases.
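If every null hypothesis is true and each of k tests is run at the .05 threshold, the chance that at least one comes out "significant" is 1 - (1 - 0.05)^k. A short sketch, assuming the tests are independent; with dependent tests the exact figure differs, though it can never fall below 0.05.

```python
def chance_of_any_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one p <= alpha) across n_tests tests whose nulls are all true.

    Assumes the tests are independent and each has exactly an alpha chance of
    a false positive on its own.
    """
    return 1 - (1 - alpha) ** n_tests

for k in (1, 5, 10, 20):
    print(k, round(chance_of_any_false_positive(k), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```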
03:10
Searching for a low p-value, and then presenting only that analysis,
03:15
is often called p-hacking.
03:18
It’s like throwing darts until you hit a bullseye
03:20
and then saying you only threw the dart that hit the bullseye.
03:24
This is exactly what the music researchers did.
03:28
They played three groups of participants each a different song
03:31
and collected lots of information about them.
03:34
The analysis they published included only two out of the three groups.
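Choosing which groups to compare after seeing the data is one way of running several tests and keeping the best. The simulation below is a hypothetical sketch, not a reconstruction of the published analysis: the songs have no effect, each group's outcome is standard-normal noise with a known spread (so a simple z-test applies), and the analyst reports only the smallest p-value among all pairs of groups. The group size, trial count, and seed are arbitrary assumptions.

```python
import random
from itertools import combinations
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def z_test_p(a: list[float], b: list[float], sigma: float = 1.0) -> float:
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def best_pair_false_positive_rate(n_groups: int, n_per_group: int = 10,
                                  trials: int = 5_000, alpha: float = 0.05,
                                  seed: int = 0) -> float:
    """Share of null studies that report p <= alpha when only the
    smallest p-value among all pairs of groups is kept."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Every group comes from the same distribution: the songs do nothing.
        groups = [[rng.gauss(0, 1) for _ in range(n_per_group)]
                  for _ in range(n_groups)]
        best_p = min(z_test_p(a, b) for a, b in combinations(groups, 2))
        hits += best_p <= alpha
    return hits / trials

print(best_pair_false_positive_rate(2))  # one comparison: near 0.05
print(best_pair_false_positive_rate(3))  # best of three comparisons: higher
```

With two groups there is only one comparison, so the false-positive rate stays near 0.05; with three groups, keeping the best of three pairwise comparisons pushes it noticeably higher.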
03:38
Of all the information they collected,
03:40
their analysis only used participants’ fathers’ age—
03:44
to “control for variation in baseline age across participants.”
03:49
They also paused their experiment after every ten participants,
03:53
and continued if the p-value was above .05,
03:57
but stopped when it dipped below .05.
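Stopping as soon as p dips below .05 is another form of p-hacking, often called optional stopping. A sketch under hypothetical assumptions: two songs with no real effect, participants alternating between them, standard-normal outcomes with a known spread (so a z-test applies), a cap of 100 participants, and a check after every 10.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def z_test_p(a: list[float], b: list[float], sigma: float = 1.0) -> float:
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def null_study_is_significant(rng: random.Random, peek: bool, batch: int = 10,
                              max_n: int = 100, alpha: float = 0.05) -> bool:
    """Run one study of two songs with no real effect.

    peek=False: analyze once, after all max_n participants.
    peek=True: analyze after every `batch` participants and stop at p < alpha.
    """
    a: list[float] = []
    b: list[float] = []
    for i in range(max_n):
        (a if i % 2 == 0 else b).append(rng.gauss(0, 1))  # alternate songs
        if peek and (i + 1) % batch == 0 and z_test_p(a, b) < alpha:
            return True  # stop early and report "significant"
    return z_test_p(a, b) < alpha

rng = random.Random(42)  # arbitrary fixed seed
trials = 4_000
fixed_rate = sum(null_study_is_significant(rng, peek=False)
                 for _ in range(trials)) / trials
peek_rate = sum(null_study_is_significant(rng, peek=True)
                for _ in range(trials)) / trials
print(f"analyze once:  {fixed_rate:.3f}")  # near the nominal 0.05
print(f"peek every 10: {peek_rate:.3f}")   # well above 0.05
```

Every peek is another chance for noise to cross the threshold, and stopping on the first crossing locks that false positive in.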
04:01
They found that participants who heard one song were 1.5 years younger
04:06
than those who heard the other song, with a p-value of .04.
04:12
Usually it’s much tougher to spot p-hacking,
04:14
because we don’t know the results are impossible:
04:17
the whole point of doing experiments is to learn something new.
04:21
Fortunately, there’s a simple way to make p-values more reliable:
04:25
pre-registering a detailed plan for the experiment and analysis
04:30
in advance, where others can check it,
04:33
so researchers can’t keep trying different analyses
04:36
until they find a significant result.
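A pre-registered plan fixes choices like the ones the music researchers made (which groups, which covariates, how many participants, when to stop, which test) before any data exist. The sketch below models such a plan as an immutable record and computes a SHA-256 fingerprint of it. The hashing step is only one illustrative way to make later changes detectable, an assumption of this sketch rather than a description of how any particular registry works, and every field value is hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

@dataclass(frozen=True)
class AnalysisPlan:
    """Hypothetical pre-registered plan: every analysis choice fixed up front."""
    groups: tuple[str, ...]
    outcome: str
    covariates: tuple[str, ...]
    sample_size: int
    stopping_rule: str
    test: str
    alpha: float

    def fingerprint(self) -> str:
        """SHA-256 of the plan's contents, published so others can compare later."""
        blob = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

plan = AnalysisPlan(
    groups=("song_a", "song_b"),
    outcome="age_in_years",
    covariates=(),
    sample_size=100,
    stopping_rule="analyze once, after all participants",
    test="two-sample t-test",
    alpha=0.05,
)
registered = plan.fingerprint()  # recorded before any data are collected

# A covariate chosen after seeing the data no longer matches the registration.
revised = replace(plan, covariates=("father_age",))
print(revised.fingerprint() == registered)  # False
```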
04:38
And, in the true spirit of scientific inquiry,
04:41
there’s even a new field that’s basically science doing science on itself:
04:46
studying scientific practices in order to improve them.