Peter Donnelly: How stats fool juries

246,556 views ・ 2007-01-12

TED



Translator: Xiaofei Zhang Reviewer: Zhu Jie
00:25
As other speakers have said, it's a rather daunting experience --
0
25000
2000
正如一些演讲者所说 在这里的观众面前演讲
00:27
a particularly daunting experience -- to be speaking in front of this audience.
1
27000
3000
是一次令人畏缩的经历--相当令人恐慌
00:30
But unlike the other speakers, I'm not going to tell you about
2
30000
3000
不过与其他演讲者不同 我不会给大家讲
00:33
the mysteries of the universe, or the wonders of evolution,
3
33000
2000
宇宙的迷团 也不会讲进化的奥妙
00:35
or the really clever, innovative ways people are attacking
4
35000
4000
抑或是人们用来对抗世界上主要的不平等现象的
00:39
the major inequalities in our world.
5
39000
2000
那些着实非常奇妙新颖的办法
00:41
Or even the challenges of nation-states in the modern global economy.
6
41000
5000
更不会讲现代全球经济下国家之间的挑战
00:46
My brief, as you've just heard, is to tell you about statistics --
7
46000
4000
就像你们刚才听到的 概括来说 我讲的内容是统计学--
00:50
and, to be more precise, to tell you some exciting things about statistics.
8
50000
3000
更确切地说 是一些统计学中很有趣的事情
00:53
And that's --
9
53000
1000
而这--
00:54
(Laughter)
10
54000
1000
(笑)
00:55
-- that's rather more challenging
11
55000
2000
--相对所有在我之前以及之后的演讲者而言
00:57
than all the speakers before me and all the ones coming after me.
12
57000
2000
具有空前绝后的挑战性
00:59
(Laughter)
13
59000
1000
(笑)
01:01
One of my senior colleagues told me, when I was a youngster in this profession,
14
61000
5000
当我在统计学这个领域还是新人的时候 一个资深同事相当自豪地告诉我
01:06
rather proudly, that statisticians were people who liked figures
15
66000
4000
统计学家是那些喜欢数字
01:10
but didn't have the personality skills to become accountants.
16
70000
3000
但性格上不适合做会计的人
01:13
(Laughter)
17
73000
2000
(笑)
01:15
And there's another in-joke among statisticians, and that's,
18
75000
3000
还有一个统计学的笑话
01:18
"How do you tell the introverted statistician from the extroverted statistician?"
19
78000
3000
“怎样看出统计学家是内向还是外向呢?”
01:21
To which the answer is,
20
81000
2000
答案就是
01:23
"The extroverted statistician's the one who looks at the other person's shoes."
21
83000
5000
“外向的统计学家会看别人的鞋”
01:28
(Laughter)
22
88000
3000
(笑)
01:31
But I want to tell you something useful -- and here it is, so concentrate now.
23
91000
5000
不过其实我想讲一些有用的--所以请注意
01:36
This evening, there's a reception in the University's Museum of Natural History.
24
96000
3000
今晚在学校的自然历史博物馆里有一个招待会
01:39
And it's a wonderful setting, as I hope you'll find,
25
99000
2000
希望你能发现 这是一个绝妙的场合
01:41
and a great icon to the best of the Victorian tradition.
26
101000
5000
也是维多利亚优秀传统中的表现
01:46
It's very unlikely -- in this special setting, and this collection of people --
27
106000
5000
在这样的场合 这样的人群中 虽然有点不大可能
01:51
but you might just find yourself talking to someone you'd rather wish that you weren't.
28
111000
3000
但你也许仍然发现你在跟一些你并不想聊天的人交谈
01:54
So here's what you do.
29
114000
2000
这时候你就可以这么做
01:56
When they say to you, "What do you do?" -- you say, "I'm a statistician."
30
116000
4000
当他们问:“你的工作是?”--你就说:“我是统计学家”
02:00
(Laughter)
31
120000
1000
(笑)
02:01
Well, except they've been pre-warned now, and they'll know you're making it up.
32
121000
4000
除非他们事先得到提醒 知道这是你编的
02:05
And then one of two things will happen.
33
125000
2000
一般出现的情形都不过以下两种
02:07
They'll either discover their long-lost cousin in the other corner of the room
34
127000
2000
他们会突然在屋子另一角发现了失散多年的表亲
02:09
and run over and talk to them.
35
129000
2000
然后赶去跟他们说话
02:11
Or they'll suddenly become parched and/or hungry -- and often both --
36
131000
3000
或者他们会突然很渴或者很饿--通常是饥渴交迫--
02:14
and sprint off for a drink and some food.
37
134000
2000
然后奔向食物和饮料
02:16
And you'll be left in peace to talk to the person you really want to talk to.
38
136000
4000
这时你就能清静下来 去跟你真正想聊天的人交谈
02:20
It's one of the challenges in our profession to try and explain what we do.
39
140000
3000
解释我们到底是做什么的 是我们这个领域的一个挑战
02:23
We're not top on people's lists for dinner party guests and conversations and so on.
40
143000
5000
我们并不是晚宴的贵宾 也不是理想的交谈对象
02:28
And it's something I've never really found a good way of doing.
41
148000
2000
对此我也一直没能找到什么好的解决办法
02:30
But my wife -- who was then my girlfriend --
42
150000
3000
但我的妻子--当时是我的女朋友
02:33
managed it much better than I've ever been able to.
43
153000
3000
在这件事上就比我出色的多
02:36
Many years ago, when we first started going out, she was working for the BBC in Britain,
44
156000
3000
多年前 那时我们刚开始约会 她在英国BBC工作
02:39
and I was, at that stage, working in America.
45
159000
2000
而我当时在美国
02:41
I was coming back to visit her.
46
161000
2000
我回英国看她的时候
02:43
She told this to one of her colleagues, who said, "Well, what does your boyfriend do?"
47
163000
6000
她跟一个同事说起这事 那个同事问:“你男朋友是做什么的?”
02:49
Sarah thought quite hard about the things I'd explained --
48
169000
2000
她苦苦思索着我刚才解释过的工作
02:51
and she concentrated, in those days, on listening.
49
171000
4000
于是那段时间她一直是一个专心的倾听者
02:55
(Laughter)
50
175000
2000
(笑)
02:58
Don't tell her I said that.
51
178000
2000
别告诉她我说过这话
03:00
And she was thinking about the work I did developing mathematical models
52
180000
4000
她当时想 我的工作是建立数模
03:04
for understanding evolution and modern genetics.
53
184000
3000
来加深对进化和现代基因学的了解
03:07
So when her colleague said, "What does he do?"
54
187000
3000
所以当同事问:“他是干什么的?”
03:10
She paused and said, "He models things."
55
190000
4000
她就停顿一下 然后说:“他做模型。”
03:14
(Laughter)
56
194000
1000
(笑)
03:15
Well, her colleague suddenly got much more interested than I had any right to expect
57
195000
4000
当然 她的同事立即就对我产生了出乎我意料的兴趣
03:19
and went on and said, "What does he model?"
58
199000
3000
并继续问:“他做什么模型?”
03:22
Well, Sarah thought a little bit more about my work and said, "Genes."
59
202000
3000
然后 萨拉又想了想我的工作 然后答:“基因。”
03:25
(Laughter)
60
205000
4000
(笑)
03:29
"He models genes."
61
209000
2000
“他建立基因模型。”
03:31
That is my first love, and that's what I'll tell you a little bit about.
62
211000
4000
这是我的最爱 我会稍微给大家讲一讲
03:35
What I want to do more generally is to get you thinking about
63
215000
4000
总的来说 我要给大家讲一些
03:39
the place of uncertainty and randomness and chance in our world,
64
219000
3000
不确定性、随机性和概率在生活中的影响
03:42
and how we react to that, and how well we do or don't think about it.
65
222000
5000
我们对此的反应是怎样的 以及我们了解他们的程度
03:47
So you've had a pretty easy time up till now --
66
227000
2000
到现在为止大家听得都很轻松
03:49
a few laughs, and all that kind of thing -- in the talks to date.
67
229000
2000
到现在为止都是听听笑笑
03:51
You've got to think, and I'm going to ask you some questions.
68
231000
3000
现在大家要开始思考了 我会提几个问题
03:54
So here's the scene for the first question I'm going to ask you.
69
234000
2000
下面这个场景就是我开始问第一个问题
03:56
Can you imagine tossing a coin successively?
70
236000
3000
想象连续掷硬币的情形
03:59
And for some reason -- which shall remain rather vague --
71
239000
3000
由于某种原因--我就暂时不做过多的解释了--
04:02
we're interested in a particular pattern.
72
242000
2000
我们很喜欢某种特定的情形
04:04
Here's one -- a head, followed by a tail, followed by a tail.
73
244000
3000
比如这个--正面、反面、反面
04:07
So suppose we toss a coin repeatedly.
74
247000
3000
假设我们连续掷硬币
04:10
Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here.
75
250000
5000
然后我们设定这样一个情形 正反反
04:15
And you can count: one, two, three, four, five, six, seven, eight, nine, 10 --
76
255000
4000
数着掷十次:一 二 三 四 五 六 七 八 九 十
04:19
it happens after the 10th toss.
77
259000
2000
它在第十次投掷后出现
04:21
So you might think there are more interesting things to do, but humor me for the moment.
78
261000
3000
你可能觉得还有更有趣的事可以做 不过这次先迁就我一下
04:24
Imagine this half of the audience each get out coins, and they toss them
79
264000
4000
假设这半边观众都拿出硬币开始投掷
04:28
until they first see the pattern head-tail-tail.
80
268000
3000
直到他们看到正反反现象为止
04:31
The first time they do it, maybe it happens after the 10th toss, as here.
81
271000
2000
第一回投硬币 也许十次以后才能看到
04:33
The second time, maybe it's after the fourth toss.
82
273000
2000
第二回 也许第四次就能看到
04:35
The next time, after the 15th toss.
83
275000
2000
再下一回 也许比15次还多
04:37
So you do that lots and lots of times, and you average those numbers.
84
277000
3000
做过很多遍这个实验后 将每遍的次数平均
04:40
That's what I want this side to think about.
85
280000
3000
这就是我想让这半边思考的情况
04:43
The other half of the audience doesn't like head-tail-tail --
86
283000
2000
那半边观众不喜欢正反反
04:45
they think, for deep cultural reasons, that's boring --
87
285000
3000
出于某些深刻的文化因素 他们觉得这很无聊--
04:48
and they're much more interested in a different pattern -- head-tail-head.
88
288000
3000
他们更喜欢另一种情形--正反正
04:51
So, on this side, you get out your coins, and you toss and toss and toss.
89
291000
3000
所以 这半边的观众拿出硬币 反复投掷
04:54
And you count the number of times until the pattern head-tail-head appears
90
294000
3000
然后记下看到正反正情形出现时掷硬币的次数
04:57
and you average them. OK?
91
297000
3000
然后将所有的次数平均
05:00
So on this side, you've got a number --
92
300000
2000
那么 这半边的观众得出了一个平均数
05:02
you've done it lots of times, so you get it accurately --
93
302000
2000
因为做了很多次 所以这个数字是准确的
05:04
which is the average number of tosses until head-tail-tail.
94
304000
3000
就是正反反情形出现时投掷硬币次数的平均
05:07
On this side, you've got a number -- the average number of tosses until head-tail-head.
95
307000
4000
而这半边的观众 大家也得出了一个数字--正反正情形的平均
05:11
So here's a deep mathematical fact --
96
311000
2000
那么就有了这样一个数学问题
05:13
if you've got two numbers, one of three things must be true.
97
313000
3000
两个数之间只能有三种情形
05:16
Either they're the same, or this one's bigger than this one,
98
316000
3000
他们或者相等 或者这个比那个大
05:19
or this one's bigger than that one.
99
319000
1000
或者那个比这个大
05:20
So what's going on here?
100
320000
3000
那么在我们这两种情形下这两个数相比会怎样呢
05:23
So you've all got to think about this, and you've all got to vote --
101
323000
2000
大家来思考一下 然后投个票
05:25
and we're not moving on.
102
325000
1000
现在给大家一些时间
05:26
And I don't want to end up in the two-minute silence
103
326000
2000
不过我不想因为给大家更多的时间思考直到每个人都立场明确
05:28
to give you more time to think about it, until everyone's expressed a view. OK.
104
328000
4000
而最后以两分钟沉默告终
05:32
So what you want to do is compare the average number of tosses until we first see
105
332000
4000
所以你们要做的只是比较这两种情形下
05:36
head-tail-head with the average number of tosses until we first see head-tail-tail.
106
336000
4000
平均数的大小
05:41
Who thinks that A is true --
107
341000
2000
哪些认为A是对的--
05:43
that, on average, it'll take longer to see head-tail-head than head-tail-tail?
108
343000
4000
即 平均来看 出现正反正的情形要晚于正反反情形?
05:47
Who thinks that B is true -- that on average, they're the same?
109
347000
3000
哪些认为B是对的--即 平均来看次数相同?
05:51
Who thinks that C is true -- that, on average, it'll take less time
110
351000
2000
哪些认为C是对的--即 平均来看 出现正反正情形的次数
05:53
to see head-tail-head than head-tail-tail?
111
353000
3000
要少于正反反的情形?
05:57
OK, who hasn't voted yet? Because that's really naughty -- I said you had to.
112
357000
3000
好 谁没有投票? 那真是很调皮--我说过你们要选择一个
06:00
(Laughter)
113
360000
1000
(笑)
06:02
OK. So most people think B is true.
114
362000
3000
好的 那么大多数人认为B是正确的
06:05
And you might be relieved to know even rather distinguished mathematicians think that.
115
365000
3000
也许当听到甚至非常优秀的数学家也是这么想的 你会放下心来
06:08
It's not. A is true here.
116
368000
4000
B不正确 答案是A
06:12
It takes longer, on average.
117
372000
2000
实际上 平均起来
06:14
In fact, the average number of tosses till head-tail-head is 10
118
374000
2000
正反正情形下掷硬币的次数是10次
06:16
and the average number of tosses until head-tail-tail is eight.
119
376000
5000
而正反反情形的次数是8次
06:21
How could that be?
120
381000
2000
怎么会这样呢
06:24
Anything different about the two patterns?
121
384000
3000
这两种情形有什么不同吗
06:30
There is. Head-tail-head overlaps itself.
122
390000
5000
二者的确不同 正反正情形会自我重叠
06:35
If you went head-tail-head-tail-head, you can cunningly get two occurrences
123
395000
4000
如果你掷出正-反-正-反-正 你能在这五次中
06:39
of the pattern in only five tosses.
124
399000
3000
看到两次正反正的情形
06:42
You can't do that with head-tail-tail.
125
402000
2000
而这在正反反的情形下无法实现
06:44
That turns out to be important.
126
404000
2000
这一点变得很重要
06:46
There are two ways of thinking about this.
127
406000
2000
有两种方法可以来想这个问题
06:48
I'll give you one of them.
128
408000
2000
我提供其中之一
06:50
So imagine -- let's suppose we're doing it.
129
410000
2000
假设我们正在进行这个实验
06:52
On this side -- remember, you're excited about head-tail-tail;
130
412000
2000
这半边观众--记住 你们希望看到正反反
06:54
you're excited about head-tail-head.
131
414000
2000
而你们希望看到正反正
06:56
We start tossing a coin, and we get a head --
132
416000
3000
我们开始投硬币 第一次是正
06:59
and you start sitting on the edge of your seat
133
419000
1000
大家都开始暗自激动
07:00
because something great and wonderful, or awesome, might be about to happen.
134
420000
5000
因为一个美妙绝伦的事情要发生了
07:05
The next toss is a tail -- you get really excited.
135
425000
2000
第二次是反--大家都很激动
07:07
The champagne's on ice just next to you; you've got the glasses chilled to celebrate.
136
427000
4000
手边的香槟已经冰好 大家都拿着杯子开始准备庆祝
07:11
You're waiting with bated breath for the final toss.
137
431000
2000
大家都屏气凝神观望最后一掷
07:13
And if it comes down a head, that's great.
138
433000
2000
如果是正 那么非常好
07:15
You're done, and you celebrate.
139
435000
2000
你们完了 而你们可以庆祝了
07:17
If it's a tail -- well, rather disappointedly, you put the glasses away
140
437000
2000
如果这是反--那么有些遗憾 你们要把杯子移开
07:19
and put the champagne back.
141
439000
2000
然后把香槟放回去
07:21
And you keep tossing, to wait for the next head, to get excited.
142
441000
3000
接着掷硬币 等着下一个正 然后开始激动
07:25
On this side, there's a different experience.
143
445000
2000
而这半边则完全不同
07:27
It's the same for the first two parts of the sequence.
144
447000
3000
这个序列中前两步都是相同的
07:30
You're a little bit excited with the first head --
145
450000
2000
大家因第一个是正有点兴奋
07:32
you get rather more excited with the next tail.
146
452000
2000
当第二个是反的时候 变得更加激动
07:34
Then you toss the coin.
147
454000
2000
然后再掷硬币
07:36
If it's a tail, you crack open the champagne.
148
456000
3000
如果是反 你们就可以打开香槟了
07:39
If it's a head you're disappointed,
149
459000
2000
如果是正 你们会感到失望
07:41
but you're still a third of the way to your pattern again.
150
461000
3000
但你们仍旧已经完成了这个模式的三分之一
07:44
And that's an informal way of presenting it -- that's why there's a difference.
151
464000
4000
这就是一种不大正式的解释--这就是出现不同的原因
07:48
Another way of thinking about it --
152
468000
2000
另外一种思考的方法就是--
07:50
if we tossed a coin eight million times,
153
470000
2000
如果我们掷八百万次硬币
07:52
then we'd expect a million head-tail-heads
154
472000
2000
我们可能会预计有一百万正反正情形
07:54
and a million head-tail-tails -- but the head-tail-heads could occur in clumps.
155
474000
7000
和一百万次正反反情形的出现--但正反正的情形可能接连出现
08:01
So if you want to put a million things down amongst eight million positions
156
481000
2000
所以如果你想在八百万个位置中得到一百万个固定的模式
08:03
and you can have some of them overlapping, the clumps will be further apart.
157
483000
5000
而且其中一些可以重叠 那么各簇之间就会相隔更远
08:08
It's another way of getting the intuition.
158
488000
2000
这就是另外一种思考方法
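A minimal Python sketch of the head-tail-head versus head-tail-tail comparison, assuming a fair coin. `expected_wait` uses the standard self-overlap identity for pattern waiting times (add 2^k for every length k at which the start of the pattern matches its end), and `simulated_wait` is a brute-force check. Both function names are illustrative and do not come from the talk.

```python
import random

def expected_wait(pattern: str) -> int:
    """Exact mean number of fair-coin tosses until `pattern` first appears.

    Adds 2**k for every k where the first k symbols of the pattern equal
    its last k symbols (k = len(pattern) always qualifies).
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])

def simulated_wait(pattern: str, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    total = 0
    for _ in range(trials):
        recent = ""
        tosses = 0
        while recent != pattern:
            # Keep only the most recent len(pattern) tosses.
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / trials

print(expected_wait("HTH"), expected_wait("HTT"))  # 10 8
```

Head-tail-head matches itself at lengths 1 and 3, giving 2 + 8 = 10; head-tail-tail matches only at length 3, giving 8. That extra term is the overlap the talk points to.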
08:10
What's the point I want to make?
159
490000
2000
那么这说明什么问题呢?
08:12
It's a very, very simple example, an easily stated question in probability,
160
492000
4000
这是一个非常简单的例子 一个很简单明了的问题--
08:16
which every -- you're in good company -- everybody gets wrong.
161
496000
3000
有很多人跟你们一样--这个问题几乎没有人答对
08:19
This is my little diversion into my real passion, which is genetics.
162
499000
4000
这是一个小小的题外话 我很想讲的 是基因学
08:23
There's a connection between head-tail-heads and head-tail-tails in genetics,
163
503000
3000
在基因学中 正反正和正反反两种情形间存在某种联系
08:26
and it's the following.
164
506000
3000
这个联系是这样的
08:29
When you toss a coin, you get a sequence of heads and tails.
165
509000
3000
掷硬币的时候 你会得到一个正和反组成的序列
08:32
When you look at DNA, there's a sequence of not two things -- heads and tails --
166
512000
3000
而当观察DNA时 会发现这不是两个元素组成的序列--正和反--
08:35
but four letters -- As, Gs, Cs and Ts.
167
515000
3000
而是四个字母--A G C T
08:38
And there are little chemical scissors, called restriction enzymes
168
518000
3000
有一些小小的化学剪刀 叫做限制性内切酶
08:41
which cut DNA whenever they see particular patterns.
169
521000
2000
当它们遇到特定的情形时 就会剪断DNA
08:43
And they're an enormously useful tool in modern molecular biology.
170
523000
4000
在现代分子生物学中它们是非常有用的工具
08:48
And instead of asking the question, "How long until I see a head-tail-head?" --
171
528000
3000
在基因学中 我们不问“什么时候能看到正反正的情形?”
08:51
you can ask, "How big will the chunks be when I use a restriction enzyme
172
531000
3000
你可以问 比如说 “如果用限制性内切酶来剪断任何它遇到的GAAG排列
08:54
which cuts whenever it sees G-A-A-G, for example?
173
534000
4000
剪下来的基因部分会有多大?”
08:58
How long will those chunks be?"
174
538000
2000
那些基因部分会有多长?
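The chunk-length question can be explored with a short Python sketch. It rests on strong simplifying assumptions that are not from the talk: the DNA is a uniformly random string of A, C, G and T (real genomes are not), and the cut is placed at the start of every G-A-A-G site. `fragment_lengths` is a hypothetical helper name.

```python
import random

def fragment_lengths(dna: str, site: str) -> list[int]:
    """Lengths of the pieces left after cutting at the start of every `site`."""
    cuts = []
    start = dna.find(site)
    while start != -1:
        cuts.append(start)
        start = dna.find(site, start + 1)  # step by 1 so overlapping sites count
    edges = [0] + cuts + [len(dna)]
    # Drop the empty piece that appears when the sequence begins with the site.
    return [b - a for a, b in zip(edges, edges[1:]) if b > a]

rng = random.Random(0)  # fixed seed so the run is repeatable
dna = "".join(rng.choices("ACGT", k=1_000_000))
pieces = fragment_lengths(dna, "GAAG")
mean_length = sum(pieces) / len(pieces)
print(len(pieces), round(mean_length, 1))  # mean should be near 4**4 = 256
```

Under these assumptions a four-letter site turns up about once every 256 positions, so the chunks average roughly 256 letters. As with head-tail-head, G-A-A-G overlaps itself, so its occurrences can come in clumps.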
09:00
That's a rather trivial connection between probability and genetics.
175
540000
5000
这是概率和基因之间的一个相当细微的联系
09:05
There's a much deeper connection, which I don't have time to go into
176
545000
3000
他们之间还有一个更深的联系 这里我没有时间多讲
09:08
and that is that modern genetics is a really exciting area of science.
177
548000
3000
那就是 现代基因学是一个很令人激动的科学领域
09:11
And we'll hear some talks later in the conference specifically about that.
178
551000
4000
以后我们可能会在某些大会的演讲中听到这个部分
09:15
But it turns out that unlocking the secrets in the information generated by modern
179
555000
4000
但是若把现代实验技术中发现的秘密公开,
09:19
experimental technologies, a key part of that has to do with fairly sophisticated --
180
559000
5000
关键就是那必须与一些相当复杂的--
09:24
you'll be relieved to know that I do something useful in my day job,
181
564000
3000
当听到我的工作是多有用的时候你们会倍感释然
09:27
rather more sophisticated than the head-tail-head story --
182
567000
2000
比正反正的试验要复杂地多--
09:29
but quite sophisticated computer modelings and mathematical modelings
183
569000
4000
但是相当复杂的计算机建模 数学建模
09:33
and modern statistical techniques.
184
573000
2000
以及现代统计技术
09:35
And I will give you two little snippets -- two examples --
185
575000
3000
我会举在牛津我们团队正在研究的项目中
09:38
of projects we're involved in in my group in Oxford,
186
578000
3000
的两个小例子
09:41
both of which I think are rather exciting.
187
581000
2000
我认为这两个例子都很有趣
09:43
You know about the Human Genome Project.
188
583000
2000
大家都了解人类基因组计划
09:45
That was a project which aimed to read one copy of the human genome.
189
585000
4000
那是一个项目 目的在于构建人类基因组遗传图谱
09:51
The natural thing to do after you've done that --
190
591000
2000
当完成那个项目后 下一步自然是--
09:53
and that's what this project, the International HapMap Project,
191
593000
2000
--就是这个计划 国际人类基因组单体型图计划
09:55
which is a collaboration between labs in five or six different countries.
192
595000
5000
由五六个不同国家的实验室合作进行
10:00
Think of the Human Genome Project as learning what we've got in common,
193
600000
4000
把人类基因遗传图谱看做是对我们共同点的了解
10:04
and the HapMap Project is trying to understand
194
604000
2000
而国际人类基因组单体型图计划就是试着了解
10:06
where there are differences between different people.
195
606000
2000
人类之间的不同
10:08
Why do we care about that?
196
608000
2000
为什么要这么关注这些呢?
10:10
Well, there are lots of reasons.
197
610000
2000
这有很多原因
10:12
The most pressing one is that we want to understand how some differences
198
612000
4000
最紧迫的一个就是 我们想了解其中一些不同
10:16
make some people susceptible to one disease -- type-2 diabetes, for example --
199
616000
4000
是怎样让一些人容易患一种病的--比如说 二型糖尿病--
10:20
and other differences make people more susceptible to heart disease,
200
620000
5000
而另一些不同使人更容易得心脏病
10:25
or stroke, or autism and so on.
201
625000
2000
或中风 自闭症等等其它病症
10:27
That's one big project.
202
627000
2000
这是一个宏大的项目
10:29
There's a second big project,
203
629000
2000
最近 英国威康信托基金会资助了一个项目
10:31
recently funded by the Wellcome Trust in this country,
204
631000
2000
其规模仅次于上一个项目
10:33
involving very large studies --
205
633000
2000
它包括了很多大型的研究--
10:35
thousands of individuals, with each of eight different diseases,
206
635000
3000
八种不同的疾病 每种都有数千名患者参与
10:38
common diseases like type-1 and type-2 diabetes, and coronary heart disease,
207
638000
4000
有一些比较常见的疾病 比如一型糖尿病 二型糖尿病和冠心病
10:42
bipolar disease and so on -- to try and understand the genetics.
208
642000
4000
躁狂抑郁症等等--来试着了解基因
10:46
To try and understand what it is about genetic differences that causes the diseases.
209
646000
3000
试着了解是哪些基因差异导致了这些疾病
10:49
Why do we want to do that?
210
649000
2000
为什么我们想做这些呢?
10:51
Because we understand very little about most human diseases.
211
651000
3000
因为我们对大多数人类疾病都了解甚微
10:54
We don't know what causes them.
212
654000
2000
我们不知道病因是什么
10:56
And if we can get in at the bottom and understand the genetics,
213
656000
2000
如果我们从根本入手并了解基因
10:58
we'll have a window on the way the disease works,
214
658000
3000
这便开启了一个通向疾病病理的窗口
11:01
and a whole new way about thinking about disease therapies
215
661000
2000
也开辟了思考疾病治疗方法
11:03
and preventative treatment and so on.
216
663000
3000
和预防措施的新路径
11:06
So that's, as I said, the little diversion on my main love.
217
666000
3000
所以 就像我之前说过的那样 这是我主要兴趣的一个小分支
11:09
Back to some of the more mundane issues of thinking about uncertainty.
218
669000
5000
回到一些关于随机性的平凡的问题上来
11:14
Here's another quiz for you --
219
674000
2000
这是给你们的另一个测试--
11:16
now suppose we've got a test for a disease
220
676000
2000
现在假设我们拿到了一个疾病的检测
11:18
which isn't infallible, but it's pretty good.
221
678000
2000
这个检测并不是完全准确的 但准确性很高
11:20
It gets it right 99 percent of the time.
222
680000
3000
这个检测的准确性高达99%
11:23
And I take one of you, or I take someone off the street,
223
683000
3000
现在我让你们中的一个人 或从街上拉来一个人
11:26
and I test them for the disease in question.
224
686000
2000
然后检测他患病的几率
11:28
Let's suppose there's a test for HIV -- the virus that causes AIDS --
225
688000
4000
假设这是一个艾滋病毒的测试--一个导致艾滋病的病毒--
11:32
and the test says the person has the disease.
226
692000
3000
而测试表明这个人患病
11:35
What's the chance that they do?
227
695000
3000
那么他患病的几率是多少呢
11:38
The test gets it right 99 percent of the time.
228
698000
2000
这个测试准确性是99%
11:40
So a natural answer is 99 percent.
229
700000
4000
所以自然而然会得出99%这个答案
11:44
Who likes that answer?
230
704000
2000
谁喜欢这个答案?
11:46
Come on -- everyone's got to get involved.
231
706000
1000
别这样--每个人都参与进来
11:47
Don't think you don't trust me anymore.
232
707000
2000
不要觉得你不再相信我了
11:49
(Laughter)
233
709000
1000
(笑)
11:50
Well, you're right to be a bit skeptical, because that's not the answer.
234
710000
3000
不过 你们的怀疑是正确的 因为这不是正确答案
11:53
That's what you might think.
235
713000
2000
你们可能是这么想的
11:55
It's not the answer, and it's not because it's only part of the story.
236
715000
3000
这不是正确答案 并不是因为这只是故事的一部分
11:58
It actually depends on how common or how rare the disease is.
237
718000
3000
而实际上它取决于这种病是常见的还是罕见的
12:01
So let me try and illustrate that.
238
721000
2000
现在我来试着说明一下
12:03
Here's a little caricature of a million individuals.
239
723000
4000
这个图代表一百万人
12:07
So let's think about a disease that affects --
240
727000
3000
我们来考虑一种疾病的感染率--
12:10
it's pretty rare, it affects one person in 10,000.
241
730000
2000
它非常罕见 在一万人中仅一人患病
12:12
Amongst these million individuals, most of them are healthy
242
732000
3000
在这一百万人中 大部分人都是健康的
12:15
and some of them will have the disease.
243
735000
2000
而一些人会患病
12:17
And in fact, if this is the prevalence of the disease,
244
737000
3000
实际上 如果这是疾病的流行程度
12:20
about 100 will have the disease and the rest won't.
245
740000
3000
那么约一百人会患病而其余人不会
12:23
So now suppose we test them all.
246
743000
2000
现在假设我们给所有人做了测试
12:25
What happens?
247
745000
2000
会出现什么情况呢
12:27
Well, amongst the 100 who do have the disease,
248
747000
2000
在100个患有该疾病的人中
12:29
the test will get it right 99 percent of the time, and 99 will test positive.
249
749000
5000
这个测试会有99%的正确性 所以99个人会检测出患病
12:34
Amongst all these other people who don't have the disease,
250
754000
2000
在那些没有患病的人中
12:36
the test will get it right 99 percent of the time.
251
756000
3000
这个测试仍然有99%的正确率
12:39
It'll only get it wrong one percent of the time.
252
759000
2000
只有1%是错误的
12:41
But there are so many of them that there'll be an enormous number of false positives.
253
761000
4000
但是没有患病的人太多了 所以错误的患病检测会非常多
12:45
Put that another way --
254
765000
2000
换种方法说--
12:47
of all of them who test positive -- so here they are, the individuals involved --
255
767000
5000
在所有结果是患病的检测中--就是这些人--
12:52
less than one in 100 actually have the disease.
256
772000
5000
真正患病的几率小于1%
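The head count in the million-person picture can be reproduced in a few lines of Python. This is a sketch of the arithmetic as stated in the talk, assuming the test is right 99 percent of the time for sick and healthy people alike; the variable names are illustrative.

```python
population = 1_000_000
sick = population // 10_000            # 1 in 10,000 -> 100 people
healthy = population - sick            # 999,900 people

true_positives = sick * 99 // 100      # test is right for 99 of the 100 sick
false_positives = healthy // 100       # test is wrong for 1% of the healthy

share_sick = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, round(share_sick, 4))  # 99 9999 0.0098
```

Of the 10,098 people who test positive, only 99 have the disease, which is 1 in 102.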
12:57
So even though we think the test is accurate, the important part of the story is
257
777000
4000
所以即便我们认为这个测试是准确的 这个例子重要的部分在于
13:01
there's another bit of information we need.
258
781000
3000
我们还需要一些信息
13:04
Here's the key intuition.
259
784000
2000
这就是关键
13:07
What we have to do, once we know the test is positive,
260
787000
3000
当知道测试结果为患病时 我们要做的就是
13:10
is to weigh up the plausibility, or the likelihood, of two competing explanations.
261
790000
6000
权衡下面两种解释的概率或可能性
13:16
Each of those explanations has a likely bit and an unlikely bit.
262
796000
3000
每种解释都有一定的可能性
13:19
One explanation is that the person doesn't have the disease --
263
799000
3000
一种解释是这个人不患病--
13:22
that's overwhelmingly likely, if you pick someone at random --
264
802000
3000
这种可能性比较大 如果你随机选人的话--
13:25
but the test gets it wrong, which is unlikely.
265
805000
3000
但是测试结果错了 这种情况很罕见
13:29
The other explanation is that the person does have the disease -- that's unlikely --
266
809000
3000
另一种解释就是这个人确实患病--这很少见--
13:32
but the test gets it right, which is likely.
267
812000
3000
但测试结果正确 这可能性很大
13:35
And the number we end up with --
268
815000
2000
而我们最后得到的数字--
13:37
that number which is a little bit less than one in 100 --
269
817000
3000
就是略小于百分之一的那个数字--
13:40
is to do with how likely one of those explanations is relative to the other.
270
820000
6000
与这几种解释之间的关联性有关
13:46
Each of them taken together is unlikely.
271
826000
2000
每个解释合起来都不大可能
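The weighing-up of the two explanations can be written out as a small Python sketch, using the same figures as the disease example (prevalence 1 in 10,000, test right 99 percent of the time). Exact fractions are used so that no rounding hides the comparison; the variable names are illustrative.

```python
from fractions import Fraction

prevalence = Fraction(1, 10_000)
accuracy = Fraction(99, 100)

# Explanation 1: healthy (very likely) AND the test is wrong (unlikely).
healthy_and_wrong = (1 - prevalence) * (1 - accuracy)
# Explanation 2: sick (unlikely) AND the test is right (likely).
sick_and_right = prevalence * accuracy

# How likely explanation 2 is relative to explanation 1.
odds_sick = sick_and_right / healthy_and_wrong
# The same comparison expressed as a probability.
prob_sick = sick_and_right / (sick_and_right + healthy_and_wrong)
print(odds_sick, prob_sick)  # 1/101 1/102
```

Both explanations are individually unlikely (about 1 in 100 and 1 in 10,000 respectively); only their ratio determines the answer.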
13:49
Here's a more topical example of exactly the same thing.
272
829000
3000
这是另一个说明同样道理的例子 更加切题
13:52
Those of you in Britain will know about what's become rather a celebrated case
273
832000
4000
在英国的听众知道 这是一个很有名的案子
13:56
of a woman called Sally Clark, who had two babies who died suddenly.
274
836000
5000
一个女人叫做萨里•克拉克 她有两个孩子 都突然去世
14:01
And initially, it was thought that they died of what's known informally as "cot death,"
275
841000
4000
很自然人们以为这属于婴儿猝死
14:05
and more formally as "Sudden Infant Death Syndrome."
276
845000
3000
更正式的说法是婴儿猝死综合征
14:08
For various reasons, she was later charged with murder.
277
848000
2000
由于多种原因 萨里后来以谋杀罪被逮捕
14:10
And at the trial, her trial, a very distinguished pediatrician gave evidence
278
850000
4000
在法庭上 一个非常著名的小儿科医师作证
14:14
that the chance of two cot deaths, innocent deaths, in a family like hers --
279
854000
5000
两个婴儿猝死 在一个像萨里的家里--
14:19
which was professional and non-smoking -- was one in 73 million.
280
859000
6000
有经验并不吸烟的--概率为七千三百万分之一
14:26
To cut a long story short, she was convicted at the time.
281
866000
3000
长话短说 她最后被判有罪
14:29
Later, and fairly recently, acquitted on appeal -- in fact, on the second appeal.
282
869000
5000
后来 最近 她在上诉中无罪释放了
14:34
And just to set it in context, you can imagine how awful it is for someone
283
874000
4000
当置于实际情境中 大家就能想象 一个人失去了一个孩子
14:38
to have lost one child, and then two, if they're innocent,
284
878000
3000
然后又失去了另一个 然后又被诬为凶手
14:41
to be convicted of murdering them.
285
881000
2000
这是多么可怕的事情
14:43
To be put through the stress of the trial, convicted of murdering them --
286
883000
2000
要被迫承受审判的压力 并判有罪--
14:45
and to spend time in a women's prison, where all the other prisoners
287
885000
3000
在女监里熬过一段日子 那里所有的囚犯
14:48
think you killed your children -- is a really awful thing to happen to someone.
288
888000
5000
都认为是你杀了孩子--这件事发生在一个人身上真是太可怕了
14:53
And it happened in large part here because the expert got the statistics
289
893000
5000
而这些事的发生 很大程度上是因为那个专家
14:58
horribly wrong, in two different ways.
290
898000
3000
得出的数据是错误的 错误出在两方面
15:01
So where did he get the one in 73 million number?
291
901000
4000
那么他是怎样得出七千三百万分之一这个数字的呢
15:05
He looked at some research, which said the chance of one cot death in a family
292
905000
3000
他看了一些研究 那些研究上说一个家庭里一个婴儿猝死的概率
15:08
like Sally Clark's is about one in 8,500.
293
908000
5000
就像萨里•克拉克家 这概率是八千五百分之一
15:13
So he said, "I'll assume that if you have one cot death in a family,
294
913000
4000
所以他说:“我假设如果一个家庭中出现了一个婴儿猝死
15:17
the chance of a second child dying from cot death aren't changed."
295
917000
4000
那么第二个婴儿发生猝死的概率也不会变。”
15:21
So that's what statisticians would call an assumption of independence.
296
921000
3000
这就是统计学家所说的独立性假设
15:24
It's like saying, "If you toss a coin and get a head the first time,
297
924000
2000
这就像是在说:“如果你掷硬币第一次是正
15:26
that won't affect the chance of getting a head the second time."
298
926000
3000
这并不会影响第二次投掷得到正的概率。”
15:29
So if you toss a coin twice, the chance of getting a head twice are a half --
299
929000
5000
所以如果你扔两次硬币 第一次正的几率是二分之一
15:34
that's the chance the first time -- times a half -- the chance a second time.
300
934000
3000
第二次正的几率也是二分之一
15:37
So he said, "Here,
301
937000
2000
所以他说:“我们来假设
15:39
I'll assume that these events are independent.
302
939000
4000
假设这些事件是独立的
15:43
When you multiply 8,500 together twice,
303
943000
2000
当你将八千五百分之一相乘
15:45
you get about 73 million."
304
945000
2000
你就会得到约七千三百万分之一”
15:47
And none of this was stated to the court as an assumption
305
947000
2000
而上面这些并没有在法庭上向陪审团
15:49
or presented to the jury that way.
306
949000
2000
展示作为前提
15:52
Unfortunately here -- and, really, regrettably --
307
952000
3000
不幸的是--确实很令人遗憾--
15:55
first of all, in a situation like this you'd have to verify it empirically.
308
955000
4000
首先 在这种情况下必须用实证加以验证
15:59
And secondly, it's palpably false.
309
959000
2000
第二 这显然是错的
16:02
There are lots and lots of things that we don't know about sudden infant deaths.
310
962000
5000
我们对婴儿猝死综合症有太多不了解
16:07
It might well be that there are environmental factors that we're not aware of,
311
967000
3000
很可能有一些我们并不知道的环境因素
16:10
and it's pretty likely to be the case that there are
312
970000
2000
也很可能是有一些
16:12
genetic factors we're not aware of.
313
972000
2000
我们并不了解的基因因素
16:14
So if a family suffers from one cot death, you'd put them in a high-risk group.
314
974000
3000
所以如果一个家庭出现一个婴儿猝死 你就要把他们放到高概率组
16:17
They've probably got these environmental risk factors
315
977000
2000
他们很可能有这些环境因素
16:19
and/or genetic risk factors we don't know about.
316
979000
3000
和/或基因因素 而我们对这些并不知情
16:22
And to argue, then, that the chance of a second death is as if you didn't know
317
982000
3000
而就像不知道上面得出的信息一样 确定第二个死亡的概率
16:25
that information is really silly.
318
985000
3000
是非常愚蠢的
16:28
It's worse than silly -- it's really bad science.
319
988000
4000
这比愚蠢还糟--这是坏科学
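A short Python sketch shows how much the one-in-73-million figure leans on the independence assumption. The 1-in-8,500 figure is the one quoted in the talk. The risk multipliers are purely hypothetical values chosen for illustration, not estimates from the case or from any study, and `joint_with_dependence` is a made-up helper name.

```python
from fractions import Fraction

single = Fraction(1, 8_500)  # chance of one cot death, as quoted in the talk

# The expert's calculation: treat the two deaths as independent.
independent = single * single
print(independent)  # 1/72250000, i.e. "about 73 million"

def joint_with_dependence(p_first: Fraction, risk_multiplier: int) -> Fraction:
    """P(two cot deaths) if a first death multiplies the risk of a second."""
    p_second = min(Fraction(1), p_first * risk_multiplier)
    return p_first * p_second

for multiplier in (1, 10, 100):  # hypothetical multipliers, not data
    print(multiplier, joint_with_dependence(single, multiplier))
```

With a multiplier of 1 the independent figure is recovered. If shared environmental or genetic factors raise the second child's risk, the joint probability rises in direct proportion.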
16:32
Nonetheless, that's how it was presented, and at trial nobody even argued it.
320
992000
5000
但是 这推论就这样呈现在法庭上 而几乎没有人质疑
16:37
That's the first problem.
321
997000
2000
这是第一个问题
16:39
The second problem is, what does the number of one in 73 million mean?
322
999000
4000
第二个问题是 七千三百万分之一这个数字意味着什么
16:43
So after Sally Clark was convicted --
323
1003000
2000
在萨里•克拉克被定罪后--
16:45
you can imagine, it made rather a splash in the press --
324
1005000
4000
可以想象 这在媒体中引起轩然大波--
16:49
one of the journalists from one of Britain's more reputable newspapers wrote that
325
1009000
7000
一个英国相当有名望的报社记者写到
16:56
what the expert had said was,
326
1016000
2000
这个专家说
16:58
"The chance that she was innocent was one in 73 million."
327
1018000
5000
“她无罪的几率是七千三百万分之一”
17:03
Now, that's a logical error.
328
1023000
2000
这是一个逻辑上的错误
17:05
It's exactly the same logical error as the logical error of thinking that
329
1025000
3000
这个错误相当于认为
17:08
after the disease test, which is 99 percent accurate,
330
1028000
2000
在准确率99%的疾病测试后
17:10
the chance of having the disease is 99 percent.
331
1030000
4000
患病的几率是99%
17:14
In the disease example, we had to bear in mind two things,
332
1034000
4000
在疾病的例子中 我们要注意两点
17:18
one of which was the possibility that the test got it right or not.
333
1038000
4000
一个是这个测试得出的可能性是否正确
17:22
And the other one was the chance, a priori, that the person had the disease or not.
334
1042000
4000
另一个就是这个人本身是否患病
17:26
It's exactly the same in this context.
335
1046000
3000
这个情形是完全相同的
17:29
There are two things involved -- two parts to the explanation.
336
1049000
4000
这个解释包括两个部分
17:33
We want to know how likely, or relatively how likely, two different explanations are.
337
1053000
4000
我们想知道这两种不同解释发生的可能性 或相对的可能性
17:37
One of them is that Sally Clark was innocent --
338
1057000
3000
一个是 萨里•克拉克是清白的--
17:40
which is, a priori, overwhelmingly likely --
339
1060000
2000
也就是 一个先验 极为可能--
17:42
most mothers don't kill their children.
340
1062000
3000
大多母亲不会杀自己的孩子
17:45
And the second part of the explanation
341
1065000
2000
这个解释的第二部分
17:47
is that she suffered an incredibly unlikely event.
342
1067000
3000
就是她遭遇了一个可能性极小的事件
17:50
Not as unlikely as one in 73 million, but nonetheless rather unlikely.
343
1070000
4000
不像七千三百万分之一那样小 但仍然相当不可能
17:54
The other explanation is that she was guilty.
344
1074000
2000
另一个解释就是她有罪
17:56
Now, we probably think a priori that's unlikely.
345
1076000
2000
我们可能先验地认为这不大可能
17:58
And we certainly should think in the context of a criminal trial
346
1078000
3000
然后我们当然应该认为在刑事审判的情形下
18:01
that that's unlikely, because of the presumption of innocence.
347
1081000
3000
这是不大可能的 因为我们以无罪为前提
18:04
And then if she were trying to kill the children, she succeeded.
348
1084000
4000
如果她那时试着杀害孩子 那么她成功了
18:08
So the chance that she's innocent isn't one in 73 million.
349
1088000
4000
所以她无罪的机率并不是七千三百万分之一
18:12
We don't know what it is.
350
1092000
2000
我们不知道这个机率是多少
18:14
It has to do with weighing up the strength of the other evidence against her
351
1094000
4000
这同衡量其它对她不利的证据
18:18
and the statistical evidence.
352
1098000
2000
和数据型证据有关
18:20
We know the children died.
353
1100000
2000
我们知道 孩子死了
18:22
What matters is how likely or unlikely, relative to each other,
354
1102000
4000
重要的是这两种解释
18:26
the two explanations are.
355
1106000
2000
相对发生的机率
18:28
And they're both implausible.
356
1108000
2000
他们都令人难以置信
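
To make the comparison of the two explanations concrete, here is a minimal sketch. Each input stands for the overall chance that one explanation produces what was observed (its prior chance times the chance of the deaths under it). Every number is a hypothetical placeholder, not an estimate from the case, and the function name is invented for illustration. The point is only that a tiny probability for one explanation says little until it is set against the probability of the other.

```python
def relative_probability(p_first, p_second):
    """Chance of the first explanation, given that exactly one of the two is true."""
    return p_first / (p_first + p_second)


# Hypothetical placeholder values, NOT estimates from the Sally Clark case.
p_natural_deaths = 1e-6

# The same small number leads to very different conclusions depending on
# how unlikely the competing explanation is.
for p_double_murder in (1e-4, 1e-6, 1e-8):
    share = relative_probability(p_natural_deaths, p_double_murder)
    print(f"other explanation at {p_double_murder:.0e}: "
          f"share for natural deaths = {share:.3f}")
```

When both explanations are equally implausible the share is one half, whatever the absolute size of the numbers. This is why quoting one small probability on its own settles nothing.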
18:31
There's a situation where errors in statistics had really profound
357
1111000
4000
在这种情形下 统计上的错误
18:35
and really unfortunate consequences.
358
1115000
3000
产生了很重大而且不幸的结果
18:38
In fact, there are two other women who were convicted on the basis of the
359
1118000
2000
事实上 还有其他两个女人因这个小儿科医师的作证
18:40
evidence of this pediatrician, who have subsequently been released on appeal.
360
1120000
4000
而被定罪 而她们在上诉中都被无罪释放了
18:44
Many cases were reviewed.
361
1124000
2000
很多案子都因此而重审
18:46
And it's particularly topical because he's currently facing a disrepute charge
362
1126000
4000
这引起了很高的关注 因为他正面临着
18:50
at Britain's General Medical Council.
363
1130000
3000
英国综合医学委员会的名誉调查
18:53
So just to conclude -- what are the take-home messages from this?
364
1133000
4000
总结一下 我们应该得到什么警示呢
18:57
Well, we know that randomness and uncertainty and chance
365
1137000
4000
我们知道 随机性、不确定性和概率
19:01
are very much a part of our everyday life.
366
1141000
3000
在生活中影响重大
19:04
It's also true -- and, although, you, as a collective, are very special in many ways,
367
1144000
5000
并且大家作为一个集体 在很多方面都很特别
19:09
you're completely typical in not getting the examples I gave right.
368
1149000
4000
大家没有回答正确我给出的例子 是完全正常并具有代表性的
19:13
It's very well documented that people get things wrong.
369
1153000
3000
有很多人们理解错误的记录
19:16
They make errors of logic in reasoning with uncertainty.
370
1156000
3000
他们在不确定性方面犯逻辑错误
19:20
We can cope with the subtleties of language brilliantly --
371
1160000
2000
我们可以很好地解决语言的细微差别
19:22
and there are interesting evolutionary questions about how we got here.
372
1162000
3000
还有有趣的进化方面的问题 如我们是怎么来到这里的
19:25
We are not good at reasoning with uncertainty.
373
1165000
3000
我们并不擅长在不确定的情况下进行推理
19:28
That's an issue in our everyday lives.
374
1168000
2000
这是我们生活中的一个问题
19:30
As you've heard from many of the talks, statistics underpins an enormous amount
375
1170000
3000
正如你们在很多演讲中听到的 统计学是很多科学研究中
19:33
of research in science -- in social science, in medicine
376
1173000
3000
的基础--社会科学 医学
19:36
and indeed, quite a lot of industry.
377
1176000
2000
确实 很多行业
19:38
All of quality control, which has had a major impact on industrial processing,
378
1178000
4000
所有的质量控制 这些对工业过程的影响极其重要
19:42
is underpinned by statistics.
379
1182000
2000
这些都以统计学为基础
19:44
It's something we're bad at doing.
380
1184000
2000
而这方面我们并不擅长
19:46
At the very least, we should recognize that, and we tend not to.
381
1186000
3000
至少我们应该意识到这一点 而我们往往没有
19:49
To go back to the legal context, at the Sally Clark trial
382
1189000
4000
回到法律方面 在萨里•克拉克的案子中
19:53
all of the lawyers just accepted what the expert said.
383
1193000
4000
所有律师都接受了专家的证词
19:57
So if a pediatrician had come out and said to a jury,
384
1197000
2000
如果一个小儿科医师出来对陪审团作证
19:59
"I know how to build bridges. I've built one down the road.
385
1199000
3000
我知道怎样建造桥梁 我在路那边建了一座
20:02
Please drive your car home over it,"
386
1202000
2000
开车回家的时候请放心过桥
20:04
they would have said, "Well, pediatricians don't know how to build bridges.
387
1204000
2000
他们会说 小儿科医师不懂怎样建造桥梁
20:06
That's what engineers do."
388
1206000
2000
那是工程师的工作
20:08
On the other hand, he came out and effectively said, or implied,
389
1208000
3000
而另一方面 他站出来实际上说 或暗示
20:11
"I know how to reason with uncertainty. I know how to do statistics."
390
1211000
3000
我知道怎样在不确定的情况下推理 我知道怎样做统计
20:14
And everyone said, "Well, that's fine. He's an expert."
391
1214000
3000
然后大家都说 这没问题 他是专家
20:17
So we need to understand where our competence is and isn't.
392
1217000
3000
所以我们应该明白什么是我们的强项 什么不是
20:20
Exactly the same kinds of issues arose in the early days of DNA profiling,
393
1220000
4000
完全相同类型的问题也出现在DNA图谱分析的早期
20:24
when scientists, and lawyers and in some cases judges,
394
1224000
4000
科学家 律师 有些情况下甚至法官
20:28
routinely misrepresented evidence.
395
1228000
3000
都会错误地解释证据
20:32
Usually -- one hopes -- innocently, but misrepresented evidence.
396
1232000
3000
通常--但愿--是无心之失 但终究错误地解释了证据
20:35
Forensic scientists said, "The chance that this guy's innocent is one in three million."
397
1235000
5000
法医学家说 这个人无罪的机率是三百万分之一
20:40
Even if you believe the number, just like the 73 million to one,
398
1240000
2000
即使你相信这个数据 就像七千三百万分之一
20:42
that's not what it meant.
399
1242000
2000
这也并不是它真正的含义
20:44
And there have been celebrated appeal cases
400
1244000
2000
因为这个在英国和其他地方
20:46
in Britain and elsewhere because of that.
401
1246000
2000
有很多上诉案件
20:48
And just to finish in the context of the legal system.
402
1248000
3000
最后再说一点法律体系方面的问题
20:51
It's all very well to say, "Let's do our best to present the evidence."
403
1251000
4000
说“我们尽量给予证据更好的解释”固然很好
20:55
But more and more, in cases of DNA profiling -- this is another one --
404
1255000
3000
但越来越多地 在DNA图谱分析的案件中--这是另一个例子--
20:58
we expect juries, who are ordinary people --
405
1258000
3000
我们希望陪审团 那些普通人--
21:01
and it's documented they're very bad at this --
406
1261000
2000
记录表明他们非常不擅此类--
21:03
we expect juries to be able to cope with the sorts of reasoning that goes on.
407
1263000
4000
我们希望陪审团能够处理好这些推理
21:07
In other spheres of life, if people argued -- well, except possibly for politics --
408
1267000
5000
在生活的其它方面 如果人们在争辩的时候--当然 也许不包括政治
21:12
but in other spheres of life, if people argued illogically,
409
1272000
2000
但是在生活的其他方面 如果人们争辩地并不合逻辑
21:14
we'd say that's not a good thing.
410
1274000
2000
我们认为这不是好现象
21:16
We sort of expect it of politicians and don't hope for much more.
411
1276000
4000
我们多少预料到政客会这样 也不抱更多期望
21:20
In the case of uncertainty, we get it wrong all the time --
412
1280000
3000
而在不确定性方面 我们一直都在犯错--
21:23
and at the very least, we should be aware of that,
413
1283000
2000
至少 我们应该认识到这一点
21:25
and ideally, we might try and do something about it.
414
1285000
2000
并且 希望我们能试着做什么去改变这一点
21:27
Thanks very much.
415
1287000
1000
谢谢大家