Peter Donnelly: How stats fool juries

2007-01-12

TED



Chinese translation by Xiaofei Zhang, reviewed by Zhu Jie
00:25
As other speakers have said, it's a rather daunting experience --
00:27
a particularly daunting experience -- to be speaking in front of this audience.
00:30
But unlike the other speakers, I'm not going to tell you about
00:33
the mysteries of the universe, or the wonders of evolution,
00:35
or the really clever, innovative ways people are attacking
00:39
the major inequalities in our world.
00:41
Or even the challenges of nation-states in the modern global economy.
00:46
My brief, as you've just heard, is to tell you about statistics --
00:50
and, to be more precise, to tell you some exciting things about statistics.
00:53
And that's --
00:54
(Laughter)
00:55
-- that's rather more challenging
00:57
than all the speakers before me and all the ones coming after me.
00:59
(Laughter)
01:01
One of my senior colleagues told me, when I was a youngster in this profession,
01:06
rather proudly, that statisticians were people who liked figures
01:10
but didn't have the personality skills to become accountants.
01:13
(Laughter)
01:15
And there's another in-joke among statisticians, and that's,
01:18
"How do you tell the introverted statistician from the extroverted statistician?"
01:21
To which the answer is,
01:23
"The extroverted statistician's the one who looks at the other person's shoes."
01:28
(Laughter)
01:31
But I want to tell you something useful -- and here it is, so concentrate now.
01:36
This evening, there's a reception in the University's Museum of Natural History.
01:39
And it's a wonderful setting, as I hope you'll find,
01:41
and a great icon to the best of the Victorian tradition.
01:46
It's very unlikely -- in this special setting, and this collection of people --
01:51
but you might just find yourself talking to someone you'd rather wish that you weren't.
01:54
So here's what you do.
01:56
When they say to you, "What do you do?" -- you say, "I'm a statistician."
02:00
(Laughter)
02:01
Well, except they've been pre-warned now, and they'll know you're making it up.
02:05
And then one of two things will happen.
02:07
They'll either discover their long-lost cousin in the other corner of the room
02:09
and run over and talk to them.
02:11
Or they'll suddenly become parched and/or hungry -- and often both --
02:14
and sprint off for a drink and some food.
02:16
And you'll be left in peace to talk to the person you really want to talk to.
02:20
It's one of the challenges in our profession to try and explain what we do.
02:23
We're not top on people's lists for dinner party guests and conversations and so on.
02:28
And it's something I've never really found a good way of doing.
02:30
But my wife -- who was then my girlfriend --
02:33
managed it much better than I've ever been able to.
02:36
Many years ago, when we first started going out, she was working for the BBC in Britain,
02:39
and I was, at that stage, working in America.
02:41
I was coming back to visit her.
02:43
She told this to one of her colleagues, who said, "Well, what does your boyfriend do?"
02:49
Sarah thought quite hard about the things I'd explained --
02:51
and she concentrated, in those days, on listening.
02:55
(Laughter)
02:58
Don't tell her I said that.
03:00
And she was thinking about the work I did developing mathematical models
03:04
for understanding evolution and modern genetics.
03:07
So when her colleague said, "What does he do?"
03:10
She paused and said, "He models things."
03:14
(Laughter)
03:15
Well, her colleague suddenly got much more interested than I had any right to expect
03:19
and went on and said, "What does he model?"
03:22
Well, Sarah thought a little bit more about my work and said, "Genes."
03:25
(Laughter)
03:29
"He models genes."
03:31
That is my first love, and that's what I'll tell you a little bit about.
03:35
What I want to do more generally is to get you thinking about
03:39
the place of uncertainty and randomness and chance in our world,
03:42
and how we react to that, and how well we do or don't think about it.
03:47
So you've had a pretty easy time up till now --
03:49
a few laughs, and all that kind of thing -- in the talks to date.
03:51
You've got to think, and I'm going to ask you some questions.
03:54
So here's the scene for the first question I'm going to ask you.
03:56
Can you imagine tossing a coin successively?
03:59
And for some reason -- which shall remain rather vague --
04:02
we're interested in a particular pattern.
04:04
Here's one -- a head, followed by a tail, followed by a tail.
04:07
So suppose we toss a coin repeatedly.
04:10
Then the pattern, head-tail-tail, that we've suddenly become fixated with happens here.
04:15
And you can count: one, two, three, four, five, six, seven, eight, nine, 10 --
04:19
it happens after the 10th toss.
04:21
So you might think there are more interesting things to do, but humor me for the moment.
04:24
Imagine this half of the audience each get out coins, and they toss them
04:28
until they first see the pattern head-tail-tail.
04:31
The first time they do it, maybe it happens after the 10th toss, as here.
04:33
The second time, maybe it's after the fourth toss.
04:35
The next time, after the 15th toss.
04:37
So you do that lots and lots of times, and you average those numbers.
04:40
That's what I want this side to think about.
04:43
The other half of the audience doesn't like head-tail-tail --
04:45
they think, for deep cultural reasons, that's boring --
04:48
and they're much more interested in a different pattern -- head-tail-head.
04:51
So, on this side, you get out your coins, and you toss and toss and toss.
04:54
And you count the number of times until the pattern head-tail-head appears
04:57
and you average them. OK?
05:00
So on this side, you've got a number --
05:02
you've done it lots of times, so you get it accurately --
05:04
which is the average number of tosses until head-tail-tail.
05:07
On this side, you've got a number -- the average number of tosses until head-tail-head.
05:11
So here's a deep mathematical fact --
05:13
if you've got two numbers, one of three things must be true.
05:16
Either they're the same, or this one's bigger than this one,
05:19
or this one's bigger than that one.
05:20
So what's going on here?
05:23
So you've all got to think about this, and you've all got to vote --
05:25
and we're not moving on.
05:26
And I don't want to end up in the two-minute silence
05:28
to give you more time to think about it, until everyone's expressed a view. OK.
05:32
So what you want to do is compare the average number of tosses until we first see
05:36
head-tail-head with the average number of tosses until we first see head-tail-tail.
05:41
Who thinks that A is true --
05:43
that, on average, it'll take longer to see head-tail-head than head-tail-tail?
05:47
Who thinks that B is true -- that on average, they're the same?
05:51
Who thinks that C is true -- that, on average, it'll take less time
05:53
to see head-tail-head than head-tail-tail?
05:57
OK, who hasn't voted yet? Because that's really naughty -- I said you had to.
06:00
(Laughter)
06:02
OK. So most people think B is true.
06:05
And you might be relieved to know even rather distinguished mathematicians think that.
06:08
It's not. A is true here.
06:12
It takes longer, on average.
06:14
In fact, the average number of tosses till head-tail-head is 10
06:16
and the average number of tosses until head-tail-tail is eight.
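The two averages quoted here can be checked by simulation. The following is a minimal sketch, assuming a fair coin; the function names, the trial count and the seed are illustrative choices and are not part of the talk.

```python
import random

def tosses_until(pattern, rng):
    """Toss a fair coin until `pattern` first appears; return the toss count."""
    recent = ""
    count = 0
    while not recent.endswith(pattern):
        # Keep only the last len(pattern) outcomes; that is all we need to compare.
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
    return count

def average_wait(pattern, trials=20_000, seed=0):
    """Average the waiting time over many independent repetitions."""
    rng = random.Random(seed)
    return sum(tosses_until(pattern, rng) for _ in range(trials)) / trials

print(average_wait("HTH"))  # should land close to 10
print(average_wait("HTT"))  # should land close to 8
```

With enough repetitions the two averages settle near 10 and 8 respectively, so head-tail-head does take longer on average.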
06:21
How could that be?
06:24
Anything different about the two patterns?
06:30
There is. Head-tail-head overlaps itself.
06:35
If you went head-tail-head-tail-head, you can cunningly get two occurrences
06:39
of the pattern in only five tosses.
06:42
You can't do that with head-tail-tail.
06:44
That turns out to be important.
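The role of self-overlap can be made exact. A standard result for a fair coin, not stated in the talk, is that the expected waiting time for a pattern is the sum of 2^k over every length k at which the pattern's first k symbols match its last k symbols. The sketch below applies that rule; the function name is an illustrative choice.

```python
def expected_wait(pattern):
    """Expected number of fair-coin tosses until `pattern` first appears.

    Adds 2**k for each k where the first k symbols equal the last k symbols.
    The full length always qualifies; shorter k only when the pattern
    overlaps itself.
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])

print(expected_wait("HTH"))  # 10: overlaps itself at k=1 (H) and k=3
print(expected_wait("HTT"))  # 8: only the full length k=3 matches
```

The extra 2 for head-tail-head comes entirely from the pattern ending with the same symbol it starts with, which is the overlap described above.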
06:46
There are two ways of thinking about this.
06:48
I'll give you one of them.
06:50
So imagine -- let's suppose we're doing it.
06:52
On this side -- remember, you're excited about head-tail-tail;
06:54
you're excited about head-tail-head.
06:56
We start tossing a coin, and we get a head --
06:59
and you start sitting on the edge of your seat
07:00
because something great and wonderful, or awesome, might be about to happen.
07:05
The next toss is a tail -- you get really excited.
07:07
The champagne's on ice just next to you; you've got the glasses chilled to celebrate.
07:11
You're waiting with bated breath for the final toss.
07:13
And if it comes down a head, that's great.
07:15
You're done, and you celebrate.
07:17
If it's a tail -- well, rather disappointedly, you put the glasses away
07:19
and put the champagne back.
07:21
And you keep tossing, to wait for the next head, to get excited.
07:25
On this side, there's a different experience.
07:27
It's the same for the first two parts of the sequence.
07:30
You're a little bit excited with the first head --
07:32
you get rather more excited with the next tail.
07:34
Then you toss the coin.
07:36
If it's a tail, you crack open the champagne.
07:39
If it's a head you're disappointed,
07:41
but you're still a third of the way to your pattern again.
07:44
And that's an informal way of presenting it -- that's why there's a difference.
07:48
Another way of thinking about it --
07:50
if we tossed a coin eight million times,
07:52
then we'd expect a million head-tail-heads
07:54
and a million head-tail-tails -- but the head-tail-heads could occur in clumps.
08:01
So if you want to put a million things down amongst eight million positions
08:03
and you can have some of them overlapping, the clumps will be further apart.
08:08
It's another way of getting the intuition.
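The counting argument can be tried on a smaller scale. This is a rough sketch, assuming a fair coin and 200,000 tosses instead of eight million to keep it quick; the helper name and the seed are illustrative. Both patterns should turn up about once per eight positions, but only head-tail-head can share tosses with itself.

```python
import random

def count_overlapping(seq, pattern):
    """Count occurrences of `pattern` in `seq`, allowing overlaps."""
    last_start = len(seq) - len(pattern)
    return sum(seq.startswith(pattern, i) for i in range(last_start + 1))

rng = random.Random(1)
n = 200_000
seq = "".join(rng.choice("HT") for _ in range(n))

hth = count_overlapping(seq, "HTH")
htt = count_overlapping(seq, "HTT")
# HTHTH is two head-tail-heads sharing a toss: a "clump".
clumped = count_overlapping(seq, "HTHTH")

print(hth, htt, n // 8)    # both counts should be near n/8
print(clumped, n // 32)    # clumps should be near n/32
```

Because some head-tail-heads arrive bunched together, the gaps between bunches have to be longer to keep the overall count at one in eight, which is why the first one takes longer to show up.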
08:10
What's the point I want to make?
08:12
It's a very, very simple example, an easily stated question in probability,
08:16
which every -- you're in good company -- everybody gets wrong.
08:19
This is my little diversion into my real passion, which is genetics.
08:23
There's a connection between head-tail-heads and head-tail-tails in genetics,
08:26
and it's the following.
08:29
When you toss a coin, you get a sequence of heads and tails.
08:32
When you look at DNA, there's a sequence of not two things -- heads and tails --
08:35
but four letters -- As, Gs, Cs and Ts.
08:38
And there are little chemical scissors, called restriction enzymes
08:41
which cut DNA whenever they see particular patterns.
08:43
And they're an enormously useful tool in modern molecular biology.
08:48
And instead of asking the question, "How long until I see a head-tail-head?" --
08:51
you can ask, "How big will the chunks be when I use a restriction enzyme
08:54
which cuts whenever it sees G-A-A-G, for example?
08:58
How long will those chunks be?"
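The chunk-length question can be explored with a toy model. The sketch below makes a strong simplifying assumption that real genomes do not satisfy: each letter is drawn independently and uniformly from A, C, G and T. It measures the spacing between consecutive G-A-A-G sites rather than modelling where within the site an enzyme cuts. The sequence length and seed are illustrative.

```python
import random
import re

rng = random.Random(2)
n = 300_000
dna = "".join(rng.choice("ACGT") for _ in range(n))

# A lookahead finds every G-A-A-G site, including overlapping ones (GAAGAAG).
sites = [m.start() for m in re.finditer(r"(?=GAAG)", dna)]
gaps = [b - a for a, b in zip(sites, sites[1:])]
mean_gap = sum(gaps) / len(gaps)

print(len(sites), round(mean_gap, 1))  # mean spacing should be near 4**4 = 256
```

Under this toy model a four-letter site appears about once every 256 positions, so the chunks average a few hundred letters, with wide variation from chunk to chunk.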
09:00
That's a rather trivial connection between probability and genetics.
09:05
There's a much deeper connection, which I don't have time to go into
09:08
and that is that modern genetics is a really exciting area of science.
09:11
And we'll hear some talks later in the conference specifically about that.
09:15
But it turns out that unlocking the secrets in the information generated by modern
09:19
experimental technologies, a key part of that has to do with fairly sophisticated --
09:24
you'll be relieved to know that I do something useful in my day job,
09:27
rather more sophisticated than the head-tail-head story --
09:29
but quite sophisticated computer modelings and mathematical modelings
09:33
and modern statistical techniques.
09:35
And I will give you two little snippets -- two examples --
09:38
of projects we're involved in in my group in Oxford,
09:41
both of which I think are rather exciting.
09:43
You know about the Human Genome Project.
09:45
That was a project which aimed to read one copy of the human genome.
09:51
The natural thing to do after you've done that --
09:53
and that's what this project, the International HapMap Project,
09:55
which is a collaboration between labs in five or six different countries.
10:00
Think of the Human Genome Project as learning what we've got in common,
10:04
and the HapMap Project is trying to understand
10:06
where there are differences between different people.
10:08
Why do we care about that?
10:10
Well, there are lots of reasons.
10:12
The most pressing one is that we want to understand how some differences
10:16
make some people susceptible to one disease -- type-2 diabetes, for example --
10:20
and other differences make people more susceptible to heart disease,
10:25
or stroke, or autism and so on.
10:27
That's one big project.
10:29
There's a second big project,
10:31
recently funded by the Wellcome Trust in this country,
10:33
involving very large studies --
10:35
thousands of individuals, with each of eight different diseases,
10:38
common diseases like type-1 and type-2 diabetes, and coronary heart disease,
10:42
bipolar disease and so on -- to try and understand the genetics.
10:46
To try and understand what it is about genetic differences that causes the diseases.
10:49
Why do we want to do that?
10:51
Because we understand very little about most human diseases.
10:54
We don't know what causes them.
10:56
And if we can get in at the bottom and understand the genetics,
10:58
we'll have a window on the way the disease works,
11:01
and a whole new way about thinking about disease therapies
11:03
and preventative treatment and so on.
11:06
So that's, as I said, the little diversion on my main love.
11:09
Back to some of the more mundane issues of thinking about uncertainty.
11:14
Here's another quiz for you --
11:16
now suppose we've got a test for a disease
11:18
which isn't infallible, but it's pretty good.
11:20
It gets it right 99 percent of the time.
11:23
And I take one of you, or I take someone off the street,
11:26
and I test them for the disease in question.
11:28
Let's suppose there's a test for HIV -- the virus that causes AIDS --
11:32
and the test says the person has the disease.
11:35
What's the chance that they do?
11:38
The test gets it right 99 percent of the time.
11:40
So a natural answer is 99 percent.
11:44
Who likes that answer?
11:46
Come on -- everyone's got to get involved.
11:47
Don't think you don't trust me anymore.
11:49
(Laughter)
11:50
Well, you're right to be a bit skeptical, because that's not the answer.
11:53
That's what you might think.
11:55
It's not the answer, and it's not because it's only part of the story.
11:58
It actually depends on how common or how rare the disease is.
12:01
So let me try and illustrate that.
12:03
Here's a little caricature of a million individuals.
12:07
So let's think about a disease that affects --
12:10
it's pretty rare, it affects one person in 10,000.
12:12
Amongst these million individuals, most of them are healthy
12:15
and some of them will have the disease.
12:17
And in fact, if this is the prevalence of the disease,
12:20
about 100 will have the disease and the rest won't.
12:23
So now suppose we test them all.
12:25
What happens?
12:27
Well, amongst the 100 who do have the disease,
12:29
the test will get it right 99 percent of the time, and 99 will test positive.
12:34
Amongst all these other people who don't have the disease,
12:36
the test will get it right 99 percent of the time.
12:39
It'll only get it wrong one percent of the time.
12:41
But there are so many of them that there'll be an enormous number of false positives.
12:45
Put that another way --
12:47
of all of them who test positive -- so here they are, the individuals involved --
12:52
less than one in 100 actually have the disease.
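The head count behind "less than one in 100" can be written out directly. This sketch uses only the figures given in the talk (a million people, one in 10,000 affected, a test that is right 99 percent of the time for both sick and healthy people); the variable names are illustrative.

```python
population = 1_000_000
prevalence_per_10k = 1   # one person in 10,000 has the disease
accuracy_pct = 99        # the test is right 99 percent of the time

sick = population * prevalence_per_10k // 10_000           # 100
healthy = population - sick                                 # 999,900
true_positives = sick * accuracy_pct // 100                 # 99
false_positives = healthy * (100 - accuracy_pct) // 100     # 9,999

share_sick = true_positives / (true_positives + false_positives)
print(true_positives, false_positives, round(share_sick, 4))  # 99 9999 0.0098
```

Of the 10,098 people who test positive, only 99 are actually ill, a little under one percent.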
12:57
So even though we think the test is accurate, the important part of the story is
13:01
there's another bit of information we need.
13:04
Here's the key intuition.
13:07
What we have to do, once we know the test is positive,
13:10
is to weigh up the plausibility, or the likelihood, of two competing explanations.
13:16
Each of those explanations has a likely bit and an unlikely bit.
13:19
One explanation is that the person doesn't have the disease --
13:22
that's overwhelmingly likely, if you pick someone at random --
13:25
but the test gets it wrong, which is unlikely.
13:29
The other explanation is that the person does have the disease -- that's unlikely --
13:32
but the test gets it right, which is likely.
13:35
And the number we end up with --
13:37
that number which is a little bit less than one in 100 --
13:40
is to do with how likely one of those explanations is relative to the other.
13:46
Each of them taken together is unlikely.
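Weighing one explanation against the other is the odds form of Bayes' rule: posterior odds equal prior odds times the likelihood ratio. The sketch below applies it to the talk's numbers using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Prior odds of sick : healthy when one person in 10,000 has the disease.
prior_odds = Fraction(1, 9_999)

# How much more often a positive result occurs for a sick person (99%)
# than for a healthy person (1%).
likelihood_ratio = Fraction(99, 100) / Fraction(1, 100)

posterior_odds = prior_odds * likelihood_ratio
posterior_prob = posterior_odds / (1 + posterior_odds)

print(posterior_odds, posterior_prob)  # 1/101 1/102
```

A positive result makes the disease 99 times more plausible than it was, but it started at odds of 1 to 9,999, so it ends at 1 to 101, or a probability of 1 in 102.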
13:49
Here's a more topical example of exactly the same thing.
13:52
Those of you in Britain will know about what's become rather a celebrated case
13:56
of a woman called Sally Clark, who had two babies who died suddenly.
14:01
And initially, it was thought that they died of what's known informally as "cot death,"
14:05
and more formally as "Sudden Infant Death Syndrome."
14:08
For various reasons, she was later charged with murder.
14:10
And at the trial, her trial, a very distinguished pediatrician gave evidence
14:14
that the chance of two cot deaths, innocent deaths, in a family like hers --
14:19
which was professional and non-smoking -- was one in 73 million.
14:26
To cut a long story short, she was convicted at the time.
14:29
Later, and fairly recently, acquitted on appeal -- in fact, on the second appeal.
14:34
And just to set it in context, you can imagine how awful it is for someone
14:38
to have lost one child, and then two, if they're innocent,
14:41
to be convicted of murdering them.
14:43
To be put through the stress of the trial, convicted of murdering them --
14:45
and to spend time in a women's prison, where all the other prisoners
14:48
think you killed your children -- is a really awful thing to happen to someone.
14:53
And it happened in large part here because the expert got the statistics
14:58
horribly wrong, in two different ways.
15:01
So where did he get the one in 73 million number?
15:05
He looked at some research, which said the chance of one cot death in a family
15:08
like Sally Clark's is about one in 8,500.
15:13
So he said, "I'll assume that if you have one cot death in a family,
15:17
the chance of a second child dying from cot death aren't changed."
15:21
So that's what statisticians would call an assumption of independence.
15:24
It's like saying, "If you toss a coin and get a head the first time,
15:26
that won't affect the chance of getting a head the second time."
15:29
So if you toss a coin twice, the chance of getting a head twice are a half --
15:34
that's the chance the first time -- times a half -- the chance a second time.
15:37
So he said, "Here,
15:39
I'll assume that these events are independent.
15:43
When you multiply 8,500 together twice,
15:45
you get about 73 million."
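The multiplication the expert relied on is short enough to spell out. This sketch uses the rounded figure of one in 8,500 as quoted in the talk, which gives roughly 1 in 72 million; the "about 73 million" presumably reflects a less rounded starting figure. The variable names are illustrative.

```python
from fractions import Fraction

# Two coin tosses: independence means the chances simply multiply.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)
print(p_two_heads)              # 1/4

# The same rule applied to cot deaths, valid ONLY if the two events
# really are independent, which is exactly the unstated assumption.
p_one = Fraction(1, 8_500)      # figure quoted in the talk
p_two_if_independent = p_one * p_one
print(p_two_if_independent)     # 1/72250000
```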
15:47
And none of this was stated to the court as an assumption
15:49
or presented to the jury that way.
15:52
Unfortunately here -- and, really, regrettably --
15:55
first of all, in a situation like this you'd have to verify it empirically.
15:59
And secondly, it's palpably false.
16:02
There are lots and lots of things that we don't know about sudden infant deaths.
16:07
It might well be that there are environmental factors that we're not aware of,
16:10
and it's pretty likely to be the case that there are
16:12
genetic factors we're not aware of.
16:14
So if a family suffers from one cot death, you'd put them in a high-risk group.
16:17
They've probably got these environmental risk factors
16:19
and/or genetic risk factors we don't know about.
16:22
And to argue, then, that the chance of a second death is as if you didn't know
16:25
that information is really silly.
16:28
It's worse than silly -- it's really bad science.
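To see how much the independence assumption matters, suppose a first cot death raises the risk of a second by some factor. The talk gives no such factor, so the multipliers below are purely hypothetical placeholders, as is the function name; only the one in 8,500 comes from the talk.

```python
from fractions import Fraction

def chance_of_two(p_first, risk_multiplier):
    """P(two cot deaths) when a first death multiplies the risk of a second.

    risk_multiplier = 1 reproduces the independence assumption.
    """
    p_second_given_first = min(Fraction(1), p_first * risk_multiplier)
    return p_first * p_second_given_first

p_first = Fraction(1, 8_500)
for k in (1, 5, 10):            # hypothetical multipliers, not measured values
    print(k, chance_of_two(p_first, k))
```

Any shared environmental or genetic risk pushes the multiplier above 1, and the headline figure shrinks in direct proportion, so a number computed with a multiplier of 1 overstates how rare two deaths in one family would be.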
16:32
Nonetheless, that's how it was presented, and at trial nobody even argued it.
16:37
That's the first problem.
16:39
The second problem is, what does the number of one in 73 million mean?
16:43
So after Sally Clark was convicted --
16:45
you can imagine, it made rather a splash in the press --
16:49
one of the journalists from one of Britain's more reputable newspapers wrote that
16:56
what the expert had said was,
16:58
"The chance that she was innocent was one in 73 million."
17:03
Now, that's a logical error.
17:05
It's exactly the same logical error as the logical error of thinking that
17:08
after the disease test, which is 99 percent accurate,
17:10
the chance of having the disease is 99 percent.
17:14
In the disease example, we had to bear in mind two things,
17:18
one of which was the possibility that the test got it right or not.
17:22
And the other one was the chance, a priori, that the person had the disease or not.
17:26
It's exactly the same in this context.
17:29
There are two things involved -- two parts to the explanation.
17:33
We want to know how likely, or relatively how likely, two different explanations are.
17:37
One of them is that Sally Clark was innocent --
17:40
which is, a priori, overwhelmingly likely --
17:42
most mothers don't kill their children.
17:45
And the second part of the explanation
341
1065000
2000
这个解释的第二部分
17:47
is that she suffered an incredibly unlikely event.
342
1067000
3000
就是她遭遇了一个可能性极小的事件
17:50
Not as unlikely as one in 73 million, but nonetheless rather unlikely.
343
1070000
4000
不像七千三百万分之一那样小 但仍然相当不可能
17:54
The other explanation is that she was guilty.
344
1074000
2000
另一个解释就是她有罪
17:56
Now, we probably think a priori that's unlikely.
345
1076000
2000
从先验上看 我们可能认为这不大可能
17:58
And we certainly should think in the context of a criminal trial
346
1078000
3000
然后我们当然应该认为在刑事审判的情形下
18:01
that that's unlikely, because of the presumption of innocence.
347
1081000
3000
这是不大可能的 因为我们以无罪为前提
18:04
And then if she were trying to kill the children, she succeeded.
348
1084000
4000
如果她那时试着杀害孩子 那么她成功了
18:08
So the chance that she's innocent isn't one in 73 million.
349
1088000
4000
所以她无罪的机率并不是七千三百万分之一
18:12
We don't know what it is.
350
1092000
2000
我们不知道这个机率是多少
18:14
It has to do with weighing up the strength of the other evidence against her
351
1094000
4000
这同衡量其它对她不利的证据
18:18
and the statistical evidence.
352
1098000
2000
和统计证据有关
18:20
We know the children died.
353
1100000
2000
我们知道 孩子死了
18:22
What matters is how likely or unlikely, relative to each other,
354
1102000
4000
重要的是这两种解释
18:26
the two explanations are.
355
1106000
2000
相对发生的机率
18:28
And they're both implausible.
356
1108000
2000
它们都不太可能
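The comparison described here, weighing two rival explanations of the same evidence against each other, can be sketched in a few lines of Python. Every number below is a placeholder chosen only to show the arithmetic, not an estimate for any real case, and the function name is hypothetical.

```python
def share_for_explanation_a(prior_a, likelihood_a, prior_b, likelihood_b):
    """Probability of explanation A when A and B are the only candidates.

    prior_*:      how plausible each explanation is before seeing the evidence
    likelihood_*: how probable the observed evidence is under each explanation
    """
    weight_a = prior_a * likelihood_a
    weight_b = prior_b * likelihood_b
    return weight_a / (weight_a + weight_b)


# Placeholder inputs: A is very plausible beforehand but makes the evidence rare;
# B is very implausible beforehand but makes the evidence certain.
share = share_for_explanation_a(
    prior_a=0.9999, likelihood_a=0.0001,
    prior_b=0.0001, likelihood_b=1.0,
)
print(f"Probability of explanation A: {share:.3f}")
```

With these placeholders the evidence has only a 1 in 10,000 chance under explanation A, yet A still comes out at roughly one half, because explanation B is just as implausible to begin with. The small likelihood on its own is not the probability that A is true.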
18:31
There's a situation where errors in statistics had really profound
357
1111000
4000
在这个例子中 统计上的错误
18:35
and really unfortunate consequences.
358
1115000
3000
产生了很重大而且不幸的结果
18:38
In fact, there are two other women who were convicted on the basis of the
359
1118000
2000
事实上 还有其他两个女人因这个小儿科医师的作证
18:40
evidence of this pediatrician, who have subsequently been released on appeal.
360
1120000
4000
而被定罪 而她们在上诉中都被无罪释放了
18:44
Many cases were reviewed.
361
1124000
2000
很多案子都因此而重审
18:46
And it's particularly topical because he's currently facing a disrepute charge
362
1126000
4000
这引起了很高的关注 因为他正面临着
18:50
at Britain's General Medical Council.
363
1130000
3000
英国医学总会的失职指控
18:53
So just to conclude -- what are the take-home messages from this?
364
1133000
4000
总结一下 我们应该得到什么警示呢
18:57
Well, we know that randomness and uncertainty and chance
365
1137000
4000
我们知道 随机性、不确定性和概率
19:01
are very much a part of our everyday life.
366
1141000
3000
在生活中影响重大
19:04
It's also true -- and, although, you, as a collective, are very special in many ways,
367
1144000
5000
并且大家作为一个集体 在很多方面都很特别
19:09
you're completely typical in not getting the examples I gave right.
368
1149000
4000
大家没有回答正确我给出的例子 是完全正常并具有代表性的
19:13
It's very well documented that people get things wrong.
369
1153000
3000
大量记录表明人们会出错
19:16
They make errors of logic in reasoning with uncertainty.
370
1156000
3000
他们在对不确定性进行推理时犯逻辑错误
19:20
We can cope with the subtleties of language brilliantly --
371
1160000
2000
我们可以很好地解决语言的细微差别
19:22
and there are interesting evolutionary questions about how we got here.
372
1162000
3000
还有有趣的进化方面的问题 比如我们是如何发展到这一步的
19:25
We are not good at reasoning with uncertainty.
373
1165000
3000
我们并不擅长对不确定性进行推理
19:28
That's an issue in our everyday lives.
374
1168000
2000
这是我们生活中的一个问题
19:30
As you've heard from many of the talks, statistics underpins an enormous amount
375
1170000
3000
正如你们在很多演讲中听到的 统计学是大量科学研究
19:33
of research in science -- in social science, in medicine
376
1173000
3000
的基础--社会科学 医学
19:36
and indeed, quite a lot of industry.
377
1176000
2000
确实 很多行业
19:38
All of quality control, which has had a major impact on industrial processing,
378
1178000
4000
所有的质量控制 这些对工业过程的影响极其重要
19:42
is underpinned by statistics.
379
1182000
2000
这些都以统计学为基础
19:44
It's something we're bad at doing.
380
1184000
2000
而这方面我们并不擅长
19:46
At the very least, we should recognize that, and we tend not to.
381
1186000
3000
至少我们应该意识到这一点 而我们往往意识不到
19:49
To go back to the legal context, at the Sally Clark trial
382
1189000
4000
回到法律方面 在萨里•克拉克的案子中
19:53
all of the lawyers just accepted what the expert said.
383
1193000
4000
所有律师都接受了专家的证词
19:57
So if a pediatrician had come out and said to a jury,
384
1197000
2000
如果一个小儿科医师出来对陪审团作证
19:59
"I know how to build bridges. I've built one down the road.
385
1199000
3000
我知道怎样建造桥梁 我在路那边建了一座
20:02
Please drive your car home over it,"
386
1202000
2000
开车回家的时候请放心过桥
20:04
they would have said, "Well, pediatricians don't know how to build bridges.
387
1204000
2000
他们会说 小儿科医师不懂怎样建造桥梁
20:06
That's what engineers do."
388
1206000
2000
那是工程师的工作
20:08
On the other hand, he came out and effectively said, or implied,
389
1208000
3000
而另一方面 他站出来实际上说了 或者暗示
20:11
"I know how to reason with uncertainty. I know how to do statistics."
390
1211000
3000
我知道怎样对不确定性进行推理 我知道怎样做统计
20:14
And everyone said, "Well, that's fine. He's an expert."
391
1214000
3000
然后大家都说 这没问题 他是专家
20:17
So we need to understand where our competence is and isn't.
392
1217000
3000
所以我们应该明白我们的能力范围在哪里 不在哪里
20:20
Exactly the same kinds of issues arose in the early days of DNA profiling,
393
1220000
4000
完全相同类型的问题在DNA鉴定的早期就出现过
20:24
when scientists, and lawyers and in some cases judges,
394
1224000
4000
科学家 律师 有些情况下甚至法官
20:28
routinely misrepresented evidence.
395
1228000
3000
经常错误地表述证据
20:32
Usually -- one hopes -- innocently, but misrepresented evidence.
396
1232000
3000
通常--但愿如此--是无心之失 但终究是错误地表述了证据
20:35
Forensic scientists said, "The chance that this guy's innocent is one in three million."
397
1235000
5000
法医学家说 这个人无罪的机率是三百万分之一
20:40
Even if you believe the number, just like the 73 million to one,
398
1240000
2000
即使你相信这个数据 就像七千三百万分之一
20:42
that's not what it meant.
399
1242000
2000
这也并不是它真正的含义
20:44
And there have been celebrated appeal cases
400
1244000
2000
因为这个在英国和其他地方
20:46
in Britain and elsewhere because of that.
401
1246000
2000
有很多上诉案件
20:48
And just to finish in the context of the legal system.
402
1248000
3000
最后 再就法律体系说一点
20:51
It's all very well to say, "Let's do our best to present the evidence."
403
1251000
4000
说“我们尽量给予证据更好的解释”固然很好
20:55
But more and more, in cases of DNA profiling -- this is another one --
404
1255000
3000
但越来越多地 在DNA鉴定的案件中--这是另一个例子--
20:58
we expect juries, who are ordinary people --
405
1258000
3000
我们希望陪审团 那些普通人--
21:01
and it's documented they're very bad at this --
406
1261000
2000
记录表明他们非常不擅此类--
21:03
we expect juries to be able to cope with the sorts of reasoning that goes on.
407
1263000
4000
我们希望陪审团能够处理好这些推理
21:07
In other spheres of life, if people argued -- well, except possibly for politics --
408
1267000
5000
在生活的其它方面 如果人们在争辩的时候--当然 也许不包括政治
21:12
but in other spheres of life, if people argued illogically,
409
1272000
2000
但是在生活的其他方面 如果人们争辩得不合逻辑
21:14
we'd say that's not a good thing.
410
1274000
2000
我们认为这不是好现象
21:16
We sort of expect it of politicians and don't hope for much more.
411
1276000
4000
对于政客 我们多少料到会这样 也不抱更多期望
21:20
In the case of uncertainty, we get it wrong all the time --
412
1280000
3000
而在不确定性方面 我们一直在犯错--
21:23
and at the very least, we should be aware of that,
413
1283000
2000
至少 我们应该认识到这一点
21:25
and ideally, we might try and do something about it.
414
1285000
2000
并且 希望我们能试着做什么去改变这一点
21:27
Thanks very much.
415
1287000
1000
谢谢大家