How we're building the world's largest family tree | Yaniv Erlich

41,746 views ・ 2019-10-18

TED



Translator: psjmz mz, Reviewer: Yanyan Hong
00:12
People use the internet for various reasons.
00:17
It turns out that one of the most popular categories of website
00:21
is something that people typically consume in private.
00:25
It involves curiosity,
00:28
non-insignificant levels of self-indulgence
00:31
and is centered around recording the reproductive activities
00:35
of other people.
00:36
(Laughter)
00:37
Of course, I'm talking about genealogy --
00:39
(Laughter)
00:41
the study of family history.
00:43
When it comes to detailing family history,
00:45
in every family, we have this person that is obsessed with genealogy.
00:49
Let's call him Uncle Bernie.
00:51
Uncle Bernie is exactly the last person you want to sit next to
00:54
at Thanksgiving dinner,
00:56
because he will bore you to death with peculiar details
00:59
about some ancient relatives.
01:02
But as you know,
01:03
there is a scientific side for everything,
01:06
and we found that Uncle Bernie's stories
01:09
have immense potential for biomedical research.
01:13
We let Uncle Bernie and his fellow genealogists
01:16
document their family trees through a genealogy website called geni.com.
01:21
When users upload their trees to the website,
01:23
it scans their relatives,
01:25
and if it finds matches to existing trees,
01:27
it merges the existing and the new tree together.
01:31
The result is that large family trees are created,
01:34
beyond the individual level of each genealogist.
01:38
Now, by repeating this process with millions of people
01:42
all over the world,
01:44
we can crowdsource the construction of a family tree of all humankind.
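The merge step described above can be sketched as a union-find (disjoint-set) computation. This is a minimal illustration under stated assumptions, not Geni's actual matching logic: each person is identified by a hypothetical `(name, birth_year)` key, two uploads that contain the same key are treated as describing the same person, and `UnionFind` and `merge_trees` are names invented for this example.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_trees(uploads):
    """Merge uploaded trees and return the resulting family trees as sets of people.

    Each upload is a list of (parent, child) edges. A person is a hypothetical
    (name, birth_year) key, so the same key in two uploads links the two trees.
    """
    uf = UnionFind()
    for edges in uploads:
        for parent, child in edges:
            uf.union(parent, child)
    trees = defaultdict(set)
    for person in list(uf.parent):
        trees[uf.find(person)].add(person)
    return list(trees.values())


uploads = [
    [(("Ann", 1900), ("Bob", 1925))],  # genealogist 1
    [(("Bob", 1925), ("Cy", 1950))],   # genealogist 2 also records Bob
    [(("Dee", 1910), ("Eve", 1940))],  # genealogist 3, unrelated family
]
print(len(merge_trees(uploads)))  # 2
```

Because the second upload shares Bob with the first, those two trees collapse into one while the unrelated upload stays separate. Real matching would have to tolerate spelling variants, missing dates, and different people who share a name and birth year.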
01:51
Using this website,
01:52
we were able to connect 125 million people
01:57
into a single family tree.
02:00
I cannot draw the tree on the screens over here
02:03
because they have fewer pixels
02:05
than the number of people in this tree.
02:08
But here is an example of a subset of 6,000 individuals.
02:14
Each green node is a person.
02:17
The red nodes represent marriages,
02:19
and the connections represent parenthood.
02:22
In the middle of this tree, you see the ancestors.
02:24
And as we go to the periphery, you see the descendants.
02:27
This tree has seven generations, approximately.
02:31
Now, this is what happens when we increase the number of individuals
02:34
to 70,000 people --
02:36
still a tiny subset of all the data that we have.
02:41
Despite that, you can already see the formation of gigantic family trees
02:46
with many very distant relatives.
02:49
Thanks to the hard work of our genealogists,
02:52
we can go back in time hundreds of years.
02:56
For example, here is Alexander Hamilton,
02:59
who was born in 1755.
03:02
Alexander was the first US Secretary of the Treasury,
03:06
but mostly known today due to a popular Broadway musical.
03:11
We found that Alexander has deeper connections in the showbiz industry.
03:16
In fact, he's a blood relative of ...
03:18
Kevin Bacon!
03:20
(Laughter)
03:22
Both of them are descendants of a lady from Scotland
03:24
who lived in the 13th century.
03:27
So you can say that Alexander Hamilton
03:30
is 35 degrees of Kevin Bacon genealogy.
03:33
(Laughter)
03:34
And our tree has millions of stories like that.
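The talk does not say how the "35 degrees" were counted. One plausible reading, assumed here, is the number of parent-child links on the shortest route from one person up to a shared ancestor and back down to the other. The sketch below computes that with a breadth-first search over ancestors; `ancestor_depths`, `blood_distance`, and the toy people are hypothetical.

```python
from collections import defaultdict, deque


def ancestor_depths(parents, person):
    """Map the person and each of their ancestors to the fewest generations up."""
    depths = {person: 0}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def blood_distance(edges, a, b):
    """Fewest parent-child links joining a and b through a common ancestor.

    edges: iterable of (parent, child) pairs. Returns None if no shared ancestor.
    """
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    up_a = ancestor_depths(parents, a)
    up_b = ancestor_depths(parents, b)
    common = up_a.keys() & up_b.keys()
    if not common:
        return None
    return min(up_a[x] + up_b[x] for x in common)


# Toy pedigree: one shared ancestor with two lines of descent.
edges = [
    ("Lady", "A1"), ("A1", "A2"), ("A2", "Hamilton"),
    ("Lady", "B1"), ("B1", "Bacon"),
]
print(blood_distance(edges, "Hamilton", "Bacon"))  # 5
```

Only upward steps to a common ancestor are counted, so in-laws connected through a shared child are not treated as blood relatives.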
03:40
We invested significant efforts to validate the quality of our data.
03:45
Using DNA, we found that 0.3 percent of the mother-child connections in our data
03:50
are wrong,
03:51
which could match the adoption rate in the US pre-Second World War.
03:56
For the father's side,
03:58
the news is not as good:
04:02
1.9 percent of the father-child connections in our data are wrong.
04:07
And I see some people smirk over here.
04:10
It is what you think --
04:11
there are many milkmen out there.
04:13
(Laughter)
04:14
However, this 1.9 percent error rate in patrilineal connections
04:18
is not unique to our data.
04:20
Previous studies found a similar error rate
04:23
using clinical-grade pedigrees.
04:26
So the quality of our data is good,
04:28
and that should not be a surprise.
04:30
Our genealogists have a profound, vested interest
04:34
in correctly documenting their family history.
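The DNA check on mother-child and father-child links can be illustrated with a short sketch. It relies on one well-established fact, that a biological parent and child share about half of their autosomal DNA, plus several assumptions: each documented link comes with a measured shared-DNA fraction, a link is flagged as wrong when that fraction falls below a hypothetical cutoff of 0.4, and the function `link_error_rates` and the sample values are invented for illustration.

```python
from collections import Counter


def link_error_rates(links, min_shared=0.4):
    """Fraction of documented links contradicted by DNA, per relation type.

    links: iterable of (relation, shared_fraction) pairs, e.g. ("father", 0.5).
    A biological parent and child share roughly 0.5 of their autosomal DNA, so a
    link whose measured sharing falls below min_shared (assumed cutoff) is flagged.
    """
    totals, wrong = Counter(), Counter()
    for relation, shared in links:
        totals[relation] += 1
        if shared < min_shared:
            wrong[relation] += 1
    return {relation: wrong[relation] / totals[relation] for relation in totals}


# Made-up measurements, only to exercise the function.
links = [("mother", 0.50), ("mother", 0.49), ("father", 0.51), ("father", 0.02)]
print(link_error_rates(links))  # {'mother': 0.0, 'father': 0.5}
```

A flagged link only says the documented parent is not the biological one; as noted above, adoption can account for part of the mother-side rate.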
04:40
We can leverage this data to learn quantitative information about humanity,
04:45
for example, questions about demography.
04:47
Here is a look at all our profiles on the map of the world.
04:52
Each pixel is a person that lived at some point.
04:56
And since we have so much data,
04:58
you can see the contours of many countries,
05:01
especially in the Western world.
05:03
In this clip, we stratified the map that I've showed you
05:06
based on the year of births of individuals from 1400 to 1900,
05:12
and we compared it to known migration events.
05:15
The clip is going to show you that the deepest lineages in our data
05:18
go all the way back to the UK,
05:20
where they had better record keeping,
05:22
and then they spread along the routes of Western colonialism.
05:25
Let's watch this.
05:27
(Music)
05:28
[Year of birth: ]
05:31
[1492 - Columbus sails the ocean blue]
05:35
[1620 - Mayflower lands in Massachusetts]
05:38
[1652 - Dutch settle in South Africa]
05:44
[1788 - Great Britain's penal transportation to Australia starts]
05:47
[1836 - First migrants use Oregon Trail]
05:50
[all activity]
05:55
I love this movie.
05:57
Now, since these migration events are giving the context of families,
06:02
we can ask questions such as:
06:04
What is the typical distance between the birth locations
06:08
of husbands and wives?
06:11
This distance plays a pivotal role in demography,
06:14
because the patterns in which people migrate to form families
06:18
determine how genes spread in geographical areas.
06:22
We analyzed this distance using our data,
06:25
and we found that in the old days,
06:27
people had it easy.
06:28
They just married someone in the village nearby.
06:31
But the Industrial Revolution really complicated our love life.
06:35
And today, with affordable flights and online social media,
06:40
people typically migrate more than 100 kilometers from their place of birth
06:45
to find their soul mate.
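Measuring the distance between spouses' birth locations needs geocoded birthplaces and a great-circle distance. The sketch below assumes each birthplace has already been converted to latitude and longitude, uses the haversine formula with a mean Earth radius of 6,371 km, and takes the median as the "typical" distance; the talk does not say which average was used, and the function names are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def typical_spouse_distance(couples):
    """Median birthplace distance; couples is a list of ((lat, lon), (lat, lon))."""
    return median(haversine_km(*husband, *wife) for husband, wife in couples)


# One degree of longitude along the equator is about 111 km.
print(round(haversine_km(0, 0, 0, 1), 1))  # 111.2
```

Grouping couples by birth year before taking the median would show how this distance changes over time.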
06:48
So now you might ask:
06:49
OK, but who does the hard work of migrating from place to place
06:54
to form families?
06:55
Are these the males or the females?
06:59
We used our data to address this question,
07:01
and at least in the last 300 years,
07:04
we found that the ladies do the hard work
07:08
of migrating from place to place to form families.
07:11
Now, these results are statistically significant,
07:14
so you can take it as scientific fact that males are lazy.
07:18
(Laughter)
07:21
We can move from questions about demography
07:23
and ask questions about human health.
07:26
For example, we can ask
07:28
to what extent genetic variations account for differences in life span
07:33
between individuals.
07:34
Previous studies analyzed the correlation of longevity between twins
07:39
to address this question.
07:41
They estimated that the genetic variations account for
07:44
about a quarter of the differences in life span between individuals.
07:48
But twins can be correlated due to so many reasons,
07:51
including various environmental effects
07:53
or a shared household.
07:56
Large family trees give us the opportunity to analyze both close relatives,
08:00
such as twins,
08:01
all the way to distant relatives, even fourth cousins.
08:04
This way we can build robust models
08:07
that can tease apart the contribution of genetic variations
08:11
from environmental factors.
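The idea of separating genetic from environmental contributions can be illustrated with a deliberately simplified model, assumed here and not the study's actual method: the lifespan correlation between two relatives is taken to be `h2 * r + c2 * s`, where `r` is their coefficient of relationship (1 for identical twins, 0.5 for siblings, 1/8 for first cousins, 1/512 for fourth cousins), `s` is 1 if they shared a household, and `h2` and `c2` are the genetic and household shares of variance. Fitting both terms together is what lets distant relatives, who rarely share a household, pin down the genetic term. The function name and the numbers below are invented; the inputs were generated from h2 = 0.3 and c2 = 0.1 only to check the arithmetic.

```python
def fit_components(pairs):
    """Least-squares fit of corr = h2 * r + c2 * s (no intercept).

    pairs: iterable of (r, s, corr), where r is the coefficient of relationship,
    s is 1 if the two relatives shared a household (else 0), and corr is the
    observed lifespan correlation for that kind of relative.
    """
    pairs = list(pairs)
    srr = sum(r * r for r, _, _ in pairs)
    srs = sum(r * s for r, s, _ in pairs)
    sss = sum(s * s for _, s, _ in pairs)
    sry = sum(r * y for r, _, y in pairs)
    ssy = sum(s * y for _, s, y in pairs)
    det = srr * sss - srs * srs
    if det == 0:
        raise ValueError("h2 and c2 cannot be separated with these pairs")
    # Solve the 2x2 normal equations with Cramer's rule.
    h2 = (sry * sss - srs * ssy) / det
    c2 = (srr * ssy - srs * sry) / det
    return h2, c2


# Made-up correlations generated from h2 = 0.3, c2 = 0.1, only to check the fit.
pairs = [
    (1.0, 1, 0.40),      # identical twins raised together
    (0.5, 1, 0.25),      # siblings raised together
    (0.5, 0, 0.15),      # siblings raised apart
    (0.125, 0, 0.0375),  # first cousins
]
h2, c2 = fit_components(pairs)
print(round(h2, 3), round(c2, 3))  # 0.3 0.1
```

With only identical twins raised together, `r` and `s` move in lockstep and the two terms cannot be separated, which is the confound described above.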
08:13
We conducted this analysis using our data,
08:16
and we found that genetic variations explain only 15 percent
08:22
of the differences in life span between individuals.
08:26
That is five years, on average.
08:30
So genes matter less to life span than we thought before.
08:35
And I find it great news,
08:38
because it means that our actions can matter more.
08:42
Smoking, for example, determines 10 years of our life expectancy --
08:46
twice as much as what genetics determines.
08:50
We can even have more surprising findings
08:52
as we move from family trees
08:54
and we let our genealogists document and crowdsource DNA information.
08:58
And the results can be amazing.
09:01
It might be hard to imagine, but Uncle Bernie and his friends
09:05
can create DNA forensic capabilities
09:07
that even exceed what the FBI currently has.
09:12
When you place the DNA on a large family tree,
09:15
you effectively create a beacon
09:17
that illuminates the hundreds of distant relatives
09:20
that are all connected to the person that originated the DNA.
09:24
By placing multiple beacons on a large family tree,
09:27
you can now triangulate the DNA of an unknown person,
09:31
the same way that the GPS system uses multiple satellites
09:35
to find a location.
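The beacon idea can be sketched as an intersection of candidate sets, in the spirit of the GPS analogy: each relative whose DNA matches the unknown sample, together with an estimated relationship distance, marks every person in the tree at that distance, and the unknown person must lie where all those rings overlap. Real investigations estimate relationships from the amount of shared DNA, which gives a range rather than an exact number; the exact distances (counted along parent-child links in either direction), the function names, and the toy tree are assumptions made for illustration.

```python
from collections import defaultdict, deque


def distances_from(graph, start):
    """Breadth-first search: number of parent-child links from start to each person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return dist


def triangulate(edges, beacons):
    """Return the people consistent with every beacon.

    edges: (parent, child) pairs of the family tree.
    beacons: (relative, distance) pairs, where distance is the assumed number of
    parent-child links between that DNA-matched relative and the unknown person.
    """
    graph = defaultdict(set)
    for parent, child in edges:
        graph[parent].add(child)
        graph[child].add(parent)
    candidates = None
    for relative, distance in beacons:
        ring = {p for p, d in distances_from(graph, relative).items() if d == distance}
        candidates = ring if candidates is None else candidates & ring
    return candidates or set()


# Toy tree: grandparent G, parents P1 and P2, grandchildren C1..C3, great-grandchild D1.
edges = [("G", "P1"), ("G", "P2"), ("P1", "C1"), ("P1", "C2"),
         ("P2", "C3"), ("C1", "D1")]
print(triangulate(edges, [("C3", 4), ("D1", 3)]))  # {'C2'}
```

One beacon alone (C3 at distance 4) leaves two candidates, C1 and C2; the second beacon rules out C1, which is why adding beacons narrows the search.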
09:37
The prime example of the power of this technique
09:40
is capturing the Golden State Killer,
09:44
one of the most notorious criminals in the history of the US.
09:49
The FBI had been searching for this person for over 40 years.
09:55
They had his DNA,
09:57
but he never showed up in any police database.
10:01
About a year ago, the FBI consulted a genetic genealogist,
10:06
and she suggested that they submit his DNA to a genealogy service
10:10
that can locate distant relatives.
10:13
They did that,
10:14
and they found a third cousin of the Golden State Killer.
10:18
They built a large family tree,
10:20
scanned the different branches of that tree,
10:22
until they found a profile that exactly matched
10:25
what they knew about the Golden State Killer.
10:27
They obtained DNA from this person and found a perfect match
10:31
to the DNA they had in hand.
10:33
They arrested him and brought him to justice
10:35
after all these years.
10:38
Since then, genetic genealogists have started working with
10:41
local US law enforcement agencies
10:44
to use this technique in order to capture criminals.
10:47
And only in the past six months,
10:50
they were able to solve over 20 cold cases with this technique.
10:56
Luckily, we have people like Uncle Bernie and his fellow genealogists.
11:01
These are not amateurs with a self-serving hobby.
11:04
These are citizen scientists with a deep passion to tell us who we are.
11:11
And they know that the past can hold a key to the future.
11:16
Thank you very much.
11:17
(Applause)