How we're building the world's largest family tree | Yaniv Erlich

42,042 views ・ 2019-10-18

TED



Translator: psjmz mz Reviewer: Yanyan Hong
00:12
People use the internet for various reasons.
0
12817
3452
人们因各种原因使用着互联网。
00:17
It turns out that one of the most popular categories of website
1
17765
3804
一种最受欢迎的网站
00:21
is something that people typically consume in private.
2
21593
2872
是人们常常私下浏览的东西。
00:25
It involves curiosity,
3
25639
2510
它涉及到好奇心,
00:28
non-insignificant levels of self-indulgence
4
28173
3796
不小程度的自我放纵,
00:31
and is centered around recording the reproductive activities
5
31993
3260
并以记录他人的生殖活动
00:35
of other people.
6
35277
1309
为中心。
00:36
(Laughter)
7
36610
1032
(笑声)
00:37
Of course, I'm talking about genealogy --
8
37666
2250
当然,我讨论的是家谱学——
00:39
(Laughter)
9
39940
1214
(笑声)
00:41
the study of family history.
10
41178
1702
也就是对家庭历史的研究。
00:43
When it comes to detailing family history,
11
43353
2037
当说到详细的家族历史,
00:45
in every family, we have this person that is obsessed with genealogy.
12
45414
3943
在每个家庭中,我们都有 一个痴迷于家谱的人。
00:49
Let's call him Uncle Bernie.
13
49381
1713
我们姑且叫他伯尼叔叔吧。
00:51
Uncle Bernie is exactly the last person you want to sit next to
14
51118
3782
伯尼叔叔正是你在感恩节晚餐上,
00:54
in Thanksgiving dinner,
15
54924
1599
最不想坐在一起的人,
00:56
because he will bore you to death with peculiar details
16
56547
2814
因为他会用一些远古亲戚的奇特细节
00:59
about some ancient relatives.
17
59385
1966
把你烦死。
01:02
But as you know,
18
62462
1262
但正如你所知,
01:03
there is a scientific side for everything,
19
63748
2872
任何事物都有科学的一面,
01:06
and we found that Uncle Bernie's stories
20
66644
2978
我们发现伯尼叔叔的故事
01:09
have immense potential for biomedical research.
21
69646
3168
具有巨大的生物医学研究潜力。
01:13
We let Uncle Bernie and his fellow genealogists
22
73306
2714
我们让伯尼叔叔和他的家谱同行,
01:16
document their family trees through a genealogy website called geni.com.
23
76044
4668
通过族谱网站 geni.com 记录他们的家谱。
01:21
When users upload their trees to the website,
24
81198
2128
当用户上传他们的家谱树到网站时,
01:23
it scans their relatives,
25
83350
1690
网站会扫描他们的亲戚,
01:25
and if it finds matches to existing trees,
26
85064
2075
如果它发现匹配上现存的家谱树,
01:27
it merges the existing and the new tree together.
27
87163
3610
它会合并现存的和新的家谱树。
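The merge step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that two profiles "match" when their name and birth year are identical; the talk does not say how geni.com actually matches relatives, and the `Profile` and `TreeMerger` names are hypothetical. Each uploaded tree is treated as a set of profiles, and a union-find structure joins trees that share a profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile record; a real matcher would compare many more fields."""
    name: str
    birth_year: int


class TreeMerger:
    """Union-find over profiles: each disjoint set is one connected family tree."""

    def __init__(self):
        self.parent = {}

    def find(self, profile):
        """Return the representative profile of the tree containing `profile`."""
        self.parent.setdefault(profile, profile)
        # Walk up to the root, halving the path along the way.
        while self.parent[profile] != profile:
            self.parent[profile] = self.parent[self.parent[profile]]
            profile = self.parent[profile]
        return profile

    def add_tree(self, profiles):
        """Add an uploaded tree; any profile already known merges it with that tree."""
        profiles = list(profiles)
        if not profiles:
            return
        root = self.find(profiles[0])
        for profile in profiles[1:]:
            self.parent[self.find(profile)] = root

    def tree_count(self):
        """Number of separate family trees currently stored."""
        return len({self.find(p) for p in list(self.parent)})


# Illustrative data only: two uploads share "Ruth", a third is unrelated.
merger = TreeMerger()
merger.add_tree([Profile("Bernie", 1950), Profile("Ruth", 1925)])
merger.add_tree([Profile("Ruth", 1925), Profile("Sam", 1900)])
merger.add_tree([Profile("Ann", 1960)])
print(merger.tree_count())
```

Exact equality is the simplest possible matching rule; it is used here only so that the merging logic itself stays visible.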
01:31
The result is that large family trees are created,
28
91768
2950
结果是超大的家族树创建起来了,
01:34
beyond the individual level of each genealogist.
29
94742
3479
超越了每个家谱学家的个人水平。
01:38
Now, by repeating this process with millions of people
30
98808
4129
现在,凭借着全球数百万人
01:42
all over the world,
31
102961
1817
不断重复这个过程,
01:44
we can crowdsource the construction of a family tree of all humankind.
32
104802
5532
我们可以众包全人类家谱树的建设。
01:51
Using this website,
33
111292
1584
使用这个网站,
01:52
we were able to connect 125 million people
34
112900
4813
我们能够在一棵家族树上连接
01:57
into a single family tree.
35
117737
2521
1.25 亿人。
02:00
I cannot draw the tree on the screens over here
36
120967
2788
我无法在这里的屏幕上画家谱树,
02:03
because they have less pixels
37
123779
2165
因为它们的像素比
02:05
than the number of people in this tree.
38
125968
2513
在这棵树上的人还少。
02:08
But here is an example of a subset of 6,000 individuals.
39
128505
5010
但这里有一个 6000 人的子集例子。
02:14
Each green node is a person.
40
134159
2362
每一个绿色的节点是一个人。
02:17
The red nodes represent marriages,
41
137060
2849
红色的节点代表婚姻,
02:19
and the connections represent parenthood.
42
139933
2258
连接代表亲子关系。
02:22
In the middle of this tree, you see the ancestors.
43
142557
2372
在这个树的中央是祖先。
02:24
And as we go to the periphery, you see the descendants.
44
144953
2604
外围是后代。
02:27
This tree has seven generations, approximately.
45
147581
3102
这棵树大约有 7 代。
02:31
Now, this is what happens when we increase the number of individuals
46
151692
3234
而这是当我们增加人数到 7 万人时
02:34
to 70,000 people --
47
154950
1828
的样子——
02:36
still a tiny subset of all the data that we have.
48
156802
4330
仍然是我们拥有的所有 数据集的一小部分。
02:41
Despite that, you can already see the formation of gigantic family trees
49
161629
4813
即便如此,你已经能够看到由许多远亲
02:46
with many very distant relatives.
50
166466
2655
组成的一棵巨大家谱树。
02:49
Thanks to the hard work of our genealogists,
51
169610
3134
感谢家谱学家的努力工作,
02:52
we can go back in time hundreds of years ago.
52
172768
3103
我们可以回到数百年前。
02:56
For example, here is Alexander Hamilton,
53
176418
3441
比如,这是亚历山大·汉密尔顿,
02:59
who was born in 1755.
54
179883
2475
他出生于 1755 年。
03:02
Alexander was the first US Secretary of the Treasury,
55
182872
3764
亚历山大是首任美国财政部长,
03:06
but mostly known today due to a popular Broadway musical.
56
186660
3831
但主要由于一部流行的百老汇 音乐剧而广为人知。
03:11
We found that Alexander has deeper connections in the showbiz industry.
57
191137
4922
我们发现亚历山大在娱乐圈 有更深厚的人脉。
03:16
In fact, he's a blood relative of ...
58
196083
2111
事实上,他是——
03:18
Kevin Bacon!
59
198781
1220
凯文·贝肯的血亲!
03:20
(Laughter)
60
200025
2032
(笑声)
03:22
Both of them are descendants of a lady from Scotland
61
202081
2606
他们都是13世纪一位来自
03:24
who lived in the 13th century.
62
204711
2314
苏格兰的女士的后代。
03:27
So you can say that Alexander Hamilton
63
207049
3102
所以你可以说亚历山大·汉密尔顿
03:30
is 35 degrees of Kevin Bacon genealogy.
64
210175
3188
是 35 度凯文·贝肯的宗谱。
03:33
(Laughter)
65
213387
1441
(笑声)
03:34
And our tree has millions of stories like that.
66
214852
3230
我们的家谱树有数百万类似的故事。
03:40
We invested significant efforts to validate the quality of our data.
67
220113
4890
我们投入了不小的工作 在验证数据的质量上。
03:45
Using DNA, we found that 0.3 percent of the mother-child connections in our data
68
225027
5391
使用DNA,我们发现我们 数据中有 0.3% 的母子关系
03:50
are wrong,
69
230442
1250
是错误的,
03:51
which could match the adoption rate in the US pre-Second World War.
70
231716
3591
这可能与二战前美国的收养率相当。
03:56
For the father's side,
71
236847
1785
父亲方面,
03:58
the news is not as good:
72
238656
1961
消息也并不乐观:
04:02
1.9 percent of the father-child connections in our data are wrong.
73
242149
5600
我们的数据中 1.9% 的 父子关系是错误的。
04:07
And I see some people smirk over here.
74
247773
2363
我看到有人在这儿讪笑。
04:10
It is what you think --
75
250160
1717
这是你们在想的——
04:11
there are many milkmen out there.
76
251901
1789
外面有很多送奶工。
04:13
(Laughter)
77
253714
1064
(笑声)
04:14
However, this 1.9 percent error rate in patrilineal connections
78
254802
3989
然而这 1.9% 的父子关系错误率
04:18
is not unique to our data.
79
258815
1769
不是我们数据独有的。
04:20
Previous studies found a similar error rate
80
260608
3069
早先使用临床级血统的研究
04:23
using clinical-grade pedigrees.
81
263701
2021
也发现了类似的错误率。
04:26
So the quality of our data is good,
82
266254
2525
所以我们的数据质量是良好的,
04:28
and that should not be a surprise.
83
268803
2133
并且这也不应该是个意外。
04:30
Our genealogists have a profound, vested interest
84
270960
3776
我们的系谱学家对正确记录
04:34
in correctly documenting their family history.
85
274760
3668
他们的家族史有着浓厚的兴趣。
04:40
We can leverage this data to learn quantitative information about humanity,
86
280594
4591
我们可以利用这些数据来 了解人类的定量信息,
04:45
for example, questions about demography.
87
285209
2596
比如,有关人口统计学的问题。
04:47
Here is a look at all our profiles on the map of the world.
88
287829
3857
这是我们的资料在世界地图上的样子。
04:52
Each pixel is a person that lived at some point.
89
292250
4481
每个像素代表一个生活在特定位置的人。
04:56
And since we have so much data,
90
296755
1680
由于我们有很多数据,
04:58
you can see the contours of many countries,
91
298459
2781
你可以看到很多国家的轮廓,
05:01
especially in the Western world.
92
301264
2099
尤其在西方世界。
05:03
In this clip, we stratified the map that I've showed you
93
303387
3548
在这个视频片段中, 我们把给你展示的地图
05:06
based on the year of births of individuals from 1400 to 1900,
94
306959
5072
根据 1400-1900 年出生 的人口进行分层,
05:12
and we compared it to known migration events.
95
312055
2766
并且跟已知的迁移事件比较。
05:15
The clip is going to show you that the deepest lineages in our data
96
315482
3165
这个视频将向你展示 我们数据中最深的血统,
05:18
go all the way back to the UK,
97
318671
1627
可以追溯到英国,
05:20
where they had better record keeping,
98
320322
1808
这里有更好的记录保存,
05:22
and then they spread along the routes of Western colonialism.
99
322154
3282
然后他们沿着西方殖民主义 的道路传播。
05:25
Let's watch this.
100
325460
1322
让我们来看看这个。
05:27
(Music)
101
327143
1609
(音乐)
05:28
[Year of birth: ]
102
328776
2341
【出生年份:】
05:31
[1492 - Columbus sails the ocean blue]
103
331705
1836
【1492 - 哥伦布蓝色海洋航行时期】
05:35
[1620 - Mayflower lands in Massachusetts]
104
335661
2000
【1620 - 五月花号在马萨诸塞州着陆】
05:38
[1652 - Dutch settle in South Africa]
105
338726
1775
【1652 - 荷兰人在南非定居】
05:44
[1788 - Great Britain penal transportation to Australia starts]
106
344321
3186
【1788 - 英国开始向澳大利亚 进行刑事流放】
05:47
[1836 - First migrants use Oregon Trail]
107
347531
1927
【1836 - 第一批移民来到俄勒冈小道】
05:50
[all activity]
108
350149
3183
【所有活动】
05:55
I love this movie.
109
355851
1543
我爱这个视频。
05:57
Now, since these migration events are giving the context of families,
110
357418
5093
因为这些移民事件 提供了家庭的背景,
06:02
we can ask questions such as:
111
362535
2183
我们可以问诸如此类的问题:
06:04
What is the typical distance between the birth locations
112
364742
3470
丈夫和妻子出生地
06:08
of husbands and wives?
113
368236
2812
的典型距离是多少?
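One way to put a number on that question is sketched below. It assumes each spouse's birthplace is available as a latitude/longitude pair and that "typical" means the median, neither of which the talk specifies. The distance is the standard haversine great-circle formula; the function names and sample coordinates are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def birthplace_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def typical_spouse_distance_km(couples):
    """Median birthplace distance over (husband_latlon, wife_latlon) pairs."""
    return statistics.median(
        birthplace_distance_km(h[0], h[1], w[0], w[1]) for h, w in couples
    )


# Illustrative coordinates only, not data from the talk.
couples = [
    ((0.0, 0.0), (0.0, 0.0)),    # born in the same place
    ((0.0, 0.0), (1.0, 0.0)),    # one degree of latitude apart
    ((0.0, 0.0), (0.0, 180.0)),  # opposite sides of the globe
]
print(round(typical_spouse_distance_km(couples), 1))
```

The median is chosen here because a few long-distance couples would otherwise dominate an average; grouping couples by birth year before calling `typical_spouse_distance_km` would show how the distance changes over time.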
06:11
This distance plays a pivotal role in demography,
114
371072
3677
这一距离在人口统计学中 起着重要的作用,
06:14
because the patterns in which people migrate to form families
115
374773
3681
因为人们迁移形成家庭的模式
06:18
determine how genes spread in geographical areas.
116
378478
3713
决定了基因如何在地理位置上传播。
06:22
We analyzed this distance using our data,
117
382706
2328
我们使用我们的数据分析了这个距离,
06:25
and we found that in the old days,
118
385058
2290
我们发现在古时候,
06:27
people had it easy.
119
387372
1230
人们过得很轻松。
06:28
They just married someone in the village nearby.
120
388626
2594
他们只是跟村子附近的某人结婚。
06:31
But the Industrial Revolution really complicated our love life.
121
391958
3705
但工业革命复杂化了 我们的爱情生活。
06:35
And today, with affordable flights and online social media,
122
395687
4560
今天,凭着可负担的航班 和网络社交媒体,
06:40
people typically migrate more than 100 kilometers from their place of birth
123
400271
4828
人们通常从出生地迁移 100 多公里
06:45
to find their soul mate.
124
405123
1504
来寻找灵魂伴侣。
06:48
So now you might ask:
125
408524
1187
所以现在你可能会问:
06:49
OK, but who does the hard work of migrating from places to places
126
409735
4496
好吧,但是谁会卖力从一个地方 迁移到另一个地方
06:54
to form families?
127
414255
1269
去构建家庭呢?
06:55
Are these the males or the females?
128
415548
3727
是男人还是女人?
06:59
We used our data to address this question,
129
419752
2155
我们使用我们的数据解答了这个问题,
07:01
and at least in the last 300 years,
130
421931
2594
至少在过去 300 年中,
07:04
we found that the ladies do the hard work
131
424549
3883
我们发现女性从一个地方 迁移到另一个地方
07:08
of migrating from places to places to form families.
132
428456
2996
去构建家庭上是最辛苦的。
07:11
Now, these results are statistically significant,
133
431476
3101
这些结果在统计上很显著,
07:14
so you can take it as scientific fact that males are lazy.
134
434601
3471
所以你可以把男性懒惰当作科学事实。
07:18
(Laughter)
135
438096
3156
(笑声)
07:21
We can move from questions about demography
136
441276
2536
我们可以把问题从人口统计学开始
07:23
and ask questions about human health.
137
443836
2913
转向人类健康问题。
07:26
For example, we can ask
138
446773
1487
比如,我们可以问
07:28
to what extent genetic variations account for differences in life span
139
448284
4963
遗传变异能在多大程度上影响个体的
07:33
between individuals.
140
453271
1194
寿命差异。
07:34
Previous studies analyzed the correlation of longevity between twins
141
454988
4530
之前的研究通过分析 双胞胎寿命的相关性
07:39
to address this question.
142
459542
1442
来解答这个问题。
07:41
They estimated that the genetic variations account for
143
461411
2667
他们估计出遗传变异
07:44
about a quarter of the differences in life span between individuals.
144
464102
4040
对个体寿命差异的影响大约占 1/4。
07:48
But twins can be correlated due to so many reasons,
145
468688
2598
但双胞胎之间的关联有很多原因,
07:51
including various environmental effects
146
471310
2304
包括多样的环境影响
07:53
or a shared household.
147
473638
1622
或共同的家庭。
07:56
Large family trees give us the opportunity to analyze both close relatives,
148
476411
3753
庞大的家谱树给了我们分析这些近亲,
08:00
such as twins,
149
480188
1207
比如双胞胎,
08:01
all the way to distant relatives, even fourth cousins.
150
481419
2917
到远房亲戚, 甚至四代表亲这样的机会。
08:04
This way we can build robust models
151
484749
2689
这样我们可以构建稳健的模型,
08:07
that can tease apart the contribution of genetic variations
152
487462
3708
从环境因素中分离出
08:11
from environmental factors.
153
491194
1717
遗传变异的贡献来。
08:13
We conducted this analysis using our data,
154
493379
2899
我们使用数据执行了这个分析,
08:16
and we found that genetic variations explain only 15 percent
155
496302
5791
发现遗传变异只解释了 15% 的
08:22
of the differences in life span between individuals.
156
502117
2806
个体寿命差异。
08:26
That is five years, on average.
157
506760
2756
平均而言, 就是 5 年之差。
08:30
So genes matter less than what we thought before to life span.
158
510316
4708
所以基因对寿命的重要性 比我们之前想象的少。
08:35
And I find it great news,
159
515675
2136
我发现这是个好消息,
08:38
because it means that our actions can matter more.
160
518438
3293
因为这意味着我们的行动更为重要。
08:42
Smoking, for example, determines 10 years of our life expectancy --
161
522533
4274
举个例子,吸烟会影响 大约10年的预期寿命——
08:46
twice as much as what genetics determines.
162
526831
2646
是基因所能影响的两倍。
08:50
We can even have more surprising findings
163
530236
2289
随着我们从家谱树展开,
08:52
as we move from family trees
164
532549
1492
让我们的家谱学专家建档,
08:54
and we let our genealogists document and crowdsource DNA information.
165
534065
4732
并且众包DNA信息, 我们能有更多惊奇的发现。
08:58
And the results can be amazing.
166
538821
2024
结果将是惊人的。
09:01
It might be hard to imagine, but Uncle Bernie and his friends
167
541255
3915
可能令人难以想象, 伯尼叔叔和他的朋友
09:05
can create DNA forensic capabilities
168
545194
2646
能够创建 DNA 法医能力,
09:07
that even exceed what the FBI currently has.
169
547864
3559
甚至超过了 FBI 目前拥有的水平。
09:12
When you place the DNA on a large family tree,
170
552862
2404
当你把 DNA 放在一棵大的家谱树中,
09:15
you effectively create a beacon
171
555290
2117
你就有效地创造了一个照亮
09:17
that illuminates the hundreds of distant relatives
172
557431
2634
数百个远亲的灯塔,
09:20
that are all connected to the person that originated the DNA.
173
560089
3490
他们都与 DNA 的拥有者有联系。
09:24
By placing multiple beacons on a large family tree,
174
564505
2913
通过在一棵大的家谱树中 放置不同的灯塔,
09:27
you can now triangulate the DNA of an unknown person,
175
567442
3720
你现在可以对一个陌生人 的 DNA 进行三角测量,
09:31
the same way that the GPS system uses multiple satellites
176
571186
3938
就跟 GPS 系统利用不同的卫星
09:35
to find a location.
177
575148
1324
来定位一样。
09:37
The prime example of the power of this technique
178
577226
3624
这种技术威力的一个主要例子
09:40
is capturing the Golden State Killer,
179
580874
2675
是追捕“金州杀手”,
09:44
one of the most notorious criminals in the history of the US.
180
584612
4528
美国历史上最臭名昭著的罪犯之一。
09:49
The FBI had been searching for this person for over 40 years.
181
589164
5892
FBI 已经寻找这人超过 40 年。
09:55
They had his DNA,
182
595588
1835
他们有他的 DNA,
09:57
but he never showed up in any police database.
183
597447
3350
但他从未出现在警方的数据库中。
10:01
About a year ago, the FBI consulted a genetic genealogist,
184
601447
4712
大约一年前,FBI 咨询了 一位基因谱系学家,
10:06
and she suggested that they submit his DNA to a genealogy service
185
606183
3950
她建议他们提交他的 DNA 到可以定位远房亲戚
10:10
that can locate distant relatives.
186
610157
2398
的家谱服务平台上。
10:13
They did that,
187
613117
1156
FBI 这样做了,
10:14
and they found a third cousin of the Golden State Killer.
188
614297
3692
他们找到了金州杀手的第三代表亲。
10:18
They built a large family tree,
189
618013
2344
他们构建了一棵巨大的家谱树,
10:20
scanned the different branches of that tree,
190
620381
2102
扫描树上的不同分支,
10:22
until they found a profile that exactly matched
191
622507
2565
直到他们找到完美匹配
10:25
what they knew about the Golden State Killer.
192
625096
2581
他们所了解的金州杀手信息的人。
10:27
They obtained DNA from this person and found a perfect match
193
627701
3592
他们从这人身上取得 DNA 并发现
10:31
to the DNA they had in hand.
194
631317
2025
跟他们手上的 DNA 一致。
10:33
They arrested him and brought him to justice
195
633366
2350
过了这么些年,他们终于逮捕了他,
10:35
after all these years.
196
635740
1424
并绳之以法。
10:38
Since then, genetic genealogists have started working with
197
638172
3241
自那之后,基因谱系学家开始
10:41
local US law enforcement agencies
198
641437
2668
跟美国当地执法机构合作,
10:44
to use this technique in order to capture criminals.
199
644129
3362
使用这种技术来抓捕罪犯。
10:47
And only in the past six months,
200
647521
2681
仅仅在过去的 6 个月,
10:50
they were able to solve over 20 cold cases with this technique.
201
650226
4296
他们使用这个技术就破获了 超过 20 起悬案。
10:56
Luckily, we have people like Uncle Bernie and his fellow genealogists.
202
656203
4636
幸好,我们有这群人, 像伯尼叔叔和他的家谱学同行,
11:01
These are not amateurs with a self-serving hobby.
203
661045
2994
他们不只是业余爱好者。
11:04
These are citizen scientists with a deep passion to tell us who we are.
204
664602
6419
他们是满怀热情的公民科学家, 想要揭开我们所有人身份的秘密。
11:11
And they know that the past can hold a key to the future.
205
671065
4458
他们知道,过去是通向未来的钥匙。
11:16
Thank you very much.
206
676067
1183
谢谢大家。
11:17
(Applause)
207
677314
3469
(鼓掌)