Inside OKCupid: The math of online dating - Christian Rudder
1,238,889 views ・ 2013-02-13
Translator: Andrea McDonough
Reviewer: Bedirhan Cinar
Translator (Chinese): Gena Volz
Reviewer (Chinese): Sharon Loh
00:17
Hello, my name is Christian Rudder, and I was one of the founders of OkCupid. It's now one of the biggest dating sites in the United States. Like most everyone at the site, I was a math major, and, as you may expect, we're known for the analytic approach we take to love. We call it our matching algorithm. Basically, OkCupid's matching algorithm helps us decide whether two people should go on a date. We built our entire business around it.
00:38
Now, "algorithm" is a fancy word, and people like to drop it like it's this big thing. But really, an algorithm is just a systematic, step-by-step way to solve a problem. It doesn't have to be fancy at all. Here in this lesson, I'm going to explain how we arrived at our particular algorithm, so you can see how it's done.
00:55
Now, why are algorithms even important? Why does this lesson even exist? Well, notice one very significant phrase I used earlier: algorithms are a step-by-step way to solve a problem, and as you probably know, computers excel at step-by-step processes. A computer without an algorithm is basically an expensive paperweight. And since computers are such a pervasive part of everyday life, algorithms are everywhere.
01:18
The math behind OkCupid's matching algorithm is surprisingly simple. It's just some addition, multiplication, and a little bit of square roots. The tricky part in designing it was figuring out how to take something mysterious, human attraction, and break it into components that a computer can work with.
01:34
The first thing we needed to match people up was data, something for the algorithm to work with. The best way to get data quickly from people is to just ask for it. So we decided that OkCupid should ask users questions, stuff like "Do you want to have kids one day?", "How often do you brush your teeth?", and "Do you like scary movies?" And big stuff like "Do you believe in God?"
01:53
Now, a lot of the questions are good for matching like with like, that is, when both people answer the same way. For example, two people who are both into scary movies are probably a better match than one person who is and one who isn't. But what about a question like "Do you like to be the center of attention?" If both people in a relationship say yes to this, they're going to have massive problems. We realized this early on, and so we decided we needed a bit more data from each question.
02:17
We had to ask people to specify not only their own answer, but also the answer they wanted from someone else. That worked really well. But we needed one more dimension.
02:26
Some questions tell you more about a person than others. For example, a question about politics, something like "Which is worse: book burning or flag burning?", might reveal more about someone than their taste in movies. And it doesn't make sense to weigh all things equally, so we added one final data point. For everything that OkCupid asks you, you have a chance to tell us the role it plays in your life, ranging from irrelevant to mandatory.
02:49
So now, for every question, we have three things for our algorithm: first, your answer; second, how you want someone else, your potential match, to answer; and third, how important the question is to you. With all this information, OkCupid can figure out how well two people will get along. The algorithm crunches the numbers and gives us a result.
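The three pieces of information collected for each question can be pictured as a small record. This is a minimal sketch, not OkCupid's actual data model: the class name `QuestionResponse`, its fields, and the `satisfies` helper are hypothetical, and storing a *set* of acceptable answers (rather than the single desired answer the talk describes) is an assumption. The example shows why the second field matters for the "center of attention" question: B's answer can fail to satisfy you even while yours satisfies him.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionResponse:
    """One person's response to one question (hypothetical structure).

    own_answer: the answer the person gave for themselves
    acceptable: the answer(s) they want from a potential match (a set is an assumption)
    importance: how much the question matters to them, from "irrelevant" to "mandatory"
    """
    own_answer: str
    acceptable: frozenset
    importance: str


def satisfies(answerer: QuestionResponse, asker: QuestionResponse) -> bool:
    """True if the answerer's own answer is one the asker finds acceptable."""
    return answerer.own_answer in asker.acceptable


# The "Do you like to be the center of attention?" question from the example.
you_q2 = QuestionResponse("no", frozenset({"no"}), "a little important")
b_q2 = QuestionResponse("yes", frozenset({"no"}), "somewhat important")

print(satisfies(b_q2, you_q2))  # False: B says "yes", but you wanted "no"
print(satisfies(you_q2, b_q2))  # True: you say "no", which is what B wanted
```

Matching on "same answer" alone would treat two "yes" answers as compatible; comparing each person's own answer against the other's acceptable answers is what lets the two "spotlight seekers" fail to match.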
03:10
As a practical example, let's look at how we'd match you with another person. Let's call him "B." Your match percentage with B is based on questions you've both answered. Let's call that set of common questions "s." As a very simple example, we'll use a small set s with just two questions in common and compute a match from that. Here are our two example questions.
03:30
The first one, let's say, is "How messy are you?" And the answer possibilities are: very messy, average, and very organized. Let's say you answered "very organized," you'd like someone else to answer "very organized," and the question is very important to you. Basically, you're a neat freak. You're neat, you want someone else to be neat, and that's it.
03:49
And let's say B is a little bit different. He answered "very organized" for himself, but he'd like someone else to answer "average," and the question is only a little important to him.
03:59
Let's look at the second question, from our previous example: "Do you like to be the center of attention?" The answers are "yes" and "no." You've answered "no," you want someone else to answer "no," and the question is only a little important to you. Now B, he's answered "yes." He wants someone else to answer "no," because he wants the spotlight on him, and the question is somewhat important to him.
04:19
So, let's try to compute all of this. Our first step, since we use computers to do this, is to assign numerical values to ideas like "somewhat important" and "very important," because computers need everything in numbers. We at OkCupid decided on the following scale:
"Irrelevant" is worth 0.
"A little important" is worth 1.
"Somewhat important" is worth 10.
"Very important" is worth 50.
And "absolutely mandatory" is worth 250.
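The scale above translates directly into a lookup table. The point values are the ones given in the talk; the names `IMPORTANCE_POINTS` and `points` are hypothetical, chosen for this sketch. Note how steep the scale is: each level is worth 5 to 10 times the one below it, so a single "mandatory" question outweighs many lesser ones.

```python
# Point values for each importance level, exactly as listed in the talk.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


def points(importance: str) -> int:
    """How many points a question is worth to the person who rated it."""
    return IMPORTANCE_POINTS[importance.lower()]


print(points("Very important"))  # 50
```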
04:46
Next, the algorithm makes two simple calculations. The first is: How much did B's answers satisfy you? That is, how many of the possible points did B score on your scale? Well, you indicated that the first question, about messiness, was very important to you. It's worth 50 points, and B got that right. The second question is worth only 1, because you said it was only a little important. B got that wrong, so B's answers earned 50 out of 51 possible points. That's 98 percent satisfactory. Pretty good.
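The first calculation is a weighted fraction: the points the other person earned divided by the points that were at stake, using the rater's own importance weights. This is a sketch under stated assumptions: the function name `satisfaction` and its `(points, matched)` input format are hypothetical, and since the talk does not say what happens when no points are at stake, the sketch simply raises an error in that case.

```python
from fractions import Fraction


def satisfaction(questions):
    """Share of the rater's possible points that the other person earned.

    `questions` holds one (points_at_stake, answered_acceptably) pair per
    common question, where points come from the *rater's* importance scale.
    """
    possible = sum(pts for pts, _ in questions)
    earned = sum(pts for pts, ok in questions if ok)
    if possible == 0:
        # Not covered in the talk; treated as an error in this sketch.
        raise ValueError("no points at stake on the common questions")
    return Fraction(earned, possible)


# How much did B satisfy you? Q1 is "very important" to you (50) and B matched;
# Q2 is "a little important" to you (1) and B did not.
b_satisfies_you = satisfaction([(50, True), (1, False)])
print(b_satisfies_you, f"{float(b_satisfies_you):.0%}")  # 50/51 98%
```

Using `Fraction` keeps the exact 50/51 until the final display, so rounding happens only once.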
05:15
The second question the algorithm asks is: How much did you satisfy B? Well, B placed 1 point on your answer to the messiness question and 10 on your answer to the second. Of those 11 points (1 plus 10), you earned 10: your answer satisfied B on the second question. So your answers were 10 out of 11, or 91 percent, satisfactory to B. That's not bad.
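Both one-way scores follow the same pattern, which can be written as a single formula. The symbols here are notation introduced for illustration, not something the talk defines: $w_A(q)$ is the point value person $A$ gave question $q$, $s$ is the set of common questions, and the bracket $[\cdot]$ is 1 when its condition holds and 0 otherwise.

```latex
\[
\mathrm{sat}_A(B) = \frac{\sum_{q \in s} w_A(q)\,\bigl[\,B\text{'s answer to } q \text{ is acceptable to } A\,\bigr]}{\sum_{q \in s} w_A(q)}
\]
\[
\mathrm{sat}_{\text{you}}(B) = \frac{50 \cdot 1 + 1 \cdot 0}{50 + 1} = \frac{50}{51} \approx 98\%,
\qquad
\mathrm{sat}_{B}(\text{you}) = \frac{1 \cdot 0 + 10 \cdot 1}{1 + 10} = \frac{10}{11} \approx 91\%
\]
```

The only thing that changes between the two directions is whose weights sit in the numerator and denominator.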
05:36
The final step is to take these two match percentages and get one number for the both of you. To do this, the algorithm multiplies your scores, then takes the nth root, where n is the number of questions. Because s, our sample, contains only 2 questions, we have: match percentage equals the square root of 98 percent times 91 percent. That equals 94 percent. That 94 percent is your match percentage with B. It's a mathematical expression of how happy you'd be with each other, based on what we know.
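Here is the combining step as a geometric mean. In a standard geometric mean, n is the number of values being combined, so merging the two one-way scores uses the square root, which in this example also equals the number of common questions. The helper name `geometric_mean` is hypothetical, and this sketch leaves out the margin-of-error correction mentioned later in the lesson. Plugging in the unrounded scores 50/51 and 10/11 still gives 94 percent.

```python
import math


def geometric_mean(scores):
    """n-th root of the product of n scores, each a fraction between 0 and 1."""
    if not scores:
        raise ValueError("need at least one score")
    return math.prod(scores) ** (1 / len(scores))


b_satisfies_you = 50 / 51  # how much B's answers satisfied you
you_satisfy_b = 10 / 11    # how much your answers satisfied B

match = geometric_mean([b_satisfies_you, you_satisfy_b])
print(f"{match:.0%}")  # 94%
```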
06:08
Now, why does the algorithm multiply the scores and do the square-root business, as opposed to, say, simply averaging the two match scores? In general, this formula is called the geometric mean. It's a great way to combine values that have wide ranges and represent very different properties. In other words, it's perfect for romantic matching. You've got wide ranges, and you've got tons of different data points, like I said, about movies, politics, religion, everything. Intuitively, too, this makes sense. Two people satisfying each other 50 percent should be a better match than two others who satisfy each other 0 and 100 percent, because affection needs to be mutual.
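The "affection needs to be mutual" argument is easy to check numerically. This sketch compares an ordinary average with the geometric mean for the two pairs described above; the function names are illustrative only.

```python
import math


def arithmetic_mean(a: float, b: float) -> float:
    """Plain average of two one-way satisfaction scores."""
    return (a + b) / 2


def geometric_mean(a: float, b: float) -> float:
    """Square root of the product of two one-way satisfaction scores."""
    return math.sqrt(a * b)


pairs = {"mutual 50/50": (0.5, 0.5), "one-sided 0/100": (0.0, 1.0)}
for label, (a, b) in pairs.items():
    print(label, "arithmetic:", f"{arithmetic_mean(a, b):.0%}",
          "geometric:", f"{geometric_mean(a, b):.0%}")
```

The plain average scores both pairs at 50 percent, while the geometric mean keeps the mutual pair at 50 percent and drops the one-sided pair to 0 percent: any zero in the product zeroes the whole match.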
06:41
After adding a little correction for margin of error, which matters when we have a small number of questions, as we do in this example, we're good to go.
06:48
Any time OkCupid matches two people, it goes through the steps we just outlined. First it collects data about your answers; then it compares your choices and preferences to other people's in simple, mathematical ways.
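Putting the outlined steps together gives an end-to-end sketch that reproduces the worked example. Everything here beyond the point values and the example answers is an assumption: the `Response` record, the function names, and the dictionary layout are hypothetical. B's acceptable answer to the messiness question is modeled as "average" only, which is what the 10-out-of-11 arithmetic requires. The margin-of-error correction is omitted because the lesson does not give its formula.

```python
import math
from dataclasses import dataclass

# Point values for each importance level, as given in the talk.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}


@dataclass(frozen=True)
class Response:
    """Hypothetical record of one person's response to one question."""
    own_answer: str
    acceptable: frozenset
    importance: str


def satisfaction(rater: dict, other: dict, common: list) -> float:
    """Fraction of the rater's possible points earned by the other person's answers."""
    possible = earned = 0
    for q in common:
        pts = IMPORTANCE_POINTS[rater[q].importance]
        possible += pts
        if other[q].own_answer in rater[q].acceptable:
            earned += pts
    if possible == 0:
        raise ValueError("no points at stake")  # not covered in the talk
    return earned / possible


def match_percentage(a: dict, b: dict) -> float:
    """Geometric mean of both one-way scores; no margin-of-error correction."""
    common = sorted(a.keys() & b.keys())  # the set "s" of questions both answered
    return math.sqrt(satisfaction(a, b, common) * satisfaction(b, a, common))


you = {
    "messy": Response("very organized", frozenset({"very organized"}), "very important"),
    "attention": Response("no", frozenset({"no"}), "a little important"),
}
b = {
    "messy": Response("very organized", frozenset({"average"}), "a little important"),
    "attention": Response("yes", frozenset({"no"}), "somewhat important"),
}

print(f"{match_percentage(you, b):.0%}")  # 94%
```

Because `satisfaction` always reads weights from the rater, calling it twice with the arguments swapped is all it takes to get both directions.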
07:00
This, the ability to take real-world phenomena and make them something a microchip can understand, is, I think, the most important skill anyone can have these days. Just as you use sentences to tell a story to a person, you use algorithms to tell a story to a computer. If you learn the language, you can go out and tell your stories. I hope this will help you do that.