Inside OKCupid: The math of online dating - Christian Rudder
1,238,192 views ・ 2013-02-13
00:00
Translator: Andrea McDonough
Reviewer: Bedirhan Cinar
Translator (Chinese): Jephian Lin
Reviewer (Chinese): Coco Shen
00:17
Hello, my name is Christian Rudder,
00:19
and I was one of the founders of OkCupid.
00:21
It's now one of the biggest
dating sites in the United States.
00:24
Like most everyone at the site,
I was a math major,
00:27
As you may expect, we're known
for the analytic approach we take to love.
00:30
We call it our matching algorithm.
00:32
Basically, OkCupid's matching
algorithm helps us decide
00:34
whether two people should go on a date.
00:36
We built our entire business around it.
00:38
Now, algorithm is a fancy word,
00:40
and people like to drop it
like it's this big thing.
00:43
But really, an algorithm
is just a systematic,
00:45
step-by-step way to solve a problem.
00:47
It doesn't have to be fancy at all.
00:50
Here in this lesson,
00:51
I'm going to explain how we arrived
at our particular algorithm,
00:54
so you can see how it's done.
00:55
Now, why are algorithms even important?
00:57
Why does this lesson even exist?
00:59
Well, notice one very significant
phrase I used above:
01:02
they are a step-by-step
way to solve a problem,
01:05
and as you probably know, computers
excel at step-by-step processes.
01:08
A computer without an algorithm
01:10
is basically an expensive paperweight.
01:12
And since computers are such
a pervasive part of everyday life,
01:15
algorithms are everywhere.
01:18
The math behind OkCupid's matching
algorithm is surprisingly simple.
01:21
It's just some addition, multiplication,
a little bit of square roots.
01:25
The tricky part in designing it
01:27
was figuring out how to take
something mysterious,
01:30
human attraction,
01:31
and break it into components
that a computer can work with.
01:34
The first thing we needed
to match people up was data,
01:36
something for the algorithm to work with.
01:38
The best way to get data quickly
from people is to just ask for it.
01:41
So we decided that OkCupid
should ask users questions,
01:44
stuff like, "Do you want
to have kids one day?"
01:47
"How often do you brush your teeth?"
01:48
"Do you like scary movies?"
01:50
And big stuff like,
"Do you believe in God?"
01:53
Now, a lot of the questions
are good for matching like with like,
01:56
that is, when both people
answer the same way.
01:59
For example, two people
who are both into scary movies
02:01
are probably a better match
than one person who is and one who isn't.
02:05
But what about a question like,
02:06
"Do you like to be
the center of attention?"
02:08
If both people in a relationship
are saying yes to this,
02:11
they're going to have massive problems.
02:13
We realized this early on,
02:14
and so we decided we needed
a bit more data from each question.
02:17
We had to ask people to specify
not only their own answer,
02:20
but the answer they wanted
from someone else.
02:23
That worked really well.
02:24
But we needed one more dimension.
02:26
Some questions tell you more
about a person than others.
02:28
For example, a question
about politics, something like,
02:32
"Which is worse:
book burning or flag burning?"
02:34
might reveal more about someone
than their taste in movies.
02:37
And it doesn't make sense
to weigh all things equally,
02:40
so we added one final data point.
02:41
For everything that OkCupid asks you,
02:43
you have a chance to tell us
the role it plays in your life.
02:46
And this ranges
from irrelevant to mandatory.
02:49
So now, for every question,
we have three things for our algorithm:
02:52
first, your answer;
02:54
second, how you want someone else --
your potential match -- to answer;
02:58
and third, how important
the question is to you at all.
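Each question therefore produces one small record per person. The sketch below is a minimal illustration. The class name `QuestionAnswer`, its field names, and the sample answers are hypothetical and are not taken from OkCupid's code. It assumes Python 3.9+ for the `frozenset[str]` annotation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QuestionAnswer:
    """One person's response to one question (hypothetical structure)."""
    own_answer: str             # first: your own answer
    acceptable: frozenset[str]  # second: answers you would accept from a match
    importance: str             # third: how important the question is to you

# Hypothetical response to "Do you want to have kids one day?"
example = QuestionAnswer(
    own_answer="yes",
    acceptable=frozenset({"yes"}),
    importance="very important",
)
print(example)
```

`acceptable` is stored as a set, so a person can accept more than one answer from a match.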
03:02
With all this information,
03:03
OkCupid can figure out
how well two people will get along.
03:07
The algorithm crunches the numbers
and gives us a result.
03:10
As a practical example,
03:11
let's look at how we'd match you
with another person.
03:13
Let's call him "B."
03:16
Your match percentage with B is based
on questions you've both answered.
03:19
Let's call that set
of common questions "s."
03:22
As a very simple example,
we use a small set "s"
03:24
with just two questions in common,
03:26
and compute a match from that.
03:28
Here are our two example questions.
03:30
The first one, let's say, is,
"How messy are you?"
03:32
And the answer possibilities are:
03:34
very messy, average and very organized.
03:38
And let's say you answered
"very organized,"
03:40
and you'd like someone else
to answer "very organized,"
03:42
and the question is very important to you.
03:45
Basically, you're a neat freak.
03:46
You're neat, you want someone else
to be neat, and that's it.
03:49
And let's say B is a little bit different.
03:51
He answered "very organized" for himself,
03:53
but "average" is OK with him
as an answer from someone else,
03:56
and the question is only
a little important to him.
03:59
Let's look at the second question,
from our previous example:
04:02
"Do you like to be
the center of attention?"
04:04
The answers are "yes" and "no."
04:05
You've answered "no," you want
someone else to answer "no,"
04:08
and the question is only
a little important to you.
04:11
Now B, he's answered "yes."
04:12
He wants someone else to answer "no,"
04:14
because he wants the spotlight on him,
04:16
and the question is somewhat
important to him.
04:19
So, let's try to compute all of this.
04:21
Our first step is, since we use
computers to do this,
04:24
we need to assign numerical values
04:26
to ideas like "somewhat
important" and "very important,"
04:29
because computers need
everything in numbers.
04:31
We at OkCupid decided
on the following scale:
04:33
"Irrelevant" is worth 0.
04:36
"A little important" is worth 1.
04:38
"Somewhat important" is worth 10.
04:40
"Very important" is 50.
04:42
And "absolutely mandatory" is 250.
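The importance scale is a lookup from label to points. Here is a minimal sketch: the point values are the ones stated in the lesson, while the names `IMPORTANCE_POINTS` and `points_for` are hypothetical.

```python
# Point value for each importance level, using the scale from the lesson.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}

def points_for(importance: str) -> int:
    """Return the numeric weight for an importance label (KeyError if unknown)."""
    return IMPORTANCE_POINTS[importance]

print(points_for("very important"))  # 50
```

The scale is steep, not linear. One "absolutely mandatory" question counts as much as 250 "a little important" ones, so strongly held preferences dominate the score.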
04:46
Next, the algorithm makes
two simple calculations.
04:48
The first is: How much did
B's answers satisfy you?
04:52
That is, how many possible points
did B score on your scale?
04:55
Well, you indicated that B's answer
to the first question,
04:59
about messiness,
05:00
was very important to you.
05:01
It's worth 50 points and B got that right.
05:04
The second question is worth only 1,
05:06
because you said
it was only a little important.
05:08
B got that wrong,
05:09
so B's answers were 50
out of 51 possible points.
05:12
That's 98% satisfactory. Pretty good.
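Written as a formula, this first calculation is a weighted fraction. The notation is ours, not the lesson's: w_you(q) is the point value you gave question q, and 1[q] is 1 if B's answer to q is one you accept and 0 otherwise. Messiness contributes 50 points (satisfied) and the spotlight question contributes 1 point (not satisfied):

```latex
\mathrm{sat}_{\text{you}}(B)
  = \frac{\sum_{q \in s} w_{\text{you}}(q)\,\mathbf{1}[q]}{\sum_{q \in s} w_{\text{you}}(q)}
  = \frac{50 \cdot 1 + 1 \cdot 0}{50 + 1}
  = \frac{50}{51} \approx 0.98
```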
05:15
The second question the algorithm
looks at is: How much did you satisfy B?
05:19
Well, B placed 1 point on your answer
to the messiness question
05:22
and 10 on your answer to the second.
05:24
Of those 11, that's 1 plus 10,
you earned 10 --
05:28
you guys satisfied each other
on the second question.
05:30
So your answers were 10 out of 11
equals 91 percent satisfactory to B.
05:35
That's not bad.
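The same weighted fraction, computed from B's side, uses B's weights: 1 point for messiness and 10 for the spotlight question. Your "very organized" is not the "average" B asked for, so it earns 0. Your "no" on the spotlight question earns the full 10. The notation is ours:

```latex
\mathrm{sat}_{B}(\text{you})
  = \frac{1 \cdot 0 + 10 \cdot 1}{1 + 10}
  = \frac{10}{11} \approx 0.91
```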
05:36
The final step is to take
these two match percentages
05:38
and get one number for the both of you.
05:40
To do this, the algorithm
multiplies your scores,
05:43
then takes the nth root,
05:44
where "n" is the number of questions.
(Translator's note: OkCupid's own website describes this step as always taking the square root.)
05:47
Because s, the set of common questions
in this sample,
05:50
contains only 2 questions,
05:51
we have: match percentage
equals the square root
05:55
of 98 percent times 91 percent.
05:58
That equals 94 percent.
06:00
That 94 percent is your match
percentage with B.
06:03
It's a mathematical expression
of how happy you'd be with each other,
06:06
based on what we know.
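Putting the steps together, here is a minimal sketch that reproduces the worked example. It is not OkCupid's implementation. The function and variable names are hypothetical, and each profile maps a question to a tuple of (own answer, acceptable answers, importance). Three assumptions are worth stating:

- B's only acceptable messiness answer is taken to be "average". That is what makes your "very organized" earn 0 of B's 1 point, matching the lesson's 10 out of 11.
- The two one-way scores are combined with a square root, as in the example.
- The margin-of-error correction the lesson mentions is not specified, so it is left out.

```python
import math

# Point values for each importance level, as given in the lesson.
IMPORTANCE_POINTS = {
    "irrelevant": 0,
    "a little important": 1,
    "somewhat important": 10,
    "very important": 50,
    "absolutely mandatory": 250,
}

def satisfaction(judge: dict, other: dict) -> float:
    """Fraction of judge's importance points earned by other's answers.

    Only questions both people answered (the set s) are counted.
    Returning 0.0 when no points are at stake is an assumption of this sketch.
    """
    common = judge.keys() & other.keys()
    earned = possible = 0
    for q in common:
        _, acceptable, importance = judge[q]
        points = IMPORTANCE_POINTS[importance]
        possible += points
        if other[q][0] in acceptable:  # other's own answer is one judge accepts
            earned += points
    return earned / possible if possible else 0.0

def match_percentage(a: dict, b: dict) -> float:
    """Geometric mean (square root of the product) of the two one-way scores."""
    return math.sqrt(satisfaction(a, b) * satisfaction(b, a))

you = {
    "messy": ("very organized", {"very organized"}, "very important"),
    "spotlight": ("no", {"no"}, "a little important"),
}
b = {
    "messy": ("very organized", {"average"}, "a little important"),
    "spotlight": ("yes", {"no"}, "somewhat important"),
}

print(f"B satisfies you: {satisfaction(you, b):.0%}")
print(f"You satisfy B:   {satisfaction(b, you):.0%}")
print(f"Match:           {match_percentage(you, b):.0%}")
```

Run as written, this prints 98%, 91%, and 94%, the same figures as in the lesson.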
06:08
Now, why does the algorithm multiply,
06:09
as opposed to, say, average
the two match scores together,
06:12
and do the square-root business?
06:14
In general, this formula
is called the geometric mean.
06:16
It's a great way to combine
values that have wide ranges
06:19
and represent very different properties.
06:21
In other words, it's perfect
for romantic matching.
06:23
You've got wide ranges and you've got
tons of different data points,
06:27
like I said, about movies, politics,
religion -- everything.
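For reference, the geometric mean of n positive values is their product raised to the power 1/n. Two one-way satisfaction scores give n = 2, which is a square root. Plugging in the exact example fractions 50/51 and 10/11 recovers the 94 percent:

```latex
G(x_1, \dots, x_n) = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n},
\qquad
G\bigl(\tfrac{50}{51}, \tfrac{10}{11}\bigr)
  = \sqrt{\tfrac{50}{51} \cdot \tfrac{10}{11}}
  = \sqrt{\tfrac{500}{561}} \approx 0.944
```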
06:30
Intuitively, too, this makes sense.
06:32
Two people satisfying
each other 50 percent
06:35
should be a better match
than two others who satisfy 0 and 100,
06:39
because affection needs to be mutual.
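A quick comparison shows why the geometric mean captures "mutual" better than a plain average. This is a minimal sketch, and both helper names are hypothetical:

```python
import math

def arithmetic_mean(x: float, y: float) -> float:
    return (x + y) / 2

def geometric_mean(x: float, y: float) -> float:
    return math.sqrt(x * y)

# Two couples: both 50% satisfied vs. one 0% and the other 100% satisfied.
for a, b in [(0.5, 0.5), (0.0, 1.0)]:
    print(f"{a:.0%} and {b:.0%}: arithmetic {arithmetic_mean(a, b):.0%}, "
          f"geometric {geometric_mean(a, b):.0%}")
```

The arithmetic mean gives both couples 50%. The geometric mean gives the 0/100 couple 0%, because a product is zero whenever either factor is zero.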
06:41
After adding a little correction
for margin of error,
06:43
in the case where we have
a small number of questions,
06:46
like we do in this example,
06:47
we're good to go.
06:48
Any time OkCupid matches two people,
06:50
it goes through the steps
we just outlined.
06:52
First it collects data about your answers,
06:55
then it compares your choices
and preferences to other people's
06:58
in simple, mathematical ways.
07:00
This, the ability to take
real-world phenomena
07:02
and make them something
a microchip can understand,
07:05
is, I think, the most important skill
anyone can have these days.
07:08
Like you use sentences
to tell a story to a person,
07:11
you use algorithms
to tell a story to a computer.
07:14
If you learn the language,
you can go out and tell your stories.
07:17
I hope this will help you do that.