Massive-scale online collaboration | Luis von Ahn

311,075 views ・ 2011-12-06

TED


Translator: Chunxiang Qian  Reviewer: Angelia King
00:15
How many of you had to fill out a web form
0
15260
2000
有多少人在填写网页表格时
00:17
where you've been asked to read
1
17284
1512
需要识别像这样扭曲的词语?
00:18
a distorted sequence of characters like this?
2
18820
2136
有多少人觉得很烦人?
00:20
How many of you found it really annoying?
3
20980
1953
哇,不少呢。我就是发明这个的人。
00:22
(Laughter)
4
22957
1099
00:24
OK, outstanding. So I invented that.
5
24080
1736
(笑声)
00:25
(Laughter)
6
25840
1836
或者说我是其中之一
00:27
Or I was one of the people who did it.
7
27700
1856
这个称作验证码
00:29
That thing is called a CAPTCHA.
8
29580
1536
其理由是保证填写表格的是一个真人
00:31
And it is there to make sure you, the entity filling out the form,
9
31140
3136
而不是什么电脑程序在操作
00:34
are a human and not a computer program
10
34300
1856
可以不停地填写表格
00:36
that was written to submit the form millions of times.
11
36180
2576
这是因为人类
00:38
The reason it works is because humans, at least non-visually-impaired humans,
12
38780
3656
至少是没有视力问题的人
可以识别这些扭曲的文字
00:42
have no trouble reading these distorted characters,
13
42460
2416
而机器做不到
00:44
whereas programs can't do it as well yet.
14
44900
1976
00:46
In the case of Ticketmaster,
15
46900
1496
比如说在票务大全网站上
00:48
the reason you have to type these characters
16
48420
2096
你输入这些扭曲字符的原因
00:50
is to prevent scalpers from writing a program
17
50540
2136
是防止黄牛写一个电脑程序
00:52
that can buy millions of tickets, two at a time.
18
52700
2256
两张两张地买下上百万张票
00:54
CAPTCHAs are used all over the Internet.
19
54980
1936
验证码在网络上普遍应用
00:56
And since they're used so often,
20
56940
1576
因其普遍性
00:58
a lot of times the sequence of random characters shown to the user
21
58540
3136
很多时候使用者就会看到一些
异常搭配的文字排序
01:01
is not so fortunate.
22
61700
1216
01:02
So this is an example from the Yahoo registration page.
23
62940
2656
这个例子来自雅虎注册网页
01:05
The random characters that happened to be shown to the user
24
65620
2816
使用者看到的这几个随机字母
W,A,I, T,正好组成了“等”
01:08
were W, A, I, T, which, of course, spell a word.
25
68460
2696
最有意思的是
01:11
But the best part is the message
26
71180
2096
01:13
that the Yahoo help desk got about 20 minutes later.
27
73300
2456
这是20分钟后的帮助页面
01:15
[Help! I've been waiting for over 20 minutes and nothing happens.]
28
75780
3136
文字:“帮忙!我已经等了二十多分钟,没有任何变化。”
01:18
(Laughter)
29
78940
4856
(笑声)
01:23
This person thought they needed to wait.
30
83820
1905
这人以为网站让他等着
01:25
This, of course, is not as bad as this poor person.
31
85749
2407
当然还有更倒霉的
01:28
(Laughter)
32
88180
2376
(笑声)
01:30
CAPTCHA Project is something that we did at Carnegie Mellon over 10 years ago,
33
90580
3736
验证码计划是我们十多年前在卡内基梅隆大学做起来的
并被广泛应用
01:34
and it's been used everywhere.
34
94340
1456
01:35
Let me now tell you about a project that we did a few years later,
35
95820
3136
现在谈谈几年后我们做的一个项目
算是验证码的新生代版本
01:38
which is sort of the next evolution of CAPTCHA.
36
98980
2216
这个计划我们称之“reCAPTCHA”
01:41
This is a project that we call reCAPTCHA,
37
101220
1976
这个计划是从卡内基梅隆大学起步
01:43
which is something that we started here at Carnegie Mellon,
38
103220
2776
成为我们的启动公司
01:46
then we turned it into a start-up company.
39
106020
2008
一年半前
01:48
And then about a year and a half ago, Google actually acquired this company.
40
108052
3588
谷歌收购了这个公司
现在我来说说这个项目的初始
01:51
Let me tell you how this project started.
41
111664
2007
这个项目是出于以下认识:
01:53
This project started from the following realization:
42
113695
2531
每天全球范围内有大约2亿次
01:56
It turns out that approximately 200 million CAPTCHAs
43
116250
2437
验证码输入
01:58
are typed every day by people around the world.
44
118711
2151
02:00
When I first heard this, I was quite proud of myself.
45
120886
2484
我头次听到的时候还挺自豪
我想 我们的研究影响力不小啊
02:03
I thought, look at the impact my research has had.
46
123394
2341
接着我就感觉很难受
02:05
But then I started feeling bad.
47
125759
1484
因为每次你输入一个验证码
02:07
Here's the thing: each time you type a CAPTCHA,
48
127267
2206
你就浪费了10秒钟
02:09
essentially, you waste 10 seconds of your time.
49
129497
2339
02:11
And if you multiply that by 200 million,
50
131860
1936
这个乘以2亿
02:13
you get that humanity is wasting about 500,000 hours every day
51
133820
3016
全人类每天就浪费了50万个小时
02:16
typing these annoying CAPTCHAs.
52
136860
1536
来输入烦人的验证码
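[Editor's note] The arithmetic behind the "about 500,000 hours" figure can be checked with a short sketch. The two inputs are the numbers quoted in the talk; the function name is only illustrative.

```python
# Figures quoted in the talk: ~200 million CAPTCHAs per day, ~10 seconds each.
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10


def wasted_hours_per_day(captchas: int, seconds_each: int) -> float:
    """Total human time spent per day, converted from seconds to hours."""
    return captchas * seconds_each / 3600


hours = wasted_hours_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # 555,556 hours per day
```

The exact product is a little over 555,000 hours, which the speaker rounds to "about 500,000".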
02:18
(Laughter)
53
138420
1016
我就很难受了
02:19
So then I started feeling bad.
54
139460
1429
02:20
(Laughter)
55
140913
1803
(笑声)
02:22
And then I started thinking, of course, we can't just get rid of CAPTCHAs,
56
142740
3496
我开始思考 既然不能放弃验证码
因为网页安全依赖于此
02:26
because the security of the web depends on them.
57
146260
2256
那么有什么方法可以利用它
02:28
But then I started thinking, can we use this effort
58
148540
2416
02:30
for something that is good for humanity?
59
150980
1936
来做点好事呢?
02:32
So see, here's the thing.
60
152940
1496
关键在于
02:34
While you're typing a CAPTCHA, during those 10 seconds,
61
154460
2616
当你在10秒钟内输入验证码的时候
你的大脑在做了不起的工作
02:37
your brain is doing something amazing.
62
157100
1856
02:38
Your brain is doing something that computers cannot yet do.
63
158980
2816
这是电脑目前尚无法做到的
那么能不能让这10秒钟的工作变得有意义呢?
02:41
So can we get you to do useful work for those 10 seconds?
64
161820
2696
也就是说
02:44
Is there some humongous problem that we cannot yet get computers to solve,
65
164540
3496
有没有什么目前电脑无法解决的难题
但是可以分割成10秒的单位小块
02:48
yet we can split into tiny 10-second chunks
66
168060
2776
02:50
such that each time somebody solves a CAPTCHA,
67
170860
2176
这样每个人通过验证码
解决这个问题的一个小单位?
02:53
they solve a little bit of this problem?
68
173060
1936
答案是肯定的 这就是我们目前在做的
02:55
And the answer to that is "yes," and this is what we're doing now.
69
175020
3136
也许你不知道 如今当你输入一个验证码
02:58
Nowadays, while you're typing a CAPTCHA,
70
178180
1936
不仅仅是在证明你是真人
03:00
not only are you authenticating yourself as a human,
71
180140
2429
也是在把书电子化
03:02
but in addition you're helping us to digitize books.
72
182593
2443
我来解释一下
03:05
Let me explain how this works.
73
185060
1456
目前有很多书籍电子化的项目
03:06
There's a lot of projects trying to digitize books.
74
186540
2416
谷歌有一个。 “互联网档案”有一个
03:08
Google has one. The Internet Archive has one.
75
188980
2136
现亚马逊的Kindle也有一个
03:11
Amazon, with the Kindle, is trying to digitize books.
76
191140
2496
方法就是
03:13
Basically, the way this works is you start with an old book.
77
193660
3176
从一本旧书开始
03:16
You've seen those things, right?
78
196860
1576
你见过书对吧?一本书?
03:18
Like a book?
79
198460
1216
(笑声)
03:19
(Laughter)
80
199700
1256
03:20
So you start with a book and then you scan it.
81
200980
2536
首先扫描一本书
扫描就是
03:23
Now, scanning a book
82
203540
1216
03:24
is like taking a digital photograph of every page.
83
204780
2376
相当于把每一页照一张数码照片
你就有了这本书每一页的照片
03:27
It gives you an image for every page.
84
207180
1816
这是一本书每一页文字内容的照片
03:29
This is an image with text for every page of the book.
85
209020
2576
下一步就是
03:31
The next step in the process is that the computer needs to be able
86
211620
3136
电脑得解读这些照片上的每一个字
03:34
to decipher the words in this image.
87
214780
1736
这涉及到一个叫做OCR的技术
03:36
That's using a technology called OCR, for optical character recognition,
88
216540
3416
也就是光学字符识别
03:39
which takes a picture of text
89
219980
1416
拍下一段文字的照片
03:41
and tries to figure out what text is in there.
90
221420
2176
然后识别出文字内容
03:43
Now, the problem is that OCR is not perfect.
91
223620
2656
问题是光学字符识别的技术并不能解决所有问题
特别对于旧书
03:46
Especially for older books
92
226300
1416
03:47
where the ink has faded and the pages have turned yellow,
93
227740
3136
墨水褪色,书页泛黄
03:50
OCR cannot recognize a lot of the words.
94
230900
1936
很多字OCR无法识别
03:52
For things that were written more than 50 years ago,
95
232860
2456
比如,五十多年前的书
有百分之三十的单词电脑无法识别
03:55
the computer cannot recognize about 30 percent of the words.
96
235340
2856
我们做的就是
03:58
So now we're taking all of the words that the computer cannot recognize
97
238220
3376
摘录出电脑无法识别的单词
04:01
and we're getting people to read them for us
98
241620
2256
通过真人在网上输入验证码时
04:03
while they're typing a CAPTCHA on the Internet.
99
243900
2216
阅读识别出来
下次当你输入一个验证码时,你输入的那个单词
04:06
So the next time you type a CAPTCHA, these words that you're typing
100
246140
3176
实际是我们电子化书籍里
04:09
are actually words from books that are being digitized
101
249340
2576
04:11
that the computer could not recognize.
102
251940
1856
电脑无法识别的单词
04:13
The reason we have two words nowadays instead of one
103
253820
2456
现在我们使用两个而非一个单词的理由是
其中一个词是
04:16
is because one of the words
104
256300
1416
04:17
is a word that the system just got out of a book,
105
257740
2576
系统把一个电脑无法识别的单词
提供给你
04:20
it didn't know what it was and it's going to present it to you.
106
260340
3016
因为系统不认识这个单词 所以无法判断你的答案
04:23
But since it doesn't know the answer, it cannot grade it.
107
263380
2696
我们就加入另一个单词
04:26
So we give you another word,
108
266100
1376
04:27
for which the system does know the answer.
109
267500
2000
一个系统已经认识的单词
04:29
We don't tell you which one's which and we say, please type both.
110
269524
3072
不告诉你哪个是已知的,哪个是未知的 请你将两者都输入
如果你能拼写正确
04:32
And if you type the correct word
111
272620
1575
系统已认知的那个单词
04:34
for the one for which the system knows the answer,
112
274219
2377
就判断你为真人
04:36
it assumes you are human
113
276620
1256
04:37
and it also gets some confidence that you typed the other word correctly.
114
277900
3456
这样对你输入的另一个单词就有所把握
我们把这个过程让十个人重复进行
04:41
And if we repeat this process to 10 different people
115
281380
2456
如果他们对不识别单词的答案一致
04:43
and they agree on what the new word is,
116
283860
1896
我们就得到了一个准确电子化的新单词
04:45
then we get one more word digitized accurately.
117
285780
2216
这就是这个系统的工作原理
04:48
So this is how the system works.
118
288020
1576
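[Editor's note] The two-word scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual reCAPTCHA implementation: the function names, the case-insensitive comparison, and the rule that all ten readings must match exactly are assumptions made for the example.

```python
from collections import Counter
from typing import Optional

# Assumption: ten matching readings are required, following the "10 different
# people" mentioned in the talk; the real system's rule is not given there.
REQUIRED_AGREEMENT = 10


def normalize(word: str) -> str:
    """Compare words case-insensitively and ignore surrounding spaces."""
    return word.strip().lower()


def grade_submission(control_answer: str, typed_control: str,
                     typed_unknown: str, votes: Counter) -> bool:
    """Pass the user only if the known (control) word is typed correctly.

    When the user passes, their reading of the unknown word is recorded
    as one vote, since a correct control word lends it some confidence.
    """
    if normalize(typed_control) != normalize(control_answer):
        return False
    votes[normalize(typed_unknown)] += 1
    return True


def accepted_reading(votes: Counter) -> Optional[str]:
    """Return the unknown word once enough users agree on it, else None."""
    total = sum(votes.values())
    if total < REQUIRED_AGREEMENT:
        return None
    word, count = votes.most_common(1)[0]
    return word if count == total else None


votes = Counter()
for _ in range(REQUIRED_AGREEMENT):
    grade_submission("morning", "morning", "upon", votes)
print(accepted_reading(votes))  # upon
```

A user who mistypes the control word fails the check and contributes no vote, so the unknown word is only ever graded indirectly.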
大约三四年前我们导入这个系统
04:49
And since we released it about three or four years ago,
119
289620
2616
许多网站已经从旧的验证码
04:52
a lot of websites have started switching from the old CAPTCHA,
120
292260
2936
换成新的来帮助书籍电子化
04:55
where people wasted their time,
121
295220
1536
而不是浪费人们的时间
04:56
to the new CAPTCHA where people are helping to digitize books.
122
296780
2936
比如“票务大全”
04:59
So every time you buy tickets on Ticketmaster,
123
299740
2176
每次你在它的网站上购票 就在帮助把书籍电子化
05:01
you help to digitize a book.
124
301940
1376
脸书:每次你加好友或者打招呼
05:03
Facebook: Every time you add a friend or poke somebody,
125
303340
2616
你就帮忙在把书籍电子化
05:05
you help to digitize a book.
126
305980
1376
推特和其他350,000个网站都在用reCAPTCHA
05:07
Twitter and about 350,000 other sites are all using reCAPTCHA.
127
307380
2936
现在使用reCAPTCHA的网站是如此之多
05:10
And the number of sites that are using reCAPTCHA is so high
128
310340
2816
每天我们电子化的单词数量惊人
05:13
that the number of words we're digitizing per day is really large.
129
313180
3136
大概是每天一亿
05:16
It's about 100 million a day,
130
316340
1416
这就是每年大概250万本书
05:17
which is the equivalent of about two and a half million books a year.
131
317780
3496
而这一切仅仅都是通过人们在网上
05:21
And this is all being done one word at a time
132
321300
2136
输入验证码来做到的
05:23
by just people typing CAPTCHAs on the Internet.
133
323460
2216
(掌声)
05:25
(Applause)
134
325700
6880
05:32
Now, of course,
135
332940
1216
当然
05:34
since we're doing so many words per day,
136
334180
3336
因为每天处理的词是如此之多
难免有搞笑的状况
05:37
funny things can happen.
137
337540
1256
05:38
This is especially true because now we're giving people
138
338820
2616
特别是现在我们给出的单词是
两个随机组合的英语单词
05:41
two randomly chosen English words next to each other.
139
341460
2496
就出现了有意思的事
05:43
So funny things can happen.
140
343980
1336
比如 我们给出了这个词
05:45
For example, we presented this word.
141
345340
1736
“基督徒” 这没什么问题
05:47
It's the word "Christians"; there's nothing wrong with it.
142
347100
2736
问题是另外一个随机抽取的词
05:49
But if you present it along with another randomly chosen word,
143
349860
2936
就把事情搞糟了
05:52
bad things can happen.
144
352820
1336
比如这个 (恶基督徒)
05:54
So we get this.
145
354180
1216
05:55
[bad Christians]
146
355420
1216
更糟的是 出现这个的网站
05:56
But it's even worse, because the website where we showed this
147
356660
2896
正好是“神之国度大使馆”
05:59
actually happened to be called The Embassy of the Kingdom of God.
148
359580
3056
(笑声)
06:02
(Laughter)
149
362660
1696
糟了
06:04
Oops.
150
364380
1216
06:05
(Laughter)
151
365620
3856
(笑声)
这儿还有一个
06:09
Here's another really bad one.
152
369500
1696
JohnEdwards.com
06:11
JohnEdwards.com
153
371220
1296
06:12
[Damn liberal]
154
372540
1216
(该死的自由主义者)
06:13
(Laughter)
155
373780
4496
(笑声)
我们就这么每天不停地羞辱别人
06:18
So we keep on insulting people left and right every day.
156
378300
2816
当然 不仅是人
06:21
Of course, we're not just insulting people.
157
381140
2016
其他东西也难逃厄运 因为我们是随机选取的单词
06:23
Here's the thing. Since we're presenting two randomly chosen words,
158
383180
3176
就有了很有趣的结果
06:26
interesting things can happen.
159
386380
1456
06:27
So this actually has given rise to a really big Internet meme
160
387860
4616
这个正在成为
互联网上一个流行趋势
06:32
that tens of thousands of people have participated in,
161
392500
2536
无数的人参与这个
所谓的验证码艺术
06:35
which is called CAPTCHA art.
162
395060
1656
06:36
I'm sure some of you have heard about it.
163
396740
1976
肯定有人听说过
06:38
Here's how it works.
164
398740
1256
是这样
06:40
Imagine you're using the Internet and you see a CAPTCHA
165
400020
2616
假设你在上网看到一个验证码
06:42
that you think is somewhat peculiar,
166
402660
1736
你觉得很特别
06:44
like this CAPTCHA.
167
404420
1216
比如这个 (隐形的烤面包机)
06:45
[invisible toaster]
168
405660
1216
06:46
What you're supposed to do is you take a screenshot of it.
169
406900
2736
你要做的就是截图
然后当然就是输入验证码
06:49
Then of course, you fill out the CAPTCHA because you help us digitize a book.
170
409660
3656
因为你在帮我们电子化书籍
接下来 你截了图
06:53
But first you take a screenshot
171
413340
1496
06:54
and then you draw something that is related to it.
172
414860
2376
就画出与它相关的图像
(笑声)
06:57
(Laughter)
173
417260
1696
06:58
That's how it works.
174
418980
1216
就是这样
07:00
(Laughter)
175
420220
1336
07:01
There are tens of thousands of these.
176
421580
2656
这样的作品有好几万个
07:04
Some of them are very cute.
177
424260
2072
有些很可爱 (握紧它)
07:06
[clenched it]
178
426356
1213
(笑声)
07:07
(Laughter)
179
427593
1843
有些很好玩
07:09
Some of them are funnier.
180
429460
1536
(大醉的创始人)
07:11
[stoned Founders]
181
431020
1216
07:12
(Laughter)
182
432260
4376
(笑声)
07:16
And some of them, like paleontological shvisle ...
183
436660
3429
还有一些
比如 “古生物学的史维凿”
07:20
(Laughter)
184
440113
1923
说不定那儿有史诺谱・道格(美国说唱歌手)
07:22
they contain Snoop Dogg.
185
442060
1216
07:23
(Laughter)
186
443300
3136
(笑声)
07:26
OK, so this is my favorite number of reCAPTCHA.
187
446460
2576
这是我最喜欢的reCAPTCHA数字
这是我最喜欢的这个项目的部分
07:29
So this is the favorite thing that I like about this whole project.
188
449060
3176
这个数字是
07:32
This is the number of distinct people
189
452260
1816
通过reCAPTCHA帮助我们电子化书籍中单词的人数
07:34
that have helped us digitize at least one word out of a book through reCAPTCHA:
190
454100
3736
7.5亿
07:37
750 million, a little over 10 percent of the world's population,
191
457860
3056
多于世界总人口的十分之一的人们
07:40
has helped us digitize human knowledge.
192
460940
1896
帮助我们电子化人类的知识
07:42
And it is numbers like these that motivate my research agenda.
193
462860
3096
正是这样的数字激励我的研究进程
07:45
So the question that motivates my research is the following:
194
465980
3056
那激励我研究进程的问题如下:
试想人类的大型成就
07:49
If you look at humanity's large-scale achievements,
195
469060
2416
人类共同
07:51
these really big things
196
471500
1216
07:52
that humanity has gotten together and done historically --
197
472740
2715
创造的那些大型历史性事物-
07:55
like, for example, building the pyramids of Egypt
198
475479
2477
比如 建造埃及金字塔
07:57
or the Panama Canal
199
477980
1576
开凿巴拿马运河
07:59
or putting a man on the Moon --
200
479580
2056
或者把人类送上月球-
08:01
there is a curious fact about them,
201
481660
1696
这些工程都有个奇怪的事实
08:03
and it is that they were all done with about the same number of people.
202
483380
3336
就是它们基本都是由一样数量的人们完成的
这很奇怪 这些工程都是由大概十万人完成
08:06
It's weird; they were all done with about 100,000 people.
203
486740
2696
因为在互联网出现之前
08:09
And the reason for that is because, before the Internet,
204
489460
2656
整合十万人
08:12
coordinating more than 100,000 people,
205
492140
1856
这十万人的巨大酬劳基本上是无法支付的
08:14
let alone paying them, was essentially impossible.
206
494020
3016
但是有了互联网 刚刚展示的这个项目
08:17
But now with the Internet, I've just shown you a project
207
497060
2656
就找到了7.5亿人
08:19
where we've gotten 750 million people to help us digitize human knowledge.
208
499740
3496
来帮助我们电子化人类知识
那么 激励我的研究的问题就是
08:23
So the question that motivates my research is,
209
503260
2176
如果十万人能把一个人送上月球
08:25
if we can put a man on the Moon with 100,000,
210
505460
2136
08:27
what can we do with 100 million?
211
507620
2176
一亿人能做到什么呢?
08:29
So based on this question,
212
509820
1256
基于这个问题
08:31
we've had a lot of different projects that we've been working on.
213
511100
3056
我们有很多项目在进行中
下面介绍一个最令我兴奋的项目
08:34
Let me tell you about one that I'm most excited about.
214
514180
2536
08:36
This is something that we've been semiquietly working on
215
516740
2656
这是过去一年半里
我们低调进行的一个项目
08:39
for the last year and a half or so.
216
519420
1696
还没有真正投入使用 它叫做Duolingo
08:41
It hasn't yet been launched. It's called Duolingo.
217
521140
2376
因为我们还没有投入使用 嘘!
08:43
Since it hasn't been launched, shhh!
218
523540
1736
(笑声)
08:45
(Laughter)
219
525300
1656
08:46
Yeah, I can trust you'll do that.
220
526980
2256
我相信你们都会保密的
这个项目是这样开始的
08:49
So this is the project. Here's how it started.
221
529260
2216
它始于我向我的一名研究生提的问题
08:51
It started with me posing a question to my graduate student, Severin Hacker.
222
531500
3576
塞韦林・骇客
这就是他
08:55
OK, that's Severin Hacker.
223
535100
1280
我向他提了一个问题
08:57
So I posed the question to my graduate student.
224
537299
2217
另外你确实没听错
08:59
By the way, you did hear me correctly; his last name is Hacker.
225
539540
2976
他姓骇客
09:02
(Laughter)
226
542540
1016
我向他提了个问题:
09:03
So I posed this question to him: How can we get 100 million people
227
543580
3296
怎么才能找到一亿人
09:06
translating the web into every major language for free?
228
546900
2960
把互联网上的内容免费翻译成所有的主要语言?
这个问题有好几个方面
09:10
There's a lot of things to say about this question.
229
550500
2416
首先是翻译网页
09:12
First of all, translating the web.
230
552940
1656
现在的网页内容主要分为几大语言
09:14
Right now, the web is partitioned into multiple languages.
231
554620
2796
其中一个大的分支是英语
09:17
A large fraction of it is in English.
232
557440
1816
如果你不会英语就无法使用
09:19
If you don't know English, you can't access it.
233
559280
2216
但是还有其他几种不同的语言
09:21
But there's large fractions in other different languages,
234
561520
2696
如果你不会那几种也无法使用
09:24
and if you don't know them, you can't access it.
235
564240
2256
我打算把所有网页 大部分网页
09:26
So I would like to translate all of the web,
236
566520
2096
09:28
or at least most of it, into every major language.
237
568640
2376
翻译成主要语言
这是我想做的
09:31
That's what I would like to do.
238
571040
1496
09:32
Now, some of you may say, why can't we use computers to translate?
239
572560
4476
也许有人会说 怎么不用电脑翻译?
为什么我们不用机器翻译?
09:37
Machine translation is starting to translate
240
577060
2096
机器翻译已经在应用中
09:39
some sentences here and there.
241
579180
1456
为什么不用它来翻译所有网页呢?
09:40
Why can't we use it to translate the web?
242
580660
1976
问题就是机器翻译还不够好
09:42
The problem with that is it's not yet good enough
243
582660
2336
也许将来的15到20年内都不行
09:45
and it probably won't be for the next 15 to 20 years.
244
585020
2496
机器翻译有很多错误
09:47
It makes a lot of mistakes. Even when it doesn't,
245
587540
2336
甚至就算它翻对的时候
09:49
since it makes so many mistakes, you don't know whether to trust it or not.
246
589900
3576
因为它的错误率太高 你也不敢相信它
比如这个例子
09:53
So let me show you an example
247
593500
1416
09:54
of something that was translated with a machine.
248
594940
2256
是由机器翻译的
这是个论坛帖子
09:57
Actually, it was a forum post.
249
597220
1456
09:58
It was somebody who was trying to ask a question about JavaScript.
250
598700
3176
有人提了一个关于JavaScript的问题
10:01
It was translated from Japanese into English.
251
601900
2616
是从日语翻译成英语
10:04
So I'll just let you read.
252
604540
1776
你可以读一下
10:06
This person starts apologizing
253
606340
1776
他首先道歉
10:08
for the fact that it's translated with a computer.
254
608140
2456
这是机器翻译的内容
10:10
So the next sentence is going to be the preamble to the question.
255
610620
3776
下一个句子开始涉及问题
他开始说明
10:14
So he's just explaining something.
256
614420
1656
记住 这个问题是关于JavaScript的
10:16
Remember, it's a question about JavaScript.
257
616100
2056
10:18
[At often, the goat-time install a error is vomit.]
258
618180
2616
(文字:常常,山羊时间启动错误被吐出来)
10:20
(Laughter)
259
620820
5096
(笑声)
10:25
Then comes the first part of the question.
260
625940
3536
接下来是问题的第一部分
10:29
[How many times like the wind, a pole, and the dragon?]
261
629500
2936
(文字:有多少次像风,像杆子,像龙?)
10:32
(Laughter)
262
632460
4656
(笑声)
接下来是最好玩的部分
10:37
Then comes my favorite part of the question.
263
637140
2056
10:39
[This insult to father's stones?]
264
639220
1936
(文字:这是对父亲的石头的侮辱?)
10:41
(Laughter)
265
641180
3856
(笑声)
接下来是结尾 我最喜欢的部分
10:45
And then comes the ending,
266
645060
1296
10:46
which is my favorite part of the whole thing.
267
646380
2136
(文字:请为你的愚蠢道歉,很多谢谢)
10:48
[Please apologize for your stupidity. There are a many thank you.]
268
648540
3136
10:51
(Laughter)
269
651700
2176
(笑声)
10:53
OK, so computer translation, not yet good enough.
270
653900
2936
可见 机器翻译 还不够好
回到问题上去
10:56
So back to the question.
271
656860
1256
我们需要人来翻译所有网页
10:58
So we need people to translate the whole web.
272
658140
2976
下一个问题是
11:01
So now the next question you may have is,
273
661140
1976
为什么不付钱找人做呢?
11:03
well, why can't we just pay people to do this?
274
663140
2176
我们可以找专业翻译人员来翻译整个网页
11:05
We could pay professional translators to translate the whole web.
275
665340
3096
可以这么做
11:08
We could do that.
276
668460
1256
11:09
Unfortunately, it would be extremely expensive.
277
669740
2216
但是 这会无比昂贵
11:11
For example, translating a tiny fraction of the whole web, Wikipedia,
278
671980
3256
比如 翻译互联网很小很小的一个部分 维基百科
英语翻译成西班牙语
11:15
into one other language, Spanish.
279
675260
2496
11:17
OK? Wikipedia exists in Spanish,
280
677780
1976
维基百科有西班牙语
11:19
but it's very small compared to the size of English.
281
679780
2456
但是相比英语部分小多了
大概是英语内容的百分之二十
11:22
It's about 20 percent of the size of English.
282
682260
2176
如果我们把剩下的百分之八十翻译成西班牙语
11:24
If we wanted to translate the other 80 percent into Spanish,
283
684460
2856
就得至少五千万美元-
11:27
it would cost at least 50 million dollars --
284
687340
2136
这还是在最便宜的服务外包国家
11:29
and this is even at the most exploited, outsourcing country out there.
285
689500
3656
所以这个方法很昂贵
11:33
So it would be very expensive.
286
693180
1456
我们要的是一亿人
11:34
So what we want to do is, we want to get 100 million people
287
694660
2762
免费把网页内容翻译成
11:37
translating the web into every major language for free.
288
697446
2590
所有主要语言
如果你要这么做的话
11:40
If this is what you want to do, you quickly realize
289
700060
2416
就会意识到面临两个非常
11:42
you're going to run into two big hurdles, two big obstacles.
290
702500
2936
巨大的障碍
11:45
The first one is a lack of bilinguals.
291
705460
3296
首先是缺乏掌握双语的人
11:48
So I don't even know
292
708780
2176
我甚至不知道
11:50
if there exists 100 million people out there using the web
293
710980
2736
是否有一亿个互联网使用者
11:53
who are bilingual enough to help us translate.
294
713740
2296
掌握双语来进行翻译
这是个大问题
11:56
That's a big problem.
295
716060
1216
11:57
The other problem you're going to run into is a lack of motivation.
296
717300
3176
另一个问题是缺少鼓励机制
怎么才能让人们
12:00
How are we going to motivate people to actually translate the web for free?
297
720500
3536
甘愿免费翻译网页?
通常你得付钱请人干活儿
12:04
Normally, you have to pay people to do this.
298
724060
2296
12:06
So how are we going to motivate them to do it for free?
299
726380
2616
那么怎么才能让他们无偿劳动呢?
当我们着手考虑这个项目的时候这是拦在面前的两大问题
12:09
When we were starting to think about this, we were blocked by these two things.
300
729020
3736
后来我们意识到
12:12
But then we realized, there's a way
301
732780
1696
有一个方法可以一举解决这两个问题
12:14
to solve both these problems with the same solution.
302
734500
2456
一箭双雕
12:16
To kill two birds with one stone.
303
736980
1616
这就是把翻译转化成
12:18
And that is to transform language translation
304
738620
2136
12:20
into something that millions of people want to do
305
740780
2816
无数人想做的事情
12:23
and that also helps with the problem of lack of bilinguals,
306
743620
3136
同时解决双语人员人手不够的问题
12:26
and that is language education.
307
746780
2376
这就是语言学习
12:29
So it turns out that today,
308
749180
1976
当今
12:31
there are over 1.2 billion people learning a foreign language.
309
751180
3400
有超过12亿人口在学习外语
人们迫切得想学习外语
12:35
People really want to learn a foreign language.
310
755300
2216
而且这不是学校里不得不做的功课
12:37
And it's not just because they're being forced to do so in school.
311
757540
3136
比如在美国
12:40
In the US alone, there are over five million people
312
760700
2416
有超过五百万的人在为外语学习软件
12:43
who have paid over $500 for software to learn a new language.
313
763140
2896
每人支付超过五百美元
所以人们非常想学外语
12:46
So people really want to learn a new language.
314
766060
2176
过去一年半里我们建立的新网站
12:48
So what we've been working on for the last year and a half
315
768260
2736
叫做Duolingo-
12:51
is a new website -- it's called Duolingo --
316
771020
2056
就是基于这个让人们免费学习外语
12:53
where the basic idea is people learn a new language for free
317
773100
2856
12:55
while simultaneously translating the web.
318
775980
2056
同时翻译网页的想法
就是让他们学以致用
12:58
And so basically, they're learning by doing.
319
778060
2536
使用方法是这样
13:00
So the way this works
320
780620
1216
13:01
is whenever you're just a beginner, we give you very simple sentences.
321
781860
3416
如果你是个新手 我们会给出非常非常简单的句子
网页上有很多简单的句子
13:05
There's a lot of very simple sentences on the web.
322
785300
2376
我们给出非常简单的句子
13:07
We give you very simple sentences along with what each word means.
323
787700
3216
以及句中单词释义
13:10
And as you translate them
324
790940
1336
然后你翻译一下 并且可以看到别人是如何翻译的
13:12
and as you see how other people translate them,
325
792300
2216
这样学习外语
13:14
you start learning the language.
326
794540
1576
当你级别提高后
13:16
And as you get more advanced,
327
796140
1416
13:17
we give you more complex sentences to translate.
328
797580
2256
我们会给出越来越复杂的句子让你翻译
13:19
But at all times, you're learning by doing.
329
799860
2016
这整个过程 你都是边学边用
13:21
Now, the crazy thing about this method is that it actually really works.
330
801900
3696
这个方法令人疯狂之处
是它居然确实有效
13:25
People are really learning a language.
331
805620
1856
首先 人们可以通过它学外语
13:27
We're mostly done building it and now we're testing it.
332
807500
2616
我们建完了网站,它现正在测试中
人们可以用它学习外语
13:30
People really can learn a language with it.
333
810140
2056
完全可以跟外语学习软件媲美
13:32
And they learn it about as well as the leading language learning software.
334
812220
3496
所以用它确实可以学外语
13:35
So people really do learn a language.
335
815740
1816
不仅可以学好
13:37
And not only do they learn it as well, but actually it's more interesting.
336
817580
3496
而且更有趣味性
因为通过Duolingo人们学的是真正的语言使用内容
13:41
Because with Duolingo, people are learning with real content.
337
821100
2896
而不是编造的句子
13:44
As opposed to learning with made-up sentences,
338
824020
2176
通过学习真正的文本内容,趣味性大大提高
13:46
people are learning with real content, which is inherently interesting.
339
826220
3336
这样人们就实实在在学习外语
13:49
So people really do learn a language.
340
829580
1816
最令人惊讶的是
13:51
But perhaps more surprisingly,
341
831420
1616
网站使用者的翻译
13:53
the translations that we get from people using the site,
342
833060
2736
13:55
even though they're just beginners,
343
835820
1776
甚至是初学者的翻译
13:57
the translations that we get
344
837620
1376
和专业的翻译人员几乎不相上下
13:59
are as accurate as those of professional language translators,
345
839020
2936
这很让人惊讶
14:01
which is very surprising.
346
841980
1216
让我给你们看一个例子
14:03
So let me show you one example.
347
843220
1536
14:04
This is a sentence that was translated from German into English.
348
844780
3016
这是一个从德语翻译成英语的例子
上面是德语
14:07
The top is the German. The middle is an English translation
349
847820
2776
中间是一名专业英语翻译人员
14:10
that was done by a professional translator
350
850620
2256
翻译的句子
14:12
who we paid 20 cents a word for this translation.
351
852900
2376
一个词二十美分的价钱
下面是Duolingo使用者的翻译
14:15
And the bottom is a translation by users of Duolingo,
352
855300
2696
他们在使用该网站前
14:18
none of whom knew any German before they started using the site.
353
858020
3736
不会任何德语
14:21
If you can see, it's pretty much perfect.
354
861780
1976
可以看到 几乎很完美
14:23
Of course, we play a trick here
355
863780
1536
当然 为了让翻译达到专业水准
14:25
to make the translations as good as professional language translators.
356
865340
3336
我们也想了个办法
我们把多名翻译者的翻译结合起来
14:28
We combine the translations of multiple beginners
357
868700
2336
得到专业人员的水准
14:31
to get the quality of a single professional translator.
358
871060
2896
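[Editor's note] The talk does not say how the beginners' translations are combined. One simple possibility, shown here purely as an assumption, is to pick the candidate that is on average most similar to all the others, so that outliers are voted down. The function name and the use of `difflib` string similarity are choices made for this sketch, not Duolingo's method.

```python
from difflib import SequenceMatcher
from typing import List


def consensus_translation(candidates: List[str]) -> str:
    """Return the candidate with the highest mean similarity to the rest.

    Hypothetical combination rule: a translation that many beginners
    roughly agree on is preferred over one that stands alone.
    """
    if not candidates:
        raise ValueError("at least one candidate translation is required")

    def mean_similarity(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        if not others:
            return 1.0
        scores = [
            SequenceMatcher(None, candidates[i].lower(), o.lower()).ratio()
            for o in others
        ]
        return sum(scores) / len(scores)

    best_index = max(range(len(candidates)), key=mean_similarity)
    return candidates[best_index]


submissions = [
    "The cat sits on the mat.",
    "The cat sits on the mat.",
    "A dog runs in the park.",
]
print(consensus_translation(submissions))  # The cat sits on the mat.
```

A production system would more plausibly have learners rate or edit one another's translations; this sketch only shows the general idea of aggregating several imperfect answers.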
14:33
Now, even though we're combining the translations,
359
873980
4536
即使我们要结合翻译
14:38
the site actually can translate pretty fast.
360
878540
2776
这个网站仍然可以迅速翻译
让我展示一下
14:41
So let me show you,
361
881340
1216
14:42
these are our estimates of how fast we could translate Wikipedia
362
882580
2936
这是我们对维基百科翻译工程的预计
从英语翻译成西班牙语
14:45
from English into Spanish.
363
885540
1296
14:46
Remember, this is 50 million dollars' worth of value.
364
886860
2976
要记住 这可是价值五千万美元的工程
14:49
So if we wanted to translate Wikipedia into Spanish,
365
889860
2456
如果要把维基百科从英文翻译成西班牙语
十万名活跃用户可以在五周内完成
14:52
we could do it in five weeks with 100,000 active users.
366
892340
2696
一百万活跃用户可以在八十小时内完成
14:55
And we could do it in about 80 hours with a million active users.
367
895060
3056
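[Editor's note] The two estimates quoted above are roughly consistent with each other if translation time scales inversely with the number of active users. The sketch below multiplies elapsed calendar time by user count; this is only a consistency check under that scaling assumption, not a measure of hours actually worked, since users are not active around the clock.

```python
HOURS_PER_WEEK = 7 * 24

# Scenario A from the talk: five weeks with 100,000 active users.
elapsed_user_hours_a = 5 * HOURS_PER_WEEK * 100_000

# Scenario B from the talk: about 80 hours with one million active users.
elapsed_user_hours_b = 80 * 1_000_000

ratio = elapsed_user_hours_a / elapsed_user_hours_b
print(elapsed_user_hours_a)  # 84000000
print(elapsed_user_hours_b)  # 80000000
print(round(ratio, 2))       # 1.05
```

Ten times as many users bring the estimate from 840 hours down to about 80, close to the tenfold speed-up that simple inverse scaling would predict.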
现在我们的项目小组已经有了上百万使用者
14:58
Since all the projects my group has worked on so far
368
898140
2456
15:00
have gotten millions of users,
369
900620
1456
我们希望可以以极快的速度
15:02
we're hopeful that we'll be able to translate extremely fast.
370
902100
2896
进行这个翻译工程
现在我对Duolingo最兴奋的就是
15:05
Now, the thing that I'm most excited about with Duolingo
371
905020
2976
它为外语教育创造了一个公平的商业模式
15:08
is I think this provides a fair business model for language education.
372
908020
3736
是这样:
15:11
So here's the thing:
373
911780
1216
目前外语教育的商业模式是
15:13
The current business model for language education
374
913020
2336
学生付钱
15:15
is the student pays,
375
915380
1376
15:16
and in particular, the student pays Rosetta Stone 500 dollars.
376
916780
3056
主要就是学生购买罗赛塔石碑五百美元的软件
(笑声)
15:19
(Laughter)
377
919860
1816
这是目前的商业模式
15:21
That's the current business model.
378
921700
1656
这个模式的问题是
15:23
The problem with this business model
379
923380
1736
世界人口的百分之九十五没有五百美元
15:25
is that 95 percent of the world's population doesn't have 500 dollars.
380
925140
3296
所以这个模式对穷人极度不公平
15:28
So it's extremely unfair towards the poor.
381
928460
2776
这是个面向富人的模式
15:31
This is totally biased towards the rich.
382
931260
1936
而Duolingo
15:33
Now, see, in Duolingo,
383
933220
1616
15:34
because while you learn, you're actually creating value,
384
934860
3656
因为你学习的时候
也创造价值,你在翻译东西-
15:38
you're translating stuff --
385
938540
1336
15:39
which, for example, we could charge somebody for translations,
386
939900
2936
比如我们可以向需要翻译的人收费
15:42
so this is how we could monetize this.
387
942860
1856
这样你的学习过程就货币化了
15:44
Since people are creating value while they're learning,
388
944740
2616
因为人们学习的时候同时创造价值
他们就不用付钱 而是付出时间
15:47
they don't have to pay with their money, they pay with their time.
389
947380
3096
最妙的是 虽然人们得付出时间
15:50
But the magical thing here
390
950500
1923
15:52
is that is time that would have had to have been spent anyways
391
952447
2996
但这个时间是他们学习外语无论如何
都会付出的那部分时间
15:55
learning the language.
392
955467
1209
15:56
So the nice thing about Duolingo
393
956700
1576
所以Duolingo做的好事就是提供了一个公平的商业模式-
15:58
is, I think, it provides a fair business model --
394
958300
2336
这个模式对穷人一样敞开机会
16:00
one that doesn't discriminate against poor people.
395
960660
2376
这就是这个网站 谢谢
16:03
So here's the site. Thank you.
396
963060
1456
(掌声)
16:04
(Applause)
397
964540
7000
这个网站
16:13
We haven't yet launched,
398
973060
2416
我们还没有投入应用
16:15
but if you go there, you can sign up to be part of our private beta,
399
975500
3296
但是如果你去我们的页面的话可以注册
16:18
which is probably going to start in three or four weeks.
400
978820
2656
也许三四周后就可以开始了
我们还没有投入使用Duolingo
16:21
We haven't yet launched it.
401
981500
1336
16:22
By the way, I'm the one talking here,
402
982860
1816
另外 虽然是我在这里介绍Duolingo
16:24
but Duolingo is the work of a really awesome team,
403
984700
2376
但这个网站是一个优秀的团队的成果 这是其中一些人
16:27
some of whom are here. So thank you.
404
987100
1736
谢谢你们
16:28
(Applause)
405
988860
5240
(掌声)