Kalika Bali: The giant leaps in language technology -- and who's left behind | TED
54,094 views ・ 2021-04-26
00:00
Translator: yue sun
Reviewer: suya f.
00:12
I'm Kalika Bali, I'm a linguist by training
00:15
and a technologist by profession.
00:17
I have worked in academia,
00:19
in startups, in small companies and multinationals for over two decades,
00:24
doing research in and building language technology systems.
00:28
My dream is to see technology work across the language barrier.
00:33
As a researcher at Microsoft Research Labs India,
00:36
I work in the field of language technology and speech technology.
00:41
And I worry about how we can make technology accessible
00:45
to people across the board,
00:47
you know, irrespective of the language that they speak.
00:51
So natural language processing,
00:53
artificial intelligence, speech technology,
00:55
these are very big words, they are buzzwords right now.
00:57
Everybody is talking about what exactly NLP, or natural language processing, is.
01:03
So in very simple terms,
01:05
this is the part of computer science engineering
01:08
that makes machines process,
01:11
understand and generate natural language,
01:14
which is the language that humans speak.
01:17
When you are interacting with a bot trying to book your train tickets
01:22
or flight tickets,
01:23
when you are speaking to a voice-based digital assistant on your phone,
01:28
it's natural language processing
01:30
that underpins the entire technology that makes that work.
01:34
But how does this work?
01:36
How does NLP work?
01:37
In a very, very basic way,
01:41
it's about data.
01:43
So a huge amount of data on how humans actually use language
01:49
is then processed by certain algorithms and techniques
01:54
that make the machines learn the patterns
01:57
of natural language of humans, right?
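The idea that algorithms "learn the patterns" of language from data can be made concrete with a toy example. The sketch below is illustrative only and is not how the systems mentioned in this talk work: it counts word bigrams (pairs of adjacent words) in a tiny, hypothetical corpus and uses those counts to guess the most likely next word. Modern systems use deep neural networks and vastly more data, but the principle is similar: the model knows only what the data shows it.

```python
from collections import Counter, defaultdict

# A tiny, hypothetical corpus standing in for "data on how humans use language".
corpus = [
    "book a train ticket",
    "book a train to delhi",
    "book a flight ticket",
    "cancel the booking",
]

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]  # sentence start/end markers
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

model = train_bigrams(corpus)
print(predict_next(model, "a"))      # train
print(predict_next(model, "gondi"))  # None: no data, no prediction
```

The last line makes the talk's point in miniature: a word that never appears in the training data gets no prediction at all.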
02:01
These days, another buzzword that you hear a lot about is deep neural networks.
02:06
And these are the advanced techniques
02:09
that underpin a lot of the NLP stuff that happens right now.
02:13
And I will not go into the details of how that works,
02:16
but the thing that you really have to understand and keep in mind
02:20
is that all of this requires a humungous amount of data,
02:25
natural language data.
02:26
If you want a speech system to converse with you in Gujarati,
02:32
the first thing you require
02:33
is a lot of data of Gujarati people speaking to each other
02:38
in their own language.
02:41
So in 2017, Microsoft came up with a speech recognition system
02:46
which was able to transcribe speech into text
02:50
better than a human did.
02:52
And this system was trained
02:55
on 200 million transcribed words.
02:58
In 2018, an English-Chinese machine translation system
03:02
was able to translate from English to Chinese
03:05
as well as any human bilingual could.
03:08
And this was trained on 18 million bilingual sentence pairs.
03:14
This is a very, very exciting time in natural language processing
03:18
and in technology as such.
03:20
You know, we are seeing science fiction, which we had read about and watched,
03:24
kind of come true in front of our own eyes.
03:27
We are making giant leaps in technical advancement.
03:32
But these giant leaps are limited to very few languages.
03:38
So Monojit Choudhury,
03:39
who's a very good friend of mine
03:41
and a colleague,
03:43
has studied this in some detail
03:45
and has looked at resource distribution across languages in the world.
03:49
And he says that these follow what is called a power-law distribution,
03:53
which essentially means that there are four languages,
03:56
Arabic, Chinese, English and Spanish,
03:59
which have the maximum amount of resources available.
04:03
There is another handful of languages which can also benefit from, you know,
04:08
the resources and the technology that are available right now.
04:12
But 90 percent of the world's languages
04:16
have no resources
04:18
or very few resources available.
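To make "power-law distribution" concrete: under a power law, the amount of resources available for the language of rank r (rank 1 being the best-resourced) falls off roughly in proportion to r^(-α) for some exponent α > 0. A few top-ranked languages therefore hold most of the total, while a long tail holds almost nothing. The sketch below is a purely illustrative calculation. The number of languages (6,000) and the exponent α = 1.5 are assumed values chosen for illustration; they are not figures from Choudhury's study.

```python
def power_law_shares(num_languages, alpha):
    """Share of total resources held by each language rank under a power law.

    Assumes (for illustration) that resources for the language of rank r
    are proportional to r ** -alpha, with rank 1 the best-resourced.
    """
    weights = [r ** -alpha for r in range(1, num_languages + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def top_k_share(shares, k):
    """Fraction of all resources held by the k best-resourced languages."""
    return sum(shares[:k])

def tail_share(shares, fraction):
    """Fraction of all resources held by the bottom `fraction` of languages."""
    start = len(shares) - round(len(shares) * fraction)  # round() avoids float off-by-one
    return sum(shares[start:])

# Hypothetical parameters, not measured values.
shares = power_law_shares(num_languages=6000, alpha=1.5)
print(f"Top 4 languages hold {top_k_share(shares, 4):.0%} of resources")
print(f"Bottom 90% of languages hold {tail_share(shares, 0.9):.2%} of resources")
```

Changing α changes how extreme the imbalance is: the steeper the exponent, the more the resources concentrate in the top few languages.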
04:20
This revolution that we are talking about
04:23
has essentially bypassed 5,000 languages of the world.
04:27
Now, what this means is that resource-rich languages
04:30
have technologies built for them,
04:32
so researchers and technologists get attracted towards them.
04:35
They build more technologies for them. They create more resources.
04:38
So it's a rich-get-richer kind of cycle.
04:41
And the resource-poor languages stay poor:
04:44
there's no technology for them, nobody works on them.
04:46
And this divide, the digital divide between languages,
04:50
is ever-expanding,
04:51
and by implication the divide between the communities
04:56
that speak these languages is also expanding.
05:00
So at Microsoft, in Project Ellora, we aim to bridge this gap.
05:06
We are trying to see how we can create more data by innovative methods,
05:12
have more techniques to build technology without having a lot of resources,
05:18
and what the applications are that can truly benefit these communities.
05:23
So at the moment, this might seem very theoretical,
05:26
like what is he talking about, data and techniques and technology.
05:29
So let me give you a very concrete example here.
05:33
I'm a linguist at heart, I love languages, and that's what I love talking about.
05:38
So let me tell you about a language that many of you might not know about.
05:42
Gondi.
05:44
Gondi is a South-Central Dravidian language.
05:46
It is spoken by three million people in five states of India.
05:51
And to put this in some kind of perspective,
05:54
Norwegian is spoken by five million people
05:57
and Welsh by a little under a million.
05:59
So Gondi is actually a pretty robust and pretty large community
06:06
of the Gond tribals in India.
06:09
But in UNESCO's Atlas of Languages in Danger,
06:14
Gondi is designated as vulnerable.
06:19
CGNet Swara is an NGO that provides a citizen journalism portal
06:23
for the Gond community
06:25
by making local stories accessible through mobile phones.
06:29
There's absolutely no tech support for Gondi.
06:32
There is no data available for Gondi, no resources available for Gondi.
06:37
So all content is created, moderated and edited manually.
06:42
Now, under Project Ellora,
06:44
what we did was bring together all the stakeholders:
06:47
NGOs like CGNet Swara,
06:49
academic institutions like IIIT Naya Raipur,
06:52
a not-for-profit children's book publisher
06:55
like Pratham Books,
06:56
and most importantly, the speakers of the community.
06:58
The Gond tribals themselves participated in this activity
07:03
and for the first time edited and translated children's books in Gondi.
07:09
We were able to put out 200 books for the very first time in Gondi,
07:14
so that the children had access to stories and books in their own language.
07:19
Another extension of this was Adivasi Radio,
07:21
an app that we built and developed at Microsoft Research
07:25
and then put out there, along with our stakeholders,
07:30
which takes a Hindi text-to-speech system
07:33
and allows it to read out news and articles provided by CGNet Swara
07:39
in the Gondi language.
07:42
Users can now use this app to read,
07:45
watch news and access any information
07:48
through text and voice in their own language.
07:52
A very interesting thing is that this app is now being used
07:56
by the community to translate text from Hindi to Gondi.
08:01
Now, what that will result in is a lot of parallel data,
08:04
what we call parallel data,
08:05
that will allow us to build machine translation systems for Gondi,
08:09
which will truly open up a window for the Gond community to the world.
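Parallel data means pairs of sentences that say the same thing in two languages, such as a Hindi sentence and the Gondi translation a community member enters for it. The minimal sketch below shows one way such pairs might be stored, deduplicated and sanity-filtered before training a translation system. The field names, the length-ratio threshold of 3.0 and the placeholder sentences are assumptions for illustration. They are not taken from Adivasi Radio or Project Ellora, and a real corpus would hold actual Devanagari text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen makes pairs hashable, so they can go in a set
class SentencePair:
    """One aligned example: a source sentence and its translation."""
    source: str  # e.g. the Hindi text shown in the app (hypothetical)
    target: str  # e.g. a community member's Gondi translation (hypothetical)

def is_usable(pair, max_ratio=3.0):
    """Simple sanity filter (assumed threshold).

    Drops pairs with an empty side, or whose token counts differ so much
    that they are unlikely to be translations of each other.
    """
    src_len = len(pair.source.split())
    tgt_len = len(pair.target.split())
    if src_len == 0 or tgt_len == 0:
        return False
    return max(src_len, tgt_len) / min(src_len, tgt_len) <= max_ratio

def build_corpus(pairs):
    """Deduplicate and filter pairs, preserving their original order."""
    seen = set()
    corpus = []
    for pair in pairs:
        if pair in seen or not is_usable(pair):
            continue
        seen.add(pair)
        corpus.append(pair)
    return corpus

# Placeholder strings stand in for real Hindi/Gondi sentences.
raw = [
    SentencePair("hindi sentence one", "gondi sentence one"),
    SentencePair("hindi sentence one", "gondi sentence one"),  # duplicate
    SentencePair("hindi sentence two", ""),                    # not yet translated
    SentencePair("short", "a very long and clearly misaligned output"),
]
corpus = build_corpus(raw)
print(len(corpus))  # 1
```

Filtering like this matters most when data is scarce: for a low-resource language, a few thousand bad pairs can make up a large fraction of the corpus.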
08:15
And what is even more important is that now we know how to do this.
08:18
We have the entire pipeline, and we can replicate this for any language
08:23
and any language community
08:25
that is in a similar situation to the Gond tribals.
08:29
Also education -- yes, you know, information access -- yes,
08:34
but what about earning a living?
08:37
Right? What about -- how can we make these people earn a living
08:42
through the digital tools that all of us just take for granted these days?
08:45
Vivek Seshadri, who's another researcher at MSR,
08:48
and his collaborator, Manu Chopra,
08:50
have designed a platform called Karya
08:53
for providing digital microtasks to underserved communities.
08:57
His aim was basically to find a way to provide a means of dignified labor
09:03
to the populations, the rural populations
09:05
and the urban poor populations of this country.
09:08
They don't have access to all the knowledge
09:11
to use the digital platforms
09:14
that all of us use every day without even thinking, right?
09:18
But ...
09:20
here is a large
09:23
literate population that wants to work, right,
09:27
and how can we make this possible for them?
09:30
So Karya is one such way
09:33
through which this population can get on to the digital world
09:37
and, you know,
09:39
through that find work and do tasks that can then earn them money.
09:43
So we saw this and we thought, oh, this is wonderful.
09:46
We could probably use this for data collection as well.
09:48
So we went to Amale,
09:50
which is a small village of 200 people
09:54
in the Wada district of Maharashtra,
09:56
and decided to use Karya to collect Marathi data.
09:58
Now, I know what you are thinking --
10:00
I'm sure there are a lot of Marathi speakers in the audience too --
10:03
that Marathi is not a low-resource language.
10:06
Marathi is definitely a mainstream language of the country.
10:09
But as far as language technology is concerned,
10:12
Marathi is a low-resource language.
10:14
So we went to this village
10:16
and we had a very successful data-collection trip.
10:20
And, you know, this village is very remote.
10:23
They have no TV, they have no electricity,
10:26
they have no mobile signal.
10:30
You have to climb a hill and wave your phone around
10:32
if you want to, you know, use your mobile to call anyone.
10:37
So they gave us all this data.
10:38
But more than that, they gave us very valuable lessons in life.
10:43
One is this pride in one's own language.
10:46
The people of Amale were thrilled to be doing this
10:48
because they were advancing their own language by doing this.
10:54
The second was the value of community.
10:56
Very quickly, this became a village community effort.
11:00
People would gather together in tasks and do this together as a group.
11:05
And the third is the importance of storytelling.
11:09
People of Amale were so starved of content that in the morning, during the daytime,
11:15
they would do recordings of stories in Karya
11:19
and then in the evening they would gather the entire village
11:22
and retell and recount these stories to the village.
11:27
So as scientists, we get so caught up
11:29
in the science and technology part of what we are doing, you know --
11:33
which is the next best model to have,
11:35
how can we increase the accuracy of my system,
11:38
how can I build the next best system there is --
11:43
that we forget the reason why we are doing this: the people.
11:46
And any successful technology is the one that keeps the people and the users
11:52
up front and center.
11:54
And when we start doing that,
11:56
we also realize that technology is probably a very small part of this
12:00
and there are other things in the story.
12:02
Maybe there are social, cultural and policy interventions
12:05
that are required, as much as technology.
12:09
So some time back, I worked on a project called VideoKheti
12:12
that allowed Hindi-speaking farmers in Central India
12:15
to search for agricultural videos by speaking into a phone-based app.
12:23
So we went to Madhya Pradesh to collect data for this,
12:26
and we came back and we were training our models
12:29
and we discovered we were getting very bad results.
12:31
This is not working.
12:32
So we were very confused. Why is this happening?
12:35
So we looked deeper and deeper into the data
12:37
and discovered that, yes, we had collected data
12:39
from what we thought was a very silent, quiet village in the evening.
12:44
But what we hadn't heard while we were doing this
12:47
was that there was this constant buzz of night insects, you know?
12:51
So throughout the recordings, we had this "bzz" of the insects,
12:55
which was actually distorting our speech.
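A steady background noise like the insect buzz can sometimes be caught before training by estimating each recording's signal-to-noise ratio (SNR), in decibels 10·log10(P_speech / P_noise), where P is mean signal power. The sketch below is illustrative only and is not the procedure the VideoKheti team used. It builds a synthetic recording from sine waves (a 3 kHz tone standing in for the insect buzz, a 200 Hz tone standing in for speech), measures noise power on a stretch with no speech, and subtracts it from the power of the speech stretch. The sample rate, tone frequencies and the 20 dB threshold are all assumptions.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this synthetic example)

def tone(freq_hz, amplitude, start, stop):
    """Samples of a sine wave for sample indices start..stop-1."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(start, stop)]

def mean_power(samples):
    """Average of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)

def estimate_snr_db(noise_only, speech_plus_noise):
    """Estimate SNR in dB, assuming steady noise uncorrelated with the speech.

    Noise power is measured on a stretch with no speech; speech power is
    the power of the speech stretch minus that noise power.
    """
    p_noise = mean_power(noise_only)
    p_speech = mean_power(speech_plus_noise) - p_noise
    return 10 * math.log10(p_speech / p_noise)

# Synthetic one-second "recording": a constant high-pitched buzz throughout,
# with a low-pitched tone standing in for speech in the second half only.
half = SAMPLE_RATE // 2
buzz = tone(3000, 0.5, 0, SAMPLE_RATE)
speech = [0.0] * half + tone(200, 1.0, half, SAMPLE_RATE)
recording = [s + b for s, b in zip(speech, buzz)]

snr = estimate_snr_db(recording[:half], recording[half:])
MIN_SNR_DB = 20  # hypothetical quality threshold
print(f"Estimated SNR: {snr:.1f} dB, flagged as noisy: {snr < MIN_SNR_DB}")
```

On real recordings, the noise-only stretch would come from pauses or the start of a file. The subtraction assumes the noise level stays roughly constant, which suits a steady background sound such as insects at night.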
12:58
The second thing was that when we went there
13:01
to kind of test our app in the village,
13:04
I and my colleague Indrani Medhi,
13:07
who is a very well-regarded design researcher,
13:11
we found that the women couldn't pronounce the Sanskritized words
13:15
that we had for some of the search terms.
13:18
So, like ...
13:21
(speaks Hindi)
13:24
Which is like the term for chemical pesticides, right?
13:28
Because we got these terms from the agricultural extension center,
13:33
and the women, even though they are farming,
13:36
do not interact with that center at all.
13:38
The men do; the women probably use something much simpler, like ...
13:42
(speaks Hindi)
13:44
Which basically means killing pests with medicine.
13:48
So what I have learned through my journey
13:52
and what I would like to put across to you --
13:55
by now, I hope you've understood me --
13:57
is that the majority of the world's languages
14:00
require intensive investment in resource creation
14:05
if they are to benefit from language technology.
14:07
And this is unlikely to happen in a very fast and efficient manner.
14:13
So it is extremely important for us to ensure
14:16
that the community derives maximum benefit
14:20
from whatever we are doing in the language tech area.
14:24
And to do this and deliver a positive social impact
14:27
on these communities,
14:29
we follow what we call the modified 4-D design thinking methodology.
14:34
So the 4-D means: discover, design, develop and deploy.
14:39
So discover the problem that language technology can solve
14:42
for a particular language community.
14:44
This observation-led approach can help allocate resources
14:48
where they are most needed.
14:49
Design for the users and their language;
14:52
understand the diversity in the linguistic properties
14:55
and the languages of the world.
14:58
And don't think, oh, this is made for English,
15:00
now how can we just adapt it for Marathi or for Gondi, right?
15:04
Develop rapidly and deploy frequently.
15:07
It's an iterative process that will help you fail fast,
15:10
and early failures will eventually lead to success.
15:15
The important thing is to persevere.
15:17
Do not give up.
15:18
And I remember the story of these two Aboriginal Australian women,
15:24
Patricia O'Connor and Ysola Best.
15:29
In the mid-90s, they went to the University of Queensland
15:32
and they wanted to learn their own language, called Yugambeh,
15:36
and they were told very bluntly, "Your language is dead.
15:38
It's been dead for three decades.
15:40
You cannot work on this. Find something else to work on."
15:44
They did not give up.
15:45
They went to the community,
15:47
they dug up oral memories, oral traditions, oral literature,
15:52
and founded the Yugambeh Museum,
15:55
which became the most important cultural and linguistic center for the language
16:01
and its community.
16:02
They did not have technology. They only had their willpower.
16:06
Now, with the power of technology,
16:09
we can ensure that the next page is written in Salmi from Finland,
16:15
Lillooet from Canada or Mundari from India.
16:19
Thank you.