How computers translate human language - Ioannis Papachimonas

426,286 views ・ 2015-10-26

TED-Ed



Translator: Chuyu Huang. Reviewer: Scarlett Huang.
00:06
How is it that so many intergalactic species in movies and TV
00:11
just happen to speak perfect English?
00:14
The short answer is that no one wants to watch a starship crew
00:17
spend years compiling an alien dictionary.
00:21
But to keep things consistent,
00:23
the creators of Star Trek and other science-fiction worlds
00:26
have introduced the concept of a universal translator,
00:30
a portable device that can instantly translate between any languages.
00:35
So is a universal translator possible in real life?
00:38
We already have many programs that claim to do just that,
00:42
taking a word, sentence, or entire book in one language
00:45
and translating it into almost any other,
00:49
whether it's modern English or Ancient Sanskrit.
00:52
And if translation were just a matter of looking up words in a dictionary,
00:55
these programs would run circles around humans.
00:59
The reality, however, is a bit more complicated.
01:03
A rule-based translation program uses a lexical database,
01:07
which includes all the words you'd find in a dictionary
01:10
and all grammatical forms they can take,
01:13
and a set of rules to recognize the basic linguistic elements in the input language.
01:18
For a seemingly simple sentence like, "The children eat the muffins,"
01:22
the program first parses its syntax, or grammatical structure,
01:27
by identifying the children as the subject,
01:29
and the rest of the sentence as the predicate
01:32
consisting of a verb "eat,"
01:34
and a direct object "the muffins."
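The parsing step described above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular translation program works: the `LEXICON` table and the `parse_svo` function are hypothetical names, and the code assumes every sentence follows the pattern noun phrase, verb, noun phrase.

```python
# Toy lexicon mapping each word to a part of speech.
# Hypothetical: it covers only the example sentence.
LEXICON = {
    "the": "DET",
    "children": "NOUN",
    "muffins": "NOUN",
    "eat": "VERB",
}


def parse_svo(sentence):
    """Split a simple English sentence into subject, verb, and direct object.

    Assumes the pattern: noun phrase, verb, noun phrase.
    """
    words = sentence.lower().rstrip(".").split()
    tags = [LEXICON[w] for w in words]  # raises KeyError for unknown words
    verb_index = tags.index("VERB")     # raises ValueError if there is no verb
    return {
        "subject": " ".join(words[:verb_index]),
        "verb": words[verb_index],
        "object": " ".join(words[verb_index + 1:]),
    }


print(parse_svo("The children eat the muffins."))
# {'subject': 'the children', 'verb': 'eat', 'object': 'the muffins'}
```

Everything before the verb is treated as the subject and everything after it as the direct object, which is exactly the kind of shortcut that breaks down on real sentences.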
01:37
It then needs to recognize English morphology,
01:40
or how the language can be broken down into its smallest meaningful units,
01:44
such as the word muffin
01:46
and the suffix "s," used to indicate plural.
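As a rough sketch of this morphological step, the Python below splits a noun into a stem plus grammatical features. The names `NOUN_STEMS`, `IRREGULAR_PLURALS`, and `segment` are assumptions made for this example; a real system would draw on a full lexical database rather than a two-word stem list.

```python
# Hypothetical stem list; a real system would use a full lexical database.
NOUN_STEMS = {"muffin", "child"}

# Irregular plurals cannot be found by stripping a suffix.
IRREGULAR_PLURALS = {"children": "child"}


def segment(word):
    """Return (stem, features) for a noun, handling only the plural suffix "s"."""
    word = word.lower()
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word], ["PLURAL"]
    if word.endswith("s") and word[:-1] in NOUN_STEMS:
        return word[:-1], ["PLURAL"]
    if word in NOUN_STEMS:
        return word, []
    raise ValueError(f"unknown word: {word}")


print(segment("muffins"))   # ('muffin', ['PLURAL'])
print(segment("children"))  # ('child', ['PLURAL'])
```

Even this tiny example needs a special case for "children", a hint of how quickly the rules multiply.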
01:49
Finally, it needs to understand the semantics,
01:52
what the different parts of the sentence actually mean.
01:56
To translate this sentence properly,
01:58
the program would refer to a different set of vocabulary and rules
02:01
for each element of the target language.
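A minimal sketch of that last step, assuming the sentence has already been parsed into grammatical roles: each element is looked up in a target-language vocabulary and then arranged by a target-language word-order rule. Italian is used here only because the video later quotes the Italian verb "mangiano"; the `TRANSFER_LEXICON`, `TARGET_ORDER`, and `transfer` names are hypothetical.

```python
# Hypothetical transfer lexicon: English phrase -> Italian phrase.
TRANSFER_LEXICON = {
    "the children": "i bambini",
    "eat": "mangiano",
    "the muffins": "i muffin",
}

# Target-language word order, given as a sequence of grammatical roles.
TARGET_ORDER = ("subject", "verb", "object")


def transfer(parsed):
    """Translate each role with the lexicon, then arrange roles in target order."""
    translated = {role: TRANSFER_LEXICON[phrase] for role, phrase in parsed.items()}
    sentence = " ".join(translated[role] for role in TARGET_ORDER)
    return sentence[0].upper() + sentence[1:] + "."


parsed = {"subject": "the children", "verb": "eat", "object": "the muffins"}
print(transfer(parsed))  # I bambini mangiano i muffin.
```

Keeping the word order in a separate rule, rather than relying on the order of the source sentence, is what lets the same parse be rendered in a language that arranges subject, verb, and object differently.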
02:05
But this is where it gets tricky.
02:07
The syntax of some languages allows words to be arranged in any order,
02:11
while in others, doing so could make the muffin eat the child.
02:16
Morphology can also pose a problem.
02:19
Slovene distinguishes between two children and three or more
02:23
using a dual suffix absent in many other languages,
02:27
while Russian's lack of definite articles might leave you wondering
02:30
whether the children are eating some particular muffins,
02:33
or just eat muffins in general.
02:36
Finally, even when the semantics are technically correct,
02:39
the program might miss their finer points,
02:42
such as whether the children "mangiano" the muffins,
02:45
or "divorano" them.
02:47
Another method is statistical machine translation,
02:51
which analyzes a database of books, articles, and documents
02:55
that have already been translated by humans.
02:59
By finding matches between source and translated text
03:02
that are unlikely to occur by chance,
03:05
the program can identify corresponding phrases and patterns,
03:09
and use them for future translations.
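The idea of finding matches that are unlikely to occur by chance can be illustrated with a tiny parallel corpus. The sketch below scores every English/Italian word pair with the Dice coefficient, a simple association measure chosen here for illustration; the video does not name a specific measure, and real statistical systems rely on far larger corpora and more elaborate probabilistic models. The corpus and the function names are assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Tiny hypothetical parallel corpus: (English sentence, Italian translation).
PARALLEL_CORPUS = [
    ("the children eat", "i bambini mangiano"),
    ("the children sleep", "i bambini dormono"),
    ("the dogs eat", "i cani mangiano"),
    ("the dogs sleep", "i cani dormono"),
]


def dice_scores(corpus):
    """Score each (source word, target word) pair by how often they co-occur.

    Dice = 2 * pairs(s, t) / (sentences with s + sentences with t).
    A score of 1.0 means the two words always appear together.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }


def best_match(word, scores):
    """Return the target word with the highest score for a source word."""
    candidates = {t: score for (s, t), score in scores.items() if s == word}
    return max(candidates, key=candidates.get)


scores = dice_scores(PARALLEL_CORPUS)
print(best_match("children", scores))  # bambini
print(best_match("eat", scores))       # mangiano
```

No dictionary or grammar rule is involved: "children" is paired with "bambini" purely because the two words show up in the same sentence pairs and nowhere else.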
03:12
However, the quality of this type of translation
03:14
depends on the size of the initial database
03:17
and the availability of samples for certain languages
03:21
or styles of writing.
03:23
The difficulty that computers have with the exceptions, irregularities
03:27
and shades of meaning that seem to come instinctively to humans
03:30
has led some researchers to believe that our understanding of language
03:35
is a unique product of our biological brain structure.
03:39
In fact, one of the most famous fictional universal translators,
03:43
the Babel fish from "The Hitchhiker's Guide to the Galaxy,"
03:46
is not a machine at all but a small creature
03:49
that translates the brain waves and nerve signals of sentient species
03:54
through a form of telepathy.
03:57
For now, learning a language the old-fashioned way
03:59
will still give you better results than any currently available computer program.
04:05
But this is no easy task,
04:06
and the sheer number of languages in the world,
04:09
as well as the increasing interaction between the people who speak them,
04:12
will only continue to spur greater advances in automatic translation.
04:18
Perhaps by the time we encounter intergalactic life forms,
04:21
we'll be able to communicate with them through a tiny gizmo,
04:24
or we might have to start compiling that dictionary, after all.