How computers translate human language - Ioannis Papachimonas

424,108 views ・ 2015-10-26

TED-Ed



Translator: Ivy Wang Reviewer: Gentian Pan
00:06
How is it that so many intergalactic species in movies and TV
00:11
just happen to speak perfect English?
00:14
The short answer is that no one wants to watch a starship crew
00:17
spend years compiling an alien dictionary.
00:21
But to keep things consistent,
00:23
the creators of Star Trek and other science-fiction worlds
00:26
have introduced the concept of a universal translator,
00:30
a portable device that can instantly translate between any languages.
00:35
So is a universal translator possible in real life?
00:38
We already have many programs that claim to do just that,
00:42
taking a word, sentence, or entire book in one language
00:45
and translating it into almost any other,
00:49
whether it's modern English or Ancient Sanskrit.
00:52
And if translation were just a matter of looking up words in a dictionary,
00:55
these programs would run circles around humans.
00:59
The reality, however, is a bit more complicated.
01:03
A rule-based translation program uses a lexical database,
01:07
which includes all the words you'd find in a dictionary
01:10
and all grammatical forms they can take,
01:13
and a set of rules to recognize the basic linguistic elements in the input language.
01:18
For a seemingly simple sentence like, "The children eat the muffins,"
01:22
the program first parses its syntax, or grammatical structure,
01:27
by identifying the children as the subject,
01:29
and the rest of the sentence as the predicate
01:32
consisting of a verb "eat,"
01:34
and a direct object "the muffins."
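The parsing step described here can be illustrated with a minimal sketch. It assumes a tiny, hypothetical verb list (`VERBS`) and handles only simple "subject verb object" sentences; a real rule-based system would rely on a full lexical database and grammar.

```python
# Hypothetical mini-lexicon of verbs; a real system would use a lexical database.
VERBS = {"eat", "eats", "see", "sees"}

def parse_svo(sentence):
    """Split a simple English sentence into subject, verb, and direct object."""
    words = sentence.lower().rstrip(".").split()
    # The first known verb marks the boundary between subject and predicate.
    verb_index = next((i for i, w in enumerate(words) if w in VERBS), None)
    if verb_index is None:
        raise ValueError("no known verb found")
    return {
        "subject": " ".join(words[:verb_index]),
        "verb": words[verb_index],
        "object": " ".join(words[verb_index + 1:]),
    }

print(parse_svo("The children eat the muffins."))
# {'subject': 'the children', 'verb': 'eat', 'object': 'the muffins'}
```

Because this sketch assigns roles purely by position, feeding it "The muffins eat the children" would make the muffins the subject.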
01:37
It then needs to recognize English morphology,
01:40
or how the language can be broken down into its smallest meaningful units,
01:44
such as the word muffin
01:46
and the suffix "s," used to indicate plural.
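As a rough illustration of morphological analysis, the sketch below splits a noun into a stem and a number feature. The irregular-plural table and the single suffix rule are assumptions made for this example, not a complete description of English.

```python
# Hypothetical table of irregular plurals; real systems store many more.
IRREGULAR_PLURALS = {"children": "child", "mice": "mouse"}

def analyze_noun(word):
    """Return (stem, number) for an English noun using toy rules."""
    word = word.lower()
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word], "plural"
    # Regular case: a final "s" (but not "ss") is treated as the plural suffix.
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1], "plural"
    return word, "singular"

print(analyze_noun("muffins"))   # ('muffin', 'plural')
print(analyze_noun("children"))  # ('child', 'plural')
```

The rule is deliberately naive: a singular noun such as "bus" would be misread as a plural, which hints at why exceptions are hard for rule-based programs.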
01:49
Finally, it needs to understand the semantics,
01:52
what the different parts of the sentence actually mean.
01:56
To translate this sentence properly,
01:58
the program would refer to a different set of vocabulary and rules
02:01
for each element of the target language.
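The idea of consulting target-language vocabulary and rules might look like the following sketch, which uses Italian as the target. The `NOUNS` and `VERBS` tables are hypothetical hand-written entries, and the only "rule" shown is subject-verb agreement in number.

```python
# Hypothetical English-to-Italian entries, keyed by (stem, number).
NOUNS = {
    ("child", "plural"): "i bambini",
    ("child", "singular"): "il bambino",
    ("muffin", "plural"): "i muffin",
    ("muffin", "singular"): "il muffin",
}
# The Italian verb form depends on the number of the subject.
VERBS = {("eat", "plural"): "mangiano", ("eat", "singular"): "mangia"}

def translate(subject, verb, obj):
    """Translate an analyzed clause; each noun is a (stem, number) pair."""
    subject_text = NOUNS[subject]
    verb_text = VERBS[(verb, subject[1])]  # agree with the subject's number
    object_text = NOUNS[obj]
    return f"{subject_text} {verb_text} {object_text}"

print(translate(("child", "plural"), "eat", ("muffin", "plural")))
# i bambini mangiano i muffin
```

Note that the definite articles are baked into the noun entries here, which sidesteps exactly the kind of decision a real system has to make.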
02:05
But this is where it gets tricky.
02:07
The syntax of some languages allows words to be arranged in any order,
02:11
while in others, doing so could make the muffin eat the child.
02:16
Morphology can also pose a problem.
02:19
Slovene distinguishes between two children and three or more
02:23
using a dual suffix absent in many other languages,
02:27
while Russian's lack of definite articles might leave you wondering
02:30
whether the children are eating some particular muffins,
02:33
or just eat muffins in general.
02:36
Finally, even when the semantics are technically correct,
02:39
the program might miss their finer points,
02:42
such as whether the children "mangiano" the muffins,
02:45
or "divorano" them.
02:47
Another method is statistical machine translation,
02:51
which analyzes a database of books, articles, and documents
02:55
that have already been translated by humans.
02:59
By finding matches between source and translated text
03:02
that are unlikely to occur by chance,
03:05
the program can identify corresponding phrases and patterns,
03:09
and use them for future translations.
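One simple way to picture "matches that are unlikely to occur by chance" is to score how consistently a source word and a target word appear in the same sentence pairs. The sketch below uses the Dice coefficient as that score on a hypothetical four-sentence English-Italian corpus; it is an illustration of the general idea, not the method any particular translation system uses.

```python
from collections import Counter
from itertools import product

# Hypothetical, hand-made parallel corpus (English, Italian).
CORPUS = [
    ("the children eat", "i bambini mangiano"),
    ("the children sleep", "i bambini dormono"),
    ("the dogs eat", "i cani mangiano"),
    ("the dogs sleep", "i cani dormono"),
]

def dice_scores(corpus):
    """Score each (source word, target word) pair by sentence co-occurrence."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))
    # Dice coefficient: 2 * joint count / (source count + target count).
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in pair_count.items()
    }

def best_match(word, scores):
    """Pick the target word with the highest score for a source word."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)

scores = dice_scores(CORPUS)
print(best_match("children", scores))  # bambini
print(best_match("eat", scores))       # mangiano
```

With only four sentence pairs the scores are clean; with too few examples for a language or style, the same counting would have little to go on.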
03:12
However, the quality of this type of translation
03:14
depends on the size of the initial database
03:17
and the availability of samples for certain languages
03:21
or styles of writing.
03:23
The difficulty that computers have with the exceptions, irregularities
03:27
and shades of meaning that seem to come instinctively to humans
03:30
has led some researchers to believe that our understanding of language
03:35
is a unique product of our biological brain structure.
03:39
In fact, one of the most famous fictional universal translators,
03:43
the Babel fish from "The Hitchhiker's Guide to the Galaxy",
03:46
is not a machine at all but a small creature
03:49
that translates the brain waves and nerve signals of sentient species
03:54
through a form of telepathy.
03:57
For now, learning a language the old-fashioned way
03:59
will still give you better results than any currently available computer program.
04:05
But this is no easy task,
04:06
and the sheer number of languages in the world,
04:09
as well as the increasing interaction between the people who speak them,
04:12
will only continue to spur greater advances in automatic translation.
04:18
Perhaps by the time we encounter intergalactic life forms,
04:21
we'll be able to communicate with them through a tiny gizmo,
04:24
or we might have to start compiling that dictionary, after all.