How computers translate human language - Ioannis Papachimonas

424,108 views ใƒป 2015-10-26

TED-Ed


ืื ื ืœื—ืฅ ืคืขืžื™ื™ื ืขืœ ื”ื›ืชื•ื‘ื™ื•ืช ื‘ืื ื’ืœื™ืช ืœืžื˜ื” ื›ื“ื™ ืœื”ืคืขื™ืœ ืืช ื”ืกืจื˜ื•ืŸ.

ืชืจื’ื•ื: Ido Dekkers ืขืจื™ื›ื”: Sigal Tifferet
00:06
How is it that so many intergalactic species in movies and TV
0
6677
4629
ืื™ืš ื–ื” ืฉื›ืœ ื›ืš ื”ืจื‘ื” ืžื™ื ื™ื ืื™ื ื˜ืจื’ืœืงื˜ื™ื™ื ื‘ืกืจื˜ื™ื ื•ื‘ื˜ืœื•ื™ื–ื™ื”
00:11
just happen to speak perfect English?
1
11306
3177
ืžื“ื‘ืจื™ื ื‘ืžืงืจื” ืื ื’ืœื™ืช ืžื•ืฉืœืžืช?
00:14
The short answer is that no one wants to watch a starship crew
2
14483
3403
ื”ืชืฉื•ื‘ื” ื”ืงืฆืจื” ื”ื™ื ืฉืืฃ ืื—ื“ ืœื ืจื•ืฆื” ืœืฆืคื•ืช ื‘ืฆื•ื•ืช ืกืคื™ื ืช ื—ืœืœ
00:17
spend years compiling an alien dictionary.
3
17886
3888
ืžื‘ืœื” ืฉื ื™ื ื‘ื”ืจื›ื‘ืช ืžื™ืœื•ืŸ ื—ื™ื™ื–ืจื™ื.
00:21
But to keep things consistent,
4
21774
1618
ืื‘ืœ ื›ื“ื™ ืœืฉืžื•ืจ ืขืœ ืขืงื‘ื™ื•ืช,
00:23
the creators of Star Trek and other science-fiction worlds
5
23392
3397
ื”ื™ื•ืฆืจื™ื ืฉืœ ืกื˜ืืจ ื˜ืจืง ื•ืขื•ืœืžื•ืช ื‘ื“ื™ื•ื ื™ื™ื ืื—ืจื™ื
00:26
have introduced the concept of a universal translator,
6
26789
3725
ื”ืฆื™ื’ื• ืืช ื”ืจืขื™ื•ืŸ ืฉืœ ืžืชืจื’ื ืื•ื ื™ื‘ืจืกืœื™,
00:30
a portable device that can instantly translate between any languages.
7
30514
4498
ืžื›ืฉื™ืจ ื ื™ื™ื“ ืฉื™ื›ื•ืœ ืœืชืจื’ื ืžื™ื™ื“ื™ืช ื›ืœ ืฉืคื”.
00:35
So is a universal translator possible in real life?
8
35012
3527
ืื– ื”ืื ืžืชืจื’ื ืื•ื ื™ื‘ืจืกืœื™ ืืคืฉืจื™ ื‘ื—ื™ื™ื ื”ืืžื™ืชื™ื™ื?
00:38
We already have many programs that claim to do just that,
9
38539
3598
ื™ืฉ ืœื ื• ื›ื‘ืจ ื”ืจื‘ื” ืชื•ื›ื ื•ืช ืฉื˜ื•ืขื ื•ืช ื‘ื“ื™ื•ืง ืœื–ื”,
00:42
taking a word, sentence, or entire book in one language
10
42137
3817
ืœืงื—ืช ืžื™ืœื”, ืžืฉืคื˜, ืื• ืกืคืจ ืฉืœื ื‘ืฉืคื” ืื—ืช
00:45
and translating it into almost any other,
11
45954
3050
ื•ืœืชืจื’ื ืื•ืชื• ืœื›ืžืขื˜ ื›ืœ ืื—ืช ืื—ืจืช,
00:49
whether it's modern English or Ancient Sanskrit.
12
49004
3333
ื‘ื™ืŸ ืื ื–ื” ืื ื’ืœื™ืช ืžื•ื“ืจื ื™ืช ืื• ืกื ืกืงืจื™ื˜ ืขืชื™ืงื”.
00:52
And if translation were just a matter of looking up words in a dictionary,
13
52337
3576
ื•ืื ืชืจื’ื•ื ื”ื™ื” ืคืฉื•ื˜ ืขื ื™ื™ืŸ ืฉืœ ืœื—ืคืฉ ืžื™ืœื™ื ื‘ืžื™ืœื•ืŸ,
00:55
these programs would run circles around humans.
14
55913
3912
ื”ืชื•ื›ื ื•ืช ื”ืืœื• ื”ื™ื” ื™ื›ื•ืœื•ืช ื‘ืงืœื•ืช ืœื ืฆื— ืื ืฉื™ื.
00:59
The reality, however, is a bit more complicated.
15
59825
3474
ื”ืžืฆื™ืื•ืช, ืขื ื–ืืช, ื”ื™ื ืžืขื˜ ื™ื•ืชืจ ืžื•ืจื›ื‘ืช.
01:03
A rule-based translation program uses a lexical database,
16
63299
4050
ืชืจื’ื•ื ืžื‘ื•ืกืก ื—ื•ืงื™ื ืžืฉืชืžืฉ ื‘ืžืื’ืจ ืžื™ื“ืข ืœืฉื•ื ื™,
01:07
which includes all the words you'd find in a dictionary
17
67349
2953
ืฉื›ื•ืœืœ ืืช ื›ืœ ื”ืžื™ืœื™ื ืฉืชืžืฆืื• ื‘ืžื™ืœื•ืŸ
01:10
and all grammatical forms they can take,
18
70302
2981
ื•ื›ืœ ื”ืžื‘ื ื™ื ื”ื“ืงื“ื•ืงื™ื™ื ืฉื”ืŸ ื™ื›ื•ืœื•ืช ืœืœื‘ื•ืฉ,
01:13
and set of rules to recognize the basic linguistic elements in the input language.
19
73283
5642
ื•ืกื˜ ื—ื•ืงื™ื ื›ื“ื™ ืœื”ื›ื™ืจ ืืช ื”ืืœืžื ื˜ื™ื ื”ืœืฉื•ื ื™ื™ื ื”ื‘ืกื™ืกื™ื™ื ื‘ืฉืคืช ื”ืงืœื˜.
01:18
For a seemingly simple sentence like, "The children eat the muffins,"
20
78925
3471
ืœืžืฉืคื˜ ืฉื ืจืื” ืคืฉื•ื˜ ื›ืžื•, "ื”ื™ืœื“ื™ื ืื•ื›ืœื™ื ืืช ื”ืžืืคื™ื ืก,"
01:22
the program first parses its syntax, or grammatical structure,
21
82396
4654
ื”ืชื•ื›ื ื” ืจืืฉื™ืช ืชื ืกื— ืืช ื”ืชื—ื‘ื™ืจ, ืื• ืžื‘ื ื” ื“ืงื“ื•ืงื™,
01:27
by identifying the children as the subject,
22
87050
2537
ืขืœ ื™ื“ื™ ื–ื™ื”ื•ื™ ื”ื™ืœื“ื™ื ื›ื ื•ืฉื,
01:29
and the rest of the sentence as the predicate
23
89587
2730
ื•ืฉืืจ ื”ืžืฉืคื˜ ื›ื ืฉื•ื
01:32
consisting of a verb "eat,"
24
92317
2051
ืฉืžื›ื™ืœ ืืช ื”ืคื•ืขืœ "ืœืื›ื•ืœ,"
01:34
and a direct object "the muffins."
25
94368
3054
ื•ืขืฆื ื™ืฉื™ืจ "ื”ืžืืคื™ื ืก."
01:37
It then needs to recognize English morphology,
26
97422
2827
ืื– ื”ื™ื ืฆืจื™ื›ื” ืœื–ื”ื•ืช ืžื•ืจืคื•ืœื•ื’ื™ื” ืื ื’ืœื™ืช,
01:40
or how the language can be broken down into its smallest meaningful units,
27
100249
4432
ืื• ืื™ืš ื”ืฉืคื” ื™ื›ื•ืœื” ืœื”ืชื—ืœืง ืœื™ื—ื™ื“ื•ืช ื”ื›ื™ ืงื˜ื ื•ืช ื‘ืขืœื•ืช ื”ืžืฉืžืขื•ืช,
01:44
such as the word muffin
28
104681
1443
ื›ืžื• ื”ืžื™ืœื” ืžืืคื™ืŸ
01:46
and the suffix "s," used to indicate plural.
29
106124
3631
ื•ื”ืชื•ืกืคืช "ืก" ืฉืžืฉืžืฉืช ืœื”ืจืื•ืช ืจื‘ื™ื.
01:49
Finally, it needs to understand the semantics,
30
109755
2694
ืœื‘ืกื•ืฃ, ื”ื™ื ืฆืจื™ื›ื” ืœื”ื‘ื™ืŸ ืืช ื”ืกืžื ื˜ื™ืงื”,
01:52
what the different parts of the sentence actually mean.
31
112449
3729
ืžื” ืœืžืขืฉื” ื”ืžืฉืžืขื•ืช ืฉืœ ื”ื—ืœืงื™ื ื”ืฉื•ื ื™ื ืฉืœ ื”ืžืฉืคื˜.
01:56
To translate this sentence properly,
32
116178
1896
ื›ื“ื™ ืœืชืจื’ื ืืช ื”ืžืฉืคื˜ ื”ื–ื” ื ื›ื•ืŸ,
01:58
the program would refer to a different set of vocabulary and rules
33
118074
3908
ื”ืชื•ื›ื ื” ืฆืจื™ื›ื” ืœื”ืชื™ื™ื—ืก ืœืกื˜ื™ื ืฉื•ื ื™ื ืฉืœ ืื•ืฆืจ ืžื™ืœื™ื ื•ื—ื•ืงื™ื
02:01
for each element of the target language.
34
121982
3184
ืœื›ืœ ืืœืžื ื˜ ืฉืœ ืฉืคืช ื”ืžื˜ืจื”.
02:05
But this is where it gets tricky.
35
125166
1854
ืื‘ืœ ืฉื ื–ื” ื ืขืฉื” ืžืกื•ื‘ืš.
02:07
The syntax of some languages allows words to be arranged in any order,
36
127020
4800
ื”ืชื—ื‘ื™ืจ ืฉืœ ื›ืžื” ืฉืคื•ืช ืžืืคืฉืจ ืœืžื™ืœื™ื ืœื”ื™ื•ืช ืžืื•ืจื’ื ื•ืช ื‘ื›ืœ ืกื“ืจ,
02:11
while in others, doing so could make the muffin eat the child.
37
131820
5134
ื‘ืขื•ื“ ื‘ืื—ืจื•ืช, ื–ื” ื™ื›ื•ืœ ืœื’ืจื•ื ืœืžืืคื™ืŸ ืœืื›ื•ืœ ืืช ื”ื™ืœื“.
02:16
Morphology can also pose a problem.
38
136954
2693
ืžื•ืจืคื•ืœื•ื’ื™ื” ื™ื›ื•ืœื” ื’ื ืœื”ื•ื•ืช ื‘ืขื™ื”.
02:19
Slovene distinguishes between two children and three or more
39
139647
3596
ืกืœื•ื‘ื ื™ืช ืžื‘ื“ื™ืœื” ื‘ื™ืŸ ืฉื ื™ ื™ืœื“ื™ื ื•ืฉืœื•ืฉื” ื™ืœื“ื™ื ืื• ื™ื•ืชืจ
02:23
using a dual suffix absent in many other languages,
40
143243
3854
ื‘ืฉื™ืžื•ืฉ ื‘ืชื•ืกืคืช ื›ืคื•ืœื” ืฉืœื ืงื™ื™ืžืช ื‘ื”ืจื‘ื” ืฉืคื•ืช ืื—ืจื•ืช,
02:27
while Russian's lack of definite articles might leave you wondering
41
147097
3435
ื‘ืขื•ื“ ื”ื™ืขื“ืจ ืชื•ื•ื™ื•ืช ื™ื™ื“ื•ืข ื‘ืจื•ืกื™ืช ื™ืฉืื™ืจ ืืชื›ื ืชื•ื”ื™ื
02:30
whether the children are eating some particular muffins,
42
150532
3043
ืื ื”ื™ืœื“ื™ื ืื•ื›ืœื™ื ืžืืคื™ืŸ ืžืกื•ื™ื™ื,
02:33
or just eat muffins in general.
43
153575
3144
ืื• ืคืฉื•ื˜ ืื•ื›ืœื™ื ืžืืคื™ื ืก ื‘ืื•ืคืŸ ื›ืœืœื™.
02:36
Finally, even when the semantics are technically correct,
44
156719
2989
ืœื‘ืกื•ืฃ, ืืคื™ืœื• ื›ืฉื”ืกืžื ื˜ื™ืงื” ื ื›ื•ื ื” ื˜ื›ื ื™ืช,
02:39
the program might miss their finer points,
45
159708
3049
ื”ืชื•ื›ื ื™ืช ื™ื›ื•ืœื” ืœืคืกืคืก ื ืงื•ื“ื•ืช ืขื“ื™ื ื•ืช ื™ื•ืชืจ,
02:42
such as whether the children "mangiano" the muffins,
46
162757
3052
ื›ืžื• ืื ื”ื™ืœื“ื™ื "ืžื ื’'ื™ืื ื•" ืืช ื”ืžืืคื™ื ืก,
02:45
or "divorano" them.
47
165809
1985
ืื• "ื“ื™ื‘ื•ืจื ื™" ืื•ืชื.
02:47
Another method is statistical machine translation,
48
167794
3764
ืฉื™ื˜ื” ื ื•ืกืคืช ื”ื™ื ืชืจื’ื•ื ืžื›ื•ื ื” ืกื˜ื˜ื™ืกื˜ื™,
02:51
which analyzes a database of books, articles, and documents
49
171558
4204
ืฉืžื ืชื— ืžืื’ืจ ืžื™ื“ืข ืฉืœ ืกืคืจื™ื, ืžืืžืจื™ื, ื•ืžืกืžื›ื™ื
02:55
that have already been translated by humans.
50
175762
3726
ืฉื›ื‘ืจ ืชื•ืจื’ืžื• ืขืœ ื™ื“ื™ ืื ืฉื™ื.
02:59
By finding matches between source and translated text
51
179488
3471
ืขืœ ื™ื“ื™ ืžืฆื™ืืช ื”ืชืืžื•ืช ื‘ื™ืŸ ืžืงื•ืจื•ืช ื•ื˜ืงืกื˜ ืžืชื•ืจื’ื
03:02
that are unlikely to occur by chance,
52
182959
2434
ืฉืœื ื”ื’ื™ื•ื ื™ ืฉื™ืชืจื—ืฉื• ื‘ืžืงืจื”,
03:05
the program can identify corresponding phrases and patterns,
53
185393
3952
ื”ืชื•ื›ื ื™ืช ื™ื›ื•ืœื” ืœื–ื”ื•ืช ืžื•ืฉื’ื™ื ื•ืชื‘ื ื™ื•ืช ืžื•ืชืืžื™ื,
03:09
and use them for future translations.
54
189345
3084
ื•ืœื”ืฉืชืžืฉ ื‘ื”ื ืœืชืจื’ื•ืžื™ื ืขืชื™ื“ื™ื™ื.
03:12
However, the quality of this type of translation
55
192429
2540
ืขื ื–ืืช, ื”ืื™ื›ื•ืช ืฉืœ ืกื•ื’ ื–ื” ืฉืœ ืชืจื’ื•ื
03:14
depends on the size of the initial database
56
194969
2721
ืชืœื•ื™ ื‘ื’ื•ื“ืœ ืžืื’ืจ ื”ืžื™ื“ืข ื”ืจืืฉื•ื ื™
03:17
and the availability of samples for certain languages
57
197690
3667
ื•ื”ื–ืžื™ื ื•ืช ืฉืœ ื“ื•ื’ืžืื•ืช ืœืฉืคื•ืช ืžืกื•ื™ื™ืžื•ืช
03:21
or styles of writing.
58
201357
2026
ืื• ืกื’ื ื•ื ื•ืช ืฉื•ื ื™ื ืฉืœ ื›ืชื™ื‘ื”.
03:23
The difficulty that computers have with the exceptions, irregularities
59
203383
3757
ื”ืงื•ืฉื™ ืฉื™ืฉ ืœืžื—ืฉื‘ื™ื ืขื ื™ื•ืฆืื™ ื”ื“ื•ืคืŸ, ื—ื•ืกืจ ื”ืกื“ืจ
03:27
and shades of meaning that seem to come instinctively to humans
60
207140
3854
ื•ื”ื’ื•ื•ื ื™ื ืฉืœ ืžืฉืžืขื•ื™ื•ืช ืฉื ืชืคืกื™ื ืื™ื ืกื˜ื™ื ืงื˜ื™ื‘ื™ืช ืขื‘ื•ืจ ืื ืฉื™ื
03:30
has led some researchers to believe that our understanding of language
61
210994
4051
ื”ื•ื‘ื™ืœื• ื›ืžื” ื—ื•ืงืจื™ื ืœื”ืืžื™ืŸ ืฉื”ื”ื‘ื ื” ืฉืœื ื• ืฉืœ ืฉืคื”
03:35
is a unique product of our biological brain structure.
62
215045
4206
ื”ื™ื ืชื•ืฆื ื™ื—ื•ื“ื™ ืฉืœ ืžื‘ื ื” ื”ืžื•ื— ื”ื‘ื™ื•ืœื•ื’ื™.
03:39
In fact, one of the most famous fictional universal translators,
63
219251
3850
ืœืžืขืฉื”, ืื—ื“ ื”ืžืชืจื’ืžื™ื ื”ืื•ื ื™ื‘ืจืกืœื™ื™ื ื”ืžื•ืžืฆืื™ื ื”ื›ื™ ืžืคื•ืจืกืžื™ื,
03:43
the Babel fish from "The Hitchhiker's Guide to the Galaxy",
64
223101
3338
ื“ื’ ื‘ื‘ืœ ืž"ืžื“ืจื™ืš ื”ื˜ืจืžืคื™ืกื˜ ืœื’ืœืงืกื™ื”",
03:46
is not a machine at all but a small creature
65
226439
3287
ื”ื•ื ืœื ืžื›ื•ื ื” ื‘ื›ืœืœ ืืœื ื™ืฆื•ืจ ืงื˜ืŸ
03:49
that translates the brain waves and nerve signals of sentient species
66
229726
4484
ืฉืžืชืจื’ื ืืช ื’ืœื™ ื”ืžื•ื— ื•ืกื™ืžื ื™ื ืขืฆื‘ื™ื™ื ืฉืœ ื™ืฆื•ืจื™ื ืชื‘ื•ื ื™ื™ื
03:54
through a form of telepathy.
67
234210
2795
ื“ืจืš ืฆื•ืจื” ืฉืœ ื˜ืœืคื˜ื™ื”.
03:57
For now, learning a language the old fashioned way
68
237005
2721
ื‘ื™ื ืชื™ื™ื, ืœืžื™ื“ืช ืฉืคื” ื‘ื“ืจืš ื”ืžืกื•ืจืชื™ืช
03:59
will still give you better results than any currently available computer program.
69
239726
5380
ืขื“ื™ื™ืŸ ืชื™ืชืŸ ืœื ื• ืชื•ืฆืื” ื˜ื•ื‘ื” ื™ื•ืชืจ ืžื›ืœ ืชื•ื›ื ืช ืชืจื’ื•ื ืฉื–ืžื™ื ื” ื›ืขืช.
04:05
But this is no easy task,
70
245106
1643
ืื‘ืœ ื–ื• ืœื ืžื˜ืœื” ืคืฉื•ื˜ื”,
04:06
and the sheer number of languages in the world,
71
246749
2265
ื•ื”ืžืกืคืจ ื”ืขืฆื•ื ืฉืœ ืฉืคื•ืช ื‘ืขื•ืœื,
04:09
as well as the increasing interaction between the people who speak them,
72
249014
3975
ื›ืžื• ื’ื ื”ืื™ื ื˜ืจืืงืฆื™ื” ื”ื’ื“ืœื” ื‘ื™ืŸ ื”ืื ืฉื™ื ืฉืžื“ื‘ืจื™ื ืื•ืชืŸ,
04:12
will only continue to spur greater advances in automatic translation.
73
252989
5015
ืจืง ืชืžืฉื™ืš ืœืขื•ืจืจ ื”ืชืงื“ืžื•ืช ื’ื“ื•ืœื” ื™ื•ืชืจ ื‘ืชืจื’ื•ื ืื•ื˜ื•ืžื˜ื™.
04:18
Perhaps by the time we encounter intergalactic life forms,
74
258004
3405
ืื•ืœื™ ื‘ื–ืžืŸ ืฉื ื™ืชืงืœ ื‘ืฆื•ืจื•ืช ื—ื™ื™ื ืื™ื ื˜ืจื’ืœืงื˜ื™ื•ืช,
04:21
we'll be able to communicate with them through a tiny gizmo,
75
261409
3251
ื ื”ื™ื” ืžืกื•ื’ืœื™ื ืœืชืงืฉืจ ืื™ืชื ื“ืจืš ืžื›ืฉื™ืจ ื–ืขื™ืจ,
04:24
or we might have to start compiling that dictionary, after all.
76
264660
4366
ืื• ืฉืื•ืœื™ ื ืฆื˜ืจืš ืœื”ืชื—ื™ืœ ืœื”ืจื›ื™ื‘ ืืช ื”ืžื™ืœื•ืŸ ื”ื”ื•ื ืื—ืจื™ ื”ื›ืœ.
ืขืœ ืืชืจ ื–ื”

ืืชืจ ื–ื” ื™ืฆื™ื’ ื‘ืคื ื™ื›ื ืกืจื˜ื•ื ื™ YouTube ื”ืžื•ืขื™ืœื™ื ืœืœื™ืžื•ื“ ืื ื’ืœื™ืช. ืชื•ื›ืœื• ืœืจืื•ืช ืฉื™ืขื•ืจื™ ืื ื’ืœื™ืช ื”ืžื•ืขื‘ืจื™ื ืขืœ ื™ื“ื™ ืžื•ืจื™ื ืžื”ืฉื•ืจื” ื”ืจืืฉื•ื ื” ืžืจื—ื‘ื™ ื”ืขื•ืœื. ืœื—ืฅ ืคืขืžื™ื™ื ืขืœ ื”ื›ืชื•ื‘ื™ื•ืช ื‘ืื ื’ืœื™ืช ื”ืžื•ืฆื’ื•ืช ื‘ื›ืœ ื“ืฃ ื•ื™ื“ืื• ื›ื“ื™ ืœื”ืคืขื™ืœ ืืช ื”ืกืจื˜ื•ืŸ ืžืฉื. ื”ื›ืชื•ื‘ื™ื•ืช ื’ื•ืœืœื•ืช ื‘ืกื ื›ืจื•ืŸ ืขื ื”ืคืขืœืช ื”ื•ื•ื™ื“ืื•. ืื ื™ืฉ ืœืš ื”ืขืจื•ืช ืื• ื‘ืงืฉื•ืช, ืื ื ืฆื•ืจ ืื™ืชื ื• ืงืฉืจ ื‘ืืžืฆืขื•ืช ื˜ื•ืคืก ื™ืฆื™ืจืช ืงืฉืจ ื–ื”.

https://forms.gle/WvT1wiN1qDtmnspy7