What we learned from 5 million books

236,151 views ・ 2011-09-20

TED


Translator: Katarina Smetko Reviewer: Zlatko Smetisko
00:15
Erez Lieberman Aiden: Everyone knows
00:17
that a picture is worth a thousand words.
00:22
But we at Harvard
00:24
were wondering if this was really true.
00:27
(Laughter)
00:29
So we assembled a team of experts,
00:33
spanning Harvard, MIT,
00:35
The American Heritage Dictionary, The Encyclopedia Britannica
00:38
and even our proud sponsors,
00:40
the Google.
00:43
And we cogitated about this
00:45
for about four years.
00:47
And we came to a startling conclusion.
00:52
Ladies and gentlemen, a picture is not worth a thousand words.
00:55
In fact, we found some pictures
00:57
that are worth 500 billion words.
01:02
Jean-Baptiste Michel: So how did we get to this conclusion?
01:04
So Erez and I were thinking about ways
01:06
to get a big picture of human culture
01:08
and human history: change over time.
01:11
So many books actually have been written over the years.
01:13
So we were thinking, well the best way to learn from them
01:15
is to read all of these millions of books.
01:17
Now of course, if there's a scale for how awesome that is,
01:20
that has to rank extremely, extremely high.
01:23
Now the problem is there's an X-axis for that,
01:25
which is the practical axis.
01:27
This is very, very low.
01:29
(Applause)
01:32
Now people tend to use an alternative approach,
01:35
which is to take a few sources and read them very carefully.
01:37
This is extremely practical, but not so awesome.
01:39
What you really want to do
01:42
is to get to the awesome yet practical part of this space.
01:45
So it turns out there was a company across the river called Google
01:48
who had started a digitization project a few years back
01:50
that might just enable this approach.
01:52
They have digitized millions of books.
01:54
So what that means is, one could use computational methods
01:57
to read all of the books in a click of a button.
01:59
That's very practical and extremely awesome.
02:03
ELA: Let me tell you a little bit about where books come from.
02:05
Since time immemorial, there have been authors.
02:08
These authors have been striving to write books.
02:11
And this became considerably easier
02:13
with the development of the printing press some centuries ago.
02:15
Since then, the authors have won
02:18
on 129 million distinct occasions,
02:20
publishing books.
02:22
Now if those books are not lost to history,
02:24
then they are somewhere in a library,
02:26
and many of those books have been getting retrieved from the libraries
02:29
and digitized by Google,
02:31
which has scanned 15 million books to date.
02:33
Now when Google digitizes a book, they put it into a really nice format.
02:36
Now we've got the data, plus we have metadata.
02:38
We have information about things like where was it published,
02:41
who was the author, when was it published.
02:43
And what we do is go through all of those records
02:46
and exclude everything that's not the highest quality data.
02:50
What we're left with
02:52
is a collection of five million books,
02:55
500 billion words,
02:58
a string of characters a thousand times longer
03:00
than the human genome --
03:03
a text which, when written out,
03:05
would stretch from here to the Moon and back
03:07
10 times over --
03:09
a veritable shard of our cultural genome.
03:13
Of course what we did
03:15
when faced with such outrageous hyperbole ...
03:18
(Laughter)
03:20
was what any self-respecting researchers
03:23
would have done.
03:26
We took a page out of XKCD,
03:28
and we said, "Stand back.
03:30
We're going to try science."
03:32
(Laughter)
03:34
JM: Now of course, we were thinking,
03:36
well let's just first put the data out there
03:38
for people to do science to it.
03:40
Now we're thinking, what data can we release?
03:42
Well of course, you want to take the books
03:44
and release the full text of these five million books.
03:46
Now Google, and Jon Orwant in particular,
03:48
told us a little equation that we should learn.
03:50
So you have five million, that is, five million authors
03:53
and five million plaintiffs is a massive lawsuit.
03:56
So, although that would be really, really awesome,
03:58
again, that's extremely, extremely impractical.
04:01
(Laughter)
04:03
Now again, we kind of caved in,
04:05
and we did the very practical approach, which was a bit less awesome.
04:08
We said, well instead of releasing the full text,
04:10
we're going to release statistics about the books.
04:12
So take for instance "A gleam of happiness."
04:14
It's four words; we call that a four-gram.
04:16
We're going to tell you how many times a particular four-gram
04:18
appeared in books in 1801, 1802, 1803,
04:20
all the way up to 2008.
04:22
That gives us a time series
04:24
of how frequently this particular sentence was used over time.
04:26
We do that for all the words and phrases that appear in those books,
04:29
and that gives us a big table of two billion lines
04:32
that tell us about the way culture has been changing.
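The counting just described, one row per n-gram with a count for each publication year, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline the speakers used: the `(year, text)` input shape, the whitespace tokenizer, the function names `ngrams` and `yearly_counts`, and the two-book toy corpus are all invented here for the example.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))


def yearly_counts(books, n):
    """Map each n-gram to a {year: count} time series.

    `books` is an iterable of (year, text) pairs. Both this input shape
    and the lowercase/whitespace tokenizer are simplifying assumptions.
    """
    table = defaultdict(Counter)
    for year, text in books:
        tokens = text.lower().split()
        for gram in ngrams(tokens, n):
            table[" ".join(gram)][year] += 1
    return table


# Toy corpus, invented purely for illustration.
books = [
    (1801, "a gleam of happiness crossed her face"),
    (1802, "he felt a gleam of happiness and a gleam of happiness again"),
]
table = yearly_counts(books, 4)
series = table["a gleam of happiness"]
print(sorted(series.items()))
```

Each key of `table` plays the role of one line in the big table from the talk, and `series` is the time series for a single four-gram. A real corpus would also need the per-year total word count to turn raw counts into frequencies.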
04:34
ELA: So those two billion lines,
04:36
we call them two billion n-grams.
04:38
What do they tell us?
04:40
Well the individual n-grams measure cultural trends.
04:42
Let me give you an example.
04:44
Let's suppose that I am thriving,
04:46
then tomorrow I want to tell you about how well I did.
04:48
And so I might say, "Yesterday, I throve."
04:51
Alternatively, I could say, "Yesterday, I thrived."
04:54
Well which one should I use?
04:57
How to know?
04:59
As of about six months ago,
05:01
the state of the art in this field
05:03
is that you would, for instance,
05:05
go up to the following psychologist with fabulous hair,
05:07
and you'd say,
05:09
"Steve, you're an expert on the irregular verbs.
05:12
What should I do?"
05:14
And he'd tell you, "Well most people say thrived,
05:16
but some people say throve."
05:19
And you also knew, more or less,
05:21
that if you were to go back in time 200 years
05:24
and ask the following statesman with equally fabulous hair,
05:27
(Laughter)
05:30
"Tom, what should I say?"
05:32
He'd say, "Well, in my day, most people throve,
05:34
but some thrived."
05:37
So now what I'm just going to show you is raw data.
05:39
Two rows from this table of two billion entries.
05:43
What you're seeing is year by year frequency
05:45
of "thrived" and "throve" over time.
05:49
Now this is just two
05:51
out of two billion rows.
05:54
So the entire data set
05:56
is a billion times more awesome than this slide.
05:59
(Laughter)
06:01
(Applause)
06:05
JM: Now there are many other pictures that are worth 500 billion words.
06:07
For instance, this one.
06:09
If you just take influenza,
06:11
you will see peaks at the time where you knew
06:13
big flu epidemics were killing people around the globe.
06:16
ELA: If you were not yet convinced,
06:19
sea levels are rising,
06:21
so is atmospheric CO2 and global temperature.
06:24
JM: You might also want to have a look at this particular n-gram,
06:27
and that's to tell Nietzsche that God is not dead,
06:30
although you might agree that he might need a better publicist.
06:33
(Laughter)
06:35
ELA: You can get at some pretty abstract concepts with this sort of thing.
06:38
For instance, let me tell you the history
06:40
of the year 1950.
06:42
Pretty much for the vast majority of history,
06:44
no one gave a damn about 1950.
06:46
In 1700, in 1800, in 1900,
06:48
no one cared.
06:52
Through the 30s and 40s,
06:54
no one cared.
06:56
Suddenly, in the mid-40s,
06:58
there started to be a buzz.
07:00
People realized that 1950 was going to happen,
07:02
and it could be big.
07:04
(Laughter)
07:07
But nothing got people interested in 1950
07:10
like the year 1950.
07:13
(Laughter)
07:16
People were walking around obsessed.
07:18
They couldn't stop talking
07:20
about all the things they did in 1950,
07:23
all the things they were planning to do in 1950,
07:26
all the dreams of what they wanted to accomplish in 1950.
07:31
In fact, 1950 was so fascinating
07:33
that for years thereafter,
07:35
people just kept talking about all the amazing things that happened,
07:38
in '51, '52, '53.
07:40
Finally in 1954,
07:42
someone woke up and realized
07:44
that 1950 had gotten somewhat passé.
07:48
(Laughter)
07:50
And just like that, the bubble burst.
07:52
(Laughter)
07:54
And the story of 1950
07:56
is the story of every year that we have on record,
07:58
with a little twist, because now we've got these nice charts.
08:01
And because we have these nice charts, we can measure things.
08:04
We can say, "Well how fast does the bubble burst?"
08:06
And it turns out that we can measure that very precisely.
08:09
Equations were derived, graphs were produced,
08:12
and the net result
08:14
is that we find that the bubble bursts faster and faster
08:17
with each passing year.
08:19
We are losing interest in the past more rapidly.
08:24
JM: Now a little piece of career advice.
08:26
So for those of you who seek to be famous,
08:28
we can learn from the 25 most famous political figures,
08:30
authors, actors and so on.
08:32
So if you want to become famous early on, you should be an actor,
08:35
because then fame starts rising by the end of your 20s --
08:37
you're still young, it's really great.
08:39
Now if you can wait a little bit, you should be an author,
08:41
because then you rise to very great heights,
08:43
like Mark Twain, for instance: extremely famous.
08:45
But if you want to reach the very top,
08:47
you should delay gratification
08:49
and, of course, become a politician.
08:51
So here you will become famous by the end of your 50s,
08:53
and become very, very famous afterward.
08:55
So scientists also tend to get famous when they're much older.
08:58
Like for instance, biologists and physicists
09:00
tend to be almost as famous as actors.
09:02
One mistake you should not do is become a mathematician.
09:05
(Laughter)
09:07
If you do that,
09:09
you might think, "Oh great. I'm going to do my best work when I'm in my 20s."
09:12
But guess what, nobody will really care.
09:14
(Laughter)
09:17
ELA: There are more sobering notes
09:19
among the n-grams.
09:21
For instance, here's the trajectory of Marc Chagall,
09:23
an artist born in 1887.
09:25
And this looks like the normal trajectory of a famous person.
09:28
He gets more and more and more famous,
09:32
except if you look in German.
09:34
If you look in German, you see something completely bizarre,
09:36
something you pretty much never see,
09:38
which is he becomes extremely famous
09:40
and then all of a sudden plummets,
09:42
going through a nadir between 1933 and 1945,
09:45
before rebounding afterward.
09:48
And of course, what we're seeing
09:50
is the fact Marc Chagall was a Jewish artist
09:53
in Nazi Germany.
09:55
Now these signals
09:57
are actually so strong
09:59
that we don't need to know that someone was censored.
10:02
We can actually figure it out
10:04
using really basic signal processing.
10:06
Here's a simple way to do it.
10:08
Well, a reasonable expectation
10:10
is that somebody's fame in a given period of time
10:12
should be roughly the average of their fame before
10:14
and their fame after.
10:16
So that's sort of what we expect.
10:18
And we compare that to the fame that we observe.
10:21
And we just divide one by the other
10:23
to produce something we call a suppression index.
10:25
If the suppression index is very, very, very small,
10:28
then you very well might be being suppressed.
10:30
If it's very large, maybe you're benefiting from propaganda.
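The suppression index described above, observed fame divided by the fame expected from the periods before and after, can be written down as a short Python sketch. It is an illustration under assumptions rather than the speakers' actual computation: the talk gives no window length, so the 10-year `window` default is a guess, the function name `suppression_index` is hypothetical, and the frequencies in `toy` are made-up numbers, not data for any real person.

```python
from statistics import mean


def suppression_index(freq_by_year, start, end, window=10):
    """Observed fame in [start, end] divided by expected fame.

    Expected fame is the average of the mean frequency over the `window`
    years before `start` and the `window` years after `end`. The window
    length is an assumption; the talk does not specify one.
    """
    def avg(years):
        values = [freq_by_year[y] for y in years if y in freq_by_year]
        if not values:
            raise ValueError("no data in the requested range")
        return mean(values)

    observed = avg(range(start, end + 1))
    before = avg(range(start - window, start))
    after = avg(range(end + 1, end + 1 + window))
    expected = (before + after) / 2
    return observed / expected


# Made-up yearly frequencies: steady, then a dip, then a rebound.
toy = {}
toy.update({year: 4.0 for year in range(1923, 1933)})
toy.update({year: 0.5 for year in range(1933, 1946)})
toy.update({year: 6.0 for year in range(1946, 1956)})

index = suppression_index(toy, 1933, 1945)
print(round(index, 3))
```

With these toy numbers the expected level is 5.0 and the observed level is 0.5, so the index lands well below one, the "very small" case from the talk. A value near one would mean observation matches expectation. The sketch does not guard against an expected level of zero.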
10:34
JM: Now you can actually look at
10:36
the distribution of suppression indexes over whole populations.
10:39
So for instance, here --
10:41
this suppression index is for 5,000 people
10:43
picked in English books where there's no known suppression --
10:45
it would be like this, basically tightly centered on one.
10:47
What you expect is basically what you observe.
10:49
This is distribution as seen in Germany --
10:51
very different, it's shifted to the left.
10:53
People talked about it twice less as it should have been.
10:56
But much more importantly, the distribution is much wider.
10:58
There are many people who end up on the far left on this distribution
11:01
who are talked about 10 times fewer than they should have been.
11:04
But then also many people on the far right
11:06
who seem to benefit from propaganda.
11:08
This picture is the hallmark of censorship in the book record.
11:11
ELA: So culturomics
11:13
is what we call this method.
11:15
It's kind of like genomics.
11:17
Except genomics is a lens on biology
11:19
through the window of the sequence of bases in the human genome.
11:22
Culturomics is similar.
11:24
It's the application of massive-scale data collection analysis
11:27
to the study of human culture.
11:29
Here, instead of through the lens of a genome,
11:31
through the lens of digitized pieces of the historical record.
11:34
The great thing about culturomics
11:36
is that everyone can do it.
11:38
Why can everyone do it?
11:40
Everyone can do it because three guys,
11:42
Jon Orwant, Matt Gray and Will Brockman over at Google,
11:45
saw the prototype of the Ngram Viewer,
11:47
and they said, "This is so fun.
11:49
We have to make this available for people."
11:52
So in two weeks flat -- the two weeks before our paper came out --
11:54
they coded up a version of the Ngram Viewer for the general public.
11:57
And so you too can type in any word or phrase that you're interested in
12:00
and see its n-gram immediately --
12:02
also browse examples of all the various books
12:04
in which your n-gram appears.
12:06
JM: Now this was used over a million times on the first day,
12:08
and this is really the best of all the queries.
12:10
So people want to be their best, put their best foot forward.
12:13
But it turns out in the 18th century, people didn't really care about that at all.
12:16
They didn't want to be their best, they wanted to be their beft.
12:19
So what happened is, of course, this is just a mistake.
12:22
It's not that they strove for mediocrity,
12:24
it's just that the S used to be written differently, kind of like an F.
12:27
Now of course, Google didn't pick this up at the time,
12:30
so we reported this in the Science article that we wrote.
12:33
But it turns out this is just a reminder
12:35
that, although this is a lot of fun,
12:37
when you interpret these graphs, you have to be very careful,
12:39
and you have to adopt the base standards in the sciences.
12:42
ELA: People have been using this for all kinds of fun purposes.
12:45
(Laughter)
12:52
Actually, we're not going to have to talk,
12:54
we're just going to show you all the slides and remain silent.
12:57
This person was interested in the history of frustration.
13:00
There's various types of frustration.
13:03
If you stub your toe, that's a one A "argh."
13:06
If the planet Earth is annihilated by the Vogons
13:08
to make room for an interstellar bypass,
13:10
that's an eight A "aaaaaaaargh."
13:12
This person studies all the "arghs,"
13:14
from one through eight A's.
13:16
And it turns out
13:18
that the less-frequent "arghs"
13:20
are, of course, the ones that correspond to things that are more frustrating --
13:23
except, oddly, in the early 80s.
13:26
We think that might have something to do with Reagan.
13:28
(Laughter)
13:30
JM: There are many usages of this data,
13:33
but the bottom line is that the historical record is being digitized.
13:36
Google has started to digitize 15 million books.
13:38
That's 12 percent of all the books that have ever been published.
13:40
It's a sizable chunk of human culture.
13:43
There's much more in culture: there's manuscripts, there's newspapers,
13:46
there's things that are not text, like art and paintings.
13:48
These all happen to be on our computers,
13:50
on computers across the world.
13:52
And when that happens, that will transform the way we have
13:55
to understand our past, our present and human culture.
13:57
Thank you very much.
13:59
(Applause)