Visualizing the world's Twitter data - Jer Thorp

68,370 views ・ 2013-02-21

TED-Ed



00:00
Transcriber: Andrea McDonough Reviewer: Bedirhan Cinar
Translator (Chinese subtitles): Zhimin Lin Reviewer: QI YU
00:14
A couple of years ago I started using Twitter, and one of the things that really charmed me about Twitter is that people would wake up in the morning and they would say, "Good morning!" Which I thought, I'm a Canadian, so I was a little bit, I liked that politeness. And so, I'm also a giant nerd, and so I wrote a computer program that would record 24 hours of everybody on Twitter saying, "Good morning!" And then I asked myself my favorite question, "What would that look like?"
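The talk doesn't show the program itself, so the snippet below is only a minimal sketch of the idea in Python. It assumes a hypothetical chronological stream of tweet records with text, created_at, and coordinates fields, and simply keeps 24 hours of tweets containing "good morning".

```python
from datetime import datetime, timedelta

def collect_good_mornings(tweet_stream, start_time):
    """Keep 24 hours of tweets that contain 'good morning'."""
    cutoff = start_time + timedelta(hours=24)
    kept = []
    for tweet in tweet_stream:  # hypothetical stream of dicts
        posted = datetime.fromisoformat(tweet["created_at"])
        if posted >= cutoff:
            break  # the stream is assumed to arrive in chronological order
        if "good morning" in tweet["text"].lower():
            kept.append({"time": posted, "coords": tweet.get("coordinates")})
    return kept
```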
00:41
Well, as it turns out, I think it would look something like this. Right, so we'd see this wave of people saying, "Good morning!" across the world as they wake up. Now the green people, these are people that wake up at around 8 o'clock in the morning. Who wakes up at 8 o'clock, or says, "Good morning!" at 8? And the orange people, they say, "Good morning!" around 9. And the red people, they say, "Good morning!" around 10. Yeah, more at 10s than at 8s. And actually, if you look at this map, we can learn a little bit about how people wake up in different parts of the world. People on the West Coast, for example, wake up a little bit later than those people on the East Coast.
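The color rule described here, 8 o'clock green, 9 orange, 10 red, is easy to sketch. The snippet below is an assumed illustration rather than the original visualization code, and it presumes each collected tweet carries a hypothetical local_time field already shifted to the sender's time zone.

```python
def wakeup_color(local_hour):
    """Map the local hour a 'Good morning!' was posted to the map's colors."""
    if local_hour == 8:
        return "green"   # wakes up around 8
    if local_hour == 9:
        return "orange"  # around 9
    if local_hour == 10:
        return "red"     # around 10
    return "grey"        # everyone outside the 8-to-10 window

def color_tweets(tweets):
    """Pair each tweet's coordinates with its color bucket."""
    return [(t["coords"], wakeup_color(t["local_time"].hour)) for t in tweets]
```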
01:18
But that's not all that people say on Twitter, right? We also get these really important tweets, like, "I just landed in Orlando!! [plane sign, plane sign]" Or, or, "I just landed in Texas [exclamation point]!" Or "I just landed in Honduras!" These lists, they go on and on and on, all these people, right? So, on the outside, these people are just telling us something about how they're traveling. But we know the truth, don't we? These people are show-offs! They are showing off that they're in Cape Town and I'm not.
01:50
So I thought, how can we take this vanity and turn it into utility? So using a similar approach that I did with "Good morning," I mapped all those people's trips, because I know where they're landing, they just told me, and I know where they live, because they share that information on their Twitter profile. So what I'm able to do with 36 hours of Twitter is create a model of how people are traveling around the world during that 36 hours. And this is kind of a prototype, because I think if we listen to everybody on Twitter and Facebook and the rest of our social media, we'd actually get a pretty clear picture of how people are traveling from one place to the other, which actually turns out to be a very useful thing for scientists, particularly those who are studying how disease is spread.
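Again as an assumed sketch rather than the real pipeline: the destination can be pulled straight out of an "I just landed in ..." tweet with a regular expression, and the home location taken from a hypothetical profile-location field, which gives the origin-to-destination pairs a 36-hour trip model would be built from.

```python
import re

# Matches "I just landed in <place>", stopping at punctuation or end of text.
LANDED = re.compile(r"i just landed in ([\w ,.'-]+?)(?:[!.\n]|$)", re.IGNORECASE)

def extract_trip(tweet):
    """Return a (home, destination) pair, or None if the tweet doesn't match."""
    match = LANDED.search(tweet["text"])
    home = tweet.get("profile_location")  # hypothetical profile field
    if match and home:
        return (home.strip(), match.group(1).strip())
    return None

def build_trip_model(tweets):
    """All origin-to-destination pairs seen in the window (say, 36 hours)."""
    return [trip for tweet in tweets if (trip := extract_trip(tweet)) is not None]
```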
02:37
So, I work upstairs at the New York Times, and for the last two years we've been working on a project called "Cascade," which in some ways is kind of similar to this one. But instead of modeling how people move, we're modeling how people talk. We're looking at what a discussion looks like.
02:53
Well, here's an example. This is a discussion around an article called "The Island Where People Forget to Die." It's about an island in Greece where people live a really, really, really, really, really, really long time. And what we're seeing here is a conversation that's stemming from that first tweet down in the bottom left-hand corner. So we get to see the scope of this conversation over about 9 hours right now; we're going to creep up to 12 hours here in a second.

03:17
But we can also see what that conversation looks like in three dimensions, and that three-dimensional view is actually much more useful for us. As humans, we are really used to things that are structured in three dimensions. So we can look at those little off-shoots of conversation, and we can find out what exactly happened. And this is an interactive, exploratory tool, so we can go through every step in the conversation. We can look at who the people were, what they said, how old they are, where they live, who follows them, and so on, and so on, and so on.
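A rough idea of what sits underneath such a view, sketched here as plain Python rather than the Times' actual Cascade code: if each tweet carries a hypothetical id and in_reply_to field, the whole discussion can be rebuilt as a reply tree rooted at that first tweet and then walked step by step.

```python
from collections import defaultdict

def build_conversation_tree(tweets):
    """Map each tweet id to the ids of its direct replies."""
    children = defaultdict(list)
    for t in tweets:
        if t.get("in_reply_to"):
            children[t["in_reply_to"]].append(t["id"])
    return children

def walk(children, tweet_id, depth=0):
    """Print every step of the conversation, indented by how deep it sits."""
    print("  " * depth + str(tweet_id))
    for reply in children.get(tweet_id, []):
        walk(children, reply, depth + 1)
```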
03:45
So, the Times creates about 6,500 pieces of content every month, and we can model every single one of the conversations that happen around them. And they look somewhat different. Depending on the story, and depending on how fast people are talking about it and how far the conversation spreads, these structures, which I call conversational architectures, end up looking different.
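Two simple proxies for those properties, offered as my own illustration rather than the project's metrics: how far a conversation spreads can be read as the depth of its reply tree, and how fast people are talking as tweets per hour over the thread's lifetime.

```python
def conversation_depth(children, root):
    """How far the conversation spreads: the longest chain of replies."""
    replies = children.get(root, [])
    if not replies:
        return 0
    return 1 + max(conversation_depth(children, r) for r in replies)

def conversation_speed(times):
    """How fast people are talking: tweets per hour over the thread's lifetime."""
    times = sorted(times)  # datetimes of every tweet in the thread
    hours = max((times[-1] - times[0]).total_seconds() / 3600, 1 / 60)
    return len(times) / hours
```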
04:08
So, these projects that I've shown you, I think they all involve the same thing: we can take small pieces of data, and by putting them together we can generate more value, we can do more exciting things with them. But so far we've only talked about Twitter, right? And Twitter isn't all the data. We learned a moment ago that there is tons and tons, tons more data out there.

04:29
And specifically, I want you to think about one type of data, because all of you guys, everybody in this audience, we, we, me as well, are data-making machines. We are producing data all the time. Every single one of us, we're producing data. Somebody else, though, is storing that data. Usually we put our trust in companies to store that data, but what I want to suggest here is that rather than putting our trust in companies to store that data, we should put the trust in ourselves, because we actually own that data. Right, that is something we should remember. Everything that someone else measures about you, you actually own.
05:09
So, it's my hope, maybe because I'm a Canadian, that all of us can come together with this really valuable data that we've been storing, and we can collectively launch that data toward some of the world's most difficult problems, because big data can solve big problems, but I think it can do it best if it's all of us who are in control.

05:31
Thank you.