Jennifer Golbeck: The curly fry conundrum: Why social media "likes" say more than you might think

376,125 views

2014-04-03 ・ TED



00:12
If you remember that first decade of the web,
0
12738
1997
00:14
it was really a static place.
1
14735
2255
00:16
You could go online, you could look at pages,
2
16990
2245
00:19
and they were put up either by organizations
3
19235
2513
00:21
who had teams to do it
4
21748
1521
00:23
or by individuals who were really tech-savvy
5
23269
2229
00:25
for the time.
6
25498
1737
00:27
And with the rise of social media
7
27235
1575
00:28
and social networks in the early 2000s,
8
28810
2399
00:31
the web was completely changed
9
31209
2149
00:33
to a place where now the vast majority of content
10
33358
3608
00:36
we interact with is put up by average users,
11
36966
3312
00:40
either in YouTube videos or blog posts
12
40278
2697
00:42
or product reviews or social media postings.
13
42975
3315
00:46
And it's also become a much more interactive place,
14
46290
2347
00:48
where people are interacting with others,
15
48637
2637
00:51
they're commenting, they're sharing,
16
51274
1696
00:52
they're not just reading.
17
52970
1614
00:54
So Facebook is not the only place you can do this,
18
54584
1866
00:56
but it's the biggest,
19
56450
1098
00:57
and it serves to illustrate the numbers.
20
57548
1784
00:59
Facebook has 1.2 billion users per month.
21
59332
3477
01:02
So half the Earth's Internet population
22
62809
1930
01:04
is using Facebook.
23
64739
1653
01:06
They are a site, along with others,
24
66392
1932
01:08
that has allowed people to create an online persona
25
68324
3219
01:11
with very little technical skill,
26
71543
1782
01:13
and people responded by putting huge amounts
27
73325
2476
01:15
of personal data online.
28
75801
1983
01:17
So the result is that we have behavioral,
29
77784
2543
01:20
preference, demographic data
30
80327
1986
01:22
for hundreds of millions of people,
31
82313
2101
01:24
which is unprecedented in history.
32
84414
2026
01:26
And as a computer scientist, what this means is that
33
86440
2560
01:29
I've been able to build models
34
89000
1664
01:30
that can predict all sorts of hidden attributes
35
90664
2322
01:32
for all of you that you don't even know
36
92986
2284
01:35
you're sharing information about.
37
95270
2202
01:37
As scientists, we use that to help
38
97472
2382
01:39
the way people interact online,
39
99854
2114
01:41
but there's less altruistic applications,
40
101968
2499
01:44
and there's a problem in that users don't really
41
104467
2381
01:46
understand these techniques and how they work,
42
106848
2470
01:49
and even if they did, they don't have a lot of control over it.
43
109318
3128
01:52
So what I want to talk to you about today
44
112446
1490
01:53
is some of these things that we're able to do,
45
113936
2702
01:56
and then give us some ideas of how we might go forward
46
116638
2763
01:59
to move some control back into the hands of users.
47
119401
2769
02:02
So this is Target, the company.
48
122170
1586
02:03
I didn't just put that logo
49
123756
1324
02:05
on this poor, pregnant woman's belly.
50
125080
2170
02:07
You may have seen this anecdote that was printed
51
127250
1840
02:09
in Forbes magazine where Target
52
129090
2061
02:11
sent a flyer to this 15-year-old girl
53
131151
2361
02:13
with advertisements and coupons
54
133512
1710
02:15
for baby bottles and diapers and cribs
55
135222
2554
02:17
two weeks before she told her parents
56
137776
1684
02:19
that she was pregnant.
57
139460
1864
02:21
Yeah, the dad was really upset.
58
141324
2704
02:24
He said, "How did Target figure out
59
144028
1716
02:25
that this high school girl was pregnant
60
145744
1824
02:27
before she told her parents?"
61
147568
1960
02:29
It turns out that they have the purchase history
62
149528
2621
02:32
for hundreds of thousands of customers
63
152149
2301
02:34
and they compute what they call a pregnancy score,
64
154450
2730
02:37
which is not just whether or not a woman's pregnant,
65
157180
2332
02:39
but what her due date is.
66
159512
1730
02:41
And they compute that
67
161242
1304
02:42
not by looking at the obvious things,
68
162546
1768
02:44
like, she's buying a crib or baby clothes,
69
164314
2512
02:46
but things like, she bought more vitamins
70
166826
2943
02:49
than she normally had,
71
169769
1717
02:51
or she bought a handbag
72
171486
1464
02:52
that's big enough to hold diapers.
73
172950
1711
02:54
And by themselves, those purchases don't seem
74
174661
1910
02:56
like they might reveal a lot,
75
176571
2469
02:59
but it's a pattern of behavior that,
76
179040
1978
03:01
when you take it in the context of thousands of other people,
77
181018
3117
03:04
starts to actually reveal some insights.
78
184135
2757
03:06
So that's the kind of thing that we do
79
186892
1793
03:08
when we're predicting stuff about you on social media.
80
188685
2567
03:11
We're looking for little patterns of behavior that,
81
191252
2796
03:14
when you detect them among millions of people,
82
194048
2682
03:16
lets us find out all kinds of things.
83
196730
2706
03:19
So in my lab and with colleagues,
84
199436
1747
03:21
we've developed mechanisms where we can
85
201183
1777
03:22
quite accurately predict things
86
202960
1560
03:24
like your political preference,
87
204520
1725
03:26
your personality score, gender, sexual orientation,
88
206245
3752
03:29
religion, age, intelligence,
89
209997
2873
03:32
along with things like
90
212870
1394
03:34
how much you trust the people you know
91
214264
1937
03:36
and how strong those relationships are.
92
216201
1804
03:38
We can do all of this really well.
93
218005
1785
03:39
And again, it doesn't come from what you might
94
219790
2197
03:41
think of as obvious information.
95
221987
2102
03:44
So my favorite example is from this study
96
224089
2281
03:46
that was published this year
97
226370
1240
03:47
in the Proceedings of the National Academies.
98
227610
1795
03:49
If you Google this, you'll find it.
99
229405
1285
03:50
It's four pages, easy to read.
100
230690
1872
03:52
And they looked at just people's Facebook likes,
101
232562
3003
03:55
so just the things you like on Facebook,
102
235565
1920
03:57
and used that to predict all these attributes,
103
237485
2138
03:59
along with some other ones.
104
239623
1645
04:01
And in their paper they listed the five likes
105
241268
2961
04:04
that were most indicative of high intelligence.
106
244229
2787
04:07
And among those was liking a page
107
247016
2324
04:09
for curly fries. (Laughter)
108
249340
1905
04:11
Curly fries are delicious,
109
251245
2093
04:13
but liking them does not necessarily mean
110
253338
2530
04:15
that you're smarter than the average person.
111
255868
2080
04:17
So how is it that one of the strongest indicators
112
257948
3207
04:21
of your intelligence
113
261155
1570
04:22
is liking this page
114
262725
1447
04:24
when the content is totally irrelevant
115
264172
2252
04:26
to the attribute that's being predicted?
116
266424
2527
04:28
And it turns out that we have to look at
117
268951
1584
04:30
a whole bunch of underlying theories
118
270535
1618
04:32
to see why we're able to do this.
119
272153
2569
04:34
One of them is a sociological theory called homophily,
120
274722
2913
04:37
which basically says people are friends with people like them.
121
277635
3092
04:40
So if you're smart, you tend to be friends with smart people,
122
280727
2014
04:42
and if you're young, you tend to be friends with young people,
123
282741
2630
04:45
and this is well established
124
285371
1627
04:46
for hundreds of years.
125
286998
1745
04:48
We also know a lot
126
288743
1232
04:49
about how information spreads through networks.
127
289975
2550
04:52
It turns out things like viral videos
128
292525
1754
04:54
or Facebook likes or other information
129
294279
2406
04:56
spreads in exactly the same way
130
296685
1888
04:58
that diseases spread through social networks.
131
298573
2454
05:01
So this is something we've studied for a long time.
132
301027
1791
05:02
We have good models of it.
133
302818
1576
05:04
And so you can put those things together
134
304394
2157
05:06
and start seeing why things like this happen.
135
306551
3088
05:09
So if I were to give you a hypothesis,
136
309639
1814
05:11
it would be that a smart guy started this page,
137
311453
3227
05:14
or maybe one of the first people who liked it
138
314680
1939
05:16
would have scored high on that test.
139
316619
1736
05:18
And they liked it, and their friends saw it,
140
318355
2288
05:20
and by homophily, we know that he probably had smart friends,
141
320643
3122
05:23
and so it spread to them, and some of them liked it,
142
323765
3056
05:26
and they had smart friends,
143
326821
1189
05:28
and so it spread to them,
144
328010
807
05:28
and so it propagated through the network
145
328817
1973
05:30
to a host of smart people,
146
330790
2569
05:33
so that by the end, the action
147
333359
2056
05:35
of liking the curly fries page
148
335415
2544
05:37
is indicative of high intelligence,
149
337959
1615
05:39
not because of the content,
150
339574
1803
05:41
but because the actual action of liking
151
341377
2522
05:43
reflects back the common attributes
152
343899
1900
05:45
of other people who have done it.
153
345799
2468
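
The hypothesis above (a like spreading through friendships formed by homophily) can be sketched as a small simulation. This is an illustrative toy, not the method of the study mentioned in the talk: the population size, the score distribution, the friendship rule (each person is friends only with the people ranked closest to them by score), the like probability, and the function name simulate_like_spread are all assumptions made for the example.

```python
import random
from statistics import mean


def simulate_like_spread(n_people=200, k=3, p_like=0.5, rounds=5, seed=0):
    """Spread a 'like' through a homophilous friendship network (toy model)."""
    rng = random.Random(seed)

    # Hypothetical trait score for each person (made-up distribution).
    scores = [rng.gauss(100, 15) for _ in range(n_people)]

    # Homophily (assumed rule): each person is friends with the k people
    # ranked immediately above and below them by score.
    ranked = sorted(range(n_people), key=lambda i: scores[i])
    friends = {i: set() for i in range(n_people)}
    for pos, person in enumerate(ranked):
        for offset in range(1, k + 1):
            if pos + offset < n_people:
                other = ranked[pos + offset]
                friends[person].add(other)
                friends[other].add(person)

    # The page is started (and first liked) by the highest-scoring person.
    likers = {ranked[-1]}

    # Each round, friends of current likers see the like and may like it too,
    # the same way an infection spreads along contacts.
    for _ in range(rounds):
        exposed = {f for person in likers for f in friends[person]} - likers
        likers |= {f for f in sorted(exposed) if rng.random() < p_like}

    return scores, likers


scores, likers = simulate_like_spread()
population_mean = mean(scores)
liker_mean = mean([scores[i] for i in likers])
print(f"Population mean score: {population_mean:.1f}")
print(f"Mean score of people who liked the page: {liker_mean:.1f}")
```

In this toy model the people who end up liking the page score above the population average, even though the page itself has nothing to do with the trait. Real networks are far less tidy, so the effect would be weaker and noisier than it is here.
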
05:48
So this is pretty complicated stuff, right?
154
348267
2897
05:51
It's a hard thing to sit down and explain
155
351164
2199
05:53
to an average user, and even if you do,
156
353363
2848
05:56
what can the average user do about it?
157
356211
2188
05:58
How do you know that you've liked something
158
358399
2048
06:00
that indicates a trait for you
159
360447
1492
06:01
that's totally irrelevant to the content of what you've liked?
160
361939
3545
06:05
There's a lot of power that users don't have
161
365484
2546
06:08
to control how this data is used.
162
368030
2230
06:10
And I see that as a real problem going forward.
163
370260
3112
06:13
So I think there's a couple paths
164
373372
1977
06:15
that we want to look at
165
375349
1001
06:16
if we want to give users some control
166
376350
1910
06:18
over how this data is used,
167
378260
1740
06:20
because it's not always going to be used
168
380000
1940
06:21
for their benefit.
169
381940
1381
06:23
An example I often give is that,
170
383321
1422
06:24
if I ever get bored being a professor,
171
384743
1646
06:26
I'm going to go start a company
172
386389
1653
06:28
that predicts all of these attributes
173
388042
1454
06:29
and things like how well you work in teams
174
389496
1602
06:31
and if you're a drug user, if you're an alcoholic.
175
391098
2671
06:33
We know how to predict all that.
176
393769
1440
06:35
And I'm going to sell reports
177
395209
1761
06:36
to H.R. companies and big businesses
178
396970
2100
06:39
that want to hire you.
179
399070
2273
06:41
We totally can do that now.
180
401343
1177
06:42
I could start that business tomorrow,
181
402520
1788
06:44
and you would have absolutely no control
182
404308
2052
06:46
over me using your data like that.
183
406360
2138
06:48
That seems to me to be a problem.
184
408498
2292
06:50
So one of the paths we can go down
185
410790
1910
06:52
is the policy and law path.
186
412700
2032
06:54
And in some respects, I think that that would be most effective,
187
414732
3046
06:57
but the problem is we'd actually have to do it.
188
417778
2756
07:00
Observing our political process in action
189
420534
2780
07:03
makes me think it's highly unlikely
190
423314
2379
07:05
that we're going to get a bunch of representatives
191
425693
1597
07:07
to sit down, learn about this,
192
427290
1986
07:09
and then enact sweeping changes
193
429276
2106
07:11
to intellectual property law in the U.S.
194
431382
2157
07:13
so users control their data.
195
433539
2461
07:16
We could go the policy route,
196
436000
1304
07:17
where social media companies say,
197
437304
1479
07:18
you know what? You own your data.
198
438783
1402
07:20
You have total control over how it's used.
199
440185
2489
07:22
The problem is that the revenue models
200
442674
1848
07:24
for most social media companies
201
444522
1724
07:26
rely on sharing or exploiting users' data in some way.
202
446246
4031
07:30
It's sometimes said of Facebook that the users
203
450277
1833
07:32
aren't the customer, they're the product.
204
452110
2528
07:34
And so how do you get a company
205
454638
2714
07:37
to cede control of their main asset
206
457352
2558
07:39
back to the users?
207
459910
1249
07:41
It's possible, but I don't think it's something
208
461159
1701
07:42
that we're going to see change quickly.
209
462860
2320
07:45
So I think the other path
210
465180
1500
07:46
that we can go down that's going to be more effective
211
466680
2288
07:48
is one of more science.
212
468968
1508
07:50
It's doing science that allowed us to develop
213
470476
2510
07:52
all these mechanisms for computing
214
472986
1750
07:54
this personal data in the first place.
215
474736
2052
07:56
And it's actually very similar research
216
476788
2106
07:58
that we'd have to do
217
478894
1438
08:00
if we want to develop mechanisms
218
480332
2386
08:02
that can say to a user,
219
482718
1421
08:04
"Here's the risk of that action you just took."
220
484139
2229
08:06
By liking that Facebook page,
221
486368
2080
08:08
or by sharing this piece of personal information,
222
488448
2535
08:10
you've now improved my ability
223
490983
1502
08:12
to predict whether or not you're using drugs
224
492485
2086
08:14
or whether or not you get along well in the workplace.
225
494571
2862
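
One way such a warning could be computed is with Bayes' rule, comparing the estimated probability of a trait before and after a single like. This is only a sketch under stated assumptions: the function name updated_probability and all three input rates are hypothetical numbers chosen for illustration, not figures from the talk or from any study.

```python
def updated_probability(prior, p_like_given_trait, p_like_given_no_trait):
    """Return P(trait | like) using Bayes' rule."""
    numerator = p_like_given_trait * prior
    evidence = numerator + p_like_given_no_trait * (1 - prior)
    return numerator / evidence


# Hypothetical inputs: 10% of users have the trait; 30% of users with the
# trait like the page, compared with 10% of users without it.
prior = 0.10
posterior = updated_probability(prior, 0.30, 0.10)

print(
    f"Liking this page raises the estimated probability "
    f"from {prior:.0%} to {posterior:.0%}."
)
```

With these made-up rates, one like moves the estimate from 0.10 to 0.25, which is the kind of change a tool could show a user before the action is shared.
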
08:17
And that, I think, can affect whether or not
226
497433
1848
08:19
people want to share something,
227
499281
1510
08:20
keep it private, or just keep it offline altogether.
228
500791
3239
08:24
We can also look at things like
229
504030
1563
08:25
allowing people to encrypt data that they upload,
230
505593
2728
08:28
so it's kind of invisible and worthless
231
508321
1855
08:30
to sites like Facebook
232
510176
1431
08:31
or third party services that access it,
233
511607
2629
08:34
but that select users who the person who posted it
234
514236
3247
08:37
want to see it have access to see it.
235
517483
2670
08:40
This is all super exciting research
236
520153
2166
08:42
from an intellectual perspective,
237
522319
1620
08:43
and so scientists are going to be willing to do it.
238
523939
1859
08:45
So that gives us an advantage over the law side.
239
525798
3610
08:49
One of the problems that people bring up
240
529408
1725
08:51
when I talk about this is, they say,
241
531133
1595
08:52
you know, if people start keeping all this data private,
242
532728
2646
08:55
all those methods that you've been developing
243
535374
2113
08:57
to predict their traits are going to fail.
244
537487
2653
09:00
And I say, absolutely, and for me, that's success,
245
540140
3520
09:03
because as a scientist,
246
543660
1786
09:05
my goal is not to infer information about users,
247
545446
3688
09:09
it's to improve the way people interact online.
248
549134
2767
09:11
And sometimes that involves inferring things about them,
249
551901
3218
09:15
but if users don't want me to use that data,
250
555119
3022
09:18
I think they should have the right to do that.
251
558141
2038
09:20
I want users to be informed and consenting
252
560179
2651
09:22
users of the tools that we develop.
253
562830
2112
09:24
And so I think encouraging this kind of science
254
564942
2952
09:27
and supporting researchers
255
567894
1346
09:29
who want to cede some of that control back to users
256
569240
3023
09:32
and away from the social media companies
257
572263
2311
09:34
means that going forward, as these tools evolve
258
574574
2671
09:37
and advance,
259
577245
1476
09:38
means that we're going to have an educated
260
578721
1414
09:40
and empowered user base,
261
580135
1694
09:41
and I think all of us can agree
262
581829
1100
09:42
that that's a pretty ideal way to go forward.
263
582929
2564
09:45
Thank you.
264
585493
2184
09:47
(Applause)
265
587677
3080