Ben Wellington: How we found the worst place to park in New York City — using big data

80,081 views

2015-02-26 ・ TED



00:12
Six thousand miles of road,
00:15
600 miles of subway track,
00:17
400 miles of bike lanes
00:19
and a half a mile of tram track,
00:21
if you've ever been to Roosevelt Island.
00:23
These are the numbers that make up the infrastructure of New York City.
00:26
These are the statistics of our infrastructure.
00:29
They're the kind of numbers you can find released in reports by city agencies.
00:32
For example, the Department of Transportation will probably tell you
00:36
how many miles of road they maintain.
00:37
The MTA will boast how many miles of subway track there are.
00:40
Most city agencies give us statistics.
00:42
This is from a report this year
00:43
from the Taxi and Limousine Commission,
00:45
where we learn that there's about 13,500 taxis here in New York City.
00:49
Pretty interesting, right?
00:50
But did you ever think about where these numbers came from?
00:53
Because for these numbers to exist, someone at the city agency
00:56
had to stop and say, hmm, here's a number that somebody might want to know.
00:59
Here's a number that our citizens want to know.
01:02
So they go back to their raw data,
01:04
they count, they add, they calculate,
01:05
and then they put out reports,
01:07
and those reports will have numbers like this.
01:09
The problem is, how do they know all of our questions?
01:11
We have lots of questions.
01:13
In fact, in some ways there's literally an infinite number of questions
01:16
that we can ask about our city.
01:18
The agencies can never keep up.
01:19
So the paradigm isn't exactly working, and I think our policymakers realize that,
01:23
because in 2012, Mayor Bloomberg signed into law what he called
01:27
the most ambitious and comprehensive open data legislation in the country.
01:31
In a lot of ways, he's right.
01:33
In the last two years, the city has released 1,000 datasets
01:35
on our open data portal,
01:37
and it's pretty awesome.
01:39
So you go and look at data like this,
01:41
and instead of just counting the number of cabs,
01:43
we can start to ask different questions.
01:45
So I had a question.
01:46
When's rush hour in New York City?
01:48
It can be pretty bothersome. When is rush hour exactly?
01:51
And I thought to myself, these cabs aren't just numbers,
01:53
these are GPS recorders driving around in our city streets
01:56
recording each and every ride they take.
01:58
There's data there, and I looked at that data,
02:00
and I made a plot of the average speed of taxis in New York City throughout the day.
02:04
You can see that from about midnight to around 5:18 in the morning,
02:07
speed increases, and at that point, things turn around,
02:11
and they get slower and slower and slower until about 8:35 in the morning,
02:15
when they end up at around 11 and a half miles per hour.
02:18
The average taxi is going 11 and a half miles per hour on our city streets,
02:21
and it turns out it stays that way
02:23
for the entire day.
02:27
(Laughter)
02:28
So I said to myself, I guess there's no rush hour in New York City.
02:31
There's just a rush day.
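A minimal sketch of how a speed-by-hour curve like the one described above might be computed. The record layout (pickup time, distance in miles, duration in seconds) and the sample trips are assumptions for illustration; they are not the actual schema or contents of the taxi data. The average here is total distance over total time within each hour, which is one reasonable choice among several.

```python
from collections import defaultdict
from datetime import datetime

def average_speed_by_hour(trips):
    """Return {hour: mph} for trips given as (pickup, miles, seconds)."""
    miles = defaultdict(float)
    seconds = defaultdict(float)
    for pickup, trip_miles, trip_seconds in trips:
        if trip_seconds <= 0:
            continue  # skip records with no usable duration
        miles[pickup.hour] += trip_miles
        seconds[pickup.hour] += trip_seconds
    # Total distance over total time, converted to miles per hour.
    return {h: miles[h] / (seconds[h] / 3600.0) for h in sorted(miles)}

# Made-up sample trips, for illustration only.
sample_trips = [
    (datetime(2013, 1, 7, 5, 10), 3.0, 540),
    (datetime(2013, 1, 7, 5, 40), 2.0, 360),
    (datetime(2013, 1, 7, 8, 35), 1.15, 360),
]
speeds = average_speed_by_hour(sample_trips)
for hour, mph in speeds.items():
    print(f"{hour:02d}:00  {mph:.1f} mph")
```

Plotting the resulting dictionary, hour against speed, would give a curve of the kind shown in the talk.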
02:33
Makes sense. And this is important for a couple of reasons.
02:36
If you're a transportation planner, this might be pretty interesting to know.
02:39
But if you want to get somewhere quickly,
02:41
you now know to set your alarm for 4:45 in the morning and you're all set.
02:45
New York, right?
02:46
But there's a story behind this data.
02:47
This data wasn't just available, it turns out.
02:50
It actually came from something called a Freedom of Information Law Request,
02:53
or a FOIL Request.
02:54
This is a form you can find on the Taxi and Limousine Commission website.
02:58
In order to access this data, you need to go get this form,
03:01
fill it out, and they will notify you,
03:02
and a guy named Chris Whong did exactly that.
03:05
Chris went down, and they told him,
03:06
"Just bring a brand new hard drive down to our office,
03:09
leave it here for five hours, we'll copy the data and you take it back."
03:13
And that's where this data came from.
03:15
Now, Chris is the kind of guy who wants to make the data public,
03:18
and so it ended up online for all to use, and that's where this graph came from.
03:22
And the fact that it exists is amazing. These GPS recorders -- really cool.
03:25
But the fact that we have citizens walking around with hard drives
03:28
picking up data from city agencies to make it public --
03:31
it was already kind of public, you could get to it,
03:33
but it was "public," it wasn't public.
03:35
And we can do better than that as a city.
03:37
We don't need our citizens walking around with hard drives.
03:40
Now, not every dataset is behind a FOIL Request.
03:42
Here is a map I made with the most dangerous intersections in New York City
03:46
based on cyclist accidents.
03:48
So the red areas are more dangerous.
03:50
And what it shows is first the East side of Manhattan,
03:52
especially in the lower area of Manhattan, has more cyclist accidents.
03:56
That might make sense
03:57
because there are more cyclists coming off the bridges there.
04:00
But there's other hotspots worth studying.
04:02
There's Williamsburg. There's Roosevelt Avenue in Queens.
04:04
And this is exactly the kind of data we need for Vision Zero.
04:07
This is exactly what we're looking for.
04:09
But there's a story behind this data as well.
04:11
This data didn't just appear.
04:13
How many of you guys know this logo?
04:16
Yeah, I see some shakes.
04:17
Have you ever tried to copy and paste data out of a PDF
04:20
and make sense of it?
04:21
I see more shakes.
04:22
More of you tried copying and pasting than knew the logo. I like that.
04:26
So what happened is, the data that you just saw was actually on a PDF.
04:29
In fact, hundreds and hundreds and hundreds of pages of PDF
04:32
put out by our very own NYPD,
04:34
and in order to access it, you would either have to copy and paste
04:38
for hundreds and hundreds of hours,
04:39
or you could be John Krauss.
04:41
John Krauss was like,
04:42
I'm not going to copy and paste this data. I'm going to write a program.
04:45
It's called the NYPD Crash Data Band-Aid,
04:47
and it goes to the NYPD's website and it would download PDFs.
04:50
Every day it would search; if it found a PDF, it would download it
04:54
and then it would run some PDF-scraping program,
04:56
and out would come the text,
04:57
and it would go on the Internet, and then people could make maps like that.
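The daily check, download, and extract loop described above can be sketched roughly as follows. This is not the actual NYPD Crash Data Band-Aid code: the `download` and `extract_text` callables are hypothetical placeholders for a real HTTP client and a real PDF-scraping tool, and the link-matching pattern is a simplification.

```python
import re

# Matches href="...pdf" links in an HTML page (a simplification).
PDF_LINK = re.compile(r'href="([^"]+\.pdf)"', re.IGNORECASE)

def find_pdf_links(html):
    """Return the PDF links found in a page, in order of appearance."""
    return PDF_LINK.findall(html)

def collect_new_reports(html, seen, download, extract_text):
    """Download and extract every PDF link not already in `seen`.

    `download(url) -> bytes` and `extract_text(pdf_bytes) -> str` are
    supplied by the caller; both are placeholders for real tools.
    """
    results = {}
    for url in find_pdf_links(html):
        if url in seen:
            continue  # already handled on an earlier run
        results[url] = extract_text(download(url))
        seen.add(url)
    return results

# Stand-ins so the sketch runs without a network or a PDF library.
page = '<a href="/reports/jan.pdf">Jan</a> <a href="/reports/feb.PDF">Feb</a>'
fake_download = lambda url: url.encode()
fake_extract = lambda data: "text from " + data.decode()

seen_urls = {"/reports/jan.pdf"}
new_text = collect_new_reports(page, seen_urls, fake_download, fake_extract)
print(new_text)
```

Running this once a day with a persistent `seen` set would pick up only the PDFs posted since the previous run.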
05:01
And the fact that the data's here, the fact that we have access to it --
05:04
Every accident, by the way, is a row in this table.
05:07
You can imagine how many PDFs that is.
05:08
The fact that we have access to that is great,
05:11
but let's not release it in PDF form,
05:13
because then we're having our citizens write PDF scrapers.
05:15
It's not the best use of our citizens' time,
05:18
and we as a city can do better than that.
05:20
Now, the good news is that the de Blasio administration
05:22
actually recently released this data a few months ago,
05:25
and so now we can actually have access to it,
05:27
but there's a lot of data still entombed in PDF.
05:29
For example, our crime data is still only available in PDF.
05:33
And not just our crime data, our own city budget.
05:36
Our city budget is only readable right now in PDF form.
05:40
And it's not just us that can't analyze it --
05:42
our own legislators who vote for the budget
05:45
also only get it in PDF.
05:47
So our legislators cannot analyze the budget that they are voting for.
05:51
And I think as a city we can do a little better than that as well.
05:55
Now, there's a lot of data that's not hidden in PDFs.
05:57
This is an example of a map I made,
05:59
and this is the dirtiest waterways in New York City.
06:02
Now, how do I measure dirty?
06:03
Well, it's kind of a little weird,
06:05
but I looked at the level of fecal coliform,
06:07
which is a measurement of fecal matter in each of our waterways.
06:11
The larger the circle, the dirtier the water,
06:14
so the large circles are dirty water, the small circles are cleaner.
06:17
What you see is inland waterways.
06:19
This is all data that was sampled by the city over the last five years.
06:22
And inland waterways are, in general, dirtier.
06:25
That makes sense, right?
06:26
And the bigger circles are dirty. And I learned a few things from this.
06:30
Number one: Never swim in anything that ends in "creek" or "canal."
06:33
But number two: I also found the dirtiest waterway in New York City,
06:37
by this measure, one measure.
06:39
In Coney Island Creek, which is not the Coney Island you swim in, luckily.
06:43
It's on the other side.
06:44
But Coney Island Creek, 94 percent of samples taken over the last five years
06:48
have had fecal levels so high
06:50
that it would be against state law to swim in the water.
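A figure like "94 percent of samples" comes from counting, per waterway, how many readings exceed a limit. A minimal sketch, assuming samples arrive as (site, reading) pairs; the limit value and the readings below are placeholders, not the state standard or the city's measurements.

```python
from collections import defaultdict

def share_over_limit(samples, limit):
    """Return {site: fraction of readings strictly above `limit`}."""
    total = defaultdict(int)
    over = defaultdict(int)
    for site, reading in samples:
        total[site] += 1
        if reading > limit:
            over[site] += 1
    return {site: over[site] / total[site] for site in total}

# Placeholder limit and made-up readings, for illustration only.
HYPOTHETICAL_LIMIT = 1000
readings = [
    ("Creek A", 5200), ("Creek A", 800), ("Creek A", 3100), ("Creek A", 1500),
    ("Harbor B", 40), ("Harbor B", 1200),
]
shares = share_over_limit(readings, HYPOTHETICAL_LIMIT)
worst = max(shares, key=shares.get)
print(worst, f"{shares[worst]:.0%}")
```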
06:53
And this is not the kind of fact that you're going to see
06:56
boasted in a city report, right?
06:57
It's not going to be the front page on nyc.gov.
06:59
You're not going to see it there,
07:01
but the fact that we can get to that data is awesome.
07:04
But once again, it wasn't super easy,
07:05
because this data was not on the open data portal.
07:08
If you were to go to the open data portal,
07:10
you'd see just a snippet of it, a year or a few months.
07:12
It was actually on the Department of Environmental Protection's website.
07:16
And each one of these links is an Excel sheet, and each Excel sheet is different.
07:20
Every heading is different: you copy, paste, reorganize.
07:22
When you do you can make maps and that's great, but once again,
07:25
we can do better than that as a city, we can normalize things.
07:28
And we're getting there, because there's this website that Socrata makes
07:32
called the Open Data Portal NYC.
07:33
This is where 1,100 data sets that don't suffer
07:35
from the things I just told you live,
07:37
and that number is growing, and that's great.
07:39
You can download data in any format, be it CSV or PDF or Excel document.
07:43
Whatever you want, you can download the data that way.
07:45
The problem is, once you do,
07:47
you will find that each agency codes their addresses differently.
07:50
So one is street name, intersection street,
07:52
street, borough, address, building, building address.
07:55
So once again, you're spending time, even when we have this portal,
07:58
you're spending time normalizing our address fields.
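The normalizing work described above usually amounts to mapping each agency's column names onto one shared layout and tidying the values. A minimal sketch; the agency names and field maps are hypothetical, loosely modeled on the column names mentioned in the talk, and real address matching needs far more than whitespace and case cleanup.

```python
# Hypothetical per-agency column names mapped to a shared layout.
FIELD_MAPS = {
    "agency_a": {"building": "house_number", "street name": "street", "borough": "borough"},
    "agency_b": {"building address": "house_number", "street": "street", "boro": "borough"},
}

def normalize_address(agency, record):
    """Rename an agency's address fields and tidy whitespace and case."""
    mapping = FIELD_MAPS[agency]
    out = {"house_number": "", "street": "", "borough": ""}
    for source_field, target_field in mapping.items():
        value = str(record.get(source_field, ""))
        out[target_field] = " ".join(value.split()).upper()
    return out

a = normalize_address(
    "agency_a",
    {"building": "12", "street name": " east 9th  st", "borough": "Manhattan"},
)
b = normalize_address(
    "agency_b",
    {"building address": "12", "street": "East 9th St", "boro": "MANHATTAN"},
)
print(a == b)
```

Once two agencies' records reduce to the same dictionary, their datasets can be joined on the address without hand editing.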
08:01
And that's not the best use of our citizens' time.
08:03
We can do better than that as a city.
08:05
We can standardize our addresses,
08:07
and if we do, we can get more maps like this.
08:09
This is a map of fire hydrants in New York City,
08:11
but not just any fire hydrants.
08:13
These are the top 250 grossing fire hydrants in terms of parking tickets.
08:17
(Laughter)
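A "top grossing hydrants" ranking is a group-and-sum over ticket records followed by a sort. A minimal sketch, assuming each ticket is a (location, fine) pair; the locations and amounts below are made up for illustration and do not come from the parking ticket data.

```python
from collections import Counter

def top_grossing(tickets, n):
    """Sum fines per location and return the n largest as (location, total)."""
    totals = Counter()
    for location, fine in tickets:
        totals[location] += fine
    return totals.most_common(n)

# Made-up tickets; locations and fine amounts are illustrative only.
tickets = [
    ("hydrant-1", 115), ("hydrant-2", 115), ("hydrant-1", 115),
    ("hydrant-3", 115), ("hydrant-1", 115), ("hydrant-2", 115),
]
top_two = top_grossing(tickets, 2)
print(top_two)
```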
08:19
So I learned a few things from this map, and I really like this map.
08:23
Number one, just don't park on the Upper East Side.
08:25
Just don't. It doesn't matter where you park, you will get a hydrant ticket.
08:29
Number two, I found the two highest grossing hydrants in all of New York City,
08:33
and they're on the Lower East Side,
08:35
and they were bringing in over 55,000 dollars a year in parking tickets.
08:40
And that seemed a little strange to me when I noticed it,
08:42
so I did a little digging and it turns out what you had is a hydrant
08:46
and then something called a curb extension,
08:48
which is like a seven-foot space to walk on,
08:50
and then a parking spot.
08:51
And so these cars came along, and the hydrant --
08:53
"It's all the way over there, I'm fine,"
08:55
and there was actually a parking spot painted there beautifully for them.
08:59
They would park there, and the NYPD disagreed with this designation
09:02
and would ticket them.
09:03
And it wasn't just me who found a parking ticket.
09:05
This is the Google Street View car driving by
09:07
finding the same parking ticket.
09:09
So I wrote about this on my blog, on I Quant NY, and the DOT responded,
09:13
and they said,
09:14
"While the DOT has not received any complaints about this location,
09:18
we will review the roadway markings and make any appropriate alterations."
09:22
And I thought to myself, typical government response,
09:25
all right, moved on with my life.
09:27
But then, a few weeks later, something incredible happened.
09:31
They repainted the spot,
09:34
and for a second I thought I saw the future of open data,
09:36
because think about what happened here.
09:38
For five years, this spot was being ticketed, and it was confusing,
09:44
and then a citizen found something, they told the city, and within a few weeks
09:48
the problem was fixed.
09:49
It's amazing. And a lot of people see open data as being a watchdog.
09:52
It's not, it's about being a partner.
09:54
We can empower our citizens to be better partners for government,
09:57
and it's not that hard.
09:59
All we need are a few changes.
10:01
If you're FOILing data,
10:02
if you're seeing your data being FOILed over and over again,
10:05
let's release it to the public, that's a sign that it should be made public.
10:08
And if you're a government agency releasing a PDF,
10:11
let's pass legislation that requires you to post it with the underlying data,
10:14
because that data is coming from somewhere.
10:16
I don't know where, but it's coming from somewhere,
10:19
and you can release it with the PDF.
10:20
And let's adopt and share some open data standards.
10:23
Let's start with our addresses here in New York City.
10:25
Let's just start normalizing our addresses.
10:27
Because New York is a leader in open data.
10:30
Despite all this, we are absolutely a leader in open data,
10:32
and if we start normalizing things, and set an open data standard,
10:35
others will follow. The state will follow, and maybe the federal government.
10:39
Other countries could follow,
10:40
and we're not that far off from a time where you could write one program
10:44
and map information from 100 countries.
10:46
It's not science fiction. We're actually quite close.
10:48
And by the way, who are we empowering with this?
10:51
Because it's not just John Krauss and it's not just Chris Whong.
10:54
There are hundreds of meetups going on in New York City right now,
10:57
active meetups.
10:58
There are thousands of people attending these meetups.
11:00
These people are going after work and on weekends,
11:03
and they're attending these meetups to look at open data
11:05
and make our city a better place.
11:07
Groups like BetaNYC, who just last week released something called citygram.nyc
11:11
that allows you to subscribe to 311 complaints
11:13
around your own home, or around your office.
11:15
You put in your address, you get local complaints.
11:18
And it's not just the tech community that are after these things.
11:21
It's urban planners like the students I teach at Pratt.
11:24
It's policy advocates, it's everyone,
11:25
it's citizens from a diverse set of backgrounds.
11:28
And with some small, incremental changes,
11:31
we can unlock the passion and the ability of our citizens
11:34
to harness open data and make our city even better,
11:37
whether it's one dataset, or one parking spot at a time.
11:41
Thank you.
11:43
(Applause)