How bad data keeps us from good AI | Mainak Mazumdar

48,691 views ・ 2021-03-05

TED


μ•„λž˜ μ˜λ¬Έμžλ§‰μ„ λ”λΈ”ν΄λ¦­ν•˜μ‹œλ©΄ μ˜μƒμ΄ μž¬μƒλ©λ‹ˆλ‹€.

00:00
Transcriber: Leslie Gauthier Reviewer: Joanna Pietrulewicz
Korean translation: Jihyun Lee Korean reviewer: JY Kang
00:13
AI could add 16 trillion dollars to the global economy in the next 10 years. This economy is not going to be built by billions of people or millions of factories, but by computers and algorithms. We have already seen amazing benefits of AI in simplifying tasks, bringing efficiencies and improving our lives. However, when it comes to fair and equitable policy decision-making, AI has not lived up to its promise. AI is becoming a gatekeeper to the economy, deciding who gets a job and who gets access to a loan. AI is only reinforcing and accelerating our bias at speed and scale, with societal implications.
01:07
So, is AI failing us? Are we designing these algorithms to deliver biased and wrong decisions?
01:16
As a data scientist, I'm here to tell you: it's not the algorithm, but the biased data, that's responsible for these decisions. To make AI possible for humanity and society, we need an urgent reset. Instead of algorithms, we need to focus on the data. We're spending time and money to scale AI at the expense of designing and collecting high-quality and contextual data. We need to stop using the data, or the biased data, that we already have, and focus on three things: data infrastructure, data quality and data literacy.
νλ¦Ών•œ 사진을 κ°œμ„ ν•΄μ„œ
01:57
In June of this year,
35
117375
1309
01:58
we saw embarrassing bias in the Duke University AI model
36
118708
4768
인식 κ°€λŠ₯ν•œ 인물 μ‚¬μ§„μœΌλ‘œ λ°”κΎΈμ—ˆλŠ”λ°
02:03
called PULSE,
37
123500
1559
잘λͺ»λœ μ•Œκ³ λ¦¬μ¦˜μ΄ μœ μƒ‰μΈμ’…μ„ 백인처럼 λ§Œλ“œλŠ” κ²°κ³Όλ₯Ό λ§Œλ“€μ—ˆμŠ΅λ‹ˆλ‹€.
02:05
which enhanced a blurry image
38
125083
3018
02:08
into a recognizable photograph of a person.
39
128125
4018
ν•™μŠ΅ λ‹¨κ³„μ—μ„œ 흑인 사진을 적게 μ œκ³΅ν–ˆκΈ° λ•Œλ¬Έμ—
02:12
This algorithm incorrectly enhanced a nonwhite image into a Caucasian image.
40
132167
6166
잘λͺ»λœ κ²°μ •κ³Ό 예츑으둜 이어진 것이죠.
02:19
African-American images were underrepresented in the training set,
41
139042
5017
μ•„λ§ˆ 이번이 μ²˜μŒμ€ 아닐 κ±°μ˜ˆμš”.
AIκ°€ ν‘μΈμ˜ 사진을 잘λͺ» μΈμ‹ν•œ κ±Έ 보신 적이 μžˆμ„ κ²λ‹ˆλ‹€.
02:24
leading to wrong decisions and predictions.
42
144083
3417
AI 방법둠이 κ°œμ„ λ˜μ—ˆμŒμ—λ„ λΆˆκ΅¬ν•˜κ³ 
02:28
Probably this is not the first time
43
148333
2143
02:30
you have seen an AI misidentify a Black person's image.
44
150500
4768
λ‹€μ–‘ν•œ 인쒅, λ―Όμ‘±μ„±μ˜ λŒ€ν‘œμ„±μ΄ λΆ€μ‘±ν•˜μ—¬
μ—¬μ „νžˆ 편ν–₯된 κ²°κ³Όλ₯Ό μ•ˆκ²¨μ£Όμ—ˆμŠ΅λ‹ˆλ‹€.
02:35
Despite an improved AI methodology,
45
155292
3892
이 μ—°κ΅¬λŠ” ν•™λ¬Έμ μ΄μ§€λ§Œ,
02:39
the underrepresentation of racial and ethnic populations
46
159208
3810
λͺ¨λ“  데이터 μ„±ν–₯이 학문적인 것은 μ•„λ‹™λ‹ˆλ‹€.
02:43
still left us with biased results.
47
163042
2684
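The failure mode described above can be sketched in a few lines. This is an illustrative toy, not the PULSE model: the "model" simply outputs the mean of its training data, and all group names and feature values are hypothetical. When one group supplies 95 percent of the training set, the output is pulled toward that majority, so the underrepresented group sees much larger errors.

```python
import random

# Toy sketch of underrepresentation bias (hypothetical values, not PULSE).
random.seed(42)

# Two groups with different true feature means.
group_a = [random.gauss(0.0, 0.5) for _ in range(950)]  # 95% of training data
group_b = [random.gauss(5.0, 0.5) for _ in range(50)]   # 5% of training data

training_set = group_a + group_b

# The "model": predict the overall training mean for every input.
model_output = sum(training_set) / len(training_set)

# Error relative to each group's true mean.
error_for_a = abs(model_output - 0.0)
error_for_b = abs(model_output - 5.0)

print(f"model output:      {model_output:.2f}")
print(f"error for group A: {error_for_a:.2f}")  # small: majority dominates
print(f"error for group B: {error_for_b:.2f}")  # large: minority pulled toward majority
```

The imbalance, not the averaging rule, is what produces the skew: rebalancing the two groups in the training set shrinks the gap between the two errors.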
02:45
This research is academic; however, not all data biases are academic. Biases have real consequences.
λ§Žμ€ μ‚¬νšŒ, 경제 μ •μ±… 결정을 μœ„ν•œ 주좧돌 역할을 ν•©λ‹ˆλ‹€.
02:54
Take the 2020 US Census.
51
174958
2334
그렇기에 λ―Έκ΅­ λ‚΄ 총 인ꡬ 수λ₯Ό 100% 계산해야 ν•©λ‹ˆλ‹€.
02:58
The census is the foundation
52
178042
1726
02:59
for many social and economic policy decisions,
53
179792
4392
κ·ΈλŸ¬λ‚˜, 팬데믹과
03:04
therefore the census is required to count 100 percent of the population
54
184208
4518
μ‹œλ―ΌκΆŒμ— λŒ€ν•œ μ •μΉ˜μ μΈ 문제둜 인해
03:08
in the United States.
55
188750
2018
μ†Œμˆ˜ 집단을 μ‹€μ œ 인ꡬ μˆ˜λ³΄λ‹€ 적게 μ„ΈλŠ” 일이 μΌμ–΄λ‚©λ‹ˆλ‹€.
03:10
However, with the pandemic
56
190792
2476
μ†Œμˆ˜ 집단 인ꡬ 수 차이가 맀우 클 거라고 μƒκ°ν•΄μš”.
03:13
and the politics of the citizenship question,
57
193292
3267
쑰사λ₯Ό μœ„ν•΄ 거주지λ₯Ό μ°Ύκ³ , μ—°λ½ν•˜κ³ , μ„€λ“ν•˜κ³ , μΈν„°λ·°ν•˜κΈ°κ°€ νž˜λ“œλ‹ˆκΉŒμš”.
03:16
undercounting of minorities is a real possibility.
58
196583
3393
03:20
I expect significant undercounting of minority groups
59
200000
4309
계산 였λ₯˜λŠ” νŽΈκ²¬μ„ κ°–κ²Œ ν•˜κ³ 
03:24
who are hard to locate, contact, persuade and interview for the census.
60
204333
5268
데이터 기반의 μ§ˆμ„ λ–¨μ–΄λœ¨λ¦½λ‹ˆλ‹€.
2010λ…„ 인ꡬ μ‘°μ‚¬μ—μ„œ κ³Όμ†Œ μ§‘κ³„λœ κ²°κ³Όλ₯Ό λ΄…μ‹œλ‹€.
03:29
Undercounting will introduce bias
61
209625
3393
1천 6백만 λͺ…이 μ΅œμ’… μ§‘κ³„μ—μ„œ λˆ„λ½λ˜μ—ˆμŠ΅λ‹ˆλ‹€.
03:33
and erode the quality of our data infrastructure.
62
213042
3184
κ·Έ μˆ«μžκ°€ μ–΄λŠ 정도 규λͺ¨λƒ ν•˜λ©΄
03:36
Let's look at undercounts in the 2010 census.
63
216250
3976
μ• λ¦¬μ‘°λ‚˜, μ•„μΉΈμ†Œ, 였클라호마,
03:40
16 million people were omitted in the final counts.
64
220250
3934
그리고 μ•„μ΄μ˜€μ™€ 주의 전체 인ꡬλ₯Ό ν•©μΉœ 것과 κ°™μ£ .
03:44
This is as large as the total population
65
224208
3143
그리고 2010λ…„ 인ꡬ μ‘°μ‚¬μ—μ„œλŠ” 5μ„Έ μ΄ν•˜ 아동이 μ•½ 100만 λͺ… μ •λ„λ‚˜
03:47
of Arizona, Arkansas, Oklahoma and Iowa put together for that year.
66
227375
5809
적게 κ³„μ‚°λ˜μ—ˆμŠ΅λ‹ˆλ‹€.
ν˜„μž¬, μ†Œμˆ˜ 집단에 λŒ€ν•œ 계산 였λ₯˜λŠ”
03:53
We have also seen about a million kids under the age of five undercounted
67
233208
4310
λ‹€λ₯Έ κ΅­κ°€μ˜ 인ꡬ μ‘°μ‚¬μ—μ„œλ„ ν”ν•˜κ²Œ μΌμ–΄λ‚©λ‹ˆλ‹€.
03:57
in the 2010 Census.
68
237542
2101
03:59
Now, undercounting of minorities is common in other national censuses, as minorities can be harder to reach, they're mistrustful of the government, or they live in an area under political unrest. For example, the Australian Census in 2016 undercounted Aboriginal and Torres Strait Islander populations by about 17.5 percent.
04:26
We estimate undercounting in 2020 to be much higher than in 2010, and the implications of this bias can be massive.
04:36
Let's look at the implications of the census data. The census is the most trusted, open and publicly available rich data on population composition and characteristics. While businesses have proprietary information on consumers, the Census Bureau reports definitive, public counts on age, gender, ethnicity, race, employment and family status, as well as geographic distribution, which are the foundation of the population data infrastructure. When minorities are undercounted, AI models supporting public transportation, housing, health care and insurance are likely to overlook the communities that require these services the most.
λ°μ΄ν„°λ² μ΄μŠ€λ₯Ό λ§Œλ“œλŠ” κ²λ‹ˆλ‹€.
05:23
First step to improving results
96
323583
2185
05:25
is to make that database representative
97
325792
2392
인ꡬ 쑰사가 μ€‘μš”ν•œ 만큼
100% μ •ν™•νžˆ μ„ΈκΈ° μœ„ν•΄ μ΅œμ„ μ˜ λ…Έλ ₯을 κΈ°μšΈμ—¬μ•Ό ν•©λ‹ˆλ‹€.
05:28
of age, gender, ethnicity and race
98
328208
3268
05:31
per census data.
99
331500
1292
λ°μ΄ν„°μ˜ ν’ˆμ§ˆκ³Ό 정확성에 νˆ¬μžν•˜λŠ” 것은
05:33
Since census is so important,
100
333792
1642
05:35
we have to make every effort to count 100 percent.
101
335458
4101
AIλ₯Ό κ΅¬ν˜„ν•˜λŠ” 데 ν•„μˆ˜μ μž…λ‹ˆλ‹€.
일뢀 νŠΉκΆŒμΈ΅μ„ μœ„ν•΄μ„œκ°€ μ•„λ‹ˆλΌ
05:39
Investing in this data quality and accuracy
102
339583
4060
μ‚¬νšŒμ˜ λͺ¨λ‘λ₯Ό μœ„ν•΄μ„œμš”.
05:43
is essential to making AI possible,
103
343667
3226
λŒ€λΆ€λΆ„μ˜ AI μ‹œμŠ€ν…œμ΄ μ‚¬μš©ν•˜λŠ” λ°μ΄ν„°λŠ”
기쑴에 가지고 μžˆμ—ˆκ±°λ‚˜ λ‹€λ₯Έ λͺ©μ μœΌλ‘œ μˆ˜μ§‘λœ κ²ƒλ“€μž…λ‹ˆλ‹€.
05:46
not for only few and privileged,
104
346917
2226
05:49
but for everyone in the society.
105
349167
2517
05:51
Most AI systems use the data that's already available, or collected for some other purpose, because it's convenient and cheap. Yet data quality is a discipline that requires commitment: real commitment. This attention to the definition, data collection and measurement of bias is not only underappreciated; in the world of speed, scale and convenience, it's often ignored.
06:19
As part of the Nielsen data science team, I went on field visits to collect data, visiting retail stores outside Shanghai and Bangalore. The goal of those visits was to measure retail sales from those stores. We drove miles outside the city and found these small stores: informal, hard to reach. And you may be wondering: why are we interested in these specific stores? We could have selected a store in the city, where the electronic data could be easily integrated into a data pipeline: cheap, convenient and easy. Why are we so obsessed with the quality and accuracy of the data from these stores? The answer is simple: because the data from these rural stores matter.
07:07
According to the International Labour Organization, 40 percent of Chinese and 65 percent of Indians live in rural areas. Imagine the bias in decisions when 65 percent of consumption in India is excluded from models, meaning the decisions will favor the urban over the rural. Without this rural-urban context and signals on livelihood, lifestyle, economy and values, retail brands will make wrong investments in pricing, advertising and marketing. Or the urban bias will lead to wrong rural policy decisions with regard to health and other investments. Wrong decisions are not the problem with the AI algorithm. It's a problem of the data that excludes areas intended to be measured in the first place. The data in context is the priority, not the algorithms.
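The arithmetic of that exclusion is easy to make concrete. The rural share below is the ILO figure cited above; the per-capita spending numbers are purely hypothetical, chosen only to show how an urban-only dataset distorts a national estimate.

```python
# Illustrative numbers: 65% rural share per the ILO figure cited in the talk;
# the spending values are hypothetical.
rural_share, urban_share = 0.65, 0.35
rural_avg_spend, urban_avg_spend = 40.0, 120.0  # hypothetical per-capita spend

# True national average weights both contexts.
true_avg = rural_share * rural_avg_spend + urban_share * urban_avg_spend

# A model fed only urban store data sees only urban behavior.
urban_only_avg = urban_avg_spend

print(true_avg)        # national average including rural consumers
print(urban_only_avg)  # what an urban-only pipeline reports
```

Any pricing, advertising or marketing decision calibrated to the urban-only figure overshoots the rural majority, which is the bias the talk describes.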
08:09
Let's look at another example. I visited remote trailer park homes in Oregon state and New York City apartments to invite these homes to participate in Nielsen panels. Panels are statistically representative samples of homes that we invite to participate in the measurement over a period of time. Our mission to include everybody in the measurement led us to collect data from these Hispanic and African American homes who use over-the-air TV reception with an antenna.
08:43
Per Nielsen data, these homes constitute 15 percent of US households, which is about 45 million people. Commitment and focus on quality means we made every effort to collect information from these 15 percent, hard-to-reach groups.
νŒμ΄‰κ³Ό μƒν’ˆ μΈ‘λ©΄μ—μ„œλ„ μ•„μ£Ό μ•„μ£Ό μ€‘μš”ν•©λ‹ˆλ‹€.
λ―Έλ””μ–΄ νšŒμ‚¬λΏλ§Œ μ•„λ‹ˆλΌμš”.
09:03
Why does it matter?
164
543458
1459
κ·Έ 데이터가 μ—†λ‹€λ©΄
09:05
This is a sizeable group
165
545875
1309
νŒμ΄‰κ³Ό μƒν’ˆ 그리고 μ˜μ—… λͺ¨λΈμ— μžˆμ–΄μ„œ
09:07
that's very, very important to the marketers, brands,
166
547208
3310
κ·Έλ“€μ—κ²Œ 접근성도 λ–¨μ–΄μ§‘λ‹ˆλ‹€.
09:10
as well as the media companies.
167
550542
2601
μ€‘μš”ν•œ μ†Œμˆ˜ 집단 인ꡬ λŒ€μƒμ˜ κ΄‘κ³  λ…ΈμΆœμ€ 맀우 μ€‘μš”ν•˜κΈ° λ•Œλ¬Έμ΄μ£ .
09:13
Without the data,
168
553167
1351
09:14
the marketers and brands and their models
169
554542
2892
κ΄‘κ³  수읡이 μ—†λ‹€λ©΄
09:17
would not be able to reach these folks,
170
557458
2393
ν…”λ ˆλ¬Έλ„μ™€ μœ λ‹ˆλΉ„μ „κ³Ό 같은 λ°©μ†‘μ‚¬λŠ”
09:19
as well as show ads to these very, very important minority populations.
171
559875
4684
무료 μ½˜ν…μΈ λ₯Ό μ œκ³΅ν•  수 없을 κ²ƒμž…λ‹ˆλ‹€.
09:24
And without the ad revenue,
172
564583
1976
λ―Όμ£Όμ£Όμ˜μ— μžˆμ–΄ κ°€μž₯ 기본이 λ˜λŠ” λ‰΄μŠ€ λ―Έλ””μ–΄λ₯Ό ν¬ν•¨ν•΄μ„œμš”.
09:26
the broadcasters such as Telemundo or Univision,
173
566583
4060
09:30
would not be able to deliver free content,
174
570667
3142
이 λ°μ΄ν„°λŠ” 기업체와 μ‚¬νšŒμ— λ°˜λ“œμ‹œ ν•„μš”ν•©λ‹ˆλ‹€.
09:33
including news media,
175
573833
2101
09:35
which is so foundational to our democracy.
176
575958
3560
09:39
This data is essential for businesses and society.

09:44
Our once-in-a-lifetime opportunity to reduce human bias in AI starts with the data. Instead of racing to build new algorithms, my mission is to build a better data infrastructure that makes ethical AI possible. I hope you will join me in my mission as well.

10:05
Thank you.
이 μ›Ήμ‚¬μ΄νŠΈ 정보

이 μ‚¬μ΄νŠΈλŠ” μ˜μ–΄ ν•™μŠ΅μ— μœ μš©ν•œ YouTube λ™μ˜μƒμ„ μ†Œκ°œν•©λ‹ˆλ‹€. μ „ 세계 졜고의 μ„ μƒλ‹˜λ“€μ΄ κ°€λ₯΄μΉ˜λŠ” μ˜μ–΄ μˆ˜μ—…μ„ 보게 될 κ²ƒμž…λ‹ˆλ‹€. 각 λ™μ˜μƒ νŽ˜μ΄μ§€μ— ν‘œμ‹œλ˜λŠ” μ˜μ–΄ μžλ§‰μ„ 더블 ν΄λ¦­ν•˜λ©΄ κ·Έκ³³μ—μ„œ λ™μ˜μƒμ΄ μž¬μƒλ©λ‹ˆλ‹€. λΉ„λ””μ˜€ μž¬μƒμ— 맞좰 μžλ§‰μ΄ μŠ€ν¬λ‘€λ©λ‹ˆλ‹€. μ˜κ²¬μ΄λ‚˜ μš”μ²­μ΄ μžˆλŠ” 경우 이 문의 양식을 μ‚¬μš©ν•˜μ—¬ λ¬Έμ˜ν•˜μ‹­μ‹œμ˜€.

https://forms.gle/WvT1wiN1qDtmnspy7