Why AI Needs a “Nutrition Label” | Kasia Chmielinski | TED

31,837 views ・ 2024-06-14

TED



00:04
Now, I haven't met most of you or really any of you, but I feel a really good vibe in the room. (Laughter) And so I think I'd like to treat you all to a meal. What do you think? Yes? Great, so many new friends.

00:17
So we're going to go to this cafe, they serve sandwiches. And the sandwiches are really delicious. But I have to tell you that sometimes they make people really, really sick. (Laughter) And we don't know why. Because the cafe won't tell us how they make the sandwich, they won't tell us about the ingredients. And then the authorities have no way to fix the problem. But the offer still stands. So who wants to get a sandwich? (Laughter) Some brave souls, we can talk after. But for the rest of you, I understand. You don't have enough information to make good choices about your safety or even fix the issue.

00:52
Now, before I further the anxiety here, I'm not actually trying to make you sick, but this is an analogy to how we're currently making algorithmic systems, also known as artificial intelligence or AI.

01:04
Now, for those who haven't thought about the relationship between AI and sandwiches, don't worry about it, I'm here for you, I'm going to explain. You see, AI systems, they provide benefit to society. They feed us, but they're also inconsistently making us sick. And we don't have access to the ingredients that go into the AI. And so we can't actually address the issues. We also can't stop eating AI like we can just stop eating a shady sandwich because it's everywhere, and we often don't even know that we're encountering a system that's algorithmically based.

01:38
So today, I'm going to tell you about some of the AI trends that I see. I'm going to draw on my experience building these systems over the last two decades to tell you about the tools that I and others have built to look into these AI ingredients. And finally, I'm going to leave you with three principles that I think will give us a healthier relationship to the companies that build artificial intelligence.

02:00
I'm going to start with the question, how did we get here? AI is not new. We have been living alongside AI for two decades. Every time that you apply for something online, you open a bank account or you go through passport control, you're encountering an algorithmic system.

02:19
We've also been living with the negative repercussions of AI for 20 years, and this is how it makes us sick. These systems get deployed on broad populations, and then certain subsets end up getting negatively disparately impacted, usually on the basis of race or gender or other characteristics. We need to be able to understand the ingredients to these systems so that we can address the issues.

02:43
So what are the ingredients to an AI system? Well, data fuels the AI. The AI is going to look like the data that you gave it. So for example, if I want to make a risk-assessment system for diabetes, my training data set might be adults in a certain region. And so I'll build that system, it'll work really well for those adults in that region. But it does not work for adults in other regions or maybe at all for children. So you can imagine if we deploy this for all those populations, there are going to be a lot of people who are harmed. We need to be able to understand the quality of the data before we use it.
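
To make that concrete, here is a minimal sketch, not something from the talk, of the kind of pre-use check this implies: comparing who is represented in a training table against the population a system will actually serve. The library choice, column names, groups, and shares below are illustrative assumptions.

```python
# Illustrative sketch only: compare group shares in a hypothetical training
# table against the population the model would be deployed on. All column
# names, groups and shares are made up for this example.
import pandas as pd

def coverage_report(train: pd.DataFrame, deployment_shares: dict) -> pd.DataFrame:
    """Tabulate training-data share vs. expected deployment share per group."""
    rows = []
    for column, expected in deployment_shares.items():
        observed = train[column].value_counts(normalize=True)
        for group, share in expected.items():
            rows.append({
                "attribute": column,
                "group": group,
                "train_share": round(float(observed.get(group, 0.0)), 3),
                "deployment_share": share,
            })
    return pd.DataFrame(rows)

# Hypothetical diabetes-risk training set: adults from one region only.
train = pd.DataFrame({
    "region": ["north"] * 90 + ["south"] * 10,
    "age_band": ["adult"] * 100,
})

report = coverage_report(train, {
    "region": {"north": 0.5, "south": 0.5},
    "age_band": {"adult": 0.8, "child": 0.2},
})
print(report)  # "child" has a train_share of 0.0: the model shouldn't be used there
```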

03:22
But I'm sorry to tell you that we currently live in what I call the Wild West of data. It's really hard to assess quality of data before you use it. There are no global standards for data quality assessment, and there are very few data regulations around how you can use data and what types of data you can use.

03:40
This is kind of like in the food safety realm. If we couldn't understand where the ingredients were sourced, we also had no idea whether they were safe for us to consume. We also tend to stitch data together, and every time we stitch this data together, which we might find on the internet, scrape, generate or source, we lose information about the quality of the data. And the folks who are building the models are not the ones that found the data. So there's further information that's lost.

04:10
Now, I've been asking myself a lot of questions about how we can understand the data quality before we use it. And this emerges from two decades of building these kinds of systems. The way I was trained to build systems is similar to how people do it today. You build for the middle of the distribution. That's your normal user. So for me, a lot of my training data sets would include information about people from the Western world who speak English, who have certain normative characteristics. And it took me an embarrassingly long amount of time to realize that I was not my own user. So I identify as non-binary, as mixed race, I wear a hearing aid, and I just wasn't represented in the data sets that I was using. And so I was building systems that literally didn't work for me.

04:55
And for example, I once built a system that repeatedly told me that I was a white Eastern-European lady. This did a real number on my identity. (Laughter) But perhaps even more worrying, this was a system to be deployed in health care, where your background can determine things like risk scores for diseases.

05:17
And so I started to wonder, can I build tools and work with others to do this so that I can look inside of a dataset before I use it? In 2018, I was part of a fellowship at Harvard and MIT, and I, with some colleagues, decided to try to address this problem. And so we launched the Data Nutrition Project, which is a research group and also a nonprofit that builds nutrition labels for datasets.

05:43
So similar to food nutrition labels, the idea here is that you can look inside of a data set before you use it. You can understand the ingredients, see whether it's healthy for the things that you want to do. Now this is a cartoonified version of the label. The top part tells you about the completion of the label itself. And underneath that you have information about the data, the description, the keywords, the tags, and importantly, on the right hand side, how you should and should not use the data. If you could scroll on this cartoon, you would see information about risks and mitigation strategies across a number of vectors.
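
As a rough sketch of the kind of structured summary being described here, the hypothetical fields below echo what the label covers (completion, description, keywords, tags, intended and non-intended use, risks and mitigations); the actual Dataset Nutrition Label schema is richer, and these field names are assumptions for illustration only.

```python
# Hypothetical sketch of the kinds of fields such a label records; the actual
# Dataset Nutrition Label schema differs, and these names are illustrative.
from dataclasses import dataclass, field

@dataclass
class DatasetLabelSketch:
    name: str
    description: str
    keywords: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    intended_uses: list = field(default_factory=list)
    non_intended_uses: list = field(default_factory=list)
    risks_and_mitigations: dict = field(default_factory=dict)  # risk -> mitigation
    label_completion: float = 0.0  # share of label fields actually filled in

example = DatasetLabelSketch(
    name="regional-adult-health-survey",
    description="Survey records of adults from a single region.",
    keywords=["health", "survey"],
    tags=["tabular"],
    intended_uses=["research on comparable adult populations"],
    non_intended_uses=["risk scoring for children or for other regions"],
    risks_and_mitigations={"demographic skew": "document it and collect broader data"},
    label_completion=0.8,
)
print(example.non_intended_uses)
```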

06:17
And we launched this with two audiences in mind. The first audience are folks who are building AI. So they're choosing datasets. We want to help them make a better choice. The second audience are folks who are building datasets. And it turns out that when you tell someone they have to put a label on something, they think about the ingredients beforehand.

06:38
The analogy here might be, if I want to make a sandwich and say that it's gluten-free, I have to think about all the components as I make the sandwich, the bread and the ingredients, the sauces. I can't just put it on a sandwich and put it in front of you and tell you it's gluten-free.

06:52
We're really proud of the work that we've done. We launched this as a design and then a prototype and ultimately a tool for others to make their own labels. And we've worked with experts at places like Microsoft Research, the United Nations and professors globally to integrate the label and the methodology into their work flows and into their curricula. But we know it only goes so far. And that's because it's actually really hard to get a label on every single dataset.

07:20
And this comes down to the question of why would you put a label on a dataset to begin with? Well, the first reason is not rocket science. It's that you have to. And this is, quite frankly, why food nutrition labels exist. It's because if they didn't put them on the boxes, it would be illegal.

07:36
However, we don't really have AI regulation. We don't have much regulation around the use of data. Now there is some on the horizon. For example, the EU AI Act just passed this week. And although there are no requirements around making the training data available, they do have provisions for creating transparency labeling like the dataset nutrition label, data sheets, data statements. There are many in the space. We think this is a really good first step.

08:05
The second reason that you might have a label on a dataset is because it is a best practice or a cultural norm. The example here might be how we're starting to see more and more food packaging and menus at restaurants include information about whether there's gluten. This is not required by law, although if you do say it, it had better be true. And the reason that people are adding this to their menus and their food packaging is because there's an increased awareness of the sensitivity and kind of the seriousness of that kind of an allergy or condition.

08:39
So we're also seeing some movement in this area. Folks who are building datasets are starting to put nutrition labels, data sheets on their datasets. And people who are using data are starting to request the information. This is really heartening.

08:52
And you might say, "Kasia, why are you up here? Everything seems to be going well, seems to be getting better." In some ways it is. But I'm also here to tell you that our relationship to data is getting worse.

09:03
Now the last few years have seen a supercharged interest in gathering datasets. Companies are scraping the web. They're transcribing millions of hours of YouTube videos into text. By some estimates, they'll run out of information on the internet by 2026. They're even considering buying publishing houses so they can get access to printed text and books.

09:27
So why are they gathering this information? Well, they need more and more information to train a new technique called generative AI. I want to tell you about the size of these datasets. If you look at GPT-3, which is a model that launched in 2020, the training dataset included 300 billion words, or parts of words. Now for context, the English language contains less than a million words. Just three years later, DBRX was launched, which was trained on eight trillion words. So 300 billion to eight trillion in three years. And the datasets are getting bigger.

10:04
Now with each successive model launch, the datasets are actually less and less transparent. And even when we have access to the information, it's so big, it's so hard to look inside without any kind of transparency tooling.

10:18
And the generative AI itself is also causing some worries. And you've probably encountered this technique through ChatGPT. I don't need to know what you do on the internet, that's between you and the internet, but you probably know, just like I do, how easy it is to create information using ChatGPT and other generative AI technologies and to put that out onto the web. And so we're looking at a situation in which we're going to encounter lots of information that's algorithmically generated but we won't know it and we won't know whether it's true. And this increases the scale of the potential risks and harms from AI.

10:51
Not only that, I'm sorry, but the models themselves are getting controlled by a smaller and smaller number of private actors in US tech firms. So these are the models that were launched last year, in 2023. And you can see most of them are pink, meaning they came out of industry. And if you look at this over time, more and more are coming out of industry and fewer and fewer are coming out of all the other sectors combined, including academia and government, where technology is often launched in a way that's easier to scrutinize.

11:20
So if we go back to our cafe analogy, this is like you have a small number of private actors who own all the ingredients, they make all the sandwiches globally, and there's not a lot of regulation.

11:33
And so at this point you're probably scared and maybe feeling a little uncomfortable. Which is ironic because a few minutes ago, I was going to get you all sandwiches and you said yes. This is why you should not accept food from strangers. But I wouldn't be up here if I weren't also optimistic. And that's because I think we have momentum behind the regulation and the culture changes. Especially if we align ourselves with three basic principles about how corporations should engage with data.

11:58
The first principle is that companies that gather data should tell us what they're gathering. This would allow us to ask questions like, is it copyrighted material? Is that information private? Could you please stop? It also opens up the data to scientific inquiry.

12:15
The second principle is that companies that are gathering our data should tell us what they're going to do with it before they do anything with it. And by requiring that companies tell us their plan, this means that they have to have a plan, which would be a great first step. It also probably would lead to the minimization of data capture, because they wouldn't be able to capture data if they didn't know what they were already going to do with it.

12:40
And finally, principle three: companies that build AI should tell us about the data that they use to train the AI. And this is where dataset nutrition labels and other transparency labeling come into play. You know, in the case where the data itself won't be made available, which is most of the time, probably, the labeling is critical for us to be able to investigate the ingredients and start to find solutions.

13:05
So I want to leave you with the good news, and that is that the Data Nutrition Project and other projects are just a small part of a global movement towards AI accountability. The Dataset Nutrition Label and other projects are just a first step.

13:21
Regulation's on the horizon, the cultural norms are shifting, especially if we align with these three basic principles: that companies should tell us what they're gathering, tell us what they're going to do with it before they do anything with it, and that companies that are building AI should explain the data that they're using to build the system.

13:40
We need to hold these organizations accountable for the AI that they're building by asking them, just like we do with the food industry, what's inside and how did you make it? Only then can we mitigate the issues before they occur, as opposed to after they occur. And in doing so, create an integrated algorithmic internet that is healthier for everyone.

14:02
Thank you.

14:03
(Applause)