Why AI Needs a “Nutrition Label” | Kasia Chmielinski | TED

32,426 views ・ 2024-06-14

TED


00:04 Now, I haven't met most of you or really any of you, but I feel a really good vibe in the room. (Laughter) And so I think I'd like to treat you all to a meal. What do you think? Yes? Great, so many new friends.

00:17 So we're going to go to this cafe, they serve sandwiches. And the sandwiches are really delicious. But I have to tell you that sometimes they make people really, really sick. (Laughter) And we don't know why. Because the cafe won't tell us how they make the sandwich, they won't tell us about the ingredients. And then the authorities have no way to fix the problem. But the offer still stands. So who wants to get a sandwich? (Laughter) Some brave souls, we can talk after. But for the rest of you, I understand. You don't have enough information to make good choices about your safety or even fix the issue.

00:52 Now, before I further the anxiety here, I'm not actually trying to make you sick, but this is an analogy to how we're currently making algorithmic systems, also known as artificial intelligence or AI.

01:04 Now, for those who haven't thought about the relationship between AI and sandwiches, don't worry about it, I'm here for you, I'm going to explain. You see, AI systems, they provide benefit to society. They feed us, but they're also inconsistently making us sick. And we don't have access to the ingredients that go into the AI. And so we can't actually address the issues. We also can't stop eating AI like we can just stop eating a shady sandwich, because it's everywhere, and we often don't even know that we're encountering a system that's algorithmically based.

01:38 So today, I'm going to tell you about some of the AI trends that I see. I'm going to draw on my experience building these systems over the last two decades to tell you about the tools that I and others have built to look into these AI ingredients. And finally, I'm going to leave you with three principles that I think will give us a healthier relationship to the companies that build artificial intelligence.

02:00 I'm going to start with the question, how did we get here? AI is not new. We have been living alongside AI for two decades. Every time that you apply for something online, you open a bank account or you go through passport control, you're encountering an algorithmic system. We've also been living with the negative repercussions of AI for 20 years, and this is how it makes us sick. These systems get deployed on broad populations, and then certain subsets end up getting negatively disparately impacted, usually on the basis of race or gender or other characteristics. We need to be able to understand the ingredients to these systems so that we can address the issues.

02:43 So what are the ingredients to an AI system? Well, data fuels the AI. The AI is going to look like the data that you gave it. So for example, if I want to make a risk-assessment system for diabetes, my training data set might be adults in a certain region. And so I'll build that system, it'll work really well for those adults in that region. But it does not work for adults in other regions or maybe at all for children. So you can imagine if we deploy this for all those populations, there are going to be a lot of people who are harmed. We need to be able to understand the quality of the data before we use it.
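To make the kind of check she is arguing for concrete, here is a minimal sketch of comparing who is in a training set against who a deployed system would actually serve. The file names and the columns ("age_group", "region") are hypothetical assumptions for illustration, not the speaker's code or the Data Nutrition Project's tooling.

```python
# A minimal, hypothetical sketch: before training a diabetes risk model,
# compare the training data's demographics against the deployment population.
# File names and column names are illustrative assumptions.
import pandas as pd

train = pd.read_csv("training_data.csv")
deploy = pd.read_csv("deployment_population.csv")

for column in ["age_group", "region"]:
    train_share = train[column].value_counts(normalize=True)
    deploy_share = deploy[column].value_counts(normalize=True)
    # Groups that matter at deployment time but are missing or rare in training
    gap = deploy_share.subtract(train_share, fill_value=0.0)
    underrepresented = gap[gap > 0.10]  # arbitrary 10-point threshold for this sketch
    if not underrepresented.empty:
        print(f"Warning: training data underrepresents these {column} groups:")
        print(underrepresented.round(2))
```

A check like this only flags the mismatch she describes; deciding whether to gather more data or narrow the deployment is still a human call.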
03:22 But I'm sorry to tell you that we currently live in what I call the Wild West of data. It's really hard to assess quality of data before you use it. There are no global standards for data quality assessment, and there are very few data regulations around how you can use data and what types of data you can use.
03:40 This is kind of like in the food safety realm. If we couldn't understand where the ingredients were sourced, we'd also have no idea whether they were safe for us to consume. We also tend to stitch data together, and every time we stitch this data together, whether we find it on the internet, scrape it, generate it, or source it elsewhere, we lose information about the quality of the data. And the folks who are building the models are not the ones that found the data. So there's further information that's lost.
04:10 Now, I've been asking myself a lot of questions about how we can understand the data quality before we use it. And this emerges from two decades of building these kinds of systems.

04:21 The way I was trained to build systems is similar to how people do it today. You build for the middle of the distribution. That's your normal user. So for me, a lot of my training data sets would include information about people from the Western world who speak English, who have certain normative characteristics. And it took me an embarrassingly long amount of time to realize that I was not my own user.

04:43 So I identify as non-binary, as mixed race, I wear a hearing aid, and I just wasn't represented in the data sets that I was using. And so I was building systems that literally didn't work for me.

04:55 And for example, I once built a system that repeatedly told me that I was a white Eastern-European lady. This did a real number on my identity. (Laughter) But perhaps even more worrying, this was a system to be deployed in health care, where your background can determine things like risk scores for diseases.

05:17 And so I started to wonder, can I build tools and work with others to do this so that I can look inside of a dataset before I use it?

05:25 In 2018, I was part of a fellowship at Harvard and MIT, and I, with some colleagues, decided to try to address this problem. And so we launched the Data Nutrition Project, which is a research group and also a nonprofit that builds nutrition labels for datasets.

05:43 So similar to food nutrition labels, the idea here is that you can look inside of a data set before you use it. You can understand the ingredients, see whether it's healthy for the things that you want to do. Now this is a cartoonified version of the label. The top part tells you about the completion of the label itself. And underneath that you have information about the data, the description, the keywords, the tags, and importantly, on the right hand side, how you should and should not use the data. If you could scroll on this cartoon, you would see information about risks and mitigation strategies across a number of vectors.
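To give that cartoon a concrete shape, here is a hedged sketch of how the fields she lists (label completion, description, keywords, tags, intended and prohibited uses, risks and mitigations) might be captured as structured metadata that a practitioner could check before training. The field names and the helper function are illustrative assumptions, not the Data Nutrition Project's actual schema.

```python
# Illustrative only: one possible machine-readable shape for a dataset
# nutrition label, loosely following the fields described in the talk.
# These field names are hypothetical, not the Data Nutrition Project's schema.
dataset_label = {
    "label_completion": 0.85,  # how much of the label itself has been filled in
    "description": "Adult health-survey responses from one region, 2015-2019.",
    "keywords": ["health", "survey", "adults"],
    "tags": ["tabular", "self-reported"],
    "intended_uses": ["regional population-health research"],
    "prohibited_uses": ["individual risk scoring outside the sampled region"],
    "risks_and_mitigations": [
        {
            "risk": "Children and other regions are not represented.",
            "mitigation": "Do not deploy beyond the sampled population without new data.",
        },
    ],
}


def check_use(label: dict, proposed_use: str) -> bool:
    """Return True only if a proposed use is listed as intended and not prohibited."""
    return (proposed_use in label["intended_uses"]
            and proposed_use not in label["prohibited_uses"])
```

The point of a structure like this is exactly what she describes next: it lets the people choosing a dataset, and the people building one, reason about the ingredients before anything is trained.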
06:17 And we launched this with two audiences in mind. The first audience are folks who are building AI. So they're choosing datasets. We want to help them make a better choice. The second audience are folks who are building datasets. And it turns out that when you tell someone they have to put a label on something, they think about the ingredients beforehand.

06:38 The analogy here might be, if I want to make a sandwich and say that it's gluten-free, I have to think about all the components as I make the sandwich, the bread and the ingredients, the sauces. I can't just put it on a sandwich and put it in front of you and tell you it's gluten-free.

06:52 We're really proud of the work that we've done. We launched this as a design and then a prototype and ultimately a tool for others to make their own labels. And we've worked with experts at places like Microsoft Research, the United Nations and professors globally to integrate the label and the methodology into their work flows and into their curricula.

07:13 But we know it only goes so far. And that's because it's actually really hard to get a label on every single dataset. And this comes down to the question of why would you put a label on a dataset to begin with?

07:25 Well, the first reason is not rocket science. It's that you have to. And this is, quite frankly, why food nutrition labels exist. It's because if they didn't put them on the boxes, it would be illegal.

07:36 However, we don't really have AI regulation. We don't have much regulation around the use of data. Now there is some on the horizon. For example, the EU AI Act just passed this week. And although there are no requirements around making the training data available, they do have provisions for creating transparency labeling like the dataset nutrition label, data sheets, data statements. There are many in the space. We think this is a really good first step.

08:05 The second reason that you might have a label on a dataset is because it is a best practice or a cultural norm. The example here might be how we're starting to see more and more food packaging and menus at restaurants include information about whether there's gluten. This is not required by law, although if you do say it, it had better be true. And the reason that people are adding this to their menus and their food packaging is because there's an increased awareness of the sensitivity and kind of the seriousness of that kind of an allergy or condition.

08:39 So we're also seeing some movement in this area. Folks who are building datasets are starting to put nutrition labels, data sheets on their datasets. And people who are using data are starting to request the information. This is really heartening. And you might say, "Kasia, why are you up here? Everything seems to be going well, seems to be getting better." In some ways it is. But I'm also here to tell you that our relationship to data is getting worse.

09:03 Now the last few years have seen a supercharged interest in gathering datasets. Companies are scraping the web. They're transcribing millions of hours of YouTube videos into text. By some estimates, they'll run out of information on the internet by 2026. They're even considering buying publishing houses so they can get access to printed text and books.

09:27 So why are they gathering this information? Well, they need more and more information to train a new technique called generative AI.

09:35 I want to tell you about the size of these datasets. If you look at GPT-3, which is a model that launched in 2020, the training dataset included 300 billion words, or parts of words. Now for context, the English language contains less than a million words. Just three years later, DBRX was launched, which was trained on eight trillion words. So 300 billion to eight trillion in three years. And the datasets are getting bigger.
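Taking the two figures she cites at face value, that works out to roughly a 27-fold increase in training-set size over three years:

\[
\frac{8 \times 10^{12}\ \text{words}}{3 \times 10^{11}\ \text{words}} \approx 26.7
\]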
10:04 Now with each successive model launch, the datasets are actually less and less transparent. And even if we have access to the information, it's so big, it's so hard to look inside without any kind of transparency tooling.
10:18 And the generative AI itself is also causing some worries. And you've probably encountered this technique through ChatGPT. I don't need to know what you do on the internet, that's between you and the internet, but you probably know, just like I do, how easy it is to create information using ChatGPT and other generative AI technologies and to put that out onto the web. And so we're looking at a situation in which we're going to encounter lots of information that's algorithmically generated but we won't know it and we won't know whether it's true. And this increases the scale of the potential risks and harms from AI.
10:51 Not only that, I'm sorry, but the models themselves are getting controlled by a smaller and smaller number of private actors in US tech firms. So these are the models that were launched last year, in 2023. And you can see most of them are pink, meaning they came out of industry. And if you look at this over time, more and more are coming out of industry and fewer and fewer are coming out of all the other sectors combined, including academia and government, where technology is often launched in a way that's easier to scrutinize.
11:20 So if we go back to our cafe analogy, this is like you have a small number of private actors who own all the ingredients, they make all the sandwiches globally, and there's not a lot of regulation.

11:33 And so at this point you're probably scared and maybe feeling a little uncomfortable. Which is ironic because a few minutes ago, I was going to get you all sandwiches and you said yes. This is why you should not accept food from strangers. But I wouldn't be up here if I weren't also optimistic. And that's because I think we have momentum behind the regulation and the culture changes. Especially if we align ourselves with three basic principles about how corporations should engage with data.

11:58 The first principle is that companies that gather data should tell us what they're gathering. This would allow us to ask questions like, is it copyrighted material? Is that information private? Could you please stop? It also opens up the data to scientific inquiry.

12:15 The second principle is that companies that are gathering our data should tell us what they're going to do with it before they do anything with it. And by requiring that companies tell us their plan, this means that they have to have a plan, which would be a great first step. It also probably would lead to the minimization of data capture, because they wouldn't be able to capture data if they didn't know what they were already going to do with it.

12:40 And finally, principle three, companies that build AI should tell us about the data that they use to train the AI. And this is where dataset nutrition labels and other transparency labeling comes into play. You know, in the case where the data itself won't be made available, which is most of the time, probably, the labeling is critical for us to be able to investigate the ingredients and start to find solutions.
13:05 So I want to leave you with the good news, and that is that the Data Nutrition Project and other projects are just a small part of a global movement towards AI accountability. The Dataset Nutrition Label and other projects are just a first step.

13:21 Regulation's on the horizon, the cultural norms are shifting, especially if we align with these three basic principles: that companies should tell us what they're gathering, tell us what they're going to do with it before they do anything with it, and that companies that are building AI should explain the data that they're using to build the system.

13:40 We need to hold these organizations accountable for the AI that they're building by asking them, just like we do with the food industry, what's inside and how did you make it? Only then can we mitigate the issues before they occur, as opposed to after they occur. And in doing so, create an integrated algorithmic internet that is healthier for everyone.

14:02 Thank you.

(Applause)