Why AI Needs a “Nutrition Label” | Kasia Chmielinski | TED

32,426 views ・ 2024-06-14

TED


00:04
Now, I haven't met most of you or really any of you, but I feel a really good vibe in the room.

(Laughter)

00:10
And so I think I'd like to treat you all to a meal. What do you think? Yes? Great, so many new friends. So we're going to go to this cafe, they serve sandwiches. And the sandwiches are really delicious. But I have to tell you that sometimes they make people really, really sick.

(Laughter)

00:29
And we don't know why. Because the cafe won't tell us how they make the sandwich, they won't tell us about the ingredients. And then the authorities have no way to fix the problem. But the offer still stands. So who wants to get a sandwich?

(Laughter)

00:42
Some brave souls, we can talk after. But for the rest of you, I understand. You don't have enough information to make good choices about your safety or even fix the issue.
00:52
Now, before I further the anxiety here, I'm not actually trying to make you sick, but this is an analogy to how we're currently making algorithmic systems, also known as artificial intelligence or AI.

01:04
Now, for those who haven't thought about the relationship between AI and sandwiches, don't worry about it, I'm here for you, I'm going to explain. You see, AI systems, they provide benefit to society. They feed us, but they're also inconsistently making us sick. And we don't have access to the ingredients that go into the AI. And so we can't actually address the issues. We also can't stop eating AI like we can just stop eating a shady sandwich because it's everywhere, and we often don't even know that we're encountering a system that's algorithmically based.

01:38
So today, I'm going to tell you about some of the AI trends that I see. I'm going to draw on my experience building these systems over the last two decades to tell you about the tools that I and others have built to look into these AI ingredients. And finally, I'm going to leave you with three principles that I think will give us a healthier relationship to the companies that build artificial intelligence.

02:00
I'm going to start with the question, how did we get here? AI is not new. We have been living alongside AI for two decades. Every time that you apply for something online, you open a bank account or you go through passport control, you're encountering an algorithmic system.

02:19
We've also been living with the negative repercussions of AI for 20 years, and this is how it makes us sick. These systems get deployed on broad populations, and then certain subsets end up getting negatively disparately impacted, usually on the basis of race or gender or other characteristics. We need to be able to understand the ingredients to these systems so that we can address the issues.

02:43
So what are the ingredients to an AI system? Well, data fuels the AI. The AI is going to look like the data that you gave it. So for example, if I want to make a risk-assessment system for diabetes, my training data set might be adults in a certain region. And so I'll build that system, it'll work really well for those adults in that region. But it does not work for adults in other regions or maybe at all for children. So you can imagine if we deploy this for all those populations, there are going to be a lot of people who are harmed. We need to be able to understand the quality of the data before we use it.
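As a minimal sketch of the failure mode just described: a risk model fit only on one simulated population can score well there and fail on a population it never saw. Everything below (the features, the weights, the populations) is invented for illustration; it is not the system the speaker describes.

```python
# Hypothetical illustration: a risk model trained only on "region A" adults
# looks accurate there and fails on a population whose feature-outcome
# relationship it never saw. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_population(n, true_weights):
    # Three stand-in features (think age, BMI, blood pressure); the "true"
    # link between features and outcome differs across populations.
    X = rng.normal(size=(n, 3))
    p = 1 / (1 + np.exp(-(X @ true_weights)))
    y = (rng.random(n) < p).astype(int)
    return X, y

X_a, y_a = make_population(5000, np.array([1.5, -0.5, 0.0]))   # the region we sampled
X_b, y_b = make_population(5000, np.array([-0.5, 1.5, 1.0]))   # a region we never sampled

model = LogisticRegression().fit(X_a, y_a)

print("AUC on the sampled region:", round(roc_auc_score(y_a, model.predict_proba(X_a)[:, 1]), 2))
print("AUC on the unsampled region:", round(roc_auc_score(y_b, model.predict_proba(X_b)[:, 1]), 2))
```

The second score lands at or below chance, which is exactly the kind of mismatch that a look at the training data's coverage would flag before deployment.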
03:22
But I'm sorry to tell you that we currently live in what I call the Wild West of data. It's really hard to assess quality of data before you use it. There are no global standards for data quality assessment, and there are very few data regulations around how you can use data and what types of data you can use.

03:40
This is kind of like in the food safety realm. If we couldn't understand where the ingredients were sourced, we'd also have no idea whether they were safe for us to consume.

03:50
We also tend to stitch data together, data which we might find on the internet, scrape, generate or source, and every time we stitch this data together, we lose information about the quality of the data. And the folks who are building the models are not the ones that found the data. So there's further information that's lost.
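As a small, hypothetical sketch of that last point (the sources and fields below are invented, not a real pipeline): when datasets are simply concatenated, the per-source notes that would let you judge quality tend to fall away.

```python
# Invented example: each source arrives with provenance notes, but a naive
# "stitch" keeps only the records, so whoever builds the model downstream
# can no longer tell where any record came from or under what terms.
scraped = {
    "records": ["...scraped web text...", "...more web text..."],
    "provenance": {"source": "web scrape", "license": "unknown", "collected": "2023-04"},
}
purchased = {
    "records": ["...vendor survey row..."],
    "provenance": {"source": "survey vendor", "license": "commercial", "collected": "2021-11"},
}

# The stitch: only the records survive, the provenance dictionaries do not.
training_corpus = scraped["records"] + purchased["records"]
print(len(training_corpus), "records, no provenance attached")
```

A label or datasheet is, in effect, a way of keeping that dropped provenance attached to the records as they travel.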
04:10
Now, I've been asking myself a lot of questions about how we can understand the data quality before we use it. And this emerges from two decades of building these kinds of systems.

04:21
The way I was trained to build systems is similar to how people do it today. You build for the middle of the distribution. That's your normal user. So for me, a lot of my training data sets would include information about people from the Western world who speak English, who have certain normative characteristics. And it took me an embarrassingly long amount of time to realize that I was not my own user.

04:43
So I identify as non-binary, as mixed race, I wear a hearing aid and I just wasn't represented in the data sets that I was using. And so I was building systems that literally didn't work for me. And for example, I once built a system that repeatedly told me that I was a white Eastern-European lady. This did a real number on my identity.

(Laughter)

05:06
But perhaps even more worrying, this was a system to be deployed in health care, where your background can determine things like risk scores for diseases.

05:17
And so I started to wonder, can I build tools and work with others to do this so that I can look inside of a dataset before I use it? In 2018, I was part of a fellowship at Harvard and MIT, and I, with some colleagues, decided to try to address this problem. And so we launched the Data Nutrition Project, which is a research group and also a nonprofit that builds nutrition labels for datasets.

05:43
So similar to food nutrition labels, the idea here is that you can look inside of a data set before you use it. You can understand the ingredients, see whether it's healthy for the things that you want to do.

05:54
Now this is a cartoonified version of the label. The top part tells you about the completion of the label itself. And underneath that you have information about the data, the description, the keywords, the tags, and importantly, on the right hand side, how you should and should not use the data. If you could scroll on this cartoon, you would see information about risks and mitigation strategies across a number of vectors.
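Since the label itself is only shown as a cartoon in the talk, here is a rough sketch of the kinds of fields just described (completion of the label, a description of the data, keywords and tags, appropriate and inappropriate uses, risks and mitigations). Every field name and value below is invented for illustration; this is not the Data Nutrition Project's actual schema.

```python
# Illustrative only: the shape a dataset "nutrition label" might take,
# loosely following the description above. Not the project's real format.
dataset_label = {
    "label_completion": 0.8,   # how much of the label itself has been filled in
    "description": "Adult health-survey responses from one region, 2015-2019",
    "keywords": ["health", "survey", "adults"],
    "tags": ["tabular", "self-reported"],
    "appropriate_uses": [
        "regional public-health research",
    ],
    "inappropriate_uses": [
        "individual clinical decisions",
        "populations outside the surveyed region, including children",
    ],
    "risks_and_mitigations": [
        {"risk": "under-representation of non-English speakers",
         "mitigation": "collect targeted additional data before reuse"},
    ],
}
```

The point is less the exact fields than that the answers live in a structured place a model builder can check before training.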
06:17
And we launched this with two audiences in mind. The first audience are folks who are building AI. So they're choosing datasets. We want to help them make a better choice. The second audience are folks who are building datasets. And it turns out that when you tell someone they have to put a label on something, they think about the ingredients beforehand.

06:38
The analogy here might be, if I want to make a sandwich and say that it's gluten-free, I have to think about all the components as I make the sandwich, the bread and the ingredients, the sauces. I can't just put it on a sandwich and put it in front of you and tell you it's gluten-free.

06:52
We're really proud of the work that we've done. We launched this as a design and then a prototype and ultimately a tool for others to make their own labels. And we've worked with experts at places like Microsoft Research, the United Nations and professors globally to integrate the label and the methodology into their workflows and into their curricula.

07:13
But we know it only goes so far. And that's because it's actually really hard to get a label on every single dataset. And this comes down to the question of why would you put a label on a dataset to begin with?

07:25
Well, the first reason is not rocket science. It's that you have to. And this is, quite frankly, why food nutrition labels exist. It's because if they didn't put them on the boxes, it would be illegal.

07:36
However, we don't really have AI regulation. We don't have much regulation around the use of data. Now there is some on the horizon. For example, the EU AI Act just passed this week. And although there are no requirements around making the training data available, they do have provisions for creating transparency labeling like the dataset nutrition label, data sheets, data statements. There are many in the space. We think this is a really good first step.

08:05
The second reason that you might have a label on a dataset is because it is a best practice or a cultural norm. The example here might be how we're starting to see more and more food packaging and menus at restaurants include information about whether there's gluten. This is not required by law, although if you do say it, it had better be true. And the reason that people are adding this to their menus and their food packaging is because there's an increased awareness of the sensitivity and kind of the seriousness of that kind of an allergy or condition.

08:39
So we're also seeing some movement in this area. Folks who are building datasets are starting to put nutrition labels, data sheets on their datasets. And people who are using data are starting to request the information. This is really heartening.

08:52
And you might say, "Kasia, why are you up here? Everything seems to be going well, seems to be getting better." In some ways it is. But I'm also here to tell you that our relationship to data is getting worse.

09:03
Now the last few years have seen a supercharged interest in gathering datasets. Companies are scraping the web. They're transcribing millions of hours of YouTube videos into text. By some estimates, they'll run out of information on the internet by 2026. They're even considering buying publishing houses so they can get access to printed text and books.

09:27
So why are they gathering this information? Well, they need more and more information to train a new technique called generative AI. I want to tell you about the size of these datasets. If you look at GPT-3, which is a model that launched in 2020, the training dataset included 300 billion words, or parts of words. Now for context, the English language contains less than a million words. Just three years later, DBRX was launched, which was trained on eight trillion words. So 300 billion to eight trillion in three years. And the datasets are getting bigger.
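Spelling out the arithmetic behind those figures, treated as rough orders of magnitude:

```python
# Rough arithmetic on the figures mentioned above (orders of magnitude only).
gpt3_tokens = 300e9      # ~300 billion words or parts of words (GPT-3, 2020)
dbrx_tokens = 8e12       # ~8 trillion (DBRX, 2023, per the talk)
english_vocab = 1e6      # "less than a million words" in the language itself

print(f"Growth in three years: about {dbrx_tokens / gpt3_tokens:.0f}x")
print(f"GPT-3's corpus vs. the entire English vocabulary: about {gpt3_tokens / english_vocab:,.0f}x")
# Growth in three years: about 27x
# GPT-3's corpus vs. the entire English vocabulary: about 300,000x
```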
10:04
Now with each successive model launch, the datasets are actually less and less transparent. And even if we have access to the information, it's so big, it's so hard to look inside without any kind of transparency tooling.

10:18
And the generative AI itself is also causing some worries. And you've probably encountered this technique through ChatGPT. I don't need to know what you do on the internet, that's between you and the internet, but you probably know, just like I do, how easy it is to create information using ChatGPT and other generative AI technologies and to put that out onto the web. And so we're looking at a situation in which we're going to encounter lots of information that's algorithmically generated but we won't know it and we won't know whether it's true. And this increases the scale of the potential risks and harms from AI.

10:51
Not only that, I'm sorry, but the models themselves are getting controlled by a smaller and smaller number of private actors in US tech firms. So these are the models that were launched last year, in 2023. And you can see most of them are pink, meaning they came out of industry. And if you look at this over time, more and more are coming out of industry and fewer and fewer are coming out of all the other sectors combined, including academia and government, where technology is often launched in a way that's easier to scrutinize.

11:20
So if we go back to our cafe analogy, this is like you have a small number of private actors who own all the ingredients, they make all the sandwiches globally, and there's not a lot of regulation.

11:33
And so at this point you're probably scared and maybe feeling a little uncomfortable. Which is ironic because a few minutes ago, I was going to get you all sandwiches and you said yes. This is why you should not accept food from strangers. But I wouldn't be up here if I weren't also optimistic. And that's because I think we have momentum behind the regulation and the culture changes. Especially if we align ourselves with three basic principles about how corporations should engage with data.

11:58
The first principle is that companies that gather data should tell us what they're gathering. This would allow us to ask questions like, is it copyrighted material? Is that information private? Could you please stop? It also opens up the data to scientific inquiry.

12:15
The second principle is that companies that are gathering our data should tell us what they're going to do with it before they do anything with it. And by requiring that companies tell us their plan, this means that they have to have a plan, which would be a great first step. It also probably would lead to the minimization of data capture, because they wouldn't be able to capture data if they didn't know what they were already going to do with it.

12:40
And finally, principle three: companies that build AI should tell us about the data that they use to train the AI. And this is where dataset nutrition labels and other transparency labeling comes into play. You know, in the case where the data itself won't be made available, which is most of the time, probably, the labeling is critical for us to be able to investigate the ingredients and start to find solutions.
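To make the three principles a little more tangible, here is a purely speculative sketch of the kind of disclosure they would add up to. Nothing like this is specified in the talk or in any regulation; every field below is an assumption made for illustration.

```python
# Speculative sketch of a disclosure shaped by the three principles:
# (1) say what is gathered, (2) declare the intended use before doing anything
# with it, (3) point to labels describing the data used to train the AI.
data_disclosure = {
    "what_we_gather": {
        "sources": ["public web pages", "licensed news archive"],
        "may_contain_copyrighted_material": True,
        "may_contain_personal_data": True,
    },
    "planned_uses": [
        "training a general-purpose language model",
        "internal evaluation and safety testing",
    ],
    "use_declared_before_collection": True,
    "training_data_labels": [
        "nutrition label for the web-crawl dataset",
        "datasheet for the news-archive dataset",
    ],
}
```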
13:05
So I want to leave you with the good news, and that is that the Data Nutrition Project and other projects are just a small part of a global movement towards AI accountability. The Dataset Nutrition Label and other projects are just a first step.

13:21
Regulation's on the horizon, the cultural norms are shifting, especially if we align with these three basic principles: that companies should tell us what they're gathering, tell us what they're going to do with it before they do anything with it, and that companies that are building AI should explain the data that they're using to build the system.

13:40
We need to hold these organizations accountable for the AI that they're building by asking them, just like we do with the food industry, what's inside and how did you make it? Only then can we mitigate the issues before they occur, as opposed to after they occur. And in doing so, create an integrated algorithmic internet that is healthier for everyone.

14:02
Thank you.

(Applause)