Why AI Needs a “Nutrition Label” | Kasia Chmielinski | TED

31,704 views ・ 2024-06-14

TED


00:04
Now, I haven't met most of you or really any of you, but I feel a really good vibe in the room. (Laughter) And so I think I'd like to treat you all to a meal. What do you think? Yes? Great, so many new friends.

00:17
So we're going to go to this cafe, they serve sandwiches. And the sandwiches are really delicious. But I have to tell you that sometimes they make people really, really sick. (Laughter) And we don't know why. Because the cafe won't tell us how they make the sandwich, they won't tell us about the ingredients. And then the authorities have no way to fix the problem. But the offer still stands. So who wants to get a sandwich? (Laughter) Some brave souls, we can talk after. But for the rest of you, I understand. You don't have enough information to make good choices about your safety or even fix the issue.

00:52
Now, before I further the anxiety here, I'm not actually trying to make you sick, but this is an analogy to how we're currently making algorithmic systems, also known as artificial intelligence or AI.

01:04
Now, for those who haven't thought about the relationship between AI and sandwiches, don't worry about it, I'm here for you, I'm going to explain. You see, AI systems, they provide benefit to society. They feed us, but they're also inconsistently making us sick. And we don't have access to the ingredients that go into the AI. And so we can't actually address the issues. We also can't stop eating AI like we can just stop eating a shady sandwich because it's everywhere, and we often don't even know that we're encountering a system that's algorithmically based.

01:38
So today, I'm going to tell you about some of the AI trends that I see. I'm going to draw on my experience building these systems over the last two decades to tell you about the tools that I and others have built to look into these AI ingredients. And finally, I'm going to leave you with three principles that I think will give us a healthier relationship to the companies that build artificial intelligence.

02:00
I'm going to start with the question, how did we get here? AI is not new. We have been living alongside AI for two decades. Every time that you apply for something online, you open a bank account or you go through passport control, you're encountering an algorithmic system.

02:19
We've also been living with the negative repercussions of AI for 20 years, and this is how it makes us sick. These systems get deployed on broad populations, and then certain subsets end up getting negatively disparately impacted, usually on the basis of race or gender or other characteristics. We need to be able to understand the ingredients to these systems so that we can address the issues.

02:43
So what are the ingredients to an AI system? Well, data fuels the AI. The AI is going to look like the data that you gave it. So for example, if I want to make a risk-assessment system for diabetes, my training data set might be adults in a certain region. And so I'll build that system, it'll work really well for those adults in that region. But it does not work for adults in other regions or maybe at all for children. So you can imagine if we deploy this for all those populations, there are going to be a lot of people who are harmed.

03:16
We need to be able to understand the quality of the data before we use it.
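(An illustrative aside, not from the talk: the failure mode the diabetes example describes can be sketched in a few lines of Python. Every feature name, coefficient, and number below is invented purely for illustration; this is not the speaker's actual system.)

```python
# A minimal, purely synthetic sketch of how a model trained only on one
# population can degrade badly on another. All values here are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_population(n, age_mu, bmi_mu, w_age, w_bmi):
    """Simulate (age, BMI) features and a noisy binary 'high risk' label for one group."""
    age = rng.normal(age_mu, 8, n)
    bmi = rng.normal(bmi_mu, 4, n)
    X = np.column_stack([age, bmi])
    # Each group has its own (toy) relationship between features and risk.
    logits = w_age * (age - age_mu) + w_bmi * (bmi - bmi_mu)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)
    return X, y

# Training data: adults in one region, where (in this toy world) BMI drives risk.
X_train, y_train = make_population(5000, age_mu=55, bmi_mu=29, w_age=0.01, w_bmi=0.6)

# A different population the deployed system will also see, where age drives risk
# and the feature ranges are different. The model never saw anything like this.
X_other, y_other = make_population(2000, age_mu=30, bmi_mu=22, w_age=0.4, w_bmi=0.02)

model = LogisticRegression().fit(X_train, y_train)
print("accuracy on the population it was trained on:", round(model.score(X_train, y_train), 2))
print("accuracy on a population it never saw:       ", round(model.score(X_other, y_other), 2))
```

On synthetic data like this, accuracy typically holds up on the population the model was trained on and drops sharply on the one it never saw, which is exactly the harm being described.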
03:22
But I'm sorry to tell you that we currently live in what I call the Wild West of data. It's really hard to assess the quality of data before you use it. There are no global standards for data quality assessment, and there are very few data regulations around how you can use data and what types of data you can use.

03:40
This is kind of like in the food safety realm: if we couldn't understand where the ingredients were sourced, we'd also have no idea whether they were safe for us to consume.

03:50
We also tend to stitch data together, and every time we stitch this data together (data we might find on the internet, scrape, generate or source), we lose information about the quality of the data. And the folks who are building the models are not the ones that found the data, so there's further information that's lost.

04:10
Now, I've been asking myself a lot of questions about how we can understand the data quality before we use it. And this emerges from two decades of building these kinds of systems.

04:21
The way I was trained to build systems is similar to how people do it today. You build for the middle of the distribution. That's your normal user. So for me, a lot of my training data sets would include information about people from the Western world who speak English, who have certain normative characteristics. And it took me an embarrassingly long amount of time to realize that I was not my own user.

04:43
So I identify as non-binary, as mixed race, I wear a hearing aid, and I just wasn't represented in the data sets that I was using. And so I was building systems that literally didn't work for me. For example, I once built a system that repeatedly told me that I was a white Eastern European lady. This did a real number on my identity. (Laughter)

05:06
But perhaps even more worrying, this was a system to be deployed in health care, where your background can determine things like risk scores for diseases.

05:17
And so I started to wonder: can I build tools and work with others to do this, so that I can look inside of a dataset before I use it?

05:25
In 2018, I was part of a fellowship at Harvard and MIT, and I, with some colleagues, decided to try to address this problem. And so we launched the Data Nutrition Project, which is a research group and also a nonprofit that builds nutrition labels for datasets.

05:43
So, similar to food nutrition labels, the idea here is that you can look inside of a data set before you use it. You can understand the ingredients, see whether it's healthy for the things that you want to do.

05:54
Now, this is a cartoonified version of the label. The top part tells you about the completion of the label itself. And underneath that you have information about the data: the description, the keywords, the tags, and importantly, on the right-hand side, how you should and should not use the data. If you could scroll on this cartoon, you would see information about risks and mitigation strategies across a number of vectors.
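(Another illustrative aside, not from the talk: the fields just described, completion, description, keywords, tags, intended and discouraged uses, and risks with mitigations, could be carried in a structure as simple as the hypothetical Python sketch below. The real Dataset Nutrition Label schema is richer; this only shows the shape of the idea.)

```python
# A hypothetical dataset "nutrition label", based only on the fields mentioned
# in the talk. The example values are invented; this is not the project's spec.
import json

label = {
    "label_completion": 0.8,  # how much of the label itself has been filled in
    "description": "Adults in one region, surveyed for a diabetes risk study (example).",
    "keywords": ["health", "diabetes", "survey"],
    "tags": ["tabular", "sensitive-attributes"],
    "intended_uses": ["Risk research on the surveyed population"],
    "discouraged_uses": ["Deployment to children or to regions not represented"],
    "risks": [
        {
            "vector": "representation",
            "risk": "Only one region and age range is covered.",
            "mitigation": "Re-weight, collect more data, or restrict deployment scope.",
        }
    ],
}

# A label a practitioner could read before deciding whether to use the data.
print(json.dumps(label, indent=2))
```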
06:17
And we launched this with two audiences in mind. The first audience are folks who are building AI. So they're choosing datasets. We want to help them make a better choice. The second audience are folks who are building datasets. And it turns out that when you tell someone they have to put a label on something, they think about the ingredients beforehand.

06:38
The analogy here might be, if I want to make a sandwich and say that it's gluten-free, I have to think about all the components as I make the sandwich, the bread and the ingredients, the sauces. I can't just put it on a sandwich and put it in front of you and tell you it's gluten-free.

06:52
We're really proud of the work that we've done. We launched this as a design and then a prototype and ultimately a tool for others to make their own labels. And we've worked with experts at places like Microsoft Research, the United Nations and professors globally to integrate the label and the methodology into their work flows and into their curricula.

07:13
But we know it only goes so far. And that's because it's actually really hard to get a label on every single dataset. And this comes down to the question of why would you put a label on a dataset to begin with?

07:25
Well, the first reason is not rocket science. It's that you have to. And this is, quite frankly, why food nutrition labels exist. It's because if they didn't put them on the boxes, it would be illegal.

07:36
However, we don't really have AI regulation. We don't have much regulation around the use of data. Now there is some on the horizon. For example, the EU AI Act just passed this week. And although there are no requirements around making the training data available, they do have provisions for creating transparency labeling like the dataset nutrition label, data sheets, data statements. There are many in the space. We think this is a really good first step.

08:05
The second reason that you might have a label on a dataset is because it is a best practice or a cultural norm. The example here might be how we're starting to see more and more food packaging and menus at restaurants include information about whether there's gluten. This is not required by law, although if you do say it, it had better be true. And the reason that people are adding this to their menus and their food packaging is because there's an increased awareness of the sensitivity and kind of the seriousness of that kind of an allergy or condition.

08:39
So we're also seeing some movement in this area. Folks who are building datasets are starting to put nutrition labels, data sheets on their datasets. And people who are using data are starting to request the information. This is really heartening.

08:52
And you might say, "Kasia, why are you up here? Everything seems to be going well, seems to be getting better." In some ways it is. But I'm also here to tell you that our relationship to data is getting worse.

09:03
Now the last few years have seen a supercharged interest in gathering datasets. Companies are scraping the web. They're transcribing millions of hours of YouTube videos into text. By some estimates, they'll run out of information on the internet by 2026. They're even considering buying publishing houses so they can get access to printed text and books.

09:27
So why are they gathering this information? Well, they need more and more information to train a new technique called generative AI. I want to tell you about the size of these datasets. If you look at GPT-3, which is a model that launched in 2020, the training dataset included 300 billion words, or parts of words. Now for context, the English language contains less than a million words. Just three years later, DBRX was launched, which was trained on eight trillion words. So 300 billion to eight trillion in three years. And the datasets are getting bigger.
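(A quick back-of-the-envelope check, not from the talk, on the growth just quoted, treating both figures as round token counts:)

```python
# Growth implied by the figures above: 300 billion tokens for GPT-3 in 2020,
# eight trillion for DBRX three years later.
gpt3_tokens = 300e9
dbrx_tokens = 8e12

growth = dbrx_tokens / gpt3_tokens
print(f"growth factor over three years: about {growth:.0f}x")               # ~27x
print(f"implied yearly multiplier: about {growth ** (1/3):.1f}x per year")  # ~3x
```

Roughly a twenty-seven-fold jump, or about a tripling every year.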
10:04
Now with each successive model launch, the datasets are actually less and less transparent. And even when we have access to the information, it's so big, it's so hard to look inside without any kind of transparency tooling.

10:18
And the generative AI itself is also causing some worries. And you've probably encountered this technique through ChatGPT. I don't need to know what you do on the internet, that's between you and the internet, but you probably know, just like I do, how easy it is to create information using ChatGPT and other generative AI technologies and to put that out onto the web.

10:38
And so we're looking at a situation in which we're going to encounter lots of information that's algorithmically generated, but we won't know it and we won't know whether it's true. And this increases the scale of the potential risks and harms from AI.

10:51
Not only that, I'm sorry, but the models themselves are getting controlled by a smaller and smaller number of private actors in US tech firms. So these are the models that were launched last year, in 2023. And you can see most of them are pink, meaning they came out of industry. And if you look at this over time, more and more are coming out of industry, and fewer and fewer are coming out of all the other sectors combined, including academia and government, where technology is often launched in a way that's easier to scrutinize.

11:20
So if we go back to our cafe analogy, this is like you have a small number of private actors who own all the ingredients, they make all the sandwiches globally, and there's not a lot of regulation.

11:33
And so at this point you're probably scared and maybe feeling a little uncomfortable. Which is ironic, because a few minutes ago I was going to get you all sandwiches and you said yes. This is why you should not accept food from strangers.

11:44
But I wouldn't be up here if I weren't also optimistic. And that's because I think we have momentum behind the regulation and the culture changes, especially if we align ourselves with three basic principles about how corporations should engage with data.

11:58
The first principle is that companies that gather data should tell us what they're gathering. This would allow us to ask questions like, is it copyrighted material? Is that information private? Could you please stop? It also opens up the data to scientific inquiry.

12:15
The second principle is that companies that are gathering our data should tell us what they're going to do with it before they do anything with it. And by requiring that companies tell us their plan, this means that they have to have a plan, which would be a great first step. It also probably would lead to the minimization of data capture, because they wouldn't be able to capture data if they didn't know what they were already going to do with it.

12:40
And finally, principle three: companies that build AI should tell us about the data that they use to train the AI. And this is where dataset nutrition labels and other transparency labeling come into play. You know, in the case where the data itself won't be made available, which is most of the time, probably, the labeling is critical for us to be able to investigate the ingredients and start to find solutions.

13:05
So I want to leave you with the good news, and that is that the Data Nutrition Project and other projects are just a small part of a global movement towards AI accountability. The Dataset Nutrition Label and other projects are just a first step. Regulation's on the horizon, the cultural norms are shifting, especially if we align with these three basic principles: that companies should tell us what they're gathering, tell us what they're going to do with it before they do anything with it, and that companies that are building AI should explain the data that they're using to build the system.

13:40
We need to hold these organizations accountable for the AI that they're building by asking them, just like we do with the food industry, what's inside and how did you make it? Only then can we mitigate the issues before they occur, as opposed to after they occur. And in doing so, create an integrated algorithmic internet that is healthier for everyone.

14:02
Thank you.

14:03
(Applause)