Should we get rid of standardized testing? - Arlo Kempf

1,216,875 views ・ 2017-09-19

TED-Ed



Translator: Jiawen Wei. Reviewer: Jessica Lee.
00:09
The first standardized tests that we know of
00:12
were administered in China over 2,000 years ago
00:16
during the Han dynasty.
00:18
Chinese officials used them to determine aptitude for various government posts.
00:23
The subject matter included philosophy,
00:25
farming,
00:26
and even military tactics.
00:28
Standardized tests continued to be used around the world for the next two millennia,
00:33
and today, they're used for everything
00:35
from evaluating stair climbs for firefighters in France
00:39
to language examinations for diplomats in Canada
00:43
to students in schools.
00:45
Some standardized tests measure scores
00:48
only in relation to the results of other test takers.
00:51
Others measure performances on how well test takers meet predetermined criteria.
00:57
So the stair climb for the firefighter
00:59
could be measured by comparing the time of the climb
01:02
to that of all other firefighters.
01:05
This might be expressed in what many call a bell curve.
01:09
Or it could be evaluated with reference to set criteria,
01:13
such as carrying a certain amount of weight a certain distance
01:17
up a certain number of stairs.
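[Editor's sketch] The two ways of scoring the stair climb described above could be contrasted roughly as follows. This is only an illustration: the climb times, the required weight, and the required number of stairs are made-up values, and the function names are hypothetical.

```python
# Hypothetical stair-climb times in seconds for five firefighters (made-up data).
climb_times = [95, 102, 110, 118, 125]


def share_slower(time, all_times):
    """Relative scoring: fraction of the group with a slower (larger) time."""
    return sum(1 for t in all_times if t > time) / len(all_times)


def meets_criteria(weight_kg, stairs_climbed,
                   required_weight_kg=20, required_stairs=100):
    """Criteria-based scoring: pass only if the fixed requirements are met.

    The default requirements are assumptions chosen for illustration.
    """
    return weight_kg >= required_weight_kg and stairs_climbed >= required_stairs


# Relative result depends on who else took the test.
print(share_slower(102, climb_times))

# Criteria-based result depends only on the fixed standard.
print(meets_criteria(weight_kg=20, stairs_climbed=120))
```

The first function changes its answer if the group changes; the second does not, which is the distinction the narration draws.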
01:19
Similarly, the diplomat might be measured against other test-taking diplomats,
01:24
or against a set of fixed criteria,
01:27
which demonstrate different levels of language proficiency.
01:31
And all of these results can be expressed using something called a percentile.
01:35
If a diplomat is in the 70th percentile, 70% of test takers scored below her.
01:41
If she scored in the 30th percentile, 70% of test takers scored above her.
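[Editor's sketch] The percentile idea can be worked through with a small calculation. The scores below are invented, and `percentile_rank` is a hypothetical helper that uses the simple "percentage scoring strictly below" definition from the narration; other definitions of percentile exist.

```python
def percentile_rank(score, all_scores):
    """Percentage of test takers whose score is strictly below `score`."""
    below = sum(1 for s in all_scores if s < score)
    return 100 * below / len(all_scores)


# Hypothetical scores for ten test takers (made-up data).
scores = [52, 58, 61, 64, 67, 70, 73, 77, 82, 90]

print(percentile_rank(77, scores))  # 70.0: seven of the ten scored below 77
print(percentile_rank(64, scores))  # 30.0: three of the ten scored below 64
```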
01:47
Although standardized tests are sometimes controversial,
01:50
they're simply a tool.
01:52
As a thought experiment, think of a standardized test as a ruler.
01:56
A ruler's usefulness depends on two things.
01:59
First, the job we ask it to do.
02:02
Our ruler can't measure the temperature outside
02:04
or how loud someone is singing.
02:07
Second, the ruler's usefulness depends on its design.
02:10
Say you need to measure the circumference of an orange.
02:14
Our ruler measures length, which is the right quantity,
02:17
but it hasn't been designed with the flexibility required for the task at hand.
02:22
So, if standardized tests are given the wrong job,
02:25
or aren't designed properly,
02:27
they may end up measuring the wrong things.
02:31
In the case of schools,
02:32
students with test anxiety may have trouble performing their best
02:36
on a standardized test,
02:38
not because they don't know the answers,
02:40
but because they're feeling too nervous to share what they've learned.
02:43
Students with reading challenges
02:45
may struggle with the wording of a math problem,
02:48
so their test results may better reflect their literacy
02:50
rather than numeracy skills.
02:53
And students who were confused by examples
02:55
on tests that contain unfamiliar cultural references
02:59
may do poorly,
03:00
telling us more about the test taker's cultural familiarity
03:03
than their academic learning.
03:05
In these cases, the tests may need to be designed differently.
03:11
Standardized tests can also have a hard time
03:13
measuring abstract characteristics or skills,
03:16
such as creativity, critical thinking, and collaboration.
03:20
If we design a test poorly,
03:22
or ask it to do the wrong job,
03:24
or a job it's not very good at,
03:26
the results may not be reliable or valid.
03:29
Reliability and validity are two critical ideas
03:32
for understanding standardized tests.
03:35
To understand the difference between them,
03:37
we can use the metaphor of two broken thermometers.
03:40
An unreliable thermometer
03:42
gives you a different reading each time you take your temperature,
03:45
and the reliable but invalid thermometer is consistently ten degrees too hot.
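[Editor's sketch] The two broken thermometers can be put into numbers, assuming we treat reliability as low spread between repeated readings and the "ten degrees too hot" problem as a constant offset from the true value. All readings and the true temperature are made up, and this is a simplification of how reliability and validity are assessed for real tests.

```python
from statistics import mean, pstdev

TRUE_TEMP = 37.0  # hypothetical true temperature, in degrees

# Made-up repeated readings from each thermometer.
unreliable = [35.2, 39.1, 36.4, 38.8, 35.5]            # jumps around
reliable_but_invalid = [47.0, 47.1, 46.9, 47.0, 47.0]  # steady, but about 10 too hot


def spread(readings):
    """Consistency check: standard deviation of repeated readings."""
    return pstdev(readings)


def offset(readings, true_value=TRUE_TEMP):
    """Accuracy check: how far the average reading sits from the true value."""
    return mean(readings) - true_value


for name, readings in [("unreliable", unreliable),
                       ("reliable but invalid", reliable_but_invalid)]:
    print(f"{name}: spread={spread(readings):.2f}, offset={offset(readings):+.2f}")
```

A large spread signals the reliability problem; a small spread with a large offset signals a thermometer that is consistent but consistently wrong.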
03:51
Validity also depends on accurate interpretations of results.
03:55
If people say results of a test mean something they don't,
03:58
that test may have a validity problem.
04:01
Just as we wouldn't expect a ruler to tell us how much an elephant weighs,
04:06
or what it had for breakfast,
04:08
we can't expect standardized tests alone to reliably tell us how smart someone is,
04:14
how diplomats will handle a tough situation,
04:16
or how brave a firefighter might turn out to be.
04:20
So standardized tests may help us learn a little about a lot of people
04:25
in a short time,
04:26
but they usually can't tell us a lot about a single person.
04:31
Many social scientists worry about test scores resulting in sweeping
04:35
and often negative changes for test takers,
04:38
sometimes with long-term life consequences.
04:42
We can't blame the tests, though.
04:44
It's up to us to use the right tests for the right jobs,
04:48
and to interpret results appropriately.