How to get better at video games, according to babies - Brian Christian

559,494 views ・ 2021-11-02

TED-Ed

Translation: Ido Dekkers. Editing: zeeva livshitz

00:08

In 2013, a group of researchers at DeepMind in London

ב 2013, קבוצה של חוקרים בדיפמיינד בלונדון

00:13

had set their sights on a grand challenge.

שמה את עינייה על אתגר גדול.

00:15

They wanted to create an AI system that could beat,

הם רצו ליצור מערכת בינה מלאכותית שתוכל להביס,

00:19

not just a single Atari game, but every Atari game.

לא רק משחק אטארי בודד אלא כל משחק אטארי.

00:24

They developed a system they called Deep Q Networks, or DQN,

הם פיתחו מערכת שנקראה רשתות Q עמוקות, או DQN,

00:29

and less than two years later, it was superhuman.

ופחות משנתיים לאחר מכן, היא היתה על אנושית.

00:33

DQN was getting scores 13 times better

DQN קיבלה תוצאות טובות פי 13

00:38

than professional human games testers at “Breakout,”

משחקנים אנושיים מקצוענים ב“ברייקאאוט,”

00:41

17 times better at “Boxing,” and 25 times better at “Video Pinball.”

טובה פי 17 ב“בוקסינג,” וטובה פי 25 ב“פינבול ווידאו.”

00:48

But there was one notable, and glaring, exception.

אבל היתה חריגה אחת בוהקת ונתונה לציון.

00:52

When playing “Montezuma’s Revenge” DQN couldn’t score a single point,

כשמשחקים “נקמת מונטזומה” DQN לא הצליחה לקבל נקודה אחת,

00:58

even after playing for weeks.

אפילו אחרי ששיחקה במשך שבועות.

01:01

What was it that made this particular game so vexingly difficult for AI?

מה זה היה שהפך את המשחק המסויים הזה לכל כך מבלבל למערכת הבינה המלאכותית?

01:07

And what would it take to solve it?

ומה היה דרוש כדי לפתור את זה?

01:10

Spoiler alert: babies.

אזהרת ספויילר: תינוקות.

01:13

We’ll come back to that in a minute.

נחזור לזה עוד דקה.

01:16

Playing Atari games with AI involves what’s called reinforcement learning,

משחק במשחקי אטארי עם בינה מלאכותית כולל מה שנקרא למידה מחזקת,

01:21

where the system is designed to maximize some kind of numerical rewards.

שם המערכת מתוכננת למקסם סוגים מסויימים של פרסים מספריים.

01:26

In this case, those rewards were simply the game's points.

במקרה הזה, הפרסים האלה היו פשוט הנקודות של המשחק.

01:30

This underlying goal drives the system to learn which buttons to press

המטרה הזו מניעה את המערכת ללמוד על איזה כפתורים ללחוץ

01:35

and when to press them to get the most points.

ומתי ללחוץ עליהם כדי לקבל את מירב הנקודות.

01:38

Some systems use model-based approaches, where they have a model of the environment

כמה מערכות מתבססות על גישה מבוססת מודל, בה יש להן מודל של הסביבה

01:43

that they can use to predict what will happen next

בו הן יכולות להשתמש כדי לחזות מה יקרה בהמשך

01:46

once they take a certain action.

ברגע שהן ינקטו בפעולה מסויימת.

01:49

DQN, however, is model free.

DQN, עם זאת, נטולת מודל.

01:52

Instead of explicitly modeling its environment,

במקום למדל מפורשות את הסביבה,

01:55

it just learns to predict, based on the images on screen,

היא פשוט לומדת לחזות, בהתבסס על התמונות על המסך,

01:58

how many future points it can expect to earn by pressing different buttons.

כמה נקודות עתידיות היא יכולה לצפות להרוויח על ידי לחיצה על כפתורים שונים.

02:03

For instance, “if the ball is here and I move left, more points,

לדוגמה, “אם הכדור פה ואני זזה שמאלה, יותר נקודות,

02:08

but if I move right, no more points.”

אבל אם אני זזה ימינה, אין יותר נקודות.”
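
The "move left, more points" idea can be sketched as a Q-learning update. This is a minimal tabular version for illustration only: DQN itself estimates these values with a neural network that reads screen images, and the state names, learning rate `alpha`, and discount `gamma` below are assumptions, not values from the talk.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(state, action) toward
    reward + gamma * (best estimated value available from next_state)."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

actions = ["left", "right"]
q = defaultdict(float)  # every estimate starts at 0 expected points

# Hypothetical experience: with the ball "here", moving left earned 1 point
# and moving right earned nothing; both moves end the toy episode.
for _ in range(50):
    q_update(q, "ball_here", "left", 1.0, "end", actions)
    q_update(q, "ball_here", "right", 0.0, "end", actions)

best = max(actions, key=lambda a: q[("ball_here", a)])  # the button the table now prefers
```

After enough repetitions the table rates "left" above "right" for that state, which is the kind of connection the system has to discover by trial and error.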

02:12

But learning these connections requires a lot of trial and error.

אבל למידת הקישורים האלה דורשת הרבה ניסוי וטעייה.

02:16

The DQN system would start by mashing buttons randomly,

מערכת DQN היתה מתחילה על ידי לחיצה אקראית על כפתורים,

02:20

and then slowly piece together which buttons to mash when

ואז לאט לאט מבינה על איזה כפתורים ללחוץ ומתי

02:24

in order to maximize its score.

כדי למקסם את התוצאה.
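
Starting with random button presses and gradually relying on what has been learned is commonly implemented as an epsilon-greedy rule. The sketch below assumes a linear schedule with illustrative values (`start`, `end`, `decay_steps`); the talk does not specify any of them.

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """With probability epsilon press a random button;
    otherwise press the button with the highest estimated score."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)

def epsilon_at(step, start=1.0, end=0.1, decay_steps=1000):
    """Linearly shrink epsilon from start to end over decay_steps steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

estimates = {"left": 1.0, "right": 0.0}      # hypothetical learned values
early = choose_action(estimates, epsilon_at(0))      # fully random at step 0
late = choose_action(estimates, epsilon=0.0)         # purely greedy
```

With `epsilon` at 1.0 every press is random; as it shrinks, the system increasingly picks whichever button it currently believes scores best.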

02:26

But in playing “Montezuma’s Revenge,”

אבל במשחק “נקמת מונטזומה,”

02:29

this approach of random button-mashing fell flat on its face.

הגישה הזו של לחיצה אקראית על כפתורים התרסקה.

02:34

A player would have to perform this entire sequence

שחקן היה צריך לבצע את כל הרצף

02:37

just to score their first points at the very end.

רק כדי לזכות בנקודה הראשונה ממש בסוף.

02:40

A mistake? Game over.

טעות? המשחק נגמר.
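
A rough calculation shows why random pressing stalls when points arrive only at the end of a long sequence. This is a toy model under loud assumptions: exactly one correct button per step, presses chosen uniformly at random, and made-up numbers for the button count and sequence length.

```python
def chance_of_random_success(num_buttons, sequence_length):
    """Probability that uniformly random presses match one specific
    sequence of correct buttons (toy model: one right button per step)."""
    return (1 / num_buttons) ** sequence_length

# Hypothetical figures: 18 possible inputs and a 10-step sequence.
p = chance_of_random_success(18, 10)
```

Under these assumptions the chance per attempt is below one in a trillion, so the system almost never sees the first points it would need in order to learn anything.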

02:43

So how could DQN even know it was on the right track?

אז איך DQN אפילו תדע שזה המסלול הנכון?

02:47

This is where babies come in.

פה נכנסים תינוקות לתמונה.

02:50

In studies, infants consistently look longer at pictures

במחקרים, תינוקות מביטים בעקביות יותר זמן על תמונות

02:54

they haven’t seen before than ones they have.

שהם לא ראו לפני כן מאשר על אלו שראו.

02:57

There just seems to be something intrinsically rewarding about novelty.

פשוט נראה שיש משהו מספק באופן מהותי בנוגע לחדשנות.

03:02

This behavior has been essential in understanding the infant mind.

ההתנהגות הזו היתה חיונית להבנת מוח התינוקות.

03:06

It also turned out to be the secret to beating “Montezuma’s Revenge.”

מסתבר גם שזה הסוד להבסת “נקמת מונטזומה.”

03:12

The DeepMind researchers worked out an ingenious way

חוקרי דיפ מיינד העלו דרך גאונית

03:15

to plug this preference for novelty into reinforcement learning.

להכניס את ההעדפה הזו לחדשנות לתוך למידה מחזקת.

03:20

They made it so that unusual or new images appearing on the screen

הם גרמו לכך שתמונות חדשות או שונות שהופיעו על המסך

03:25

were every bit as rewarding as real in-game points.

היו מתגמלות כמו נקודות במשחק האמיתי.
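
One simple way to make new screens rewarding is a count-based bonus added to the game's points. The sketch below is a simplified stand-in and not necessarily the researchers' actual method: it assumes each screen can be reduced to a hashable key and counted exactly, and `NoveltyBonus`, `total_reward`, and `scale` are hypothetical names.

```python
from collections import Counter
import math

class NoveltyBonus:
    """Pays scale / sqrt(times this screen has been seen, including now)."""

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = Counter()

    def __call__(self, screen_key):
        self.counts[screen_key] += 1
        return self.scale / math.sqrt(self.counts[screen_key])

def total_reward(game_points, screen_key, bonus):
    """Reward used for learning: real in-game points plus the novelty bonus."""
    return game_points + bonus(screen_key)

bonus = NoveltyBonus()
first_visit = total_reward(0, "room_1", bonus)   # brand-new screen, no points
second_visit = total_reward(0, "room_1", bonus)  # same screen again
new_room = total_reward(0, "room_2", bonus)      # a different screen
```

A first visit pays the full bonus even when the game awards nothing, and repeat visits pay less, so the learner is pulled toward screens it has not seen yet.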

03:29

Suddenly, DQN was behaving totally differently from before.

פתאום, DQN התנהגה שונה לגמרי מלפני כן.

03:34

It wanted to explore the room it was in,

היא רצתה לחקור את החדר בו היתה,

03:36

to grab the key and escape through the locked door—

כדי לתפוס את המפתח ולברוח דרך הדלת הנעולה --

03:39

not because it was worth 100 points,

לא בגלל שזה היה שווה 100 נקודות,

03:42

but for the same reason we would: to see what was on the other side.

אלא מאותה סיבה שאנחנו היינו עושים: כדי לראות מה יש בצד השני.

03:48

With this new drive, DQN not only managed to grab that first key—

עם המניע החדש הזה, DQN לא רק הצליחה לתפוס את המפתח הראשון --

03:53

it explored all the way through 15 of the temple’s 24 chambers.

היא חקרה כל הדרך עד 15 מ 24 החדרים של המקדש.

03:58

But emphasizing novelty-based rewards can sometimes create more problems

אבל הדגשת פרסים מבוססי חדשנות יכולה לפעמים ליצור יותר בעיות

04:02

than it solves.

משהיא פותרת.

04:03

A novelty-seeking system that’s played a game too long

מערכת מחפשת חדשנות שמשחקת משחק יותר מדי זמן

04:07

will eventually lose motivation.

תאבד בסוף את המוטיבציה.

04:09

If it’s seen it all before, why go anywhere?

אם היא ראתה את הכל לפני כן, למה ללכת למקום כלשהו?

04:13

Alternately, if it encounters, say, a television, it will freeze.

באופן חלופי, אם היא היתה נתקלת, נגיד, בטלוויזיה, היא היתה קופאת.

04:18

The constant novel images are essentially paralyzing.

התמונות החדשות המתמשכות פשוט משתקות.
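
Both failure modes can be seen in a toy simulation, assuming a simple count-based bonus of 1 / sqrt(visits). That bonus is an illustrative choice rather than the talk's stated formula, and the screen names are invented.

```python
from collections import Counter
import math

def bonuses(screens):
    """Return the novelty bonus earned for each screen in order,
    where a screen seen n times so far pays 1 / sqrt(n)."""
    counts = Counter()
    earned = []
    for screen in screens:
        counts[screen] += 1
        earned.append(1 / math.sqrt(counts[screen]))
    return earned

# Boredom: the same room seen 10,000 times pays almost nothing by the end.
familiar = bonuses(["same_room"] * 10_000)

# Television: every frame of static is a never-before-seen screen,
# so the bonus never decays no matter how long the system watches.
tv = bonuses(f"static_{i}" for i in range(10_000))
```

The familiar room's bonus shrinks toward zero, leaving nothing to motivate further play, while the television keeps paying the maximum bonus on every frame, so staying put in front of it looks like the best possible strategy.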

04:23

The ideas and inspiration here go in both directions.

הרעיונות וההשראה פה זורמים בשני הכיוונים.

04:27

AI researchers stuck on a practical problem,

חוקרי בינה מלאכותית שתקועים על בעיה פרקטית,

04:30

like how to get DQN to beat a difficult game,

כמו איך לגרום ל DQN לנצח משחק קשה,

04:33

are turning increasingly to experts in human intelligence for ideas.

פונים יותר ויותר למומחים בבינה אנושית לרעיונות.

04:38

At the same time,

באותו זמן,

04:39

AI is giving us new insights into the ways we get stuck and unstuck:

בינה מלאכותית נותנת לנו תובנות לדרכים בהן אנחנו נתקעים ומשתחררים:

04:45

into boredom, depression, and addiction,

לשעמום, דיכאון והתמכרות,

04:48

along with curiosity, creativity, and play.

יחד עם סקרנות, יצירתיות ומשחק.

Original video on YouTube.com

How to get better at video games, according to babies - Brian Christian - YouTube
