Bug Finder

Apurv Sibal

Passionate about building AGI and leveraging it to solve hard problems

Published: July 6, 2023

One challenge to making online education available worldwide is evaluating an immense volume of student work. Especially difficult is evaluating interactive computer programming assignments such as coding a game. A deep learning system automated the process by finding mistakes in completed assignments.

What’s new: Evan Zheran Liu and colleagues at Stanford proposed DreamGrader, a system that integrates reinforcement and supervised learning to identify errors (undesirable behaviors) in interactive computer programs and provide detailed information about where the problems lie.

Key insight: A reinforcement learning (RL) model can play a game, randomly at first, and, if it receives the proper rewards, learn to take actions that bring about an error. A classifier can learn to recognize that the error occurred, also guessing randomly at first, and reward the RL model when it triggers the error. In this scheme, training requires a small number of student submissions that have been labeled with a particular error that is known to occur. The two models learn in an alternating fashion: The RL model plays for a while and does or doesn’t bring about the error; the classifier classifies the resulting gameplay (that is, it decides whether the RL model’s actions triggered the error and, if so, dispenses a reward); then the RL model plays more, and so on. By repeating this cycle, the classifier learns to recognize an error reliably.
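
The alternating scheme above can be illustrated with a deliberately tiny sketch. It is not the authors' implementation: the one-step "game", the `TRIGGER` move, the count-based classifier, and the bandit-style player are all assumptions chosen so the loop fits in a few lines. The player is rewarded with the classifier's current error probability, and the classifier is then refit on the gameplay the player just produced plus the instructor's labels.

```python
import random

ACTIONS = (0, 1, 2)   # hypothetical paddle moves
TRIGGER = 2           # toy assumption: only this move exposes the bug

def play(is_buggy, action):
    """Toy one-step game: returns 1 if the error became visible, else 0."""
    return int(is_buggy and action == TRIGGER)

class CountClassifier:
    """Estimates P(error | observation) from labeled gameplay (toy stand-in)."""
    def __init__(self):
        self.errors = [0, 0]   # error-labeled samples seen per observation
        self.total = [0, 0]    # all samples seen per observation

    def p_error(self, obs):
        # Laplace smoothing: an untrained classifier answers 0.5 (a coin flip).
        return (self.errors[obs] + 1) / (self.total[obs] + 2)

    def fit(self, batch):
        for obs, label in batch:
            self.errors[obs] += label
            self.total[obs] += 1

def train(labels, rounds=300, epsilon=0.3, alpha=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0 for _ in ACTIONS]   # player's value estimate for each move
    clf = CountClassifier()
    for _ in range(rounds):
        batch = []
        # Phase 1: the player plays every labeled submission and is
        # rewarded by the classifier, which stays frozen during this phase.
        for is_buggy in labels:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)               # explore
            else:
                action = max(ACTIONS, key=lambda a: q[a])  # exploit
            obs = play(is_buggy, action)
            reward = clf.p_error(obs)
            q[action] += alpha * (reward - q[action])
            batch.append((obs, int(is_buggy)))
        # Phase 2: the classifier learns from the new gameplay and labels.
        clf.fit(batch)
    return q, clf

labels = [True, False] * 4   # four buggy and four correct toy submissions
q, clf = train(labels)
best_action = max(ACTIONS, key=lambda a: q[a])
```

In this toy setting, the player starts with no preference and the classifier with no knowledge. Occasional exploration exposes the bug, the classifier learns that the exposed state implies the error, and its rising confidence in turn pulls the player toward the triggering move.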

How it works: DreamGrader was trained on a subset of 3,500 anonymized student responses to an assignment from the online educational platform Code.org. Students were asked to code Bounce, a game in which a single player moves a paddle along a horizontal axis to send a ball into a goal. The authors identified eight possible errors (such as the ball bouncing out of the goal after entering and no new ball being launched after a goal was scored) and labeled the examples accordingly. The system comprised two components for each type of error: (i) a player that played the game (a double dueling deep Q-network) and (ii) a classifier (an LSTM and vanilla neural network) that decided whether the error occurred.
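
For readers unfamiliar with the term, "double dueling deep Q-network" combines two standard ideas. The sketch below shows the textbook definitions only, with plain Python lists standing in for network outputs; the function names and numbers are illustrative assumptions, not DreamGrader's code. A dueling head splits a state's value from each action's advantage, and a double-DQN target lets the online network pick the next action while a separate target network scores it.

```python
def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]

def double_dqn_target(reward, gamma, online_next_q, target_next_q, done):
    """Double DQN: the online network selects the next action,
    and the target network evaluates that action."""
    if done:
        return reward
    best = max(range(len(online_next_q)), key=lambda a: online_next_q[a])
    return reward + gamma * target_next_q[best]

# Illustrative numbers for three paddle actions (hypothetical values).
q_next_online = dueling_q_values(1.0, [0.5, -0.5, 0.0])
q_next_target = dueling_q_values(0.8, [0.0, 0.3, -0.3])
target = double_dqn_target(1.0, 0.9, q_next_online, q_next_target, done=False)
```

Here the online estimates favor the first action, so the target uses the target network's value for that same action rather than the target network's own maximum, which is the mechanism double DQN uses to reduce overestimated values.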

The player played the game for 100 steps, each comprising a video frame and associated paddle motion, or until the score exceeded 30. The model moved the paddle based on the gameplay’s “trajectory”: (i) current x and y coordinates of the paddle and ball, (ii) x and y velocities of the ball, and (iii) previous paddle movements, coordinates, ball velocities, and rewards.
The player received a reward for bringing about an error, and it was trained to maximize its reward. To compute rewards, the system calculated the difference between the classification (error or no error) of the trajectory at the current and previous steps. In this way, the player received a reward only at the step in which the error occurred.
The feedback classifier learned in a supervised manner from the labeled submissions, predicting from a trajectory whether the error had occurred.
The authors repeated this process many times for each program to cover a wide variety of gameplay situations.
At inference, DreamGrader ran each player-and-classifier pair on a program and output a list of errors it found.
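
The episode limit, the reward rule, and the inference step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the per-step classifications are given as 0/1 values, an empty trajectory counts as "no error", and `grade`, `graders`, the error names, and the lambda stand-ins for trained players and classifiers are hypothetical.

```python
MAX_STEPS = 100   # episode length limit stated in the article
SCORE_CAP = 30    # the episode also ends once the score exceeds this

def episode_over(step, score):
    """True once the player has used all its steps or the score passed the cap."""
    return step >= MAX_STEPS or score > SCORE_CAP

def step_rewards(classifications):
    """Reward at step t = classification at t minus classification at t - 1.

    classifications[t] is 1 if the classifier flags the error given the
    trajectory up to step t, else 0.
    """
    rewards = []
    previous = 0   # assumption: an empty trajectory is classified as "no error"
    for current in classifications:
        rewards.append(current - previous)
        previous = current
    return rewards

def grade(program, graders):
    """Run one (player, classifier) pair per error type and list the errors found."""
    found = []
    for error_name, (play_fn, classify_fn) in graders.items():
        trajectory = play_fn(program)
        if classify_fn(trajectory):
            found.append(error_name)
    return found

# Toy stand-ins: each "player" surfaces its bug on the last step if it exists.
program = {"ball_exits_goal": True, "no_new_ball": False}
graders = {
    name: (lambda p, n=name: [0, 0, int(p[n])], lambda traj: any(traj))
    for name in program
}
found = grade(program, graders)
rewards = step_rewards([0, 0, 1, 1, 1])
```

With the classification sequence `[0, 0, 1, 1, 1]`, the only nonzero reward falls on the third step, where the classification flips from "no error" to "error", matching the statement that the player is rewarded only at the step in which the error occurs.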

Results: The authors evaluated DreamGrader on a test set of Code.org student submissions. For comparison, they modified the earlier Play to Grade system, which had been designed to identify error-free submissions, to predict the presence of a specific error. DreamGrader achieved 94.3 percent accuracy, 1.5 percent short of human-level performance, while Play to Grade achieved 75.5 percent accuracy. DreamGrader evaluated student submissions in around 1 second each, 180 times faster than human graders.

Yes, but: DreamGrader finds only known errors. It can’t catch bugs that instructors haven’t already seen.

Why it matters: Each student submission can be considered a different, related task. The approach known as meta-RL aims to train an agent that can learn new tasks based on experience with related tasks. Connecting these two ideas, the authors trained their model following the learning techniques expressed in the meta-RL algorithm DREAM. Sometimes it’s not about reinventing the wheel, but reframing the problem as one we already know how to solve.

References:

Research paper: https://arxiv.org/pdf/2008.02790.pdf
