Bug Finder
Apurv Sibal
Passionate about building and leveraging artificial intelligence to solve problems
One challenge in making online education available worldwide is evaluating an immense volume of student work. Evaluating interactive computer programming assignments, such as coding a game, is especially difficult. A deep learning system automated the process by finding mistakes in completed assignments.
What’s new: Evan Zheran Liu and colleagues at Stanford proposed DreamGrader, a system that integrates reinforcement and supervised learning to identify errors (undesirable behaviors) in interactive computer programs and provide detailed information about where the problems lie.
Key insight: A reinforcement learning model can play a game, randomly at first, and, given the proper rewards, learn to take actions that bring about an error. A classifier, also starting from random guesses, can learn to recognize that the error occurred and reward the RL model when it triggers the error. In this scheme, training requires only a small number of student submissions labeled with a particular error that is known to occur. The two models learn in alternating fashion: the RL model plays for a while and does or doesn't bring about the error; the classifier judges the RL model's actions, applying the error label to actions that trigger the error and, if so, dispensing a reward; then the RL model plays more, and so on. By repeating this cycle, the classifier learns to recognize the error reliably.
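The loop below is a minimal, runnable sketch of that alternating scheme. The actual DreamGrader uses a double dueling deep Q-network as the player and an LSTM-based classifier (described under "How it works"); here both are replaced by hypothetical toy stand-ins (Player, Classifier) so that only the training cycle itself is illustrated.

```python
import random

class Player:
    """Toy stand-in for the RL agent: decides whether to take the action that can expose the bug."""
    def __init__(self):
        self.explore_prob = 0.5            # probability of taking the error-revealing action

    def rollout(self, program_buggy):
        took_action = random.random() < self.explore_prob
        # The error surfaces only when the program is buggy and the revealing action is taken.
        return took_action and program_buggy

    def update(self, reward):
        # Nudge the policy toward behavior that earned a reward.
        if reward > 0:
            self.explore_prob = min(1.0, self.explore_prob + 0.02)

class Classifier:
    """Toy stand-in for the error classifier: learns how strongly to trust the observed error signal."""
    def __init__(self):
        self.sensitivity = 0.1             # starts mostly blind to the error

    def predict(self, error_signal):
        return error_signal and random.random() < self.sensitivity

    def update(self, error_signal, label):
        # Supervised step using the submission's ground-truth label.
        if label and error_signal:
            self.sensitivity = min(1.0, self.sensitivity + 0.05)

def train(labeled_submissions, rounds=2000):
    player, classifier = Player(), Classifier()
    for _ in range(rounds):
        buggy = random.choice(labeled_submissions)        # a labeled student program
        error_signal = player.rollout(buggy)              # 1) the RL model plays
        predicted = classifier.predict(error_signal)      # 2) the classifier judges the run
        classifier.update(error_signal, label=buggy)      # 3) supervised classifier update
        player.update(reward=1.0 if predicted else 0.0)   # 4) reward for triggering the error
    return player, classifier

player, classifier = train([True, False, True, False])    # True = submission contains the error
print(f"explore_prob={player.explore_prob:.2f}, sensitivity={classifier.sensitivity:.2f}")
```

Over many rounds, the player drifts toward actions that surface the bug while the classifier becomes more sensitive to the resulting signal, mirroring the co-training described above.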
How it works: DreamGrader was trained on a subset of 3,500 anonymized student responses to an assignment from the online educational platform Code.org. Students were asked to code Bounce, a game in which a single player moves a paddle along a horizontal axis to send a ball into a goal. The authors identified eight possible errors (such as the ball bouncing out of the goal after entering and no new ball being launched after a goal was scored) and labeled the examples accordingly. The system comprised two components for each type of error: (i) a player that played the game (a double dueling deep Q-network) and (ii) a classifier (an LSTM and vanilla neural network) that decided whether the error occurred.
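For a more concrete picture of those two components, here is a sketch of what a dueling Q-network head and an LSTM-based error classifier might look like in PyTorch. The layer sizes, observation encoding, and per-step trajectory features are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling Q-network head for the player. A double DQN setup would keep two
    copies (online and target); the dimensions here are illustrative assumptions."""
    def __init__(self, obs_dim=16, num_actions=3, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # state value V(s)
        self.advantage = nn.Linear(hidden, num_actions)   # action advantages A(s, a)

    def forward(self, obs):
        h = self.trunk(obs)
        v, a = self.value(h), self.advantage(h)
        # Dueling combination: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=-1, keepdim=True)

class ErrorClassifier(nn.Module):
    """LSTM over per-step trajectory features, followed by a small feed-forward
    head that outputs the probability that the target error occurred."""
    def __init__(self, step_dim=19, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(step_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, trajectory):                        # trajectory: (batch, steps, step_dim)
        _, (h_last, _) = self.lstm(trajectory)
        return torch.sigmoid(self.head(h_last[-1]))       # (batch, 1) error probability

# Shape check with dummy tensors; the real observation and action encodings are not shown.
q_values = DuelingQNetwork()(torch.zeros(1, 16))          # Q-values over paddle actions
p_error = ErrorClassifier()(torch.zeros(1, 20, 19))       # error probability for a 20-step run
```

The dueling head separates the state value from per-action advantages, which generally helps a Q-learning agent judge which actions matter even when rewards (here, triggering the error) are sparse.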
Results: The authors evaluated DreamGrader on a test set of Code.org student submissions. For comparison, they modified the earlier Play to Grade, which had been designed to identify error-free submissions, to predict the presence of a specific error. DreamGrader achieved 94.3 percent accuracy (1.5 percent short of human-level performance), while Play to Grade achieved 75.5 percent accuracy. DreamGrader evaluated each student submission in around 1 second, 180 times faster than a human grader.
Yes, but: DreamGrader finds only known errors. It can’t catch bugs that instructors haven’t already seen.
Why it matters: Each student submission can be considered a different, related task. The approach known as meta-RL aims to train an agent that can learn new tasks based on experience with related tasks. Connecting these two ideas, the authors trained their model following the learning techniques expressed in the meta-RL algorithm DREAM. Sometimes it’s not about reinventing the wheel, but reframing the problem as one we already know how to solve.
References:
Research paper: https://arxiv.org/pdf/2008.02790.pdf