Bug Finder
Credits: DeepLearning.AI

One challenge to making online education available worldwide is evaluating an immense volume of student work. Especially difficult is evaluating interactive computer programming assignments such as coding a game. A deep learning system automated the process by finding mistakes in completed assignments.

What’s new: Evan Zheran Liu and colleagues at Stanford proposed DreamGrader, a system that integrates reinforcement and supervised learning to identify errors (undesirable behaviors) in interactive computer programs and to provide detailed information about where the problems lie.

Key insight: A reinforcement learning (RL) model can play a game, randomly at first, and, if it receives the proper rewards, learn to take actions that bring about an error. A classifier, guessing randomly at first, can learn to recognize that the error occurred and reward the RL model when it triggers the error. In this scheme, training requires only a small number of student submissions, each labeled according to whether a particular known error occurs in it. The two models learn in alternation: The RL model plays for a while and does or doesn’t bring about the error; the classifier, using the submissions’ labels, judges whether the RL model’s actions triggered the error and, if so, dispenses a reward; then the RL model plays more, and so on. By repeating this cycle, the RL model learns to bring about the error and the classifier learns to recognize it reliably.
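The alternating scheme can be illustrated with a minimal sketch. It rests on loud simplifying assumptions: the game is a hypothetical one-dimensional toy rather than Bounce, the player uses tabular Q-learning rather than a deep Q-network, the classifier is a one-feature logistic model rather than an LSTM, and each step's reward is the change in the classifier's predicted probability of the error. All names (`run_step`, `ErrorClassifier`, `train`) are illustrative and do not come from the authors' code.

```python
import math
import random

# Hypothetical toy stand-in for a student program (not Bounce): the paddle
# moves on positions 0..4, and a buggy program raises an anomaly flag
# whenever the paddle reaches position 4.
N_POS, N_STEPS, ACTIONS = 5, 8, (-1, 0, 1)


def run_step(pos, action, buggy):
    """Apply one paddle move; return (new_pos, anomaly_flag)."""
    new_pos = min(max(pos + action, 0), N_POS - 1)
    return new_pos, int(buggy and new_pos == N_POS - 1)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ErrorClassifier:
    """P(error | trajectory so far) as a logistic model whose single feature
    is whether any anomaly has appeared yet (a stand-in for the LSTM)."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def prob(self, flags):
        return sigmoid(self.w * max(flags, default=0) + self.b)

    def update(self, flags, label, lr=0.5):
        """One supervised SGD step on the cross-entropy loss."""
        x = max(flags, default=0)
        grad = self.prob(flags) - label
        self.w -= lr * grad * x
        self.b -= lr * grad


def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])


def play_episode(q, clf, buggy, rng, eps):
    """Roll out the epsilon-greedy player on one program."""
    pos, flags, transitions = 0, [], []
    for t in range(N_STEPS):
        explore = rng.random() < eps
        a = rng.randrange(len(ACTIONS)) if explore else greedy(q[pos], rng)
        prev_p = clf.prob(flags)
        next_pos, flag = run_step(pos, ACTIONS[a], buggy)
        flags.append(flag)
        # Reward = change in the classifier's P(error): positive at the step
        # where the first anomaly appears (once w > 0), zero otherwise.
        reward = clf.prob(flags) - prev_p
        transitions.append((pos, a, reward, next_pos, t == N_STEPS - 1))
        pos = next_pos
    return flags, transitions


def train(programs, rounds=300, seed=0, gamma=0.9, alpha=0.2):
    """Alternate player (Q-learning) and classifier (supervised) updates."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_POS)]
    clf = ErrorClassifier()
    for r in range(rounds):
        buggy = programs[r % len(programs)]
        eps = max(0.05, 1.0 - r / rounds)  # play (mostly) randomly at first
        flags, transitions = play_episode(q, clf, buggy, rng, eps)
        # 1) Player update: tabular Q-learning on classifier-derived rewards.
        for s, a, rew, s2, done in transitions:
            target = rew if done else rew + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
        # 2) Classifier update: supervised, using the submission's error label.
        clf.update(flags, int(buggy))
    return q, clf


if __name__ == "__main__":
    q, clf = train([True, False])  # one buggy and one correct "submission"
    print("P(error | anomaly seen) =", round(clf.prob([1]), 3))
    print("P(error | no anomaly)   =", round(clf.prob([0]), 3))
```

Because the classifier starts out uninformative, every reward is zero at first and the player explores at random. Once random play happens to surface the anomaly in a buggy program, the classifier begins to separate the two cases, and its changing predictions give the player a reward signal that the Q-learning update can exploit.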

How it works: DreamGrader was trained on a subset of 3,500 anonymized student responses to an assignment from the online educational platform Code.org. Students were asked to code Bounce, a game in which a single player moves a paddle along a horizontal axis to send a ball into a goal. The authors identified eight possible errors (such as the ball bouncing out of the goal after entering it, or no new ball being launched after a goal was scored) and labeled the examples accordingly. The system comprised two components for each type of error: (i) a player that played the game (a double dueling deep Q-network) and (ii) a classifier (an LSTM and a vanilla neural network) that decided whether the error occurred.
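The player's architecture combines two standard refinements of deep Q-learning described in the dueling-network paper listed in the references: a dueling head that assembles Q-values from a state value and per-action advantages, and a double-DQN target in which the online network picks the next action while a separate target network scores it. The sketch below shows only those two computations, with the neural network layers omitted; the function names and the three-action (left, stay, right) paddle setup are assumptions for illustration.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double-DQN target: the online network selects the next action and the
    target network evaluates it, which reduces Q-value overestimation."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda i: q_online_next[i])
    return reward + gamma * q_target_next[best]


if __name__ == "__main__":
    # Hypothetical outputs for three paddle actions: left, stay, right.
    print(dueling_q(1.0, [0.5, -0.5, 0.0]))  # [1.5, 0.5, 1.0]
    # The online net prefers "stay" (index 1), so the target net's value for
    # "stay" (0.1) is used, not the target net's own maximum (0.8) as in plain DQN.
    print(double_dqn_target(1.0, False, 0.9, [0.2, 0.9, 0.4], [0.3, 0.1, 0.8]))
```

Subtracting the mean advantage keeps V and A identifiable; without it, any constant could be shifted freely between the two streams.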

  • The player played the game for up to 100 steps (each step comprising a video frame and the associated paddle motion), stopping early if the score exceeded 30. It moved the paddle based on the gameplay’s “trajectory”: (i) the current x and y coordinates of the paddle and ball, (ii) the x and y velocities of the ball, and (iii) previous paddle movements, coordinates, ball velocities, and rewards.
  • The player received a reward for bringing about an error and was trained to maximize its reward. To compute rewards, the system took the difference between the classifier’s judgment (error or no error) of the trajectory at the current step and at the previous step. In this way, the player received a reward only at the step in which the error occurred.
  • The feedback classifier learned in a supervised manner from the player’s trajectories, with each submission’s error label serving as the target.
  • The authors repeated this process many times for each program to cover a wide variety of gameplay situations.
  • At inference, DreamGrader ran each player-and-classifier pair on a program and output a list of errors it found.
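The inference procedure can be sketched as a loop over the per-error (player, classifier) pairs. This is a minimal sketch under stated assumptions: the program is exposed through hypothetical `reset()` and `step(action)` callables, where `step` returns an observation and the current score; policies and classifiers see the whole trajectory so far; and an error is reported when its classifier's probability exceeds 0.5. The 100-step limit and the score cutoff of 30 come from the description above; every other name, including the toy game and detectors, is illustrative.

```python
from typing import Callable, Dict, List, Tuple

Obs = Tuple[float, ...]
Reset = Callable[[], Obs]
Step = Callable[[int], Tuple[Obs, int]]    # action -> (observation, score)
Policy = Callable[[List[Obs]], int]        # trajectory so far -> action
Classifier = Callable[[List[Obs]], float]  # trajectory -> P(error)


def rollout(reset: Reset, step: Step, policy: Policy,
            max_steps: int = 100, max_score: int = 30) -> List[Obs]:
    """Play one episode, stopping after max_steps or once score > max_score."""
    trajectory = [reset()]
    for _ in range(max_steps):
        obs, score = step(policy(trajectory))
        trajectory.append(obs)
        if score > max_score:
            break
    return trajectory


def find_errors(make_game: Callable[[], Tuple[Reset, Step]],
                detectors: Dict[str, Tuple[Policy, Classifier]],
                threshold: float = 0.5) -> List[str]:
    """Run each (player, classifier) pair on a fresh copy of one program and
    return the names of the errors whose classifier fires."""
    found = []
    for error_name, (policy, classifier) in detectors.items():
        reset, step = make_game()
        if classifier(rollout(reset, step, policy)) > threshold:
            found.append(error_name)
    return found


def make_toy_game() -> Tuple[Reset, Step]:
    """Hypothetical stand-in for a student program: each action is added to the score."""
    state = {"t": 0, "score": 0}

    def reset() -> Obs:
        state.update(t=0, score=0)
        return (0.0,)

    def step(action: int) -> Tuple[Obs, int]:
        state["t"] += 1
        state["score"] += action
        return (float(state["t"]),), state["score"]

    return reset, step


if __name__ == "__main__":
    # Hypothetical detectors: "toy_error_a" fires on episodes that run the
    # full 100 steps; "toy_error_b" never fires.
    detectors = {
        "toy_error_a": (lambda traj: 0, lambda traj: float(len(traj) > 100)),
        "toy_error_b": (lambda traj: 1, lambda traj: 0.0),
    }
    print(find_errors(make_toy_game, detectors))  # ['toy_error_a']
```

Because each error type has its own player and classifier, the output is simply the list of error types whose classifier fires; the 0.5 threshold is an assumption, not a value reported for DreamGrader.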

Results: The authors evaluated DreamGrader on a test set of Code.org student submissions. For comparison, they modified Play to Grade, an earlier system designed to identify error-free submissions, to predict the presence of a specific error. DreamGrader achieved 94.3 percent accuracy, 1.5 percent short of human-level performance, while Play to Grade achieved 75.5 percent accuracy. DreamGrader evaluated each submission in around 1 second, 180 times faster than human graders.

Yes, but: DreamGrader finds only known errors. It can’t catch bugs that instructors haven’t already seen.

Why it matters: Each student submission can be considered a different but related task. The approach known as meta-RL aims to train an agent that can learn new tasks based on experience with related tasks. Connecting these two ideas, the authors trained their model using the learning techniques of the meta-RL algorithm DREAM. Sometimes it’s not about reinventing the wheel, but about reframing the problem as one we already know how to solve.

References:

  1. https://openreview.net/forum
  2. https://code.org/curriculum/course3/15/Teacher
  3. https://arxiv.org/abs/1511.06581
  4. https://proceedings.neurips.cc/paper/2021/hash/0b9b6d6d154e98ce34b3f2e4ef76eae9-Abstract.html
  5. https://arxiv.org/abs/2008.02790

Apurv Sibal

Passionate about building AGI and leveraging it to solve hard problems
