New Technique Enhances AI's Problem-Solving Abilities with Python Programs

Recent advancements in large language models, such as those behind ChatGPT, have demonstrated exceptional performance in tasks ranging from drafting legal briefs to translating documents. Despite these successes, these models often struggle with numerical or symbolic reasoning tasks, which require more than just natural language processing.

For example, a language model might easily recall a list of recent U.S. presidents and their birthdays. However, it could falter when asked, “Which U.S. presidents elected after 1950 were born on a Wednesday?” (The correct answer is Jimmy Carter.)
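Expressed as code, though, the question becomes a short, mechanical computation. Here is an illustrative sketch of that calculation (with a deliberately abbreviated list of presidents):

```python
from datetime import date

# Birthdates of some U.S. presidents first elected after 1950
# (deliberately abbreviated; shown for illustration only).
presidents = {
    "John F. Kennedy": date(1917, 5, 29),
    "Jimmy Carter": date(1924, 10, 1),
    "Ronald Reagan": date(1911, 2, 6),
    "Barack Obama": date(1961, 8, 4),
}

# date.weekday() numbers Monday as 0, so Wednesday is 2.
born_on_wednesday = [name for name, born in presidents.items()
                     if born.weekday() == 2]
print(born_on_wednesday)  # ['Jimmy Carter']
```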

Addressing this limitation, researchers from MIT and other institutions have introduced a novel approach that enables large language models to tackle natural language, math, data analysis, and symbolic reasoning tasks more effectively by generating programs.

This new method, termed natural language embedded programs (NLEPs), prompts a language model to create and execute a Python program to answer a query, then translates the solution back into natural language.

The research team discovered that NLEPs significantly improved the accuracy of large language models across various reasoning tasks. Moreover, this approach is versatile, allowing a single NLEP prompt to be used for multiple tasks.

NLEPs also enhance transparency, as users can inspect the generated programs to understand how the model arrived at its conclusions and correct any mistakes directly.

“We aim for AI to perform complex reasoning in a transparent and trustworthy manner. While there is still much progress to be made, combining programming and natural language capabilities in large language models is a promising first step towards a future where AI is fully understandable and reliable,” says Hongyin Luo, PhD ’22, an MIT postdoc and co-lead author of a paper on NLEPs.

Luo collaborated on this paper with co-lead authors Tianhua Zhang from the Chinese University of Hong Kong and Jiaxin Ge from Peking University; Yoon Kim, an assistant professor at MIT’s Department of Electrical Engineering and Computer Science; and senior author James Glass, a senior research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Annual Conference of the North American Chapter of the Association for Computational Linguistics.

Solving Problems with Programs

Most large language models function by predicting the next word based on natural language input. While models like GPT-4 can generate programs, they typically embed these programs within natural language, which can lead to reasoning errors.

In contrast, the MIT researchers’ NLEP approach prompts the model to generate step-by-step Python code, embedding the necessary natural language within the program.

An NLEP consists of four steps: importing necessary packages, incorporating natural language representations of the required knowledge, implementing a function to calculate the answer, and outputting the result in natural language with an optional data visualization.
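To make those four steps concrete, here is a minimal, hypothetical sketch in the shape of an NLEP for one of the symbolic tasks the team evaluated, a game of 24 (reach 24 from four numbers using +, -, *, and /). The question, the numbers, and the program structure are illustrative; a program the model actually generates will differ, and this version checks only left-to-right bracketings for brevity:

```python
# Step 1: import the necessary packages.
from itertools import permutations, product
from operator import add, sub, mul, truediv

# Step 2: natural language representation of the required knowledge.
# Question: "Using 2, 3, 4, and 6 once each, make 24."
numbers = [2, 3, 4, 6]
ops = {add: "+", sub: "-", mul: "*", truediv: "/"}

# Step 3: a function that computes the answer.
def solve_24(nums):
    # Brute-force search over orderings and operators; for brevity this
    # sketch tries only the left-to-right bracketing ((a . b) . c) . d.
    for a, b, c, d in permutations(nums):
        for f, g, h in product(ops, repeat=3):
            try:
                if abs(h(g(f(a, b), c), d) - 24) < 1e-9:
                    return f"(({a} {ops[f]} {b}) {ops[g]} {c}) {ops[h]} {d}"
            except ZeroDivisionError:
                continue  # skip divisions by zero on general inputs
    return None

# Step 4: output the result in natural language.
expression = solve_24(numbers)
print(f"One way to reach 24: {expression} = 24" if expression
      else "No solution was found.")
```

Note how the natural language lives inside the program, as comments and printed strings, while the computation itself is ordinary Python.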

“It’s like a digital calculator that ensures correct computation as long as the program is accurate,” says Luo.

Users can easily review and correct errors in the code directly, bypassing the need to rerun the entire model for troubleshooting.

This method also offers efficiency. If a user has multiple similar questions, they can generate one core program and adjust specific variables without rerunning the model.
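With the sketch above, for instance, answering a new variant of the puzzle amounts to changing the inputs rather than prompting the model again:

```python
# Answer a new instance by changing the variables, not the prompt.
print(solve_24([1, 2, 3, 4]))   # ((1 + 2) + 3) * 4
print(solve_24([5, 5, 5, 5]))   # None: no left-to-right bracketing reaches 24
```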

To generate an NLEP, researchers instruct the model to write a Python program, provide two NLEP examples (one involving math and one involving natural language), and pose one test question.
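In outline, assembling such a prompt might look like the sketch below. The instruction wording and the worked examples are hypothetical stand-ins, not the exact prompt from the paper:

```python
# Hypothetical sketch of assembling an NLEP-style few-shot prompt.
INSTRUCTION = (
    "Write a step-by-step Python program that answers the question. "
    "Import packages, state the needed knowledge, compute the answer, "
    "and print it in natural language."
)

# Two worked examples, one math and one natural language (bodies elided;
# each would be a complete program like the sketches above).
MATH_EXAMPLE = "Question: Using 2, 3, 4, and 6 once each, make 24.\n# Program: ..."
LANGUAGE_EXAMPLE = ("Question: Which U.S. presidents elected after 1950 "
                    "were born on a Wednesday?\n# Program: ...")

def build_nlep_prompt(question: str) -> str:
    """Combine the instruction, the two examples, and the new test question."""
    return "\n\n".join([
        INSTRUCTION,
        MATH_EXAMPLE,
        LANGUAGE_EXAMPLE,
        f"Question: {question}\n# Program:",
    ])

print(build_nlep_prompt("How many Fridays fall in October 2024?"))
```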

“Typically, few-shot prompting requires designing prompts for each task. We found that one prompt can serve multiple tasks because it teaches the model to solve various problems by writing a program,” explains Luo.

“Using code for reasoning opens up numerous opportunities for tool use, output validation, and structured understanding of the model’s capabilities,” adds Leonid Karlinsky, principal scientist at the MIT-IBM Watson AI Lab.

Achieving High Accuracy

NLEPs achieved over 90 percent accuracy when prompting GPT-4 to solve symbolic reasoning tasks, such as tracking shuffled objects or playing a game of 24, as well as instruction-following and text-classification tasks. The method outperformed task-specific prompting approaches by 30 percent and also showed improvements over open-source language models.

Beyond enhancing accuracy, NLEPs could improve data privacy by running programs locally, keeping sensitive user data secure. This approach also allows smaller language models to perform better without costly retraining.

“There’s no magic involved. We use program generation instead of natural language generation, significantly improving performance,” says Luo.

However, NLEPs depend on the model’s program generation capability, making the technique less effective for smaller models trained on limited datasets. Future research will explore ways to enhance smaller models' ability to generate effective NLEPs and investigate prompt variations to improve the robustness of the model’s reasoning processes.
