How Google is Expanding Reasoning Capabilities of Language Models

What is Google AI's Minerva?

If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here (follow the link below). I cannot continue to write without community support. For the price of a cup of coffee, join 79 other paying subscribers.

https://aisupremacy.substack.com/subscribe

Solving Quantitative Reasoning Problems with Language Models

Hey Guys,

This is a summary of Google's AI blog, about a recent paper that caught my interest.

If you think I write too much, I invite you to follow me on Substack’s iOS app instead of by email; its Android app is coming soon. This gives you more control over when to engage with the articles and a likely better reading experience.


At the end of June 2022, Ethan Dyer and Guy Gur-Ari, Research Scientists on Google Research's Blueshift Team, released a new research paper summarized on Google AI’s blog.

Twitter embeds in LinkedIn Newsletters aren't working as they should, but there is a Tweet announcing the paper as well.

So why is this a big deal?

Language models have demonstrated remarkable performance on a variety of natural language tasks; indeed, a general lesson from many works, including BERT, GPT-3, Gopher, and PaLM, has been that neural networks trained on diverse data at large scale in an unsupervised way can perform well on a variety of tasks.

As Meta AI, Microsoft Research, and Google AI close in on human-level performance across many tasks, quantitative reasoning remains one area in which language models still fall far short.

Read the Paper

The Beauty of Maths: Quantitative Reasoning


Solving mathematical and scientific questions requires a combination of skills, including correctly parsing a question with natural language and mathematical notation, recalling relevant formulas and constants, and generating step-by-step solutions involving numerical calculations and symbolic manipulation.

Due to these challenges, it is often believed that solving quantitative reasoning problems using machine learning will require significant advancements in model architecture and training techniques, granting models access to external tools such as Python interpreters, or possibly a more profound paradigm shift.

So let’s talk about their paper:

What is Minerva?


In “Solving Quantitative Reasoning Problems With Language Models” (to be released soon on the arXiv), they present Minerva, a language model capable of solving mathematical and scientific questions using step-by-step reasoning. They show that by focusing on collecting training data that is relevant for quantitative reasoning problems, training models at scale, and employing best-in-class inference techniques, they achieve significant performance gains on a variety of difficult quantitative reasoning tasks.

What does Minerva do?


Minerva solves such problems by generating solutions that include numerical calculations and symbolic manipulation without relying on external tools such as a calculator. The model parses and answers mathematical questions using a mix of natural language and mathematical notation. Minerva combines several techniques, including few-shot prompting, chain of thought or scratchpad prompting, and majority voting, to achieve state-of-the-art performance on STEM reasoning tasks. You can explore Minerva’s output with Google’s interactive sample explorer!
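To make the prompting side of this concrete, here is a minimal Python sketch of how a few-shot chain-of-thought (scratchpad) prompt can be assembled. The worked examples, the "Final Answer:" convention, and the build_prompt helper are illustrative assumptions for this newsletter, not Minerva's actual prompt format:

# Hypothetical sketch of few-shot chain-of-thought prompting; the worked
# examples and formatting below are assumptions, not Minerva's real prompts.

FEW_SHOT_EXAMPLES = [
    {
        "question": "What is the derivative of x^3 + 2x with respect to x?",
        "solution": "d/dx(x^3) = 3x^2 and d/dx(2x) = 2, so the derivative is "
                    "3x^2 + 2. Final Answer: 3x^2 + 2",
    },
    {
        "question": "Solve for x: 2x + 6 = 14.",
        "solution": "Subtract 6 from both sides: 2x = 8. Divide by 2: x = 4. "
                    "Final Answer: 4",
    },
]

def build_prompt(new_question: str) -> str:
    """Prepend worked, step-by-step solutions so the model imitates that style."""
    parts = []
    for example in FEW_SHOT_EXAMPLES:
        parts.append(f"Problem: {example['question']}\nSolution: {example['solution']}\n")
    parts.append(f"Problem: {new_question}\nSolution:")
    return "\n".join(parts)

print(build_prompt("Compute the integral of 2x from 0 to 3."))

The prompt itself carries the step-by-step examples; no weights are changed, which is what makes few-shot prompting cheap to apply on top of a pretrained model.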

In recent times, maths and code have been getting more attention.

A Model Built for Multi-step Quantitative Reasoning

To promote quantitative reasoning, Minerva builds on the Pathways Language Model (PaLM), with further training on a 118GB dataset of scientific papers from the arXiv preprint server and web pages that contain mathematical expressions using LaTeX, MathJax, or other mathematical typesetting formats.

Standard text cleaning procedures often remove symbols and formatting that are essential to the semantic meaning of mathematical expressions. By maintaining this information in the training data, the model learns to converse using standard mathematical notation.
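As a rough illustration of that point (not Google's actual pipeline), the Python sketch below contrasts a naive cleaning step, which strips the symbols an equation needs, with one that leaves inline LaTeX untouched; the regular expressions are deliberately simplified assumptions:

import re

# Hypothetical illustration: aggressive cleaning destroys the mathematics,
# while a math-aware pass keeps the LaTeX intact. Patterns are simplified.

raw = "The energy is given by $E = mc^2$, where <b>m</b> is the rest mass."

def naive_clean(text: str) -> str:
    """Strip HTML tags, then drop punctuation-like symbols ($, =, ^, ...)."""
    text = re.sub(r"<[^>]+>", "", text)
    return re.sub(r"[^A-Za-z0-9.,\s]", "", text)

def math_preserving_clean(text: str) -> str:
    """Strip HTML tags but leave the inline LaTeX expression untouched."""
    return re.sub(r"<[^>]+>", "", text)

print(naive_clean(raw))            # the equation degrades to roughly "E  mc2"
print(math_preserving_clean(raw))  # "$E = mc^2$" survives for training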

So this gets pretty interesting.


  • Minerva also incorporates recent prompting and evaluation techniques to better solve mathematical questions. These include chain of thought or scratchpad prompting, where Minerva is prompted with several step-by-step solutions to existing questions before being presented with a new question, and majority voting.
  • Like most language models, Minerva assigns probabilities to different possible outputs. When answering a question, rather than taking the single solution Minerva scores as most likely, multiple solutions are generated by sampling stochastically from all possible outputs. These solutions are different (e.g., the steps are not identical), but often arrive at the same final answer.
  • Minerva uses majority voting on these sampled solutions, taking the most common result as the conclusive final answer (see the sketch after this list).
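Here is a minimal Python sketch of that majority-voting step. The sample_solution stub stands in for stochastic sampling from the model (in practice, decoding with a non-zero temperature), and the "Final Answer:" convention is an assumption carried over from the prompting sketch above, not a detail from the paper:

import random
import re
from collections import Counter

def sample_solution(question: str) -> str:
    """Stand-in for sampling one step-by-step solution from the model."""
    return random.choice([
        "The antiderivative of 2x is x^2; 3^2 - 0^2 = 9. Final Answer: 9",
        "Integrate 2x from 0 to 3: [x^2] from 0 to 3 = 9. Final Answer: 9",
        "2 * 3 = 6, so the integral is 6. Final Answer: 6",  # a wrong sample
    ])

def extract_answer(solution: str) -> str:
    """Pull out whatever follows 'Final Answer:' in a sampled solution."""
    match = re.search(r"Final Answer:\s*(.+)", solution)
    return match.group(1).strip() if match else ""

def majority_vote(question: str, num_samples: int = 16) -> str:
    """Sample several solutions and keep the most common final answer."""
    answers = [extract_answer(sample_solution(question)) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("Compute the integral of 2x from 0 to 3."))  # usually "9"

The point is that individual samples can disagree on the steps, or even the answer, but aggregating them usually lands on the answer the model reaches most consistently.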

They then evaluated Minerva on OCWCourses, a collection of college and graduate level problems covering a variety of STEM topics such as solid state chemistry, astronomy, differential equations, and special relativity that the team collected from MIT OpenCourseWare.

So what caught my attention about this particular paper is that, in all cases, Minerva obtains state-of-the-art results, sometimes by a wide margin.


What Minerva Gets Wrong

Minerva still makes its fair share of mistakes.

About half are calculation mistakes, and the other half are reasoning errors, where the solution steps do not follow a logical chain of thought.

It is also possible for the model to arrive at a correct final answer but with faulty reasoning. They call such cases “false positives”, as they erroneously count toward a model’s overall performance score.
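To see why false positives slip through, note that automatic scoring typically compares only the extracted final answer against the reference. The hypothetical scorer below, reusing the "Final Answer:" convention from the earlier sketches, happily accepts a solution whose reasoning is wrong:

# Hypothetical answer-only scorer: it never inspects the reasoning steps,
# so flawed reasoning that lands on the right number still counts as correct.

def is_scored_correct(model_solution: str, reference_answer: str) -> bool:
    final = model_solution.rsplit("Final Answer:", 1)[-1].strip()
    return final == reference_answer

faulty = "2 + 2 = 5, and 5 - 1 = 4. Final Answer: 4"  # wrong steps, right answer
print(is_scored_correct(faulty, "4"))  # True, despite the broken reasoning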

Minerva doesn’t understand maths per se though.

Limitations

The team’s approach to quantitative reasoning is not grounded in formal mathematics. Minerva parses questions and generates answers using a mix of natural language and LaTeX mathematical expressions, with no explicit underlying mathematical structure.

Future Directions

While machine learning models have become impressive tools in many scientific disciplines, they are often narrowly scoped to solve specific tasks. Google Research hopes that general models capable of solving quantitative reasoning problems will help push the frontiers of science and education.

Research by OpenAI and Google AI in particular is driving language models in new directions. Researchers around the world are producing better work, and more of it, as they build on and tweak these language models. I track papers on Synced and MarkTechPost, among other blog summary websites.

This material is summarized from a recent article on Google AI's blog. There are so many good AI papers these days that it's getting hard to keep up. Increasingly, I see my newsletter AiSupremacy as one way for busy professionals to do just that.

What do you think about the research and direction language models are heading?

If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here (follow the link below). I cannot continue to write without community support. For the price of a cup of coffee, join 79 other paying subscribers.

https://aisupremacy.substack.com/subscribe

Thanks for reading!

PRASHANT RANJAN

Student at Amity University Patna

2 yr

Hlo sir

POOJA JAIN

Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022

2 yr

Interesting.. Insightful share Michael Spencer

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.

2 yr

In my view, Microsoft Research and Google AI are doing even more important work in the early 2020s, and OpenAI has been a good investment for Microsoft thus far in the commercialization of GPT-3, among other things. Google Brain and Meta AI are shedding talent who are founding their own startups, which are important to watch. DeepMind still has the greatest concentration of AI talent out there, by a wide margin. China's AI community is also rapidly improving, with a large share of AI researchers around the world being of Chinese origin. Just check out AI papers to realize how true this is and what it means for the future. Language models are showing some serious potential as they scale and are optimized.
