How Google is Expanding Reasoning Capabilities of Language Models

What is Google AI's Minerva?

If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here (follow the link below). I cannot continue to write without community support. For the price of a cup of coffee, join 79 other paying subscribers.

https://aisupremacy.substack.com/subscribe

Solving Quantitative Reasoning Problems with Language Models

Hey Guys,

This is a summary of Google's AI blog, about a recent paper that caught my interest.

If you think I write too much, I invite you to follow me on Substack’s iOS app instead of by email; its Android app is coming soon. This gives you more control over when to engage with the articles and a likely better reading experience.


At the end of June 2022, Ethan Dyer and Guy Gur-Ari, Research Scientists on Google Research's Blueshift Team, released a new research paper summarized on Google AI’s blog.

Twitter embeds in LinkedIn Newsletters aren't working as they should, but there is a Tweet announcing the paper as well.

So why is this a big deal?

Language models have demonstrated remarkable performance on a variety of natural language tasks; indeed, a general lesson from many works, including BERT, GPT-3, Gopher, and PaLM, has been that neural networks trained on diverse data at large scale in an unsupervised way can perform well on a variety of tasks.

As Meta AI, Microsoft Research, and Google AI close in on human-level performance across many tasks, quantitative reasoning remains one area in which language models still fall far short.

Read the Paper

The Beauty of Maths: Quantitative Reasoning


Solving mathematical and scientific questions requires a combination of skills, including correctly parsing a question with natural language and mathematical notation, recalling relevant formulas and constants, and generating step-by-step solutions involving numerical calculations and symbolic manipulation.

Due to these challenges, it is often believed that solving quantitative reasoning problems using machine learning will require significant advancements in model architecture and training techniques, granting models access to external tools such as Python interpreters, or possibly a more profound paradigm shift.

So let’s talk about their paper:

What is Minerva?


In “Solving Quantitative Reasoning Problems With Language Models” (to be released soon on the arXiv), they present Minerva, a language model capable of solving mathematical and scientific questions using step-by-step reasoning. They show that by focusing on collecting training data that is relevant for quantitative reasoning problems, training models at scale, and employing best-in-class inference techniques, they achieve significant performance gains on a variety of difficult quantitative reasoning tasks.

What does Minerva do?


Minerva solves such problems by generating solutions that include numerical calculations and symbolic manipulation without relying on external tools such as a calculator. The model parses and answers mathematical questions using a mix of natural language and mathematical notation. Minerva combines several techniques, including few-shot prompting, chain of thought or scratchpad prompting, and majority voting, to achieve state-of-the-art performance on STEM reasoning tasks. You can explore Minerva’s output with Google’s interactive sample explorer!
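To make the prompting side of this concrete, here is a minimal Python sketch of how a few-shot chain-of-thought (scratchpad) prompt can be assembled. The worked examples, the "Final Answer:" convention, and the build_prompt helper are illustrative assumptions for this newsletter, not Minerva's actual prompt format:

# Hypothetical sketch of few-shot chain-of-thought prompting; the worked
# examples and formatting below are assumptions, not Minerva's real prompts.

FEW_SHOT_EXAMPLES = [
    {
        "question": "What is the derivative of x^3 + 2x with respect to x?",
        "solution": "d/dx(x^3) = 3x^2 and d/dx(2x) = 2, so the derivative is "
                    "3x^2 + 2. Final Answer: 3x^2 + 2",
    },
    {
        "question": "Solve for x: 2x + 6 = 14.",
        "solution": "Subtract 6 from both sides: 2x = 8. Divide by 2: x = 4. "
                    "Final Answer: 4",
    },
]

def build_prompt(new_question: str) -> str:
    """Prepend worked, step-by-step solutions so the model imitates that style."""
    parts = []
    for example in FEW_SHOT_EXAMPLES:
        parts.append(f"Problem: {example['question']}\nSolution: {example['solution']}\n")
    parts.append(f"Problem: {new_question}\nSolution:")
    return "\n".join(parts)

print(build_prompt("Compute the integral of 2x from 0 to 3."))

The prompt itself carries the step-by-step examples; no weights are changed, which is what makes few-shot prompting cheap to apply on top of a pretrained model.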

In recent times, maths and code have been getting more attention.

A Model Built for Multi-step Quantitative Reasoning

To promote quantitative reasoning, Minerva builds on the Pathways Language Model (PaLM), with further training on a 118GB dataset of scientific papers from the arXiv preprint server and web pages that contain mathematical expressions using LaTeX, MathJax, or other mathematical typesetting formats.

Standard text cleaning procedures often remove symbols and formatting that are essential to the semantic meaning of mathematical expressions. By maintaining this information in the training data, the model learns to converse using standard mathematical notation.
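As a rough illustration of that point (not Google's actual pipeline), the Python sketch below contrasts a naive cleaning step, which strips the symbols an equation needs, with one that leaves inline LaTeX untouched; the regular expressions are deliberately simplified assumptions:

import re

# Hypothetical illustration: aggressive cleaning destroys the mathematics,
# while a math-aware pass keeps the LaTeX intact. Patterns are simplified.

raw = "The energy is given by $E = mc^2$, where <b>m</b> is the rest mass."

def naive_clean(text: str) -> str:
    """Strip HTML tags, then drop punctuation-like symbols ($, =, ^, ...)."""
    text = re.sub(r"<[^>]+>", "", text)
    return re.sub(r"[^A-Za-z0-9.,\s]", "", text)

def math_preserving_clean(text: str) -> str:
    """Strip HTML tags but leave the inline LaTeX expression untouched."""
    return re.sub(r"<[^>]+>", "", text)

print(naive_clean(raw))            # the equation degrades to roughly "E  mc2"
print(math_preserving_clean(raw))  # "$E = mc^2$" survives for training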

So this gets pretty interesting.


  • Minerva also incorporates recent prompting and evaluation techniques to better solve mathematical questions. These include chain of thought or scratchpad prompting, where Minerva is prompted with several step-by-step solutions to existing questions before being presented with a new question, and majority voting.
  • Like most language models, Minerva assigns probabilities to different possible outputs. When answering a question, rather than taking the single solution Minerva scores as most likely, multiple solutions are generated by sampling stochastically from all possible outputs. These solutions are different (e.g., the steps are not identical), but often arrive at the same final answer.
  • Minerva uses majority voting on these sampled solutions, taking the most common result as the conclusive final answer (see the sketch after this list).
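Here is a minimal Python sketch of that majority-voting step. The sample_solution stub stands in for stochastic sampling from the model (in practice, decoding with a non-zero temperature), and the "Final Answer:" convention is an assumption carried over from the prompting sketch above, not a detail from the paper:

import random
import re
from collections import Counter

def sample_solution(question: str) -> str:
    """Stand-in for sampling one step-by-step solution from the model."""
    return random.choice([
        "The antiderivative of 2x is x^2; 3^2 - 0^2 = 9. Final Answer: 9",
        "Integrate 2x from 0 to 3: [x^2] from 0 to 3 = 9. Final Answer: 9",
        "2 * 3 = 6, so the integral is 6. Final Answer: 6",  # a wrong sample
    ])

def extract_answer(solution: str) -> str:
    """Pull out whatever follows 'Final Answer:' in a sampled solution."""
    match = re.search(r"Final Answer:\s*(.+)", solution)
    return match.group(1).strip() if match else ""

def majority_vote(question: str, num_samples: int = 16) -> str:
    """Sample several solutions and keep the most common final answer."""
    answers = [extract_answer(sample_solution(question)) for _ in range(num_samples)]
    return Counter(answers).most_common(1)[0][0]

print(majority_vote("Compute the integral of 2x from 0 to 3."))  # usually "9"

The point is that individual samples can disagree on the steps, or even the answer, but aggregating them usually lands on the answer the model reaches most consistently.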

They then evaluated Minerva on OCWCourses, a collection of college and graduate level problems covering a variety of STEM topics such as solid state chemistry, astronomy, differential equations, and special relativity that the team collected from MIT OpenCourseWare.

So what caught my attention about this particular paper is that, in all cases, Minerva obtains state-of-the-art results, sometimes by a wide margin.


What Minerva Gets Wrong

Minerva still makes its fair share of mistakes.

About half are calculation mistakes, and the other half are reasoning errors, where the solution steps do not follow a logical chain of thought.

It is also possible for the model to arrive at a correct final answer but with faulty reasoning. They call such cases “false positives”, as they erroneously count toward a model’s overall performance score.
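To see why false positives slip through, note that automatic scoring typically compares only the extracted final answer against the reference. The hypothetical scorer below, reusing the "Final Answer:" convention from the earlier sketches, happily accepts a solution whose reasoning is wrong:

# Hypothetical answer-only scorer: it never inspects the reasoning steps,
# so flawed reasoning that lands on the right number still counts as correct.

def is_scored_correct(model_solution: str, reference_answer: str) -> bool:
    final = model_solution.rsplit("Final Answer:", 1)[-1].strip()
    return final == reference_answer

faulty = "2 + 2 = 5, and 5 - 1 = 4. Final Answer: 4"  # wrong steps, right answer
print(is_scored_correct(faulty, "4"))  # True, despite the broken reasoning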

Minerva doesn’t understand maths per se though.

Limitations

The team’s approach to quantitative reasoning is not grounded in formal mathematics. Minerva parses questions and generates answers using a mix of natural language and LaTeX mathematical expressions, with no explicit underlying mathematical structure.

Future Directions

While machine learning models have become impressive tools in many scientific disciplines, they are often narrowly scoped to solve specific tasks. Google Research hopes that general models capable of solving quantitative reasoning problems will help push the frontiers of science and education.

Research by OpenAI and Google AI in particular is driving language models in new directions. Researchers around the world are producing better work, and more of it, as they build on and tweak these language models. I track papers on Synced and MarkTechPost, among other blog summary websites.

This material is summarized from a recent article on Google AI's blog. There are so many good AI papers these days that it's getting hard to keep up. Increasingly, I see my newsletter AiSupremacy as one way for busy professionals to do just that.

What do you think about the research and direction language models are heading?

If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here (follow the link below). I cannot continue to write without community support. For the price of a cup of coffee, join 79 other paying subscribers.

https://aisupremacy.substack.com/subscribe

Thanks for reading!

PRASHANT RANJAN

Student at Amity University Patna

2 yr

Hlo sir

POOJA JAIN

Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022

2 yr

Interesting.. Insightful share Michael Spencer

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.

2 yr

In my view, Microsoft Research and Google AI are doing even more important work in the early 2020s, and OpenAI has been a good investment for Microsoft thus far in the commercialization of GPT-3, among other things. Google Brain and Meta AI are shedding talent who are founding their own startups, which are important to watch. DeepMind still has the greatest concentration of AI talent out there, by a wide margin. China's AI community is also rapidly improving, with a large share of AI researchers around the world being of Chinese origin. Just check out AI papers to realize how true this is and what it means for the future. Language models are showing some serious potential as they scale and are optimized.
