Runs on Intel: Enhancing LLM Performance with RAG and ReAct

Did you know? New techniques can give AI language models the power to understand and summarize information that wasn’t originally trained into the model.

State-of-the-art large language models are no longer frozen in time, doomed to forever repeat outdated information. These new technologies—Retrieval Augmented Generation (RAG) and ReAct Agents—augment language models with the ability to retrieve personalized and specific information from your own documents. These models and documents both exist entirely on your computer, so the analyses or summaries you generate are both personal and private.

Sound interesting? Intel already has these technologies up and running! Let's take a look at how the new capabilities work, using Microsoft's Phi-3 model as an example, then see it in action on an Intel Core Ultra 200V Series processor (codename: Lunar Lake).

Understanding Phi-3, RAG, and ReAct

Phi-3 is a family of small language models developed by Microsoft. The family consists of three models, ranging up to 14 billion parameters (learn more: what is a parameter?). Even the smallest, Phi-3-mini, with just 3.8 billion parameters, is astonishingly good at understanding and reasoning in tasks like coding, mathematics, and logic. Phi-3-mini's size, accuracy, and advanced RAG/ReAct capabilities make it a great fit for offline use.

Retrieval Augmented Generation (RAG) allows an application using Phi-3 to access textual information in your local documents. If the language model identifies that the documents contain information more current or relevant than what was originally trained into the model, it retrieves that newer information and uses it to augment the reply it generates.

As a concrete example, a model like Phi-3 could not possibly be trained to know what's inside a PDF you saved last Tuesday. But with a technology like RAG, Phi-3 could easily provide accurate replies about that content. If you expand this idea to the massive volume of information that needs to be accessed and summarized in medicine, law, business, government, or other information professions, you can see how this capability could be mighty powerful.
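The retrieve-then-augment flow described above can be sketched in a few lines of Python. This is a toy illustration, not Phi-3's or AI Playground's actual pipeline: a real RAG system would use an embedding model and a vector store, while here simple word-overlap scoring stands in for retrieval, and the sample documents are invented.

```python
# Toy RAG sketch: score local documents against the user's question,
# pick the best match, and splice it into an augmented prompt.

def word_overlap(query: str, doc: str) -> int:
    """Score a document by how many query words it contains."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, documents: list[str]) -> str:
    """Return the document that best matches the query."""
    return max(documents, key=lambda d: word_overlap(query, d))

def build_augmented_prompt(query: str, documents: list[str]) -> str:
    """Insert the retrieved context into a modified request for the model."""
    context = retrieve(query, documents)
    return (
        "Answer the question using the context below.\n"
        f"Context: {context}\n"
        f"Question: {query}"
    )

docs = [
    "The Q3 report shows revenue grew 12 percent year over year.",
    "Phi-3-mini is a 3.8 billion parameter language model from Microsoft.",
]
prompt = build_augmented_prompt("How many parameters does Phi-3-mini have?", docs)
print(prompt)
```

The augmented prompt, not the raw question, is what gets submitted to the language model, which is how the model can answer from documents it was never trained on.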

ReAct Agents go one step further by turning the local document retrieval of RAG into one of many sources of info the model can draw on. Whereas RAG only accesses documents, a ReAct Agent can be configured to access emails, dictionaries, encyclopedias, documents, images, and more. Just like RAG, a ReAct Agent is designed to identify and incorporate external sources for much more precise answers.
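Under the hood, a ReAct Agent runs the model in a loop of Thought, Action, and Observation steps, picking a tool at each turn. The sketch below is a heavily simplified illustration under assumed names: a scripted stand-in replaces the real language model so the loop is visible end to end, and `search_documents` is a hypothetical local-document tool, not an actual Phi-3 or AI Playground API.

```python
# Toy ReAct loop: the "model" emits either an Action (run a tool) or a
# final Answer; tool output is fed back in as an Observation.

def search_documents(query: str) -> str:
    """Hypothetical local-document tool (one of many an agent could use)."""
    notes = {"lunar lake": "Lunar Lake is the codename for Intel Core Ultra 200V."}
    return notes.get(query.lower(), "no match")

TOOLS = {"search_documents": search_documents}

def scripted_model(history: list[str]) -> str:
    """Stand-in for the LLM: decide the next step from the trace so far."""
    if not any(line.startswith("Observation:") for line in history):
        return "Action: search_documents[lunar lake]"
    return "Answer: Lunar Lake is the codename for Intel Core Ultra 200V."

def react_loop(question: str, max_steps: int = 3) -> str:
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        step = scripted_model(history)
        history.append(step)
        if step.startswith("Answer:"):
            return step.removeprefix("Answer: ")
        # Parse "Action: tool[input]" and run the chosen tool.
        tool_name, _, rest = step.removeprefix("Action: ").partition("[")
        history.append(f"Observation: {TOOLS[tool_name](rest.rstrip(']'))}")
    return "no answer"

print(react_loop("What is Lunar Lake?"))
```

Swapping the scripted stand-in for a real model, and the single tool for a registry of email, encyclopedia, and image tools, gives you the general shape of a ReAct Agent.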


RAG and ReAct Agents are designed to analyze the user's prompt against pre-trained and external sources. The best match is dynamically inserted into a modified request that gets re-submitted to the language model for a more accurate reply.

All of this raises the question: why do language models need this at all? The answer is hallucinations. Language models can give absurd replies when queried on a topic they do not know, and this is called a "hallucination." In practice, it doesn't feel all that different from a human trying to fake their way through an answer on a topic they don't understand. By augmenting the pre-trained knowledge of the model with access to newer and personalized information, the odds of these hallucinated answers drop tremendously.

As an added benefit, technologies like RAG and ReAct allow models to simultaneously get smaller and more accurate. Rather than training enormous models on gobs of information a majority of users may never touch, model development can instead focus on excellence in core competencies. As an example, Microsoft's previous Phi-2 model, at 2.7 billion parameters, often outperformed older language models 25 times its size on benchmarks. Developments like these reduce the performance requirements of AI accelerators, enabling smaller GPUs and NPUs to handle the same work.

RAG and ReAct in Action

Intel is at the forefront of testing and enabling new AI models on PC hardware. To date, over 500 AI models covering over a dozen disciplines have been tested to run with optimized performance on Intel Core Ultra processors. Phi-3-mini is one of those 500+ models, and it's high time we see it in action!

If you have a new laptop featuring an Intel Core Ultra 200V Series processor, you can also try RAG and ReAct yourself with Intel's AI Playground tool. AI Playground gives you access to language models like Phi-3-mini, high-res image generation, image upscaling, and more. No internet connection is required, it's easy to use, and the tool is completely free!

Laptop vendors like Acer are also taking advantage of Intel R&D by leveraging Phi-3 with RAG in the new AcerSense software, which comes pre-loaded on new notebooks featuring Intel Core Ultra 200V Series processors. Intel's work to enable, optimize, and validate these AI models is much like the way graphics cards need optimized drivers for the best experience.

As Intel engineering continues to lead the industry in validating and optimizing AI models for offline use, we are clear-eyed about and motivated by a future where AI-based features are widely available in almost every application. And, before long, we foresee that system performance and power consumption will be deeply entangled with AI feature sets, which makes early and enthusiastic enabling work like RAG and ReAct convenient for now and vital as a foundation.

About the Authors

Robert Hallock is the General Manager and Vice President of Client AI & Technical Marketing at Intel. Prior to joining Intel, Robert spent 12 years in Client and Graphics at AMD, most recently as the Director of Product and Technology Marketing for consumer Ryzen processors. He has also been a PC hardware reviewer, journalist, and technical writer. He moonlights as a designer of high-performance aftermarket automotive components and is a lifelong PC enthusiast.

Erin Maiorino is the Director of Competitive AI Marketing at Intel. Prior to joining Intel, Erin was the Senior Product Marketing Manager for AMD Ryzen and Threadripper processors and served as the Director of Content Marketing at Lattice Semiconductor. Before working in PC hardware, Erin was in the gaming industry working on titles like Halo 4 and SWTOR. She is a self-proclaimed dog nerd and loves hiking, nosework and agility with her dog Whiskey.



Footnotes:

  • Performance varies by use, configuration and other factors. Learn more on the Performance Index site.
  • No product or component can be absolutely secure.
  • AI features may require software purchase, subscription or enablement by a software or platform provider, or may have specific configuration or compatibility requirements. Details at intel.com/AIPC.
  • Your costs and results may vary.
  • Intel technologies may require enabled hardware, software or service activation.
  • © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
