Weekly Update: New LLM Models and the Basics of RAG
Kashif Manzoor
Enabling Customers for a Successful AI Adoption | AI Tech Evangelist | AI Solutions Architect
Stay ahead in AI with the Weekly AI Roundup; read and listen on AITechCircle:
Welcome to the weekly AI Newsletter, your go-to source for practical and actionable ideas. I'm here to give you tips you can apply to your job and business immediately.
Before we start, share this week's updates with a friend or a colleague:
Today at a Glance:
RAG Basics: A Beginner’s Guide to Retrieval-Augmented Generation
If you are new to this topic, I suggest going through these two earlier editions of the newsletter before you start reading:
This week, we will start with a very basic RAG built from scratch, based on the reference repository available from Mistral AI. The goal is to clarify your understanding of RAG's internal workings and equip you with the foundational knowledge needed to construct a RAG pipeline with minimal dependencies.
Let's start with installing the required packages:
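A minimal setup sketch follows; the exact package list is my assumption based on the steps below (the mistralai client for embeddings and chat, faiss-cpu as a lightweight vector store, requests to fetch the source text, and numpy for arrays):

pip install mistralai faiss-cpu requests numpy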
Now, let's fetch the data from an article, document, or other web source:
import requests

# Download the full text of Romeo and Juliet from Project Gutenberg
response = requests.get('https://www.gutenberg.org/cache/epub/1513/pg1513.txt')
text = response.text
Then, split the data into Chunks: In a Retrieval-Augmented Generation (RAG) system, breaking the document into smaller chunks is essential for efficiently identifying and retrieving the most relevant information during the retrieval process. In this example, we split the text by characters and grouped 2048 characters into each chunk.
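Here is a minimal sketch of that character-based split, assuming the downloaded text is still in the text variable from the previous step:

# Split the full text into fixed-size chunks of 2048 characters each
chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
print(f"{len(chunks)} chunks created")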
Key points:
Chunk size: To achieve optimal performance in RAG, we may need to customize or experiment with different chunk sizes and overlaps based on the specific use case. Smaller chunks can be more beneficial for retrieval processes, as larger chunks often contain filler text that can obscure semantic representation. Using smaller chunks allows the RAG system to identify and extract relevant information more effectively and accurately. However, be mindful of the trade-offs, such as increased processing time and computational resources, that come with using smaller chunks.
How to split: The simplest method is to split the text by character, but other options are based on the use case and document structure. To avoid exceeding token limits in API calls, you might need to split the text by tokens. Consider splitting the text into sentences, paragraphs, or HTML headers to maintain chunk cohesiveness. When working with code, it’s often best to split by meaningful code chunks, such as using an Abstract Syntax Tree (AST) parser.
Creation of embeddings for each text chunk:
Text embeddings convert text into numeric representations in a vector, enabling the model to understand semantic relationships between words. Words with similar meanings will be closer in this space, which is crucial for tasks like information retrieval and semantic search.
To generate these embeddings, we use Mistral AI’s embeddings API endpoint with the mistral-embed model. We create a function called get_text_embedding to retrieve the embedding for a single text chunk. Then, we use list comprehension to apply this function to all text chunks and obtain their embeddings efficiently.
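As a sketch, assuming the mistralai Python client (v1.x) and an API key stored in the MISTRAL_API_KEY environment variable:

import os
import numpy as np
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def get_text_embedding(chunk):
    # Call the embeddings endpoint with the mistral-embed model
    response = client.embeddings.create(model="mistral-embed", inputs=chunk)
    return response.data[0].embedding

# Embed every chunk; the result is a (num_chunks, embedding_dim) array
text_embeddings = np.array([get_text_embedding(chunk) for chunk in chunks])

Embedding one chunk per call keeps the sketch simple; the endpoint also accepts a list of inputs per request, which is faster for large documents.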
Loading into Vector Database: after getting the embeddings in place, we need to store them in the Vector Database.
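A minimal sketch using Faiss, the in-memory vector store that the index.search call described below assumes:

import faiss

# Create a flat (exact-search) index sized to the embedding dimension
# and load all chunk embeddings into it
dimension = text_embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)
index.add(text_embeddings.astype("float32"))  # Faiss expects float32 vectors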
Next, the user's question also needs to be converted into embeddings with the same model, which are then used to retrieve similar chunks from the Vector DB.
To search the vector database, we use the index.search method, which takes two arguments: the vector of the question embedding and the number of similar vectors to retrieve. It returns the distances and indices of the vectors in the database most similar to the question vector. Using these indices, we can then retrieve the corresponding relevant text chunks.
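Continuing the sketch (the question below is just an example for the Romeo and Juliet text we downloaded):

# Embed the question with the same model used for the chunks
question = "What were the names of Romeo and Juliet's families?"
question_embedding = np.array([get_text_embedding(question)]).astype("float32")

# Retrieve the distances and indices of the 2 most similar chunks
distances, indices = index.search(question_embedding, 2)
retrieved_chunks = [chunks[i] for i in indices[0]]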
Beyond this simple similarity search, there are other common retrieval methods that may fit your use case better.
Combine Context and Question in a Prompt to Generate a Response:
Lastly, we can use the retrieved text chunks as a context within the prompt to generate a response.
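A sketch of that final step, again assuming the v1 mistralai client (the model name is my choice; any Mistral chat model works):

# Insert the retrieved chunks and the question into a single prompt
prompt = f"""
Context information is below.
---------------------
{' '.join(retrieved_chunks)}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

# Ask a chat model to answer using only the supplied context
chat_response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": prompt}],
)
print(chat_response.choices[0].message.content)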
Prompting Techniques for Developing a RAG System: In developing a Retrieval-Augmented Generation (RAG) system, various prompting techniques can significantly enhance the model’s performance and the quality of its responses.
Here are some key techniques that can be applied:
Head over to this link, and you can try building your first simple RAG.
Weekly News & Updates...
Last week's AI breakthroughs marked another leap forward in the tech revolution.
The Cloud: the backbone of the AI revolution
Gen AI Use Case of the Week:
Generative AI use cases in the Government and Public Sector:
Utilizing large language models (LLMs) to simulate urban planning scenarios (Urban Planning / Future of Cities); this use case is derived from Deloitte.
Business Challenges
AI Solution Description
Using large language models (LLMs), generative AI can simulate urban planning scenarios by processing vast data and generating multiple design concepts.
Here’s how it can be done:
Data Integration: The AI model ingests various data sources, including demographic data, environmental reports, infrastructure details, and economic statistics.
Scenario Generation: The LLM processes this data to generate multiple urban planning scenarios. It can create detailed descriptions, visualizations, and potential outcomes for each scenario, as sketched in the code example after this list.
Simulation and Optimization: The generated scenarios are then simulated to predict their impacts. The AI model optimizes these scenarios based on predefined goals, such as sustainability, economic growth, and livability.
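As a purely illustrative sketch of step 2 (everything below, including the city_data fields, the prompt wording, and the model name, is hypothetical and not from Deloitte's write-up), scenario generation could be prototyped with the same kind of chat call used in the RAG walkthrough above:

import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Hypothetical integrated city data (step 1 above)
city_data = {
    "population": 1_200_000,
    "annual_growth_rate": "2.1%",
    "flood_risk_zones": ["riverfront district", "eastern lowlands"],
    "transit_coverage": "48% of residents within 500 m of a stop",
}

# Ask the LLM to draft candidate planning scenarios (step 2 above)
scenario_prompt = f"""You are an urban planning assistant. Using the data below,
propose three distinct 10-year development scenarios. For each, describe
land use, transport changes, expected trade-offs, and risks.

City data: {city_data}"""

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": scenario_prompt}],
)
print(response.choices[0].message.content)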
Expected Impact/Business Outcome
Required Data Sources
Strategic Fit and Impact
Implementing generative AI in urban planning aligns well with the strategic goals of modernizing infrastructure, improving public services, and fostering sustainable development. The high impact rating reflects its potential to transform urban planning processes, leading to more efficient and effective development outcomes.
Rating: High impact and strategic fit
Favorite Tip Of The Week:
Here's my favorite resource of the week.
Potential of AI
Things to Know...
This week, I liked the AI and Generative AI resources from Georgetown University on how to use Gen AI and cite it in your articles, papers, and research.
"To cite the informational product generated by ChatGPT or other AI, the recommendation is for the Methodology and/or Introduction of your paper to specify the following:
Please remember that if AI connects you to another resource, you need to cite that resource, just as you would in a literature review."
The Opportunity...
Podcast:
Courses to attend:
Events:
Tech and Tools...
Data Sets...
Other Technology News
Want to stay on the cutting edge?
Here's what else is happening in Information Technology you should know about:
Join a mini email course on Generative AI ...
Earlier week's Post:
That's it!
As always, thanks for reading.
Hit reply and let me know what you found most helpful this week - I'd love to hear from you!
Until next week,
Kashif Manzoor
The opinions expressed here are solely my conjecture based on experience, practice, and observation. They do not represent the thoughts, intentions, plans, or strategies of my current or previous employers or their clients/customers. The objective of this newsletter is to share and learn with the community.