GenAI Weekly — Edition 31

Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs

Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.

Click subscribe to be notified of future editions


How Large Language Models are Ushering in the IDP 2.0 Era

From the Unstract blog:

Engineers transitioning into Product Manager roles can be a risky move!

Why?

Because deep technical knowledge may actually limit product innovation. When product design is constrained by current technology, the opportunity to push boundaries is lost. Great product managers focus on what users need, not just what today's tech can achieve—they push engineers to turn the impossible into reality, which is where true innovation happens.

Take Intelligent Document Processing (IDP 1.0) systems, for example. These rely on classical Machine Learning and NLP, but have significant limitations. Now imagine a product manager, free from tech constraints, crafting a wish list:

  • No more manual field annotations
  • Handle document variations seamlessly
  • No need for training sets to classify document types
  • Extract fields from complex documents, not just simple forms
  • Accurately extract data from lengthy documents
  • Eliminate post-processing steps

While this might sound like a dream, it's quickly becoming reality with Large Language Models (LLMs)!

Read on: How Large Language Models are Ushering in the IDP 2.0 Era
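
To make the wish list above concrete: with an LLM, field extraction can be reduced to a prompt rather than an annotation-and-training pipeline. Below is a minimal sketch, assuming the OpenAI Python SDK; the model name, prompt wording, and field list are placeholder assumptions for illustration, not how Unstract itself works.

    # Minimal sketch: zero-shot field extraction with an LLM (illustrative only).
    # Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
    # the model name and field list are placeholders, not Unstract's implementation.
    import json
    from openai import OpenAI

    client = OpenAI()

    def extract_fields(document_text: str, fields: list[str]) -> dict:
        """Ask the model to return the requested fields as JSON; no labeled training set required."""
        prompt = (
            "Extract the following fields from the document below and reply with JSON only.\n"
            f"Fields: {', '.join(fields)}\n\n"
            f"Document:\n{document_text}"
        )
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        return json.loads(response.choices[0].message.content)

    # e.g. extract_fields(invoice_text, ["invoice_number", "total_amount", "due_date"])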


Ban warnings fly as users dare to probe the “thoughts” of OpenAI’s latest model

Benj Edwards writing for Ars Technica:

One X user reported (confirmed by others, including Scale AI prompt engineer Riley Goodside) that they received a warning email if they used the term "reasoning trace" in conversation with o1. Others say the warning is triggered simply by asking ChatGPT about the model's "reasoning" at all.
The warning email from OpenAI states that specific user requests have been flagged for violating policies against circumventing safeguards or safety measures. "Please halt this activity and ensure you are using ChatGPT in accordance with our Terms of Use and our Usage Policies," it reads. "Additional violations of this policy may result in loss of access to GPT-4o with Reasoning," referring to an internal name for the o1 model.


My take on this: Seems extreme.


Apple Intelligence Promises Better AI Privacy. Here’s How It Actually Works

Lily Hay Newman writing for Wired:

With Private Cloud Compute, Apple has developed an array of innovative cloud security technologies. But the service is also significant for pushing the limits of what is an acceptable business proposition for a cloud service, seemingly prioritizing secure architecture over what would be most technically efficient or economical.
“We set out from the beginning with a goal of how can we extend the kinds of privacy guarantees that we’ve established with processing on-device with iPhone to the cloud—that was the mission statement," Craig Federighi, senior vice president of software engineering at Apple, tells WIRED. “It took breakthroughs on every level to pull this together, but what we’ve done is achieve our goal. I think this sets a new standard for processing in the cloud in the industry.”
To remove many of the potential attack points and pitfalls that cloud computing can introduce, Apple says its developers focused on the idea that “security and privacy guarantees are strongest when they are entirely technically enforceable” rather than implemented through policies.

My take on this: Interesting approach to work around the limitations of edge compute, but it still requires you to trust Apple.


Qwen2.5, Qwen2.5-Coder and Qwen2.5-Math models released

From the Qwen blog:

In the past three months since Qwen2’s release, numerous developers have built new models on the Qwen2 language models, providing us with valuable feedback. During this period, we have focused on creating smarter and more knowledgeable language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5. We are announcing what might be the largest open-source release in history! Let’s get the party started!

Our latest release features the LLMs Qwen2.5, along with specialized models for coding, Qwen2.5-Coder, and mathematics, Qwen2.5-Math. All open-weight models are dense, decoder-only language models, available in various sizes, including:

  • Qwen2.5: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B
  • Qwen2.5-Coder: 1.5B, 7B, and 32B on the way
  • Qwen2.5-Math: 1.5B, 7B, and 72B.

All our open-source models, except for the 3B and 72B variants, are licensed under Apache 2.0. You can find the license files in the respective Hugging Face repositories. In addition to these models, we offer APIs for our flagship language models: Qwen-Plus and Qwen-Turbo through Model Studio, and we encourage you to explore them! Furthermore, we have also open-sourced the Qwen2-VL-72B, which features performance enhancements compared to last month’s release.

[…]

The specialized expert language models, namely Qwen2.5-Coder for coding and Qwen2.5-Math for mathematics, have undergone substantial enhancements compared to their predecessors, CodeQwen1.5 and Qwen2-Math. Specifically, Qwen2.5-Coder has been trained on 5.5 trillion tokens of code-related data, enabling even smaller coding-specific models to deliver competitive performance against larger language models on coding evaluation benchmarks. Meanwhile, Qwen2.5-Math supports both Chinese and English and incorporates various reasoning methods, including Chain-of-Thought (CoT), Program-of-Thought (PoT), and Tool-Integrated Reasoning (TIR).

My take on this: The Code and Math models seem interesting.
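
If you want to poke at the open-weight checkpoints yourself, a minimal sketch using Hugging Face transformers might look like the following; the repository id and generation settings are my assumptions, so check the model card on Hugging Face.

    # Minimal sketch: running a small Qwen2.5 instruct checkpoint locally with transformers.
    # The repository id and generation settings are assumptions; see the Hugging Face model card.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed repo id for the 1.5B instruct variant
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "Explain chain-of-thought prompting in two sentences."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))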


Fine-Tuning for Precision and Privacy: How Corgea's LLM Enhances Enterprise Application Security

From the Corgea blog:

Enterprises, especially those in regulated industries, have stringent requirements for data residency, privacy, and security. These organizations often demand private-cloud deployments and need to avoid reliance on third-party LLMs that could pose data exposure risks. Our fine-tuned LLM addresses these concerns by offering complete data isolation and avoiding the need for customers to sign Business Associate Agreements (BAAs) for HIPAA compliance. Additionally, this approach allows for a low-cost deployment model while outperforming even larger models like OpenAI's in relevant benchmarks.
At the heart of our solution is Llama 3.1 8B, an 8-billion-parameter core model. We benchmarked all the popular small models against each other, including Mistral, Mixtral, Codestral Mamba, and DeepSeek Coder. We chose Llama 3.1 8B due to its size, ease of fine-tuning, measurable performance in the key areas we need, and newness.
The model has multiple sets of fine-tuned weights tailored for specific tasks, including false positive detection, automated fixes, and quality checks. This modular approach allows us to use the best weights for a particular task during inference. Flattening the model and merging the weights together proved to yield much worse results.

[…]

What is Our Dataset?

Our model is trained on a diverse dataset comprising hundreds of repositories: closed-source projects we own, open-source vulnerable-by-design projects like Juice Shop, and other open-source codebases. Importantly, no customer data is ever used in the training process. The dataset spans multiple programming languages, including Python, JavaScript, TypeScript, Java, Go, Ruby, and C#, reflecting the diverse ecosystems our customers operate within. We also had to account for a wide range of frameworks such as Ruby on Rails, Django, Flask, Kotlin, etc., as different frameworks handle security findings differently. For example, there are roughly 30 different ways to fix an SQL injection vulnerability, as it depends on the programming language, framework, and database that a particular application is using.

My take on this: A good example of how open-weights models are fostering innovation and how LLM fine-tuning can solve real world problems. [Take this with a pinch of salt since Corgea isn’t an established company.]
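
The pattern of keeping one base model and swapping task-specific fine-tuned weights at inference time is commonly implemented with LoRA adapters. Here is a minimal sketch using Hugging Face peft under that assumption; the adapter names and paths are hypothetical, and this is not Corgea's actual code.

    # Minimal sketch: one shared base model with per-task LoRA adapters swapped at inference.
    # Adapter names/paths are hypothetical; this is an assumption, not Corgea's implementation.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_id = "meta-llama/Llama-3.1-8B-Instruct"  # base model named in the post
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

    # Load one adapter per task on top of the same base weights.
    model = PeftModel.from_pretrained(base, "adapters/false-positive-detection",
                                      adapter_name="false_positive")
    model.load_adapter("adapters/auto-fix", adapter_name="auto_fix")
    model.load_adapter("adapters/quality-check", adapter_name="quality_check")

    def run_task(task: str, prompt: str) -> str:
        model.set_adapter(task)  # pick the specialist weights for this request
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=256)
        return tokenizer.decode(output_ids[0][inputs["input_ids"].shape[-1]:],
                                skip_special_tokens=True)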


Plaud’s $169 ChatGPT-powered NotePin has a permanent place in my travel bag

Brian Heater writing for TechCrunch:


These days, I record on my laptop or place my phone down on the table between myself and the subject. These devices present their own issues, like a lack of proper microphones and the tendency to pick up typing noises when doing double duty. I find myself harboring some light nostalgia for the days of my little Olympus recorder with its built-in USB-A dongle.
Plaud.AI’s raison d’être lives somewhere among the above scenarios. Earlier this year, the startup launched Plaud Note, a recording device that magnetically snaps to the back of a handset, utilizing ChatGPT to transcribe conversations. While I didn’t have the opportunity to try out that earlier device, I jumped when the company told me about the upcoming NotePin.

My take on this: Not sure how big a market this is.


Introducing Contextual Retrieval

From Anthropic:

For an AI model to be useful in specific contexts, it often needs access to background knowledge. For example, customer support chatbots need knowledge about the specific business they're being used for, and legal analyst bots need to know about a vast array of past cases.
Developers typically enhance an AI model's knowledge using Retrieval-Augmented Generation (RAG). RAG is a method that retrieves relevant information from a knowledge base and appends it to the user's prompt, significantly enhancing the model's response. The problem is that traditional RAG solutions remove context when encoding information, which often results in the system failing to retrieve the relevant information from the knowledge base.
In this post, we outline a method that dramatically improves the retrieval step in RAG. The method is called “Contextual Retrieval” and uses two sub-techniques: Contextual Embeddings and Contextual BM25. This method can reduce the number of failed retrievals by 49% and, when combined with reranking, by 67%. These represent significant improvements in retrieval accuracy, which directly translates to better performance in downstream tasks.

Performance improvements

Our experiments showed that:

  • Contextual Embeddings reduced the top-20-chunk retrieval failure rate by 35% (5.7% → 3.7%).
  • Combining Contextual Embeddings and Contextual BM25 reduced the top-20-chunk retrieval failure rate by 49% (5.7% → 2.9%).

My take on this: An interesting RAG technique.
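
A rough sketch of the indexing step as the post describes it: each chunk gets a short, LLM-generated piece of document-level context prepended before it is embedded and BM25-indexed. This sketch uses the Anthropic Python SDK; the model name and prompt wording are my own assumptions, not code from the post.

    # Minimal sketch of Contextual Retrieval's indexing step (illustrative, not Anthropic's code).
    # Prepend an LLM-generated, chunk-specific context to each chunk, then index the
    # contextualized text (embeddings + BM25) instead of the raw chunk.
    # Model name and prompt wording are assumptions.
    import anthropic

    client = anthropic.Anthropic()

    CONTEXT_PROMPT = (
        "<document>\n{doc}\n</document>\n"
        "Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n"
        "Write a short context that situates this chunk within the overall document, "
        "for the purpose of improving search retrieval. Answer with the context only."
    )

    def contextualize(doc: str, chunk: str) -> str:
        response = client.messages.create(
            model="claude-3-haiku-20240307",  # assumed model choice
            max_tokens=150,
            messages=[{"role": "user", "content": CONTEXT_PROMPT.format(doc=doc, chunk=chunk)}],
        )
        return response.content[0].text + "\n\n" + chunk

    # The contextualized chunks then feed both the embedding index and the BM25 index,
    # and the two result lists are merged (and optionally reranked) at query time.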


If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract—a no-code LLM platform that automates unstructured data workflows.

Follow Unstract on LinkedIn and Twitter.

