GenAI Weekly — Edition 16

Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs

Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.

Click subscribe to be notified of future editions


Vector DB Retrieval: To chunk or not to chunk

From the Unstract Blog:

What is Chunking?

Chunking is the process of breaking a large document into smaller, manageable "chunks." This is crucial in the Retrieval-Augmented Generation (RAG) ecosystem, especially due to the context window size limitations of Large Language Models (LLMs).

LLMs have a limited context size or window, meaning they can only process a certain number of tokens (words and punctuation marks) at one time.

For example, GPT-3.5 Turbo has a context size of 4,096 tokens. This limitation means very large documents cannot be sent in their entirety to the LLM.
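To make that limit concrete, here's a quick sketch of checking whether text fits a context window, using OpenAI's tiktoken tokenizer (my illustration, not from the Unstract post; the model name and limit are just examples):

```python
# A minimal sketch: count tokens before sending text to an LLM.
# Assumes the tiktoken package is installed; the model name is illustrative.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def fits_in_context(text: str, context_size: int = 4096) -> bool:
    """Return True if `text` fits within the model's context window."""
    return len(enc.encode(text)) <= context_size

report = "Revenue grew in the third quarter. " * 2000  # stand-in for a large document
print(fits_in_context(report))  # False: the document must be chunked first
```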

Instead, only the contextually relevant portions should be sent. To achieve this, the document needs to be chunked, allowing relevant chunks to be identified and processed efficiently.

In summary, chunking ensures that LLMs can handle large documents by breaking them into smaller, relevant pieces that fit within their processing capabilities.
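In code terms, the simplest version of this is a sliding-window splitter over tokens (a sketch under assumed sizes; real pipelines tune chunk size and overlap to the corpus):

```python
# A minimal sketch of fixed-size chunking with overlap, reusing tiktoken.
# chunk_size and overlap are assumptions for illustration, not recommendations.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def chunk_by_tokens(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into chunks of at most chunk_size tokens, repeating
    `overlap` tokens between consecutive chunks so that context
    straddling a boundary is not lost."""
    tokens = enc.encode(text)
    step = chunk_size - overlap
    return [
        enc.decode(tokens[start:start + chunk_size])
        for start in range(0, len(tokens), step)
    ]
```

Each chunk can then be embedded and stored in a vector database, and only the top-matching chunks are retrieved and sent to the LLM at query time.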



A PR disaster: Microsoft has lost trust with its users, and Windows Recall is the straw that broke the camel's back

Zac Bowden writing for Windows Central:

It's a nightmare scenario for Microsoft. The headlining feature of its new Copilot+ PC initiative, which is supposed to drive millions of PC sales over the next couple of years, is under significant fire for being what many say is a major breach of privacy and security on Windows. That feature in question is Windows Recall, a new AI tool designed to remember everything you do on Windows.
On paper, it's a cool idea. As CEO Satya Nadella described it, Windows now has a photographic memory that uses AI to triage and index everything you've ever done on your computer, enabling you to semantically search for things you've seen using natural language. It's a new and improved way of finding things on Windows, and in our testing of the feature, it works really well.
However, for a tool like this to be feasible, trust between the user and the platform is required, a luxury Microsoft doesn't appear to have with its Windows user base right now. Recall operates by taking and storing captures of your screen every few seconds to build a database that the user can later search, with screenshots as visual aids. That database is stored locally on your device and never uploaded to the cloud.
In fact, Microsoft goes so far as to promise that it cannot see the data collected by Windows Recall, that it can't train any of its AI models on your data, and that it definitely can't sell that data to advertisers. All of this is true, but that doesn't mean people believe Microsoft when it says these things. In fact, many have jumped to the conclusion that even if it's true today, it won't be true in the future.

And Microsoft reacted. From The Verge: "Windows won't take screenshots of everything you do after all — unless you opt in"


Google introduces PaliGemma, Gemma 2, and an Upgraded Responsible AI Toolkit

Tris Warkentin, Xiaohua Zhai and Ludovic Peran writing for the Google blog:

PaliGemma is a powerful open VLM inspired by PaLI-3. Built on open components including the SigLIP vision model and the Gemma language model, PaliGemma is designed for class-leading fine-tune performance on a wide range of vision-language tasks. This includes image and short video captioning, visual question answering, understanding text in images, object detection, and object segmentation.
We're providing both pretrained and fine-tuned checkpoints at multiple resolutions, as well as checkpoints specifically tuned to a mixture of tasks for immediate exploration.

To facilitate open exploration and research, PaliGemma is available through various platforms and resources. Start exploring today with free options like Kaggle and Colab notebooks. Academic researchers seeking to push the boundaries of vision-language research can also apply for Google Cloud credits to support their work.

And about Gemma 2, they say:

We're thrilled to announce the upcoming arrival of Gemma 2, the next generation of Gemma models. Gemma 2 will be available in new sizes for a broad range of AI developer use cases and features a brand new architecture designed for breakthrough performance and efficiency, offering benefits such as:

  • Class Leading Performance: At 27 billion parameters, Gemma 2 delivers performance comparable to Llama 3 70B at less than half the size. This breakthrough efficiency sets a new standard in the open model landscape.

  • Reduced Deployment Costs: Gemma 2's efficient design allows it to fit on less than half the compute of comparable models. The 27B model is optimized to run on NVIDIA’s GPUs or can run efficiently on a single TPU host in Vertex AI, making deployment more accessible and cost-effective for a wider range of users.

  • Versatile Tuning Toolchains: Gemma 2 will provide developers with robust tuning capabilities across a diverse ecosystem of platforms and tools. From cloud-based solutions like Google Cloud to popular community tools like Axolotl, fine-tuning Gemma 2 will be easier than ever. Plus, seamless partner integration with Hugging Face and NVIDIA TensorRT-LLM, along with our own JAX and Keras, ensures you can optimize performance and efficiently deploy across various hardware configurations.

I hear the Gemma models are really good, especially when it comes to fine-tuning.


Breaking up is hard to do: Chunking in RAG applications

Ryan Donovan writing for the Stack Overflow blog:

When it comes to RAG systems, you’ll need to pay special attention to how big the individual pieces of data are. How you divide your data up is called chunking, and it’s more complex than embedding whole documents. This article will take a look at some of the current thinking around chunking data for RAG systems.
If chunking were cut and dried, the industry would have settled on a standard pretty quickly, but the best chunking strategy is dependent on the use case. Fortunately, you’re not just chunking data, vectorizing it, and crossing your fingers. You’ve also got metadata. This can be a link to the original chunk or larger portions of the document, categories and tags, text, or really anything at all. “It's kind of like a JSON blob that you can use to filter out things,” said Schwaber-Cohen. “You can reduce the search space significantly if you're just looking for a particular subset of the data, and you could use that metadata to then link the content that you're using in your response back to the original content.”
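Here's a rough illustration of that metadata-filtering idea (my sketch; the chunk structure is made up, not any particular vector database's API):

```python
# A minimal sketch of chunk metadata used to narrow the search space.
# The structure and filter are illustrative; real vector databases
# (Pinecone, Weaviate, etc.) expose their own metadata-filter syntax.
chunks = [
    {"text": "Q3 revenue grew 12%...",
     "metadata": {"doc_id": "report-2023", "section": "finance"}},
    {"text": "Our hiring plan for next year...",
     "metadata": {"doc_id": "report-2023", "section": "hr"}},
]

def filter_chunks(chunks: list[dict], **conditions) -> list[dict]:
    """Keep only chunks whose metadata matches every condition."""
    return [c for c in chunks
            if all(c["metadata"].get(k) == v for k, v in conditions.items())]

finance_only = filter_chunks(chunks, section="finance")
# Vector search then runs over this reduced subset, and each hit's
# doc_id links the generated answer back to its source document.
```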

[…]

With these concerns in mind, several common chunking strategies have emerged. The most basic is to chunk text into fixed sizes. This works for fairly homogenous datasets that use content of similar formats and sizes, like news articles or blog posts. It’s the cheapest method in terms of the amount of compute you’ll need, but it doesn’t take into account the context of the content that you’re chunking. That might not matter for your use case, but it might end up mattering a lot.
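One common refinement (my sketch, not from the article) is to at least respect sentence boundaries, so a chunk never cuts a thought in half:

```python
# A hedged sketch of sentence-aware chunking: greedily pack whole
# sentences into chunks. The regex and max_chars are assumptions.
import re

def chunk_by_sentences(text: str, max_chars: int = 1000) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.
    A single sentence longer than max_chars becomes its own chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```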

Production-grade RAG apps are hard. Chunking strategies are far from settled.


An anti-AI social app, Cara grew from 40k to 650k users in a week

Amanda Silberling writing for TechCrunch:

Artists have finally had enough with Meta’s predatory AI policies, but Meta’s loss is Cara’s gain. An artist-run, anti-AI social platform, Cara has grown from 40,000 to 650,000 users within the last week, catapulting it to the top of the App Store charts.
Instagram is a necessity for many artists, who use the platform to promote their work and solicit paying clients. But Meta is using public posts to train its generative AI systems, and only European users can opt out, since they’re protected by GDPR laws. Generative AI has become so front-and-center on Meta’s apps that artists reached their breaking point.
“When you put [AI] so much in their face, and then give them the option to opt out, but then increase the friction to opt out… I think that increases their anger level — like, okay now I’ve really had enough,” Jingna Zhang, a renowned photographer and founder of Cara, told TechCrunch.

Several creative fields are being disrupted. Good art used to be supply-constrained, which is the very reason good artists were valued. What happens when that constraint is removed?


Humane, the company that made the Ai Pin, is in talks with HP to sell itself

Tripp Mickle and Erin Griffith writing for the New York Times:

Days before gadget reviewers weighed in on the Humane Ai Pin, a futuristic wearable device powered by artificial intelligence, the founders of the company gathered their employees and encouraged them to brace themselves. The reviews might be disappointing, they warned.
Humane’s founders, Bethany Bongiorno and Imran Chaudhri, were right. In April, reviewers brutally panned the new $699 product, which Humane had marketed for a year with ads and at glitzy events like Paris Fashion Week. The Ai Pin was “totally broken” and had “glaring flaws,” some reviewers said. One declared it “the worst product I’ve ever reviewed.”
About a week after the reviews came out, Humane started talking to HP, the computer and printer company, about selling itself for more than $1 billion, three people with knowledge of the conversations said. Other potential buyers have emerged, though talks have been casual and no formal sales process has begun.
Humane retained Tidal Partners, an investment bank, to help navigate the discussions while also managing a new funding round that would value it at $1.1 billion, three people with knowledge of the plans said.

Humane's Ai Pin wanted to be the anti-smartphone, yet it still shipped with a "screen," albeit in a different form. I think smartphones are so efficient at I/O with humans that they can't be gotten rid of for a very long time, much like the keyboard, our top input device and a constant in computing for the past 50 years, which was never gotten rid of and is even emulated on smartphones.


If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract—a no-code LLM platform that automates unstructured data workflows.


For the extra curious

