GenAI Weekly — Edition 30

Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs

Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.

Click subscribe to be notified of future editions



Best OCR Software in 2024 — A Tool Comparison & Evaluation Guide

From the Unstract blog:

OCR technology is essential in today's digital world, transforming scanned papers, PDFs, and images into editable, searchable text. This boosts productivity, especially in industries like finance, healthcare, legal, and education, where document processing is vital. The effectiveness of OCR directly affects workflows, data accuracy, and operational efficiency. As businesses embrace digital transformation, choosing the right OCR tool is crucial. This article reviews the top OCR software available in 2024.

We will compare:

1. Tesseract
2. PaddleOCR
3. Azure Document Intelligence
4. Amazon Textract
5. LLMWhisperer from Unstract
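
Tesseract, for instance, can be tried locally in a few lines. A rough sketch, assuming the Tesseract binary plus the pytesseract and Pillow packages are installed (the file name is just a placeholder):

# Minimal OCR sketch using Tesseract via the pytesseract wrapper.
# Prerequisites: the Tesseract binary on PATH, plus `pip install pytesseract pillow`.
from PIL import Image
import pytesseract

image = Image.open("invoice_scan.png")  # placeholder path to any scanned page
text = pytesseract.image_to_string(image, lang="eng")  # "eng" selects the English language pack
print(text)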



OpenAI releases o1-preview and o1-mini

From their blog:

We trained these models to spend more time thinking through problems before they respond, much like a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.
In our tests, the next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions. You can read more about this in our technical research post.

As an early model, it doesn't yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT-4o will be more capable in the near term.

But for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.

[…]

Whom it’s for

These enhanced reasoning capabilities may be particularly useful if you’re tackling complex problems in science, coding, math, and similar fields. For example, o1 can be used by healthcare researchers to annotate cell sequencing data, by physicists to generate complicated mathematical formulas needed for quantum optics, and by developers in all fields to build and execute multi-step workflows.

[…]

OpenAI o1-mini

The o1 series excels at accurately generating and debugging complex code. To offer a more efficient solution for developers, we’re also releasing OpenAI o1-mini, a faster, cheaper reasoning model that is particularly effective at coding. As a smaller model, o1-mini is 80% cheaper than o1-preview, making it a powerful, cost-effective model for applications that require reasoning but not broad world knowledge.
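
Both preview models are reachable through the standard Chat Completions API. A minimal sketch of calling o1-mini, assuming the openai Python package and an OPENAI_API_KEY in the environment (note that at launch the o1 models accept only user and assistant messages and ignore the usual sampling knobs):

# Sketch: calling o1-mini through the OpenAI Chat Completions API.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-mini",
    # At launch the o1 models take only user/assistant messages (no system prompt).
    messages=[{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
)

print(response.choices[0].message.content)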

Also see Learning to Reason with LLMs.


Notes on OpenAI’s new o1 chain-of-thought models

From Simon Willison’s blog:

OpenAI released two major new preview models today: o1-preview and o1-mini (that mini one is not a preview)—previously rumored as having the codename “strawberry”. There’s a lot to understand about these models—they’re not as simple as the next step up from GPT-4o, instead introducing some major trade-offs in terms of cost and performance in exchange for improved “reasoning” capabilities.
[…] the models can better handle significantly more complicated prompts where a good result requires backtracking and “thinking” beyond just next token prediction.
I don’t really like the term “reasoning” because I don’t think it has a robust definition in the context of LLMs, but OpenAI have committed to using it here and I think it does an adequate job of conveying the problem these new models are trying to solve.

[…]

Most interesting is the introduction of “reasoning tokens”—tokens that are not visible in the API response but are still billed and counted as output tokens. These tokens are where the new magic happens.

Thanks to the importance of reasoning tokens—OpenAI suggests allocating a budget of around 25,000 of these for prompts that benefit from the new models—the output token allowance has been increased dramatically—to 32,768 for o1-preview and 65,536 for the supposedly smaller o1-mini! These are an increase from the gpt-4o and gpt-4o-mini models which both currently have a 16,384 output token limit.
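
In practice that means reserving generous headroom and then checking the usage block to see how much of it went to hidden reasoning. A hedged sketch (the max_completion_tokens parameter and the completion_tokens_details.reasoning_tokens field are how the o1 API exposed this at launch; treat the exact names as subject to change):

# Sketch: budget for hidden reasoning tokens and inspect how many were spent.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{"role": "user", "content": "Prove that the sum of two even integers is even."}],
    # o1 models use max_completion_tokens rather than max_tokens; the cap covers
    # both the visible answer and the invisible reasoning tokens, so leave headroom.
    max_completion_tokens=25_000,
)

usage = response.usage
print("visible output tokens:", usage.completion_tokens)
print("hidden reasoning tokens:", usage.completion_tokens_details.reasoning_tokens)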

One last interesting tip from that API documentation:

Limit additional context in retrieval-augmented generation (RAG): When providing additional context or documents, include only the most relevant information to prevent the model from overcomplicating its response.

This is a big change from how RAG is usually implemented, where the advice is often to cram as many potentially relevant documents as possible into the prompt.
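
A simple way to follow that advice is to rank retrieved chunks against the query and keep only the top few, instead of stuffing everything in. A generic sketch using cosine similarity over precomputed embeddings (the embedding step itself is assumed to exist elsewhere in your pipeline):

# Sketch: keep only the k most relevant chunks for a RAG prompt.
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    # Cosine similarity between the query embedding and each chunk embedding.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    scores = c @ q
    best = np.argsort(scores)[::-1][:k]            # indices of the k highest scores
    return [chunks[i] for i in best]

# context = "\n\n".join(top_k_chunks(embed(question), chunk_embeddings, chunks, k=3))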

My take on this: The best summary of the o1 release you’ll read anywhere.


Mistral releases Pixtral 12B, its first multimodal model

Kyle Wiggers writing for TechCrunch:

French AI startup Mistral has released its first model that can process images as well as text.

Called Pixtral 12B, the 12-billion-parameter model is about 24GB in size. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.

Built on one of Mistral’s text models, Nemo 12B, the new model can answer questions about an arbitrary number of images of an arbitrary size given either URLs or images encoded using base64, the binary-to-text encoding scheme. Similar to other multimodal models such as Anthropic’s Claude family and OpenAI’s GPT-4o, Pixtral 12B should — at least in theory — be able to perform tasks like captioning images and counting the number of objects in a photo.
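
For readers curious what “images encoded using base64” looks like in practice, here is a generic sketch of packaging a local image as a data URL inside a chat-style message (the field names are illustrative; check Mistral’s API documentation for the exact schema):

# Sketch: turn a local image into a base64 data URL for a multimodal chat request.
import base64

with open("photo.jpg", "rb") as f:                 # placeholder file name
    b64 = base64.b64encode(f.read()).decode("utf-8")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "How many objects are in this photo?"},
        # Data-URL form commonly accepted by multimodal chat APIs; the exact
        # field layout for Pixtral is whatever Mistral's documentation specifies.
        {"type": "image_url", "image_url": f"data:image/jpeg;base64,{b64}"},
    ],
}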

Available via a torrent link on GitHub and AI and machine learning development platform Hugging Face, Pixtral 12B can be downloaded, fine-tuned and used under an Apache 2.0 license without restrictions. (A Mistral spokesperson confirmed the license being applied to Pixtral 12B via email.)

This writer wasn’t able to take Pixtral 12B for a spin, unfortunately — there weren’t any working web demos at the time of publication. In a post on X, Sophia Yang, head of Mistral developer relations, said Pixtral 12B will be available for testing on Mistral’s chatbot and API-serving platforms, Le Chat and La Plateforme, soon. It’s unclear which image data Mistral might have used to develop Pixtral 12B.

My take on this: Good to see multi-modal open weight models!


Google Illuminate: Transform your content into engaging AI-generated audio discussions


My take on this: Google Illuminate is just insanely good. I’d rate the generated conversations as being of almost podcast quality.


How few-shot learning with Google’s Prompt Poet can supercharge your LLMs

Michael Trestman writing for VentureBeat:

Prompt engineering, the discipline of crafting just the right input to a large language model (LLM) to get the desired response, is a critical new skill for the age of AI. It’s helpful for even casual users of conversational AI, but essential for builders of the next generation of AI-powered applications.

Enter Prompt Poet, the brainchild of Character.ai, a conversational LLM startup recently acquired by Google. Prompt Poet simplifies advanced prompt engineering by offering a user-friendly, low-code template system that manages context effectively and seamlessly integrates external data. This allows you to ground LLM-generated responses to a real-world data context, opening up a new horizon of AI interactions.

Prompt Poet shines for its seamless integration of “few-shot learning,” a powerful technique for rapid customization of LLMs without requiring complex and expensive model fine-tuning. This article explores how few-shot learning with Prompt Poet can be leveraged to deliver bespoke AI-driven interactions with ease and efficiency.
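
To make “few-shot learning” concrete: the technique is simply templating a handful of worked examples ahead of the real query so the model imitates the pattern. A minimal generic sketch using a plain Jinja2 template (Prompt Poet builds on Jinja2 and YAML, but this is an illustration, not its exact API):

# Sketch: assembling a few-shot prompt from a Jinja2 template.
# Prerequisite: `pip install jinja2`.
from jinja2 import Template

template = Template(
    "Classify the sentiment of each review as positive or negative.\n"
    "{% for ex in examples %}"
    "Review: {{ ex.text }}\nSentiment: {{ ex.label }}\n"
    "{% endfor %}"
    "Review: {{ query }}\nSentiment:"
)

prompt = template.render(
    examples=[
        {"text": "Loved it, works perfectly.", "label": "positive"},
        {"text": "Broke after two days.", "label": "negative"},
    ],
    query="Shipping was slow but the product itself is great.",
)
print(prompt)  # send this string to the LLM of your choice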

My take on this: Prompt engineering skills still aren’t all that widespread; tools like Prompt Poet will help.


Tell Replit's AI Agent Your App Idea, and It'll Code and Deploy It for You

Chris McKay writing for Maginative:

Replit has launched an AI agent capable of building entire applications from scratch. This isn't just another copilot coding assistant – it's much closer to an intern software developer that can understand your vision and help bring it to life.

But what exactly is an AI agent, and why is this such a big deal?

An AI agent is a more autonomous and proactive system compared to current AI assistants like ChatGPT or Claude. While today's AI assistants respond to specific queries or tasks, AI agents operate with a higher degree of independence, making decisions and executing complex tasks without constant user input. They can learn and adapt over time, improving their actions based on feedback and new information.

Replit's AI agent takes this concept and applies it to the world of software development. It can reason through a task and create its own steps to complete it—such as writing code, setting up environments, and managing deployments.
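
Stripped of product specifics, the loop being described looks roughly like this: ask the model for the next step, execute it, feed the result back, repeat. A toy sketch (llm and execute are placeholder callables, not Replit's internals):

# Toy sketch of an agent loop: plan a step, execute it, feed the result back.
# `llm` and `execute` are placeholder callables, not Replit's actual implementation.
def run_agent(goal, llm, execute, max_steps=10):
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Ask the model for the next action given everything done so far.
        action = llm("\n".join(history) + "\nNext action (or DONE):")
        if action.strip() == "DONE":
            break
        result = execute(action)   # e.g. write code, set up an environment, deploy
        history.append(f"Action: {action}\nResult: {result}")
    return history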

"We've crossed a threshold," says Replit CEO Amjad Masad. "This isn't about AI replacing developers. It's about supercharging human creativity and making software creation accessible to everyone."

My take on this: I would wait for more in-depth reviews from industry experts.


Sebastian Raschka’s Build a Large Language Model (From Scratch) is now available

From the Manning website:


Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up! In Build a Large Language Model (from Scratch) bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You’ll go from the initial design and creation, to pretraining on a general corpus, and on to fine-tuning for specific tasks.

Build a Large Language Model (from Scratch) teaches you how to:

  • Plan and code all the parts of an LLM
  • Prepare a dataset suitable for LLM training
  • Fine-tune LLMs for text classification and with your own data
  • Use human feedback to ensure your LLM follows instructions
  • Load pretrained weights into an LLM

Build a Large Language Model (from Scratch) takes you inside the AI black box to tinker with the internal systems that power generative AI. As you work through each key stage of LLM creation, you’ll develop an in-depth understanding of how LLMs work, their limitations, and their customization methods. Your LLM can be developed on an ordinary laptop, and used as your own personal assistant.
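
As a taste of the kind of component the book walks you through, here is a compact causal self-attention head in PyTorch, the basic building block of a GPT-style LLM (a generic sketch, not code taken from the book):

# Sketch: a single causal self-attention head, the core building block of an LLM.
# Prerequisite: `pip install torch`.
import torch
import torch.nn as nn

class SelfAttentionHead(nn.Module):
    def __init__(self, d_model, d_head):
        super().__init__()
        self.q = nn.Linear(d_model, d_head, bias=False)
        self.k = nn.Linear(d_model, d_head, bias=False)
        self.v = nn.Linear(d_model, d_head, bias=False)

    def forward(self, x):                              # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        # Causal mask: each position may attend only to itself and earlier tokens.
        seq_len = x.shape[1]
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v       # (batch, seq_len, d_head)

# Example: attend over a random batch of 10-token sequences.
# out = SelfAttentionHead(d_model=64, d_head=16)(torch.randn(2, 10, 64))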

My take on this: Highly recommended. It’s a great resource to gain key fundamental insights.


If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract, a no-code LLM platform that automates unstructured data workflows.

Follow Unstract on LinkedIn and Twitter.


For the extra curious
