GenAI Weekly — Edition 32

Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs

Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.

Click subscribe to be notified of future editions


A Guide to Optical Character Recognition (OCR) With Tesseract

From the Unstract blog:

OCR technology transforms various document types—scanned paper documents, images, and PDFs—into machine-readable and editable text. By analyzing character shapes in images, OCR extracts text for digitization, making it a vital tool for industries like legal and healthcare, where converting printed records into searchable digital formats speeds up data retrieval.

OCR plays a crucial role in improving workflow automation, reducing manual data entry, and enhancing accessibility. For example, it converts printed materials into formats readable by screen readers, supporting visually impaired individuals.

However, choosing the right OCR tool is key, as not all OCR engines excel in every use case. Some are better for printed text, while others handle handwriting or complex layouts like forms and tables. Tesseract OCR stands out as a powerful open-source tool, supporting over 100 languages and offering customization for advanced needs. It’s adaptable and widely used for its flexibility in handling diverse documents.
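
To make the extraction step concrete, here is a minimal sketch using the pytesseract wrapper around Tesseract. The file name is a placeholder, and it assumes the Tesseract binary is installed and on PATH along with the pytesseract and Pillow packages:

```python
# Minimal OCR sketch with Tesseract via pytesseract.
from PIL import Image
import pytesseract

# "scanned_invoice.png" is a hypothetical input; any scanned page or photo works.
image = Image.open("scanned_invoice.png")

# lang="eng" selects the English traineddata; Tesseract ships models for 100+ languages.
text = pytesseract.image_to_string(image, lang="eng")
print(text)
```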

Check out our guide on Optical Character Recognition with Tesseract!


Meta introduces Llama 3.2 lightweight and multimodal models

From Meta’s AI blog:

Today, we’re releasing Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B), and lightweight, text-only models (1B and 3B) that fit onto edge and mobile devices, including pre-trained and instruction-tuned versions.

  • The Llama 3.2 1B and 3B models support context length of 128K tokens and are state-of-the-art in their class for on-device use cases like summarization, instruction following, and rewriting tasks running locally at the edge. These models are enabled on day one for Qualcomm and MediaTek hardware and optimized for Arm processors.
  • Supported by a broad ecosystem, the Llama 3.2 11B and 90B vision models are drop-in replacements for their corresponding text model equivalents, while exceeding on image understanding tasks compared to closed models, such as Claude 3 Haiku. Unlike other open multimodal models, both pre-trained and aligned models are available to be fine-tuned for custom applications using torchtune and deployed locally using torchchat. They’re also available to try using our smart assistant, Meta AI.
  • We’re sharing the first official Llama Stack distributions, which will greatly simplify the way developers work with Llama models in different environments, including single-node, on-prem, cloud, and on-device, enabling turnkey deployment of retrieval-augmented generation (RAG) and tooling-enabled applications with integrated safety.
  • We’ve been working closely with partners like AWS, Databricks, Dell Technologies, Fireworks, Infosys, and Together AI to build Llama Stack distributions for their downstream enterprise clients. On-device distribution is via PyTorch ExecuTorch, and single-node distribution is via Ollama.
  • We continue to share our work because we believe openness drives innovation and is good for developers, Meta, and the world. Llama is already leading the way on openness, modifiability, and cost efficiency—enabling more people to have creative, useful, and life-changing breakthroughs using generative AI.
  • We’re making Llama 3.2 models available for download on llama.com and Hugging Face, as well as available for immediate development on our broad ecosystem of partner platforms, including AMD, AWS, Databricks, Dell, Google Cloud, Groq, IBM, Intel, Microsoft Azure, NVIDIA, Oracle Cloud, Snowflake, and more.
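
For readers who want to try the lightweight models locally, here is a minimal sketch using Hugging Face transformers. The model ID is my assumption of the published Hub name, and it presumes you have accepted the model license on Hugging Face and have transformers and torch installed:

```python
# Minimal sketch: chat with the lightweight Llama 3.2 1B Instruct model via transformers.
# Gated models require a Hugging Face login; the model ID below is assumed from the release.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # assumed Hub ID
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Summarize in one sentence: Llama 3.2 adds 1B/3B text models and 11B/90B vision models."},
]
result = pipe(messages, max_new_tokens=64)

# With chat-style input, the pipeline returns the conversation with the reply appended last.
print(result[0]["generated_text"][-1]["content"])
```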

My take on this: Meta continues to impress.


Google Cloud rolls out new Gemini models, AI agents, customer engagement suite

Larry Dignan writing for Constellation Research:

Google Cloud launched a series of updates including new Gemini 1.5 Flash and 1.5 Pro models with a 2-million-token context window, grounding with Google Search, premade Gems in Gemini in Google Workspace, and a series of AI agents designed for customer engagement and conversation.

The updates, outlined at a Gemini at Work event, come as generative AI players increasingly focus on agentic AI. Google is looking to drive Gemini throughout its platform. The pitch from Google Cloud is that its unified stack can enable enterprises to tap into multiple foundational models including Gemini, create agents with an integrated developer platform and deploy AI agents with grounding in enterprise data on optimized infrastructure.

Google Cloud's agent push was noted by Google Cloud CEO Thomas Kurian at an investment conference recently. Kurian cited a series of use cases in telecom and other industries. Kurian said Google Cloud is introducing new applications for customer experience and customer service. "Think of it as you can go on the web, on a mobile app, you can call a call center or be at a retail point of sale, and you can have a digital agent, help you assist you in searching for information, finding answers to questions using either chat or voice calls," said Kurian.
Google Cloud is showcasing more than 50 customer stories and case studies for Gemini deployments, including a big push into customer engagement.
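
For a sense of what calling the new models looks like, here is a minimal sketch against the google-generativeai Python SDK. The model name string and the environment-variable key handling are assumptions based on Google's public docs:

```python
# Minimal sketch: calling Gemini 1.5 Pro (the long-context model mentioned above)
# through the google-generativeai SDK. Assumes GOOGLE_API_KEY is set in the environment.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name string

response = model.generate_content(
    "In three bullet points, explain what a 2-million-token context window enables."
)
print(response.text)
```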


My take on this: The two most popular productivity suites, from Microsoft and Google, are becoming AI-enabled.


The Intelligence Age

From Sam Altman:

Here is one narrow way to look at human history: after thousands of years of compounding scientific discovery and technological progress, we have figured out how to melt sand, add some impurities, arrange it with astonishing precision at extraordinarily tiny scale into computer chips, run energy through it, and end up with systems capable of creating increasingly capable artificial intelligence.
This may turn out to be the most consequential fact about all of history so far. It is possible that we will have superintelligence in a few thousand days (!); it may take longer, but I’m confident we’ll get there.

How did we get to the doorstep of the next leap in prosperity?

In three words: deep learning worked.

In 15 words: deep learning worked, got predictably better with scale, and we dedicated increasing resources to it.

That’s really it; humanity discovered an algorithm that could really, truly learn any distribution of data (or really, the underlying “rules” that produce any distribution of data). To a shocking degree of precision, the more compute and data available, the better it gets at helping people solve hard problems. I find that no matter how much time I spend thinking about this, I can never really internalize how consequential it is.

[…]

Many of the jobs we do today would have looked like trifling wastes of time to people a few hundred years ago, but nobody is looking back at the past, wishing they were a lamplighter. If a lamplighter could see the world today, he would think the prosperity all around him was unimaginable. And if we could fast-forward a hundred years from today, the prosperity all around us would feel just as unimaginable.

My take on this: Is this article so important that it has its own website? Only time will tell.


Mira Murati, CTO, leaves OpenAI

From her X account:

I shared the following note with the OpenAI team today.


See also: OpenAI’s chief research officer has left following CTO Mira Murati’s exit


OpenAI to move away from its non-profit structure

Krystal Hu and Kenrick Cai writing for Reuters:

ChatGPT-maker OpenAI is working on a plan to restructure its core business into a for-profit benefit corporation that will no longer be controlled by its non-profit board, people familiar with the matter told Reuters, in a move that will make the company more attractive to investors.

The OpenAI non-profit will continue to exist and own a minority stake in the for-profit company, the sources said. The move could also have implications for how the company manages AI risks in a new governance structure.

Chief executive Sam Altman will also receive equity for the first time in the for-profit company, which could be worth $150 billion after the restructuring as it also tries to remove the cap on returns for investors, sources added. The sources requested anonymity to discuss private matters.

"We remain focused on building AI that benefits everyone, and we’re working with our board to ensure that we’re best positioned to succeed in our mission. The non-profit is core to our mission and will continue to exist," an OpenAI spokesperson said.

The details of the proposed corporate structure, first reported by Reuters, highlight significant governance changes happening behind the scenes at one of the most important AI companies. The plan is still being hashed out with lawyers and shareholders and the timeline for completing the restructuring remains uncertain, the sources said.

The restructuring also comes amid a series of leadership changes at the startup. OpenAI's longtime chief technology officer Mira Murati abruptly announced her departure from the company on Wednesday. Greg Brockman, OpenAI's president, has also been on leave.

My take on this: This was to be expected if investment at this scale is to be sustained.


If your AI seems smarter, it's thanks to smarter human trainers

Supantha Mukherjee and Anna Tong writing for Reuters:

"A year ago, we could get away with hiring undergraduates, to just generally teach AI on how to improve," said Cohere co-founder Ivan Zhang, talking about its internal human trainers.
"Now we have licensed physicians teaching the models how to behave in medical environments, or financial analysts or accountants."
For more training, Cohere, which was last valued at over $5 billion, works with a startup called Invisible Tech. Cohere is one of the main rivals of OpenAI and specializes in AI for businesses.
The startup Invisible Tech employs thousands of trainers, working remotely, and has become one of the main partners of AI companies ranging from AI21 to Microsoft to train their AI models to reduce errors, known in the AI world as hallucinations.
"We have 5,000 people in over 100 countries around the world that are PhDs, Master's degree holders and knowledge work specialists," said Invisible founder Francis Pedraza.
Invisible pays as much as $40 per hour, depending on the location of the worker and the complexity of work. Some companies such as Outlier pay up to $50 per hour, while another company called Labelbox said it pays up to $200 per hour for "high expertise" subjects like quantum physics, but starts with $15 for basic topics.

My take on this: A behind-the-scenes look at what it takes to make AI smarter, and at the companies that enable it.


Cloudflare’s new marketplace will let websites charge AI bots for scraping

Maxwell Zeff writing for TechCrunch:

Cloudflare is trying to address a problem looming over the AI industry: How will smaller publishers survive in the AI era if people go to ChatGPT instead of their website? Today, AI model providers scrape thousands of small websites for information that powers their LLMs. While some larger publishers have struck deals with OpenAI to license content, most websites get nothing, but their content is still fed into popular AI models on a daily basis. That could break the business models for many websites, reducing traffic they desperately need.
Earlier this summer, AI-powered search startup Perplexity was accused of scraping websites that deliberately indicated they did not want to be crawled using the Robots Exclusion Protocol. Shortly after, Cloudflare released a button to ensure customers could block all AI bots with one click.
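
As a refresher on what the Robots Exclusion Protocol actually is, here is a minimal sketch of how a well-behaved crawler checks a site's robots.txt before fetching, using only the Python standard library. The domain, path, and the second bot name are illustrative placeholders; GPTBot is OpenAI's published crawler user agent:

```python
# Minimal sketch: a compliant crawler consulting robots.txt (Robots Exclusion Protocol)
# before fetching a page. Domain and paths are placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's robots.txt

# A publisher opting out of AI crawlers would add Disallow rules for agents such as
# OpenAI's "GPTBot"; a compliant bot checks can_fetch() and skips disallowed URLs.
for agent in ("GPTBot", "SomeSearchBot"):
    ok = robots.can_fetch(agent, "https://example.com/articles/some-post")
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```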

My take on this: This is the first time in the history of the internet that its open model has become a threat to its existence.


AMD Unveils Its First Small Language Model AMD-135M

From the AMD blog:

In the ever-evolving landscape of artificial intelligence, large language models (LLMs) like GPT-4 and Llama have garnered significant attention for their impressive capabilities in natural language processing and generation. However, small language models (SLMs) are emerging as an essential counterpart in the AI model community, offering a unique advantage for specific use cases. AMD is excited to release its very first small language model, AMD-135M with Speculative Decoding. This work demonstrates the commitment to an open approach to AI which will lead to more inclusive, ethical, and innovative technological progress, helping ensure that its benefits are more widely shared, and its challenges more collaboratively addressed.

AMD-135M: First AMD Small Language Model

AMD-135M is the first small language model in the Llama family trained from scratch on AMD Instinct MI250 accelerators, utilizing 670B tokens, and it is divided into two models: AMD-Llama-135M and AMD-Llama-135M-code.

  • Pretraining: The AMD-Llama-135M model was trained from scratch with 670 billion tokens of general data over six days using four MI250 nodes.
  • Code Finetuning: The AMD-Llama-135M-code variant was fine-tuned with an additional 20 billion tokens of code data, taking four days on the same hardware.

The training code, dataset, and weights for this model are open-sourced so that developers can reproduce the model and help train other SLMs and LLMs.
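
To illustrate what speculative decoding buys you, here is a minimal sketch of assisted generation with Hugging Face transformers, where the small AMD model drafts tokens and a larger Llama model verifies them. The checkpoint IDs are assumptions based on the announcement, and the two models need a compatible tokenizer:

```python
# Minimal sketch: speculative (assisted) decoding with transformers.
# The small draft model proposes tokens cheaply; the large target model verifies them
# in a single forward pass, speeding up generation without changing the target's output.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

TARGET_ID = "meta-llama/Llama-2-7b-hf"  # assumed target checkpoint
DRAFT_ID = "amd/AMD-Llama-135M"         # assumed Hub ID for the AMD draft model

tokenizer = AutoTokenizer.from_pretrained(TARGET_ID)
target = AutoModelForCausalLM.from_pretrained(TARGET_ID, torch_dtype=torch.bfloat16, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(DRAFT_ID, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tokenizer("Small language models are useful because", return_tensors="pt").to(target.device)

output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```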

My take on this: Welcome to the party, AMD. Seriously though, since they’re in competition with Nvidia, this is a very relevant investment.


If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract—a no-code LLM platform that automates unstructured data workflows.

Follow Unstract on LinkedIn and Twitter.


For the extra curious

