GenAI Weekly — Edition 32
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
A Guide to Optical Character Recognition (OCR) With Tesseract
OCR technology transforms various document types—scanned paper documents, images, and PDFs—into machine-readable and editable text. By analyzing character shapes in images, OCR extracts text for digitization, making it a vital tool for industries like legal and healthcare, where converting printed records into searchable digital formats speeds up data retrieval.
OCR plays a crucial role in improving workflow automation, reducing manual data entry, and enhancing accessibility. For example, it converts printed materials into formats readable by screen readers, supporting visually impaired individuals.
However, choosing the right OCR tool is key, as not all OCR engines excel in every use case. Some are better for printed text, while others handle handwriting or complex layouts like forms and tables. Tesseract OCR stands out as a powerful open-source tool, supporting over 100 languages and offering customization for advanced needs. It’s adaptable and widely used for its flexibility in handling diverse documents.
Check out our guide on Optical Character Recognition with Tesseract!
Meta introduces Llama 3.2 lightweight and multimodal models
Today, we’re releasing Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B), and lightweight, text-only models (1B and 3B) that fit onto edge and mobile devices, including pre-trained and instruction-tuned versions.
My take on this: Meta continues to impress.
Google Cloud rolls out new Gemini models, AI agents, customer engagement suite
Google Cloud launched a series of updates including new Gemini 1.5 Flash and 1.5 Pro models with a 2-million-token context window, grounding with Google Search, premade Gems in Gemini in Google Workspace, and a series of AI agents designed for customer engagement and conversation.
The updates, outlined at a Gemini at Work event, come as generative AI players increasingly focus on agentic AI. Google is looking to drive Gemini throughout its platform. The pitch from Google Cloud is that its unified stack can enable enterprises to tap into multiple foundational models including Gemini, create agents with an integrated developer platform and deploy AI agents with grounding in enterprise data on optimized infrastructure.
Google Cloud's agent push was noted by Google Cloud CEO Thomas Kurian at an investment conference recently. Kurian cited a series of use cases in telecom and other industries. Kurian said Google Cloud is introducing new applications for customer experience and customer service. "Think of it as you can go on the web, on a mobile app, you can call a call center or be at a retail point of sale, and you can have a digital agent, help you assist you in searching for information, finding answers to questions using either chat or voice calls," said Kurian.
Google Cloud is showcasing more than 50 customer stories and case studies for Gemini deployments, including a big push into customer engagement.
My take on this: The two most popular productivity suites, from Microsoft and Google, are becoming AI-enabled.
The Intelligence Age
Here is one narrow way to look at human history: after thousands of years of compounding scientific discovery and technological progress, we have figured out how to melt sand, add some impurities, arrange it with astonishing precision at extraordinarily tiny scale into computer chips, run energy through it, and end up with systems capable of creating increasingly capable artificial intelligence.
This may turn out to be the most consequential fact about all of history so far. It is possible that we will have superintelligence in a few thousand days (!); it may take longer, but I’m confident we’ll get there.
How did we get to the doorstep of the next leap in prosperity?
In three words: deep learning worked.
In 15 words: deep learning worked, got predictably better with scale, and we dedicated increasing resources to it.
That’s really it; humanity discovered an algorithm that could really, truly learn any distribution of data (or really, the underlying “rules” that produce any distribution of data). To a shocking degree of precision, the more compute and data available, the better it gets at helping people solve hard problems. I find that no matter how much time I spend thinking about this, I can never really internalize how consequential it is.
[…]
Many of the jobs we do today would have looked like trifling wastes of time to people a few hundred years ago, but nobody is looking back at the past, wishing they were a lamplighter. If a lamplighter could see the world today, he would think the prosperity all around him was unimaginable. And if we could fast-forward a hundred years from today, the prosperity all around us would feel just as unimaginable.
My take on this: Is this article so important that it has its own website? Only time will tell.
Mira Murati, CTO, leaves OpenAI
I shared the following note with the OpenAI team today.
OpenAI to move away from its non-profit structure
ChatGPT-maker OpenAI is working on a plan to restructure its core business into a for-profit benefit corporation that will no longer be controlled by its non-profit board, people familiar with the matter told Reuters, in a move that will make the company more attractive to investors.
The OpenAI non-profit will continue to exist and own a minority stake in the for-profit company, the sources said. The move could also have implications for how the company manages AI risks in a new governance structure.
Chief executive Sam Altman will also receive equity for the first time in the for-profit company, which could be worth $150 billion after the restructuring as it also tries to remove the cap on returns for investors, sources added. The sources requested anonymity to discuss private matters.
"We remain focused on building AI that benefits everyone, and we’re working with our board to ensure that we’re best positioned to succeed in our mission. The non-profit is core to our mission and will continue to exist," an OpenAI spokesperson said.
The details of the proposed corporate structure, first reported by Reuters, highlight significant governance changes happening behind the scenes at one of the most important AI companies. The plan is still being hashed out with lawyers and shareholders and the timeline for completing the restructuring remains uncertain, the sources said.
The restructuring also comes amid a series of leadership changes at the startup. OpenAI's longtime chief technology officer Mira Murati abruptly announced her departure from the company on Wednesday. Greg Brockman, OpenAI's president, has also been on leave.
My take on this: This was to be expected for the investments to be sustained.
If your AI seems smarter, it's thanks to smarter human trainers
"A year ago, we could get away with hiring undergraduates, to just generally teach AI on how to improve," said Cohere co-founder Ivan Zhang, talking about its internal human trainers.
"Now we have licensed physicians teaching the models how to behave in medical environments, or financial analysts or accountants."
For more training, Cohere, last valued at over $5 billion, works with a startup called Invisible Tech. Cohere is one of OpenAI's main rivals and specializes in AI for businesses.
The startup Invisible Tech employs thousands of trainers, working remotely, and has become one of the main partners of AI companies ranging from AI21 to Microsoft to train their AI models to reduce errors, known in the AI world as hallucinations.
"We have 5,000 people in over 100 countries around the world that are PhDs, Master's degree holders and knowledge work specialists," said Invisible founder Francis Pedraza.
Invisible pays as much as $40 per hour, depending on the location of the worker and the complexity of the work. Some companies such as Outlier pay up to $50 per hour, while another company, Labelbox, said it pays up to $200 per hour for "high expertise" subjects like quantum physics but starts at $15 for basic topics.
My take on this: A behind-the-scenes look at how AI is made smart, and at the companies that enable it.
Cloudflare’s new marketplace will let websites charge AI bots for scraping
Cloudflare is trying to address a problem looming over the AI industry: How will smaller publishers survive in the AI era if people go to ChatGPT instead of their website? Today, AI model providers scrape thousands of small websites for information that powers their LLMs. While some larger publishers have struck deals with OpenAI to license content, most websites get nothing, but their content is still fed into popular AI models on a daily basis. That could break the business models for many websites, reducing traffic they desperately need.
Earlier this summer, AI-powered search startup Perplexity was accused of scraping websites that deliberately indicated they did not want to be crawled using the Robots Exclusion Protocol. Shortly after, Cloudflare released a button to ensure customers could block all AI bots with one click.
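Until such a marketplace exists, the standard lever publishers have is the Robots Exclusion Protocol itself. Below is a minimal robots.txt sketch asking major AI crawlers to stay away; the user-agent tokens shown are the ones these vendors publicly document, and compliance with the file is voluntary on the crawler's part.

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

Cloudflare's one-click block enforces this policy at the network edge rather than relying on crawlers to honor the file.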
My take on this: This is the first time in the history of the internet that its open model has become a threat to its existence.
AMD Unveils Its First Small Language Model AMD-135M
In the ever-evolving landscape of artificial intelligence, large language models (LLMs) like GPT-4 and Llama have garnered significant attention for their impressive capabilities in natural language processing and generation. However, small language models (SLMs) are emerging as an essential counterpart in the AI model community, offering a unique advantage for specific use cases. AMD is excited to release its very first small language model, AMD-135M with Speculative Decoding. This work demonstrates the commitment to an open approach to AI which will lead to more inclusive, ethical, and innovative technological progress, helping ensure that its benefits are more widely shared, and its challenges more collaboratively addressed.
AMD-135M: First AMD Small Language Model
AMD-135M is the first AMD small language model in the Llama family, trained from scratch on AMD Instinct MI250 accelerators using 670B tokens, and it comes in two variants: AMD-Llama-135M and AMD-Llama-135M-code.
The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.
My take on this: Welcome to the party, AMD. Seriously though, since they’re in competition with Nvidia, this is a very relevant investment.
If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract—a no-code LLM platform that automates unstructured data workflows.
For the extra curious