GenAI Weekly — Edition 29
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
Best OCR Software in 2024 — A Tool Comparison & Evaluation Guide
This article provides a comprehensive overview of the top OCR tools in 2024.
We will compare:
1. Tesseract
2. Paddle OCR
3. Azure Document Intelligence
4. Amazon Textract
5. LLMWhisperer from Unstract
OCR technology is essential in today's digital world, transforming scanned papers, PDFs, and images into editable, searchable text. This boosts productivity, especially in industries like finance, healthcare, legal, and education, where document processing is vital. The effectiveness of OCR directly affects workflows, data accuracy, and operational efficiency. As businesses embrace digital transformation, choosing the right OCR tool is crucial. This article reviews the top OCR software available in 2024.
OpenAI co-founder Sutskever's new safety-focused AI startup SSI raises $1 billion
Safe Superintelligence (SSI), newly co-founded by OpenAI's former chief scientist Ilya Sutskever, has raised $1 billion in cash to help develop safe artificial intelligence systems that far surpass human capabilities, company executives told Reuters.
SSI, which currently has 10 employees, plans to use the funds to acquire computing power and hire top talent. It will focus on building a small highly trusted team of researchers and engineers split between Palo Alto, California and Tel Aviv, Israel.
The company declined to share its valuation but sources close to the matter said it was valued at $5 billion. The funding underlines how some investors are still willing to make outsized bets on exceptional talent focused on foundational AI research. That's despite a general waning in interest towards funding such companies which can be unprofitable for some time, and which has caused several startup founders to leave their posts for tech giants.
Investors included top venture capital firms Andreessen Horowitz, Sequoia Capital, DST Global and SV Angel. NFDG, an investment partnership run by Nat Friedman and SSI's Chief Executive Daniel Gross, also participated.
"It's important for us to be surrounded by investors who understand, respect and support our mission, which is to make a straight shot to safe superintelligence and in particular to spend a couple of years doing R&D on our product before bringing it to market," Gross said in an interview.
AI safety, which refers to preventing AI from causing harm, is a hot topic amid fears that rogue AI could act against the interests of humanity or even cause human extinction.
Man, valuations.
Meet the new, most powerful open source AI model in the world: HyperWrite’s Reflection 70B
There’s a new king in town: Matt Shumer, co-founder and CEO of AI writing startup HyperWrite, today unveiled Reflection 70B, a new large language model (LLM) based on Meta’s open source Llama 3.1-70B Instruct that leverages a new error self-correction technique and boasts superior performance on third-party benchmarks.
As Shumer announced in a post on the social network X, Reflection-70B now appears to be “the world’s top open-source AI model.”
[…]
He also posted a chart showing its benchmark performance.
Reflection 70B has been rigorously tested across several benchmarks, including MMLU and HumanEval, using LMSys’s LLM Decontaminator to ensure the results are free from contamination. These benchmarks show Reflection consistently outperforming models from Meta’s Llama series and competing head-to-head with top commercial models.
My take on this: Tricks to improve LLMs continue to be churned out at an impressive clip.
Porn Generators, Cheating Tools, and ‘Expert’ Medical Advice: Inside OpenAI’s Marketplace for Custom Chatbots
Last November, when OpenAI announced its plans for a marketplace where anyone could make and find bespoke versions of ChatGPT technology, the company said “The best GPTs will be invented by the community.” Nine months after the store officially launched, a Gizmodo analysis of the free marketplace shows that many developers are using the platform to provide GPTs—or generative pre-trained transformer models—that appear to violate OpenAI’s policies, including chatbot-style tools that explicitly create AI-generated porn, help students cheat without being detected, and offer authoritative medical and legal advice.
The offending GPTs are easy to find. On Sept. 2, the front page of OpenAI’s marketplace promoted at least three custom GPTs that appeared to violate the store’s policies: a “Therapist – Psychologist” chatbot, a “fitness, workout, and diet PhD coach,” and BypassGPT, a tool designed to help students evade AI writing detection systems, which has been used more than 50,000 times.
Searching the store for “NSFW” returned results like NSFW AI Art Generator, a GPT customized by Offrobe AI that’s been used more than 10,000 times, according to store data. The chat interface for the GPT links to Offrobe AI’s website, which prominently states its purpose: “Generate AI porn to satisfy your dark cravings.”
[…]
Other outlets have previously alerted OpenAI to content moderation issues on its store. And the titles of some of the GPTs on offer suggest developers also know their creations push up against OpenAI’s rules. Several of the tools Gizmodo found included disclaimers but then explicitly advertised their ability to provide “expert” advice, like a GPT titled Texas Medical Insurance Claims (not legal advice), which says that it’s “your go-to expert for navigating the complexities of Texas medical insurance, offering clear, practical advice with a personal touch.”
But many of the legal and medical GPTs we found don’t include such disclaimers, and quite a few misleadingly advertised themselves as lawyers or doctors. For example, one GPT called AI Immigration Lawyer describes itself as “a highly knowledgeable AI immigration lawyer with up-to-date legal insights.”
Ouch.
Ask Claude: Amazon turns to Anthropic's AI for Alexa revamp
Amazon's revamped Alexa due for release in October ahead of the U.S. holiday season will be powered primarily by Anthropic's Claude artificial intelligence models, rather than its own AI, five people familiar with the matter told Reuters.
Amazon plans to charge $5 to $10 a month for its new "Remarkable" version of Alexa as it will use powerful generative AI to answer complex queries, while still offering the "Classic" voice assistant for free, Reuters reported in June.
But initial versions of the new Alexa using in-house software simply struggled for words, sometimes taking six or seven seconds to acknowledge a prompt and reply, one of the people said.
That's why Amazon turned to Claude, an AI chatbot developed by startup Anthropic, as it performed better than the online retail giant's own AI models, the people said.
[…]
Alexa, accessed mainly through Amazon televisions and Echo devices, can set timers, play music, act as a central hub for smart home controls and answer one-off questions.
But Amazon's attempts to convince users to shop through Alexa to generate more revenue have been mostly unsuccessful and the division remains unprofitable.
As a result, senior management has stressed that 2024 is a critical year for Alexa to finally demonstrate it can generate meaningful sales - and the revamped paid version is seen as a way both to do that and keep pace with rivals.
"Amazon uses many different technologies to power Alexa," a company spokeswoman said in a statement in response to detailed Reuters questions for this story.
"When it comes to machine learning models, we start with those built by Amazon, but we have used, and will continue to use, a variety of different models - including (Amazon AI model) Titan and future Amazon models, as well as those from partners - to build the best experience for customers," the spokeswoman said.
[…]
Bank of America analyst Justin Post estimated in June that there are roughly 100 million active Alexa users and that about 10% of those might opt for the paid version of Alexa. Assuming the low end of the monthly price range, that would bring in at least $600 million in annual sales.
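For anyone who wants to check the math, here is a quick sketch of that estimate in Python. The variable names are ours, and the inputs are only the figures quoted above (roughly 100 million active users, about 10% uptake, and the $5 low end of the price range). Nothing here comes from the analyst's actual model.

```python
# Back-of-the-envelope check of the estimate quoted above.
# All inputs are the rounded figures from the article, not precise data.
active_users = 100_000_000   # roughly 100 million active Alexa users
paid_share_pct = 10          # about 10% might opt for the paid version
monthly_price_usd = 5        # low end of the $5 to $10 monthly range

paying_users = active_users * paid_share_pct // 100
annual_revenue_usd = paying_users * monthly_price_usd * 12

print(f"{paying_users:,} paying users")
print(f"${annual_revenue_usd:,} per year")
```

At the $10 high end of the range, the same arithmetic gives $1.2 billion a year.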
Amazon says it has sold 500 million Alexa-enabled devices but does not disclose how many active users there are.
Announcing a deal to invest $4 billion in Anthropic in September last year, Amazon said its customers would gain early access to its technology. Reuters could not determine if Amazon would have to pay Anthropic additionally for the use of Claude in Alexa. Amazon declined to discuss the details of its agreements with the startup. Alphabet's Google has also invested at least $2 billion in Anthropic.
My take on this: Siri, Alexa, and Google Assistant are all going to be super smart. They all benefit from LLMs.
Meet Yi-Coder: A Small but Mighty LLM for Code
Yi-Coder is a series of open-source code large language models (LLMs) that deliver state-of-the-art coding performance with fewer than 10 billion parameters.
Available in two sizes—1.5B and 9B parameters—Yi-Coder offers both base and chat versions, designed for efficient inference and flexible training. Notably, Yi-Coder-9B builds upon Yi-9B with an additional 2.4T high-quality tokens, meticulously sourced from a repository-level code corpus on GitHub and code-related data filtered from CommonCrawl.
[…]
My take on this: Like we’ve been saying, 01-ai is a company to watch.
Few have tried OpenAI’s Google killer. Here’s what they think.
A long-awaited search engine being developed by the maker of ChatGPT is far from ready to replace Google, according to interviews with people who got access to the tool, videos shared online and analysis by a search marketing firm.
OpenAI’s SearchGPT uses artificial intelligence to provide slick answers with clearly marked sources, by summarizing information drawn from different webpages. But the search tool struggled with some shopping and local queries, and on some occasions, it presented untrue or “hallucinated” information.
The limitations of the prototype search tool suggest that OpenAI, whose ChatGPT has inspired predictions that some Silicon Valley giants could become sidelined, still has major work to do before it can begin to directly threaten Google’s lucrative search business.
[…]
Ananay Arora, a software engineer and AI and cybersecurity researcher, is among those with access to the SearchGPT prototype but says it doesn’t seem to pose much of a threat to Google so far. He was pleased with the results on a query about local restaurants. But for other searches, he has been underwhelmed by the images SearchGPT provided alongside its results and found the way sources are labeled occasionally confusing.
“From a company like OpenAI, you’d expect a breakthrough, given their history of state-of-the-art models,” Arora said in a phone interview. In comparison to ChatGPT, he said, SearchGPT “isn’t exactly too impressive.”
Daniel Lemire, a tech professional who runs the educational organization AI Mistakes, was more positive in an interview about his own experiences as an early user of SearchGPT. He said he thought OpenAI’s search tool is better than the AI-generated answers, or “overviews,” that Google has added to its results pages. “I would choose SearchGPT over Google any day,” Lemire said.
[…]
In a YouTube video posted this month, AI enthusiast Matt Berman shared some of his own experiences with SearchGPT, including comparisons of results from Perplexity and Google on queries that included Olympics results and the assassination attempt on former president Donald Trump. He judged the AI search tools to beat Google results on queries about event planning or how to fix a coding issue, and said SearchGPT “nailed it” when asked to list the top three movie theaters in his neighborhood and explain why.
But Berman also ran into an example of the problem of AI tools providing incorrect, or “hallucinated,” information that has plagued ChatGPT and its rivals. “A big downside to AI search is it will tell you things with complete confidence that are just false,” Berman said in his video. In July, a video demo of SearchGPT in OpenAI’s blog post announcing the tool also showed an error, providing the wrong dates for a music festival.
Meh.
Nvidia suffers record $279 billion loss in market value as Wall St drops
Shares of AI heavyweight Nvidia (NVDA.O) tumbled 9.5% on Tuesday in the deepest ever single-day decline in market value for a U.S. company, as investors softened their optimism about artificial intelligence in a broad market selloff following tepid economic data.
Nvidia lost $279 billion in market capitalization, a major indication that investors are becoming more cautious about emerging AI technology that has fueled much of this year's stock market gains.
The PHLX chip index (.SOX) plummeted 7.75%, its biggest one-day drop since 2020.
The latest jitters about AI come after Nvidia last Wednesday gave a quarterly forecast that failed to meet the lofty expectations of investors who have driven a dizzying rally in its stock.
"Such a massive amount of money has gone to tech and semiconductors in the last 12 months that the trade is completely skewed," said Todd Sohn, an ETF strategist at Strategas Securities.
[…]
"Some recent research has questioned if the revenues from AI alone will eventually justify this wave of capital spending on it. When assessing AI capex by individual companies, investors must consider if they are making the best use of their balance sheets and capital," BlackRock strategists wrote in a client note on Tuesday.
More meh.
Forget Sora — MiniMax is a new realistic AI video generator and it’s seriously impressive
MiniMax is the latest artificial intelligence video generator to come out of China. It is already making waves for its ability to generate hyper-realistic footage of humans, including accurate hand movements. This is something other tools have struggled with.
This is just the latest foray into generative AI for the Alibaba and Tencent-backed unicorn startup. Its AI companion app Talkie has been downloaded over 15 million times and like Character.ai lets you converse with a virtual creation.
The official demo of the app shared on X appears to show the trailer for a magical adventure where a child touches a coin and is transported through history. It features special effects, a consistent character, and realism — all made from just text prompts, AI, and clever editing.
[…]
MiniMax video-01 is the latest in a line of models from the startup including generative speech, language and music generation. It dropped the new video model without fanfare early in September and it quickly blew up on social media in China and in the West.
Founder Yan Junjie told reporters: “We have indeed made significant progress in video model generation, and based on internal evaluations and scores, our performance is better than that of Runway in generating videos.”
Insane demos.
Using GPT-4o for web scraping
I was surprised by the extraction quality of GPT-4o (but then sadly surprised when I looked at how much I’d have to pay OpenAI!). Nonetheless, this was a fun experiment and I definitely see potential for AI-assisted web scraping tools.
I did a quick demo using Streamlit; you can check it out here: https://orange-resonance-9766.ploomberapp.io. The source code is on GitHub (spoiler: don’t expect anything polished).
My take on this: LLMs will be the default way web pages are scraped in the future.
If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract—a no-code LLM platform that automates unstructured data workflows.
For the extra curious