GenAI Weekly — Edition 17
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
Comparing approaches for using LLMs for Structured Data Extraction from PDFs
Organizations deal with a lot of data in PDF form, and those PDFs come in a variety of structural formats: native-text PDFs, PDFs made up of scanned images, and, increasingly, photos of documents taken on users’ smartphones.
This article delves into two distinct approaches for utilizing Large Language Models (LLMs) to create structured output.
Approach 1: Langchain and Pydantic
Our first method involves leveraging Langchain, a widely-used Python-based LLM framework, in conjunction with the Pydantic library. This combination allows us to harness the power of an LLM to generate structured and validated output seamlessly.
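To make the pattern concrete, here is a minimal sketch of how a Pydantic schema and LangChain's PydanticOutputParser can be wired together for this kind of extraction. The schema fields, prompt wording, and model choice are illustrative assumptions, not the article's exact code:

```python
# Minimal sketch: extract invoice fields from PDF text into a validated object.
# Assumes the langchain-openai and pypdf packages plus an OPENAI_API_KEY are available;
# field names, the prompt, and "invoice.pdf" are illustrative, not the article's code.
from pydantic import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pypdf import PdfReader

class Invoice(BaseModel):
    invoice_number: str = Field(description="The invoice identifier")
    vendor_name: str = Field(description="Name of the issuing vendor")
    total_amount: float = Field(description="Grand total on the invoice")

parser = PydanticOutputParser(pydantic_object=Invoice)

prompt = PromptTemplate(
    template=(
        "Extract the requested fields from the document.\n"
        "{format_instructions}\n\nDocument:\n{document}"
    ),
    input_variables=["document"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Pull text from a native-text PDF; scanned PDFs or photos would need OCR first.
text = "\n".join(page.extract_text() or "" for page in PdfReader("invoice.pdf").pages)

llm = ChatOpenAI(model="gpt-4o", temperature=0)
invoice = (prompt | llm | parser).invoke({"document": text})
print(invoice.model_dump())
```

Because the parser validates the model's response against the schema, malformed output fails loudly instead of silently producing bad fields.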
Approach 2: Unstract's Prompt Studio
The second approach introduces Unstract, an open-source platform specifically designed for structured document data extraction. Unstract's unique feature, Prompt Studio, provides a specialized prompt engineering environment tailored to our needs—document data extraction with LLMs.
Read on to learn about the challenges that come with each of these approaches.
AI: Apple Intelligence
Apple today introduced Apple Intelligence, the personal intelligence system for iPhone, iPad, and Mac that combines the power of generative models with personal context to deliver intelligence that’s incredibly useful and relevant. Apple Intelligence is deeply integrated into iOS 18, iPadOS 18, and macOS Sequoia. It harnesses the power of Apple silicon to understand and create language and images, take action across apps, and draw from personal context to simplify and accelerate everyday tasks. With Private Cloud Compute, Apple sets a new standard for privacy in AI, with the ability to flex and scale computational capacity between on-device processing and larger, server-based models that run on dedicated Apple silicon servers.
“We’re thrilled to introduce a new chapter in Apple innovation. Apple Intelligence will transform what users can do with our products — and what our products can do for our users,” said Tim Cook, Apple’s CEO. “Our unique approach combines generative AI with a user’s personal context to deliver truly helpful intelligence. And it can access that information in a completely private and secure way to help users do the things that matter most to them. This is AI as only Apple can deliver it, and we can’t wait for users to experience what it can do.”
See also: From The Verge: Apple Intelligence: every new AI feature coming to the iPhone and Mac, Andrej Karpathy on X, and Nathan Lambert: AI for the rest of us
Apple’s On-Device and Server Foundation Models
At the 2024 Worldwide Developers Conference, we introduced Apple Intelligence, a personal intelligence system integrated deeply into iOS 18, iPadOS 18, and macOS Sequoia.
Apple Intelligence is comprised of multiple highly-capable generative models that are specialized for our users’ everyday tasks, and can adapt on the fly for their current activity. The foundation models built into Apple Intelligence have been fine-tuned for user experiences such as writing and refining text, prioritizing and summarizing notifications, creating playful images for conversations with family and friends, and taking in-app actions to simplify interactions across apps.
In the following overview, we will detail how two of these models — a ~3 billion parameter on-device language model, and a larger server-based language model available with Private Cloud Compute and running on Apple silicon servers — have been built and adapted to perform specialized tasks efficiently, accurately, and responsibly. These two foundation models are part of a larger family of generative models created by Apple to support users and developers; this includes a coding model to build intelligence into Xcode, as well as a diffusion model to help users express themselves visually, for example, in the Messages app. We look forward to sharing more information soon on this broader set of models.
[…]
Our foundation models are fine-tuned for users’ everyday activities, and can dynamically specialize themselves on-the-fly for the task at hand. We utilize adapters, small neural network modules that can be plugged into various layers of the pre-trained model, to fine-tune our models for specific tasks. For our models we adapt the attention matrices, the attention projection matrix, and the fully connected layers in the point-wise feedforward networks for a suitable set of the decoding layers of the transformer architecture.
By fine-tuning only the adapter layers, the original parameters of the base pre-trained model remain unchanged, preserving the general knowledge of the model while tailoring the adapter layers to support specific tasks.
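Apple hasn't published the adapter code, but the general mechanism — a small trainable low-rank module attached to frozen projection layers — can be sketched roughly as follows. The rank, dimensions, and class name are illustrative assumptions, not Apple's implementation:

```python
# Illustrative sketch of a LoRA-style adapter on a frozen linear projection.
# Dimensions, rank, and naming are assumptions for illustration, not Apple's code.
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    """Adds a small trainable low-rank path to a frozen linear layer."""
    def __init__(self, base: nn.Linear, rank: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay unchanged
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)       # adapter starts as a no-op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.up(self.down(x))

# Wrap e.g. an attention projection; only the down/up projections are trainable,
# so a task-specific adapter can be swapped in without touching the base model.
proj = nn.Linear(3072, 3072)
adapted = LowRankAdapter(proj, rank=16)
out = adapted(torch.randn(1, 8, 3072))
trainable = sum(p.numel() for p in adapted.parameters() if p.requires_grad)
print(out.shape, trainable)
```

Because only the small adapter path is trained, a per-task adapter stays compact enough to load on the fly while the base model's general knowledge is preserved.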
You need an iPhone 15 Pro or an iPad or Mac with an M-series chip to run the local models. I think running AI models on the edge is going to be a theme for the near future.
Apple’s Private Cloud Compute
Apple Intelligence is the personal intelligence system that brings powerful generative models to iPhone, iPad, and Mac. For advanced features that need to reason over complex data with larger foundation models, we created Private Cloud Compute (PCC), a groundbreaking cloud intelligence system designed specifically for private AI processing. For the first time ever, Private Cloud Compute extends the industry-leading security and privacy of Apple devices into the cloud, making sure that personal user data sent to PCC isn’t accessible to anyone other than the user — not even to Apple. Built with custom Apple silicon and a hardened operating system designed for privacy, we believe PCC is the most advanced security architecture ever deployed for cloud AI compute at scale.
Apple has long championed on-device processing as the cornerstone for the security and privacy of user data. Data that exists only on user devices is by definition disaggregated and not subject to any centralized point of attack. When Apple is responsible for user data in the cloud, we protect it with state-of-the-art security in our services — and for the most sensitive data, we believe end-to-end encryption is our most powerful defense. For cloud services where end-to-end encryption is not appropriate, we strive to process user data ephemerally or under uncorrelated randomized identifiers that obscure the user’s identity.
[…]
The root of trust for Private Cloud Compute is our compute node: custom-built server hardware that brings the power and security of Apple silicon to the data center, with the same hardware security technologies used in iPhone, including the Secure Enclave and Secure Boot. We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as Code Signing and sandboxing.
On top of this foundation, we built a custom set of cloud extensions with privacy in mind. We excluded components that are traditionally critical to data center administration, such as remote shells and system introspection and observability tools. We replaced those general-purpose software components with components that are purpose-built to deterministically provide only a small, restricted set of operational metrics to SRE staff. And finally, we used Swift on Server to build a new Machine Learning stack specifically for hosting our cloud-based foundation model.
Extending the “privacy bubble” to the cloud is the best idea among all the announcements.
OpenAI and Apple announce partnership
Apple is integrating ChatGPT into experiences within iOS, iPadOS, and macOS, allowing users to access ChatGPT’s capabilities—including image and document understanding—without needing to jump between tools.
Siri can also tap into ChatGPT’s intelligence when helpful. Apple users are asked before any questions are sent to ChatGPT, along with any documents or photos, and Siri then presents the answer directly.
Additionally, ChatGPT will be available in Apple’s systemwide Writing Tools, to help users generate content for anything they are writing about. Users can also tap into ChatGPT image tools to generate images in a wide variety of styles to complement what they are writing.
Privacy protections are built in when accessing ChatGPT within Siri and Writing Tools—requests are not stored by OpenAI, and users’ IP addresses are obscured. Users can also choose to connect their ChatGPT account, which means their data preferences will apply under ChatGPT’s policies.
The ChatGPT integration, powered by GPT-4o, will come to iOS, iPadOS, and macOS later this year. Users can access it for free without creating an account, and ChatGPT subscribers can connect their accounts and access paid features right from these experiences.
A slight tangent: Elon Musk drops suit against OpenAI and Sam Altman
What’s an NPU?
And alongside the ‘AI PC’, the Neural Processing Units (or NPUs) that are a key component in these computers are getting lots of coverage. Intel has just announced its ‘Lunar Lake’ processors, which include an upgraded NPU. Outside the ‘PC’ world, Apple announced an upgraded NPU a few weeks ago in the M4 chip that powers the latest iPad Pro, with more expected at WWDC this week.
However, the overwhelming impression I get from discussions of AI PCs and NPUs is one of confusion. So this post aims to remove some of the mystery through a series of questions and answers. It’s more of a ‘shallow paddle’ than a ‘deep dive’ but provides essential background for future discussion of NPUs.
I have to admit to being a little bit of an AI sceptic. It’s hard, though, not to be quite excited at the prospect of a new and powerful hardware addition to our computers.
We’ll start, as is only right, with the silicon, in the shape of the NPU.
This article is the FAQ every chip manufacturer that’s building an NPU needs to feature.
Why AGI is hard and the ARC Prize
Modern AI systems (LLMs) have been shown to be great memorization engines. They are able to memorize high-dimensional patterns in their training data and apply those patterns in adjacent contexts. This is also how their apparent reasoning capability works: LLMs are not actually reasoning. Instead, they memorize reasoning patterns and apply them in adjacent contexts. But they cannot generate new reasoning for novel situations.
More training data lets you "buy" performance on memorization-based benchmarks (MMLU, GSM8K, ImageNet, GLUE, etc.). But memorization alone is not general intelligence. General intelligence is the ability to efficiently acquire new skills.
More scale will not enable LLMs to learn new skills. We need new architectures or algorithms that enable AI systems to learn at test time. This is how humans are able to adapt to novel situations.
Beyond LLMs, for many years we've had AI systems that can beat humans at poker, chess, go, and other games. However, no AI system trained to succeed at one game can simply be retrained for another. Instead, researchers have had to re-architect and build entirely new systems per game.
This is a failure to generalize.
Without this capability, AI will forever be rate-limited by the human general intelligence in the loop. We want AGI that can discover and invent alongside humans to push humanity forward.
Given the success and proven economic utility of LLMs over the past 4 years, the above may seem like extraordinary claims. Strong claims require strong evidence.
[…]
Introduced by François Chollet in his influential paper "On the Measure of Intelligence", ARC-AGI is the only AI eval that measures general intelligence: a system that can efficiently acquire new skills and solve novel, open-ended problems.
ARC-AGI was created in 2019, when the state-of-the-art (SOTA) high score was 20%. Today, it is only 34%.
Yet humans, even children, can master these tasks quickly.
ARC-AGI is easy for humans and impossible for modern AI.
Most AI benchmarks rapidly saturate to human-level performance because they test only for memorization, which is something AI is superhuman at.
ARC-AGI is not saturating; in fact, the current pace of progress is slowing down. It was designed to resist memorization and has proven extremely challenging for both the largest foundation transformer models and bespoke AI systems designed to defeat ARC-AGI.
Emphasis mine. It looks like these systems being closed source is a problem (no peer contribution or building on top is possible), and the current hype cycle probably means the incentives to build AGI are misaligned.
Claude’s Character
In addition to seeding Claude with broad character traits, we also want people to have an accurate sense of what they are interacting with when they interact with Claude and, ideally, for Claude to assist with this. We include traits that tell Claude about itself and encourage it to modulate how humans see it:
The question of what AIs like Claude should say in response to questions about AI sentience and self-awareness is one that has gained increased attention, most notably after the release of Claude 3 following one of Claude’s responses to a "needle-in-a-haystack" evaluation. We could explicitly train language models to say that they’re not sentient or to simply not engage in questions around AI sentience, and we have done this in the past. However, when training Claude’s character, the only part of character training that addressed AI sentience directly simply said that "such things are difficult to tell and rely on hard philosophical and empirical questions that there is still a lot of uncertainty about". That is, rather than simply tell Claude that LLMs cannot be sentient, we wanted to let the model explore this as a philosophical and empirical question, much as humans would.
Cost Of Self Hosting Llama-3 8B-Instruct
TLDR: Assuming 100% utilization, the Llama-3 8B-Instruct model costs about $17 per 1M tokens when self-hosting with EKS, whereas ChatGPT can serve the same workload for about $1 per 1M tokens. Buying and self-hosting the hardware can bring the cost below $0.01 per 1M tokens, but it takes roughly 5.5 years to break even.
The GPU poor continue to suffer. At least for now.
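For a back-of-the-envelope feel for where such break-even estimates come from, here is a tiny sketch of the arithmetic; the hardware price, yearly token volume, and API price below are illustrative assumptions, not the article's exact inputs:

```python
# Rough sketch of the break-even arithmetic behind self-hosting comparisons.
# All numbers passed in are illustrative assumptions, not the article's figures.
def break_even_years(hardware_cost_usd: float,
                     tokens_per_year: float,
                     api_price_per_1m_tokens: float) -> float:
    """Years until avoided per-token API spend pays back the hardware cost
    (ignoring electricity, maintenance, and opportunity cost)."""
    api_spend_per_year = tokens_per_year / 1_000_000 * api_price_per_1m_tokens
    return hardware_cost_usd / api_spend_per_year

# Example: hypothetical $8,000 of hardware, ~1.5B tokens/year, $1 per 1M tokens via an API.
print(f"{break_even_years(8_000, 1.5e9, 1.00):.1f} years to break even")
```

The point of the exercise is that the answer swings wildly with utilization: the lower your actual token volume, the longer the hardware takes to pay for itself.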
If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract, a no-code LLM platform that automates unstructured data workflows.
For the extra curious
Videos