GenAI Weekly — Edition 28
Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs
Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.
GPT-4o is bad at processing PDF documents.
GPT-4o is bad at processing PDF documents. Whoever tells you otherwise is not living in the real world. In 2024, people fill out forms using pen and paper. Try to answer questions from those forms using modern models, and you'll be disappointed.
The answer is simple: stop letting the model see your PDF document.
Instead, preprocess it and stick to showing the model text.
Here's a video by Santiago Valdarrama showing how to preprocess documents with Unstract, turning them into text while preserving the original layout. You can find the code here: https://github.com/svpino/unstract-llmwhisperer-sample
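The pattern boils down to: extract layout-preserving text from the PDF first, then show the model only that text. Here's a minimal sketch of that pattern. The `build_form_prompt` function is a hypothetical helper (not from the video's code), and the commented-out extraction step uses `pypdf` as a rough stand-in for a layout-preserving tool like the LLMWhisperer service the video relies on:

```python
def build_form_prompt(extracted_text: str, question: str) -> str:
    """Build a plain-text prompt from preprocessed form text.

    The model only ever sees text -- never the PDF itself.
    """
    return (
        "The following is text extracted from a scanned form, "
        "with the original layout preserved:\n\n"
        f"{extracted_text}\n\n"
        f"Question: {question}\nAnswer:"
    )

# The extraction step itself can use any layout-preserving tool.
# For digital (non-scanned) PDFs, pypdf is a rough stand-in:
#
#   from pypdf import PdfReader
#   text = "\n".join(p.extract_text() for p in PdfReader("form.pdf").pages)

# Demo with text that might come out of a filled-in form:
form_text = "Name: Jane Doe\nDate of birth: 1990-01-01"
prompt = build_form_prompt(form_text, "What is the applicant's name?")
```

The resulting prompt string is what you'd send to the model in place of the raw PDF.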
Andy Jassy on Amazon Q, their GenAI assistant for software development
Amazon Q, our GenAI assistant for software development, is trying to bring some light to this heaviness. We have a new code transformation capability, and here’s what we found when we integrated it into our internal systems and applied it to our needed Java upgrades:
- The average time to upgrade an application to Java 17 plummeted from what’s typically 50 developer-days to just a few hours. We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy, but real).
- In under six months, we've been able to upgrade more than 50% of our production Java systems to modernized Java versions at a fraction of the usual time and effort. And, our developers shipped 79% of the auto-generated code reviews without any additional changes.
- The benefits go beyond how much effort we’ve saved developers. The upgrades have enhanced security and reduced infrastructure costs, providing an estimated $260M in annualized efficiency gains.
My take on this: Coding-related copilots are currently one of the very few horizontally monetizable use cases for LLMs.
Creating calendar entries from an image using Anthropic Claude 3.5
A few days ago, my jazz piano teacher sent me the new fall/winter schedule for my private jazz piano lessons -- 13 different dates -- as a JPG (mine are outlined in green marker):
I was too lazy to go make 13 entries in Google Calendar, so I decided to see if Claude could help me out:
I first uploaded the JPG to Claude 3.5 Sonnet...
My prompt: List the dates that are outlined in green
Cool - that was easy and accurate. Now I need to get it into my calendar, so I asked it to create an ics file...
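An .ics file is just a text format, so it's easy to see what Claude would have produced. Here's a hedged sketch (my own code, not Claude's output) that builds a minimal iCalendar file from a list of dates; the lesson time and duration are made-up placeholders:

```python
from datetime import datetime, timedelta

def make_ics(dates, summary, start_hour=16, duration_minutes=45):
    """Build a minimal iCalendar (.ics) string with one VEVENT per date."""
    fmt = "%Y%m%dT%H%M%S"
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//lessons//EN"]
    for i, d in enumerate(dates):
        start = d.replace(hour=start_hour)
        end = start + timedelta(minutes=duration_minutes)
        lines += [
            "BEGIN:VEVENT",
            f"UID:lesson-{i}@example.invalid",
            f"DTSTART:{start.strftime(fmt)}",
            f"DTEND:{end.strftime(fmt)}",
            f"SUMMARY:{summary}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar mandates CRLF line endings
    return "\r\n".join(lines)

# Two of the thirteen lesson dates, as an example:
dates = [datetime(2024, 9, 5), datetime(2024, 9, 12)]
ics = make_ics(dates, "Jazz piano lesson")
```

Save the string to a `.ics` file and Google Calendar will import all the events in one go.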
My take on this: We’re happy when AI saves us $260 million (as in the first entry in this week’s newsletter). But we’re also happy when it saves us time and spares us annoying work. Ah, vision models.
Anthropic publishes the ‘system prompts’ that make Claude tick
Vendors usually keep system prompts close to the chest — presumably for competitive reasons, but also perhaps because knowing the system prompt may suggest ways to circumvent it. The only way to expose GPT-4o‘s system prompt, for example, is through a prompt injection attack. And even then, the system’s output can’t be trusted completely.
However, Anthropic, in its continued effort to paint itself as a more ethical, transparent AI vendor, has published the system prompts for its latest models (Claude 3 Opus, Claude 3.5 Sonnet and Claude 3 Haiku) in the Claude iOS and Android apps and on the web.
[…]
The latest prompts, dated July 12, outline very clearly what the Claude models can’t do — e.g. “Claude cannot open URLs, links, or videos.” Facial recognition is a big no-no; the system prompt for Claude Opus tells the model to “always respond as if it is completely face blind” and to “avoid identifying or naming any humans in [images].”
But the prompts also describe certain personality traits and characteristics — traits and characteristics that Anthropic would have the Claude models exemplify.
The prompt for Claude 3 Opus, for instance, says that Claude is to appear as if it “[is] very smart and intellectually curious,” and “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” It also instructs Claude to treat controversial topics with impartiality and objectivity, providing “careful thoughts” and “clear information” — and never to begin responses with the words “certainly” or “absolutely.”
My take on this: There were a bunch of chuckles around Apple’s prompts that included things like “Do not hallucinate”. It’s always good to look at pro-level prompts.
Judge dismisses majority of GitHub Copilot copyright claims
Judge Jon Tigar’s ruling, unsealed last week, leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract. This decision marks a substantial setback for the developers who argued that GitHub Copilot, which uses OpenAI’s technology and is owned by Microsoft, unlawfully trained on their work.
The court’s dismissal primarily focused on the accusation that GitHub Copilot violates the Digital Millennium Copyright Act (DMCA) by suggesting code without proper attribution. An amended version of the complaint had taken issue with GitHub’s duplication detection filter, which allows users to “detect and suppress” Copilot suggestions matching public code on GitHub.
The developers argued that users who turn off this filter would “receive identical code” and cited a study showing how AI models can “memorise” and reproduce parts of their training data, potentially including copyrighted code.
However, Judge Tigar found these arguments unconvincing. He determined that the code allegedly copied by GitHub was not sufficiently similar to the developers’ original work. The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”
As a result, Judge Tigar dismissed this allegation with prejudice, meaning the developers cannot refile the claim. Additionally, the court dismissed requests for punitive damages and monetary relief in the form of unjust enrichment.
AI: 1. Human: 0.
With 10x growth since 2023, Llama is the leading engine of AI innovation
Llama has the resources behind it that very few other LLMs have.
Building LLMs from the Ground Up: A 3-hour Coding Workshop
Below, you'll find a table of contents to get an idea of what this video covers (the video itself has clickable chapter marks, allowing you to jump directly to topics of interest):
0:00 – Workshop overview
2:17 – Part 1: Intro to LLMs
9:14 – Workshop materials
10:48 – Part 2: Understanding LLM input data
23:25 – A simple tokenizer class
41:03 – Part 3: Coding an LLM architecture
45:01 – GPT-2 and Llama 2
1:07:11 – Part 4: Pretraining
1:29:37 – Part 5.1: Loading pretrained weights
1:45:12 – Part 5.2: Pretrained weights via LitGPT
1:53:09 – Part 6.1: Instruction finetuning
2:08:21 – Part 6.2: Instruction finetuning via LitGPT
2:26:45 – Part 6.3: Benchmark evaluation
2:36:55 – Part 6.4: Evaluating conversational performance
2:42:40 – Conclusion
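Part 2 of the workshop builds a simple tokenizer class. As a taste of what that looks like, here's a toy word-level tokenizer; this is a sketch based on the chapter titles above, not the workshop's actual code:

```python
import re

class SimpleTokenizer:
    """A toy word-level tokenizer: maps each unique word or
    punctuation mark in a corpus to an integer ID."""

    def __init__(self, corpus: str):
        # Split into words and punctuation, then assign IDs alphabetically
        words = sorted(set(re.findall(r"\w+|[^\w\s]", corpus)))
        self.str_to_int = {w: i for i, w in enumerate(words)}
        self.int_to_str = {i: w for w, i in self.str_to_int.items()}

    def encode(self, text: str) -> list[int]:
        return [self.str_to_int[w] for w in re.findall(r"\w+|[^\w\s]", text)]

    def decode(self, ids: list[int]) -> str:
        return " ".join(self.int_to_str[i] for i in ids)

tok = SimpleTokenizer("the quick brown fox jumps over the lazy dog.")
ids = tok.encode("the lazy dog")  # a list of integer IDs
```

Real LLM tokenizers use subword schemes like byte-pair encoding rather than whole words, which the workshop also covers under input data.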
If you have the time (and the inclination), you should definitely work through it.
If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract—a no-code LLM platform that automates unstructured data workflows.
For the extra curious