GenAI Weekly — Edition 28

Your Weekly Dose of Gen AI: News, Trends, and Breakthroughs

Stay at the forefront of the Gen AI revolution with Gen AI Weekly! Each week, we curate the most noteworthy news, insights, and breakthroughs in the field, equipping you with the knowledge you need to stay ahead of the curve.

Click subscribe to be notified of future editions.



GPT-4o is bad at processing PDF documents.

Santiago Valdarrama on LinkedIn:

GPT-4o is bad at processing PDF documents. Whoever tells you otherwise is not living in the real world. In 2024, people fill out forms using pen and paper. Try to answer questions from those forms using modern models, and you'll be disappointed.

The answer is simple: stop letting the model see your PDF document.

Instead, preprocess it and stick to showing the model text.

Here's a video by Santiago Valdarrama showing how to preprocess documents: he uses Unstract to turn them into text while preserving the original layout. You can find the code here: https://github.com/svpino/unstract-llmwhisperer-sample
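To make the idea concrete, here is a minimal sketch of the "preprocess first, then show the model text" pattern. It is not the code from the linked repo: `extract_pdf_text` assumes the `pypdf` package is installed, and `build_prompt` is a hypothetical helper for stuffing the extracted text into a plain-text prompt.

```python
def build_prompt(document_text: str, question: str) -> str:
    """Wrap pre-extracted document text and a question into a single text prompt,
    so the model only ever sees plain text, never the raw PDF."""
    return (
        "Answer the question using only the document below.\n\n"
        f"--- DOCUMENT ---\n{document_text}\n--- END DOCUMENT ---\n\n"
        f"Question: {question}"
    )

def extract_pdf_text(path: str) -> str:
    """Extract plain text from a PDF before it goes anywhere near the model."""
    from pypdf import PdfReader  # pip install pypdf

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

# Usage (hypothetical file name):
# prompt = build_prompt(extract_pdf_text("form.pdf"), "What is the applicant's name?")
```

The resulting prompt string can then be sent to any chat model; the point is simply that extraction happens in your code, not inside the model.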


Andy Jassy on Amazon Q, their GenAI assistant for software development

Andy Jassy on LinkedIn:

Amazon Q, our GenAI assistant for software development, is trying to bring some light to this heaviness. We have a new code transformation capability, and here’s what we found when we integrated it into our internal systems and applied it to our needed Java upgrades:

- The average time to upgrade an application to Java 17 plummeted from what’s typically 50 developer-days to just a few hours. We estimate this has saved us the equivalent of 4,500 developer-years of work (yes, that number is crazy, but real).

- In under six months, we've been able to upgrade more than 50% of our production Java systems to modernized Java versions at a fraction of the usual time and effort. And, our developers shipped 79% of the auto-generated code reviews without any additional changes.

- The benefits go beyond how much effort we’ve saved developers. The upgrades have enhanced security and reduced infrastructure costs, providing an estimated $260M in annualized efficiency gains.

My take on this: Coding-related copilots are currently one of the very few horizontally monetizable use cases for LLMs.


Creating calendar entries from an image using Anthropic Claude 3.5

From Greg Wilson’s Tech Blog:

A few days ago, my jazz piano teacher sent me the new fall/winter schedule for my private jazz piano lessons -- 13 different dates -- as a JPG (mine are outlined in green marker):


I was too lazy to go make 13 entries in Google Calendar, so I decided to see if Claude could help me out:

I first uploaded the JPG to Claude 3.5 Sonnet...

My prompt: List the dates that are outlined in green

Cool - that was easy and accurate. Now I need to get it into my calendar, so I asked it to create an ics file...
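For the curious, the final step is not magic: an .ics file is just structured text. Here is a hedged sketch, using only the standard library, of how a list of lesson dates could be turned into such a file; the dates and event name are made up, and in the post Claude generated the file itself.

```python
from datetime import date

def make_ics(dates: list[date], summary: str = "Jazz piano lesson") -> str:
    """Build a minimal iCalendar (.ics) string with one all-day event per date."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//example//EN"]
    for d in dates:
        stamp = d.strftime("%Y%m%d")
        lines += [
            "BEGIN:VEVENT",
            f"UID:{stamp}-lesson@example.com",  # UIDs must be unique per event
            f"DTSTART;VALUE=DATE:{stamp}",      # all-day event, no time component
            f"SUMMARY:{summary}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines)  # the iCalendar spec uses CRLF line endings

# Usage (hypothetical dates):
# with open("lessons.ics", "w") as f:
#     f.write(make_ics([date(2024, 9, 5), date(2024, 9, 12)]))
```

The resulting file imports directly into Google Calendar, which is exactly the shortcut the post describes.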

My take on this: We’re happy when AI saves $260 million (as we saw in the first entry of this week’s newsletter). But we’re also happy when it simply saves us time and spares us annoying work! Ah, vision models.


Anthropic publishes the ‘system prompts’ that make Claude tick

Kyle Wiggers writing for Techcrunch:

Vendors usually keep system prompts close to the chest — presumably for competitive reasons, but also perhaps because knowing the system prompt may suggest ways to circumvent it. The only way to expose GPT-4o‘s system prompt, for example, is through a prompt injection attack. And even then, the system’s output can’t be trusted completely.
However, Anthropic, in its continued effort to paint itself as a more ethical, transparent AI vendor, has published the system prompts for its latest models (Claude 3 Opus, Claude 3.5 Sonnet and Claude 3 Haiku) in the Claude iOS and Android apps and on the web.

[…]

The latest prompts, dated July 12, outline very clearly what the Claude models can’t do — e.g. “Claude cannot open URLs, links, or videos.” Facial recognition is a big no-no; the system prompt for Claude Opus tells the model to “always respond as if it is completely face blind” and to “avoid identifying or naming any humans in [images].”

But the prompts also describe certain personality traits and characteristics — traits and characteristics that Anthropic would have the Claude models exemplify.

The prompt for Claude 3 Opus, for instance, says that Claude is to appear as if it “[is] very smart and intellectually curious,” and “enjoys hearing what humans think on an issue and engaging in discussion on a wide variety of topics.” It also instructs Claude to treat controversial topics with impartiality and objectivity, providing “careful thoughts” and “clear information” — and never to begin responses with the words “certainly” or “absolutely.”

My take on this: There were a bunch of chuckles around Apple’s prompts that included things like “Do not hallucinate”. It’s always good to look at pro-level prompts.


Judge dismisses majority of GitHub Copilot copyright claims

Ryan Daws writing for Developer Tech:

Judge Jon Tigar’s ruling, unsealed last week, leaves only two claims standing: one accusing the companies of an open-source license violation and another alleging breach of contract. This decision marks a substantial setback for the developers who argued that GitHub Copilot, which uses OpenAI’s technology and is owned by Microsoft, unlawfully trained on their work.
The court’s dismissal primarily focused on the accusation that GitHub Copilot violates the Digital Millennium Copyright Act (DMCA) by suggesting code without proper attribution. An amended version of the complaint had taken issue with GitHub’s duplication detection filter, which allows users to “detect and suppress” Copilot suggestions matching public code on GitHub.
The developers argued that turning off this filter would “receive identical code” and cited a study showing how AI models can “memorise” and reproduce parts of their training data, potentially including copyrighted code.
However, Judge Tigar found these arguments unconvincing. He determined that the code allegedly copied by GitHub was not sufficiently similar to the developers’ original work. The judge also noted that the cited study itself mentions that GitHub Copilot “rarely emits memorised code in benign situations.”
As a result, Judge Tigar dismissed this allegation with prejudice, meaning the developers cannot refile the claim. Additionally, the court dismissed requests for punitive damages and monetary relief in the form of unjust enrichment.

AI: 1. Human: 0.


With 10x growth since 2023, Llama is the leading engine of AI innovation

From Meta’s AI blog:

  • Llama models are approaching 350 million downloads to date (more than 10x the downloads compared to this time last year), and they were downloaded more than 20 million times in the last month alone, making Llama the leading open source model family.
  • Llama usage by token volume across our major cloud service provider partners has more than doubled in just three months from May through July 2024 when we released Llama 3.1.
  • Monthly usage (token volume) of Llama grew 10x from January to July 2024 for some of our largest cloud service providers.

Llama has the resources behind it that very few other LLMs have.


Building LLMs from the Ground Up: A 3-hour Coding Workshop

Sebastian Raschka on his newsletter:

Below, you'll find a table of contents to get an idea of what this video covers (the video itself has clickable chapter marks, allowing you to jump directly to topics of interest):

0:00 – Workshop overview

2:17 – Part 1: Intro to LLMs

9:14 – Workshop materials

10:48 – Part 2: Understanding LLM input data

23:25 – A simple tokenizer class

41:03 – Part 3: Coding an LLM architecture

45:01 – GPT-2 and Llama 2

1:07:11 – Part 4: Pretraining

1:29:37 – Part 5.1: Loading pretrained weights

1:45:12 – Part 5.2: Pretrained weights via LitGPT

1:53:09 – Part 6.1: Instruction finetuning

2:08:21 – Part 6.2: Instruction finetuning via LitGPT

02:26:45 – Part 6.3: Benchmark evaluation

02:36:55 – Part 6.4: Evaluating conversational performance

02:42:40 – Conclusion
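To give a flavor of the early material, here is a toy version of the "simple tokenizer class" idea from Part 2: build a vocabulary from text, then map tokens to integer IDs and back. This is an illustration of the concept, not Raschka's actual workshop code.

```python
import re

# Split on words and individual punctuation marks.
TOKEN_PATTERN = r"\w+|[^\w\s]"

class SimpleTokenizer:
    def __init__(self, text: str):
        # Vocabulary: every unique token in the training text, sorted for stable IDs.
        tokens = sorted(set(re.findall(TOKEN_PATTERN, text)))
        self.token_to_id = {tok: i for i, tok in enumerate(tokens)}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}

    def encode(self, text: str) -> list[int]:
        """Map text to integer IDs (raises KeyError on out-of-vocabulary tokens)."""
        return [self.token_to_id[t] for t in re.findall(TOKEN_PATTERN, text)]

    def decode(self, ids: list[int]) -> str:
        """Map IDs back to text (naively space-joined, so punctuation spacing is lost)."""
        return " ".join(self.id_to_token[i] for i in ids)
```

Real LLM tokenizers use subword schemes like byte-pair encoding instead, which the workshop goes on to cover, but the encode/decode interface is the same.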

If you have the time (and the inclination), this workshop is definitely worth taking.


If you've made it this far and follow my newsletter, please consider exploring the platform we're currently building: Unstract, a no-code LLM platform that automates unstructured data workflows.



