Google I/O Updates, training a 65B model on a single GPU, HF Transformers Agent, and some interesting papers.
Image generated using Lexica.art


Welcome to the 3rd edition of the AI Matters newsletter. This edition explores progress in the AI world since last week.

But before we get too deep into the newsletter, a few things:

1. I'm floored by the response and encouragement I've received from my LinkedIn network. Thanks to everyone who has been supporting this experiment.

2. I would also like to share that many organizations, including the ACM, already use the newsletter name AI Matters. Great minds think alike. :) So, I will be renaming the newsletter next week. If you have an opinion on what it should be called, please make your voice heard.


Table of Contents

1. Google I/O updates

2. Cool Projects

3. Interesting Papers


Google I/O updates

Google I/O 2023 was all about AI. Sundar Pichai, Google's CEO, and his co-presenters mentioned AI 143 times over the two-hour keynote. I listened to the keynote, and here are the topics and announcements I found most notable:

1. Bard

Google is removing the Bard waitlist and making it available in many countries worldwide. It's also going multi-modal with images. Adobe Firefly is now directly integrated (private beta?) into Bard for image generation based on a prompt.

Tools, extensions, and partners were mentioned. I assume tools/extensions will be similar to OpenAI's plugin concept. Bard can now annotate source code with citations.

With these announcements, Bard seems to achieve general feature parity for important aspects with ChatGPT. However, based on my interactions with both, Bard still has room for improvement in the quality of responses.

2. PaLM 2

PaLM 2, the next version of the PaLM foundation model, was announced during the keynote. It will come in different sizes (number of parameters), is supposedly better at logic, reasoning, and multi-lingual text, and can apparently generate, debug, and explain code in 20 programming languages. It will power about 25 products across Google's lineup.

3. Codey

A foundation model trained on Google's documentation and code, fine-tuned on Google Cloud user behaviors and patterns. Based on the documentation, it's integrated into the developer environment and can generate and complete code, provide chat assistance on various GCP topics and best practices, and search across documents.

One aspect that caught my attention is that we can train Codey with custom code, and Google will keep it private. I can't wait to try this out.

4. Imagen

Imagen, a text-to-image diffusion model similar to Lexica (my favorite), DreamStudio by stability.ai, and MidJourney, was announced.

5. Chirp

Chirp, a speech-to-text model, was announced.

6. Fine-tuned PaLM versions

  • Sec-PaLM: A version of PaLM 2 targeted at security use cases.
  • Med-PaLM 2: A version of PaLM 2 fine-tuned on medical knowledge that is apparently 9x more accurate than the base model at medical reasoning. It has also reached expert-level performance on medical licensing exams.

7. MusicLM

MusicLM is a text-to-music model.

8. Gemini

Next-generation foundation model currently in training. A key feature of this model is its multi-modal capabilities.

9. Deep integration of Generative AI into products

Duet AI for Google Workspace, Duet AI for Google Cloud, and Duet AI for AppSheet provide deep AI integration across Google's broader product ecosystem, such as Google Slides, Docs, Vertex AI, etc. "Help me write" is a feature that lets the user type a prompt and generate text within Gmail. Another example I appreciated was the ability to generate speaker notes for slides.

The Google Labs team's demo of how Generative AI is being integrated into Google Search looked like a hybrid of the current search experience and a chatbot.

10. Google Cloud

Vertex AI lets you fine-tune foundation models such as PaLM 2 on dedicated clusters, thereby guaranteeing the privacy of your data. Using Generative AI Studio, it looks like models can be fine-tuned on a user's private data through a no-code interface and then deployed straight from the UI. I was impressed by this.
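
As a concrete illustration of the programmatic path (the no-code studio does the equivalent through the UI), here is a minimal, hedged sketch of calling a PaLM 2 text model through the Vertex AI Python SDK. The project ID is a placeholder, and the module path and model name ("text-bison@001") are my assumptions based on Google's documentation; they may differ depending on the SDK version (earlier releases exposed these classes under vertexai.preview.language_models).

```python
# Hedged sketch: calling a PaLM 2 text model via the Vertex AI Python SDK.
# Assumes the google-cloud-aiplatform package; module path and model name are
# assumptions and may vary by SDK version (e.g. vertexai.preview.language_models).
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project

model = TextGenerationModel.from_pretrained("text-bison@001")  # PaLM 2 for text
response = model.predict(
    "Summarize the key AI announcements from Google I/O 2023 in three bullets.",
    temperature=0.2,
    max_output_tokens=256,
)
print(response.text)
```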

The A3 GPU supercomputer was announced, and its specs are mind-blowing. This blog page claims 26 exaFLOPS of performance. A3 VMs pack 8 H100 GPUs with 3.6 TB/s of bisectional bandwidth between the GPUs. It looks like these VMs are meant for training, while the G2 VMs are meant for inference.

11. Project Tailwind

Project Tailwind allows a Google user to fine-tune a model based on private documents stored on Google Drive!

12. Identifying synthetic content

Google is working towards identifying synthetic content. For example, Google Images will include metadata that indicates when an image first appeared. Google is also building watermarking into AI-generated images.

13. Prompt-to-Wallpapers on Android

Android is getting Generative AI wallpapers that you can create from a text prompt using text-to-image diffusion models.

14. Studio Bot

Studio Bot, an AI coding assistant for Android developers built into Android Studio, was announced.

15. WebGPU

The latest Chrome version has WebGPU built in. WebGPU accelerates in-browser workloads and is claimed to speed up AI libraries like TensorFlow.js by as much as 100x. This could open up a whole new set of apps that run inference directly in the browser. We will have to wait and see how this plays out.


Cool Projects

  1. This is the biggest thing that happened since last week: we can now fine-tune a 65B model on a single GPU. It requires 48GB of GPU memory, which means a consumer-grade GPU like a 4090 isn't going to work; we will still need an enterprise-grade GPU or a Quadro. But we would only need one of those, and we can rent one for about $1 an hour from LambdaLabs, for example. So I'm guessing that for less than $100, we should be able to fine-tune a model, maybe even less. I haven't tried this yet. Tim says we can fine-tune a 7B model in about 3 hours. This is tremendous progress!! There was a similar announcement from another user on Twitter. (See the fine-tuning sketch after this list.)
  2. Larry Laake pointed out Anthropic's Constitutional AI approach, which differs from the approach OpenAI takes (RLHF). Philosophically, this looks like a more automated approach. I feel that it may be a while before this vision is fully realized.
  3. Hugging Face announced "Transformers Agent", a natural-language wrapper around a curated set of Hugging Face models. It's quite simple to use if you are a Python programmer (see the usage sketch after this list).
  4. Scale AI, the company chosen by the White House a few days ago to evaluate the big players in the AI industry, announced Scale Donovan and Scale EGP. Donovan is targeted at the Defense industry. They have an impressive demo. EGP is targeted at Enterprises.
  5. Wendy's and Google are using AI to take drive-through orders. Last month, I thought this was a use case that could be implemented with the current tech stack (LLMs, text-to-speech, and speech-to-text models). And now, it's a reality.
  6. China has an AI news anchor.
  7. A very interesting thread on how much time and money it costs to train an LLM.
  8. An LLM for healthcare trained on the NHS-UK Conditions dataset and the UK's National Institute for Health and Care Excellence (NICE) guidance.
  9. Eva is a database for AI apps.
  10. Microsoft Guidance, a GitHub project to simplify prompt-based programming.
  11. Salesforce announced TableauGPT. This allows users to interact with their data!
  12. Interested in learning more about LLMs? Here's a course on it: https://fullstackdeeplearning.com/llm-bootcamp/spring-2023/.
  13. Here's a how-to on fine-tuning the RedPajama model.
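
To make item 1 above concrete, here is a minimal sketch of what 4-bit, QLoRA-style fine-tuning looks like with the Hugging Face transformers + peft + bitsandbytes stack. This is my rough approximation, not the exact recipe from the announcement; the model ID, adapter hyperparameters, and target modules are illustrative assumptions, and the helpers shown require fairly recent library versions.

```python
# Hedged sketch: QLoRA-style fine-tuning of a large causal LM on one GPU.
# The 65B base weights are loaded in 4-bit (NF4) so they fit in ~48 GB, and only
# small LoRA adapters are trained. Model ID and hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "huggyllama/llama-65b"  # assumption: any causal LM checkpoint works

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Freeze the quantized base model and attach trainable low-rank adapters.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of parameters train

# From here, fine-tune with the usual transformers Trainer on your instruction
# dataset; the small adapter weights are what you save and share.
```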
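
And here is roughly what using Transformers Agent (item 3 above) looks like, based on Hugging Face's announcement; the StarCoder inference endpoint and the image file are illustrative, not something I've run end to end.

```python
# Hedged sketch of Transformers Agent: a natural-language request is translated
# into Python that calls curated Hugging Face tools, then executed.
from PIL import Image
from transformers import HfAgent

agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

image = Image.open("photo.jpg")  # illustrative input image
caption = agent.run("Caption the following image.", image=image)
audio = agent.run("Read the following text out loud.", text=caption)
```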


Interesting Papers

  • ImageBind: Meta has a multi-modal (multi-sensor) model that weaves images/video, text, audio, IMU readings (a sensor that detects motion and orientation), depth (3D maps), and thermal (heat-map) data into a single embedding space. This is quite impressive, as AI can now correlate audio, video, and other modalities. Their blog post says, and I quote:

"With the capability to use several modalities for input queries and retrieve outputs across other modalities, ImageBind shows new possibilities for creators. Imagine that someone could take a video recording of an ocean sunset and instantly add the perfect audio clip to enhance it, while an image of a brindle Shih Tzu could yield essays or depth models of similar dogs. Or when a model like Make-A-Video produces a video of a carnival, ImageBind can suggest background noise to accompany it, creating an immersive experience"


Reminder: please subscribe to the AI Matters newsletter (soon to be renamed to something else) and share it with your network. Thank you!


Please let me know your thoughts on this edition in the comments section. Did you like it? Too much info in one article? Did I miss anything you encountered in the last week?

#innovation #artificialintelligence #technology #news


A follow-up comment from the author, Praveen Cherukuri: By the way, one notable development from yesterday was Sam Altman's Senate testimony. I haven't been able to listen to the full testimony, but if you are interested, you can find it at https://www.youtube.com/watch?v=P_ACcQxJIsg.
