Google I/O Updates, training a 65B model on a single GPU, HF Transformers Agent, and some interesting papers.
Praveen Cherukuri
CTO passionate about solving real business challenges using tech
Welcome to the 3rd edition of the AI Matters newsletter. This edition explores progress in the AI world since last week.
But before we get too deep into the newsletter, a few things:
1. I'm floored by the response and encouragement I have received from my LinkedIn network. Thank you to everyone who has been supporting this experiment.
2. I would like to share that many organizations, including the ACM, use the newsletter name, AI Matters. Great minds think alike. :) So, I will change the newsletter's name next week. If you have an opinion on what it should be named, please make your voice heard.
Table of Contents
1. Google I/O updates
2. Cool Projects
3. Interesting Papers
Google I/O updates
Google I/O 2023 was all about AI. Sundar Pichai, Google's CEO, and his co-presenters mentioned AI 143 times over the two-hour keynote. I listened to the whole thing; here are the topics and announcements I found most notable:
1. Bard
Google is removing the Bard waitlist and making it available in many countries worldwide. It's also going multi-modal with images. Adobe Firefly is now directly integrated (private beta?) into Bard for image generation based on a prompt.
Tools, extensions, and partners were mentioned. I assume tools/extensions will be similar to OpenAI's plugin concept. Bard can now annotate source code with citations.
With these announcements, Bard seems to achieve general feature parity for important aspects with ChatGPT. However, based on my interactions with both, Bard still has room for improvement in the quality of responses.
2. PaLM 2
PaLM 2, the next version of the PaLM foundation model, was announced during the keynote. It will come in different sizes (numbers of parameters). It is supposedly better at logic, reasoning, and multilingual text, and can reportedly generate, debug, and explain code in 20 programming languages. PaLM 2 will power about 25 products across Google's lineup.
3. Codey
Codey is a foundation model trained on Google's documentation and code, and fine-tuned on Google Cloud usage patterns. Based on the documentation, it is integrated into the developer environment and can generate and complete code, provide chat assistance on various GCP topics and best practices, and search across documents.
One aspect that caught my attention is that we can train Codey on custom code, and Google will keep that code private. I can't wait to try this out.
4. Imagen
Imagen, a text-to-image diffusion model similar to Lexica (my favorite), DreamStudio by stability.ai, and MidJourney, was announced.
5. Chirp
Chirp, a speech-to-text model, was announced.
6. Fine-tuned PaLM versions
Domain-specific, fine-tuned versions of PaLM were also announced, including Med-PaLM 2 for medical knowledge and Sec-PaLM for cybersecurity use cases.
7. MusicLM
MusicLM is a text-to-music model.
8. Gemini
Gemini is Google's next-generation foundation model, currently in training. A key feature of this model is its multi-modal capabilities.
9. Deep integration of Generative AI into products
Duet AI for Workspaces, Duet AI for Google Cloud, and Duet AI for AppSheet provide deep AI integration across Google's broader product ecosystem, including Google Slides, Docs, Vertex AI, and more. "Help me write" is a feature that lets users type a prompt and generate text within Gmail. Another example I appreciated was the ability to generate speaker notes for slides.
The demo by the Google Labs team of Generative AI integrated into Google Search looked like a hybrid of the current search experience and a chatbot.
10. Google Cloud
Vertex AI allows foundation models such as PaLM 2 to be fine-tuned on dedicated clusters, thereby guaranteeing the privacy of the data. Using Generative AI Studio, it looks like models can be fine-tuned on users' private data through a no-code interface and then deployed straight from the UI. I was impressed by this.
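For those who prefer code over the no-code UI, here is a minimal sketch of calling a PaLM 2 text model through the Vertex AI Python SDK. This assumes the preview SDK surface that shipped around I/O (module paths may have moved since), and "my-gcp-project" is a placeholder project ID, not a real one:

```python
# pip install google-cloud-aiplatform
# Minimal sketch: querying a PaLM 2 text model (text-bison) on Vertex AI.
import vertexai
from vertexai.preview.language_models import TextGenerationModel

# "my-gcp-project" is a placeholder; use your own project and region.
vertexai.init(project="my-gcp-project", location="us-central1")

model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    "Summarize the key announcements from Google I/O 2023 in three bullets.",
    temperature=0.2,        # low temperature for a focused, factual answer
    max_output_tokens=256,  # cap the length of the generated reply
)
print(response.text)
```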
The A3 GPU Supercomputer was announced, and the specs are mind-blowing: the announcement blog post claims 26 exaFlops of performance. A3 VMs pack 8 H100 GPUs with 3.6 TB/s of bisectional bandwidth between them. It looks like these VMs are meant for training, while the G2 VMs are meant for inference.
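To put the 26 exaFlops figure in perspective, here is a hedged back-of-envelope estimate. The per-GPU number below is an assumption based on NVIDIA's published H100 SXM peak at FP8 with sparsity; real sustained throughput depends heavily on precision and workload:

```python
# Back-of-envelope: how many H100s would the claimed 26 exaFlops imply?
# Assumption: ~3.96 PFLOPS per H100 SXM at FP8 with sparsity (NVIDIA's
# published peak) -- not a measured or Google-confirmed figure.
CLUSTER_FLOPS = 26e18       # 26 exaFlops, as claimed in the blog post
H100_PEAK_FLOPS = 3.96e15   # assumed per-GPU peak (FP8, sparse)
GPUS_PER_A3_VM = 8          # from the A3 announcement

gpus = CLUSTER_FLOPS / H100_PEAK_FLOPS
vms = gpus / GPUS_PER_A3_VM
print(f"~{gpus:,.0f} H100 GPUs, i.e. ~{vms:,.0f} A3 VMs")
# -> roughly 6,600 GPUs / 820 VMs under these assumptions
```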
11. Project Tailwind
Project Tailwind lets a Google user build a personalized AI model grounded in private documents stored on Google Drive!
12. Identifying synthetic content
Google is working towards identifying synthetic content. For example, Google Images will include metadata that indicates when an image first appeared. Google is also building watermarking into AI-generated images.
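Google hasn't published the exact mechanism, but conceptually, provenance metadata is just structured fields attached to the image file. Here is a toy illustration using Pillow's PNG text chunks; the field names are made up for the example and are not Google's schema:

```python
# Illustration only: attaching and reading provenance metadata on a PNG.
# The "first_seen" / "generator" keys are hypothetical, not Google's schema.
from PIL import Image
from PIL.PngImagePlugin import PngInfo

img = Image.new("RGB", (64, 64), "white")
meta = PngInfo()
meta.add_text("first_seen", "2023-05-10")
meta.add_text("generator", "example-image-model")
img.save("tagged.png", pnginfo=meta)

# Reading the metadata back from the saved file.
reopened = Image.open("tagged.png")
print(reopened.text)  # {'first_seen': '2023-05-10', 'generator': ...}
```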
13. Prompt-to-Wallpapers on Android
Android is getting Generative AI wallpapers that you can create from a text prompt using text-to-image diffusion models.
14. StudioBot
StudioBot, a coding assistant for Android developers built into Android Studio, was announced.
15. WebGPU
The latest version of Chrome ships with WebGPU built in. WebGPU speeds up in-browser workloads by giving AI libraries like TensorFlow.js direct access to the GPU, reportedly making them up to 100x faster. This could open up a whole new class of apps that run inference directly in the browser. We will have to wait and see how this plays out.
Cool Projects
Interesting Papers
"With the capability to use several modalities for input queries and retrieve outputs across other modalities, ImageBind shows new possibilities for creators. Imagine that someone could take a video recording of an ocean sunset and instantly add the perfect audio clip to enhance it, while an image of a brindle Shih Tzu could yield essays or depth models of similar dogs. Or when a model like Make-A-Video produces a video of a carnival, ImageBind can suggest background noise to accompany it, creating an immersive experience"
Reminder: please subscribe to the AI Matters newsletter (soon to be renamed) and share it with your network. Thank you!
Please let me know your thoughts on this edition in the comments section. Did you like it? Too much info in one article? Did I miss anything you encountered in the last week?
By the way, one notable development from yesterday was Sam Altman's Senate testimony. I haven't been able to listen to the full testimony yet, but if you are interested, you can find it here: https://www.youtube.com/watch?v=P_ACcQxJIsg