Google I/O 2024 Live: All the major announcements.
(Image credit: GDG REHOVOT)

It's time for Google I/O once again, and this year Sundar Pichai and others from Google's leadership team announced a variety of AI-powered innovations from the stage in Mountain View, California.

(Image credit: GDG REHOVOT)

Today's focus at Google I/O was mainly on software and artificial intelligence, with discussions about Google Gemini and its various apps, as well as upcoming features for Android. Unlike previous keynotes, there were no hardware announcements or teases. Many were expecting news about the Pixel 9 or Pixel Fold 2 series, but none of that was announced.

The Google developer community at GDG Rehovot had the opportunity to watch the main conference live at the Innovation Social Club. Below, I'll summarize all the major announcements from Google I/O.

GOOGLE'S BIGGEST I/O ANNOUNCEMENTS

  • Welcome to the Gemini era. Gemini Nano, Google’s on-device mobile large language model, is getting a boost. It’s now going to be called Gemini Nano with Multimodality, which Google CEO Sundar Pichai said onstage lets it “turn any input into any output.” That means it can pull information from text, photos, audio, web or social videos, and live video from your phone’s camera, then synthesize and summarize that input. Google showed a demo video in which someone scanned all the books on a shelf with a camera and recorded the titles in a database so they could be looked up later.
  • Project Astra: Google unveiled its vision of a future AI agent with Project Astra, which combines live video from a phone's camera with voice recognition to give contextual answers to your questions. One demo showed someone using Project Astra to solve a coding problem through the camera, after which it recalled where they had left their glasses earlier. Project Astra is built from several AI models layered on top of each other that can understand text, images, and video and retain what they have seen, so it can act as a kind of AI assistant. In a recorded demo, the team showed how it can describe its surroundings, remember where it last saw objects, accurately describe its current location, and do everything promised by products like the Humane AI Pin and Rabbit R1.
  • Google Workspace: Gemini is rolling out to even more popular Google services, such as Gmail, where it can summarize long email threads. There is also a smart reply feature that lets Gemini suggest more contextual replies after analyzing your email conversations.
  • More artificial intelligence in Android: While Google did not introduce or mention specific Android 15 features, the company did share how more artificial intelligence features are coming to Android. For example, there will be a wider rollout of Circle to Search. There's also an AI upgrade for TalkBack, Android's screen reader, an accessibility tool that will announce image descriptions for people who are blind or have low vision.
  • Google Search: Search with Google gets a huge boost with new Gemini features like faster answers with AI Overview, creating a travel itinerary, and the ability to use video to solve problems.
  • Google Veo: Using generative AI, Google Veo can create realistic and detailed 1080p videos based on your request. Meanwhile, Imagen 3 can create images based on text prompts.
  • Gemini 1.5 Flash is a smaller, faster sibling of Gemini Pro: As if it wasn't already confusing which Gemini is which, Google is also introducing a new model called Gemini 1.5 Flash. It's smaller than Pro, but Demis Hassabis said it should be faster, with lower latency for tasks that need quick responses. It is now available in Google AI Studio and supports up to 1 million tokens. Developers can sign up to get access to a 2 million token context.

(Image credit: GDG REHOVOT)

  • Imagen 3 is Google's latest artificial intelligence image generator

(Image credit: GDG REHOVOT)

An update to Google's previous AI image generators, Imagen 3 should be able to create more detailed, "photorealistic" images. The company claimed that more detailed prompts would yield better results and that it was the company's "best model yet" for rendering text. Users can sign up to use it for free in ImageFX via labs.google, and it will be available to developers via Vertex AI.

Let's dive in - What's new?

Search is getting new ways to make AI Overviews work better for you, the ability to use AI in Search to plan meals and trips, and even the option to use video to answer questions about the world around you. Let's break down these six big Search features, all of which you can sign up for now (provided you live in the US), or wait until the end of the year for release.

1. Quick answers with AI Overviews

You may have seen AI Overviews being tested by members of Search Labs. The feature is rolling out to everyone in the US today, with more countries set to get it soon.

With AI Overviews, the model generates an answer to your question from a variety of source sites, creating a quick overview of the topic with links back to the originals for more information.

Google claims that people who use AI Overviews search with Google more. It will be interesting to see how this affects whether users decide to visit the source sites.

AI Answers Take Over Search

Google is making the biggest changes to its search engine since it launched its core product more than 20 years ago. Now, instead of displaying links to other sites or sections of those sites at the top of search results, the company will use artificial intelligence to summarize sites and provide multi-paragraph answers to search queries.

2. Customizable AI Overviews

(Image credit: GDG REHOVOT)

3. Welcome to the Age of Gemini

One big thing coming out of Google I/O is that Gemini is getting serious upgrades to its multi-step reasoning: the ability to break down complex questions and answer each part.

This also extends to Google Search AI Overviews, where you can ask a multi-step question and get all the answers in one go, instead of having to do many individual searches.

This flexibility in how the Gemini model presents information should make these overviews more accessible, giving experts more detail while giving newcomers a simpler summary to catch up with.

4. Planning using search

(Image credit: GDG REHOVOT)

If you've used AI recently, you'll know that one of its superpowers is helping with planning. With Search's AI Overviews, you can give the same kind of instructions, and Google will gather information from the web to help you make a plan. Whether it's a trip to Berlin that requires minimal walking or an easy-to-prepare meal plan, Search can create and customize these based on your follow-up instructions. Once you're finished, you can export the plan to Google Docs or Gmail.


5. Brainstorm with Search

(Image credit: GDG REHOVOT)

If you don't have anything specific to look for and just want inspiration, the AI-organized results page is a great way to go. Here, artificial intelligence largely takes over the job of ranking pages and groups them into AI-generated categories.

For example, if you're looking for a restaurant for an anniversary dinner, Gemini will share several answers from different angles with helpful clusters.

6. Google search is becoming multimodal

(Image credit: GDG REHOVOT)

Google demonstrated an impressive new search feature that lets users search with a recorded video and get answers back.

In this instance, a Googler wanted to figure out how to use a record player. She recorded a video of the unit while asking her question aloud, then sent it. Google ran a search and returned an answer in text form, which could also be read aloud. This is a new way to search, akin to Google Lens for video, and it is distinct from the upcoming Project Astra assistant because it requires recording before searching rather than working in real time.

Still, it is part of a two-pronged approach to integrating AI into Google Search: enhancing user engagement and streamlining access to information. In another search demo, Google showcased a feature for finding recipes and food. It lets users enter natural language queries and receive recipe suggestions or restaurant recommendations directly on the search results page.

Google is making significant strides in utilizing generative AI for search, focusing on enhancing search results and accessibility.

Comparable to Google Lens, but with greater speed, Gemini's video understanding updates will empower users to swiftly find answers to inquiries about any visual content. In a demonstration featuring the repair of a record player, Google showcased Gemini's ability to analyze the video, break it down into individual frames, and extract pertinent information from rating sites to deliver a solution.

7. Meet a new AI video tool, Veo

(Image credit: GDG REHOVOT) Demis Hassabis, head of Google's artificial intelligence department, introduces Veo.

Artificial intelligence companies, including Google, aim to revolutionize the way people create images, audio, and movies. At I/O, Google announced a new AI tool for video creation called Veo, which aims to compete with the likes of OpenAI. Veo produces high-definition videos that can be over a minute long, a threshold Google had not reached before.

Before the keynote speeches, DJ Marc Rebillet tried to warm up the crowd by making beats with Google's AI tool. Rebillet jumped on stage and shouted "Google" over and over again. Google said it is working with creators including Rebillet, musician Wyclef Jean, and actor and producer Donald Glover on AI creations.

We've been admiring the creations of OpenAI's Sora text-to-video tool for the past few months, and now Google is joining the generative video party with its new tool called Veo. Like Sora, Veo can create minute-long 1080p videos, all from a simple prompt.

The prompt can include cinematic effects, such as a request for time-lapse or aerial shots, and the early samples look impressive. You don't have to start from scratch either: upload an input video along with a prompt, and Veo can edit the clip to suit your request. There is also an option to add masks and change specific parts of the video.

Google also introduced a new AI tool for creating images called Imagen 3, designed to compete with OpenAI's DALL-E 3. The technology allows people to create realistic-looking images from text instructions.

8. Your child's homework will be much easier with NotebookLM.

(Image credit: GDG REHOVOT)

Parents often struggle to assist their children with homework, especially if it's been a while since they last studied the material. However, Google has upgraded its NotebookLM note-taking app, which could make this task much easier.

NotebookLM now has access to Gemini 1.5 Pro. Following the demonstration at I/O 2024, it is poised to become a more effective educational tool. The demonstration, presented by Google's Josh Woodward, showed how to create a comprehensive learning guide by adding notes on a chosen topic, such as science, to the notebook. Notably, this process generates supplementary components, including quizzes and FAQs, derived directly from the source material.

Impressive, but it was about to get even better. A new feature, still a prototype for now, could output all content as audio, essentially creating a podcast-style discussion. Moreover, the audio featured more than one speaker, engaging in a natural chat about the topic, which would be more helpful than a frustrated parent trying to play the role of a teacher.

Woodward even managed to interrupt and ask a question, in this case "Give us a basketball example." At this point the AI changed direction and brought in clever basketball metaphors for the subject, keeping it in an accessible context. The parents on the TechRadar team are eager to try it out.

Google has added AI features to its suite of productivity apps, including Gmail, Docs, Drive and Sheets, over the past year. At I/O, the company announced some new changes, allowing users to summarize groups of emails from the same sender, add details from a Google Doc to an email, or integrate content from a spreadsheet into a Slides presentation.

The company will also begin allowing people to ask Google's AI to find specific details in a document and add them to an email. Google's "Help me write" feature, which generates text from scratch, will soon be available in Spanish and Portuguese as well.

Google showed how its Gemini AI tool can also be used to teach children about new concepts, asking it to explain the physics behind how a basketball rolls and bounces.


9. Android as a tool to catch fraudulent calls

Google owns the Android smartphone operating system, which runs on the majority of phones worldwide. The company is working to enhance Android's appeal compared to Apple's iOS by integrating more AI into the operating system itself. One standout feature, Circle to Search, lets users instantly retrieve search results by circling any item they have a question about or want more information on. Users can also ask Gemini to create images for text messages.

The Gemini engine can also help users get information from videos and PDFs. As they watch a video, for example, they can ask a specific question about something that happened in it. When they ask a question about a PDF file, it will direct users to the part of the PDF where they found the answer.

Scam calls have become an even bigger problem as AI voice generators allow scammers to impersonate real people. Android has introduced a feature that will listen to and interrupt calls by alerting the user if it thinks the call is coming from a scammer, such as if the caller asks for bank account details.

10. The Google Photos function has received a helpful AI boost from Gemini

(Image credit: GDG REHOVOT)

Have you ever wanted to effortlessly find a specific photo from your distant past? Whether it's a cherished note, an adorable puppy picture, or even your license plate, Google is turning this wish into a reality with a significant update to Google Photos that seamlessly integrates it with Gemini. This provides unfettered access to your entire photo library, streamlines the search process, and swiftly delivers the results you seek.

Google has called this feature "Ask Photos," and will roll it out to all users in the "coming weeks." And it will almost certainly come in handy, and make people who don't use Google Photos a little jealous.

11. Android got a major Gemini infusion

(Image credit: GDG REHOVOT)

Gemini is now integrated into Android's core, allowing it to watch, read, and understand what's on your phone's screen. This enables it to anticipate questions about whatever you're looking at, similar to Google's Circle to Search feature.

So it can pick up the context of a video you're watching, anticipate a request for a summary when you view a long PDF file, or be ready for a myriad of questions about the app you're in. This is not a bad thing by any means and can be extremely useful.

Alongside Gemini integration at the system level, Gemini Nano with Multimodality will launch later this year on Pixel devices. What will it enable? It should speed things up, but the standout feature, for now, is that Gemini can listen to calls and alert you in real time if a call looks like a scam. It's pretty cool and builds on call filtering, a longstanding feature of Pixel phones. It is expected to be faster and to process more on the device instead of sending data to the cloud.

12. Get ready: Google Workspace will be much smarter

(Image credit: GDG REHOVOT)

Workspace users are getting a wealth of Gemini integrations and useful features that can significantly enhance their daily experience.

In Gmail, a new side panel lets users ask Gemini to summarize recent conversations with a colleague. The resulting summary highlights the most important points.

Gemini in Google Meet can give you the highlights of a meeting or tell you what other people in the conversation are asking. You will no longer need to take notes during the meeting, which is helpful if it's a long one.

Within Google Sheets, Gemini can help make sense of the data and process requests like pulling a specific amount or data set.

The virtual teammate "Chip" may be the most futuristic example. It can live in Google Chat and be called on for various tasks or queries. While these tools will make their way to Workspace, likely through Labs first, the question that remains is when they will arrive in regular Gmail and Drive clients.

Given Google's AI-for-everyone approach and pushing it so hard in search, it's probably only a matter of time.

13. Music AI Sandbox

(Image credit: GDG REHOVOT)

Google's own AI music generator is now part of a full Music AI Sandbox aimed at "pros" who make music. Google claimed it can create musical pieces and transfer styles between tracks.

14. "Gems" - Gemini's special AI app that can write your fanfic for you

Gemini integrates with almost every Google app currently available. Still, the Gemini app itself is getting more capabilities, with "Live," a speech-based conversation feature, coming this summer. Project Astra integration will enable the AI to understand video sometime later this year.

Finally, there are also so-called "Gems." These are custom AIs that target a specific need. Google said all you have to do is write a description of what you want the Gem to do and give it a name. You can also give the Gem access to some of your apps, like Gmail or Google Drive. Gems should be coming to Gemini Advanced subscribers sometime this summer.


(Image credit: GDG REHOVOT)

Google has shared details about a new AI-powered "Teammate," an AI agent that is given a specific job and is accessible to everyone on a team. In the example the company gave, it was directed to sniff out problems and track progress on group projects.

Google Chat users can ask the AI teammate about various projects within the group. There's no word on when this might arrive, as work is likely still needed to roll it out and to allow third-party AI models to access the teammate system.

16. Now Circle to Search will offer step-by-step answers for students

Circle to Search has been out for a while on some of the latest Google Pixel and Samsung Galaxy phones, and now it's getting even more powerful with a few more AI-based improvements.

(Image credit: GDG REHOVOT)

The feature typically lets you circle or swipe over an image or text to perform a Google Lens-like search for that content. For those currently in the Search Labs beta, when you circle a math or physics problem, the device will offer a step-by-step guide to solving it. It won't offer the answer outright, though Google has said it's working on future updates that use its LearnLM model to directly answer questions beyond basic arithmetic, such as questions based on formulas or graphs.


(Image credit: GDG REHOVOT)

Google's Gemini AI has already proven controversial among some dedicated Pixel users, but Mountain View isn't taking any steps back from its big AI ambitions. Soon, Gemini will get more prominence on your Android phone. Google has announced plans to create a Gemini overlay that can hover over any of your phone's existing apps.

This will allow users to drag or copy AI-generated images or text into other applications, such as email or a text messaging app. The overlay will also work with YouTube through the "Ask this video" function to get a text summary of the clip. This feature already exists as a beta function for YouTube Premium subscribers, but now users should eventually be able to launch it from the Gemini overlay. Gemini will also be able to summarize PDF files using this overlay.

Gemini will get even better at understanding context to assist you in getting things done

Gemini on Android is a new kind of assistant that uses generative AI to help you be more creative and productive. This experience, which is integrated into Android, is getting even better at understanding the context of what's on your screen and what app you're using.

Soon, you'll be able to bring up Gemini's overlay on top of the app you're in to easily use Gemini in more ways. For example, you can drag and drop generated images into Gmail, Google Messages and other places, or tap "Ask this video" to find specific information in a YouTube video. If you have Gemini Advanced, you'll also have the option to "Ask this PDF" to quickly get answers without having to scroll through multiple pages. This update will roll out to hundreds of millions of devices over the next few months.

And we'll continue to improve Gemini to give you more dynamic suggestions related to what's on your screen.

19. Gemini Nano is going to get vision and voice understanding

Google’s smallest version of its AI, Gemini Nano, should be getting more robust this year. Now, the smallest AI will have “multimodal” capabilities to comprehend and describe images via TalkBack. Google said it was testing additional features that would let it judge whether a phone call is a scam or not by judging whether the caller is using typical language associated with grifters. Nano is available on the recently-released Pixel 8a, so even the more mid-ranged devices should have access to some of these new AI features.

Receive alerts for suspected scams during phone calls

According to a recent report, in 12 months, people lost more than $1 trillion to fraud. We’re testing a new feature that uses Gemini Nano to provide real-time alerts during a call if it detects conversation patterns commonly associated with scams. For example, you would receive an alert if a “bank representative” asks you to urgently transfer funds, make a payment with a gift card, or share personal information like card PINs or passwords, which are uncommon bank requests. This protection all happens on-device, so your conversation stays private to you. We’ll share more about this opt-in feature later this year.
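
To make the idea of "conversation patterns commonly associated with scams" concrete, here is a toy sketch in Python. It is not how Gemini Nano works: Google describes an on-device language model, while this example only matches a few hand-written regular expressions against a call transcript. The pattern names, the `flag_scam_signals` and `should_alert` functions, and the alert threshold are all hypothetical, chosen to mirror the three examples mentioned above (urgent transfers, gift card payments, and requests for PINs or passwords).

```python
import re

# Hypothetical patterns, loosely based on the examples in the text above.
# A real system would rely on a language model, not a keyword list.
SCAM_PATTERNS = {
    "urgent transfer": re.compile(
        r"\b(urgent(ly)?|immediately|right now)\b.*\btransfer\b", re.I
    ),
    "gift card payment": re.compile(r"\bgift cards?\b", re.I),
    "credential request": re.compile(r"\b(pin|password|passcode)\b", re.I),
}


def flag_scam_signals(transcript: str) -> list[str]:
    """Return the names of all patterns found in the transcript."""
    return [
        name
        for name, pattern in SCAM_PATTERNS.items()
        if pattern.search(transcript)
    ]


def should_alert(transcript: str, threshold: int = 1) -> bool:
    """Alert when at least `threshold` distinct signals are present."""
    return len(flag_scam_signals(transcript)) >= threshold


if __name__ == "__main__":
    call = "This is your bank. You must urgently transfer funds and read me your card PIN."
    print(flag_scam_signals(call))
    print(should_alert(call))
```

Everything here runs locally on the text it is given, which matches the on-device spirit of the announcement, but simple keyword rules like these would miss rephrased requests and flag innocent ones, which is presumably why Google is using a language model instead.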

20. Get started on Android with TalkBack

TalkBack is the Google screen reader included on Android devices. TalkBack gives you eyes-free control of your device.

Later this year, Gemini Nano’s multimodal capabilities are coming to TalkBack, helping people who experience blindness or low vision get richer and clearer descriptions of what’s happening in an image. On average, TalkBack users come across 90 unlabeled images per day. This update will help fill in missing information — whether it’s more details about what’s in a photo that family or friends sent or the style and cut of clothes when shopping online. Since Gemini Nano is on-device, these descriptions happen quickly and even work when there's no network connection.

21. LearnLM

LearnLM is Google's new family of models fine-tuned for learning and grounded in educational research to make teaching and learning experiences more active, personal and engaging.

Generative AI is fundamentally changing how we’re approaching learning and education, enabling powerful new ways to support educators and learners. It’s taking curiosity and understanding to the next level — and we’re just at the beginning of how it can help us reimagine learning.

Building a new family of models for learning

Today we’re introducing LearnLM: our new family of models fine-tuned for learning, based on Gemini.

Grounded in educational research and tailored to how people learn, LearnLM represents an effort across Google DeepMind, Google Research and our product teams to help make learning experiences more engaging, personal and useful. Our technical report presents our approach to improving generative AI for education and highlights how we’re working together with the AI and EdTech communities to responsibly maximize its positive impact and potential.

Working alongside educators and other learning experts, we’re infusing learning science principles, like the following, into our models and the products they power:

  • Inspire active learning: Allow for practice and healthy struggle with timely feedback
  • Manage cognitive load: Present relevant, well-structured information in multiple modalities
  • Adapt to the learner: Dynamically adjust to goals and needs, grounding in relevant materials
  • Stimulate curiosity: Inspire engagement to provide motivation through the learning journey
  • Deepen metacognition: Plan, monitor and help the learner reflect on progress

Bringing LearnLM to products you already love

With LearnLM we’re enhancing learning experiences in products you already use today — like Search, YouTube and when chatting with Gemini — so they can help you deepen understanding, rather than just giving an answer. Here are a few examples:

  • In Google Search, soon you’ll be able to make sense of complex topics by tapping a button to adjust your AI Overview into the format that’s most useful for you — whether you want to simplify the language, or break it down.
  • On Android, Circle to Search can help people get unstuck on math and physics word problems directly from their phones and tablets. Later this year, you’ll be able to solve even more complex problems involving symbolic formulas, diagrams, graphs and more.
  • When chatting with Gemini, soon you’ll be able to use Gems, custom versions of Gemini that can act as personal experts on any topic. Learning coach, one of the pre-made Gems, can support you in building knowledge by providing step-by-step study guidance, along with helpful practice activities like quizzes and games. Learning coach in Gemini will launch in the coming months, and with Gemini Advanced, you’ll be able to further customize this Gem to suit your unique learning preferences.
  • On YouTube, a conversational AI tool makes it possible to figuratively “raise your hand” while watching academic videos to ask clarifying questions, get helpful explanations or take a quiz on what you’ve been learning. This even works with longer educational videos like lectures or seminars thanks to the Gemini model’s long-context capabilities. These features are already rolling out to select Android users in the U.S.

Applying LearnLM to build generative AI experiences for schools

We’ll also apply LearnLM to inform and enable the generative AI experiences that we build for schools. Through a new pilot program in Google Classroom, we’re working directly with educators to see how we can help simplify and improve the process of lesson planning — a critical, but time-consuming component of teaching. These features will help teachers discover new ideas and unique activities, find engaging materials, and differentiate their lessons and content to meet each of their students where they are. No technology can ever replace the magic of a teacher, but when applied in deliberate and thoughtful ways, AI can help to augment their capacity — giving them time back to invest in themselves and their students.

Introducing two new experimental tools to advance learning

Beyond LearnLM and our existing products, we’re also building entirely new tools and experiences that expand learning:

  • Illuminate is a new experiment that breaks down research papers into short audio conversations. In minutes, it can generate audio with two AI-generated voices in conversation, providing an overview of key insights from these complex papers. And soon, you’ll be able to ask follow-up questions. Visit Labs.google to check out a library of available audio conversations and join the waitlist to generate your own.



