Google I/O, Google's worldwide developer conference with 6,500+ attendees, happened about a week ago. They made more than 100 announcements, and the internet's reaction was mostly: *yawn*, do we really care?
There were definitely a lot of "me too!" announcements that are just catching up to the other players:
- Gems = Custom GPT
- Veo = Sora
- Google Search AI Overview = Bing Chat = Perplexity
- Gemini in Workspace = MS Office Copilot
- Gemini Live = ChatGPT Voice Mode
- Gemini Advanced with data analysis = ChatGPT Pro Data Analysis mode
- ImageFX = Adobe Generative AI Features
- Lyria = Udio = Suno
But before we dismiss the whole event, here are a few things to care about, wonder about, and maybe even hope for.
- Infinite Wonderland, an interactive experience that endlessly generates visuals from the text of the novel "Alice's Adventures in Wonderland," using models specifically tuned to different artists' styles. It's a beautiful way to bring novels to life for readers, and I think it will bring more readers back to fantastic novels.
- The Music AI Sandbox, which we don't have access to yet, but it looks like where we want music and song generation to go: being able to generate and mix different components of a song to create something new and unexpected. DJ mode in MusicFX is a first step, and while I'm not impressed yet, giving users more control and more creative knobs is the right direction.
- Teammate, a virtual teammate with its own Google account that answers questions, takes on tasks, writes documents, and completes goals. If it actually works, you can create any team member you want to help keep track of things. I think the project manager is the first virtual team member you'll see proliferate. It's a prototype shown in the Google Workspace keynote, and there's no timeline on a release.
- Circle to Search, an actually useful phone feature that might make me consider switching. You can circle any image in any app you're looking at and get search results. Now if I could just get automatic organization of my photos and videos by their content, and be able to search it, that would be great.
- Context caching in the Gemini API. OK, it's strictly for developers, but this is meaningful. Imagine the use case where I load in a code base and keep asking questions about it as I'm trying to decide how to implement my next set of features, or plan my roadmap based on dev effort.
- Illuminate turns academic papers into a two-way conversation. Yes, I'm nerding out, but it's essentially a podcast-style summary of research papers, and a much more consumable format.
- LearnLM, an education-tuned language model. I'm fascinated by this because of my last 10+ years in EdTech: you can watch the top teachers inspire learning, curiosity, and excitement in students through simple dialogue, while others shut them down. LLMs can easily generate dialogue, but can they do it with the pedagogy behind it that the top teachers bring? I'm ever hopeful, because for me this is by far the best educational tool invented.
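That context-caching item is worth a quick sketch. This is a toy illustration of the idea, not the real Gemini SDK (the `ContextCache` class and its methods are made up for this example): you pay to process the big context, like a whole codebase, once, and every follow-up question only adds its own tokens.

```python
import hashlib

# Toy sketch of context caching (NOT the real Gemini SDK):
# process a large context once, keep a handle to it, and reuse
# it across many follow-up questions.

class ContextCache:
    def __init__(self):
        self._store = {}                  # cache_id -> stored context
        self.context_tokens_processed = 0  # stand-in for billed tokens

    def create(self, context: str) -> str:
        """Process the big context once; return a handle to the cache."""
        cache_id = hashlib.sha256(context.encode()).hexdigest()[:12]
        if cache_id not in self._store:
            self._store[cache_id] = context
            # Whitespace split as a crude stand-in for tokenization.
            self.context_tokens_processed += len(context.split())
        return cache_id

    def ask(self, cache_id: str, question: str) -> int:
        """Ask against a cached context; only the question's tokens
        are newly processed."""
        assert cache_id in self._store
        return len(question.split())

cache = ContextCache()
codebase = "def handler(request): ... " * 1000  # pretend large repo
cid = cache.create(codebase)

new_tokens = sum(
    cache.ask(cid, q)
    for q in ["Where is auth handled?",
              "What would adding rate limiting touch?",
              "Estimate effort for the next feature."]
)
# The large context was processed once, not once per question.
```

The point of the design: the expensive part (the 3,000-"token" codebase) is charged once at `create`, and the three questions together only add 16 new tokens, which is exactly the economics that make repeated Q&A over one codebase practical.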
Yeah, I was hoping that to-care-about list would be longer too, but here’s my I-don’t-really-care list from the big announcements.
- 2M token context window. Honestly, I have PTSD: the 1M context window dropped on my birthday and errored out on me for a week. Now that it's finally stable and accessible via the API, I'm rate limited to 5 calls a minute, which makes it unusable. So 2M doesn't matter, because I can't get meaningful enough access to find a high-enough-value use case, and I don't need to ask videos questions.
- Gemini Flash. Fast would have been meaningful, BUT GPT-4o beat them to the punch. Ouch. And I have access to GPT-4o, and it's already live. So yeah, I'm sorry about that one, Google; you should've seen GPT-4o coming.
- Project Astra, the multimodal virtual assistant on your phone. It just wasn't a compelling vision. Watch the demo and tell me if you're impressed. I don't get it: how does this make my life better when I'm holding a phone in front of me while walking around my office? My vision would be this mounted in my car while I'm driving, so I can ask, "What on earth happened there?!"
- Asking questions of YouTube videos. Umm, OK, why would I do that? I'm watching the video for the experience, not because I'm forced to. And if I had questions while watching, there are already any number of ways to search.
- PaliGemma and Gemma 2, more open models. Yeah... *yawn*. There are already more open-source models than proprietary ones, and this didn't unlock anything that wasn't already possible. I'm still waiting for an open model that can run on-device, on a normal device most people have, and that's easily tunable for high performance on specific tasks.
- The SynthID watermark for AI-generated content is expanding from images to text and video. I should care, but to really be meaningful this needs to become a standard, and a way to trace source-data provenance and give credit to what the AI-generated content was trained and activated on.
Wow, I didn't expect the summary to be that short either. Maybe I'm a bit jaded from tracking all the AI changes; very few really meet the excitement bar anymore. But at least my I-care list was one item longer than my I-don't-really-care list, so I'm still hopeful for Google.