Google's Gemini Launch and Amazon's Q generative assistant
Google's Gemini Launched

There were reports that the long-awaited Google Gemini model might be delayed until next year, but on December 5th, in a virtual briefing and a series of blog posts and videos from the Google DeepMind team, Google introduced Gemini 1.0.

Gemini comes in a suite of three AI model sizes and is natively multimodal: audio, text, images, and video were all part of the model's training data and can all be used as inputs to Gemini models.

The AI models in Gemini are:

Gemini Ultra: The flagship model, designed for highly complex tasks.

Gemini Pro: The mid-range model, which will power Bard.

Gemini Nano: The smallest and most efficient model, designed to run on edge devices such as Google’s Pixel 8 Pro. Gemini Nano comes in two sizes, Nano-1 (1.8B parameters) and Nano-2 (3.25B parameters), for low- and high-memory devices respectively.
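As a back-of-the-envelope illustration (our arithmetic, not a figure from the announcement), those parameter counts translate into rough on-device weight footprints, assuming 4-bit quantized weights (0.5 bytes per parameter):

```python
# Rough memory-footprint estimate for the two Gemini Nano sizes.
# Assumes 4-bit weight quantization (0.5 bytes/parameter), which is our
# deployment assumption for this sketch, not a confirmed figure.

def weight_memory_gb(params_billions: float, bytes_per_param: float = 0.5) -> float:
    """Approximate GB needed just to hold the quantized model weights."""
    # billions of parameters x bytes per parameter = gigabytes
    return params_billions * bytes_per_param

nano_1_gb = weight_memory_gb(1.8)   # Nano-1: about 0.9 GB of weights
nano_2_gb = weight_memory_gb(3.25)  # Nano-2: about 1.6 GB of weights
```

Under that assumption, both models fit comfortably in the memory budget of a modern flagship phone, which is what makes on-device deployment plausible.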

Overview of the Gemini 1.0 model family: Gemini Ultra, Pro, and Nano

In a series of videos, Google DeepMind showed off some of Gemini's many capabilities:

  • A demo of multimodal reasoning, responding to a prompt with text, images, and a custom AI-generated UI built on the fly.
  • A demo of processing and understanding raw audio, combining audio understanding, reasoning, and translation.
  • A demo of unlocking insights in scientific literature, in which Gemini searched through 200,000 papers “over a lunch break” to summarize them and glean insights.
  • A demo of Gemini's coding ability, now powering AlphaCode 2.
  • A demo of multimodal Gemini guessing movies from pictures, a kind of pictionary test.

“AI is a profound platform shift, bigger than web or mobile. … I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it. AI has the potential to create opportunities — from the everyday to the extraordinary — for people everywhere. It will bring new waves of innovation and economic progress and drive knowledge, learning, creativity and productivity on a scale we haven’t seen before.”

- Sundar Pichai

How Gemini was trained

Infrastructure: Gemini Ultra was trained on a large fleet of TPUv4 accelerators, deployed in “SuperPods” of 4096 chips each across multiple datacenters, a scale-up from PaLM 2's training. Gemini Pro, smaller than Ultra, completed pretraining in a matter of weeks using a fraction of Ultra's resources.

Dataset: The Gemini training dataset was multimodal and multilingual, including web documents, books, code, images, video, and audio. The team followed Chinchilla scaling for the large models, while the Nano models used distillation techniques and were trained on more tokens per parameter.
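The Chinchilla scaling the report refers to reduces to a simple rule of thumb: compute-optimal training uses roughly 20 training tokens per model parameter. A minimal sketch (the model sizes below are illustrative, not Google's actual numbers):

```python
# Rough compute-optimal token-to-parameter ratio from the Chinchilla paper.
CHINCHILLA_TOKENS_PER_PARAM = 20

def compute_optimal_tokens(num_params: float) -> float:
    """Approximate compute-optimal training-token budget for a given model size."""
    return CHINCHILLA_TOKENS_PER_PARAM * num_params

# A hypothetical 100B-parameter model would want roughly 2 trillion tokens.
# A small on-device model trained "on more tokens per parameter" deliberately
# exceeds this ratio, trading extra training compute for quality at small size.
tokens_100b = compute_optimal_tokens(100e9)
```

This also explains the Nano note: past the compute-optimal point, extra tokens still improve a small model, and inference cost, not training cost, dominates for edge deployment.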

Regarding AI safety, Google states that “Gemini has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity.” As with other technical details, the AI safety section is a bit vague, and since red teaming of Gemini Ultra is still ongoing, there may be more to say later.

Google Gemini Technical Report

Google DeepMind released a technical report accompanying the announcement, titled “Gemini: A Family of Highly Capable Multimodal Models.” At its heart, Gemini is a transformer-architecture AI model that can take in text, audio, video, and images, and output interleaved images and text. The report mentions that Gemini drew inspiration from Flamingo, CoCa, and PaLI, “with the important distinction that the models are multi-modal from the beginning.”

They also note that “Video understanding is accomplished by encoding the video as a sequence of frames in the large context window. Video frames or images can be interleaved naturally with text or audio as part of the model input.”
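The interleaving the report describes can be pictured as a single flat sequence of mixed-modality chunks. The sketch below is our own illustrative construction, not Google's API; the chunk types and function names are invented for clarity:

```python
# Illustrative sketch of interleaving video frames and text into one input
# sequence for a natively multimodal transformer's context window.
# All names here are our own invention, not Gemini's actual interface.

from dataclasses import dataclass
from typing import List, Union

@dataclass
class TextChunk:
    text: str

@dataclass
class ImageFrame:
    frame_index: int  # position of this frame within the source video

@dataclass
class AudioChunk:
    start_ms: int
    end_ms: int

ModalityChunk = Union[TextChunk, ImageFrame, AudioChunk]

def interleave_video_prompt(question: str, num_frames: int) -> List[ModalityChunk]:
    """Encode a video as a sequence of frames, then append the text question."""
    sequence: List[ModalityChunk] = [ImageFrame(i) for i in range(num_frames)]
    sequence.append(TextChunk(question))
    return sequence

prompt = interleave_video_prompt("What happens in this clip?", num_frames=8)
# The model would then attend over all nine chunks jointly.
```

The point is that nothing structurally separates the modalities: frames, audio spans, and text occupy the same sequence, so attention can relate them directly.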

The power of multimodality runs through a number of the use cases they demonstrated. For example, the first example they show checks and corrects a student's answer to a physics problem by reading it natively from an image.

Gemini reviews and corrects a student physics problem answer

How does Google Gemini Ultra compare with GPT-4?

According to the benchmark tables in the technical report, Gemini Ultra exceeds GPT-4 on 30 of 32 widely used academic benchmarks, including a 90.0% score on MMLU, while the mid-sized Gemini Pro is positioned against GPT-3.5.

Gemini Ultra vs GPT-4
Gemini Pro vs GPT-3.5


Amazon launches Q generative assistant

Amazon has introduced an AI-powered chatbot tailored for businesses, Amazon Q. The tool is aimed at changing how enterprises interact with their data, create content, and streamline their operations.

Amazon Q Interface
Amazon Q: Code explainability

Amazon Q is more than just a chatbot; it's a comprehensive AI assistant designed to understand and cater to the specific requirements of a business. Trained on over 17 years of Amazon Web Services (AWS) insights, Q is equipped to offer personalized assistance in a range of business environments, and it signals a pivotal shift in enterprise AI adoption.

Key Features and Highlights:

  • Integration and Learning: Amazon Q seamlessly integrates with a multitude of applications, learning and adapting to your business's unique structure, jargon, and product specifics.
  • Content Generation and Task Execution: Beyond answering queries, Q excels in creating compelling content like blog posts and executing tasks through its plugin capabilities.
  • Troubleshooting and Code Assistance: Q is not just about handling routine queries. It offers robust troubleshooting for network issues and integrates with CodeWhisperer for code-related tasks, a significant aid for development teams.
  • Cost-Effective Solution: Priced at $20 per user per month, Amazon Q aims to enhance business operations without breaking the bank. Its compatibility with major applications like Salesforce, Zendesk, and Gmail ensures smooth integration into existing systems.
  • AWS-Nvidia Partnership Expansion: AWS will deploy Nvidia's GH200 chips, making AWS the first cloud provider to use them and giving it an early advantage in the cloud computing market.
  • Quantum Computing Advances: AWS is developing a new chip aimed at solving key quantum computing problems, and the team is hopeful it could significantly advance the usability of quantum computing.
  • Enhancements to Amazon Transcribe: AWS has upgraded its transcription platform with AI; now supporting transcription in 100 languages and offering new AI capabilities, such as enhancing speech-to-text functions for AWS Cloud applications.
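As a concrete illustration of the Transcribe item above, the real `StartTranscriptionJob` API accepts a language auto-detection flag, which is what makes broad multilingual support usable without knowing the spoken language in advance. The sketch builds the request parameters (the job name and S3 URI are placeholders; the dict could be passed to boto3's `transcribe` client):

```python
# Hedged sketch: parameter dict for Amazon Transcribe's StartTranscriptionJob.
# The bucket, object key, and job name below are placeholder values.

def build_transcription_request(job_name: str, media_uri: str) -> dict:
    """Request parameters for StartTranscriptionJob with language auto-detection."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "IdentifyLanguage": True,  # let the service detect the spoken language
    }

request = build_transcription_request(
    "weekly-sync-2023-12-05",
    "s3://example-bucket/meetings/weekly-sync.mp4",
)
```

In practice this dict would be unpacked into `boto3.client("transcribe").start_transcription_job(**request)`, which requires AWS credentials and is omitted here.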

These Amazon moves underscore a trend in AI toward optimization and efficiency. Many businesses are finding that, for most purposes, they don't need or want the most capable models, like GPT-4 or Claude 2.1; smaller, more efficient models can be fenced in for specific business purposes, reducing hallucinations, increasing speed, and lowering costs.

Amazon Q severe hallucinations and confidential data leaks

In light of recent reports, it's essential to address some concerns surrounding Amazon's AI chatbot, Amazon Q. According to a report by Platformer, internal documents have raised alarms about the chatbot "experiencing severe hallucinations and leaking confidential data." This alarming revelation includes the potential disclosure of the locations of AWS data centers, internal discount programs, and unreleased features. The situation was classified as a "sev 2" incident, indicating its severity and urgency, requiring engineers to work extensively to resolve the issue.

While Amazon has maintained that no security breach occurred and emphasized Q's focus on security and privacy, internal communications suggest that the Q team is grappling with challenges related to digital sovereignty and other critical issues. This development comes as a surprise, especially considering Amazon's traditionally tight-lipped approach to the locations of its extensive data center network. As of December 2022, AWS reportedly owns and leases a combined total of over 33 million square feet of data center space globally. This incident underscores the need for ongoing vigilance and refinement in the realm of AI-driven enterprise solutions, particularly when handling sensitive information.


