Multimodal Race Begins

The epic battle of multimodal LLMs has only begun. The race is not limited to Google and OpenAI; Meta and Stability AI are also making big moves towards it.

So far, Stability AI has released three models – namely StableLM, Stable Diffusion and Stable Audio – and it is only a matter of time until it integrates everything into one powerful model. In April, Stability AI, alongside its multimodal AI research lab DeepFloyd, released DeepFloyd IF, a text-to-image cascaded pixel diffusion model.

Meta is likely to release a multimodal version of Llama soon. In July, the company introduced CM3leon, which does both text-to-image and image-to-text generation. In May, it also introduced ImageBind, a first-of-its-kind AI model capable of binding data from six modalities (images and video, audio, text, depth, thermal, and inertial measurement units (IMUs)) at once, without the need for explicit supervision.

It’s now or never for Meta. Last week, OpenAI finally announced the launch of multimodal ChatGPT, bringing image features into the chatbot by integrating Dall-E 3 with ChatGPT Plus and ChatGPT Enterprise. This will open up several new image-based applications for GPT-4, such as generating text to match images.

Interestingly, OpenAI is unleashing the multimodal prowess of ChatGPT just before the launch of Google’s Gemini (due in the fall), which is also expected to have multimodal functionality. OpenAI may have reached the finish line before Google when it comes to multimodal AI, but has it really won the race?

Google Wins the Internet?

Google claims that Gemini is five times more powerful than GPT-4. Besides its sheer computational might, Gemini flaunts an impressive array of talents and real multimodality, which means it can seamlessly handle text, code, images, and audio — pushing the boundaries of what AI can really do.

Whispers are circulating that Gemini is a behemoth with a staggering 65 trillion parameters, and rumour has it that it was nurtured on a solid diet of YouTube data. With such capabilities, it's only natural to wonder if Gemini is poised to dethrone GPT-4 and live up to the hype Google has been building around it.

Read: Google DeepMind Will Eclipse OpenAI


TOP STORIES OF THE WEEK >>

Collaborator, Not a Killer

AI tools like Copilot, CodeWhisperer, and Codesense have eased the burden on developers, aiding in code completion, debugging, testing, tuning, and reviewing. However, the radical notion that AI could replace human programmers and revolutionise software development has been met with scepticism.

The prevailing sentiment is that while AI holds immense potential, a comprehensive or even partial takeover of programming by AI seems improbable. Instead, developers who integrate AI into their workflow are better positioned for long-term success compared to those who resist its adoption.

In this landscape, a collaborative approach where AI complements human programmers appears more plausible than a complete AI takeover.

Read the full story here.


Intel is Serious?

Intel has revealed an ambitious AI-focused roadmap at Intel Innovation 2023, signalling its determination to compete with NVIDIA, AMD, and Apple in the AI arena. A highlight was the announcement of the "Meteor Lake" processor, set to launch in December, which aims to enable power-efficient AI acceleration and local inference on personal computers, making AI capabilities accessible without relying on cloud data centres. This aligns with Intel's vision of democratising AI with an "AI PC" concept. The company also introduced the Intel Developer Cloud platform for developers to test and deploy AI and high-performance computing applications.


Intel's three-pronged performance approach includes benchmarking CPUs, GPUs, and NPUs within a fixed power budget to compete with Apple and NVIDIA. Additionally, Intel is investing heavily in chip manufacturing, with plans to produce Panther Lake processors using advanced manufacturing processes in early 2024. Intel's strategic focus on AI and chip innovation positions it as a formidable contender in the AI landscape.

Read the full story here.


Oracle Predicts & Acts

Larry Ellison, CTO of Oracle, expressed optimism about generative AI during his keynote at Oracle CloudWorld 2023. Oracle is embracing generative AI through its cloud platform, aiming to democratise its use. Ellison believes generative AI is "probably" the most important new computer technology and emphasises that billions of dollars are already invested in the field.

Oracle is partnering with NVIDIA to build the largest scientific supercomputer, leveraging NVIDIA's H100s or GH100s. Ellison sees generative AI as transformational and central to Oracle's future endeavours. He addresses concerns about AI's potential dangers, asserting that humans will always decide how much power to cede to AI models.


AIM VIDEOS >>

Amid AI's pervasive influence across industries, the business management and accounting sector stands as a fertile ground for leveraging AI's potential. Tally, a traditional company, is actively engaging with AI to address various use cases, enhance productivity, tackle business challenges, and optimise outcomes.

Vijayalakshmi from Analytics India Magazine sat with Nabendu Das, the Head of Engineering and Financial Services at Tally Solutions, and discussed the company's approach to staying relevant in this AI-driven era.


AIM Shots >>

  • Indian spacetech startup, SatSure, is set to invest $35 million to launch four earth imaging satellites by 2025. This move follows the successful execution of edge processing in space. The project aims to make the company self-sufficient in generating its satellite imagery rather than relying on foreign space agencies.

  • ServiceNow has introduced Now Assist, an AI-powered suite aimed at boosting productivity and user experience in the IT, customer service, HR, and development sectors. It leverages the company's generative AI engine and domain-specific "Now LLM" for enterprise-level productivity and data privacy.

  • Oracle has enhanced its cloud-based data analytics service, Oracle Analytics Cloud, with generative AI data interactions, enabling users to query data using natural language and AI-driven avatars. These avatars, developed in partnership with Synthesia, can serve as news readers to convey data stories.

  • Microsoft is introducing Kosmos-2.5, a multimodal AI model focused on reading text-intensive images. Building on its predecessors, Kosmos-2.5 has undergone extensive training on large datasets, excelling in spatially-aware text block generation and structured markdown text output. This model enhances image-text understanding by accurately assigning spatial coordinates to text blocks and presenting extracted text in a structured format.

  • A group of 17 authors, including George R.R. Martin, known for the 'A Song of Ice and Fire' series, has filed a lawsuit against OpenAI. They accuse ChatGPT of flagrant copyright infringement, alleging that it generates infringing content, such as a prequel to 'Game of Thrones' using identical characters.
