Multimodal Race Begins
The epic battle of multimodal LLMs has only just begun. The race is not limited to Google and OpenAI; Meta and Stability AI are also closing in fast.
So far, Stability AI has released three models – namely StableLM, Stable Diffusion and Stable Audio – and it is only a matter of time until it integrates everything into one powerful model. In April, Stability AI, alongside its multimodal AI research lab DeepFloyd, released DeepFloyd IF, a text-to-image cascaded pixel diffusion model.
Meta is likely to release a multimodal version of Llama soon. In July, the company introduced CM3leon, which does both text-to-image and image-to-text generation. In May, it also introduced ImageBind, a first-of-its-kind AI model capable of binding data from six modalities (images and video, audio, text, depth, thermal and inertial measurement units (IMUs)) at once, without the need for explicit supervision.
It’s now or never for Meta. Last week, OpenAI finally announced the launch of multimodal ChatGPT, bringing image features to the chatbot by integrating Dall-E 3 with ChatGPT Plus and ChatGPT Enterprise. This will open up several new image-based applications for GPT-4, such as generating text to match images.
Interestingly, OpenAI is unleashing the multimodal prowess of ChatGPT just before the launch of Google’s Gemini (due in the fall), which is also expected to have multimodal functionality. Well, OpenAI may have managed to touch the finish line before Google when it comes to the multimodal functionality of AI, but has it really won the race?
Google Wins the Internet?
Google claims that Gemini is five times more powerful than GPT-4. Besides its sheer computational might, Gemini flaunts an impressive array of talents and real multimodality, which means it can seamlessly handle text, code, images, and audio — pushing the boundaries of what AI can really do.
Rumours are circulating that Gemini is a behemoth with a staggering 65 trillion parameters, and that it was nurtured on a solid diet of YouTube data. With such capabilities, it's only natural to wonder if Gemini is poised to dethrone GPT-4 and live up to the hype Google has been building around it.
Read: Google DeepMind Will Eclipse OpenAI
TOP STORIES OF THE WEEK >>
Collaborator, Not a Killer
AI tools like Copilot, CodeWhisperer, and Codesense have eased the burden on developers, aiding in code completion, debugging, testing, tuning, and reviewing. However, the radical notion that AI could replace human programmers and revolutionise software development has been met with scepticism.
The prevailing sentiment is that while AI holds immense potential, a comprehensive or even partial takeover of programming by AI seems improbable. Instead, developers who integrate AI into their workflow are better positioned for long-term success compared to those who resist its adoption.
In this landscape, a collaborative approach where AI complements human programmers appears more plausible than a complete AI takeover.
Intel is Serious?
Intel has revealed an ambitious AI-focused roadmap at Intel Innovation 2023, signalling its determination to compete with NVIDIA, AMD, and Apple in the AI arena. A highlight was the announcement of the "Meteor Lake" processor, set to launch in December, which aims to enable power-efficient AI acceleration and local inference on personal computers, making AI capabilities accessible without relying on cloud data centres. This aligns with Intel's vision of democratising AI with an "AI PC" concept. The company also introduced the Intel Developer Cloud platform for developers to test and deploy AI and high-performance computing applications.
Intel's three-pronged performance approach includes benchmarking CPUs, GPUs, and NPUs within a fixed power budget to compete with Apple and NVIDIA. Additionally, Intel is investing heavily in chip manufacturing, with plans to produce Panther Lake processors using advanced manufacturing processes in early 2024. Intel's strategic focus on AI and chip innovation positions it as a formidable contender in the AI landscape.
Oracle Predicts & Acts
Larry Ellison, CTO of Oracle, expressed optimism about generative AI during his keynote at Oracle CloudWorld 2023. Oracle is embracing generative AI through its cloud platform, aiming to democratise its use. Ellison believes generative AI is "probably" the most important new computer technology and emphasises that billions of dollars are already invested in the field.
Oracle is partnering with NVIDIA to build the largest scientific supercomputer, leveraging NVIDIA's H100s or GH100s. Ellison sees generative AI as transformational and central to Oracle's future endeavours. He addresses concerns about AI's potential dangers, asserting that humans will always decide how much power to cede to AI models.
AIM VIDEOS >>
Amid AI's pervasive influence across industries, the business management and accounting sector stands as fertile ground for leveraging AI's potential. Tally, a traditional company, is actively engaging with AI to address various use cases, enhance productivity, tackle business challenges, and optimise outcomes.
Vijayalakshmi from Analytics India Magazine sat down with Nabendu Das, the Head of Engineering and Financial Services at Tally Solutions, to discuss the company's approach to staying relevant in this AI-driven era.
AIM Shots >>