AI Newsletter
Ievgen Gorovyi
Founder & CEO @ It-Jim | AI Expert | PhD, Computer Vision | GenAI | AI Consulting
Another week, more cool updates in the world of AI!
Gemini 2.0
Google has just launched Gemini 2.0 Flash, an impressive upgrade to its AI lineup. Unlike Gemini 1.5, which was larger and trained on more data, Gemini 2.0 Flash is a more compact model that surprisingly outperforms its predecessor across various benchmarks. It achieves this by utilizing optimized algorithms and efficient data processing techniques, allowing it to run tasks at twice the speed of Gemini 1.5. Users can access Gemini 2.0 Flash for free and experiment with features like real-time voice conversations, webcam interactions, and seamless screen sharing.
Project Astra
Google’s Project Astra is set to redefine mobile AI by embedding advanced vision and auditory capabilities directly into smartphones. Tested on the Pixel 9, Astra functions as a highly intelligent assistant that can recognize objects, understand context from visual inputs, and respond to voice commands with remarkable accuracy. It leverages the Gemini 2.0 model to provide functionalities such as real-time translation, object identification, and contextual reminders. Additionally, Project Astra previews smart glasses equipped with heads-up displays, enabling users to receive notifications and information overlays without needing to hold their phones.
Project Mariner
Google’s Project Mariner is an innovative AI-driven browser assistant designed to automate repetitive online tasks, enhancing productivity and efficiency. For example, if you need to extract contact information from a list of companies in Google Sheets, Mariner can navigate through your browser tabs, visit each company’s website, and compile the necessary data automatically. It uses advanced natural language processing to understand and execute multi-step tasks, such as filling out forms, scraping data, and organizing information into spreadsheets. Although still in the experimental phase, Project Mariner showcases the potential for AI to handle complex browser-based activities, freeing up users to focus on more strategic and creative aspects of their work. Future updates may include expanded task capabilities and tighter integration with other Google Workspace tools.
Google Native Image Output
With the Gemini 2.0 update, Google is introducing native image generation and transformation capabilities directly within its AI models. This feature allows users to make specific modifications to images using natural language prompts. For instance, you can ask the AI to add a convertible top to a car photo, change the background of a landscape, or blend two different images seamlessly. Because the model produces images natively rather than handing them off to a separate tool, Gemini 2.0 can perform these edits in a conversational manner, making image manipulation more intuitive and accessible. It promises to bridge the gap between traditional graphic design tools and conversational AI.
Google Deep Research
Google’s Deep Research feature takes AI-powered research to the next level by enabling comprehensive web-based investigations. This tool can simultaneously analyze information from dozens of websites, academic papers, and online resources to generate detailed reports on complex topics. For example, if you need an in-depth analysis of quantum computing’s potential to break Bitcoin cryptography, Deep Research can aggregate data from 65+ sources, evaluate their credibility, and synthesize the information into a coherent report.
OpenAI’s Sora Release
OpenAI has launched Sora Turbo, its latest text-to-video generation tool, which lets users create short videos from textual descriptions. Despite a rocky start with server overloads during its initial release, Sora Turbo has been optimized for better stability and performance. Users on the Pro Plan can generate videos of up to 20 seconds, while those on the Plus Plan are limited to 10-second clips. Sora Turbo excels at creating specific scenes, such as a wolf howling at the moon, by interpreting detailed prompts to produce visually coherent videos. However, it still struggles with dynamic actions like dancing or gymnastics, so complex movements and interactions remain an area for improvement.
ChatGPT Canvas
ChatGPT’s new Canvas feature transforms the traditional chat interface into a versatile workspace, enhancing both coding and creative tasks. Users can execute Python code directly within the chat, allowing for real-time code testing and debugging without leaving the conversation. Additionally, Canvas supports complex writing tasks, such as drafting poems or articles, with integrated tools for adjusting length, reading level, and adding final polish. The visual idea mapping feature enables users to organize their thoughts and projects visually, making it easier to brainstorm and develop ideas collaboratively.
ChatGPT & Apple Integration
ChatGPT has now seamlessly integrated with Apple’s ecosystem, enhancing the functionality of Siri on iPhones (iOS 18.2 and newer) and Macs. Users can now prompt Siri to utilize ChatGPT for more intelligent and context-aware responses, significantly improving the quality of information and assistance provided. On Mac computers, Siri can share screen content with ChatGPT, allowing the AI to offer more accurate and relevant help based on what’s displayed on the screen. This integration leverages Apple’s robust hardware and software infrastructure, making AI-driven assistance a more natural and powerful part of daily device usage.
ChatGPT Advanced Voice with Vision
OpenAI has enhanced ChatGPT’s advanced voice mode by adding vision capabilities, allowing the AI to interpret and discuss visual inputs captured through a camera. Users can show objects, book pages, or their surroundings to receive immediate and relevant feedback. For example, you can display a page from a book and ask ChatGPT to summarize its content or identify key points. This feature utilizes cutting-edge computer vision algorithms to analyze visual data in real time, providing contextually appropriate responses based on what the AI "sees."
ChatGPT with Santa Claus
OpenAI has introduced a Santa Claus persona within the ChatGPT app, allowing users to engage in playful and themed conversations. You can now chat with Santa, asking him about his Christmas Eve journey, how many houses he visits, or even playful questions about being naughty or nice. This feature leverages natural language processing to create a believable and entertaining Santa character, adding a touch of holiday magic to the AI experience.
Anthropic Claude 3.5 Haiku
Anthropic has quietly released Claude 3.5 Haiku, an optimized version of their AI model designed for faster responses and lower operational costs. Claude 3.5 Haiku is a smaller, more efficient model that maintains high performance while being more accessible for applications requiring quick, on-the-go interactions. It leverages improved training techniques and a streamlined architecture to deliver reliable outputs without the computational overhead associated with larger models. This makes Claude 3.5 Haiku ideal for scenarios where speed and cost-effectiveness are paramount, such as customer service chatbots, mobile applications, and real-time data analysis.
Grok’s New Image Generator
xAI’s Grok has launched its own image generation model, moving away from relying on external diffusion models like Flux and Stable Diffusion. The new Grok Image Generation uses an autoregressive mixture-of-experts network, which predicts the next token from interleaved text and image data to create visually appealing images. While it may not yet match the photorealism of some competitors, Grok produces vibrant and aesthetically pleasing results with accurate details and colors. Additionally, Grok supports multimodal inputs, allowing users to blend or edit images based on their prompts.
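To make "predicting the next token from interleaved text and image data" a bit more concrete, here is a toy sketch in Python. It is not Grok's implementation: a small count table stands in for the neural network, the mixture-of-experts part is left out entirely, and all token names (such as `img_5`) are made up for illustration.

```python
from collections import Counter, defaultdict

# Toy interleaved stream: caption tokens followed by image-patch tokens.
# Every token name here is hypothetical.
stream = [
    "<text>", "red", "car", "<image>", "img_12", "img_7", "img_3", "<end>",
    "<text>", "blue", "sky", "<image>", "img_5", "img_5", "img_1", "<end>",
]

# "Training": count which token follows each pair of preceding tokens.
# A real model would learn this mapping with a neural network instead.
follow_counts: dict[tuple[str, str], Counter] = defaultdict(Counter)
for first, second, nxt in zip(stream, stream[1:], stream[2:]):
    follow_counts[(first, second)][nxt] += 1


def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    """Greedy autoregressive decoding: append the most likely next token,
    then feed the extended sequence back in as the new context."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        candidates = follow_counts.get(tuple(tokens[-2:]))
        if not candidates:
            break  # context never seen during "training"
        next_token = candidates.most_common(1)[0][0]
        tokens.append(next_token)
        if next_token == "<end>":
            break
    return tokens


print(generate(["<text>", "blue", "sky"]))
```

The point of the sketch is that text and image tokens live in one sequence, so the same loop that continues a caption also emits the image tokens that follow it.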
Midjourney Patchwork
Midjourney has introduced Patchwork, a collaborative tool designed to enhance the creative process for AI-generated art. Patchwork functions as a large digital canvas where users can generate images, place them on the canvas, and collaborate with others by adding notes, comments, and annotations. This tool is perfect for teams working on storyboarding, brainstorming, or developing visual narratives, as it allows for real-time collaboration and idea sharing.
Adobe Removes Reflections
Adobe has released a new AI-powered tool that effectively removes unwanted reflections from photos taken through glass surfaces. At launch the feature works on raw image files, with support for non-raw formats like JPEG and HEIC expected later, allowing photographers to achieve cleaner, glare-free images effortlessly. Utilizing advanced image processing algorithms, the tool distinguishes between the subject and the reflection, seamlessly eliminating the latter without compromising the quality or integrity of the main image. This is particularly useful for architectural photography, product shots, and any scenario where reflections can detract from the desired visual.
YouTube’s New Dubbing Feature
YouTube has expanded its dubbing capabilities, enabling creators to add translated audio tracks to their videos more seamlessly. This feature leverages advanced speech synthesis and machine translation technologies to provide accurate and natural-sounding voiceovers in multiple languages. By automatically synchronizing dubbed audio with the original video content, YouTube makes it easier for creators to reach a global audience without the need for extensive manual editing. This enhancement not only broadens the accessibility of content but also improves the viewing experience for users who prefer or require content in different languages.
Pika Releases V2 Generation Model
Pika has unveiled version 2 of their AI generation model, introducing the innovative "Ingredients" feature. This allows users to stack multiple object photos within a single frame based on their prompts, enhancing the creativity and complexity of generated videos. The announcement describes the release as "twelve days of gifts in one," highlighting its comprehensive capabilities. However, these new features and the upgraded model are exclusive to the Pro subscription tier, priced at $35, which may be a barrier for some users seeking advanced functionalities.
Cognition Introduces Devin Subscription at $500
Cognition has launched a premium subscription for Devin, an AI tool designed to handle multiple tasks simultaneously. Priced at $500, this subscription offers 250 ACUs (Agent Compute Units, the tool's internal credits), equating to approximately 60 hours of AI work. Users have reported mixed experiences, noting that while Devin performs adequately with simple tasks, it often stalls during more complex operations. Additionally, a significant vulnerability was discovered during a live stream, where Devin inadvertently exposed an API key, raising concerns about security and reliability. Meanwhile, the release of OpenAI's o1-pro model promises more affordable and better-performing options.
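For a rough sense of what the Devin plan costs per unit, the three headline figures above ($500, 250 ACUs, about 60 hours) can be turned into per-credit and per-hour numbers. This is only back-of-the-envelope arithmetic on the figures quoted in this item; the function and variable names are illustrative, and actual usage will vary by task.

```python
# Figures quoted in the newsletter item above.
PRICE_USD = 500       # subscription price
ACUS_INCLUDED = 250   # credits included in the plan
HOURS_OF_WORK = 60    # approximate hours of AI work those credits cover


def unit_costs(price_usd: float, acus: int, hours: float) -> dict[str, float]:
    """Derive per-unit figures from the plan's headline numbers."""
    return {
        "usd_per_acu": price_usd / acus,
        "minutes_per_acu": hours * 60 / acus,
        "usd_per_hour": price_usd / hours,
    }


costs = unit_costs(PRICE_USD, ACUS_INCLUDED, HOURS_OF_WORK)
for name, value in costs.items():
    print(f"{name}: {value:.2f}")
# usd_per_acu: 2.00
# minutes_per_acu: 14.40
# usd_per_hour: 8.33
```

In other words, each credit costs about $2 and buys roughly a quarter of an hour of work, which puts an hour of Devin at a little over $8 under these assumptions.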
InvSR: New Image Upscale
InvSR, a new image upscaler, has been launched, offering a process similar to the popular Upscayl tool. Unlike traditional img2img models, InvSR focuses on preserving existing details without inventing new ones, which results in more reliable image enhancements. However, it falls short compared to Magnific, which excels in extracting finer details. Users can choose to install InvSR locally via GitHub or experiment with it online on Hugging Face.
Google Unveils Quantum Chip
Google has introduced Willow, a groundbreaking quantum chip that addresses a 30-year-old challenge in quantum error correction. According to Google and Alphabet CEO Sundar Pichai, Willow exponentially reduces errors as the number of qubits increases, a significant advancement in quantum computing. In performance tests, Willow completed a standard calculation in under 5 minutes, whereas a leading supercomputer would take over 10^25 years, a timeframe vastly exceeding the universe's age.
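To put the 10^25-year figure in perspective, it helps to compare it with the age of the universe and with Willow's own runtime. The short calculation below uses the two numbers quoted in this item; the age of the universe (about 13.8 billion years) is a commonly cited estimate added here as an assumption, and treating "under 5 minutes" as exactly 5 minutes makes the speedup a conservative rough figure.

```python
WILLOW_MINUTES = 5             # "under 5 minutes", per the item above
SUPERCOMPUTER_YEARS = 1e25     # "over 10^25 years", per the item above
UNIVERSE_AGE_YEARS = 1.38e10   # assumption: commonly cited estimate (~13.8 billion years)
MINUTES_PER_YEAR = 365.25 * 24 * 60

# How many universe lifetimes the classical run would need.
ratio_to_universe = SUPERCOMPUTER_YEARS / UNIVERSE_AGE_YEARS

# How many times faster Willow is on this particular benchmark.
speedup = SUPERCOMPUTER_YEARS * MINUTES_PER_YEAR / WILLOW_MINUTES

print(f"Supercomputer time / age of universe: {ratio_to_universe:.1e}")
print(f"Implied speedup on this benchmark:    {speedup:.1e}")
```

Under these assumptions the classical computation would take on the order of 7 × 10^14 universe lifetimes, and the implied speedup is around 10^30 for this one benchmark task.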
Microsoft Launches Phi-4 Generative AI Model
Microsoft has revealed Phi-4, the latest addition to its Phi family of generative AI models, currently available in a research preview through the Azure AI Foundry platform. With 14 billion parameters, Phi-4 demonstrates significant improvements over its predecessors, particularly in solving mathematical problems, thanks to higher quality training data that includes both synthetic and human-generated content. Competing with models like GPT-4o mini, Gemini 2.0 Flash, and Claude 3.5 Haiku, Phi-4 offers a balance of speed and cost-effectiveness. Notably, Phi-4 is the first model released after the departure of Sébastien Bubeck, a key figure in Microsoft's AI development, who has moved to OpenAI.
Noteworthy papers:
We also have an amazing team of AI engineers with:
We are here to help you maximize efficiency with your available resources.
Reach out when:
Have doubts or many questions about AI in your business? Get in touch!