Apple’s MGIE, Multimodal AI Glasses, and Top AI Advancements
Here we cover some of the most exciting updates in AI. Read on to find out how Apple is innovating with AI, what having AI superpowers in the form of glasses feels like, what's new in Microsoft Copilot, and much more.
The AI Highlights
The Next Chapter in Google Gemini Era: Gemini Advanced
The Google Gemini model has been the talk of the AI world lately, with a focus on its language abilities, performance across a variety of tasks, and comparisons to other models. Coverage highlights several strengths of Gemini, including its ability to handle complex reasoning tasks, generate text in non-English languages, and outperform other models on certain benchmarks.
Gemini has now taken a step forward with its Advanced version, and Bard has been rebranded as Google Gemini. The rebranding brings a brand-new landing page, new apps, and much more, along with subtle signals about the future of the existing Google Assistant.
Here are some details:
While the results achieved by Gemini are impressive, there is also debate about whether its capabilities have been exaggerated in certain areas. Overall, the coverage of Google Gemini offers valuable insight into its language abilities and performance across tasks. Read More…
Natural Language-Prompted Image Editing with MGIE
Apple and UC Santa Barbara researchers have introduced MGIE, an open-source AI system that enables image editing through natural language commands. MGIE can reliably edit an image based on changes the user describes in plain natural language.
The system can handle common Photoshop adjustments like cropping, rotating, and filtering, as well as more advanced object manipulations, background replacement, and photo blending. MGIE optimizes images globally by adjusting properties.
How does MGIE use natural language prompts to improve image editing?
MGIE uses natural language prompts to improve image editing by incorporating multimodal large language models (MLLMs) to interpret text prompts and make pixel-level changes to photos.
The MLLMs are capable of cross-modal reasoning and responding appropriately to text, allowing MGIE to translate user commands into concise, unambiguous editing guidance.
For example, in the image below, "make the sky more blue" becomes "increase the saturation of the sky region by 20%."
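To make the instruction-derivation step concrete, here is a minimal sketch of the idea. In MGIE an MLLM performs this expansion with cross-modal reasoning over the image and the prompt; the lookup table and function name below are purely hypothetical stand-ins for illustration.

```python
# Hypothetical sketch: a vague edit request is expanded into concrete,
# unambiguous editing guidance. In MGIE an MLLM does this; here a simple
# template table stands in for the model.
EXPANSIONS = {
    "make the sky more blue": "increase the saturation of the sky region by 20%",
    "make it look better": "raise contrast slightly and balance the white point",
}

def derive_instruction(prompt: str) -> str:
    """Return explicit editing guidance for a natural-language prompt."""
    # Fall back to the original prompt when no expansion is known.
    return EXPANSIONS.get(prompt.lower().strip(), prompt)

print(derive_instruction("Make the sky more blue"))
# -> increase the saturation of the sky region by 20%
```

The key point is the translation itself: the model turns a subjective request into an explicit, checkable edit before any pixels change.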
MGIE can understand a wide range of natural language prompts for image editing, including basic prompts like:
"Crop the image"
"Rotate the image"
"Apply a filter"
It can also handle more complex tasks like "remove the background", "replace the sky", and "add a person to the photo".
The system can even understand ambiguous commands like "make it look better" and make appropriate edits based on the context of the image.
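The range of prompts above can be thought of as routing: each command maps to an editing operation, with a global adjustment as the fallback for ambiguous requests. The keyword matcher below is an illustrative assumption only; MGIE itself relies on an MLLM rather than keyword rules, and all names here are hypothetical.

```python
# Hypothetical dispatcher: route a natural-language command to an editing
# operation. MGIE uses an MLLM for this; keyword matching is a stand-in.
def route_command(command: str) -> str:
    command = command.lower()
    for keyword, operation in [
        ("crop", "crop"),
        ("rotate", "rotate"),
        ("filter", "apply_filter"),
        ("background", "replace_background"),
        ("sky", "replace_sky"),
    ]:
        if keyword in command:
            return operation
    # Ambiguous prompts like "make it look better" fall through to a
    # context-dependent global adjustment.
    return "global_adjust"

print(route_command("Rotate the image"))     # -> rotate
print(route_command("Make it look better"))  # -> global_adjust
```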
Benefits of natural language prompts over traditional image editing methods
Natural language prompts let users describe the result they want instead of mastering manual tools, making edits like background replacement accessible without Photoshop expertise.
So, this is it.
Thanks for reading PixelBin Newsletter! There is much more to these updates; read the full newsletter to discover the most useful AI tools for you.
We’ll be back next week with fresh updates on AI, which we think you’ll love.
Meanwhile, if you have something to tell us, we are all ears.
Have suggestions or questions for us? Reach out to us at [email protected].
Follow us for everyday highlights on Twitter, LinkedIn, and Instagram. Join our PixelBin Discord community and engage in conversation with fellow AI enthusiasts.