Mistral’s Pixtral 12B, More on “Strawberry”, Apple’s visual AI approach
Miko Pawlikowski ???
Follow for coding, bootstrapped startups & breakthroughs in tech. Founder, Engineer, Speaker.
Here’s what you’ve missed in AI this week.
??? Mistral introduces Pixtral 12B (1 min)
Mistral has made its entrance into the multimodal world with Pixtral 12B. This model, built on Nemo 12B and clearly harnessing 12 billion parameters, can answer queries about images shared through links or encoded ones and perform a variety of tasks related to pictures, such as captioning or identifying how many objects are in a photo.
Pixtral 12B is currently available on GitHub and Hugging Face, and it can easily be fine-tuned by users, as it's been developed under an Apache 2.0 licence.
?? “Strawberry” is closer than we thought (2 min)
The fruit-related theme continues as we receive more news about OpenAI's upcoming "Strawberry" model. Recent claims from two insiders who have tried it suggest a launch date closer than initially expected, with the model possibly making headlines in two weeks.
Strawberry will be integrated into ChatGPT, and what makes it stand out is a feature that allows it to "think" for 10 to 20 seconds before answering queries. This brief pause is designed to enhance its logical reasoning capabilities in complex areas like maths, programming, design, and new tasks.
?? Learn about Apple’s Visual Intelligence (1 min)
With the official announcement of the iPhone 16, we can finally say we're getting closer to Apple's full-on AI era. However, the excitement must wait until October, when iOS 18 becomes available with all the 'Apple Intelligence' features. One standout is Visual Intelligence, which resembles a concept we've already seen from OpenAI and Google.
领英推荐
Visual Intelligence, enabled through Camera Control, a new button for the iPhone 16 and 16 Pro, allows users to ask questions about their surroundings and gather information from pictures they've taken, such as dates and times from a flyer or identifying dog breeds.
??Every day, new open-source models compete for the top spot, so it's common to hear claims of different ones being the "leader." However, these claims are sometimes made too soon, and some models fail to live up to the hype. DeepSeek is taking its chance with the launch of DeepSeek-V2.5, now available on Hugging Face. Its benchmark results and performance earn it the "world's top."
???Google is transforming NotebookLM through "Audio Overview," intending to translate complex information into easy-to-digest audio. Basically, your documents can be turned into a "podcast" where two AI hosts highlight the most relevant aspects of your annotations, dive deeper, and summarise the material to enhance comprehension.