Sparkbit's Activity



What can a Multimodal Large Language Model do with an image? We recently wrote about Ferret, Apple's open-source model, and covered a bit of how it works, so now it's time to look at a use case.

Ferret may be the first MLLM to perform free-form referring and grounding. In practice, its hallucination-to-correct-answer ratio is very low: it refers to an object in an image and grounds its response on objects in that image very well.

The image below shows how interactions within Ferret work. The user selects points, boxes, or free-form regions; the model identifies the elements, recognizes relations between objects, and uses its LLM component to synthesize more complex answers about the processed image. Are we getting much closer to a reliable AI assistant?

---

For more insights on #data and #MachineLearning, follow Sparkbit on LinkedIn. If you're looking for a tech partner for your #AI projects, DM us or leave a message via the contact form on our website at https://www.sparkbit.pl/

(Image: diagram of Ferret's referring-and-grounding interactions)
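The interaction pattern in the diagram can be sketched in code. The snippet below is a minimal illustration, not Ferret's actual API: the `Region` type, the `<region>` prompt format, and the bracketed-box answer format are all assumptions chosen to mirror the idea of referring (the user names a region of the image) and grounding (the model attaches bounding boxes to the objects it mentions).

```python
import re
from dataclasses import dataclass


@dataclass
class Region:
    """A referred-to image region in pixel coordinates (x1, y1, x2, y2)."""
    x1: int
    y1: int
    x2: int
    y2: int


def build_referring_prompt(question: str, region: Region) -> str:
    """Embed the user's selected region in the prompt (format is illustrative,
    not Ferret's real tokenization)."""
    return (f"{question} "
            f"<region>[{region.x1}, {region.y1}, {region.x2}, {region.y2}]</region>")


def parse_grounded_answer(raw: str) -> tuple[str, list[Region]]:
    """Split a hypothetical grounded answer like 'a dog [10, 20, 110, 220]'
    into its text and the bounding boxes that ground it."""
    box_pattern = r"\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]"
    boxes = [Region(*map(int, m.groups())) for m in re.finditer(box_pattern, raw)]
    text = re.sub(box_pattern, "", raw).strip()
    return text, boxes
```

For example, `build_referring_prompt("What is this?", Region(10, 20, 110, 220))` yields a prompt carrying the selected box, and `parse_grounded_answer` recovers the boxes that ground each mentioned object, so an application could draw them back onto the image.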
