Introducing InkyMM: The First Commercial Open Source Multimodal Model

Today, OctoML is announcing InkyMM, the first open-source, fully commercializable image + text LLM, built upon the great work of researchers at King Abdullah University and on the MPT-7B Instruct model published by MosaicML.

We've captured the highlights in this article, but if you want the deep dive, read the full post.

Computer Vision has Been Hard Work

Deep learning has made computer vision much more powerful, but still not easy: getting good performance usually depends on gathering large amounts of training data, selecting the right model architecture, and training that model on your data.

For example, when Octonaut Ben created a cat door that locks his cat out when the cat is trying to bring in a “present”, he had to gather and hand-label more than 22,000 images.

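For concreteness, the traditional workflow looks roughly like the sketch below: a folder of hand-labeled images, an off-the-shelf architecture, and a training loop. The paths, label names, and hyperparameters here are illustrative, not taken from Ben's actual project.

```python
# Minimal sketch of the "classic" supervised CV workflow: thousands of
# hand-labeled images, an off-the-shelf backbone, and a training loop.
# Paths, label names, and hyperparameters are illustrative only.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Expects a folder layout like labeled_images/train/{no_prey,prey}/*.jpg
train_set = datasets.ImageFolder("labeled_images/train", transform=transform)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Fine-tune a pretrained backbone for the binary "prey / no prey" decision.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```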

The same thing goes for any computer vision use case. Recent computer vision competitions have featured detecting player collisions in the NFL, finding ancient Roman ink in scrolls carbonized by Vesuvius, and predicting whether a piece of clothing will look good on a customer.

[Image: New applications for computer vision continue to emerge]

But all of these require extensive training sets. There have been many advances aimed at easing this pain, such as better labeling tools and self- and semi-supervised learning techniques. Even so, getting good performance almost always means painstaking data gathering.

Furthermore, a model trained on such datasets will usually not go beyond them—it won’t deal well with novel types of images, and it certainly can’t have a chat with you about whether that blazer you’re wearing is in fashion at the moment.

Multimodal learning is an approach in which a model is trained to understand and work with multiple forms of input data, such as text, images, and audio.

The Holy Grail: Zero-Shot Image Models

There have been many recent attempts to create image labeling models that can work as “zero-shot” detectors, meaning they can label any image without task-specific training data. Popular models in this space include OpenAI’s CLIP, Salesforce’s BLIP, and ViLT.

All of these are impressive in their own right, but their ability to reason about images is very limited, and sometimes they really miss the point. Below is an example of how BLIP handles a question that could be useful for any e-commerce company:

[Image: The description of the package condition could use some work.]
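If you want to poke at this kind of zero-shot behavior yourself, the sketch below shows a minimal visual question answering call against the public Salesforce/blip-vqa-base checkpoint via Hugging Face transformers. The image path and question are illustrative; they are not the exact inputs from the screenshot above.

```python
# Minimal sketch: zero-shot visual question answering with BLIP via
# Hugging Face transformers. Image path and question are illustrative.
from PIL import Image
from transformers import BlipProcessor, BlipForQuestionAnswering

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")

image = Image.open("package_photo.jpg").convert("RGB")  # hypothetical local image
question = "What condition is the package in?"

inputs = processor(image, question, return_tensors="pt")
output_ids = model.generate(**inputs)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```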

Since the release of BLIP, many others have released models with zero-shot, multimodal capabilities:

March 14th, 2023: OpenAI announced that GPT-4 can respond to both images and text. OpenAI is currently testing this capability with a single customer, Be My Eyes, and has not released it to the general market.

April 20, 2023: Researchers at King Abdullah University released MiniGPT-4, a multimodal LLM built on top of BLIP-2 and Vicuna (a fine-tuned LLaMA). MiniGPT-4 shows an impressive jump in capability by combining an LLM with an image captioning model, though it has some flaws, including slow responses and a tendency to hallucinate.

May 10, 2023: Google announced that Bard will soon have multimodal capabilities.

The only open source model from the list above is MiniGPT-4. However, this open source contribution has a fatal flaw: MiniGPT-4 is based upon Vicuna, which was trained on prompts and responses scraped from ChatGPT, and as such it cannot be used for commercial purposes.

Enter OctoML with InkyMM

Since we’re all about making models ready for commercial enterprise, we decided to try an experiment: what if we could replicate MiniGPT-4’s training process, but attach it to a fully commercializable language model instead of Vicuna? To do so, we anchored on MosaicML’s MPT-7B Instruct model, a commercially usable LLM trained on instruction-following datasets. After a surprisingly small amount of effort, we succeeded in creating InkyMM, a multimodal LLM with no legal encumbrances, ready for commercial use.
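At a high level, MiniGPT-4-style training keeps a pretrained vision encoder and a pretrained LLM frozen and learns only a small projection that maps visual features into the LLM's embedding space. The sketch below is a conceptual illustration of that idea, not InkyMM's actual implementation; the module name, dimensions, and token counts are assumptions.

```python
import torch
from torch import nn

class VisionToLLMProjector(nn.Module):
    """Conceptual sketch: map frozen vision-encoder features into an LLM's
    token-embedding space so image tokens can be prepended to the prompt.
    Dimensions and names are illustrative, not InkyMM's real code."""

    def __init__(self, vision_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        # In MiniGPT-4-style training, this projection is the only part
        # that gets trained; the vision encoder and LLM stay frozen.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, vision_features: torch.Tensor) -> torch.Tensor:
        # vision_features: (batch, num_image_tokens, vision_dim)
        return self.proj(vision_features)


# Pseudo-usage: concatenate projected image tokens with text embeddings
# before running the (frozen) language model.
projector = VisionToLLMProjector()
image_feats = torch.randn(1, 32, 768)      # placeholder vision-encoder output
image_tokens = projector(image_feats)      # (1, 32, 4096)
text_embeds = torch.randn(1, 16, 4096)     # placeholder LLM token embeddings
llm_inputs = torch.cat([image_tokens, text_embeds], dim=1)
```

Because only the projection is trained while both large components stay frozen, the alignment step is cheap relative to pretraining either model, which helps explain why swapping in a different base LLM is a relatively modest effort.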

Early access users of the OctoML compute service can also access InkyMM endpoints for application development.

Join the early access program here.

We should note a few things:

  • Since MPT-7B Instruct is not as strong an LLM as Vicuna, InkyMM has more of a “hallucination” issue than MiniGPT-4. We’re working on it!
  • The web version of InkyMM is still slow. We have yet to apply acceleration techniques to this model pipeline, but we will deploy an accelerated version to our compute service once it’s running fast.

We are excited to release an API endpoint version for application developers in June!

Comments

I have a white beard! I didn't know that! :)


Hey, I remember seeing this demoed at your Tuesday event. Looks super cool, and congrats to the team!

Harry Kim (ML inference product) · 1y

I just tried it and it's much faster than MiniGPT. Congrats, OctoML team :) What hardware are you running InkyMM on for inference?

Harry Kim (ML inference product) · 1y

I tried out MiniGPT before, so I couldn't be more excited about this particular release :) Thank you, OctoML team!
