nebullvm 0.4.0 open-source release

nebullvm is an open-source tool designed to speed up AI inference in just a few lines of code. nebullvm boosts your model to achieve the maximum acceleration that is physically possible on your hardware.

We are building a new AI inference acceleration product that leverages state-of-the-art open-source optimization tools to optimize the whole software-to-hardware stack. If you like the idea, give us a star to support the project!

The core nebullvm workflow consists of 3 steps:

Select: input your model in your preferred DL framework and express your preferences regarding:

Accuracy loss: do you want to trade off a little accuracy for much higher performance?
Optimization time: stellar accelerations can be time-consuming. Can you wait, or do you need an instant answer?

Search: nebullvm automatically tests every combination of optimization techniques across the software-to-hardware stack (sparsity, quantization, compilers, etc.) that is compatible with your needs and local hardware.

Serve: finally, nebullvm chooses the best configuration of optimization techniques and returns an accelerated version of your model in the DL framework of your choice.
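The Select, Search, and Serve steps can be read as a constrained search loop. The sketch below is a simplified illustration under stated assumptions, not nebullvm's implementation: each candidate is a plain Python callable standing in for an optimized model, accuracy loss is approximated by the largest output deviation from the original model, the optimization-time preference is not modeled, and the names `Candidate`, `search_best`, `max_accuracy_drop`, and the injected `benchmark` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a model: any callable mapping an input to an output.
Model = Callable[[float], float]


@dataclass
class Candidate:
    """A hypothetical optimized variant (e.g. quantized or compiled) of the model."""
    name: str
    model: Model


def max_abs_error(reference: Model, candidate: Model, inputs: Sequence[float]) -> float:
    # Proxy for accuracy loss: the largest output deviation on the sample inputs.
    return max(abs(reference(x) - candidate(x)) for x in inputs)


def search_best(
    original: Model,
    candidates: Sequence[Candidate],
    inputs: Sequence[float],
    max_accuracy_drop: float,
    benchmark: Callable[[Model, Sequence[float]], float],
) -> Tuple[str, Model]:
    # Select: the original model is both the baseline and the fallback.
    best_name, best_model = "original", original
    best_latency = benchmark(original, inputs)
    # Search: benchmark every candidate that respects the accuracy constraint.
    for cand in candidates:
        if max_abs_error(original, cand.model, inputs) > max_accuracy_drop:
            continue  # too much accuracy loss: discard this configuration
        latency = benchmark(cand.model, inputs)
        if latency < best_latency:
            best_name, best_model, best_latency = cand.name, cand.model, latency
    # Serve: return the fastest acceptable variant, still callable like the original.
    return best_name, best_model


if __name__ == "__main__":
    def original(x: float) -> float:
        return 2.0 * x

    exact = Candidate("exact", lambda x: x + x)
    lossy = Candidate("lossy", lambda x: float(round(2.0 * x)))
    # Fake latencies so the example is deterministic (real code would time the models).
    fake_latency = {original: 1.0, exact.model: 0.5, lossy.model: 0.1}
    for tolerance in (1e-6, 1.0):
        name, _best = search_best(original, [exact, lossy], [0.1, 0.7, 1.3], tolerance,
                                  benchmark=lambda m, xs: fake_latency[m])
        # Expected: "exact" with the strict tolerance, "lossy" with the loose one.
        print(tolerance, name)
```

Loosening the accuracy tolerance lets the faster but lossy variant win, which mirrors the trade-off the Select step asks you to express.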

API quick view

Only a single line of code is needed to get your accelerated model:

import torch
import torchvision.models as models
from nebullvm.api.functions import optimize_model

# Load a ResNet-50 as an example
model = models.resnet50()

# Provide input data for the model: a list of ((inputs,), label) samples
input_data = [((torch.randn(1, 3, 256, 256), ), 0)]

# Run nebullvm optimization in one line of code
optimized_model = optimize_model(
    model, input_data=input_data, optimization_time="constrained"
)

# Try the optimized model
x = torch.randn(1, 3, 256, 256)
res = optimized_model(x)
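To check the speedup on your own hardware, you can time the original and the optimized model on the same input. This is a minimal standard-library sketch, assuming both models are plain callables (in the snippet above, `optimized_model(x)` is called just like the original model); `benchmark_latency` and its `warmup` and `repeats` parameters are hypothetical names. For GPU models you would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock, because kernels run asynchronously.

```python
import statistics
import time
from typing import Any, Callable


def benchmark_latency(model: Callable[[Any], Any], x: Any,
                      warmup: int = 10, repeats: int = 100) -> float:
    """Return the median latency of model(x) in milliseconds (hypothetical helper)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    # Warm-up runs let caches, lazy initialization, and JIT compilation settle.
    for _ in range(warmup):
        model(x)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        model(x)
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to scheduling outliers than the mean.
    return statistics.median(timings)
```

For example, `benchmark_latency(model, x) / benchmark_latency(optimized_model, x)` gives the measured speedup for that input shape.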

For more details, please visit Installation and Get started.

How it works

We are not here to reinvent the wheel, but to build an all-in-one open-source product that masters all the available AI acceleration techniques and delivers the fastest AI ever. For this reason, nebullvm leverages existing enterprise-grade open-source optimization tools. If these tools and communities already exist and are distributed under a permissive license (Apache, MIT, etc.), we integrate them and happily contribute to their communities. However, many tools do not exist yet; in those cases, we implement them and open-source the code so that the community can benefit from it.

Product design

nebullvm is shaped around 4 building blocks and leverages a modular design to foster scalability and integration of new acceleration components across the stack.

Converter: converts the input model from its original framework to the framework backends supported by nebullvm, namely PyTorch, TensorFlow, and ONNX. This allows the Compressor and Optimizer modules to apply any optimization technique to the model.
Compressor: applies various compression techniques to the model, such as pruning, knowledge distillation, or quantization-aware training.
Optimizer: converts the compressed models to the intermediate representation (IR) of the supported deep learning compilers. The compilers apply both post-training quantization techniques and graph optimizations to produce compiled binary files.
Inference Learner: takes the best-performing compiled model and converts it back to the same interface as the original input model.
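The modular design can be pictured as a chain of interchangeable stages. The sketch below is a hypothetical illustration of that architecture, not nebullvm's actual classes: models are represented by plain dictionaries of metadata, compiler backends plug into a registry through a decorator, the latencies are fake constants, and the names `Artifact`, `BACKENDS`, `register_backend`, `compiler_a`, `compiler_b`, and `run_pipeline` are all assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical artifact passed between stages: a dict of metadata in this sketch.
Artifact = Dict[str, object]

# Registry of compiler backends: new acceleration components plug in here.
BACKENDS: Dict[str, Callable[[Artifact], Artifact]] = {}


def register_backend(name: str):
    """Decorator that adds a compiler backend to the registry."""
    def wrap(fn: Callable[[Artifact], Artifact]) -> Callable[[Artifact], Artifact]:
        BACKENDS[name] = fn
        return fn
    return wrap


def converter(model: Artifact) -> List[Artifact]:
    # Converter: one copy of the model per supported framework backend.
    return [{**model, "framework": fw} for fw in ("pytorch", "tensorflow", "onnx")]


def compressor(model: Artifact) -> Artifact:
    # Compressor: only marks the model here (real code would prune, distill, ...).
    return {**model, "compressed": True}


@register_backend("compiler_a")
def compiler_a(model: Artifact) -> Artifact:
    return {**model, "backend": "compiler_a", "latency_ms": 4.0}  # fake latency


@register_backend("compiler_b")
def compiler_b(model: Artifact) -> Artifact:
    return {**model, "backend": "compiler_b", "latency_ms": 3.0}  # fake latency


def optimizer(model: Artifact) -> List[Artifact]:
    # Optimizer: compile the model with every registered backend.
    return [compile_fn(model) for compile_fn in BACKENDS.values()]


def inference_learner(compiled: List[Artifact]) -> Artifact:
    # Inference Learner: keep the best-performing compiled model
    # (wrapping it back into the original interface is omitted here).
    return min(compiled, key=lambda m: m["latency_ms"])


def run_pipeline(model: Artifact) -> Artifact:
    compiled = [c for m in converter(model) for c in optimizer(compressor(m))]
    return inference_learner(compiled)
```

Adding a new compiler in this sketch only requires one more decorated function; none of the other stages change, which is the point of the modular design.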

The compressor stage leverages the following open-source projects:

Intel/neural-compressor: aims to provide unified APIs for network compression technologies, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks to achieve optimal inference performance.
SparseML: libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models.
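Pruning, one of the compression techniques listed above, is easy to illustrate in isolation. The following sketch implements unstructured magnitude pruning on a plain list of weights: it zeroes the weights with the smallest absolute values until the requested fraction is zero. It is a generic textbook example, not the algorithm used by Intel Neural Compressor or SparseML, and `magnitude_prune` is a hypothetical helper.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the smallest-magnitude weights so that a `sparsity` fraction is zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    # Number of weights to remove, rounded to the nearest integer.
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered from the smallest to the largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_prune = set(order[:n_prune])
    return [0.0 if i in to_prune else w for i, w in enumerate(weights)]


print(magnitude_prune([0.5, -0.05, 1.2, 0.01, -0.8, 0.3], sparsity=0.5))
# [0.5, 0.0, 1.2, 0.0, -0.8, 0.0]
```

Zeroed weights only turn into speedups when the runtime exploits sparsity, which is the role of an engine such as DeepSparse in the optimizer stage.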

The optimizer stage leverages the following open-source projects:

Apache TVM: open deep learning compiler stack for CPUs, GPUs, and specialized accelerators.
BladeDISC: end-to-end Dynamic Shape Compiler project for machine learning workloads.
DeepSparse: neural network inference engine that delivers GPU-class performance for sparsified models on CPUs.
OpenVINO: open-source toolkit for optimizing and deploying AI inference.
ONNX Runtime: cross-platform, high-performance ML inference and training accelerator.
TensorRT: C++ library for high-performance inference on NVIDIA GPUs and deep learning accelerators.
TFLite and XLA: open-source libraries to accelerate TensorFlow models.
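As described in the Product design section, the compilers in the optimizer stage also apply post-training quantization. Its core arithmetic, in the common affine (asymmetric) int8 form, is: scale s = (max - min) / 255, zero point z = -128 - round(min / s), quantized value q = clamp(round(x / s) + z, -128, 127), and reconstruction x' = s * (q - z), where min and max are the tensor range extended to include 0. The sketch below implements exactly these formulas in plain Python; real toolkits add per-channel scales, calibration data, and fused integer kernels, and `quantize_int8` and `dequantize` are hypothetical helper names.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed 8-bit range


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Per-tensor affine int8 quantization of a non-empty list of floats."""
    # Extend the range to include 0 so that 0.0 is represented exactly.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    # Fall back to a scale of 1.0 for an all-zero input to avoid dividing by zero.
    scale = (hi - lo) / (QMAX - QMIN) or 1.0
    zero_point = QMIN - round(lo / scale)
    q = [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * (qi - zero_point) for qi in q]


q, scale, zp = quantize_int8([-1.0, 0.0, 0.5, 1.0])
print(q, scale, zp, dequantize(q, scale, zp))
```

Each reconstructed value differs from the original by at most about one scale step, which is the accuracy loss the Select step lets you bound.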

Documentation

Community

Discord: best for sharing your projects, hanging out with the community and learning about AI acceleration.
GitHub issues: ideal for suggesting new acceleration components, requesting new features, reporting bugs, and proposing improvements.

We’re developing nebullvm together with our community, so the best way to get started is to pick a good first issue. Please read our contribution guidelines for a deep dive on how best to contribute to our project!

Don't forget to leave a star to support the project, and happy acceleration!

Nebuly
