nebullvm 0.3.0 release

Nebuly

Published: May 10, 2022

Check the library on GitHub

We are super excited to announce the new major release, nebullvm 0.3.0, in which nebullvm's AI inference accelerator becomes more powerful and more stable and covers more use cases.

nebullvm is an open-source library that generates an optimized version of your deep learning model that runs 2-10 times faster in inference without performance loss by leveraging multiple deep learning compilers (OpenVINO, TensorRT, etc.). With release 0.3.0, nebullvm can accelerate inference by up to 30x if you specify that you are willing to trade a self-defined amount of accuracy/precision for an even lower response time and a lighter model. This additional acceleration is achieved by exploiting optimization techniques that slightly modify the model graph to make it lighter, such as quantization, half precision, distillation and sparsity.

Find tutorials and examples on how to use nebullvm, as well as installation instructions, in the main readme of the nebullvm library. Read on if you want to learn more about:

Overview of Nebullvm 0.3.0
Benchmarks
How the new Nebullvm 0.3.0 API Works
New Features & Bug Fixes

Overview of Nebullvm

With this new version, nebullvm continues in its mission to be:

Easy-to-use. It takes a few lines of code to install the library and optimize your models.

Framework agnostic. nebullvm supports the most widely used frameworks (PyTorch, TensorFlow, ONNX, Hugging Face, etc.) and outputs an optimized version of your model with the same interface (PyTorch, TensorFlow, etc.).

Deep learning model agnostic. nebullvm supports all the most popular deep learning architectures, such as transformers, LSTMs, CNNs and FCNs.

Hardware agnostic. The library now works on most CPUs and GPUs and will soon support TPUs and other deep learning-specific ASICs.

Secure. Everything runs locally on your hardware.

Leveraging the best optimization techniques. There are many inference optimization techniques, such as deep learning compilers, quantization or half precision, and soon sparsity and distillation, which are all meant to optimize the way your AI models run on your hardware.

Benchmarks

We have tested nebullvm on popular AI models and hardware from leading vendors.

The table below shows the inference speedup provided by nebullvm. The speedup is calculated as the response time of the unoptimized model divided by the response time of the accelerated model, averaged over 100 experiments. For example, if the response time of an unoptimized model averaged 600 milliseconds and dropped to 240 milliseconds after nebullvm optimization, the resulting speedup is 2.5x, meaning 150% faster inference.
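The speedup arithmetic above can be reproduced in a few lines of Python. This is only an illustrative sketch: the function names are made up for this example, and it assumes the speedup is the ratio of the two mean response times (the text does not say whether the ratio is taken before or after averaging).

```python
from statistics import mean


def speedup(baseline_ms, optimized_ms):
    """Ratio of mean baseline latency to mean optimized latency (assumed definition)."""
    return mean(baseline_ms) / mean(optimized_ms)


def percent_faster(ratio):
    """Convert a speedup ratio into a 'percent faster' figure."""
    return (ratio - 1.0) * 100.0


# 100 experiments each, using the figures from the example above.
baseline = [600.0] * 100
optimized = [240.0] * 100

ratio = speedup(baseline, optimized)
print(f"{ratio:.1f}x speedup, {percent_faster(ratio):.0f}% faster")
# prints "2.5x speedup, 150% faster"
```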

A complete overview of the experiment and findings can be found on this page.

Overall, the library provides great results, with more than 2x acceleration in most cases and around 20x in a few applications. Acceleration also varies greatly across hardware-model pairings, so we suggest you test nebullvm on your own model and hardware to assess its full potential. You can find the instructions below.

Besides, across all scenarios, nebullvm stands out for its ease of use: it lets you take advantage of inference optimization techniques without spending hours studying, testing and debugging these technologies.

How the New Nebullvm API Works

With the latest release, nebullvm has a new API and can be deployed in two ways.

Option A: 2-10x acceleration, NO performance loss

If you choose this option, nebullvm will test multiple deep learning compilers (TensorRT, OpenVINO, ONNX Runtime, etc.) and identify the optimal way to compile your model on your hardware, increasing inference speed by 2-10 times without affecting the performance of your model.
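The idea behind Option A (try several compiled variants and keep the fastest) can be sketched as follows. This is not nebullvm's implementation or API: the helper names and the stand-in "backends" are hypothetical, and plain Python callables take the place of a model compiled by different compilers.

```python
import time
from typing import Callable, Dict, Tuple


def mean_latency(fn: Callable[[], object], runs: int = 100) -> float:
    """Average wall-clock seconds per call over `runs` calls."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def pick_fastest(
    candidates: Dict[str, Callable[[], object]], runs: int = 100
) -> Tuple[str, float]:
    """Benchmark every candidate and return the name and latency of the fastest."""
    timings = {name: mean_latency(fn, runs) for name, fn in candidates.items()}
    best = min(timings, key=timings.get)
    return best, timings[best]


# Hypothetical stand-ins for one model compiled by two different backends.
candidates = {
    "backend_slow": lambda: sum(range(200_000)),
    "backend_fast": lambda: sum(range(1_000)),
}

name, latency = pick_fastest(candidates, runs=20)
print(f"fastest: {name} ({latency * 1e3:.3f} ms per call)")
```

Because no variant changes the model's outputs in this scenario, latency is the only selection criterion.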

Option B: 2-30x acceleration, supervised performance loss

nebullvm can speed up inference by much more than 10 times if you are willing to sacrifice a fraction of your model's performance. If you specify how much performance loss you are willing to accept, nebullvm will push your model's response time to its limits by identifying the best possible blend of state-of-the-art inference optimization techniques, such as deep learning compilers, distillation, quantization, half precision and sparsity.

Performance loss is controlled through two arguments: perf_loss_ths (the performance loss threshold) and perf_metric (the metric used to estimate performance).

When a predefined metric (e.g. "accuracy") or a custom metric is passed as the perf_metric argument, the value of perf_loss_ths is used as the maximum acceptable loss for that metric, evaluated on your datasets (option B.1).

When no perf_metric is provided as input, nebullvm calculates the performance loss using the default precision function. If the dataset is provided, the precision is calculated on 100 sampled data points (option B.2). Otherwise, the data is randomly generated from the metadata provided as input, i.e. input_sizes and batch_size (option B.3).
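The three cases above amount to a small decision rule plus a threshold check, sketched below. Only perf_metric, perf_loss_ths, input_sizes and batch_size are names taken from the text; the functions themselves are hypothetical and are not part of the nebullvm API. The sketch also assumes a higher-is-better metric, so that the loss is the baseline score minus the optimized score.

```python
from typing import Optional, Sequence


def choose_evaluation(perf_metric: Optional[object], dataset: Optional[Sequence]) -> str:
    """Map the arguments the user supplied to option B.1, B.2 or B.3."""
    if perf_metric is not None:
        return "B.1: user-supplied metric evaluated on the user's dataset"
    if dataset is not None:
        return "B.2: default precision on 100 sampled data points"
    return "B.3: default precision on random data built from input_sizes and batch_size"


def metric_loss(baseline_score: float, optimized_score: float) -> float:
    """Loss of a higher-is-better metric after optimization (assumed definition)."""
    return baseline_score - optimized_score


def is_acceptable(loss: float, perf_loss_ths: float) -> bool:
    """An optimized model is kept only if its loss stays within the threshold."""
    return loss <= perf_loss_ths


print(choose_evaluation("accuracy", dataset=[("x", "y")]))
print(choose_evaluation(None, dataset=[("x", "y")]))
print(choose_evaluation(None, dataset=None))

# Accuracy drops from 0.92 to 0.90 with a tolerated loss of 0.05.
print(is_acceptable(metric_loss(0.92, 0.90), perf_loss_ths=0.05))
```

A candidate optimization that fails the threshold check would simply be discarded in favor of a slower but more faithful one.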

Check out the main GitHub readme if you want to take a look at nebullvm's performance and benchmarks, as well as tutorials and notebooks on how to implement nebullvm with ease. And please leave a star on GitHub if you enjoy the project, and join the Discord community, where we chat about nebullvm and AI optimization.
