Edge Insights #8 - Revolutionary Pruning Methods for LLMs/LDMs and Expansion of LaunchX

Highlights

Expansion of supported devices for LaunchX

Papers recognized at academic workshops

  • Shortened LLaMA: A Simple Depth Pruning for Large Language Models
  • LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights

Global meetings & events

  • Embedded World 2024
  • tinyML Summit 2024
  • 2024 Embedded Vision Summit


Expansion of supported devices for LaunchX

We are pleased to announce a significant enhancement to the capabilities of LaunchX! LaunchX, a spin-off of our flagship product NetsPresso, has recently broadened its range of supported devices by adding four new Arm IP-based devices. This development represents a significant advancement in our mission to enable the seamless deployment of AI models across diverse hardware platforms.

LaunchX enables effortless deployment of AI models to devices through its two main functionalities: Converter and Benchmarker.

Within our product lineup, LaunchX is a spin-off product developed by bolstering two essential features of NetsPresso: its Converter and Benchmarker functionalities. These functionalities ensure that AI models can be optimized for efficiency and effectiveness across different hardware configurations.

Four Arm IP-based devices have been added to the list of devices supported by LaunchX.

In the past, LaunchX was compatible with 16 devices from Arm, NVIDIA, Raspberry Pi, and Intel. However, we have recently broadened our device compatibility by adding four more Arm IP-based devices.

This expansion opens up new possibilities for developers and engineers looking to leverage AI capabilities across a broader range of hardware environments. With LaunchX, you can seamlessly deploy and benchmark AI models on Arm IP-based devices, empowering innovation and accelerating development in the field of AI.

Check out the benchmark results for LaunchX's newly supported devices at launchx.netspresso.ai! Also, stay tuned for more updates and innovations from LaunchX as we continue to push the boundaries of hardware-aware AI optimization.


Papers recognized at academic workshops

Two papers authored by Nota AI researchers have been accepted to academic workshops at ICLR and CVPR. These papers delve into novel pruning methods for LLMs and LDMs, offering fresh perspectives and knowledge. Take the opportunity to explore these papers and gain valuable insights.


[Shortened LLaMA: A Simple Depth Pruning for Large Language Models]

The image illustrates a comparison of pruning units and the efficiency of pruned LLaMA-7B models on an NVIDIA H100 GPU.

Authors: Bo-Kyeong Kim, Research Engineer, Nota AI

About the Paper:

Large Language Models (LLMs) have advanced remarkably, but their size makes them costly to run. This paper introduces a novel approach to compressing LLMs through the one-shot removal of multiple Transformer blocks.

Our depth pruning method is specifically designed to accelerate LLM inference under small-batch conditions on memory-limited local devices. By removing multiple Transformer blocks in a single shot, we achieve significant speedups while maintaining zero-shot performance comparable to recent width pruning methods.
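As a rough illustration of this idea, the minimal sketch below removes a handful of Transformer blocks from a LLaMA-style checkpoint in one shot using PyTorch and Hugging Face Transformers. The checkpoint name and the block indices are hypothetical placeholders; in the paper, the blocks to remove are chosen with an importance criterion, which is not reproduced here.

```python
# Minimal sketch of one-shot depth pruning for a LLaMA-style model.
# Assumptions: the checkpoint name and the block indices below are hypothetical
# placeholders, and the decoder blocks are exposed as model.model.layers
# (an nn.ModuleList), as in Hugging Face's LLaMA implementation.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# The paper estimates block importance first; here the indices are hard-coded
# purely to illustrate the one-shot removal step.
blocks_to_remove = {24, 25, 26, 27}

model.model.layers = nn.ModuleList(
    block for idx, block in enumerate(model.model.layers)
    if idx not in blocks_to_remove
)
model.config.num_hidden_layers = len(model.model.layers)

# The shortened model can then be evaluated, or lightly re-trained
# (e.g., with LoRA) to recover accuracy, as is common after structured pruning.
```

Because entire blocks are removed rather than thinned, the remaining layers keep their standard shapes, so the shortened model runs on existing inference stacks without custom kernels.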

With our depth pruning technique, we look forward to offering a promising solution that keeps LLMs transformative while making them more accessible and efficient.

Read the full article on our community platform: Shortened LLaMA: A Simple Depth Pruning for Large Language Models


[LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights]

Applying LD-Pruner to text-to-image generation and unconditional image generation yields significant improvements in inference time and parameter count reduction while minimizing performance degradation.

Authors: Thibault Castells, Research Engineer, Nota AI

About the Paper:

Discover LD-Pruner, the latest innovation in model optimization developed by our R&D team at Nota AI. In this groundbreaking paper, we introduce LD-Pruner as a solution to the challenges encountered in deploying latent diffusion models (LDMs) on resource-constrained devices.

By focusing on preserving performance, our method reduces the cost of re-training latent diffusion models after pruning. Leveraging task-agnostic insights derived from the latent space, LD-Pruner achieves substantial improvements in inference time and parameter count reduction while minimizing performance degradation across diverse tasks.
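As a rough, assumption-laden illustration of the task-agnostic idea, the sketch below scores an operator by how much the model's latent output statistics shift when that operator is replaced by an identity mapping. The function name, the module-path convention, and the use of mean/standard-deviation statistics are illustrative choices, not the paper's exact formulation.

```python
# Hedged sketch of a task-agnostic, latent-space importance score.
# Assumption: an operator's importance can be approximated by how much
# disabling it shifts the statistics of the model's latent output; the
# paper's exact scoring formula may differ.
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def latent_shift_score(model: nn.Module, op_path: str, latents: torch.Tensor) -> float:
    """Score one operator by the latent shift caused by removing it.

    Assumes the operator's input and output shapes match (e.g., a residual
    block inside a diffusion U-Net), so nn.Identity is a valid stand-in.
    """
    reference = model(latents)

    pruned = copy.deepcopy(model)
    parent_path, _, child_name = op_path.rpartition(".")
    parent = pruned.get_submodule(parent_path) if parent_path else pruned
    setattr(parent, child_name, nn.Identity())  # disable the candidate operator
    perturbed = pruned(latents)

    # Compare first- and second-order statistics of the two latent outputs;
    # a larger shift suggests the operator contributes more and should be kept.
    return ((reference.mean() - perturbed.mean()).abs()
            + (reference.std() - perturbed.std()).abs()).item()
```

In a full pipeline, a score like this would be computed for every candidate operator, the lowest-scoring operators pruned, and the compressed model given only a short (ideally minimal) fine-tuning pass.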

LD-Pruner's development has opened up new possibilities for the widespread adoption of LDMs in various applications, bringing us closer to the goal of making powerful generative models available to a broader range of users and devices.

Read the full article on our community platform: LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights


Global meetings & events

Explore the recent international conferences in which Nota AI has actively participated. Delve into notable moments from Embedded World 2024 and the tinyML Summit 2024, and get a preview of the forthcoming Embedded Vision Summit 2024.


[Embedded World 2024]

Collaboration between Nota AI and partner companies was showcased at the Arm, STMicroelectronics, and Future Electronics booths during Embedded World 2024.

At Embedded World 2024, held at NürnbergMesse in Nuremberg, Germany, from April 9th to 11th, Nota AI effectively demonstrated our cutting-edge AI optimization technology in partnership with Arm, STMicroelectronics, and Future Electronics.

Throughout the event, attendees had the chance to directly observe the synergy between Nota AI and our three collaborators. The Arm and STMicroelectronics booths featured demonstrations of NetsPresso, Nota's AI optimization platform, as well as its spin-off, LaunchX. Visitors could also experience a live demonstration of Nota DMS, an edge AI solution built upon NetsPresso, at the Future Electronics booth.


[tinyML Summit 2024]

Demo tables and presentations were conducted by Nota AI at the tinyML Summit 2024.

Nota AI had the honor of being a silver sponsor at the tinyML Summit 2024, which occurred from May 7th to 9th in San Francisco.

During this event, we showcased a live demonstration of LaunchX, the ultimate Converter and Benchmarker, to global tech leaders. Moreover, attendees had the chance to explore our collaboration with Renesas at its demo table. On the second day, Shinkook Choi, our core research leader, delivered a presentation titled "Deploying Transformer-Based Models on Edge Devices using MicroNPUs Operator Converter," sharing valuable insights into edge AI model deployment.


[2024 Embedded Vision Summit]

Nota AI will participate as a gold sponsor at the upcoming Embedded Vision Summit 2024.

Nota AI will unveil its GenAI potential at the upcoming Embedded Vision Summit 2024!

We are thrilled to announce our participation as a Gold Sponsor at the Embedded Vision Summit 2024, taking place from May 21st to 23rd at the Santa Clara Convention Center.

Don't miss the chance to visit Nota AI's booth to experience VLM and LLM demos and explore the strengths we bring to the GenAI domain. You can also experience firsthand the demos of our AI optimization platform, NetsPresso, and our edge AI solution, DMS.

We look forward to seeing you there soon!

Learn more about the summit: https://embeddedvisionsummit.com/


Learn more about Nota AI: https://notaai.notion.site/Nota-AI-5682c3f1a011453eb110949f3da0d26c?pvs=4

Subscribe to Edge Insights: https://bit.ly/44XuDKX
