1-bit LLMs and Sustainability: The New Titans of Small Language Models

Language models, especially Large Language Models (LLMs), have been instrumental in shaping the dynamic landscape of artificial intelligence, significantly impacting how models comprehend and produce text that reads like human language. The new entrants in this field are 1-bit LLMs, which have brought a fresh shift in our thinking since the launch of Small Language Models (SLMs). The first example of this class is BitNet b1.58, which runs at just over 1 bit per weight and is making waves in the sustainable AI space, offering high performance with minimal resources, unlike its LLM counterparts.

The Epoch of 1-bit LLMs

1-bit Large Language Models are a new class of models that use extremely low-bit values to represent their weights, i.e. every single parameter or weight of the model is stored in only 1 bit {0 or 1}. Established LLMs, in contrast, generally use 16-bit or even 32-bit floating-point values to represent the model’s parameters or weights. These parameters or weights control how the model processes information. Storing them in 1 bit shrinks the model considerably, reducing computational and storage requirements and allowing even mobile devices to run these models.

BitNet b1.58 is built on the BitNet architecture but, as the name suggests, uses slightly more than 1 bit: every single parameter or weight of the LLM is ternary, taking one of three values {-1, 0, 1}, and encoding three states requires log2(3) ≈ 1.58 bits. The additional 0 value (compared to true 1-bit implementations) is a vital element that boosts the model’s performance. This might seem severely constraining, but it has numerous advantages. The model is trained from scratch with 1.58-bit weights and 8-bit activations, with some modifications relative to the original 1-bit BitNet.

Quantization Function. The quantization function in 1-bit LLMs scales the weight matrix by its average absolute value and rounds each value to -1, 0, or +1. For activations, it follows the same process, but scales all activations to [-Qb, Qb] per token, eliminating zero-point quantization. This approach is simple, convenient for optimisation, and has minimal impact on performance.
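For illustration, here is a minimal PyTorch sketch of the quantization just described. The function names and the eps constant are my own choices, and a real implementation would also dequantize outputs and use a straight-through estimator during training; treat this as a sketch of the idea rather than the reference implementation.

```python
import torch

def weight_quant(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Scale the weight matrix by its mean absolute value, then
    # round every entry to the nearest ternary value in {-1, 0, +1}.
    gamma = w.abs().mean()
    return (w / (gamma + eps)).round().clamp(-1, 1)

def activation_quant(x: torch.Tensor, bits: int = 8, eps: float = 1e-5) -> torch.Tensor:
    # Per-token absmax quantization of activations to [-Qb, Qb],
    # symmetric around zero (no zero point).
    qb = 2 ** (bits - 1) - 1  # Qb = 127 for 8-bit activations
    scale = x.abs().max(dim=-1, keepdim=True).values.clamp(min=eps)
    return (x * qb / scale).round().clamp(-qb, qb)
```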

LLaMA-like Elements. In practice, the design of LLaMA has been the pillar of open-source LLMs. The design of BitNet b1.58 adopts LLaMA-like elements in order to support the open-source community. As a result, BitNet b1.58 can be merged into standard open-source frameworks with minimal effort.

Advantages of 1-bit LLMs

The simple representation of weights in a 1-bit LLM explains the quicker inference speed, i.e. the process of generating text, translating between languages, or executing other language-related tasks. Simpler computations mean the models reason and respond much faster, as the sketch below illustrates. The computational efficiency of 1-bit LLMs also implies lower energy consumption, making them more environmentally friendly and cost-effective to run.
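To see why the computations become simpler, consider a matrix-vector product in which every weight is -1, 0, or +1: multiplications disappear entirely, leaving only additions and subtractions. The following toy NumPy sketch is only meant to illustrate that point; a real deployment would use packed ternary weights and an optimised kernel.

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    # For each output row: add the inputs where the weight is +1,
    # subtract them where it is -1, and skip zero weights entirely.
    return np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

W = np.array([[1, 0, -1], [0, 1, 1]])  # ternary weight matrix
x = np.array([0.5, -2.0, 3.0])
print(ternary_matvec(W, x))            # matches W @ x, with no multiplications
```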

Sustainability in AI

To what extent is our AI ecologically friendly? This question is being contemplated not only by us, but also by numerous business leaders, researchers, practitioners, and policymakers. Among AI models, LLMs in particular have become formidable and ubiquitous across many domains, such as generative AI, conversational AI, machine vision, deep learning, knowledge graphs, and more. Nevertheless, these models also come with a massive ecological price, as they need sizeable datasets and extensive computational resources to train and operate. Moreover, fine-tuning an AI model can have an even steeper ecological impact than its initial training.

Table 1. Zero-shot accuracy of BitNet b1.58 and LLaMA LLM across the end tasks

Both models showcase competitive performance, and the choice between them may depend on specific use cases and resource limitations. BitNet b1.58’s comparable accuracy and greater efficiency (see Table 1) make it a convincing contender among its LLM counterparts.

At this juncture, 1-bit LLMs such as BitNet b1.58 can help reduce the ecological load on our environment, since they require fewer computational resources to run. On top of that, they only need smaller, more niche datasets for fine-tuning. Currently there is no specific information on its public availability for testing.

Figure 1. Latency, memory, and energy consumption of BitNet b1.58 compared to LLaMA

More importantly, the 1.58-bit LLM defines a new scaling law and recipe for training new generations of LLMs that are both high-performance and cost-effective.

While matching full-precision traditional LLMs in perplexity and end-task performance, 1-bit LLMs can be significantly more cost-effective in terms of latency, memory, and energy consumption (see Figure 1). This positions 1-bit models as more sustainable replacements, as the comparisons below and the rough calculation that follows them illustrate.

  • A 13B BitNet b1.58 is more efficient than a 3B Floating Point 16-bit (FP16) LLM in terms of latency, memory, and energy consumption.

  • A 30B BitNet b1.58 is more efficient than a 7B FP16 LLM in terms of latency, memory, and energy consumption.

  • A 70B BitNet b1.58 is more efficient than a 13B FP16 LLM in terms of latency, memory, and energy consumption.
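
A back-of-the-envelope calculation makes the memory side of these comparisons concrete: ternary weights carry log2(3) ≈ 1.58 bits of information each versus 16 bits for FP16, roughly a tenfold reduction in weight storage. This is a sketch under the simplifying assumptions that weights dominate memory and can be packed at their information-theoretic density; real kernels typically pack ternary values into 2 bits, and activations, KV caches, and other overheads add to the totals.

```python
import math

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    # Approximate weight-storage footprint in gigabytes.
    return n_params * bits_per_param / 8 / 1e9

for n in (3e9, 7e9, 13e9, 70e9):
    fp16 = weight_memory_gb(n, 16)
    ternary = weight_memory_gb(n, math.log2(3))
    print(f"{n / 1e9:.0f}B params: FP16 ~{fp16:.1f} GB, ternary ~{ternary:.1f} GB")
```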

The Future

The advent of 1-bit LLMs signifies a pivotal moment in the democratisation of advanced AI models. This innovation enables end-users to run these models on their own Graphics Processing Units (GPUs), thereby broadening access to these powerful tools. This novel approach, particularly within the realm of generative AI and natural language processing, has the potential to transform the field and its myriad use cases.

This exploration has revealed the model’s potential to enhance the sustainability of AI models. While it may not offer a flawless solution, it represents a well-intentioned and significant stride towards a more sustainable future in AI. This development underscores the commitment to continually pushing the boundaries of what is possible in AI, while also considering the environmental implications of our advancements. As we continue to innovate, we remain dedicated to fostering an inclusive, sustainable, and transformative AI landscape.
