Digixvalley Nova 3.0: Open, State-of-the-Art Enterprise Models

At the forefront of the Nova 3.0 lineup is our newly enhanced, instruction-tuned, dense decoder-only model: Nova 3.0 8B Instruct. Trained through an innovative two-phase approach on over 12 trillion tokens of meticulously selected data spanning 12 different natural languages and 116 programming languages, the developer-friendly Nova 3.0 8B Instruct serves as a robust foundation for complex workflows and tool-driven use cases. This model not only meets but exceeds leading benchmarks for similarly-sized models in academic settings, particularly excelling in enterprise task performance and safety.

By fine-tuning smaller, purpose-built models like Nova, enterprises can achieve cutting-edge performance at a significantly reduced cost. Customizing Nova models to meet your organization’s specific requirements through InstructLab (a collaborative, open-source initiative that enhances model capabilities with systematically generated synthetic data and phased-training methods) can further streamline costs and timelines.

In line with Digixvalley’s strong dedication to open-source principles, all Nova models are released under the flexible Apache 2.0 license, setting us apart from the growing trend of proprietary licensing in the industry. Additionally, we are committed to transparency by providing an extensive disclosure of our training datasets and methodologies in the Nova 3.0 technical documentation, reinforcing Digixvalley’s pledge to build trust, safety, and clarity in AI solutions.

The Digixvalley Nova 3.0 Release Encompasses:

Versatile, general-purpose LLMs: Nova-3.0-8B-Instruct, Nova-3.0-8B-Base, Nova-3.0-2B-Instruct, and Nova-3.0-2B-Base.

Input-output guardrail models based on LLMs: Nova-Guardian-3.0-8B and Nova-Guardian-3.0-2B.

Mixture of experts (MoE) models for minimal latency: Nova-3.0-3B-A800M-Instruct and Nova-3.0-1B-A400M-Instruct.

Speculative decoder to enhance inference speed and efficiency: Nova-3.0-8B-Instruct-Accelerator.

Looking ahead to 2024, we plan to expand all model context windows to 128K tokens, enhance multilingual support across 12 natural languages, and introduce multimodal capabilities with image-in, text-out functionality.

The Nova 3.0 8B Instruct and Nova 3.0 2B Instruct models, along with both Guardian 3.0 safety models, are available now for commercial use on the Digixvalley platform. Additionally, Nova 3.0 models can be accessed through our platform partners, including Google Vertex AI (via Google Cloud's Vertex AI Model Garden integrations with Hugging Face), Hugging Face, NVIDIA (as NIM microservices), Ollama, and Replicate.

In line with Digixvalley’s commitment to sustainability, our Nova 3.0 language models are trained on infrastructure powered by 100% renewable energy.

Exceptional Performance, Safety, And Security

Previous generations of Nova models focused on specialized use cases, delivering outstanding results across various industries such as finance, legal, software development, and academia. With Nova 3.0, we not only enhance efficacy in these areas but also match, and in some cases exceed, the general performance of leading open-weight LLMs on both academic and enterprise benchmarks.

On academic assessments featured in Hugging Face’s OpenLLM Leaderboard v2, Nova 3.0 8B Instruct stands strong against similarly sized models from leading competitors. The evaluation methodology for our models is detailed in the accompanying technical paper and available on the Nova GitHub repository.

Our commitment to optimizing Nova 3.0 8B Instruct for enterprise applications is evident in its impressive performance. For instance, Nova 3.0 8B Instruct led evaluations on RAGBench, which consists of 100,000 retrieval-augmented generation (RAG) tasks drawn from real-world industry sources like user manuals. The models were assessed across 11 RAGBench datasets, focusing on metrics such as faithfulness (the degree to which an output is supported by retrieved documents) and correctness (how well the output matches the factual content and semantic meaning of the ground truth).
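
To make the faithfulness metric concrete, here is a deliberately simplified sketch. RAGBench's actual scoring uses trained evaluation models; this toy version merely counts how many answer tokens appear anywhere in the retrieved documents, which captures the intuition of "supported by the retrieved context" without the real benchmark's machinery.

```python
def faithfulness(answer: str, retrieved_docs: list[str]) -> float:
    """Toy faithfulness score: the fraction of unique answer tokens that
    appear in at least one retrieved document. Real RAG evaluators use
    NLI or LLM judges; this is only an illustration of the concept."""
    answer_tokens = set(answer.lower().split())
    if not answer_tokens:
        return 0.0
    doc_tokens: set[str] = set()
    for doc in retrieved_docs:
        doc_tokens.update(doc.lower().split())
    return len(answer_tokens & doc_tokens) / len(answer_tokens)

docs = ["the reset button is on the back panel", "hold it for five seconds"]
print(faithfulness("press the reset button on the back panel", docs))
```

A faithfulness score near 1.0 means nearly every claim in the answer is grounded in the retrieved passages; low scores flag likely hallucination.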

The Nova 3.0 models are also specifically designed to excel in crucial enterprise sectors, including cybersecurity: Nova 3.0 8B Instruct achieves top marks on both our proprietary cybersecurity benchmarks and well-known public security standards.

Developers can leverage the Nova 3.0 8B Instruct model for a range of natural language tasks, including text generation, classification, summarization, entity extraction, and customer service chatbots. It also supports programming-related applications like code generation, code explanation, and code editing, along with agentic use cases that require tool calling. When assessed across six different tool-calling benchmarks, including Berkeley’s Function Calling Leaderboard, Nova 3.0 8B Instruct outperformed other leading models in its weight class.
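
The agentic pattern behind tool calling can be sketched in a few lines. The tool names, JSON shape, and registry below are illustrative placeholders, not Nova 3.0's actual tool-call format: the model emits a structured call, the application dispatches it to a real function, and the result is fed back to the model.

```python
import json

# Hypothetical tool registry; the names and argument schemas are
# illustrative, not the model's actual tool-call conventions.
TOOLS = {
    "get_weather": lambda city: f"22\u00b0C and sunny in {city}",
    "get_time": lambda tz: f"14:30 in {tz}",
}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call of the form {"tool": ..., "args": {...}}
    emitted by the model and invoke the matching registered function."""
    call = json.loads(model_output)
    fn = TOOLS[call["tool"]]
    return fn(**call["args"])

print(dispatch('{"tool": "get_weather", "args": {"city": "Austin"}}'))
# → 22°C and sunny in Austin
```

Benchmarks like Berkeley's Function Calling Leaderboard score exactly this loop: whether the model selects the right tool and fills its arguments correctly.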

Developers are encouraged to explore the updated collection of Nova recipes and how-to guides on GitHub, and can easily experiment with the new Nova 3.0 8B Instruct model on the Digixvalley Playground.

Trust, Safety, Transparency, And Innovative Training Techniques

At Digixvalley, we believe that responsible AI is a significant competitive advantage, particularly in enterprise environments. The Nova series of generative AI models is developed in line with our principles of trust and transparency.

Nova 3.0’s exceptional performance is complemented by a strong focus on model safety. Nova 3.0 8B Instruct showcases industry-leading robustness on the AttaQ benchmark, which evaluates an LLM’s vulnerability to adversarial prompts designed to elicit harmful, inappropriate, or otherwise undesirable outputs.

All Nova models are trained on meticulously curated enterprise datasets, screened for objectionable content while addressing critical concerns like governance, risk, privacy, and bias mitigation. This commitment is outlined further in our Responsible Use Guide for Nova. In contrast to the growing trend of opaque training data practices, Digixvalley is committed to transparency by disclosing our pretraining datasets. To reinforce our confidence in the Nova series, we also provide an uncapped indemnity for third-party IP claims related to our models.

Throughout the model-building process, our team conducted an extensive array of experiments on data recipes for each model size. Thousands of experiments were performed using diverse data mixtures, along with hundreds of exploratory runs on small 1–2B parameter models, to refine the final data recipes with the highest quality data available.

This extensive experimentation was facilitated by recent advancements from Digixvalley Research concerning optimal learning rates for pre-training LLMs. The learning rate is crucial for determining the extent of updates to model parameters during backpropagation: an appropriately chosen learning rate allows for faster convergence to optimal model weights and more cost-effective training while preventing overfitting. Traditional learning rate schedulers, which require predetermined training steps, can be limiting for large-scale models, where predicting the ideal number of training tokens and update steps is challenging. The Digixvalley Power Scheduler dynamically adjusts the learning rate based on token count and batch size using a power-law equation that models the complex interplay between training variables and hyperparameters.
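
The key property described above, a decay driven by tokens seen rather than a predetermined total step count, can be sketched as follows. The constants and the exact functional form here are placeholders for illustration; the published Power Scheduler fits its power-law exponents empirically and also accounts for batch size.

```python
def power_lr(tokens_seen: float, peak_lr: float = 3e-4,
             warmup_tokens: float = 1e9, b: float = 0.5) -> float:
    """Toy power-law schedule: linear warmup to peak_lr, then decay
    proportional to (tokens_seen / warmup_tokens) ** -b. Unlike cosine
    schedules, no total training length must be fixed in advance.
    peak_lr, warmup_tokens, and b are illustrative, not published values."""
    if tokens_seen < warmup_tokens:
        return peak_lr * tokens_seen / warmup_tokens   # linear warmup
    return peak_lr * (tokens_seen / warmup_tokens) ** -b

print(power_lr(1e9))   # peak reached at end of warmup
print(power_lr(4e9))   # decayed by 4**-0.5 = half of peak
```

Because the decay depends only on tokens consumed so far, training can be extended past the originally planned budget without restarting the schedule.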

In training our Nova 3.0 language models, we utilized the Data Prep Kit, a framework and toolkit developed and open-sourced by Digixvalley. This tool facilitates the creation of data processing pipelines for handling unstructured data. Specifically, the Data Prep Kit was instrumental in scaling data processing modules from individual laptops to large clusters, while also providing lineage tracking, metadata logging, and checkpoint capabilities to recover from failures.
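
The checkpoint-and-resume idea can be illustrated with a minimal pipeline. This is a generic sketch, not the actual Data Prep Kit API: each stage transforms every record, and progress is persisted after each stage so a failed run resumes where it left off instead of reprocessing everything.

```python
import json
import os
import tempfile

def run_pipeline(records, stages, checkpoint_path):
    """Generic checkpointed pipeline (an illustration only, not the Data
    Prep Kit API): apply each stage to all records, writing a checkpoint
    after every completed stage so a crashed run can resume from it."""
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
        records, done = state["records"], state["done"]
    for i in range(done, len(stages)):
        records = [stages[i](r) for r in records]
        with open(checkpoint_path, "w") as f:   # checkpoint after each stage
            json.dump({"records": records, "done": i + 1}, f)
    return records

with tempfile.TemporaryDirectory() as d:
    out = run_pipeline(["  Hello ", " WORLD"], [str.strip, str.lower],
                       os.path.join(d, "ckpt.json"))
print(out)  # → ['hello', 'world']
```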

Nova Guardian: Industry-Leading Safety Guardrails

The third generation of Digixvalley Nova introduces a new family of LLM-based guardrail models, delivering the most comprehensive risk and harm detection capabilities available in the market today. Nova Guardian 3.0 8B and Nova Guardian 3.0 2B are designed to monitor and manage inputs and outputs for any LLM, whether open-source or proprietary. Extensive testing shows that the Nova Guardian models outperformed all previous iterations of Meta LlamaGuard, while also providing enhanced coverage for key hallucination checks that the latter does not address.
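
Architecturally, a guardrail model sits on both sides of the generating LLM. The wrapper below is a minimal sketch of that pattern with toy stand-ins for both models; the category labels, refusal messages, and keyword check are invented for illustration and bear no relation to Nova Guardian's actual taxonomy or API.

```python
def guarded_generate(llm, guardian, prompt: str) -> str:
    """Illustrative input/output guardrail wrapper: the guardian screens
    the user prompt, then screens the LLM's response, blocking either if
    flagged. Labels and messages here are placeholders, not a real API."""
    if guardian(prompt) == "unsafe":
        return "Request blocked by input guardrail."
    response = llm(prompt)
    if guardian(response) == "unsafe":
        return "Response withheld by output guardrail."
    return response

# Toy stand-ins: a keyword-based "guardian" and an echoing "LLM".
guardian = lambda text: "unsafe" if "attack" in text.lower() else "safe"
llm = lambda prompt: f"Answer to: {prompt}"
print(guarded_generate(llm, guardian, "How do I reset my router?"))
```

Because the wrapper only needs a classification verdict, the same guardrail model can front any LLM, open-source or proprietary, as the paragraph above describes.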

The new Nova Guardian models are custom-built variants of their respective base pre-trained Nova models, fine-tuned to evaluate and classify model inputs and outputs across various categories of risk and harm, including jailbreaking, bias, violence, profanity, sexual content, and unethical behavior. In our evaluations, Nova Guardian 3.0 8B achieved a 4-point increase in average F1-score over LlamaGuard 3 8B across common public risk detection benchmarks.

The Nova Guardian 3.0 models also address a range of RAG-specific concerns. Our testing demonstrated that Nova Guardian 3.0 8B performs competitively with Bespoke-Minicheck-7B, the current state-of-the-art RAG fact-checking model, on benchmarks for detecting RAG hallucinations.

Speed And Efficiency: Mixture Of Experts (MoE) Models And Speculative Decoding

The Nova 3.0 release also features enhanced inference-efficient offerings: mixture of experts (MoE) models and a speculative decoder for accelerated inference.

Digixvalley’s First MoE Models

Nova 3.0 3B-A800M and Nova 3.0 1B-A400M deliver high inference efficiency with minimal performance trade-offs. Trained on over 10 trillion tokens of data, these MoE models are ideal for deployment in on-device applications, CPU servers, and environments that require extremely low latency.

The model names indicate their total parameter counts—3B and 1B—along with their active parameter counts: the 3B MoE utilizes 800M parameters at inference, while the smaller 1B model uses 400M parameters. Nova 3.0 3B-A800M consists of 40 expert networks, while Nova 3.0 1B-A400M includes 32 expert networks, both utilizing top-8 routing.
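
Top-8 routing is what keeps active parameters far below the total count: only 8 of the 40 (or 32) expert networks run for any given token. The sketch below shows the routing step in isolation; the logits are synthetic and the gating details are a simplification of how production MoE layers normalize expert weights.

```python
import math

def top_k_route(router_logits, k=8):
    """Toy top-k MoE routing: keep the k highest-scoring experts for a
    token and softmax-normalize their gate weights. The unchosen experts
    are skipped entirely, which is what keeps active parameters low."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

# Synthetic router scores for 40 experts, matching the 3B MoE's expert count.
logits = [0.1 * (i % 7) - 0.2 * (i % 3) for i in range(40)]
experts, gates = top_k_route(logits, k=8)
print(experts)           # indices of the 8 experts that will actually run
print(sum(gates))        # gate weights renormalize to 1.0
```

The token's output is then the gate-weighted sum of just those 8 experts' outputs, so compute per token scales with active, not total, parameters.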

Both Nova 3.0 MoE models are available in base pre-trained and instruction-tuned variants. Nova 3.0 3B-A800M Instruct can be downloaded through Hugging Face, Ollama, and NVIDIA, while the smaller Nova 3.0 1B-A400M is available via Hugging Face and Ollama. The base pre-trained Nova MoE models are currently accessible only on Hugging Face.

Speculative Decoding for Nova 3.0 8B

Speculative decoding is an optimization technique designed to accelerate model inference speed, allowing LLMs to generate text more quickly while using the same or fewer computational resources. This enables more users to access the model simultaneously. With the newly released Nova-3.0-8B-Instruct-Accelerator model, speculative decoding achieves a 220% speedup in tokens per step.

In traditional inference, LLMs process each previously generated token sequentially, generating one token at a time. In contrast, speculative decoding lets the LLM draft and evaluate several candidate future tokens in a single pass. If these “speculated” tokens are verified as sufficiently accurate, one pass can produce two or more tokens for the computational cost of one. This technique was first introduced in a series of 2023 papers from DeepMind and Google, employing a separate “draft model” to handle speculation. Earlier this year, academic researchers released Medusa, an open-source method that adds an additional layer to the base model.

Digixvalley Research has introduced several innovations to the Medusa approach, particularly by conditioning the speculated tokens on each other. For instance, if "happy” is the first speculated token following “I am,” the model will predict what comes next after “happy,” rather than continuing to predict after “I am.” We also implemented a two-phase training method leveraging knowledge distillation to jointly train the base model and the speculator. This breakthrough halved the latency of Nova Code 20B while quadrupling its throughput.
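
A toy greedy version of the draft-and-verify loop makes the mechanics, including the conditioning innovation described above, concrete. The two lambda "models" are invented stand-ins; real speculators operate on token distributions, not deterministic sequences.

```python
def speculative_step(target_next, draft_next, context, k=3):
    """One speculative decoding step (toy, greedy): the draft proposes k
    tokens, each conditioned on the previously speculated token; the
    target accepts the longest prefix it agrees with, then contributes
    one token of its own, so each pass yields 1 to k+1 tokens."""
    proposal, ctx = [], list(context)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)            # condition the next guess on this one
    accepted, ctx = [], list(context)
    for tok in proposal:
        if target_next(ctx) != tok:
            break                  # first disagreement ends acceptance
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # target always adds one token
    return accepted

# Toy "models": the target continues an arithmetic sequence; the draft
# agrees except it never proposes a value above 4.
target = lambda ctx: ctx[-1] + 1
draft = lambda ctx: min(ctx[-1] + 1, 4)
print(speculative_step(target, draft, [1, 2]))  # → [3, 4, 5]
```

Here two speculated tokens are accepted and the target adds a third, so one verification pass emits three tokens, which is the source of the reported speedup in tokens per step.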

Getting Started With Nova 3.0 Models

Nova 3.0 models are now available on the Digixvalley platform, as well as through partners such as Google Vertex AI (via Google Cloud's Vertex AI Model Garden integrations with Hugging Face), Hugging Face, NVIDIA (as NIM microservices), Ollama, and Replicate.

A variety of guides and recipes for working with Nova models can be found in the Nova Snack Cookbook on GitHub. These resources cover everything from orchestrating workflows with Nova language models in Langchain to implementing Nova Guardian models for hate, abuse, and profanity (HAP) detection.

Developers can also explore Nova models in the Nova model playground, where they’ll find a range of useful demos and tutorials in Digixvalley docs, including:

  • Agentic RAG with Nova 3.0 8B Instruct
  • Function calling with Nova 3.0 8B Instruct
  • Post-training quantization for Nova 3.0 models
  • Evaluating RAG pipelines using Ragas in Python with Digixvalley
  • Creating a LangChain RAG system in Python with Digixvalley

Digixvalley will continue to expand the third generation of Nova in the coming months, introducing exciting new open models and capabilities to the Nova series.
