Non-supervised AI for SMEs: Infrastructure is more than just roads.

This article is part 3 of our series on non-supervised AI for SMEs.

Authors: Johannes Otterbach and Clara Swaboda | Design: Clara Swaboda

Technical infrastructure is the engine that drives AI. This is especially true for non-supervised AI methods, as they require bigger models and thus depend on state-of-the-art infrastructure. Non-supervised AI offers a lot of potential for industrial applications but is currently not on the radar of most companies. The vast majority of AI projects today rely on supervised methods that require labeled data, meaning a human annotator has to define explicitly what a data point is or is not. Annotating is costly and might deter smaller companies from even starting their own AI projects. This is where un-, self-, and semi-supervised learning (USSL for short) techniques become attractive, because they work with only a few labels or none at all.

“Technical infrastructure is the engine that drives AI. This is especially true for non-supervised AI methods, as they require bigger models and thus depend on state-of-the-art infrastructure.”
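
To make the idea of learning without labels concrete, here is a minimal sketch of self-supervised pretraining: a language model learns by predicting randomly masked words in raw, unlabeled text, so no human annotation is needed. It assumes the Hugging Face transformers and datasets libraries; the file name company_texts.txt is a hypothetical placeholder for a company's own unlabeled text.

from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Raw, unlabeled text is the only input; "company_texts.txt" is a placeholder.
dataset = load_dataset("text", data_files="company_texts.txt")["train"]
dataset = dataset.map(lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])

# The collator masks 15% of the tokens; predicting them back is the training signal.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-pretraining", per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()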

To create value with these non-supervised methods, a larger data volume is needed than with supervised approaches, because the model has to discover patterns in the data by itself. Over the last few years, we have seen that bigger models are often better. Large Language Models (LLMs) like Megatron, PaLM, GPT-3, BERT and others have developed remarkable language abilities. Being able to translate and generate text opens up a wide range of possible applications, such as generating email subject lines, communicating with customers through chatbots, or translating text to code. Although Megatron, GPT-3 and others frequently make it into the headlines, large models are not limited to Natural Language Processing (NLP); they are also employed in speech recognition, multi-modal modeling, recommender engines, computer vision and more.
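
As a rough illustration of how such pretrained models can be reused without any labeled data, the sketch below asks an openly available language model to draft an email subject line. It assumes the Hugging Face transformers library; the model choice (GPT-2) and the prompt are illustrative only, and a larger model would give better results.

from transformers import pipeline

# Load an openly available pretrained language model.
generator = pipeline("text-generation", model="gpt2")

prompt = "Write a short email subject line for a spring sale on garden tools:"
result = generator(prompt, max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])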

Caption: State-of-the-art NLP model sizes have grown at an exponential rate over the last years. The size of large models places high demands on technical infrastructure. Source: Microsoft

Scaling up to bigger models requires better tech infrastructure. This infrastructure rests on two pillars: traditional information technology (IT) elements and specific Machine Learning (ML) hardware. The IT scaffold includes broadband connectivity, storage layers, Central Processing Unit (CPU) power for data processing, and electricity. Beyond that, ML requires specialized hardware, namely Graphics Processing Units (GPUs). Using GPUs for model training has become the gold standard in ML because of their processing speed and the option to train a model on multiple GPUs in parallel, which saves time. IT infrastructure components and ML hardware like GPUs are united in powerful High-Performance Computing (HPC) clusters.

“Using GPUs for model training has become the gold standard in ML because of their processing speed and the option to train a model on multiple GPUs in parallel, which saves time.”
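
As an illustration of what training on multiple GPUs in parallel looks like in practice, here is a minimal data-parallel sketch in PyTorch. The model and the training data are stand-ins, and it assumes the script is launched with torchrun, one process per GPU.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Launch with: torchrun --nproc_per_node=<number_of_gpus> train.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(1024, 10).cuda()   # placeholder model on this process's GPU
model = DDP(model, device_ids=[local_rank])  # keeps gradients in sync across GPUs

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

for step in range(100):                          # placeholder training loop
    x = torch.randn(32, 1024, device="cuda")     # every GPU works on its own batch
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                              # gradients are averaged across GPUs here
    optimizer.step()

dist.destroy_process_group()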

But is it really necessary to invest in expensive HPC centers, or could the same results be achieved with less advanced infrastructure? In principle, we could train any model on a single GPU, but in practice this would be infeasible. Take this example: training a GPT-3 model with 175 billion parameters would take 36 years on eight NVIDIA V100 GPUs, or seven months with 512 NVIDIA V100 GPUs.
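
The arithmetic behind these numbers is straightforward and, assuming an idealized linear speed-up, can be checked in a few lines:

gpu_years = 36 * 8                # 36 years on 8 GPUs is roughly 288 GPU-years of compute
years_on_512_gpus = gpu_years / 512
print(f"{years_on_512_gpus:.2f} years, i.e. about {years_on_512_gpus * 12:.0f} months")
# -> 0.56 years, i.e. about 7 months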

Most companies will, of course, never train models as big as GPT-3, but the pressure to invest in up-to-date infrastructure still exists. Companies have to make a tradeoff between cost and efficiency. On the one hand, investing in tech infrastructure is expensive up front. On the other hand, bringing models from development to deployment quickly is crucial for becoming an AI-driven company and staying competitive. Leveraging non-supervised AI methods together with up-to-date tech infrastructure is a future-proof strategy that also saves the additional costs of data labeling.

These non-supervised AI approaches pay off when various stakeholders such as SMEs, corporations, and research institutions work together. Creating industry-specific collaborative data sets and training base models on a shared tech infrastructure enables the development of much more powerful models than any of the participating actors could achieve alone. These general base models can then be fine-tuned on smaller labeled data sets to solve a variety of tasks and generate value in industry-relevant use cases.
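
To make the fine-tuning step concrete, here is a minimal sketch that adapts such a shared base model to a specific classification task using only a small labeled data set. It assumes the Hugging Face transformers and datasets libraries; the base model name and the CSV file (with "text" and "label" columns) are hypothetical placeholders.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "bert-base-uncased"   # stand-in for a collaboratively pretrained base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# A small labeled data set is enough; "labeled_examples.csv" is a placeholder.
data = load_dataset("csv", data_files="labeled_examples.csv")["train"]
data = data.map(lambda batch: tokenizer(batch["text"], truncation=True), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="fine-tuned-model", num_train_epochs=3),
    train_dataset=data,
    tokenizer=tokenizer,      # enables dynamic padding when batching
)
trainer.train()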

Caption: Sharing data, technical infrastructure and models allows stakeholders to pool their resources and achieve faster and better AI solutions.

A lot of companies outsource their hardware needs to on-demand services provided by big platform-driven tech companies that are mostly based in the US. We would like to challenge this common practice and suggest European AI-specific HPC clusters as an interesting alternative, a core idea also advocated by the LEAM (Large European AI Models) initiative, which pushes for the development of large AI models in Europe. This would enable us to develop AI more independently of the US and China and at the same time become more competitive with those major players in the tech industry. Moreover, having servers located in Europe ensures that sensitive data is handled in compliance with European values and regulations. Another advantage is that building HPC lighthouse projects attracts international talent, a crucial success factor for AI projects. Our next article delves deeper into what it takes to build a strong team of AI experts.
