AI & ML Jargon

At the beginning of 2022 I joined Cerebras and was introduced to the new field of Artificial Intelligence (AI). The purpose of this article is to capture the basic terminology used in the field and to ignite curiosity to explore each topic further. It is written for someone who is new to the field and wants to learn the fundamentals in a very simple way. The Cerebras website is very content-rich and generously shares knowledge of the field around their cutting-edge product line (software, hardware, and solutions for a variety of real-life applications).

AI describes machines that learn and solve problems the way humans do: understanding speech and performing tasks (Alexa, Siri, etc.), recognizing a user's behavior patterns and suggesting relevant choices, or, as in autonomous vehicles, self-driving safely on a busy street. Mimicking human-like behavior (reasoning, planning, learning, problem solving, decision making) while using all the information available to the machine requires a wide range of field knowledge, including math, statistics, computer science, psychology, linguistics, and more. Natural language processing (NLP) sits at the intersection of linguistics and computer science; it allows AI to understand, analyze, and process natural language data. NLP enables a machine to quickly make sense of unstructured text from web data, emails, media, texts, social media, instant messages, etc., and enables commercial enterprises and government agencies to identify trends, sentiment, correlations, and key ideas.
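As a toy illustration of how NLP turns unstructured text into a usable signal such as sentiment: the word lists and scores below are made up for this sketch (real NLP models learn such associations from data rather than using hand-written lists).

```python
# Toy sentiment scorer: counts positive vs. negative keywords in raw text.
# The word lists are illustrative only; models like BERT learn these
# associations from large corpora instead.
POSITIVE = {"great", "love", "excellent", "fast"}
NEGATIVE = {"slow", "broken", "hate", "poor"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this fast product"))   # positive
print(sentiment("support was slow and poor"))  # negative
```

Run over millions of messages, even a crude scorer like this hints at how enterprises extract trend and sentiment signals from unstructured text.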

As part of AI, the machine needs to make decisions based on all the data it acquires in order to accomplish its intended goals. It uses algorithms and mathematical models to fit a model to the data. The fitted model is then evaluated on validation or test data sets to predict behavior and to check that it has not overfit the training data. This is how machine learning (ML) algorithms build a model from training/sample data to make predictions or decisions without being explicitly programmed to do so. As businesses and technology collect more data, Big Data is generated, which in turn drives more advanced learning algorithms. For example, Bidirectional Encoder Representations from Transformers (BERT), the deep learning NLP model, has gone beyond simple text analysis to more specialized variants such as BioBERT, FinBERT, SciBERT, ClinicalBERT, GilBERT, DNABERT, PatentBERT, mBERT, etc., which support data/text analysis in the biomedical, finance, science, clinical, geological, genomic, patent, and multilingual fields respectively.
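The fit-then-validate workflow described above can be sketched in a few lines; this is a minimal example with synthetic data and a simple linear model, not any particular library's API.

```python
import random

# Synthetic data: y = 2x + 1 plus a little noise (illustrative only).
random.seed(0)
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 0.1)) for x in range(20)]
train, test = data[:15], data[15:]   # hold out a test split

# Closed-form least-squares fit of y = a*x + b on the training split.
n = len(train)
sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def mse(split):
    return sum((y - (a * x + b)) ** 2 for x, y in split) / len(split)

# Comparable train and test error suggests the model has not overfit.
print(f"train MSE={mse(train):.4f}  test MSE={mse(test):.4f}")
```

If test error were much larger than training error, the model would have memorized the training data rather than learned the underlying pattern, which is exactly what the validation step is there to catch.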

More and more enterprises and individuals are using the Cloud to store data and draw compute power from off-site decentralized facilities, along with SaaS (Software as a Service: using online software applications by subscription rather than owning, installing, and running them locally). Platform as a Service (PaaS), Infrastructure as a Service (IaaS), and Function as a Service (FaaS) are other Cloud service models, through which one can rent a full application platform, servers/storage, and serverless functions respectively.

A container packages an application together with everything it needs to run; all it requires from the outside is a host to run on. This is a way of partitioning a machine or server into separate user-space environments. Each environment runs only one application and is isolated from the other applications/partitioned sections on the machine, yet all containers share the computer hardware through the machine's kernel. A Virtual Machine (VM), by contrast, is a software-based computer that resides on another computer's operating system. The Cloud uses VMs for testing, backup, and running SaaS applications.

Machines also perform Data Mining: analytics that extract patterns and knowledge from large data sets. Neural network models solve larger, more complex AI problems, making more advanced and granular decisions through interconnected nodes more efficiently than linear algorithms can.
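A single node of a neural network is simple: a weighted sum of its inputs passed through a nonlinear activation. The weights and inputs below are arbitrary illustrative values; a real network has many such nodes whose weights are learned from data.

```python
import math

# One node ("neuron") of a neural network: weighted sum + nonlinearity.
# Networks stack many of these into layers; training adjusts the weights.
def neuron(inputs, weights, bias):
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation, output in (0, 1)

out = neuron([0.5, -1.0, 0.25], weights=[0.8, 0.2, -0.5], bias=0.1)
print(f"node output: {out:.3f}")
```

The nonlinearity is what lets networks of these nodes model patterns a purely linear algorithm cannot.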

Machine learning through many deep hidden layers of a neural network is called Deep Learning (DL). DL trains these networks on vast amounts of data to perform complicated tasks that are difficult to describe explicitly. Generative Pre-trained Transformer 3 (GPT-3), a language prediction model released by OpenAI (an AI research lab) as recently as May 2020, uses deep learning to produce human-like text with an autoregressive language model.
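"Autoregressive" means the model predicts each next token from the tokens generated so far. A toy bigram version of that idea is sketched below (made-up corpus; real models like GPT-3 use billions of learned parameters, not a lookup table of word counts).

```python
from collections import defaultdict

# Toy autoregressive text generator: predict each next word from the
# previous word, using counts from a tiny illustrative corpus.
corpus = "the cat sat on the mat the cat sat".split()
next_word = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word[prev].append(nxt)

def generate(start, length):
    words = [start]
    for _ in range(length):
        candidates = next_word.get(words[-1])
        if not candidates:
            break
        # Greedy choice: the most frequent successor of the last word.
        words.append(max(set(candidates), key=candidates.count))
    return " ".join(words)

print(generate("the", 4))  # the cat sat on the
```

Each generated word is fed back in as context for the next prediction, which is the autoregressive loop in miniature.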

Most of us are familiar with the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU), which handle computing through serial and parallel processing respectively. GPUs have dramatically helped many industries, such as automotive, robotics, and healthcare & life sciences, by accelerating applications. A Deep Learning Processor (DLP), or deep learning accelerator, is designed specifically for deep learning algorithms. Nvidia GPUs process matrix multiplication using Tensor Cores. Tensor Processing Units (TPUs) are Google's custom-developed ASICs to accelerate ML workloads; they are custom-built to run Google's TensorFlow framework. TPUs, like the Neural Processing Units (NPUs) in Huawei cellphones, are types of DLPs. High data-level parallelism and large on-chip buffers/memory make DLPs more efficient at running DL algorithms than FPGAs (Field-Programmable Gate Arrays), CPUs, and GPUs. AI accelerators are computer hardware systems designed to accelerate AI and ML applications.
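Matrix multiplication, the workload Tensor Cores and TPUs accelerate, is just many independent multiply-accumulate operations, which is why it parallelizes so well. Here is the naive serial version for reference (illustrative only; accelerators compute thousands of these inner products at once in hardware).

```python
# Naive matrix multiply: C[i][j] = sum over k of A[i][k] * B[k][j].
# Every output element C[i][j] is independent of the others, so a GPU or
# TPU can compute them all in parallel; a CPU loops through them serially.
def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Deep learning training and inference are dominated by exactly this operation at enormous scale, which is what justifies building dedicated silicon for it.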

NLP models had grown to 17 billion parameters by 2020, and the continuous growth of network size demands ever more memory and time for training. GPT-3 requires close to 2.8 TB of memory to store its weights and states; an individual GPU or compute unit is not enough to handle GPT-3-scale models efficiently. A cluster of as many as 1,000 GPUs is required just to bring distributed training time down to a few days; a single compute unit would need years of training time and enormous memory. Weight streaming allows cluster size to grow independently of model size. The Wafer-Scale Engine is a very innovative solution with many cores, large on-chip SRAM, and high aggregate on-wafer network bandwidth; it executes DL compute on BERT models far more efficiently than standard GPUs. The Cerebras CS-2 is one such system, with massive computation, storage, and AI processing capabilities. One can build a cluster of such systems to handle even more complex training runs and bigger models at very high speed for any AI application.
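The 2.8 TB figure is consistent with a simple back-of-the-envelope estimate. The 16 bytes-per-parameter breakdown below is a common mixed-precision training assumption, not a published Cerebras number.

```python
# Rough training-memory estimate for GPT-3 (175 billion parameters).
# Assumed bytes per parameter under mixed-precision Adam-style training:
#   2 (fp16 weight) + 2 (fp16 gradient) + 4 (fp32 master weight)
#   + 4 + 4 (fp32 optimizer momentum and variance) = 16 bytes.
params = 175e9
bytes_per_param = 16
total_tb = params * bytes_per_param / 1e12
print(f"~{total_tb:.1f} TB of weights and optimizer state")  # ~2.8 TB
```

Since a single GPU carries on the order of tens of gigabytes of memory, the arithmetic alone shows why GPT-3-scale training must be distributed across many devices or handled by approaches like weight streaming.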
