Artificial Intelligence (AI) in a Nutshell for Telecom/Wireless Engineers
Artificial Intelligence (AI) is often called the “new electricity”. When electricity became widely available it transformed every technology sector in the world, and AI is likely to have a similarly broad impact. A telecom/wireless engineer may wonder how this is relevant, but it is very relevant to us. If you are not learning AI now, sooner or later you will need to learn it as part of your job. This article explains AI in a nutshell for telecom/wireless engineers.
First, the article discusses the value AI brings to an organization, then explains the basics of AI, and finally takes an example problem from the telecom/wireless domain and explains how it can be solved efficiently with AI.
Values of AI (My Perspective):
An organization is nothing but its employees. Experienced and smart engineers play a major role in an organization, but when they leave, their experience and knowledge leave with them. AI helps an organization use its best engineers' knowledge and experience more effectively by capturing that knowledge digitally: AI models are created and trained so that the knowledge can be applied over and over again. Even after those engineers leave the organization, their experience and knowledge remain embedded and retained digitally.
For example, one popular AI application in the telecom/wireless domain is analytics software, where the knowledge of expert engineers is used to train algorithms that automatically detect incidents, perform root cause analysis for thousands of different scenarios, prioritize the alerts, and fix the problems automatically. In this example, an organization can embed and retain its expert engineers' knowledge and experience in its analytics software and automate incident detection, root cause analysis, and remediation. Otherwise, the organization needs an expert engineering team to do this job throughout the product life cycle, and the quality depends on that team.
So, the value of AI for an organization is: 1) time saving, 2) improved quality, and 3) automation to save cost.
Basics:
AI is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. AI is accomplished by studying how the human brain thinks, and how humans learn, decide, and work while trying to solve a problem, and then using the outcomes of this study as a basis for developing intelligent software and systems.
What is intelligence? It is the ability of a system to reason, learn, solve problems, perceive, and use language.
Difference between Human and Machine Intelligence,
AI often revolves around the use of algorithms. A complex algorithm is often built on top of other, simpler algorithms. Many AI algorithms are capable of learning from data; they can enhance themselves by learning new heuristics (strategies, or "rules of thumb", that have worked well in the past), or can themselves write other algorithms. Some of the "learners", such as Bayesian networks, decision trees, and nearest-neighbor, could theoretically, if given infinite data, time, and memory, learn to approximate any function, including whatever combination of mathematical functions would best describe the entire world. These learners could therefore, in theory, derive all possible knowledge by considering every possible hypothesis and matching it against the data.
So at this point, remember that AI is realized using a set of algorithms, that “LEARNING” is the key to AI, and that learning needs “DATA”. The algorithms may look complex, but we do not need to write them ourselves: there are free and open-source software libraries such as Caffe2, Cognitive Toolkit, MXNet, PyTorch, and TensorFlow. AI hardware/chip providers also implement these algorithms and provide APIs, for example Nvidia's GPU-accelerated libraries such as cuDNN and NCCL. So we just need to know which AI algorithms/architectures are available (for example, CNN architectures such as ResNet and AlexNet) and what their applications are.
Which programming language should you use to write AI programs? Caffe2 and Cognitive Toolkit are written in C++; MXNet is written in C++, Python, and other languages; PyTorch and TensorFlow are written in C++ and Python. All of these libraries can be used from Python, and most also offer a C++ API, so if you know either C++ or Python you will be able to write AI programs.
What is data? For example, if you write an AI program to decide whether a given drink is beer or wine, the "data" for this system could be "color" and "alcohol content": the different colors of beer and wine and their different ranges of alcohol content.
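To make the beer/wine example concrete, here is a minimal sketch of a learner that uses only those two features. Everything in it is an assumption for illustration: the numeric "color darkness" scale, the made-up sample values, and the choice of a 1-nearest-neighbor rule (one of the simple learners mentioned earlier).

```python
import math

# Hypothetical training data: (color darkness 0..1, alcohol % by volume) -> label.
# The numbers are made up purely for illustration.
TRAINING_DATA = [
    ((0.20, 4.5), "beer"),
    ((0.35, 5.0), "beer"),
    ((0.50, 6.5), "beer"),
    ((0.70, 12.0), "wine"),
    ((0.85, 13.5), "wine"),
    ((0.10, 11.5), "wine"),  # e.g. a pale white wine
]


def classify_drink(color, alcohol):
    """Return the label of the closest training example (1-nearest neighbor)."""

    def distance(sample):
        (sample_color, sample_alcohol), _label = sample
        # Scale the alcohol difference so both features have a similar range.
        return math.hypot(color - sample_color, (alcohol - sample_alcohol) / 15.0)

    _features, label = min(TRAINING_DATA, key=distance)
    return label


print(classify_drink(0.30, 4.8))   # a light drink with low alcohol content
print(classify_drink(0.80, 12.5))  # a dark drink with high alcohol content
```

The "learning" here is nothing more than storing labelled examples; the point is that the quality of the answer depends entirely on the data the program was given.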
Neural Networks Basics:
Understanding the basics of neural networks is very important before learning about AI, so this section covers them.
Artificial Neural Networks (ANNs):
AI research has developed a large number of tools to solve the most difficult problems in computer science; one of these tools is Artificial Neural Networks (ANNs).
ANNs were inspired by the architecture of neurons in the human brain. ANNs are composed of multiple nodes, which imitate the biological neurons of the human brain. The neurons are connected by links and they interact with each other. The nodes can take input data and perform simple operations on the data. The result of these operations is passed to other neurons. The output at each node is called its activation or node value. A simple "neuron" N accepts input from multiple other neurons, each of which, when activated (or "fired"), casts a weighted "vote" for or against whether neuron N should itself activate. Learning requires an algorithm to adjust these weights based on the training data; one simple algorithm is to increase the weight between two connected neurons when the activation of one triggers the successful activation of another.
Each link is associated with a weight. ANNs are capable of learning, which takes place by altering weight values. The diagram below illustrates an ANN.
There are two Artificial Neural Network topologies: FeedForward and FeedBack.
FeedForward ANN:
The information flow is unidirectional. A node sends information to other nodes from which it does not receive any information. There are no feedback loops. They are used in pattern generation/recognition/classification. They have fixed inputs and outputs.
FeedBack ANN:
Feedback loops are allowed, which lets the network keep a short-term memory of previous input events.
Working of ANNs:
In the topology diagrams, each arrow represents a connection between two neurons and indicates the pathway for the flow of information. Each connection has a weight, a number (usually real-valued) that controls the signal between the two neurons.
If the network generates a “good or desired” output, there is no need to adjust the weights. However, if the network generates a “poor or undesired” output or an error, then the system alters the weights to improve subsequent results.
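The weight-adjustment idea can be sketched with the classic perceptron learning rule on a single neuron. This is only an illustration under stated assumptions: the task (learning a logical AND), the learning rate, and the zero starting weights are my own choices, not something prescribed above.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) only if the weighted sum of the inputs is above zero."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust the weights only when the neuron produces an undesired output."""
    weights = [0.0, 0.0]  # assumption: start from zero instead of random values
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # 0 means "good output"
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Training data for a logical AND: the "teacher" knows the desired output.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES])
```

When the output already matches the target, the error is zero and the weights stay as they are; only a wrong output moves them.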
Different types of ANNs:
There are different types of ANNs; the chart below lists the different neural networks.
Evolution of AI:
AI is the broadest research field. Machine learning is a subset of AI and has been a fundamental concept of AI research since the field's inception; deep learning is in turn a subset of machine learning. The diagram below illustrates this. The current trend is deep learning, which is making a huge impact on different technology sectors because abundant data is now available.
Machine learning:
Machine learning is a subset of AI techniques that uses statistical methods to enable machines to improve with experience. Machine learning (ML) is a rather loosely defined field with strong ties to computational science, statistics, and optimization. The goal in ML is to learn something from data, either to make predictions or to extract patterns. ML is used in the medical field, search engines, movie rankings, object recognition, etc.
Machine learning tasks are typically classified into three broad categories, depending on whether there is a learning "signal" or "feedback" available to a learning system.
Supervised Learning:
It involves a teacher that knows more than the ANN itself. For example, the teacher feeds in example data for which the teacher already knows the answers.
An example is pattern recognition. The ANN comes up with guesses while recognizing. Then the teacher provides the ANN with the answers. The network then compares its guesses with the teacher’s “correct” answers and makes adjustments according to the errors.
One application of supervised learning is “Classification”: inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more of these classes. Spam mail filtering is a classification example, where the inputs are email messages and the classes are “spam” and “not spam”.
Another application is “Regression”, where the outputs are continuous rather than discrete. It focuses on the relationship between a dependent variable and one or more independent variables, and helps to understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the others are held fixed. For example, take the housing price data of a city X, which contains information such as the year the house was built, the lot size, the number of bedrooms, etc. This information is known as the independent variables, and the price associated with each house is known as the dependent variable (because its value depends on the independent variables). The task is to predict the price of a new house from the given data. Problems like these are usually solved using regression.
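As a rough illustration of regression, the sketch below fits a straight line through a handful of house prices with ordinary least squares and uses it to predict the price of a new house. The single feature (number of bedrooms) and all numbers are hypothetical; a real model would use several independent variables.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: bedrooms (independent) vs price in thousands (dependent).
bedrooms = [1, 2, 3, 4, 5]
prices = [155.0, 195.0, 260.0, 290.0, 350.0]

slope, intercept = fit_line(bedrooms, prices)
predicted_price = slope * 6 + intercept  # price estimate for an unseen 6-bedroom house
print(slope, intercept, predicted_price)
```

The slope answers the question posed above: how much the typical price changes when the number of bedrooms changes by one.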
Unsupervised Learning:
It is required when there is no example data set with known answers, for example when searching for a hidden pattern. In unsupervised learning, the learner is given only the inputs and asked to find patterns among them. Usually, this is done by finding clusters or by analyzing which feature/dimension is the most important one. In clustering, a set of elements is divided into groups according to some unknown pattern, based on the existing data.
One application of unsupervised learning is “Clustering”, where a set of inputs is to be divided into groups. Unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.
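A minimal clustering sketch, assuming the well-known k-means procedure and a few made-up 2-D points: no labels are given, and the algorithm only alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. The starting centroids are passed in explicitly to keep the example deterministic.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points; returns the final centroids and clusters."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (point[0] - centroids[i][0]) ** 2
                + (point[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```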
Reinforcement Learning:
This strategy is built on observation. The ANN decides by observing its environment. If the observation is negative, the network adjusts its weights so that it can make a different decision the next time. The training data (in the form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.
Some of the ML algorithms are Artificial Neural Networks (ANN), deep learning, random forests, decision tree learning, inductive logic programming, clustering, reinforcement learning, and Bayesian networks. The table below summarizes some of them.
ANN is one of the machine learning algorithms; most of the others are graph-based or tree-based algorithms. Deep learning is a way to implement machine learning.
Limitations of ML:
- Traditional ML algorithms are not very useful when working with high-dimensional data, that is, where we have a large number of inputs and outputs.
- They struggle with crucial AI problems such as NLP and image recognition.
- One of the big challenges with traditional ML models is feature extraction: they cannot automatically generate a required or new feature. Features are simply the variables of a system. For example, to predict whether a match will be played on a particular date, the input variables might be whether the weather is good and whether it is windy on that day. If another relevant variable, such as humidity, is missing from the input, a traditional ML model cannot generate it automatically.
- For complex problems such as object recognition or handwriting recognition, feature extraction is a huge challenge.
Deep Learning:
Deep learning skips the manual step of extracting features; you can feed images directly to the deep learning algorithm, which then predicts the object. In contrast, traditional ML needs all features as input, so they have to be extracted manually.
Deep learning (DL) uses layers to create an artificial “neural network” that can learn and make intelligent decisions on its own. The idea behind DL is to build learning algorithms that mimic the brain.
- Deep learning models are capable of focusing on the right features by themselves, requiring little guidance from the programmer.
- These models also partially solve the dimensionality (high-dimensional data) problem.
Deep learning is implemented using ANN.
The ANN diagram above illustrates an artificial neuron. X1 to Xn are the inputs to the neuron, and the neuron cell body is the “processing element”, which computes the sum S of all inputs multiplied by their corresponding weights; the weights are initially assigned randomly. The sum is passed to the “activation function” F(S), which in its simplest form is a threshold “Y”: the neuron fires only if the value is above the threshold. The outputs are then compared with the desired outputs in the training data; if they are not equal, the weights are adjusted, and this is repeated until we get the desired output.
Deep learning is implemented using deep networks; the chart of ANNs in the previous section lists the deep networks. Deep networks are simply neural networks with multiple hidden layers, in some cases hundreds of them. This is the major difference between machine learning and deep learning: traditional ML models do not have such stacks of hidden layers. The higher the layer, the more abstract the data: the first layer is the input, the next layer holds an abstraction of the input, and each following layer holds an abstraction of the previous layer's data. The diagram below illustrates a deep network with multiple hidden layers.
Facial recognition as an example for deep learning is illustrated in the below diagram.
Generally, deep learning depends on high-end machines, while traditional machine learning can run on low-end machines. Deep learning therefore typically requires GPUs, which are an integral part of how it works.
Performance is the key difference between ML and DL algorithms. When the data set is small, however, deep learning algorithms do not perform well, because they need a large amount of data to learn the underlying patterns.
Deep learning architectures such as deep neural networks, deep belief networks, and recurrent neural networks have been applied to fields including computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, bioinformatics, drug design, and board game programs, where they have produced results comparable to and in some cases superior to human experts.
Choice of Algorithm:
When averaged over all possible problems, no algorithm will perform better than all others. The assumptions of a great model for one problem may not hold for another problem, so it is imperative to try multiple models and find the one that works best for a particular problem. Here is a structured overview of the main features of common ML algorithms:
Linear regression and linear classifiers. Despite their apparent simplicity, they are very useful with a huge number of features, where more complex algorithms suffer from overfitting.
Logistic regression is the simplest non-linear classifier, combining a linear combination of parameters with a nonlinear function (sigmoid) for binary classification.
Decision trees are often similar to people’s decision processes and are easy to interpret. However, they are most often used in ensembles such as random forests or gradient boosting.
K-means is a more primitive but very easy-to-understand algorithm that can be a good baseline in a variety of problems.
Principal component analysis (PCA) is a great choice to reduce the dimensionality of your feature space with minimum loss of information.
Neural networks are a new era of machine learning algorithms and can be applied to many tasks, but training them is computationally very expensive.
The table below summarizes the commonly used machine learning algorithms.
Example problem from Telecom/wireless domain:
Even though deep learning often outperforms traditional machine learning, its need for large amounts of data and high-end hardware is a key restriction, so traditional machine learning remains a popular tool.
To explain AI with an example, I have taken a master's thesis from Linköping University by Bjorn Ekman, which uses a machine learning-based technique. The title of the thesis is “Machine Learning for Beam based Mobility Optimization in NR”.
System Overview:
In a 5G NR system, cell coverage is beam-based and all data transmissions are beam-formed; if you are not familiar with these concepts, please refer to this article: https://tinyurl.com/yc7zqof5. At the time the thesis was written, the 5G NR specifications were not yet available, so the thesis used an LTE system for simulation, modified to allow for some fundamental ideas of NR mobility: more antennas and different reference signals. There are seven Base Station (BS) sites, and each BS site can transmit 24 beams. The diagram below illustrates this concept. The smoothness and shapes of the beams are somewhat exaggerated; a beam can have a very different shape when reflections are taken into account. Nevertheless, the diagram is useful, showing beams of various sizes and shapes and the possibility of them reaching far into other beams and BSs.
Each BS site has three sectors with eight antennas each. Using eight different predefined pre-coding matrices, each sector can combine its antennas into eight mobility beams. In total, the simulated system has 7 BS sites * 3 sectors * 8 beams = 168 beams. Why is this calculation important? Typically a cell sector has six physical neighbors, but the neighbor table may contain tens of neighbors. This is due to beam reflections, which can also bring non-physical neighbor cells into the neighbor table. That is why selecting the candidate beams and the best beam is complex and resource-consuming; this is the core of the thesis problem.
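The beam count can be checked with a few lines of code. The flat numbering scheme below (beams numbered site by site, then sector by sector) is purely my assumption to illustrate how a single beam index could map back to a site, sector, and beam; the thesis may index beams differently.

```python
NUM_BS_SITES = 7
SECTORS_PER_SITE = 3
BEAMS_PER_SECTOR = 8

BEAMS_PER_SITE = SECTORS_PER_SITE * BEAMS_PER_SECTOR  # 24 beams per BS site
TOTAL_BEAMS = NUM_BS_SITES * BEAMS_PER_SITE           # 168 beams in the system


def beam_location(beam_index):
    """Map a flat beam index (0..167) to (site, sector, beam within sector).

    Hypothetical numbering: consecutive indexes fill one sector, then the next
    sector of the same site, then the next site.
    """
    if not 0 <= beam_index < TOTAL_BEAMS:
        raise ValueError("beam index out of range")
    site, remainder = divmod(beam_index, BEAMS_PER_SITE)
    sector, beam = divmod(remainder, BEAMS_PER_SECTOR)
    return site, sector, beam


print(TOTAL_BEAMS)
print(beam_location(0), beam_location(24), beam_location(167))
```

With 168 possible destinations for every handover decision, measuring all of them is clearly expensive, which is what motivates predicting a small candidate set.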
Each UE in the system is always served by one beam, denoted as a serving beam. Eventually, due to the movement of the UE, the quality of the serving beam will deteriorate and a new beam needs to be selected. To be able to select which beam to hand over to, the current serving BS needs some information about signal strength. With that, it is possible to compare beams and judge which one is best. In LTE that is provided by UEs continuously monitoring and reporting the signal quality of surrounding cells. In NR this will be trickier since there are many more possible reference beams, which are not transmitted continuously.
In NR, a machine-learned algorithm will help the BS site to come up with a good set of candidate beams, originating either from the serving BS or from its neighboring BS. These candidate beams will then be activated by the corresponding BS, measured by the UE, and reported back to the serving BS. The serving BS will then decide whether to hand over or not and if so to which beam (and thus which BS).
The role of machine learning is to help the BS with the selection of the candidate beams. A set of beams is considered good if it: 1) contains few beams and 2) has a high probability of containing the best beam. It is vital to limit the number of beams that are activated, as each active beam consumes system resources. The best beam is the beam with the highest signal strength, i.e. RSRP. These two demands work against each other: more active beams make it more likely that the best beam is among them. Taken to its extreme, the machine learner will try to find the best beam and only suggest that one. This extreme case is a good starting point, as it is more easily converted into a machine learning problem.
Choice of a machine learning algorithm:
A supervised machine learning algorithm is used. The traditional model of supervised learning assumes only one output target but can be generalized to models with several output targets. Multiple-target supervised learning is very flexible and therefore applicable to a wide range of problems. Random forests can be used to rank the importance of variables in a regression or classification problem in a natural way. It is relatively easy to predict several targets with a random forest, at least as long as all targets are of the same type, either classification or regression. In that case, the performance of each split is computed for each target and then averaged over all targets.
Because of its resistance to messy data, the random forest was chosen as the main model for this thesis. The main attractions were the reduced need for pre-processed and continuous data, the built-in feature importance, and the support for multi-target problems. The thesis used the scikit-learn implementation of random forest, mainly because of its ease of use and well-regarded documentation.
The regression trees tried to predict RSRP. The classification trees tried to classify each sample according to the best beam in the sample. The beam index was simply converted into a class index. Predicting class probabilities in the multi-class case made the input and output of the two models have the same dimensions. The ranking was done by applying a max-function on the output matrix in both cases (maximizing either estimated RSRP or estimated “probability of being best”).
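Because both model types output one score per beam, the same ranking step works for either. The sketch below is my own illustration of that step, not the thesis code: the function name and the example scores are hypothetical, and the scores stand in for whatever a trained model would output for one sample.

```python
def top_k_beams(scores, k):
    """Return the indexes of the k beams with the highest score, best first."""
    ranked = sorted(range(len(scores)), key=lambda beam: scores[beam], reverse=True)
    return ranked[:k]


# Hypothetical regression output for one sample: estimated RSRP per beam in dBm.
estimated_rsrp = [-95.0, -80.5, -101.2, -84.0]

# Hypothetical classification output: estimated probability of being the best beam.
best_beam_probability = [0.10, 0.60, 0.05, 0.25]

print(top_k_beams(estimated_rsrp, k=2))         # candidate set from the regressor
print(top_k_beams(best_beam_probability, k=1))  # extreme case: suggest one beam only
```

Choosing k is exactly the trade-off described earlier: a larger k raises the chance that the best beam is in the set, but every extra beam costs system resources.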
The performance of different ML algorithms strongly depends on the size and structure of your data. Thus, the correct choice of algorithm often remains unclear unless we test our algorithms directly through plain old trial and error. Along with the chosen ML algorithm, other approaches were also tested to find which one performs better. In total, five types of ML models were tested: FVV (Feature Vector Virtualization), MCC (Multi-Class Classification), MTR (Multi-Target Regression), PP (Pairwise Preference), and ST (Single Target).
Random Forest:
Random forest is a well-known and easy-to-use, yet in some aspects complex, learning algorithm. It builds upon three ideas: an ensemble of decision trees, bagging, and random feature selection, which evolved since 1980 through numerous authors and were eventually combined into one model. A random forest is simply an ensemble of decision trees: the input vector is run through multiple decision trees and their outputs are combined. The diagram below illustrates this concept.
Machine learner Implementation:
The learning was done offline with data collected from the simulator. The collected data is split into two sets, one for training and one for testing. The search space for cross-validation was defined, and then the actual learning was carried out by the scikit-learn version of random forest. After that, the additional metrics were computed and plotted.
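The first step, splitting the collected samples into a training set and a test set, can be sketched as follows. The helper name, the 25% test fraction, and the fixed seed are assumptions for illustration; the thesis itself relied on scikit-learn for the actual learning.

```python
import random


def split_samples(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples reproducibly and split them into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical stand-in for simulator output: 100 numbered samples.
collected = list(range(100))
train_set, test_set = split_samples(collected)
print(len(train_set), len(test_set))
```

Keeping the test samples out of training is what makes the later metrics meaningful: they are measured on data the model has never seen.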
The process of training an ML model involves providing an ML algorithm (that is, the learning algorithm) with training data to learn from. The term ML model refers to the model artifact that is created by the training process.
The training data must contain the correct answer, which is known as a target or target attribute. The learning algorithm finds patterns in the training data that map the input data attributes to the target (the answer that you want to predict), and it outputs an ML model that captures these patterns.
You can use the ML model to get predictions on new data for which you do not know the target.
Data Overview:
Here follows an overview of the available data and how it can be turned into features and targets.
Available Data:
The following data is collected from the simulator:
- Beam indexes
- Beam RSRP/CQI
- Position/distance
- UE speed
Features (Variables of the system):
Here the available data is turned into machine learning features. The features were eventually divided into two feature groups: instant and history features. The instant features focus on values updated at each UE measurement, and the history features focus on past events and measurements. An asterisk marks data mainly used as a learning target.
Instant features:
- Serving beam index
- Destination beam index*
- Serving beam RSRP
- Non-serving beam RSRP*
- Distance to serving BS
- Position
- UE speed
- CQI
History features:
- Previous serving beam
- Time spent in the serving beam
- Trends in the instant features (mainly RSRP and distance)
Targets:
The machine learner tries to find the best destination beam, which in this study is the same as the beam with the highest RSRP. This makes two options available: either learn the RSRP of all beams and use that to determine which is best, or learn the index of the best beam directly. The first one leads to a multi-target regression (MTR) problem and the latter one to a multi-class classification (MCC) problem. As discussed in the section on the choice of machine learning algorithm, the regression trees tried to predict RSRP, and the classification trees tried to classify each sample according to the best beam in the sample.
Performance metrics:
This section describes the metrics and baseline models.
Problem specific metrics:
Traditional machine learning metrics are important but fail to capture some of the radio communication aspects of the problem. To get a better picture, beam-hit-ratio and RSRP-difference were introduced as metrics.
- Beam-hit-ratio: % of test samples where the best beam was selected.
- Sector-hit-ratio: % of test samples where the best sector was selected.
- BSSite-hit-ratio: % of test samples where the best BS site was selected.
- RSRP-difference: the average difference between the RSRP of the best beam and the beam that the algorithm selected.
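Two of these metrics, beam-hit-ratio and RSRP-difference, can be sketched directly from their definitions. The function names, the data layout (one list of true per-beam RSRP values per test sample), and the numbers are hypothetical.

```python
def beam_hit_ratio(true_rsrp, selected_beams):
    """Percentage of test samples where the selected beam is the best beam."""
    hits = sum(
        1 for rsrp, beam in zip(true_rsrp, selected_beams) if rsrp[beam] == max(rsrp)
    )
    return 100.0 * hits / len(selected_beams)


def rsrp_difference(true_rsrp, selected_beams):
    """Average RSRP gap (in dB) between the best beam and the selected beam."""
    gaps = [max(rsrp) - rsrp[beam] for rsrp, beam in zip(true_rsrp, selected_beams)]
    return sum(gaps) / len(gaps)


# Hypothetical test set: true RSRP (dBm) of three beams for four samples,
# and the beam index that some model selected for each sample.
true_rsrp = [
    [-90.0, -80.0, -100.0],
    [-85.0, -95.0, -70.0],
    [-60.0, -65.0, -75.0],
    [-99.0, -88.0, -91.0],
]
selected_beams = [1, 0, 0, 2]

print(beam_hit_ratio(true_rsrp, selected_beams))   # share of exact hits in percent
print(rsrp_difference(true_rsrp, selected_beams))  # average loss in dB
```

The two metrics complement each other: a miss that costs only a few dB of RSRP matters far less in practice than one that costs 15 dB, which the hit ratio alone cannot show.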
Baseline models:
A machine learning algorithm tries to learn a function that models the relationship between the input (feature) data and the target variable (or label). When you test it, you will typically measure performance in one way or another. For example, your algorithm may be 75% accurate. But what does this mean? You can infer this meaning by comparing it with a baseline's performance. In general, you will want your approach to outperform the baselines you have selected.
Random selection is not a very good baseline. Instead, the focus turned to two marginally more complex methods, Beam-Overlap and Average-RSRP.
Average-RSRP:
This model is a simple predictor. For each serving beam, the training samples are used to compute the average RSRP of each destination beam. The destination beams are then ranked according to their average RSRP, and the ranking is used for beam selection.
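A minimal sketch of this baseline, assuming each training sample is a pair of the serving beam index and the measured RSRP of every destination beam. The function name and the sample values are hypothetical.

```python
def fit_average_rsrp(samples):
    """Return, per serving beam, the destination beams ranked by average RSRP."""
    totals = {}  # serving beam -> running RSRP sum per destination beam
    counts = {}  # serving beam -> number of training samples seen
    for serving_beam, rsrp_per_beam in samples:
        if serving_beam not in totals:
            totals[serving_beam] = [0.0] * len(rsrp_per_beam)
            counts[serving_beam] = 0
        for beam, rsrp in enumerate(rsrp_per_beam):
            totals[serving_beam][beam] += rsrp
        counts[serving_beam] += 1

    ranking = {}
    for serving_beam, sums in totals.items():
        averages = [total / counts[serving_beam] for total in sums]
        ranking[serving_beam] = sorted(
            range(len(averages)), key=lambda beam: averages[beam], reverse=True
        )
    return ranking


# Hypothetical training samples: (serving beam, RSRP in dBm of beams 0, 1, 2).
training_samples = [
    (0, [-70.0, -90.0, -80.0]),
    (0, [-72.0, -95.0, -76.0]),
    (1, [-95.0, -85.0, -75.0]),
]
ranking = fit_average_rsrp(training_samples)
print(ranking)
```

The baseline ignores everything about the individual UE except its serving beam, which is why a learned model that also sees position, speed, or history should be able to beat it.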
Beam Overlap:
Beam overlap is most easily viewed as an area overlap described as a ratio: 100% indicates that both beams cover the same area and 0% indicates that they have no area in common. This definition brings with it the problem of defining the coverage area of a beam. In a system with quite many beams, where the coverage area is narrow and highly influenced by surrounding buildings, this is a bit cumbersome. The thesis instead uses an approximation of beam-to-beam overlap: by looking at the samples served by a particular beam, that beam's overlap with another beam is computed as the percentage of time that the other beam was the best beam.
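The approximation can be sketched as a simple counting exercise. I assume here that samples are taken at regular intervals, so that "percentage of time" can be read as "percentage of samples"; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def beam_overlap(samples):
    """Return overlap[serving][other]: % of the serving beam's samples in which
    the other beam was the best beam."""
    best_counts = defaultdict(Counter)
    for serving_beam, best_beam in samples:
        best_counts[serving_beam][best_beam] += 1

    overlap = {}
    for serving_beam, counts in best_counts.items():
        total = sum(counts.values())
        overlap[serving_beam] = {
            beam: 100.0 * count / total for beam, count in counts.items()
        }
    return overlap


# Hypothetical samples: (serving beam index, index of the best beam at that moment).
samples = [(0, 0), (0, 1), (0, 1), (0, 2), (1, 1), (1, 1)]
overlap = beam_overlap(samples)
print(overlap[0])  # beam 0 overlaps most with beam 1 in this made-up data
```

Ranking the destination beams of a serving beam by this overlap value gives the Beam-Overlap baseline's candidate list.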
Results and summary:
In general, models built with MCC are better than MTR. Models built for all beams in the BS site perform slightly better than when there is one model per serving beam. With more positional information, performance increases. MCC seems still to have the lead, with their MTR counterparts close behind.
The results for the best models are displayed from a different point of view in the tables below. Rather than answering the question “how good is the model if allowed this many candidate beams?”, they try to answer “how many beams are needed to achieve a certain performance level?”. They need to be considered in proportion to the total number of beams and compared with each other.
In these tables, two different MCC models (MCC and MCCextra), two different MTR models (MTR and MTRextra), and the Beam-Overlap baseline model are compared. MCC and MTR are models built with as many beams as possible as sources, while MCCextra and MTRextra are the same kind of models, only considering models with the node as the source. The main difference between the normal and extra models is the number of features they use (most importantly, whether position is included).
The number of candidate beams needed for adequate performance was a bit higher than initially expected: about 10-15 beams (roughly 6-9% of the 168 beams) were needed to be reliable, assuming a 90-95% beam-hit-ratio combined with a low average RSRP-difference counts as reliable.