AI essentials - Random Variables and Probability Distributions
Sanjay Patel
Systems Engineer and Program Lead | Technical Product Developer | Mechatronics, Systems & Controls, Model Based Design, Physics and Math Modeling and Simulation | Bosch | Sedemac | Tata Motors | IIT Madras
Objectives
What are random variables?
What are discrete random variables and continuous random variables?
What are probability distributions?
Random variables
When the outcome of an event depends on chance, the outcome is random and its occurrence can be modeled using probability. Probabilities are assigned to the various plausible outcomes of the event. These plausible values and their probabilities are then used for forecasting or decision making. This is where random variables and probability distributions come to one’s aid.
Example case study
Let us say we are operating a control process which has a failure rate of 0.1 (and so a success rate of 0.9). When the process is running, it is said to be Up. When the process has stopped due to failure, it is said to be Down. During uptime, the process yields a profit of $10 per second, whereas during downtime it incurs a loss of $1 per second due to costs such as inventory, labor, and machine hours.
Say we run the process today for a total time of 1000s. Then, using the relative frequency approach,
Estimated uptime = 0.9 * 1000 = 900s, Estimated uptime profits = $10 * 900 = $9000
Estimated downtime = 0.1 * 1000 = 100s, Estimated downtime losses = $1 * 100 = $100
________________________________________________________________________
Total profits = $9000 - $100 = $8900 and profit/s = $8.9
Note that profit/s = ($10 * 0.9 * 1000s + (-$1) * 0.1 * 1000s) / 1000s
or, equivalently, profit/s = $10 * 0.9 + (-$1) * 0.1 = $8.9
Also note that $10 and (-$1) are the per-second uptime profit and the per-second downtime loss, respectively. These are the characteristics of our process, and 0.9 and 0.1 are the probabilities associated with these characteristics.
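The profit/s arithmetic above is a probability-weighted average of the process characteristics, usually called the expected value. A minimal sketch in Python, assuming the figures from the case study; the dictionary layout and the names `outcomes` and `expected_value` are illustrative choices, not part of the original:

```python
# Per-second outcomes of the process and their probabilities (from the case study).
outcomes = {
    "up": {"value": 10.0, "probability": 0.9},    # profit of $10/s while Up
    "down": {"value": -1.0, "probability": 0.1},  # loss of $1/s while Down
}


def expected_value(outcomes):
    """Return the probability-weighted average of the outcome values."""
    return sum(o["value"] * o["probability"] for o in outcomes.values())


expected_profit_per_second = expected_value(outcomes)

# Scale to the 1000 s run used in the case study.
total_expected_profit = expected_profit_per_second * 1000

print(f"Expected profit per second: ${expected_profit_per_second:.2f}")
print(f"Expected profit over 1000 s: ${total_expected_profit:.2f}")
```

The same function works unchanged if more states (say, a degraded mode with its own profit and probability) are added, as long as the probabilities sum to 1.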
Here, the profit produced per second depends on chance: the universe (internal as well as external conditions/agents affecting the process) will favor profit 90% of the time and loss the remaining 10% of the time. This behavior of our “profit-event and its occurrence” can be modeled as a random variable by assigning a probability (a numerical value) to it. We don’t know the exact value of that probability. We cannot possibly know it, we will never know it, and we don’t need it. We are good as long as we have a good, practical, usable estimate of the probability of occurrence. Similarly, the “loss-event and its occurrence” can be modeled as a random variable with an associated probability.
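To see the per-second profit behave as a random variable, one can simulate it. The sketch below is only an illustration: it assumes every second is independently Up with probability 0.9 (an assumption not stated in the case study), and the function name, seed, and run length are hypothetical.

```python
import random


def simulate_profit_per_second(n_seconds, p_up=0.9, profit_up=10.0,
                               loss_down=-1.0, seed=0):
    """Average per-second profit over a simulated run.

    Assumption: each second is independently Up with probability p_up.
    """
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    total = 0.0
    for _ in range(n_seconds):
        # The random variable takes +10 when Up and -1 when Down.
        total += profit_up if rng.random() < p_up else loss_down
    return total / n_seconds


average_profit = simulate_profit_per_second(100_000)
print(f"Simulated average profit per second: ${average_profit:.2f}")
```

Over a long run the simulated average settles close to the $8.9 estimate, even though any individual second yields either $10 or -$1.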
Other examples of random variables depending on the case:
1. Outcome of randomly dealt cards
2. Outcome of a random toss of a fair coin
3. Success or failure of a system or a component or a process
4. Quality check result of a manufactured good
5. Very, very broadly speaking, the NIFTY level after 6 months, or the time taken by NIFTY to cross the 25k level from today
6. Whether a company or a country survives an economic crisis or a natural calamity
There are two ways to model events using random variables:
Discrete random variable
It is the result of a chance-event, and it takes strictly discrete (selected or class) values in a range.
e.g. Outcome of throwing a die: {1, 2, 3, 4, 5, 6}
A probability is assigned to each value (each probable outcome in the sample space).
In a discrete random variable's distribution (or curve), the Y-axis value corresponding to a probable outcome indicates its associated probability. See the example video below for the probability distribution of the outcomes of “throwing a die”.
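A discrete distribution can be written down directly as a table of outcome-to-probability pairs. The following is a minimal sketch assuming a fair six-sided die (fairness is an assumption made here for illustration); the names `die_pmf` and `p_even` are hypothetical.

```python
from fractions import Fraction

# Each face of a fair die gets the same probability, 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities over the whole sample space must add up to exactly 1.
total_probability = sum(die_pmf.values())

# The probability of an event is the sum over the outcomes it contains,
# e.g. the event "the throw shows an even number" = {2, 4, 6}.
p_even = sum(p for face, p in die_pmf.items() if face % 2 == 0)

for face, p in die_pmf.items():
    print(f"P(X = {face}) = {p}")
print(f"Total probability = {total_probability}")
print(f"P(X is even) = {p_even}")
```

Using `Fraction` instead of floats keeps the probabilities exact, so the check that they sum to 1 does not suffer from rounding.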
Continuous random variable
It is also the result of a chance-event, but it can take any value in a continuous range.
e.g. Height or weight of students in a classroom. Hot water demand (flow rate or duration) in an urban household at a particular time on a particular day in a particular year.
Probability is not assigned to a particular value but to a range of values.
The area under the random variable’s distribution (or curve) over a range is used as the measure of the probability of that range.
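As an illustration of "probability as area", the sketch below assumes student heights follow a normal distribution with a mean of 170 cm and a standard deviation of 10 cm. Both the choice of distribution and these numbers are made up for the example and do not come from the article. The area between two heights is obtained from the normal cumulative distribution function, written here with `math.erf` from the standard library.

```python
import math


def normal_cdf(x, mean, std_dev):
    """Area under the normal curve from -infinity up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std_dev * math.sqrt(2.0))))


def probability_between(low, high, mean, std_dev):
    """Area under the curve between low and high, i.e. P(low <= X <= high)."""
    return normal_cdf(high, mean, std_dev) - normal_cdf(low, mean, std_dev)


# Hypothetical parameters, chosen only for illustration.
MEAN_HEIGHT_CM = 170.0
STD_DEV_CM = 10.0

# Probability that a height falls in the range 160 cm to 180 cm.
p_range = probability_between(160.0, 180.0, MEAN_HEIGHT_CM, STD_DEV_CM)

# A range of zero width has zero area, so a single exact value has probability 0.
p_point = probability_between(170.0, 170.0, MEAN_HEIGHT_CM, STD_DEV_CM)

print(f"P(160 <= height <= 180) = {p_range:.4f}")
print(f"P(height = 170 exactly) = {p_point}")
```

The second result shows why probability is assigned to ranges rather than to individual values for a continuous random variable.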
Probability distribution
A probability distribution is generated when the frequency of occurrence of every probable outcome is plotted against that outcome. It indicates how the total probability (the sum is always 1 when normalized) is distributed across all values.
X-axis: outcomes, classes, or ranges of probable outcomes (these discrete values or ranges of values are “events” when a discrete or continuous random variable takes the corresponding values)
Y-axis: Frequency of occurrence or probability associated with the given outcome
In the following example, using the relative frequency approach, the probability distribution of the outcomes of the “throwing a die” event is estimated by trials.
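A trial-based estimate of this kind can also be sketched in a few lines of Python, with the standard `random` module standing in for physical throws. The function name, the number of trials, and the seed are illustrative assumptions.

```python
import random
from collections import Counter


def estimate_die_distribution(n_trials, seed=0):
    """Estimate P(face) as (number of times face occurred) / (number of trials)."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_trials))
    # Normalize the counts so that the relative frequencies sum to 1.
    return {face: counts[face] / n_trials for face in range(1, 7)}


estimated = estimate_die_distribution(60_000)

# X-axis: the outcome (face); Y-axis: its relative frequency.
for face, frequency in estimated.items():
    print(f"face {face}: relative frequency {frequency:.3f}")
```

With only a handful of trials the estimated frequencies are uneven; as the number of trials grows they settle near 1/6 for each face.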
Please feel free to share your feedback and suggestions.
-Sanjay