Course: Probability Foundations for Data Science

Random variables

- [Instructor] You should now have a good understanding of some basic concepts of probability. Let's build on those concepts by exploring the different types of probability distributions. Before you do that, though, you need to learn what a random variable is. A random variable is a quantity whose value is determined by the outcome of a random event. It can take on many values, and it is described by a probability distribution that gives the likelihood of each outcome.

Let's see how this works with a few simple examples. Let the random variable X represent the outcome of one roll of one fair die. The probability distribution for this variable contains the values 1, 2, 3, 4, 5, and 6, each with a probability of 1/6. In this case, the random variable represents the possible outcomes of rolling the die, along with their probabilities. Another scenario is the amount of snow a mountain range receives in one year. Here the random variable X represents the amount of snow the mountain range may receive over a year. The probability distribution for this one is different because there is no set of distinct values; instead, there is a range of values from zero up to the maximum amount of snow.

As you saw in the previous two examples, random variables can be discrete or continuous. You will use both types throughout this course, so let's begin by exploring discrete random variables. A discrete random variable has a probability distribution that contains a finite or countable number of values. This is represented by the following equation, meaning the probabilities of all the values in the distribution must sum to one:

$$\sum_{i} P(X = x_i) = 1$$

For example, a discrete random variable can represent pulling a king from a deck of 52 cards. There are 52 cards in the deck, four of which are kings.
The total number of cards in the deck is fixed at 52, and there are exactly four kings that can be drawn. Because these values are distinct and fixed, rather than spread over a range of infinitely many values, they represent a discrete random variable. The corresponding probability of drawing a king is 4/52, or 1/13.

A continuous random variable has a probability distribution that can contain any value within a specified range or interval. Because this range is continuous and can include decimal values, there are infinitely many possible values the random variable can take on. The range itself does not need to extend to positive or negative infinity to make a random variable continuous; there only need to be infinitely many possible values within the range. This is represented by the following equation, meaning each individual value in the probability distribution has a probability of zero because there are infinitely many possible values:

$$P(X = x) = 0$$

For example, a continuous random variable can represent the amount of steel, in kilograms, that a company produces in a year. The range starts at zero and can technically go to infinity since no maximum amount is set. Since the company could produce a million kilograms or 503,000.783 kilograms, the random variable is continuous because the possible outcomes can't be counted. Another example of a continuous random variable is the average test score a class receives. In this case, the range starts at zero and ends at 100. Even though the range has a clear start and end, the random variable is still continuous because the average can take decimal values such as 78.94 or 85.2093. Because these values can't be counted, the variable is classified as continuous, and the probability of any one particular value occurring is zero.

So why are random variables important? Random variables form the basis of every data experiment or observation.
They let you understand the sample data you gather and use in your analyses by telling you how likely a specific value, or range of values, is to occur. Let's use these random variables to explore the different probability distributions.
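
The die, king, and test-score examples from this video can be sketched in a few lines of Python. This is only an illustrative sketch: the variable and function names are made up for this example, and modelling the class average as uniform on 0 to 100 is an assumption chosen for simplicity, not something the course states.

```python
from fractions import Fraction

# Discrete example: one roll of a fair die.
# Each of the six faces is assigned probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of a discrete random variable sum to one.
total_probability = sum(die_pmf.values())

# Discrete example: drawing a king from a 52-card deck.
# Fraction reduces 4/52 to 1/13 automatically.
p_king = Fraction(4, 52)


def uniform_interval_probability(a, b, low=0.0, high=100.0):
    """Return P(a <= X <= b) for X uniform on [low, high].

    The uniform model is an assumption made for illustration only.
    """
    a, b = max(a, low), min(b, high)  # clip the interval to the valid range
    return max(b - a, 0.0) / (high - low)


# Continuous example: a single exact value has zero width, so zero probability.
p_exact = uniform_interval_probability(78.94, 78.94)

# A range of values, by contrast, has positive probability.
p_range = uniform_interval_probability(70.0, 80.0)

print(total_probability, p_king, p_exact, p_range)
```

Exact fractions are used for the discrete cases so that the sum-to-one check is not affected by floating-point rounding. For the continuous case, only intervals carry probability, which is why the function takes two endpoints instead of a single value.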
