Standard Deviation of Continuous Random Variable
The Probability Distribution of a Continuous Random Variable
Suppose that $X$ is a discrete random variable with many values. We are going to construct a picture of the probability distribution of $X$ in a new and very graphical way.
Like a relative frequency diagram, our new picture consists of a number of bars. All the bars are of equal width, and we choose the height of each bar so that the area of the bar (its width times its height) is equal to the probability that a randomly chosen value of $X$ is in the interval under the bar. With this choice, bars that sit over an interval containing more values of $X$ will be higher than bars over intervals with fewer values of $X$. For a symmetric distribution, we might get a picture like the following, in which the mean (1600) and one other $x$-value are shown.
One advantage of this way of thinking is that it allows us to compute probabilities by computing areas. For example, if we want to know the probability that a randomly chosen value of $X$ is, say, between 1300 and 1400, all we have to do is to add up the areas of the bars over the interval $[1300,1400]$, as shown in the image below.
If we add up the areas of all the bars, we get the probability that $X$ has some value. Since $X$ must have a value, that probability is 1, so the total area of the bars is 1.
Notice that we do not really need the bars to compute probabilities. All we really need is the jagged "curve" made up of the tops of the bars:
To get from this picture to a picture of the probability distribution of a continuous random variable $X$, first remember that any $x$-value is a possible value of $X$. For a continuous random variable, there aren't just a lot of possible values; there are infinitely many. Next, think of making these distribution pictures with thinner and thinner bars (and so, more of them). No matter how many bars we used, the two important things would still be true: we could still compute probabilities by computing areas, and the total area under the bar-top curve would still be 1.
Finally, think of using infinitely many bars, each infinitely narrow, so that the curve made up of the tops of the bars becomes a smooth curve. In this limit, we cannot compute areas by adding up rectangles, but the idea remains the same: the total area under the curve is still 1, and the probability that a randomly chosen value of $X$ is in any given interval is still the area over that interval and under the curve. The function whose graph is the curve involved is called the probability density function for $X$, as you will see in the following definition.
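The "thinner and thinner bars" idea can be tried numerically. The sketch below is only an illustration: the density $f(x)=3x^2$ on $[0,1]$ and the helper names `density` and `bar_area` are assumptions made for this example, not part of the text. It adds up bar areas for an increasing number of bars and shows that the total area stays close to 1 while the area over $[0.2,0.5]$ settles near the exact value $0.5^3-0.2^3=0.117$.

```python
def density(x):
    """Hypothetical pdf chosen for illustration: f(x) = 3x^2 on [0, 1], 0 elsewhere."""
    return 3 * x * x if 0 <= x <= 1 else 0.0


def bar_area(f, a, b, n_bars):
    """Add up the areas of n_bars equal-width bars over [a, b].

    Each bar's height is the value of f at the midpoint of its base.
    """
    width = (b - a) / n_bars
    return sum(f(a + (i + 0.5) * width) * width for i in range(n_bars))


# More (and thinner) bars: the sums approach the areas under the smooth curve.
for n in (5, 50, 500):
    total = bar_area(density, 0.0, 1.0, n)      # approaches 1
    middle = bar_area(density, 0.2, 0.5, n)     # approaches 0.117
    print(n, total, middle)
```

With only a handful of bars the sums are already close; the point is that nothing conceptual changes as the bars get narrower.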
Definition
The probability distribution of a continuous random variable $X$ is an assignment of probabilities to intervals on the $x$-axis using a function $f(x)$, called a probability density function, in the following way: the probability that a randomly chosen value of $X$ is in the interval $(a,b)$ is equal to the area of the region that is bounded above by the graph of the equation $y=f(x)$, bounded below by the $x$-axis, and bounded on the left and right by the vertical lines through $a$ and $b$, as illustrated in Figure 5.1 below.
Figure 5.1
Instead of "probability density function," we usually just write "pdf."
Every probability density function $f(x)$ must satisfy the following two conditions:
- For all numbers $x$, $f(x)\ge 0$, so that the graph of $y=f(x)$ never drops below the $x$-axis.
- The area of the region under the graph of $y=f(x)$ and above the $x$-axis is 1.
For most pdfs, we need calculus to find the area over an interval and under the curve. In some special cases we can compute it without calculus, as you will see, and for the most important families of continuous pdfs, we can compute it using TI calculator functions.
A word about vocabulary: We, like many others, sometimes refer to the probability that a random variable $X$ "assumes" a value. To understand this language, remember that even though we don't always mention it, there must be an experiment with which $X$ is associated, and that the value of $X$ depends on the outcome of that experiment. When we say that $X$ assumes a particular value, we mean that $X$ has that value after the experiment is performed.
Because the area of a line segment is zero, the definition of the probability distribution of a continuous random variable implies that for any particular number $a$, the probability that $X$ assumes the exact value $a$ is zero—in symbols, $P(X=a)=0$. Because of this, the probability that a randomly chosen value of $X$ is in a given interval is the same whether or not the endpoints of the interval are included. In symbols, for any continuous random variable $X$,
$$P(a\le X\le b)=P(a\lt X\le b)=P(a\le X\lt b)=P(a\lt X\lt b)$$
Example 1
A random variable $X$ has the uniform distribution on the interval $(0,1)$. That is, the pdf of $X$ is the function $f(x)$, where $f(x)=1$ if $x$ is between 0 and 1, and $f(x)=0$ for all other values of $x$. This pdf is pictured below.
- Find $P(X\gt 0.75)$, the probability that $X$ assumes a value greater than 0.75.
- Find $P(X\le 0.2)$, the probability that $X$ assumes a value less than or equal to 0.2.
- Find $P(0.4\lt X\lt 0.7)$, the probability that $X$ assumes a value between 0.4 and 0.7.
Solution:
In each case we find the probability by computing an area.
- $P(X\gt 0.75)$ is the area of the rectangle of height 1 and base length $1-0.75=0.25$, hence is base $\times$ height $=(0.25)(1)=0.25$. See panel (a) in the figure below.
- $P(X\le 0.2)$ is the area of the rectangle of height 1 and base length $0.2-0=0.2$, hence is base $\times$ height $=(0.2)(1)=0.2$. See panel (b) in the figure below.
- $P(0.4\lt X\lt 0.7)$ is the area of the rectangle of height 1 and base length $0.7-0.4=0.3$, hence is base $\times$ height $=(0.3)(1)=0.3$. See panel (c) in the figure below.
Areas for the uniform distribution on $[0,1]$
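The three rectangle areas in Example 1 can be reproduced with a few lines of code. This is a minimal sketch; the function name `uniform_prob` and its parameters are hypothetical, introduced only for this illustration.

```python
import math


def uniform_prob(lo, hi, a=0.0, b=1.0):
    """P(lo < X < hi) for X uniform on (a, b), computed as base times height."""
    height = 1.0 / (b - a)              # total area must be 1
    left = max(lo, a)                   # clip the interval to where the pdf is nonzero
    right = min(hi, b)
    base = max(right - left, 0.0)
    return base * height


print(uniform_prob(0.75, math.inf))     # P(X > 0.75), about 0.25
print(uniform_prob(-math.inf, 0.2))     # P(X <= 0.2), about 0.2
print(uniform_prob(0.4, 0.7))           # P(0.4 < X < 0.7), about 0.3
```

Because single points carry zero probability, the same function serves for strict and non-strict inequalities.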
Example 2
A man arrives at a bus stop at a random time. Buses run every 30 minutes without fail, so the next bus will come any time during the next 30 minutes with evenly distributed probability (a uniform distribution). Find the probability that a bus will come within the next 10 minutes.
Solution:
The graph of the density function is a horizontal line above the interval from 0 to 30 and coincides with the $x$-axis everywhere else, as shown in the figure below. Since the total area under the curve must be 1, the height of the horizontal line is $1/30$. The probability sought is $P(0\le X\le 10)$. By definition, this probability is the area of the rectangular region bounded above by the horizontal line $f(x)=1/30$, bounded below by the $x$-axis, bounded on the left by the vertical line at $x=0$ (the $y$-axis), and bounded on the right by the vertical line at $x=10$. That is, it is the shaded rectangle in the figure below. Its area is the base of the rectangle times its height, $10(1/30)=1/3$, so $P(0\le X\le 10)=1/3$.
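The arithmetic of Example 2 can be checked exactly with rational numbers. The variable names below are chosen for this sketch only.

```python
from fractions import Fraction

interval_minutes = 30                     # buses run every 30 minutes
wait_limit = 10                           # we ask about the next 10 minutes

height = Fraction(1, interval_minutes)    # height that makes the total area 1
total_area = interval_minutes * height    # base times height over [0, 30]
prob = wait_limit * height                # base times height over [0, 10]

print(total_area)                         # 1
print(prob)                               # 1/3
```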
Normal Distributions
Most people have heard of the "bell curve." The bell curve is the graph of a specific pdf $f(x)$ that describes the behavior of continuous random variables as different as the heights of human beings, the amount of a product in a container that was filled by a high-speed packing machine, or the velocities of molecules in a gas. The formula for $f(x)$ contains two parameters $\mu$ and $\sigma$ that can be assigned any specific numerical values, so long as $\sigma$ is positive. We will not need to know the formula for $f(x)$, but for those who are interested it is
$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where $e\approx 2.71828$ is the base of the natural logarithms.
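For readers who want to experiment with the formula, here is a small sketch that evaluates it directly and approximates the area under the curve by adding up thin bars. The function names `normal_pdf` and `area_under`, and the values $\mu=6$, $\sigma=0.25$, are choices made for this illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Bell-curve density with parameters mu and sigma (sigma must be positive)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / math.sqrt(2 * math.pi * sigma ** 2)


def area_under(f, a, b, n_bars=20000):
    """Approximate the area under f over [a, b] with thin midpoint bars."""
    width = (b - a) / n_bars
    return sum(f(a + (i + 0.5) * width) for i in range(n_bars)) * width


mu, sigma = 6.0, 0.25   # illustrative parameter values
total = area_under(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
print(total)            # very close to 1

# The curve takes the same height at equal distances on either side of mu
# (equal up to floating-point rounding).
print(normal_pdf(mu - 0.3, mu, sigma), normal_pdf(mu + 0.3, mu, sigma))
```

The integration range of ten standard deviations on each side is an arbitrary cutoff; the area outside it is negligible.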
Each different choice of specific numerical values for the pair $\mu$ and $\sigma$ gives a different bell curve. The value of $\mu$ determines the location of the curve along the $x$-axis, as shown in Figure 5.5 below. The curve is always symmetric about a vertical line through $x=\mu$.
Figure 5.5 Bell Curves with $\sigma=0.25$ and Different Values of $\mu$
The value of $\sigma$ determines whether the bell curve is tall and thin or short and squat, subject always to the condition that the total area under the curve be equal to 1. This is shown in Figure 5.6 below, where we have arbitrarily chosen to center the curves at $\mu = 6$.
Figure 5.6 Bell Curves with $\mu=6$ and Different Values of $\sigma$
Definition
The probability distribution corresponding to the density function for the bell curve with parameters $\mu$ and $\sigma$ is called the normal distribution with mean $\boldsymbol{\mu}$ and standard deviation $\boldsymbol{\sigma}$. We sometimes denote this distribution by $\boldsymbol{N(\mu,\sigma)}$.
Definition
A continuous random variable whose probabilities are described by the normal distribution with mean $\mu$ and standard deviation $\sigma$ is called a normally distributed random variable, or a normal random variable, with mean $\mu$ and standard deviation $\sigma$.
A normally distributed random variable may be called a "normal random variable" for short. We write $\boldsymbol{X\sim N(\mu,\sigma)}$ to mean that $X$ is a random variable that is normally distributed with mean $\mu$ and standard deviation $\sigma$.
Figure 5.7 shows the density function that determines the normal distribution with mean $\mu$ and standard deviation $\sigma$. Note that the curve, like all normal curves, is symmetric about its mean.
Figure 5.7 Density Function for a Normally Distributed Random Variable with Mean $\mu$ and Standard Deviation $\sigma$
Example 3
Heights of 25-year-old men in a certain region have mean 69.75 inches and standard deviation 2.59 inches. These heights are approximately normally distributed. Thus the height $X$ of a randomly selected 25-year-old man is a normal random variable with mean $\mu = 69.75$ and standard deviation $\sigma = 2.59$. Sketch a qualitatively accurate graph of the density function for $X$. Find the probability that a randomly selected 25-year-old man is more than 69.75 inches tall.
Solution:
The distribution of heights looks like the bell curve shown below. It is centered at its mean, 69.75, and is symmetric about that mean.
Since the total area under the curve is 1 and the curve is symmetric about 69.75, the area to the right of 69.75 is $1/2$. But this area is precisely the probability $P(X\gt 69.75)$, the probability that a randomly selected 25-year-old man is more than 69.75 inches tall. Thus $P(X\gt 69.75) = 0.5$.
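The symmetry argument of Example 3 can be confirmed with software. The sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later), which is one possible tool and not the one this text relies on.

```python
from statistics import NormalDist

# Heights of 25-year-old men from Example 3
heights = NormalDist(mu=69.75, sigma=2.59)

# cdf(x) is the area under the density to the left of x,
# so the area to the right is 1 minus that.
p_taller_than_mean = 1 - heights.cdf(69.75)
print(p_taller_than_mean)   # 0.5
```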
The Empirical Rule
If a data set has an approximately normal distribution, then:
- Approximately 68% of the data lie within one standard deviation of the mean
- Approximately 95% of the data lie within two standard deviations of the mean
- Approximately 99.7% of the data lie within three standard deviations of the mean
This fact is called the Empirical Rule. You should memorize it; it will come in handy.
Here is the Empirical Rule stated for a data set that has an approximately normal distribution with mean $\mu$ and standard deviation $\sigma\,$:
- Approximately 68% of the data lie in the interval $[\mu-\sigma,\mu+\sigma]$
- Approximately 95% of the data lie in the interval $[\mu-2\sigma,\mu+2\sigma]$
- Approximately 99.7% of the data lie in the interval $[\mu-3\sigma,\mu+3\sigma]$
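The three percentages can be checked against an exactly normal distribution. This is a sketch using `NormalDist` from Python's standard `statistics` module; the helper name `within_k_sd` and the values $\mu=100$, $\sigma=10$ are assumptions for illustration, and any other positive $\sigma$ gives the same proportions.

```python
from statistics import NormalDist


def within_k_sd(mu, sigma, k):
    """Area under the N(mu, sigma) density over [mu - k*sigma, mu + k*sigma]."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    # roughly 0.6827, 0.9545, 0.9973
    print(k, round(within_k_sd(100, 10, k), 4))
```

The rounded figures 68%, 95%, and 99.7% in the rule are these values to two or three significant digits.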
Warm-Up Exercises
- A continuous random variable $X$ has a uniform distribution on the interval $[5,12]$. Sketch the graph of its density function.
- A continuous random variable $X$ has a uniform distribution on the interval $[-3,3]$. Sketch the graph of its density function.
- A continuous random variable $X$ has a normal distribution with mean 100 and standard deviation 10. Sketch a qualitatively accurate graph of its density function.
- A continuous random variable $X$ has a normal distribution with mean 73. The probability that $X$ takes a value greater than 80 is 0.212. Use this information and the symmetry of the density function to find the probability that $X$ takes a value less than 66. Sketch the density curve with relevant regions shaded to illustrate the computation.
- A continuous random variable $X$ has a normal distribution with mean 50.5. The probability that $X$ takes a value less than 54 is 0.76. Use this information and the symmetry of the density function to find the probability that $X$ takes a value greater than 47. Sketch the density curve with relevant regions shaded to illustrate the computation.
- The figure below shows the density curves of three normally distributed random variables $X_A$, $X_B$, and $X_C$. Their standard deviations (in no particular order) are 15, 7, and 20. Use the figure to identify the values of the means $\mu_A$, $\mu_B$, $\mu_C$ and the standard deviations $\sigma_A$, $\sigma_B$, $\sigma_C$ of the three random variables.
- Dogberry's alarm clock is battery operated. The battery could fail with equal probability at any time of the day or night. Every day Dogberry sets his alarm for 6:30 a.m. and goes to bed at 10:00 p.m. Find the probability that when the clock battery finally dies, it will do so at the most inconvenient time, between 10:00 p.m. and 6:30 a.m.
- The amount $X$ of orange juice in a randomly selected half-gallon container varies according to a normal distribution with mean 64 ounces and standard deviation 0.25 ounce.
  - Sketch the graph of the density function for $X$.
  - What proportion of all containers contain less than a half gallon (64 ounces)? Explain.
  - What is the median amount of orange juice in such containers? Explain.
Answers
- The graph is a horizontal line with height 1/7 from $x=5$ to $x=12$.
- The graph is a horizontal line with height 1/6 from $x=-3$ to $x=3$.
- The graph is a bell-shaped curve centered at 100 and extending from about 70 to 130.
- 0.212
- 0.76
- $\mu_A=100$, $\mu_B=200$, $\mu_C=300$; $\sigma_A=7$, $\sigma_B=20$, $\sigma_C=15$
- 0.3542
-
  - The graph is a bell-shaped curve centered at 64 and extending from about 63.25 to 64.75.
  - 0.5
  - 64
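Two of the numerical answers above come from short calculations that can be sketched in code. The variable names are chosen for this illustration; the first part covers the exercise with mean 73, the second the alarm clock exercise.

```python
# Mean 73 exercise: 66 and 80 are equally far from the mean,
# so by symmetry the left tail below 66 matches the right tail above 80.
mean = 73
distance_below = mean - 66
distance_above = 80 - mean
p_greater_than_80 = 0.212                 # given in the exercise
p_less_than_66 = p_greater_than_80 if distance_below == distance_above else None
print(p_less_than_66)                     # 0.212

# Alarm clock exercise: the failure time is uniform over a 24-hour day,
# and 10:00 p.m. to 6:30 a.m. spans 2 + 6.5 = 8.5 hours.
hours_overnight = 2 + 6.5
p_dies_overnight = hours_overnight / 24   # base times height, with height 1/24
print(round(p_dies_overnight, 4))         # 0.3542
```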
Source: https://sites.radford.edu/~scorwin/courses/200/book/160IntroToNormalDist.html