Mean and median are measures of central tendency;
that is, they each provide a single number that attempts to describe the center of a collection of data.
However, data can be ‘spread out’ around its ‘center’ in very different ways!
This section explores the three most common measures of spread: range, variance, and standard deviation.
The following data sets all have mean equal to $\,1\,$:
$$
\begin{gather}
\cssId{s10}{1,\ \ 1,\ \ 1,\ \ 1,\ \ 1}\cr
\cr
\cssId{s11}{-1,\ \ 0,\ \ 1,\ \ 2,\ \ 3}\cr
\cr
\cssId{s12}{-1,\ \ -1,\ \ 1,\ \ 3,\ \ 3}
\end{gather}
$$
These three data sets are pictured below
(as pebbles of equal weight on a number line).
Notice that each has its balancing point (mean) at $\,1\,$,
but the data is spread about this mean in very different ways:
Clearly, the mean does not capture any information about the spread or variability of data about the mean.
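The claim that all three data sets balance at $\,1\,$ can be checked with a short Python sketch. The dictionary labels A, B and C are illustrative names only, and the second and third data sets are taken to be $\,-1, 0, 1, 2, 3\,$ and $\,-1, -1, 1, 3, 3\,$.

```python
from statistics import mean

# The three data sets from the start of the section.
# The labels A, B, C are illustrative; they are not part of the lesson.
data_sets = {
    "A": [1, 1, 1, 1, 1],
    "B": [-1, 0, 1, 2, 3],
    "C": [-1, -1, 1, 3, 3],
}

# Compute the mean (balancing point) of each data set.
means = {name: mean(values) for name, values in data_sets.items()}

for name, m in means.items():
    print(name, m)
```

Each mean comes out equal to 1, so the mean alone cannot tell these data sets apart.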
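First, we discuss the simplest measure of spread: the range.
Writing $\ x_{\text{max}}\ $ for the greatest value in a data set and $\ x_{\text{min}}\ $ for the least value, the range is $\ x_{\text{max}} - x_{\text{min}}\ $.
Thus, the range is the difference between the greatest and least numbers in the data set.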
Since
$\ x_{\text{max}}\ $ is always greater than or equal to $\ x_{\text{min}}\ $,
it follows that the range is always greater than
or equal to zero.
Since computation of the range uses only two members from a data set,
it is necessarily incomplete in the information that it provides.
However, the range is extremely easy to compute.
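As a minimal sketch of how little the range looks at, the helper below (the name `data_range` is hypothetical) uses only the greatest and least values. It is applied to the three data sets from the start of the section, taken as $\,1,1,1,1,1\,$ and $\,-1,0,1,2,3\,$ and $\,-1,-1,1,3,3\,$.

```python
def data_range(values):
    """Range = greatest value minus least value."""
    return max(values) - min(values)

# Ranges of the three data sets from the start of the section.
ranges = [
    data_range([1, 1, 1, 1, 1]),
    data_range([-1, 0, 1, 2, 3]),
    data_range([-1, -1, 1, 3, 3]),
]
print(ranges)
```

The second and third data sets get the same range even though their values are arranged differently between the extremes, which illustrates why the range is incomplete as a measure of spread.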
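Another reasonable way to measure the spread takes into account how far each data element is from the mean:
the deviation of a data value $\,x_i\,$ from the mean $\,\bar{x}\,$ is the difference $\,x_i - \bar{x}\,$.
From this definition, it is apparent that a deviation is positive when $\,x_i\,$ lies above the mean, negative when $\,x_i\,$ lies below the mean, and zero when $\,x_i\,$ equals the mean.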
Merely summing the deviations from the mean is useless as a measure of spread
because the sum of all the deviations is always equal to zero,
as the following calculation shows:
$\displaystyle \begin{align} \cssId{s52}{\sum_{i=1}^n\ (x_i - \bar{x})} & \ \cssId{s53}{=\ (x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x})}\cr\cr & \ \cssId{s54}{=\ (x_1 + x_2 + \cdots + x_n) - n\bar{x}}\cr\cr & \ \cssId{s55}{=\ n\cdot \frac{x_1 + x_2 + \cdots + x_n}{n} - n\bar{x}}\qquad \cssId{s56}{(\text{multiply first part by 1})}\cr\cr & \ \cssId{s57}{=\ n\bar{x} - n\bar{x}}\cr\cr & \ \cssId{s58}{=\ 0} \end{align} $
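The same cancellation can be seen numerically. The sketch below uses exact fractions so that rounding cannot hide or fake the result; the helper name `deviations` and the data values are arbitrary choices made for illustration, and the helper assumes integer data.

```python
from fractions import Fraction

def deviations(values):
    """Return the deviation x_i - mean for each integer data value, as exact fractions."""
    mean = Fraction(sum(values), len(values))
    return [x - mean for x in values]

sample = [1, 2, 4]        # arbitrary illustrative data; the mean is 7/3
devs = deviations(sample)
total = sum(devs)         # the positive and negative deviations cancel

print(devs, total)
```

Whatever integer data is supplied, `total` comes out as exactly zero, which is why the plain sum of deviations says nothing about spread.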
Also, we don't usually care whether data elements lie above or below the mean;
we're more interested simply in the distances from the mean.
A reasonable idea is to sum the absolute values of the deviations from the mean,
$\,|x_i - \bar{x}|\,$.
However, the absolute value function is not particularly easy to work with mathematically.
Instead, we get a good measure of spread by summing the squares of the deviations from
the mean,
$\,(x_i - \bar{x})^2\,$.
There's just one little problem to resolve first.
The formulas for the population mean and the sample mean are identical:
add up the numbers, and divide by how many there are.
The population mean is denoted by
$\,\mu\,$
and a sample mean is denoted by
$\,\bar{x}\,$.
In general, population statistics are reported using Greek letters, like
$\,\mu\,$ (mu)
and $\,\sigma\,$ (sigma).
However, sample statistics are reported using Roman letters,
like $\,\bar{x}\,$ and $\,s\,$.
The common formulas for measures of spread are slightly different,
depending upon whether you're looking at the entire population,
or just a sample from this population,
as shown next:
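Written out in the usual notation, the standard formulas are given below. The symbols $\,N\,$ (population size) and $\,n\,$ (sample size) are assumed here; $\,\sigma^2\,$ and $\,\sigma\,$ denote the population variance and standard deviation, and $\,s^2\,$ and $\,s\,$ denote the sample variance and standard deviation.

$$
\begin{gather}
\sigma^2 = \frac{1}{N}\sum_{i=1}^N\ (x_i - \mu)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^n\ (x_i - \bar{x})^2\cr
\cr
\sigma = \sqrt{\sigma^2}
\qquad\qquad
s = \sqrt{s^2}
\end{gather}
$$

In both cases, the standard deviation is the square root of the corresponding variance.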
Thus, to find the variance of a population,
you sum the squared deviations from the mean,
and then divide by the number of data values.
Observe the difference between population variance and sample variance:
for the sample variance,
you divide by one less than the number of data values,
instead of the actual number of data values.
Why is this?
Here's one way to understand why:
if you randomly choose a sample from a population,
what's the likelihood that you'll choose both the greatest and the least values,
to represent the true variability in the data set?
NOT MUCH!
A sample tends to underestimate the true variability in a population.
To compensate, we divide by $\,n-1\,$ instead of $\,n\,$;
dividing by a smaller number adjusts the result so it's a bit larger.
(The precise reason that you divide by $\,n-1\,$ is explored in a college-level statistics course.)
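One way to see the underestimate concretely is to treat a tiny data set as the whole population and list every possible sample. The sketch below is an illustration under stated assumptions, not a proof: the population is chosen arbitrarily, the samples have size 2, and they are drawn with replacement. It averages the divide-by-$n$ result and the divide-by-$(n-1)$ result over all samples.

```python
from itertools import product
from statistics import pvariance, variance

population = [-1, 0, 1, 2, 3]            # illustrative population
true_variance = pvariance(population)    # divides by the population size

# Every ordered sample of size 2, drawn with replacement
# (an assumption made to keep this sketch small).
samples = list(product(population, repeat=2))

# pvariance divides by n; variance divides by n - 1.
avg_divide_by_n = sum(pvariance(s) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance(s) for s in samples) / len(samples)

print(true_variance, avg_divide_by_n, avg_divide_by_n_minus_1)
```

Under these assumptions, dividing by $\,n\,$ gives an average of 1, only half of the population variance of 2, while dividing by $\,n-1\,$ gives an average that matches the population variance.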
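Standard deviation is the square root of the variance, and it has a couple of advantages over variance; most notably, it is expressed in the same units as the data, whereas variance is expressed in squared units.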
To conclude, let's return to the three simple data sets presented at the start of this lesson,
and compute their measures of spread.
When computing sample variance and sample standard deviation, assume that the given data is part of some (unknown) larger population.
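The computations can be sketched with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by the number of data values and `variance` and `stdev` divide by one less. The labels A, B, C and the helper `spread_summary` are hypothetical names, and the data sets are taken as $\,1,1,1,1,1\,$ and $\,-1,0,1,2,3\,$ and $\,-1,-1,1,3,3\,$.

```python
from statistics import pstdev, pvariance, stdev, variance

# The three data sets from the start of the lesson (labels are illustrative).
data_sets = {
    "A": [1, 1, 1, 1, 1],
    "B": [-1, 0, 1, 2, 3],
    "C": [-1, -1, 1, 3, 3],
}

def spread_summary(values):
    """Collect the measures of spread discussed in this section."""
    return {
        "range": max(values) - min(values),
        "population variance": pvariance(values),   # divide by n
        "sample variance": variance(values),        # divide by n - 1
        "population std dev": pstdev(values),
        "sample std dev": stdev(values),
    }

summaries = {name: spread_summary(values) for name, values in data_sets.items()}

for name, summary in summaries.items():
    print(name, summary)
```

Data set A has no spread, so every measure is zero. Data sets B and C share the same range, yet C has the larger variance and standard deviation because more of its values sit far from the mean.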
Jump up to wolframalpha.com and type in, say:
{-1,-1,1,3,3}
Voila!
Instant statistics!
How easy is that?
