Calculator for mean, median, and mode in statistics. Use this calculator to get the mean, median, mode, range, quartiles, and interquartile range for any data set.
Measure | Value | Measure | Value |
---|---|---|---|
Mean x̄ | 16.75 | Outliers | 6, 33, 35 |
Median x̃ | 15 | Quartile Q1 | 12.5 |
Mode | 15 (appeared 3 times) | Quartile Q2 | 15 |
Range | 29 | Quartile Q3 | 16 |
Minimum | 6 | Interquartile Range IQR | 3.5 |
Maximum | 35 | | |
Sum | 201 | | |
Count n | 12 | | |
Looking at tables and graphs of statistical data can be difficult for us to interpret. We often need to summarize data sets and identify important features to get more useful information from statistics.
In statistics, different measures are used to summarize data. Some describe the center of the data; they are called measures of central tendency. Others tell how scattered the data values are; they are called dispersion measures. Others, called position measures, reveal the proportion of the data that is less than a given value.
The primary purpose of this calculator is to calculate measures of central tendency—the mean and median—which can represent the typical or central value in a data set. The secondary purpose of this calculator is to determine the degree of variation in a data set by calculating the range, quartiles, and interquartile range.
The mean is the sum of the values divided by the total number of values. It is easiest to understand and calculate using the following formula for calculating the mean for a sample:
$$\bar{x}=\frac{x_1+x_2+x_3+\ldots+x_n}{n}=\frac{\sum x}{n}$$
The formula for the mean for the population is:
$$\mu=\frac{x_1+x_2+x_3+\ldots+x_N}{N}=\frac{\sum x}{N}$$
Here, the numerator represents the sum of the values in the data set. And the denominator represents the number of values in the data set.
The main feature of using the arithmetic mean is that it involves all the data points present in the data set.
The main limitation of the mean is that it is susceptible to extreme values that are either too large or too small. Such values are known as outliers, and they significantly affect the average.
Note also that the average value is not necessarily the typical value for the data. The mean value may be a value that is not present in the data set at all.
The population consists of the entire set of values about which information is obtained. The sample consists of a smaller group taken from the population.
The method for calculating the mean value is the same for both samples and populations. Only the designations differ.
If x₁, x₂,..., xₙ is a sample, the mean is referred to as the sample mean and is represented by the symbol x̄. The mean of the population is denoted by the Greek letter 𝜇.
In statistics, we use the lowercase letter n to denote the sample size and the uppercase letter N to denote the population size.
Let's look at the following example: Luigi is a first-rate chef and pizza lover. He has decided to open his own pizzeria in Bali. To find an investor, Luigi writes a business plan. He wants to determine the average price of pizza at different restaurants on the island to estimate future financial performance.
He did a little research on the price of Margherita pizza at restaurants in Bali and got a data set of pizza prices. For ease of calculation, let's discard the last three zeros and use the number of thousands in the price. That is, 60 in our calculations will mean 60,000 Indonesian rupiah (IDR).
60, 60, 84, 45, 59, 70, 42, 59, 53, 70, 69, 70, 120, 160, 95, 50, 75, 55, 72, 70
Luigi hasn't toured every pizzeria on the island. He randomly selected 20 of them. Thus, we are dealing with a sample.
Let's calculate the average value for this data set using the formula:
$$\bar{x}=\frac{x_1+x_2+x_3+\ldots+x_n}{n}=\frac{\sum x}{n}=\frac{1438}{20}$$
We end up with the mean x̄ = 71.9.
Luigi's research shows that 71,900 Indonesian rupiah is the average price of a Margherita pizza in Bali. He can now base his calculations on this price.
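The calculation above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are illustrative choices, not part of the calculator itself.

```python
# Margherita pizza prices from Luigi's sample, in thousands of IDR.
prices = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
          69, 70, 120, 160, 95, 50, 75, 55, 72, 70]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


total = sum(prices)               # numerator of the formula
n = len(prices)                   # denominator of the formula
mean_price = sample_mean(prices)

print(total, n, mean_price)       # 1438 20 71.9
```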
The median is a positional measure representing the middle value of a data set arranged in ascending or descending order.
By calculating the median, we try to find a number that divides the data set in half. Half of the data values are less than the median, and half are greater than the median. This is why when we manually determine the median without a median calculator, we need to sort the values in ascending or descending order.
Calculating the median differs depending on whether the number of values in the data set is even or odd.
If the total number of elements is odd, that is, n or N is odd, then the following formula applies:
$$\text{Median}=\left(\frac{n+1}{2}\right)\text{-th element}$$
However, if the number of elements is even, which means that n is an even number, then the following formula is used:
$$\text{Median}=\frac{\left(\frac{n}{2}\right)\text{-th element}+\left(\frac{n}{2}+1\right)\text{-th element}}{2}$$
The main advantage of using the median is that it is barely affected by extremely high or extremely low values.
For a given set of twenty values,
60, 60, 84, 45, 59, 70, 42, 59, 53, 70, 69, 70, 120, 160, 95, 50, 75, 55, 72, 70
We can calculate the median as follows:
42, 45, 50, 53, 55, 59, 59, 60, 60, 69, 70, 70, 70, 70, 72, 75, 84, 95, 120, 160
Let's determine the number of values in the data set. We have n = 20.
If n is odd, we choose the central value of the data as the median. If n is even, we find the arithmetic mean of the two central values: add them and divide the sum by 2.
20 is an even number.
The central values in our sample are 69 and 70. We find the median this way:
$$Median = \frac{69 + 70}{2} = 69.5$$
If Luigi had a set of 21 values, e.g.,
60, 60, 84, 45, 59, 70, 42, 59, 53, 70, 69, 70, 120, 160, 95, 50, 75, 90, 55, 72, 70
He could order the values:
42, 45, 50, 53, 55, 59, 59, 60, 60, 69, 70, 70, 70, 70, 72, 75, 84, 90, 95, 120, 160
and select the value in the center at the 11th position, that is, 70.
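Both cases, even and odd, can be sketched in one short Python function. The name `median_value` and the two list names are assumptions made for this illustration; the 21-value list is the 20-value sample with the extra price 90 added, as in the text.

```python
def median_value(values):
    """Return the median following the odd/even rules described above."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the (n + 1) / 2-th element (1-based) sits at index n // 2.
        return ordered[middle]
    # Even n: average the n/2-th and (n/2 + 1)-th elements (1-based).
    return (ordered[middle - 1] + ordered[middle]) / 2


prices_20 = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
             69, 70, 120, 160, 95, 50, 75, 55, 72, 70]
prices_21 = prices_20 + [90]

print(median_value(prices_20))  # 69.5
print(median_value(prices_21))  # 70
```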
Both the mean and median are used as measures of central tendency. But it is essential to know how they differ.
One crucial difference between the mean and the median is that the formula for the mean uses all the values in the data set. In contrast, the formula for the median depends only on the central number or two of the central numbers.
This is especially important for data sets where one or more numbers are unusually large or unusually small. Such numbers are called outliers. In most cases, these outliers will significantly affect the mean, but they will have little or no effect on the median.
In statistics, we say that a measure is resistant if its value is not greatly affected by extreme values in the data set. So we can say that the median is resistant, and the mean is not resistant.
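A small experiment makes the idea of resistance concrete. The sketch below uses Python's standard `statistics` module and a hypothetical data-entry error (the price 160 recorded as 1600) that is invented purely for illustration.

```python
from statistics import mean, median

prices = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
          69, 70, 120, 160, 95, 50, 75, 55, 72, 70]

# Hypothetical typo: the most expensive pizza is entered as 1600 instead of 160.
with_outlier = [1600 if p == 160 else p for p in prices]

print(mean(prices), median(prices))              # 71.9 69.5
print(mean(with_outlier), median(with_outlier))  # 143.9 69.5
```

The mean doubles because of a single value, while the median does not move at all.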
The mean and median measure the center of the data set differently. The mean is the point at which the data set balances. The median is the value that separates 50% of the data on one side from 50% of the data on the other side. When the data set is symmetric, the mean and median are equal.
However, the mean and median may not be equal.
In some data sets, the mean may be less than the median, or the median may be less than the mean. In this case, we say that the data set is skewed.
If the mean value is positioned to the left or less than the median, we say the dataset is skewed to the left. If the mean is positioned to the right or greater than the median, we say the dataset is skewed to the right.
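The comparison rule above can be written as a small helper. This is a sketch of the rule of thumb from the text only, not a formal skewness statistic; the function name `skew_direction` and its return labels are assumptions made for the example.

```python
from statistics import mean, median


def skew_direction(values):
    """Compare mean and median, as described in the text."""
    m, med = mean(values), median(values)
    if m < med:
        return "left"
    if m > med:
        return "right"
    return "none"  # mean equals median


prices = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
          69, 70, 120, 160, 95, 50, 75, 55, 72, 70]

print(skew_direction(prices))           # right (mean 71.9 > median 69.5)
print(skew_direction([1, 2, 3, 4, 5]))  # none
```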
Neither the mean nor the median is better as the measure of central tendency. They both measure the center in different ways. Some experts prefer to use the median when the data is highly skewed or contains extreme values because the median is more representative of a typical value.
The mode of a data set is the value that occurs most frequently in it.
A dataset is unimodal if it has one value that occurs more frequently than any other.
If a data set has two values with the same highest frequency, then both values are considered modal, and the data set is considered bimodal.
If a dataset has more than two values with the same highest frequency, then each value is used as a mode, and the dataset is considered multimodal.
If no data value occurs more than once, the data set is said to have no mode. It would be incorrect to say that the mode is zero in this case, because zero may be an actual value in some data sets, such as temperature measurements.
The main advantage of calculating a mode is that it is easiest to find and is not affected by extreme values. The disadvantage of mode calculation is that, in certain situations, a mode value may not exist for some data sets.
For a given set of twenty values,
60, 60, 84, 45, 59, 70, 42, 59, 53, 70, 69, 70, 120, 160, 95, 50, 75, 55, 72, 70
We can find the mode as follows:
Arrange the data set in ascending or descending order. Here the order is as follows:
42, 45, 50, 53, 55, 59, 59, 60, 60, 69, 70, 70, 70, 70, 72, 75, 84, 95, 120, 160
Next, we find the value repeated the maximum number of times. Here, the most frequent value is 70. Hence, for a given data set, the modal value is 70.
While the mode is a measure of central tendency, it may not always reflect the central value of a distribution, particularly in skewed distributions. The mode can be the largest value in the data set, the smallest value, or any other value. For example, if we had the following numbers in the data set:
42, 45, 50, 53, 55, 57, 59, 60, 63, 69, 70, 72, 79, 82, 83, 95, 96, 120, 120, 120
The mode would be 120. Although in this case, it would not reflect the central tendency.
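Counting frequencies by hand is easy to get wrong, so here is a minimal Python sketch that returns every mode of a data set. The function name `modes` is illustrative, and returning an empty list when no value repeats is a design choice made here to mirror the "no mode" convention described above.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


prices = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
          69, 70, 120, 160, 95, 50, 75, 55, 72, 70]
other = [42, 45, 50, 53, 55, 57, 59, 60, 63, 69,
         70, 72, 79, 82, 83, 95, 96, 120, 120, 120]

print(modes(prices))     # [70]
print(modes(other))      # [120]
print(modes([1, 2, 3]))  # []
# Made-up labels, to show that counting also works for non-numeric values.
print(modes(["red", "blue", "red", "blue", "green"]))  # ['blue', 'red']
```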
Interestingly, we can only calculate the mean and median for quantitative data. And we can calculate the mode for both quantitative and qualitative data.
On average, Anna eats pizza 12 times per month.
In this case, we will have two modes: Napoletana pizza and Margherita pizza.
Measures of dispersion, also known as measures of variability, are used to determine the spread or variability within a data set. They usually reflect the degree of variation in the data from the central value. We can examine the variability in a data set using the range, quartiles, and interquartile range.
The range for a data set is the difference between the highest and lowest value in the data set. We can calculate it by determining the maximum and minimum values of the data set. The formula for calculating the range is:
Range = Largest value - Smallest value
For a given set of twenty values,
60, 60, 84, 45, 59, 70, 42, 59, 53, 70, 69, 70, 120, 160, 95, 50, 75, 55, 72, 70
we can calculate the range as follows:
Arrange the data set in ascending or descending order. Here, the order looks like this:
42, 45, 50, 53, 55, 59, 59, 60, 60, 69, 70, 70, 70, 70, 72, 75, 84, 95, 120, 160
Further, the highest value is 160, and the lowest value is 42. Hence, the range:
Range = largest value - smallest value = 160 - 42 = 118
Therefore, for this data set, the range is 118.
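In code, sorting is not strictly necessary, because only the largest and smallest values matter. A minimal sketch, with `value_range` as an illustrative name:

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


prices = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
          69, 70, 120, 160, 95, 50, 75, 55, 72, 70]

print(max(prices), min(prices), value_range(prices))  # 160 42 118
```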
Quartiles are values that divide the data set into four quarters by three points, namely the first, second, and third quartiles.
The first quartile, labeled Q₁, is the value below which 25% of the data falls, with the remaining 75% lying above it.
The second quartile, labeled Q₂, is also known as the median. It divides the dataset into two equal parts, with 50% of the values lying below it and 50% above.
The third quartile, denoted Q₃, is the value below which 75% of the data falls, with the remaining 25% lying above it.
A procedure for calculating the quartiles of a data set:
Arrange the data in ascending order.
To calculate the second quartile, calculate the median. For the first and third quartiles, proceed as follows. Determine n - the number of values in the data set.
For the first quartile, calculate L = 0.25n. For the third quartile, calculate L = 0.75n.
If L is an integer, the quartile is the average of the number at position L and the number at position L + 1.
If L is not an integer, round it up to the next higher integer. The quartile is the number at the position corresponding to the rounded value.
For a given set of twenty values,
60, 60, 84, 45, 59, 70, 42, 59, 53, 70, 69, 70, 120, 160, 95, 50, 75, 55, 72, 70
We can calculate the quartiles as follows:
42, 45, 50, 53, 55, 59, 59, 60, 60, 69, 70, 70, 70, 70, 72, 75, 84, 95, 120, 160
Q₂ = Median = (69 + 70) / 2 = 69.5
L for the first quartile: 0.25 × 20 = 5. L for the third quartile: 0.75 × 20 = 15.
5 is an integer, so Q₁ in our case is:
$$Q_1=\frac{55+59}{2}=57$$
15 is also an integer, so Q₃ is the average of the values at positions 15 and 16:
$$Q_3=\frac{72+75}{2}=73.5$$
Therefore, for this data set, the first quartile is 57, the second is 69.5, and the third is 73.5.
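The procedure for the first and third quartiles can be sketched in Python as follows. Note that this implements only the L = 0.25n / 0.75n rule described in this article; other textbooks and software packages use different quartile conventions and may give slightly different results. The function name `quartile` and its `fraction` parameter are assumptions made for the example.

```python
import math


def quartile(values, fraction):
    """Quartile by the position rule above; fraction is 0.25 or 0.75."""
    ordered = sorted(values)
    position = fraction * len(ordered)  # the value L from the procedure
    if position == int(position):
        # L is an integer: average the values at positions L and L + 1 (1-based).
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    # L is not an integer: round up and take the value at that position.
    k = math.ceil(position)
    return ordered[k - 1]


prices = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
          69, 70, 120, 160, 95, 50, 75, 55, 72, 70]

print(quartile(prices, 0.25))  # 57.0
print(quartile(prices, 0.75))  # 73.5
```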
The interquartile range (IQR) is the difference between the third quartile Q₃ and the first quartile Q₁ of a data set. It measures the spread of the middle 50% of the data and is calculated as follows:
IQR = Q₃ - Q₁
In the previous section, we already calculated the first and third quartiles. They are 57 and 73.5. All we have to do is simply apply the formula.
IQR = Q₃ - Q₁ = 73.5 - 57 = 16.5
Thus, for this data set, the interquartile range is 16.5.
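For completeness, here is a self-contained Python sketch that computes the IQR from the raw prices. It reuses the quartile position rule described earlier in this article (a hedged illustration; other quartile conventions exist), and the names `quartile` and `interquartile_range` are illustrative.

```python
import math


def quartile(values, fraction):
    """Quartile by the article's position rule; fraction is 0.25 or 0.75."""
    ordered = sorted(values)
    position = fraction * len(ordered)
    if position == int(position):
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2
    return ordered[math.ceil(position) - 1]


def interquartile_range(values):
    """Return Q3 - Q1."""
    return quartile(values, 0.75) - quartile(values, 0.25)


prices = [60, 60, 84, 45, 59, 70, 42, 59, 53, 70,
          69, 70, 120, 160, 95, 50, 75, 55, 72, 70]

print(interquartile_range(prices))  # 16.5
```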
In our case, with Luigi's mini-survey of Margherita pizza prices, he could draw the following conclusions: the mean (71.9) is slightly greater than the median (69.5), so the data are slightly skewed to the right. The skew is not very noticeable, so either the mean or the median could be used to measure the central tendency.
If Luigi wanted to determine an average price for a Margherita pizza, he could consider either the mean or the median. However, prices such as 71,900 IDR or 69,500 IDR might not be as memorable. Fortunately, the mode price for Margherita pizza falls within this range, at 70,000 IDR, which makes it a convenient figure for Luigi to use in his pricing strategy.
If he wanted to create a pizzeria for a more thrifty target group, he could focus on figures closer to the first quartile, that is, a price of around 57,000 Indonesian rupiah. It is not very convenient to focus on the third quartile to determine the price for more demanding clients because the third quartile is not very representative.