Which Measure Of Central Tendency Is Least Representative Of The Data Set Shown? 2, 2, 3, 4, 5, 32

Measures of Central Tendency

Introduction

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.

The mean, median and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections, we will look at the mean, mode and median, and learn how to calculate them and under what conditions they are most appropriate to be used.

Mean (Arithmetic)

The mean (or average) is the most popular and well known measure of central tendency. It can be used with both discrete and continuous data, although its use is most often with continuous data (see our Types of Variable guide for data types). The mean is equal to the sum of all the values in the data set divided by the number of values in the data set. So, if we have $ n $ values in a data set and they have values $ x_1, x_2, \dots, x_n $, the sample mean, usually denoted by $ \overline{x} $ (pronounced "x bar"), is:

$$ \overline{x} = {{x_1 + x_2 + \dots + x_n}\over{n}} $$

This formula is usually written in a slightly different manner using the Greek capital letter, $ \sum $, pronounced "sigma", which means "sum of...":

$$ \overline{x} = {{\sum{x}}\over{n}} $$

You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample mean? This is because, in statistics, samples and populations have very different meanings and these differences are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu", denoted as $ \mu $:

$$ \mu = {{\sum{x}}\over{n}} $$
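As a minimal sketch of the formula above, the mean can be computed in Python by summing the values and dividing by how many there are. The function name `arithmetic_mean` is an illustrative assumption, and the data is the small set quoted in the title of this guide.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Data set from the title of this guide.
data = [2, 2, 3, 4, 5, 32]
mean_value = arithmetic_mean(data)
print(mean_value)  # 8.0
```

The calculation is the same whether the values are treated as a sample or as a whole population; only the symbol used for the result differs. Python's standard library function `statistics.mean` performs the same calculation.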

The mean is essentially a model of your data set: a single value that stands in for all of the values. You will notice, however, that the mean is not often one of the actual values that you have observed in your data set. However, one of its important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the value that produces the lowest total squared error when it is used to predict every value in the data set.

An important property of the mean is that it includes every value in your data set as part of the calculation. In addition, the mean is the only measure of central tendency where the sum of the deviations of each value from the mean is always zero.
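The properties described above can be checked numerically. The sketch below assumes that "error" means squared error, which is the usual sense in which the mean is the minimising value. The helper names and the candidate centres are illustrative assumptions.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


def sum_of_deviations(values, centre):
    """Add up the signed differences between each value and the centre."""
    return sum(v - centre for v in values)


def sum_of_squared_errors(values, centre):
    """Add up the squared differences between each value and the centre."""
    return sum((v - centre) ** 2 for v in values)


data = [2, 2, 3, 4, 5, 32]
mean_value = arithmetic_mean(data)

# Deviations from the mean cancel out exactly.
deviation_total = sum_of_deviations(data, mean_value)

# Squared error at the mean, compared with a few other candidate centres.
sse_at_mean = sum_of_squared_errors(data, mean_value)
sse_at_others = {c: sum_of_squared_errors(data, c) for c in (2, 3.5, 7, 9)}

print(deviation_total)  # 0.0
print(sse_at_mean, sse_at_others)
```

Every candidate other than the mean gives a larger total squared error for this data set, even the median (3.5) and the mode (2).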

When not to use the mean

The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values that are unusual compared to the rest of the data set by being especially small or large in numerical value. For example, consider the wages of staff at a factory below:

Staff	1	2	3	4	5	6	7	8	9	10
Salary	15k	18k	16k	14k	15k	15k	12k	17k	90k	95k

The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in the $12k to $18k range. The mean is being skewed by the two large salaries. Therefore, in this situation, we would like to have a better measure of central tendency. As we will find out later, taking the median would be a better measure of central tendency in this situation.
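The effect of the two large salaries can be reproduced with Python's standard `statistics` module. This is a small sketch using the salary table above; the variable names are illustrative assumptions.

```python
from statistics import mean, median

# Salaries from the table above, in thousands of dollars.
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = mean(salaries_k)
median_salary = median(salaries_k)

# Dropping the two largest salaries shows how far they pull the mean.
without_top_two = sorted(salaries_k)[:-2]
mean_without_top_two = mean(without_top_two)

print(mean_salary)           # 30.7
print(median_salary)         # 15.5
print(mean_without_top_two)  # 15.25
```

The median of 15.5k sits inside the 12k to 18k range where most salaries fall, whereas the mean of 30.7k matches none of the workers.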

Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e., the frequency distribution for our data is skewed). If we consider the normal distribution - as this is the most frequently assessed in statistics - when the data is perfectly normal, the mean, median and mode are identical. Moreover, they all represent the most typical value in the data set. However, as the data becomes skewed the mean loses its ability to provide the best central location for the data because the skewed data is dragging it away from the typical value. The median, by contrast, best retains this position and is not as strongly influenced by the skewed values. This is explained in more detail in the skewed distribution section later in this guide.

Median

The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data. In order to calculate the median, suppose we have the data below:

We first need to rearrange that data into order of magnitude (smallest first):

14

Our median mark is the middle mark - in this case, 56. It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and average the result. So, if we look at the example below:

We again rearrange that data into order of magnitude (smallest first):

Only now we have to take the 5th and 6th score in our data set and average them to get a median of 55.5.
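The procedure described above (sort the scores, then take the middle score, or the average of the two middle scores) can be sketched in Python as follows. The function name `median_score` and both example lists are illustrative assumptions; they are not the scores used in this guide's tables.

```python
def median_score(scores):
    """Return the middle score, averaging the two middle scores if needed."""
    if not scores:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(scores)  # order of magnitude, smallest first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: exactly one middle score.
        return ordered[middle]
    # Even number of scores: average the two middle scores.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = [5, 2, 4, 3, 2]        # sorted: 2, 2, 3, 4, 5
even_example = [5, 2, 32, 4, 3, 2]   # sorted: 2, 2, 3, 4, 5, 32

print(median_score(odd_example))   # 3
print(median_score(even_example))  # 3.5
```

Note that the very large value 32 in the second list has no special influence on the result: only its position in the ordering matters.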

Mode

The mode is the most frequent score in our data set. On a bar chart or histogram, it is represented by the highest bar. You can, therefore, sometimes consider the mode as being the most popular option. An example of a mode is presented below:

Histogram showing the mode as the highest bar in the middle of a continuous distribution

Commonly, the mode is used for categorical data where we wish to know which is the most common category, as illustrated below:

Bar chart showing highest bar as the mode

We can see above that the most common form of transport, in this particular data set, is the bus. However, one of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency, such as below:

Histogram of a continuous distribution showing two modes, both somewhat centrally located

We are now stuck as to which mode best describes the central tendency of the data. This is particularly problematic when we have continuous data because we are more likely not to have any one value that is more frequent than the other. For example, consider measuring the weights of 30 people (to the nearest 0.1 kg). How likely is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer is probably very unlikely - many people might be close, but with such a small sample (30 people) and a large range of possible weights, you are unlikely to find two people with exactly the same weight; that is, to the nearest 0.1 kg. This is why the mode is very rarely used with continuous data.
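The non-uniqueness of the mode is easy to see in code. The sketch below counts how often each value occurs and returns every value that shares the highest frequency. The function name `modes` and all three example data sets are hypothetical, chosen only to illustrate the cases discussed above.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, sorted."""
    if not values:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Categorical data with one clear mode (hypothetical survey answers).
transport = ["bus", "car", "bus", "train", "walk", "bus", "car"]

# Two values tie for the highest frequency, so there are two modes.
tied_scores = [1, 1, 2, 2, 3]

# Continuous-style measurements where no value repeats (hypothetical weights in kg).
weights_kg = [67.4, 70.1, 58.9, 81.3]

print(modes(transport))    # ['bus']
print(modes(tied_scores))  # [1, 2]
print(modes(weights_kg))   # [58.9, 67.4, 70.1, 81.3]
```

When no value repeats, every value is technically a mode, which is why the mode says little about the centre of such data.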

Another problem with the mode is that it will not provide us with a very good measure of central tendency when the most common mark is far away from the rest of the data in the data set, as depicted in the diagram below:

Histogram of a continuous distribution showing mode not centrally located

In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the mode to describe the central tendency of this data set would be misleading.

Skewed Distributions and the Mean and Median

We often test whether our data is normally distributed because this is a common assumption underlying many statistical tests. An example of a normally distributed set of data is presented below:

A histogram showing a normally distributed continuous data set

When you have a normally distributed sample you can legitimately use either the mean or the median as your measure of central tendency. In fact, in any symmetrical distribution with a single peak the mean, median and mode are equal. However, in this situation, the mean is widely preferred as the best measure of central tendency because it is the measure that includes all the values in the data set for its calculation, and any change in any of the scores will affect the value of the mean. This is not the case with the median or mode.

However, when our data is skewed, for example, as with the right-skewed data set below:

Histogram of a skewed distribution showing a noticeable difference between the median and mean values

We find that the mean is being dragged in the direction of the skew. In these situations, the median is generally considered to be the best representative of the central location of the data. The more skewed the distribution, the greater the difference between the median and mean, and the greater emphasis should be placed on using the median as opposed to the mean. A classic example of the above right-skewed distribution is income (salary), where higher-earners provide a false representation of the typical income if expressed as a mean and not a median.
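A small numerical sketch can make the "dragging" visible. It takes the five smallest values from the data set in this guide's title and varies only the largest value; the alternative largest values (6 and 100) are assumptions added for comparison.

```python
from statistics import mean, median

base = [2, 2, 3, 4, 5]

# Map each choice of largest value to the (mean, median) of the full data set.
results = {}
for largest in (6, 32, 100):
    data = base + [largest]
    results[largest] = (mean(data), median(data))

for largest, (mean_value, median_value) in results.items():
    print(largest, mean_value, median_value)
```

As the largest value grows, the mean follows it to the right while the median stays at 3.5. With the title's value of 32, the mean is 8, which is larger than five of the six values, so the mean is the least representative measure for that data set.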

If the data is expected to follow a normal distribution, but tests of normality show that it is non-normal, it is customary to use the median instead of the mean. However, this is more a rule of thumb than a strict guideline. Sometimes, researchers wish to report the mean of a skewed distribution if the median and mean are not appreciably different (a subjective assessment), and if it allows easier comparisons to previous research to be made.

Summary of when to use the mean, median and mode

Please use the following summary table to know what the best measure of central tendency is with respect to the different types of variable.

Type of Variable	Best measure of central tendency
Nominal	Mode
Ordinal	Median
Interval/Ratio (not skewed)	Mean
Interval/Ratio (skewed)	Median
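
The summary table can also be expressed as a small lookup. This is only a sketch of the rule of thumb above; the function name `recommended_measure` and its `skewed` parameter are illustrative assumptions.

```python
def recommended_measure(variable_type, skewed=False):
    """Return the measure suggested by the summary table for a variable type."""
    kind = variable_type.strip().lower()
    if kind == "nominal":
        return "mode"
    if kind == "ordinal":
        return "median"
    if kind in ("interval", "ratio"):
        # Skewed interval/ratio data is better summarised by the median.
        return "median" if skewed else "mean"
    raise ValueError(f"unknown variable type: {variable_type!r}")


print(recommended_measure("Nominal"))             # mode
print(recommended_measure("ratio", skewed=True))  # median
```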

For answers to frequently asked questions about measures of central tendency, please go to the next page.