Two types of measures of dispersion

Any data set involves two terms: variables and frequencies. Variables are quantities whose values can change from one observation (or time interval) to another, while frequencies are the number of times each value occurs. Measuring how much the values vary is essential for quantifying errors and differences in the data. In this blog, we will study measures of dispersion in data.

Table of Contents

What is the Measure of Dispersion?
Types of Variability
Ways to Calculate Measures of Dispersion
What is Correlation?
Types of Correlation
Methods of measuring a Correlation

What is the Measure of Dispersion?

Every data set has some variability within its range. Variability describes how far apart the data points lie from each other and from the center of the distribution. It is also known as the scatter, spread, or dispersion of the data.

Dispersion is the degree to which numerical data are spread around an average value. A measure of dispersion quantifies this variability.

Like central tendency, variability is essential for summarizing the characteristics of data, since an average alone does not show how widely the values are spread. The concept of variability is widely applied in the sciences. Measurements in physics and chemistry usually show far less variability than those in medicine and biology.

Types of Variability

Three kinds of variability can occur in data: biological, real, and experimental. Here is a brief introduction to each.

As the name suggests, biological variability relates to living organisms, for example how the human body responds to medication. Under the same environment and test conditions, two individuals can respond differently. These differences may arise from factors such as sex, class, or weight, and are known as biological variability.

Real variability is judged against defined limits. Variability is termed real when the difference between two readings or observations exceeds the defined limits for the universe (the population under study).

Experimental variability arises during experiments, from errors or differences in methods, procedures, or techniques. It is further classified into three types: observer error, instrumental error, and sampling error.

Observer error: Observer error can be subjective or objective. A subjective error occurs, for example, when an interviewer alters the information through the way the questions are asked; an objective error occurs when an untrained observer records a measurement incorrectly.

Instrumental error: Instrumental error may be negligible or gross. It is caused by defects in machines or measuring instruments (such as height scales), calibration errors, and other unwanted variability that can lead to wrong calculations or conclusions.

Sampling error: Biased samples can also produce misleading variation or calculations. A sample should not be too small to support a decision; otherwise, it too may introduce variation.

Error also occurs when the sample is not truly representative of the population, which prevents us from drawing valid conclusions.

Ways to Calculate Measures of Dispersion

Below are five ways of calculating measures of dispersion:

Range
Quartile deviation
Mean deviation
Standard deviation
Variance and coefficient of variation

Range is the difference between the highest and lowest values in the sample. The coefficient of range is the corresponding relative measure.

Mathematically, range (R) = H - L, where H is the highest value and L is the lowest value.

The coefficient of range is (H - L) / (H + L).

For example, consider the following data on the weekly production of a fabric-producing industry.

Week	Production (in thousands)
1	8.3
2	10.2
3	13.2
4	9.6

Table 1

For the above data, the range is H-L = 13.2-8.3 = 4.9.

The coefficient of range is (H - L) / (H + L) = 4.9 / 21.5 ≈ 0.228.
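The two formulas for range and coefficient of range can be sketched in Python as follows. This is a minimal illustration, assuming the sample is a plain list of numbers whose highest and lowest values do not sum to zero; the function name `range_and_coefficient` is hypothetical.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) for a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    high, low = max(values), min(values)  # H and L
    spread = high - low                   # R = H - L
    # Coefficient of range = (H - L) / (H + L); assumes H + L != 0.
    return spread, spread / (high + low)


production = [8.3, 10.2, 13.2, 9.6]  # Table 1, in thousands
r, coefficient = range_and_coefficient(production)
print(f"Range = {r:.1f}, coefficient of range = {coefficient:.3f}")
# Range = 4.9, coefficient of range = 0.228
```

Because only `max` and `min` are used, the result depends on just two observations, which is exactly why the range is so sensitive to extreme values.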

The advantage of range is that it is easy to calculate and understand. Range is useful in statistical quality control and weather forecasting.

The disadvantage of range is that it is not suitable for thorough analysis, because it depends only on the two extreme values and is therefore strongly affected by outliers in the sample.

Quartile deviation, also known as the semi-interquartile range, measures the spread of the middle half of a group of observations. It is calculated from the upper quartile (Q3) and the lower quartile (Q1) of the group: quartile deviation = (Q3 - Q1) / 2, where Q3 - Q1 is the interquartile range.

For a group, the upper quartile is the value above which 25% of the observations lie. The lower quartile is the reverse: the value below which 25% of the observations lie.

For example, consider the following weight data for a class in a school:

Weight (in kg)	Frequency

Table 2

For the given table, the upper quartile is 64 and the lower quartile is 60, so the interquartile range is 64 - 60 = 4 and the quartile deviation is 4 / 2 = 2.
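Because the data for Table 2 are not reproduced here, the sketch below uses a small hypothetical list of weights. It relies on Python's standard `statistics.quantiles`, whose default "exclusive" method places Q1 at the (n + 1)/4-th ordered value and Q3 at the 3(n + 1)/4-th, interpolating between neighbouring values when the position is fractional. Other software may use a different quartile convention and give slightly different values; the function name `quartile_deviation` is illustrative.

```python
import statistics


def quartile_deviation(values):
    """Return (Q1, Q3, interquartile range, quartile deviation)."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1          # interquartile range
    return q1, q3, iqr, iqr / 2  # quartile deviation is half the IQR


weights = [52, 56, 60, 62, 64, 66, 70]  # hypothetical weights in kg
q1, q3, iqr, qd = quartile_deviation(weights)
print(q1, q3, iqr, qd)  # 56.0 66.0 10.0 5.0
```

Changing the largest weight from 70 to, say, 120 would leave Q1, Q3 and the quartile deviation unchanged, which illustrates why this measure is not affected by extreme values.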

The advantage of quartile deviation is that it is easy to calculate and is unaffected by extreme values. It is also useful when the observer needs to deal with only the middle half of the group.

The disadvantage of quartile deviation is that it ignores 50% of the observations (the lowest and highest quarters) and is not suitable for further algebraic treatment.

Mean deviation is the average of the absolute deviations of the values from a measure of central tendency, which can be the mean, median, or mode. It is calculated in the following steps:

Denote the observations by x.
Calculate the arithmetic mean of the observations, x’.
Calculate the deviation of each observation from the mean, dx = x - x’.
Take the positive (absolute) value of each deviation.
Calculate the mean deviation: MD = sum of the absolute deviations / total number of observations.

For example, consider the following data set:

S. No	x (data)	dx = x-x’	Positive value of dx
1	10	10-21 = -11	11
2	26	26-21 = 5	5
3	20	20-21 = -1	1
4	23	23-21 = 2	2
5	15	15-21 = -6	6
6	32	32-21 = 11	11
Total = 6	126 (mean = 126/6= 21)		Total = 36

Table 3

Mean deviation = sum of absolute deviations from the mean / total number of observations

= 36/6 = 6

The coefficient of mean deviation is the ratio of the mean deviation to the arithmetic mean: 6/21 ≈ 0.286.
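The mean deviation steps can be sketched in Python using the Table 3 data. The function name `mean_deviation` and its optional `center` parameter are illustrative assumptions: by default the center is the arithmetic mean, but any measure of central tendency, such as the median, can be passed in.

```python
import statistics


def mean_deviation(values, center=None):
    """Average absolute deviation of values from a chosen center."""
    if center is None:
        center = sum(values) / len(values)  # arithmetic mean x'
    # Take the absolute (positive) value of each deviation, then average.
    return sum(abs(x - center) for x in values) / len(values)


data = [10, 26, 20, 23, 15, 32]  # Table 3
mean = sum(data) / len(data)
md = mean_deviation(data)
print(md, round(md / mean, 3))  # 6.0 0.286

# Mean deviation about the median (21.5) instead of the mean
print(mean_deviation(data, statistics.median(data)))  # 6.0
```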

The advantages of mean deviation are that it is easy to calculate, it can be computed about any measure of central tendency, it is less affected by extreme items than the standard deviation, and it is based on all observations rather than on estimation.

The drawback of mean deviation is that it ignores the signs of the deviations, which makes it unsuitable for further algebraic treatment and deeper analysis.

Standard deviation is the square root of the arithmetic mean of the squared deviations of the observations from their mean. It is calculated in the following steps:

Calculate the mean of the data.
Find the deviation of each observation from the mean and square it.
Add up these squared deviations.
Divide the sum by the total number of observations and take the square root.

For example, for the data in Table 3, the squared deviations are 121, 25, 1, 4, 36 and 121, which sum to 308. The standard deviation is therefore the square root of (308 / 6), which is about 7.16.
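The four standard deviation steps can be written out in Python as below, using the Table 3 data. This sketch divides by n, as the steps do (the population standard deviation); the standard library's `statistics.pstdev` uses the same definition, whereas `statistics.stdev` divides by n - 1 and would give a larger value. The function name `standard_deviation` is illustrative.

```python
import math
import statistics


def standard_deviation(values):
    """Population standard deviation, computed step by step."""
    n = len(values)
    mean = sum(values) / n                       # step 1: mean
    squared = [(x - mean) ** 2 for x in values]  # step 2: squared deviations
    total = sum(squared)                         # step 3: sum of squares
    return math.sqrt(total / n)                  # step 4: divide by n, take root


data = [10, 26, 20, 23, 15, 32]  # Table 3
sd = standard_deviation(data)
print(round(sd, 2))                       # 7.16
print(round(statistics.pstdev(data), 2))  # 7.16
```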

Standard deviation is widely used to describe the spread of large data sets. It helps in quantifying errors and differences in a data set, and it is used when determining a suitable sample size.

Variance is the square of the standard deviation and is useful for drawing statistical inferences. The coefficient of variation (CV) is used to compare the variability of one characteristic in two different groups. It is calculated from the standard deviation and the arithmetic mean of the observations: CV = (standard deviation / arithmetic mean) × 100.

For the data of Table 3, the variance is 308 / 6 ≈ 51.33, the square of the standard deviation before rounding (about 7.165). The coefficient of variation is (7.165 / 21) × 100 ≈ 34.1%.
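Variance and the coefficient of variation for the Table 3 data can be sketched with Python's standard `statistics` module. As an assumption, this uses the population versions (`pvariance`, `pstdev`), consistent with dividing by n when computing the standard deviation, and expresses the coefficient of variation as a percentage; the function name `variance_and_cv` is hypothetical.

```python
import statistics


def variance_and_cv(values):
    """Return (population variance, coefficient of variation in percent)."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)  # mean of squared deviations
    sd = statistics.pstdev(values)           # square root of the variance
    return variance, sd / mean * 100         # CV = SD / mean x 100


variance, cv = variance_and_cv([10, 26, 20, 23, 15, 32])  # Table 3
print(f"variance = {variance:.2f}, CV = {cv:.1f}%")
# variance = 51.33, CV = 34.1%
```

Since the CV divides out the mean, the same function can be applied to two groups and the resulting percentages compared directly.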

The coefficient of variation is useful for comparing data series that are measured in the same units but have different means and standard deviations. Because it is unitless, it can also be used to compare series measured in different units.

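A disadvantage of variance is that it is expressed in the square of the original units (for example, kg² for weights measured in kg), which makes it harder to interpret than the standard deviation.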

What is Correlation?

Correlation is another concept related to variables in statistics. It is the relationship between two variables in a data study.

In other words, when two variables in a set of observations are related so that a change in one is accompanied by a change in the other, they are said to be correlated. Correlation helps us study the linear association between two quantitative variables.

Types of Correlation

Correlation can be classified in three main ways. Here is a brief introduction to each:

Positive, negative and zero correlation: If two variables change in the same direction, the correlation is positive; for example, height and weight generally increase together. If they change in opposite directions, the correlation is negative. When there is no relationship between the two variables in a data set, the correlation is zero.

Linear and non-linear correlation: When the amount of change in one variable bears a constant ratio to the change in the other, the correlation is linear. When this ratio is not constant, the correlation is non-linear.

Simple, partial and multiple correlation: A study of only two variables is called simple correlation. When the relationship between two variables is studied while keeping the other variables constant, it is called partial correlation. When the relationship between one variable and two or more other variables taken together is studied, it is called multiple correlation.

Methods of measuring a Correlation

Correlation can be measured in the following ways:

Scatter diagram: A scatter diagram is a graphical method in which one variable is plotted on the x-axis and the other on the y-axis, so that the pattern of the points shows the direction and strength of the relationship.

Correlation graph: A correlation graph is a graphical presentation in which the values of both variables are plotted, giving two curves. Comparing the directions in which the two curves move indicates the correlation between the variables.

Spearman’s rank correlation coefficient: Spearman’s method uses the ranks of the two variables, rather than their actual values, to measure their correlation.
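Spearman's coefficient is commonly computed as rho = 1 - 6 Σd² / (n(n² - 1)), where d is the difference between the two ranks of each pair and n is the number of pairs. The sketch below follows that formula and assumes there are no tied values (ties need average ranks and a corrected formula); the marks of five students in two tests are hypothetical, as are the function names.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation coefficient for untied paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series with at least two values")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("tied values are not handled by this sketch")
    n = len(x)
    # d = difference between the rank of x and the rank of y in each pair
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


test_1 = [35, 23, 47, 17, 10]  # hypothetical marks in test 1
test_2 = [30, 33, 45, 23, 8]   # hypothetical marks in test 2
print(round(spearman_rho(test_1, test_2), 2))  # 0.9
```

Here the ranks differ by one place for only two students, so Σd² = 2 and rho = 1 - 12/120 = 0.9, a strong positive rank correlation.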

Concurrent deviation method: The concurrent deviation method considers only the direction of change (increase or decrease) in each of two paired series. The correlation coefficient computed from these directions of change for the X and Y series is called the coefficient of concurrent deviation.
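A common textbook form of this coefficient is r_c = ±√(±(2C - n) / n), where n is the number of pairs of deviations (one less than the number of observations), C is the number of pairs in which both series move in the same direction, and the sign of (2C - n) is used both inside and outside the square root. The Python sketch below follows that formula, assuming an unchanged value counts as neither an increase nor a decrease; the series and function names are hypothetical.

```python
import math


def directions(series):
    """+1 if a value rises from the previous one, -1 if it falls, 0 if unchanged."""
    return [(b > a) - (b < a) for a, b in zip(series, series[1:])]


def concurrent_deviation(x, y):
    """Coefficient of concurrent deviation for two paired series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series with at least two values")
    dx, dy = directions(x), directions(y)
    n = len(dx)                                      # pairs of deviations = N - 1
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)  # concurrent deviations
    k = (2 * c - n) / n
    # Apply the sign of (2C - n) both inside and outside the square root.
    return math.copysign(math.sqrt(abs(k)), k)


x = [10, 12, 15, 14, 18]  # hypothetical series X
y = [20, 22, 21, 19, 25]  # hypothetical series Y
print(round(concurrent_deviation(x, y), 3))  # 0.707
```

In this example the two series move together in three of the four pairs of deviations, so 2C - n = 2 and r_c = √(2/4) ≈ 0.707.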

Statistics is important not only in theory but also in many practical applications. For example, it helps large firms and organizations manage their data. From estimating errors in manufacturing to maintaining stocks of raw materials and other goods, measures of dispersion play a major role in businesses and organizations.