Quartiles

Statistic which divides data into four same-sized parts for analysis

In statistics, quartiles are a type of quantile which divide the number of data points into four parts, or quarters, of more-or-less equal size. The data must be ordered from smallest to largest to compute quartiles; as such, quartiles are a form of order statistic. The three quartiles, which split the data into four divisions, are as follows:

The first quartile (Q₁) is the 25th percentile: the lowest 25% of the data lies below this point. It is also known as the lower quartile.
The second quartile (Q₂) is the median of the data set; thus 50% of the data lies below this point.
The third quartile (Q₃) is the 75th percentile: the lowest 75% of the data lies below this point. It is also known as the upper quartile.^[1]

Along with the minimum and maximum of the data (which are also quartiles), the three quartiles described above provide a five-number summary of the data. This summary is important in statistics because it provides information about both the center and the spread of the data. Knowing the lower and upper quartiles shows how large the spread is and whether the dataset is skewed toward one side. Because quartiles divide the number of data points evenly, rather than the range of values, the distance between adjacent quartiles is generally not the same (i.e. usually (Q₃ - Q₂) ≠ (Q₂ - Q₁)). The interquartile range (IQR) is defined as the difference between the 75th and 25th percentiles, or Q₃ - Q₁. While the maximum and minimum also show the spread of the data, the upper and lower quartiles can provide more detailed information on the location of specific data points, the presence of outliers in the data, and the difference in spread between the middle 50% of the data and the outer data points.^[2]
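As a rough illustration of the five-number summary and the IQR, the following Python sketch uses the standard library's `statistics.quantiles`. The sample values and the helper name `five_number_summary` are assumptions made for illustration only. The library's default "exclusive" method is just one of several quartile conventions, so other tools may return different values for the same data.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum) for a sample of at least two values."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
    # using the library's default "exclusive" method.
    q1, q2, q3 = quantiles(data, n=4)
    return (min(data), q1, q2, q3, max(data))


# Hypothetical sample, chosen only for illustration.
sample = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]
minimum, q1, q2, q3, maximum = five_number_summary(sample)

iqr = q3 - q1  # spread of the middle 50% of the data
print(minimum, q1, q2, q3, maximum)  # 6 15.0 40.0 43.0 49
print("IQR:", iqr)                   # IQR: 28.0
print(q2 - q1, q3 - q2)              # 25.0 3.0
```

In this sample Q₂ - Q₁ is 25 while Q₃ - Q₂ is only 3, so the lower part of the middle 50% is far more spread out than the upper part.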

Computing methods

Discrete distributions

For discrete distributions, there is no universal agreement on selecting the quartile values.^[3]

Method 4

If we have an ordered dataset $x_{1},x_{2},...,x_{n}$, then we can interpolate between data points to find the $p$th empirical quartile ($p=1,2,3$), treating $x_{i}$ as the $i/(n+1)$ quantile. If we denote the integer part of a number $a$ by $\lfloor a\rfloor$, then the empirical quantile function is given by

$q(p/4)=x_{k}+\alpha (x_{k+1}-x_{k})$ ,

where $k=\lfloor p(n+1)/4\rfloor$ and $\alpha =p(n+1)/4-\lfloor p(n+1)/4\rfloor$ .^[1]

To find the first, second, and third quartiles of the dataset we would evaluate $q(0.25)$, $q(0.5)$, and $q(0.75)$ respectively, which correspond to $p=1$, $p=2$, and $p=3$.
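The interpolation formula above can be sketched in Python as follows. The function name `quartile_method4`, the sample values, and the choice to raise an error when the sample is too small for the formula are all assumptions made for this illustration; they are not part of the cited method.

```python
import math


def quartile_method4(data, p):
    """Return the p-th quartile (p = 1, 2, 3) as x_k + alpha * (x_{k+1} - x_k)."""
    if p not in (1, 2, 3):
        raise ValueError("p must be 1, 2, or 3")
    xs = sorted(data)
    n = len(xs)
    position = p * (n + 1) / 4       # p(n+1)/4
    k = math.floor(position)         # integer part
    alpha = position - k             # fractional part
    # Assumption: reject samples where x_k or x_{k+1} would not exist.
    if k < 1 or (k >= n and alpha > 0):
        raise ValueError("too few data points for this formula")
    if alpha == 0:
        return xs[k - 1]             # x_k (lists are 0-indexed)
    return xs[k - 1] + alpha * (xs[k] - xs[k - 1])


# Hypothetical sample, chosen only for illustration.
sample = [7, 15, 36, 39, 40, 41]
print([quartile_method4(sample, p) for p in (1, 2, 3)])  # [13.0, 37.5, 40.25]
```

With $n=6$, the first quartile sits at position $7/4=1.75$, so $k=1$ and $\alpha=0.75$, giving $7+0.75\times(15-7)=13$.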

Continuous probability distributions

Figure: Quartiles on the cumulative distribution function of a normal distribution.

If a continuous probability distribution is given by $P(X)$, where $X$ is a real-valued random variable, its cumulative distribution function (CDF) is given by

$F_{X}(x)=P(X\leq x)$ .^[1]

The CDF gives the probability that the random variable $X$ is less than or equal to the value $x$. Therefore, the first quartile is the value of $x$ when $F_{X}(x)=0.25$, the second quartile is $x$ when $F_{X}(x)=0.5$, and the third quartile is $x$ when $F_{X}(x)=0.75$.^[5] The values of $x$ can be found with the quantile function $Q(p)$, where $p=0.25$ for the first quartile, $p=0.5$ for the second quartile, and $p=0.75$ for the third quartile. The quantile function is the inverse of the cumulative distribution function when the cumulative distribution function is strictly monotonically increasing, because this guarantees a one-to-one correspondence between its input and output.
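For a concrete case, the quartiles of a standard normal distribution can be obtained from its quantile function. The sketch below uses `statistics.NormalDist` from the Python standard library, whose `inv_cdf` method plays the role of $Q(p)$; the choice of a standard normal distribution is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# inv_cdf is the quantile function Q(p), the inverse of the CDF.
q1 = standard_normal.inv_cdf(0.25)
q2 = standard_normal.inv_cdf(0.50)
q3 = standard_normal.inv_cdf(0.75)

# Roughly -0.6745, 0.0, and 0.6745.
print(round(q1, 4), round(q2, 4), round(q3, 4))

# Applying the CDF to each quartile recovers the probability p.
print(standard_normal.cdf(q1), standard_normal.cdf(q3))
```

Because the normal CDF is strictly increasing, applying `cdf` to each quartile returns the original probability up to floating-point error.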

Outliers

Statistics offers several methods for checking for outliers. Outliers could result from a shift in the location (mean) or in the scale (variability) of the process of interest.^[6] Outliers could also be evidence of a sample population that has a non-normal distribution or of a contaminated population data set. Consequently, as is the basic idea of descriptive statistics, an outlier must be explained by further analysis of its cause or origin. In cases of extreme observations, which are not infrequent, the typical values must be analyzed. The interquartile range (IQR), defined as the difference between the upper and lower quartiles (${\textstyle Q_{3}-Q_{1}}$), may be used to characterize the data when extreme values would otherwise skew it; the interquartile range is a relatively robust statistic (a property sometimes called "resistance") compared to the range and standard deviation. There is also a mathematical method for checking for outliers by determining "fences", upper and lower limits beyond which a data point is treated as a possible outlier.

After determining the first (lower) and third (upper) quartiles (${\textstyle Q_{1}}$ and ${\textstyle Q_{3}}$ respectively) and the interquartile range (${\textstyle {\textrm {IQR}}=Q_{3}-Q_{1}}$) as outlined above, the fences are calculated using the following formulas:

${\text{Lower fence}}=Q_{1}-(1.5\times \mathrm {IQR} )$

${\text{Upper fence}}=Q_{3}+(1.5\times \mathrm {IQR} )$
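The fence calculation can be sketched in Python as follows. The helper name `iqr_fences`, its parameter `k`, and the sample values are assumptions made for illustration. The quartiles come from `statistics.quantiles` with its default "exclusive" method, so a different quartile convention may move the fences slightly.

```python
from statistics import quantiles


def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence) as Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4)  # Q1, Q2 (unused), Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical sample with one very low and one very high value.
sample = [2, 10, 11, 12, 13, 14, 40]
lower, upper = iqr_fences(sample)

# Points lying outside the fences are flagged as possible outliers.
outliers = [x for x in sample if x < lower or x > upper]
print(lower, upper)  # 4.0 20.0
print(outliers)      # [2, 40]
```

Here Q₁ is 10 and Q₃ is 14, so the IQR is 4 and the fences fall at 10 - 6 = 4 and 14 + 6 = 20.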

Figure: Boxplot diagram with outliers.

The lower fence is the "lower limit" and the upper fence is the "upper limit" of the data, and any data point lying outside these bounds can be considered an outlier. The fences provide one guideline for defining an outlier, which may also be defined in other ways. They mark out a range, much like the boundary of a fence, and outliers are the points that fall outside it. It is common for the lower and upper fences, along with the outliers, to be represented by a boxplot. In a vertical boxplot, only the vertical heights correspond to the visualized data set, while the horizontal width of the box is irrelevant. Outliers located outside the fences in a boxplot can be marked with any symbol, such as an "x" or "o". The fences are sometimes also referred to as "whiskers", while the entire plot is called a "box-and-whisker" plot.

When an outlier is spotted in a data set by calculating the interquartile range and boxplot features, it is easy to mistakenly view it as evidence that the population is non-normal or that the sample is contaminated. However, this method should not take the place of a hypothesis test for determining the normality of the population. The significance of the outliers varies depending on the sample size. If the sample is small, it is more likely to yield an unrepresentatively small interquartile range, leading to narrower fences. It is therefore more likely that some data points will be marked as outliers.^[7]

This article uses material from the Wikipedia article Quartiles, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[1] Dekking, Michel (2005). A modern introduction to probability and statistics: understanding why and how. London: Springer. pp. 236-238. ISBN 978-1-85233-896-1. OCLC 262680588.

[2] Knoch, Jessica (February 23, 2018). "How are Quartiles Used in Statistics?". Magoosh. Archived from the original on December 10, 2019. Retrieved February 24, 2023.

[3] Hyndman, Rob J; Fan, Yanan (November 1996). "Sample quantiles in statistical packages". American Statistician. 50 (4): 361–365. doi:10.2307/2684934. JSTOR 2684934.

[4] Tukey, John Wilder (1977). Exploratory Data Analysis. ISBN 978-0-201-07616-5.

[5] "6. Distribution and Quantile Functions" (PDF). math.bme.hu.

[6] Walfish, Steven (November 2006). "A Review of Statistical Outlier Method". Pharmaceutical Technology.

[7] Dawson, Robert (July 1, 2011). "How Significant is a Boxplot Outlier?". Journal of Statistics Education. 19 (2). doi:10.1080/10691898.2011.11889610.

[8] "How to use the Excel QUARTILE function | Exceljet". exceljet.net. Retrieved December 11, 2019.

[9] "Quantiles of a data set – MATLAB quantile". www.mathworks.com. Retrieved December 11, 2019.

Symbol	Names	Definition
Q₁	First quartile; lower quartile; 25th percentile	Splits off the lowest 25% of data from the highest 75%
Q₂	Second quartile; median; 50th percentile	Cuts data set in half
Q₃	Third quartile; upper quartile; 75th percentile	Splits off the highest 25% of data from the lowest 75%

Example 1

Quartile	Method 1	Method 2	Method 3	Method 4
Q₁	15	25.5	20.25	15
Q₂	40	40	40	40
Q₃	43	42.5	42.75	43

Example 2

Quartile	Method 1	Method 2	Method 3	Method 4
Q₁	15	15	15	13
Q₂	37.5	37.5	37.5	37.5
Q₃	40	40	40	40.25

Environment	Function	Quartile Method
Microsoft Excel	QUARTILE.EXC	Method 4
Microsoft Excel	QUARTILE.INC	Method 3
TI-8X series calculators	1-Var Stats	Method 1
R	fivenum	Method 2
Python	numpy.percentile	Method 3
Python	pandas.DataFrame.describe	Method 3

Excel QUARTILE quart argument	Output QUARTILE value
0	Minimum value
1	Lower Quartile (25th percentile)
2	Median
3	Upper Quartile (75th percentile)
4	Maximum value

MATLAB quantile p argument	Output quantile value
0	Minimum value
0.25	Lower Quartile (25th percentile)
0.5	Median
0.75	Upper Quartile (75th percentile)
1	Maximum value