Two-way analysis of variance

Statistical test examining influence of two categorical variables on one continuous variable

In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. It assesses the main effect of each independent variable and also tests whether there is any interaction between them.

Data set

Let us imagine a data set for which a dependent variable may be influenced by two factors which are potential sources of variation. The first factor has $I$ levels ( $i\in \{1,\ldots ,I\}$ ) and the second has $J$ levels ( $j\in \{1,\ldots ,J\}$ ). Each combination $(i,j)$ defines a treatment, for a total of $I\times J$ treatments. We represent the number of replicates for treatment $(i,j)$ by $n_{ij}$ , and let $k$ be the index of the replicate in this treatment ( $k\in \{1,\ldots ,n_{ij}\}$ ).

From these data, we can build a contingency table, where $n_{i+}=\sum _{j=1}^{J}n_{ij}$ and $n_{+j}=\sum _{i=1}^{I}n_{ij}$ , and the total number of replicates is equal to $n=\sum _{i,j}n_{ij}=\sum _{i}n_{i+}=\sum _{j}n_{+j}$ .

The experimental design is balanced if each treatment has the same number of replicates, $K$. In that case the design is also said to be orthogonal, which makes it possible to fully distinguish the effects of the two factors. We can then write $\forall i,j\;n_{ij}=K$, and $\forall i,j\;n_{ij}={\frac {n_{i+}\cdot n_{+j}}{n}}$.
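As a rough illustration of these definitions, the sketch below computes the marginal counts $n_{i+}$, $n_{+j}$ and $n$ from a table of replicate counts and checks both conditions. The counts and the function names (`margins`, `is_balanced`, `is_proportional`) are assumptions made for this example, not part of any library.

```python
# Hypothetical replicate counts n_ij for I = 3 levels of the first factor
# (rows) and J = 2 levels of the second factor (columns).
counts = [
    [3, 2],
    [2, 3],
    [3, 2],
]

def margins(counts):
    """Return (row totals n_i+, column totals n_+j, grand total n)."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return row_totals, col_totals, sum(row_totals)

def is_balanced(counts):
    """True if every treatment has the same number of replicates K."""
    return len({n_ij for row in counts for n_ij in row}) == 1

def is_proportional(counts):
    """True if n_ij * n == n_i+ * n_+j in every cell (integer arithmetic)."""
    row_totals, col_totals, n = margins(counts)
    return all(
        counts[i][j] * n == row_totals[i] * col_totals[j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )

row_totals, col_totals, n = margins(counts)
print(row_totals, col_totals, n)              # [5, 5, 5] [8, 7] 15
print(is_balanced(counts))                    # False
print(is_proportional(counts))                # False
print(is_balanced([[4, 4], [4, 4], [4, 4]]))  # True
```

The proportionality check multiplies both sides by $n$ so that it stays in exact integer arithmetic instead of comparing floating-point quotients.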

Model

Upon observing variation among all $n$ data points, for instance via a histogram, "probability may be used to describe such variation" [4]. Let us hence denote by $Y_{ijk}$ the random variable whose observed value $y_{ijk}$ is the $k$-th measure for treatment $(i,j)$. The two-way ANOVA models all these variables as varying independently and normally around a mean, $\mu _{ij}$, with a constant variance, $\sigma ^{2}$ (homoscedasticity):

$Y_{ijk}\,|\,\mu _{ij},\sigma ^{2}\;{\overset {\mathrm {i.i.d.} }{\sim }}\;{\mathcal {N}}(\mu _{ij},\sigma ^{2})$ .

Specifically, the mean of the response variable is modeled as a linear combination of the explanatory variables:

$\mu _{ij}=\mu +\alpha _{i}+\beta _{j}+\gamma _{ij}$ ,

where $\mu$ is the grand mean, $\alpha _{i}$ is the additive main effect of level $i$ of the first factor ($i$-th row in the contingency table), $\beta _{j}$ is the additive main effect of level $j$ of the second factor ($j$-th column in the contingency table), and $\gamma _{ij}$ is the non-additive interaction effect of treatment $(i,j)$, shared by all of its replicates $k=1,\ldots ,n_{ij}$ (cell at row $i$ and column $j$ in the contingency table).

Equivalently, the two-way ANOVA can be described by noting that, besides the variation explained by the factors, there remains some statistical noise. This unexplained variation is handled by introducing one random variable per data point, $\epsilon _{ijk}$, called the error. These $n$ random variables are seen as deviations from the means, and are assumed to be independent and normally distributed:

$Y_{ijk}=\mu _{ij}+\epsilon _{ijk}{\text{ with }}\epsilon _{ijk}{\overset {\mathrm {i.i.d.} }{\sim }}{\mathcal {N}}(0,\sigma ^{2})$ .
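To make the error formulation concrete, the following minimal sketch draws synthetic observations $y_{ijk}=\mu +\alpha _{i}+\beta _{j}+\gamma _{ij}+\epsilon _{ijk}$ with Gaussian noise. All parameter values, the replicate counts and the function name `simulate` are hypothetical and chosen only for illustration.

```python
import random

def simulate(mu, alpha, beta, gamma, counts, sigma, seed=0):
    """Draw y_ijk = mu + alpha_i + beta_j + gamma_ij + eps_ijk,
    with eps_ijk independent N(0, sigma^2).

    Returns a dict mapping (i, j) to the list of replicates of that treatment.
    """
    rng = random.Random(seed)  # local generator, so runs are reproducible
    data = {}
    for i, a in enumerate(alpha):
        for j, b in enumerate(beta):
            mean_ij = mu + a + b + gamma[i][j]
            data[(i, j)] = [
                mean_ij + rng.gauss(0.0, sigma) for _ in range(counts[i][j])
            ]
    return data

# Hypothetical parameters with I = 3 and J = 2 (assumed values).
mu = 12.0
alpha = [-1.0, 5.0, -4.0]
beta = [-2.0, 2.0]
gamma = [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]
counts = [[3, 2], [2, 3], [3, 2]]

sample = simulate(mu, alpha, beta, gamma, counts, sigma=1.5)
for treatment, replicates in sample.items():
    print(treatment, [round(y, 2) for y in replicates])
```

Setting `sigma=0.0` returns every replicate equal to its cell mean $\mu _{ij}$, which is a quick way to check the deterministic part of the model.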

Parameter estimation

To ensure identifiability of parameters, we can add the following "sum-to-zero" constraints:

$\sum _{i}\alpha _{i}=\sum _{j}\beta _{j}=\sum _{i}\gamma _{ij}=\sum _{j}\gamma _{ij}=0$
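The effect of these constraints can be illustrated with a short sketch: given a table of cell means $\mu _{ij}$, they single out one decomposition into $\mu$, $\alpha _{i}$, $\beta _{j}$ and $\gamma _{ij}$. The function name `decompose` and the numbers are assumptions made for this example. Plugging in sample cell means in place of $\mu _{ij}$ is a natural estimate for a balanced design; how an unbalanced design should be handled is not covered here.

```python
def decompose(cell_means):
    """Split cell means mu_ij into mu, alpha_i, beta_j, gamma_ij such that
    alpha, beta and every row and column of gamma sum to zero."""
    I, J = len(cell_means), len(cell_means[0])
    mu = sum(sum(row) for row in cell_means) / (I * J)
    alpha = [sum(row) / J - mu for row in cell_means]
    beta = [sum(cell_means[i][j] for i in range(I)) / I - mu for j in range(J)]
    gamma = [
        [cell_means[i][j] - mu - alpha[i] - beta[j] for j in range(J)]
        for i in range(I)
    ]
    return mu, alpha, beta, gamma

# Hypothetical cell means for I = 3 rows and J = 2 columns.
cell_means = [
    [10.0, 12.0],
    [14.0, 20.0],
    [6.0, 10.0],
]

mu, alpha, beta, gamma = decompose(cell_means)
print(mu)     # 12.0
print(alpha)  # [-1.0, 5.0, -4.0]
print(beta)   # [-2.0, 2.0]
print(gamma)  # [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]
```

Without the constraints, adding a constant to $\mu$ and subtracting it from every $\alpha _{i}$ would give the same cell means, so the parameters could not be recovered uniquely.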

Example

The following hypothetical example gives the yields of 15 plants subject to two different environmental variations, and three different fertilisers.

Fertiliser	Extra CO₂	Extra humidity
No fertiliser	7, 2, 1	7, 6
Nitrate	11, 6	10, 7, 3
Phosphate	5, 3, 4	11, 4

Five sums of squares are calculated:

Factor	Calculation	Sum	$\sigma ^{2}$
Individual	$7^{2}+2^{2}+1^{2}+7^{2}+6^{2}+11^{2}+6^{2}+10^{2}+7^{2}+3^{2}+5^{2}+3^{2}+4^{2}+11^{2}+4^{2}$	641	15
Fertiliser × Environment	${\frac {(7+2+1)^{2}}{3}}+{\frac {(7+6)^{2}}{2}}+{\frac {(11+6)^{2}}{2}}+{\frac {(10+7+3)^{2}}{3}}+{\frac {(5+3+4)^{2}}{3}}+{\frac {(11+4)^{2}}{2}}$	556.1667	6
Fertiliser	${\frac {(7+2+1+7+6)^{2}}{5}}+{\frac {(11+6+10+7+3)^{2}}{5}}+{\frac {(5+3+4+11+4)^{2}}{5}}$	525.4	3
Environment	${\frac {(7+2+1+11+6+5+3+4)^{2}}{8}}+{\frac {(7+6+10+7+3+11+4)^{2}}{7}}$	519.2679	2
Composite	${\frac {(7+2+1+11+6+5+3+4+7+6+10+7+3+11+4)^{2}}{15}}$	504.6	1
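The five sums in the table can be reproduced with a few lines of code. The sketch below is one possible way to organise the arithmetic: each sum adds, over the groups of a given partition of the data, the squared group total divided by the group size. The helper names `sum_of_squared_totals` and `pool` are assumptions made for this example.

```python
# Yields from the example, keyed by (fertiliser, environment).
yields = {
    ("No fertiliser", "Extra CO2"): [7, 2, 1],
    ("No fertiliser", "Extra humidity"): [7, 6],
    ("Nitrate", "Extra CO2"): [11, 6],
    ("Nitrate", "Extra humidity"): [10, 7, 3],
    ("Phosphate", "Extra CO2"): [5, 3, 4],
    ("Phosphate", "Extra humidity"): [11, 4],
}

def sum_of_squared_totals(groups):
    """Sum over groups of (group total)^2 / (group size)."""
    return sum(sum(g) ** 2 / len(g) for g in groups)

def pool(key_index):
    """Pool observations by fertiliser (0) or by environment (1)."""
    pooled = {}
    for key, values in yields.items():
        pooled.setdefault(key[key_index], []).extend(values)
    return list(pooled.values())

all_values = [y for values in yields.values() for y in values]

sums = {
    "Individual": sum_of_squared_totals([[y] for y in all_values]),
    "Fertiliser x Environment": sum_of_squared_totals(yields.values()),
    "Fertiliser": sum_of_squared_totals(pool(0)),
    "Environment": sum_of_squared_totals(pool(1)),
    "Composite": sum_of_squared_totals([all_values]),
}

for name, value in sums.items():
    print(f"{name}: {value:.4f}")
# Individual: 641.0000
# Fertiliser x Environment: 556.1667
# Fertiliser: 525.4000
# Environment: 519.2679
# Composite: 504.6000
```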

Finally, the sums of squared deviations required for the analysis of variance can be calculated.

Factor	Sum	$\sigma ^{2}$	Total	Environment	Fertiliser	Fertiliser × Environment	Residual
Individual	641	15	1				1
Fertiliser × Environment	556.1667	6				1	−1
Fertiliser	525.4	3			1	−1
Environment	519.2679	2		1		−1
Composite	504.6	1	−1	−1	−1	1

Squared deviations			136.4	14.668	20.8	16.099	84.833
Degrees of freedom			14	1	2	2	9
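The last two rows follow from the table by applying each column of coefficients (+1 or −1) to the "Sum" column and to the $\sigma ^{2}$ column, respectively. A minimal sketch of that bookkeeping is given below; the dictionary keys and the helper name `combine` are assumptions made for this example, and the input values are copied from the table.

```python
# "Sum" and sigma^2 columns of the table, one entry per row.
sums = {
    "individual": 641.0,
    "cell": 556.1667,        # Fertiliser x Environment
    "fertiliser": 525.4,
    "environment": 519.2679,
    "composite": 504.6,
}
terms = {"individual": 15, "cell": 6, "fertiliser": 3, "environment": 2, "composite": 1}

# Coefficients of each column of the table (empty cells are omitted).
coefficients = {
    "Total": {"individual": 1, "composite": -1},
    "Environment": {"environment": 1, "composite": -1},
    "Fertiliser": {"fertiliser": 1, "composite": -1},
    "Fertiliser x Environment": {
        "cell": 1, "fertiliser": -1, "environment": -1, "composite": 1,
    },
    "Residual": {"individual": 1, "cell": -1},
}

def combine(values, weights):
    """Signed sum of the table rows selected by `weights`."""
    return sum(w * values[row] for row, w in weights.items())

results = {
    name: (combine(sums, weights), combine(terms, weights))
    for name, weights in coefficients.items()
}

for name, (squared_deviation, dof) in results.items():
    print(f"{name}: {squared_deviation:.3f} with {dof} degrees of freedom")
# Total: 136.400 with 14 degrees of freedom
# Environment: 14.668 with 1 degrees of freedom
# Fertiliser: 20.800 with 2 degrees of freedom
# Fertiliser x Environment: 16.099 with 2 degrees of freedom
# Residual: 84.833 with 9 degrees of freedom
```

As a consistency check, the environment, fertiliser, interaction and residual terms add up to the total, both for the squared deviations (136.4) and for the degrees of freedom (14).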

This article uses material from the Wikipedia article Two-way_analysis_of_variance, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.

[1] Yates, Frank (March 1934). "The analysis of multiple classifications with unequal numbers in the different classes". Journal of the American Statistical Association. 29 (185): 51–66. doi:10.1080/01621459.1934.10502686. JSTOR 2278459.

[2] Fujikoshi, Yasunori (1993). "Two-way ANOVA models with unbalanced data". Discrete Mathematics. 116 (1): 315–334. doi:10.1016/0012-365X(93)90410-U.

[3] Gelman, Andrew (February 2005). "Analysis of variance? why it is more important than ever". The Annals of Statistics. 33 (1): 1–53. arXiv:math/0504499. doi:10.1214/009053604000001048. S2CID 125025956.

[4] Kass, Robert E (1 February 2011). "Statistical inference: The big picture". Statistical Science. 26 (1): 1–9. arXiv:1106.2895. doi:10.1214/10-sts337. PMC 3153074. PMID 21841892.

[5] Gelman, Andrew; Hill, Jennifer (18 December 2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. pp. 45–46. ISBN 978-0521867061.

[6] Yi-An Ko; et al. (September 2013). "Novel Likelihood Ratio Tests for Screening Gene-Gene and Gene-Environment Interactions with Unbalanced Repeated-Measures Data". Genetic Epidemiology. 37 (6): 581–591. doi:10.1002/gepi.21744. PMC 4009698. PMID 23798480.

	Extra CO₂	Extra humidity
No fertiliser	7, 2, 1	7, 6
Nitrate	11, 6	10, 7, 3
Phosphate	5, 3, 4	11, 4
