Goodness of fit

From Wikipedia, the free encyclopedia

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.

Fit of distributions

In assessing whether a given distribution is suited to a data-set, the following tests and their underlying measures of fit can be used:

  • Anderson–Darling test
  • Shapiro–Wilk test
  • Chi-square test
  • Akaike information criterion
  • Hosmer–Lemeshow test
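
A minimal sketch of how several of the tests listed above can be applied in Python, assuming NumPy and SciPy are available; the sample x and the choice of a normal reference distribution are hypothetical.

    # Hypothetical sample; any 1-D array of observations can be substituted.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=200)

    # Shapiro–Wilk test of normality.
    sw_stat, sw_p = stats.shapiro(x)

    # Anderson–Darling test against a normal distribution;
    # critical_values[2] corresponds to the 5% significance level.
    ad = stats.anderson(x, dist="norm")

    # Kolmogorov–Smirnov test against a normal distribution whose parameters
    # are estimated from the data (which makes the nominal p-value approximate).
    ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

    print(f"Shapiro-Wilk:       W = {sw_stat:.3f}, p = {sw_p:.3f}")
    print(f"Anderson-Darling:   A2 = {ad.statistic:.3f}, 5% critical value = {ad.critical_values[2]:.3f}")
    print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")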

Regression analysis

In regression analysis, the following topics relate to goodness of fit:

  • Coefficient of determination (the R-squared measure of goodness of fit)
  • Lack-of-fit sum of squares
  • Reduced chi-square
  • Regression validation
  • Mallows's Cp criterion
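
As a minimal illustration of the first item, the sketch below (assuming NumPy; the data points are hypothetical) fits a least-squares line and computes the coefficient of determination as R² = 1 − SS_res / SS_tot.

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

    slope, intercept = np.polyfit(x, y, deg=1)   # least-squares straight line
    y_hat = slope * x + intercept                # fitted values

    ss_res = np.sum((y - y_hat) ** 2)            # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)         # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot

    print(f"R-squared = {r_squared:.4f}")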

Categorical data

The following are examples that arise in the context of categorical data.

Pearson's chi-square test

Pearson's chi-square test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:

    \chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}

where:

O_i = an observed count for bin i
E_i = an expected count for bin i, asserted by the null hypothesis.

The expected frequency is calculated by:

    E_i = N \left( F(Y_u) - F(Y_l) \right)

where:

F = the cumulative distribution function for the probability distribution being tested.
Y_u = the upper limit for class i,
Y_l = the lower limit for class i, and
N = the sample size

The resulting value can be compared with a chi-square distribution to determine the goodness of fit. The chi-square distribution has (k − c) degrees of freedom, where k is the number of non-empty cells and c is the number of estimated parameters (including location, scale, and shape parameters) for the distribution plus one. For example, for a 3-parameter Weibull distribution, c = 4.
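
The procedure just described can be sketched in Python, assuming NumPy and SciPy; the sample, the ten equal-width classes, and the fitted normal model are all hypothetical choices. Expected counts come from the fitted cumulative distribution function, E_i = N (F(Y_u) − F(Y_l)), and the statistic is referred to a chi-square distribution with k − c degrees of freedom.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    sample = rng.normal(loc=10.0, scale=3.0, size=500)   # hypothetical data
    N = sample.size

    # Fit a normal model; two estimated parameters (location and scale).
    mu, sigma = sample.mean(), sample.std()

    # Bin the data into 10 classes; Y_l and Y_u are the class limits.
    edges = np.linspace(sample.min(), sample.max(), 11)
    observed, _ = np.histogram(sample, bins=edges)

    # Expected count per class from the fitted CDF: E_i = N * (F(Y_u) - F(Y_l)).
    cdf_at_edges = stats.norm.cdf(edges, loc=mu, scale=sigma)
    expected = N * np.diff(cdf_at_edges)

    # Pearson statistic over the non-empty cells.
    mask = observed > 0
    chi2_stat = np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask])

    k = int(mask.sum())     # number of non-empty cells
    c = 2 + 1               # estimated parameters (mu, sigma) plus one
    dof = k - c
    p_value = stats.chi2.sf(chi2_stat, dof)
    print(f"chi-square = {chi2_stat:.2f}, dof = {dof}, p = {p_value:.3f}")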

Example: equal frequencies of men and women

For example, to test the hypothesis that a random sample of 100 people has been drawn from a population in which men and women are equal in frequency, the observed number of men and women would be compared to the theoretical frequencies of 50 men and 50 women. If there were 44 men in the sample and 56 women, then

    \chi^2 = \frac{(44 - 50)^2}{50} + \frac{(56 - 50)^2}{50} = 1.44

If the null hypothesis is true (i.e., men and women are chosen with equal probability in the sample), the test statistic will be drawn from a chi-square distribution with one degree of freedom. Though one might expect two degrees of freedom (one each for the men and women), we must take into account that the total number of men and women is constrained (100), and thus there is only one degree of freedom (2 − 1). In other words, if the male count is known the female count is determined, and vice versa.

Consultation of the chi-square distribution for 1 degree of freedom shows that the probability of observing this difference (or a more extreme difference than this) if men and women are equally numerous in the population is approximately 0.23. This probability is higher than conventional criteria for statistical significance (.001-.05), so normally we would not reject the null hypothesis that the number of men in the population is the same as the number of women (i.e. we would consider our sample within the range of what we'd expect for a 50/50 male/female ratio.)
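
This example can be checked numerically with SciPy, whose chisquare function reproduces the statistic of 1.44 and the probability of approximately 0.23 quoted above.

    from scipy import stats

    # 44 observed men and 56 observed women against expected counts of 50 and 50.
    statistic, p_value = stats.chisquare(f_obs=[44, 56], f_exp=[50, 50])
    print(statistic, p_value)   # 1.44 and roughly 0.23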

Note the assumption that the mechanism that has generated the sample is random, in the sense of independent random selection with the same probability, here 0.5 for both males and females. If, for example, each of the 44 males selected brought a male buddy, and each of the 56 females brought a female buddy, each (O_i − E_i)^2 will increase by a factor of 4, while each E_i will increase by a factor of 2. The value of the statistic will double to 2.88. Knowing this underlying mechanism, we should of course be counting pairs. In general, the mechanism, if not defensibly random, will not be known. The distribution to which the test statistic should be referred may, accordingly, be very different from chi-square.[4]
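
A small numerical illustration of this caveat, assuming SciPy: treating the 200 individuals as independent observations doubles the statistic to 2.88, whereas counting the 100 independent pairs gives back the original 1.44.

    from scipy import stats

    # Individuals counted as if independent: every count and expectation doubles.
    individuals = stats.chisquare(f_obs=[88, 112], f_exp=[100, 100])
    print(individuals.statistic)   # 2.88

    # Counting pairs instead (one observation per independent selection).
    pairs = stats.chisquare(f_obs=[44, 56], f_exp=[50, 50])
    print(pairs.statistic)         # 1.44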

Binomial case

A binomial experiment is a sequence of independent trials in which the trials can result in one of two outcomes, success or failure. There are n trials each with probability of success, denoted by p. Provided that np_i ≫ 1 for every i (where i = 1, 2, ..., k), then

    \chi^2 = \sum_{i=1}^{k} \frac{(N_i - np_i)^2}{np_i} = \sum_{\mathrm{all\ cells}} \frac{(O - E)^2}{E}.

This has approximately a chi-square distribution with k − 1 degrees of freedom. The fact that there are k − 1 degrees of freedom is a consequence of the restriction \sum_i N_i = n. We know there are k observed cell counts; however, once any k − 1 are known, the remaining one is uniquely determined. Basically, one can say there are only k − 1 freely determined cell counts, and thus k − 1 degrees of freedom.
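
A minimal sketch of this calculation in Python, assuming NumPy and SciPy, with hypothetical cell counts and probabilities:

    import numpy as np
    from scipy import stats

    n = 120
    p = np.array([0.25, 0.25, 0.25, 0.25])   # hypothetical cell probabilities (k = 4)
    observed = np.array([28, 35, 25, 32])     # hypothetical counts; they sum to n

    expected = n * p
    chi2_stat = np.sum((observed - expected) ** 2 / expected)

    dof = len(p) - 1                          # k - 1, because the counts sum to n
    p_value = stats.chi2.sf(chi2_stat, dof)
    print(f"chi-square = {chi2_stat:.3f}, dof = {dof}, p = {p_value:.3f}")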

G-test

G-tests are likelihood-ratio tests of statistical significance that are increasingly being used in situations where Pearson's chi-square tests were previously recommended.[5]

The general formula for G is

    G = 2 \sum_{i} O_i \ln\left(\frac{O_i}{E_i}\right),

where O_i and E_i are the same as for the chi-square test, \ln denotes the natural logarithm, and the sum is taken over all non-empty cells. Furthermore, the total observed count should be equal to the total expected count:

    \sum_i O_i = \sum_i E_i = N,

where N is the total number of observations.
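
As a sketch, G can be computed directly from the formula above or through SciPy's power_divergence function with lambda_="log-likelihood"; the counts reuse the hypothetical men/women example (44 and 56 observed against 50 and 50 expected).

    import numpy as np
    from scipy import stats

    observed = np.array([44, 56])
    expected = np.array([50, 50])

    # Direct computation: G = 2 * sum(O_i * ln(O_i / E_i)).
    g_direct = 2.0 * np.sum(observed * np.log(observed / expected))

    # Equivalent likelihood-ratio form in SciPy; like Pearson's statistic,
    # G is referred to a chi-square distribution (here with 1 degree of freedom).
    g_scipy, p_value = stats.power_divergence(f_obs=observed, f_exp=expected,
                                              lambda_="log-likelihood")
    print(f"G = {g_direct:.4f} (SciPy: {g_scipy:.4f}), p = {p_value:.4f}")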

G-tests have been recommended at least since the 1981 edition of the popular statistics textbook by Robert R. Sokal and F. James Rohlf.[6]

See also

References

  1. Liu, Qiang; Lee, Jason; Jordan, Michael (20 June 2016). "A Kernelized Stein Discrepancy for Goodness-of-fit Tests". Proceedings of the 33rd International Conference on Machine Learning. The 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 276–284.
  2. Chwialkowski, Kacper; Strathmann, Heiko; Gretton, Arthur (20 June 2016). "A Kernel Test of Goodness of Fit". Proceedings of the 33rd International Conference on Machine Learning. The 33rd International Conference on Machine Learning. New York, New York, USA: Proceedings of Machine Learning Research. pp. 2606–2615.
  3. Zhang, Jin (2002). "Powerful goodness-of-fit tests based on the likelihood ratio" (PDF). J. R. Stat. Soc. B. 64: 281–294. Retrieved 5 November 2018.
  4. Maindonald, J. H.; Braun, W. J. (2010). Data Analysis and Graphics Using R. An Example-Based Approach (Third ed.). New York: Cambridge University Press. pp. 116–118. ISBN 978-0-521-76293-9.
  5. McDonald, J. H. (2014). "G–test of goodness-of-fit". Handbook of Biological Statistics (Third ed.). Baltimore, Maryland: Sparky House Publishing. pp. 53–58.
  6. Sokal, R. R.; Rohlf, F. J. (1981). Biometry: The Principles and Practice of Statistics in Biological Research (Second ed.). W. H. Freeman. ISBN 0-7167-2411-1.

Further reading

  • Huber-Carol, C.; Balakrishnan, N.; Nikulin, M. S.; Mesbah, M., eds. (2002), Goodness-of-Fit Tests and Model Validity, Springer
  • Ingster, Yu. I.; Suslina, I. A. (2003), Nonparametric Goodness-of-Fit Testing Under Gaussian Models, Springer
  • Rayner, J. C. W.; Thas, O.; Best, D. J. (2009), Smooth Tests of Goodness of Fit (2nd ed.), Wiley
  • Vexler, Albert; Gurevich, Gregory (2010), "Empirical likelihood ratios applied to goodness-of-fit tests based on sample entropy", Computational Statistics & Data Analysis, 54: 531–545, doi:10.1016/j.csda.2009.09.025