False positives and false negatives

From Wikipedia, the free encyclopedia

A false positive is an error in binary classification in which a test result incorrectly indicates the presence of a condition (such as a disease when the disease is not present), while a false negative is the opposite error, where the test result incorrectly indicates the absence of a condition when it is actually present. These are the two kinds of errors in a binary test, in contrast to the two kinds of correct result (a true positive and a true negative). They are also known in medicine as a false positive (or false negative) diagnosis, and in statistical classification as a false positive (or false negative) error.[1]
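
As an informal illustration (not drawn from the article's sources), the four outcomes can be tallied directly from paired ground-truth and test labels. The function and data below are hypothetical, a minimal Python sketch rather than a standard implementation.

    # Minimal sketch: tally the four outcomes from paired labels.
    # "actual" and "predicted" are invented example data.
    def tally_outcomes(actual, predicted):
        tp = sum(1 for a, p in zip(actual, predicted) if a and p)          # true positives
        tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)  # true negatives
        fp = sum(1 for a, p in zip(actual, predicted) if not a and p)      # false positives
        fn = sum(1 for a, p in zip(actual, predicted) if a and not p)      # false negatives
        return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

    actual    = [True, True, False, False, True, False]   # condition truly present?
    predicted = [True, False, True, False, True, False]   # test reports present?
    print(tally_outcomes(actual, predicted))               # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}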

In statistical hypothesis testing, the analogous concepts are known as type I and type II errors, where a positive result corresponds to rejecting the null hypothesis, and a negative result corresponds to not rejecting the null hypothesis. The terms are often used interchangeably, but there are differences in detail and interpretation due to the differences between medical testing and statistical hypothesis testing.

False positive error

A false positive error, or false positive, is a result that indicates a given condition exists when it does not. Examples include a pregnancy test that indicates a woman is pregnant when she is not, or the conviction of an innocent person.[2]

A false positive error is a type I error where the test is checking a single condition, and wrongly gives an affirmative (positive) decision. However, it is important to distinguish between the type I error rate and the probability of a positive result being false. The latter is known as the false positive risk (see Ambiguity in the definition of false positive rate, below).[3]
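
The difference matters because the two probabilities condition on different events. The following sketch works through Bayes' rule with invented figures (prevalence, sensitivity, and specificity chosen purely for illustration): even with a type I error rate of 5%, most positive results can be false when the condition is rare.

    # Sketch of type I error rate vs. false positive risk (illustrative numbers only).
    prevalence  = 0.01   # P(condition present) -- assumed for illustration
    sensitivity = 0.95   # P(test positive | condition present)
    specificity = 0.95   # P(test negative | condition absent); type I error rate = 0.05

    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    false_positive_risk = (1 - specificity) * (1 - prevalence) / p_positive
    print(round(false_positive_risk, 3))  # ~0.839: here, most positive results are false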

False negative error

A false negative error, or false negative, is a test result which wrongly indicates that a condition does not hold. For example, when a pregnancy test indicates a woman is not pregnant, but she is, or when a person guilty of a crime is acquitted, these are false negatives. The condition "the woman is pregnant", or "the person is guilty" holds, but the test (the pregnancy test or the trial in a court of law) fails to realize this condition, and wrongly decides that the person is not pregnant or not guilty.[citation needed]

A false negative error is a type II error occurring in a test where a single condition is checked for, and the test erroneously reports that the condition is absent.[4]

Related terms

False positive and false negative rates
Main articles: Sensitivity and specificity and False positive rate

The false positive rate (FPR) is the proportion of all negatives that still yield positive test outcomes, i.e., the conditional probability of a positive test result given that the condition being tested for is absent.[citation needed]

The false positive rate is equal to the significance level. The specificity of the test is equal to 1 minus the false positive rate.

In statistical hypothesis testing, this fraction is given the Greek letter α, and 1 − α is defined as the specificity of the test. Increasing the specificity of the test lowers the probability of type I errors, but may raise the probability of type II errors (false negatives that reject the alternative hypothesis when it is true).[a]

Complementarily, the false negative rate (FNR) is the proportion of positives which yield negative test outcomes with the test, i.e., the conditional probability of a negative test result given that the condition being looked for is present.

In statistical hypothesis testing, this fraction is given the letter β. The "power" (or the "sensitivity") of the test is equal to 1 − β.
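
The relationships among these rates follow directly from the four outcome counts. Here is a brief sketch with arbitrary illustrative counts, showing that the false positive rate equals 1 minus specificity and the false negative rate equals 1 minus sensitivity (power).

    # Sketch relating the four outcome counts to the rates defined above.
    # The counts are arbitrary illustrative values.
    tp, fp, tn, fn = 90, 30, 970, 10

    fpr = fp / (fp + tn)           # false positive rate (alpha)
    fnr = fn / (fn + tp)           # false negative rate (beta)
    specificity = tn / (tn + fp)   # equals 1 - fpr
    sensitivity = tp / (tp + fn)   # power, equals 1 - fnr

    print(fpr, specificity)   # 0.03 0.97
    print(fnr, sensitivity)   # 0.1 0.9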

Ambiguity in the definition of false positive rate

The term false discovery rate (FDR) was used by Colquhoun (2014)[5] to mean the probability that a "significant" result was a false positive. Later Colquhoun (2017)[3] used the term false positive risk (FPR) for the same quantity, to avoid confusion with the term FDR as used by people who work on multiple comparisons. Corrections for multiple comparisons aim only to correct the type I error rate, so the result is a (corrected) p-value. Thus they are susceptible to the same misinterpretation as any other p-value. The false positive risk is always higher, often much higher, than the p-value.[5][3]

Confusion of these two ideas, the error of the transposed conditional, has caused much mischief.[6] Because of the ambiguity of notation in this field, it is essential to look at the definition in every paper. The hazards of relying on p-values were emphasized in Colquhoun (2017)[3], which pointed out that even an observation of p = 0.001 is not necessarily strong evidence against the null hypothesis. Although the likelihood ratio in favor of the alternative hypothesis over the null is close to 100, if the hypothesis is implausible, with a prior probability of a real effect of 0.1, even the observation of p = 0.001 would carry a false positive risk of 8 percent. It would not even reach the 5 percent level. As a consequence, it has been recommended[3][7] that every p-value should be accompanied by the prior probability of a real effect that would have to be assumed in order to achieve a false positive risk of 5%. For example, if we observe p = 0.05 in a single experiment, we would have to be 87% certain that there was a real effect before the experiment was done to achieve a false positive risk of 5%.
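
The 8 percent figure can be checked with a short odds calculation. The sketch below uses only the rounded numbers quoted in this paragraph (a likelihood ratio of roughly 100 and a prior probability of a real effect of 0.1), so it reproduces the quoted value only approximately.

    # Sketch reproducing the ~8% false positive risk quoted above,
    # using the rounded figures given in the text (LR ~ 100, prior = 0.1).
    likelihood_ratio = 100    # evidence in favor of the alternative over the null
    prior_real_effect = 0.1   # prior probability that the effect is real

    prior_odds_null = (1 - prior_real_effect) / prior_real_effect        # 9 to 1
    false_positive_risk = prior_odds_null / (prior_odds_null + likelihood_ratio)
    print(round(false_positive_risk, 3))  # 0.083, i.e. about 8 percent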

Receiver operating characteristic

The article "Receiver operating characteristic" discusses parameters in statistical signal processing based on ratios of errors of various types.

See also

  * Base rate fallacy
  * False positive rate
  * Positive and negative predictive values

Notes

  1. ^ When developing detection algorithms or tests, a balance must be chosen between the risks of false negatives and false positives. Usually there is a threshold for how close a match to a given sample must be before the algorithm reports a match. The higher this threshold, the more false negatives and the fewer false positives.

References

  1. ^ "False Positives and False Negatives". mathsisfun.com. http://www.mathsisfun.com/data/probability-false-negatives-positives.html
  2. ^ Robinson, A; Keller, LR; del Campo, C (2022). "Building insights on true positives vs. false positives: Bayes' rule". Decision Sciences Journal of Innovative Education. 20 (4): 224–234. doi:10.1111/dsji.12265.
  3. ^ a b c d e Colquhoun, David (2017). "The reproducibility of research and the misinterpretation of p-values". Royal Society Open Science. 4 (12): 171085. doi:10.1098/rsos.171085. PMC 5750014. PMID 29308247.
  4. ^ Banerjee, A; Chitnis, UB; Jadhav, SL; Bhawalkar, JS; Chaudhury, S (2009). "Hypothesis testing, type I and type II errors". Ind Psychiatry J. 18 (2): 127–31. doi:10.4103/0972-6748.62274. PMC 2996198. PMID 21180491.
  5. ^ a b Colquhoun, David (2014). "An investigation of the false discovery rate and the misinterpretation of p-values". Royal Society Open Science. 1 (3): 140216. arXiv:1407.5296. Bibcode:2014RSOS....140216C. doi:10.1098/rsos.140216. PMC 4448847. PMID 26064558.
  6. ^ Colquhoun, David. "The problem with p-values". Aeon. Aeon Magazine. Retrieved 11 December 2016.
  7. ^ Colquhoun, David (2018). "The false positive risk: A proposal concerning what to do about p values". The American Statistician. 73: 192–201. arXiv:1802.04888. doi:10.1080/00031305.2018.1529622. S2CID 85530643.