Psychological Bulletin
2011, Vol. 137, No. 4, 708–712
© 2011 American Psychological Association
0033-2909/11/$12.00 DOI: 10.1037/a0023327
COMMENT
A Misleading Review of Response Bias:
Comment on McGrath, Mitchell, Kim, and Hough (2010)
Martin L. Rohling, University of South Alabama
Glenn J. Larrabee, Sarasota, Florida
Manfred F. Greiffenstein, Royal Oak, Michigan
Yossef S. Ben-Porath, Kent State University
Paul Lees-Haley, Huntsville, Alabama
Paul Green, Neurobehavioural Associates, Edmonton, Alberta, Canada
Kevin W. Greve, University of New Orleans
In the May 2010 issue of Psychological Bulletin, R. E. McGrath, M. Mitchell, B. H. Kim, and L. Hough
published an article entitled “Evidence for Response Bias as a Source of Error Variance in Applied
Assessment” (pp. 450–470). They argued that response bias indicators used in a variety of settings typically
have insufficient data to support such use in everyday clinical practice. Furthermore, they claimed that despite
100 years of research into the use of response bias indicators, “a sufficient justification for [their] use . . . in
applied settings remains elusive” (p. 450). We disagree with McGrath et al.’s conclusions. In fact, we assert
that the relevant and voluminous literature that has addressed the issues of response bias substantiates validity
of these indicators. In addition, we believe that response bias measures should be used in clinical and research
settings on a regular basis. Finally, the empirical evidence for the use of response bias measures is strongest
in clinical neuropsychology. We argue that McGrath et al.’s erroneous perspective on response bias measures
is a result of 3 errors in their research methodology: (a) inclusion criteria for relevant studies that are too
narrow; (b) errors in interpreting results of the empirical research they did include; (c) evidence of a
confirmatory bias in selectively citing the literature, as evidence of moderation appears to have been
overlooked. Finally, McGrath et al.’s acknowledgment of experts in the field who might otherwise have highlighted these errors prior to publication may have muted critiques during the review process.
Keywords: response bias, suppressor variables, moderating variables, employee selection, disability
evaluation
A critical scientific review, a format in which Psychological
Bulletin excels, requires authors to address fundamental issues in
the scientific literature in a comprehensive and accurate manner.
This process entails consideration of a complete literature to avoid
engaging in confirmatory bias. Conclusions and generalizations
should be clearly supported, particularly those involving rejection
of alternative hypotheses. McGrath, Mitchell, Kim, and Hough’s
(2010) paper on the validity of response bias indicators (symptom
validity tests, or SVTs) deviated from sound review practices,
made erroneous characterizations of typical practice, and advanced
sweeping but unfounded conclusions.
Using examples from the literature on SVTs in neuropsychological settings as illustrations, we challenge the methods, literature sampling, logic, major assertions, and tacit assumptions of
McGrath et al. (2010).¹
Martin L. Rohling, Department of Psychology, University of South
Alabama; Glenn J. Larrabee, Independent Practice, Sarasota, Florida; Manfred F. Greiffenstein, Independent Practice, Royal Oak, Michigan; Yossef
S. Ben-Porath, Department of Psychology, Kent State University; Paul
Lees-Haley, Independent Practice, Huntsville, Alabama; Paul Green, Neurobehavioural Associates, Edmonton, Alberta, Canada; Kevin W. Greve,
Department of Psychology, University of New Orleans.
Correspondence concerning this article should be addressed to Martin L.
Rohling, Department of Psychology, 331 Life Sciences Building, University of
South Alabama, Mobile, AL 36688-0002. E-mail: mrohling@usouthal.edu
¹ All of the authors of the present commentary have conducted empirical
research that has been published in peer-reviewed articles on symptom
validity tests (SVTs) in applied neuropsychological settings. Furthermore,
most of us were part of the American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment
of effort, response bias, and malingering (Heilbronner, Sweet, Morgan,
Larrabee, & Millis, 2009). As such, we have firsthand experience in
research on SVTs as well as in clinical application of these techniques.
Their conclusions may be applicable to a
small sample of the psychological assessment world, specifically,
to measurement of social desirability in industrial–organizational
settings. However, McGrath et al. ignored a substantial portion of
the relevant research on negative response bias that is contrary to
their conclusions. Peer-reviewed empirical evidence strongly supports the inclusion of bias indicators in psychological assessment,
particularly in the context of forensic and disability evaluation.
A Summary of McGrath et al.’s (2010) Position
McGrath et al. (2010) reviewed and quantitatively analyzed
selected literature to inform readers on a major issue: Are
reporting bias indicators (validity, response style) adequately
validated as measures of invalid responding? Are they valid for
the purpose intended? This is an important question because a
conclusion of inaccurate responding has many ramifications for
the assessed individual. McGrath et al. defined two types of
response bias: positive impression management (i.e., exaggerating uncommon virtues or denying flaws) and negative impression management (i.e., exaggerating or faking impairment).
They also recognized the distinction between measures of biased self-report, when responding to questions on a personality
test, and measures of test-taking effort, when cognitive and
perceptual–motor performance is at issue. We do not dispute
these definitions.
McGrath et al. (2010) asserted that valid bias indicators
should have two proven properties: (a) Response bias suppresses or moderates the criterion-related validity of substantive
psychological indicators, and (b) bias indicators are capable of
detecting response bias. They proposed a standard for examining the validity of validity indicators, a testable hypothesis
termed the response bias hypothesis (RBH): “A valid bias
indicator should be able to enhance [emphasis added] the predictive accuracy of a valid substantive indicator” (p. 452). In
other words, to be proven valid, a response bias measure should
augment or improve prediction of an external criterion (e.g.,
diagnosis, cognitive level, or real-world behaviors) when biased
respondents are removed from a sample.
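To illustrate the logic of the RBH, the following minimal sketch in Python uses simulated data; all variable names and parameter values are hypothetical and are not taken from McGrath et al. If a bias indicator is valid, screening out flagged respondents should strengthen the observed test–criterion correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated assessment data: a substantive test score predicts a
# criterion, but a subset of biased responders distorts their scores,
# attenuating the observed test-criterion relationship.
n = 500
criterion = rng.normal(size=n)
test_score = 0.6 * criterion + rng.normal(scale=0.8, size=n)
biased = rng.random(n) < 0.25  # flagged by a (here, perfect) bias indicator
test_score[biased] += rng.normal(loc=-2.0, scale=1.0, size=biased.sum())

r_full = np.corrcoef(test_score, criterion)[0, 1]
r_screened = np.corrcoef(test_score[~biased], criterion[~biased])[0, 1]
print(f"full sample r = {r_full:.2f}, after screening r = {r_screened:.2f}")
```

Under the RBH, the screened correlation should exceed the full-sample correlation, which is the pattern this simulation produces.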
McGrath et al. (2010) excluded from their analyses many
SVT findings that contained variables sufficient to test the
RBH. Four thousand studies were identified, yet only 40 of
these purportedly met their inclusion criteria, and only two of
those studies examined negative response bias in neuropsychology (p. 462). (There are 40 studies identified in the references as contributing to the analyses, not 41 as stated in the abstract.) After many analyses, McGrath et al. could not find
any published study that showed the expected attenuation of
predictor–criterion correlations when bias indicators were significant. As a result, they concluded that in any assessment context, the support for the use of both types of response bias indicators was weak. They further concluded that in other settings, including neuropsychological assessment, the effects reported in the literature were too small and too unreliable to recommend that clinicians use these indicators on a day-to-day basis.
McGrath et al.’s (2010) assertions are incorrect for three reasons: (a) the literature review was too narrowly focused on social
desirability, (b) McGrath et al. made errors in interpretation of the
literature they did include, and (c) they overgeneralized their
results. Contrary to their sweeping conclusion, the value of symptom validity testing is strongly supported by a comprehensive
literature review guided by a more accurate definition of a validity
scale’s validity.
Conceptual Problems: Narrow Definition
McGrath et al.’s (2010) best argument rests on their analysis of
positive bias indicators (i.e., “faking good,” social desirability,
nay-saying), which is the true focus of their paper. Our content
analysis showed 82% of the key studies were from either
industrial–organizational or individual differences domains, and
only 8% of them were from the forensic literature. Divided by type
of bias instrument, 85% dealt with positive bias but only 18% dealt
with negative bias (total greater than 100% because a few studies
included both). We conclude that McGrath et al. conducted insufficient sampling of the literature. For this reason alone, McGrath et
al. should not have made any generalizations, much less the
sweeping ones they did.
McGrath et al. (2010) also claimed that there was a lack of
evidence for the validity of the most commonly used measures of
response bias. This claim stands in sharp contrast to a large body
of literature that supports the use of such procedures (Boone, 2007;
Larrabee, 2007; Morgan & Sweet, 2009). Numerous articles have
been published supporting the validity of these measures, which
has led the major professional associations in clinical neuropsychology to recommend use of SVTs during all evaluations, particularly when the context of evaluation involves external incentives for poor performance (Bush et al., 2005; Heilbronner, Sweet,
Morgan, Larrabee, & Millis, 2009).
Obvious Errors in Interpreting Results
McGrath et al. (2010) described finding only two studies in the area of forensic neuropsychological assessment that used SVTs to evaluate cognitive malingering and met their criteria for inclusion in their review, “despite extensive searching” (p. 462). McGrath et al. cited Bowden, Shores, and Mathias
(2006) as failing to demonstrate significant interactions between an SVT, the Word Memory Test (WMT, which yields three different indicators), and posttraumatic amnesia (PTA) as a criterion of
injury severity. McGrath et al. also cited Rohling and Demakis
(2010) as failing to demonstrate such an interaction in their
reanalysis of Bowden et al. and the data presented by Green,
Rohling, Lees-Haley, and Allen (2001). But McGrath et al.
mischaracterized the analyses conducted by Bowden et al. and
by Rohling and Demakis. Neither study evaluated the prediction
of a criterion of injury severity (i.e., PTA), as stated by
McGrath et al. in text or as shown in their Table 4. Rather, both
Bowden et al. and Rohling and Demakis predicted a criterion of
neuropsychological test performance by the WMT, head injury
severity, and the interaction of WMT and head trauma severity.
Consequently, the data reported by Bowden et al. and by
Rohling and Demakis do not address the RBH/moderator effect
as defined by McGrath et al.
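For concreteness, the analysis structure that Bowden et al. and Rohling and Demakis actually ran, a moderated regression with neuropsychological performance (not injury severity) as the criterion, can be sketched as follows. This is an illustrative Python sketch with simulated data; the variable names and coefficients are hypothetical, not the authors’ actual data or code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per patient.
rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "wmt": rng.normal(size=n),       # Word Memory Test (SVT) score
    "severity": rng.normal(size=n),  # head injury severity index
})
# The criterion is neuropsychological test performance, not injury severity.
df["neuro_score"] = 0.5 * df["wmt"] + 0.3 * df["severity"] + rng.normal(size=n)

# The wmt:severity product term carries the moderation test.
model = smf.ols("neuro_score ~ wmt * severity", data=df).fit()
print(model.params)  # the 'wmt:severity' coefficient tests the interaction
```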
Furthermore, McGrath et al. (2010) did not consider the arguments made by Rohling and Demakis (2010) against looking for the presence of interactions between SVTs and injury severity for
prediction of neuropsychological test scores as primary support
for the validity of an SVT. As Rohling and Demakis demonstrated,
using the data sets of Bowden et al. (2006) and Green et al. (2001),
neuropsychological test scores were associated with measures of
trauma severity, with key demographic factors (e.g., age and
education), and, to the largest degree, with scores from the WMT.
However, scores on the WMT were associated only with performance on the substantive indicator and not with indices of injury
severity (i.e., PTA and Glasgow Coma Scale) or demographic
measures that have clearly been shown to affect cognitive ability
(e.g., age and education). Rohling and Demakis noted that the
assumption an SVT must show an interaction between level of
performance on the SVT and trauma severity (with those mildly
injured performing worse than those severely injured) is not appropriate for two reasons. First, it is assumed that more severely
injured persons are unlikely to malinger but that most mildly
injured persons do malinger. Second, it is assumed that persons
with a mild traumatic brain injury (TBI) who malinger on cognitive ability tests will perform worse than more severely injured
persons will. Neither of these assumptions is valid. Only in special circumstances will an interaction appear: when using a group of severe TBI patients (even at the less severe end of that range), none of whom malinger, and a mildly injured group, the majority of whom show gross malingering.
Thus, the Bowden et al. (2006) and Rohling and Demakis (2010) investigations demonstrated that the WMT functions in a different manner than a neuropsychological test of verbal memory, as originally characterized by Bowden et al., or than any other type
of neuropsychological test. In particular, the WMT does not show
predictive relationships with loss of consciousness, PTA, age, or
education, as do most tests of neuropsychological ability. This lack
of significant correlation provides indirect evidence that the test
measures response validity. This point does not address McGrath
et al.’s (2010) definition of a moderating effect supportive of the
RBH. However, such support is provided by Green et al. (2001),
cited by McGrath et al., which was the impetus for the Bowden et
al. investigation, as we discuss in the next section.
Confirmatory Bias: Evidence for Moderation
Overlooked
McGrath et al. (2010) incorrectly denied the existence of any study proving that a bias indicator moderates accepted test–criterion associations. Many empirical studies published in peer-reviewed journals were not included by McGrath et al., despite meeting their inclusion criteria. For example, Greiffenstein and
Baker (2003) calculated the association between school records
and current IQ in a large sample of litigants seeking compensation
for a remote mild traumatic head injury. Historically, grade point
average (GPA) and IQ show a range of correlations that center at
.50 (Kaufman & Lichtenberger, 2006). In litigants who failed a
cognitive response bias measure (Reliable Digit Span [RDS],
which measures exaggerated inattention), the Full Scale IQ–GPA
correlation was .310 (p = .102), which is below the historical range. By contrast, in litigants who passed the response bias measure, GPA correlated with Full Scale IQ at a value of .551 (p < .0001), which is consistent with the historical range, although this difference in correlation coefficients was slightly smaller than is needed to reach the traditional level of statistical significance (z = 1.21, p = .113). However, the discrepancy was more pronounced with Verbal IQ, which correlated with GPA at r = .323 (p = .087) in the group failing the RDS but at r = .646 (p < .0001) in the group passing the RDS. This difference in correlation coefficients was statistically significant (z = 1.75, p = .04).
Clearly, in the case of the RDS (Greiffenstein, Baker, & Gola,
1994), response bias attenuated the correlation between criterion
and test.
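The z tests reported above compare independent correlations via Fisher’s r-to-z transformation. A minimal Python sketch of that computation follows; because the pass/fail subsample sizes are not reported in the text above, the group ns in the example call are hypothetical placeholders, not the published values:

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for the difference between two independent
    correlations; |z| > 1.96 indicates a two-tailed difference at p < .05."""
    z1, z2 = math.atanh(r1), math.atanh(r2)      # Fisher transforms of r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SE of the difference
    return (z1 - z2) / se

# Verbal IQ-GPA correlations from Greiffenstein and Baker (2003);
# n1 and n2 below are hypothetical, not the published subsample sizes.
print(f"z = {compare_correlations(0.646, 55, 0.323, 30):.2f}")
```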
Moderating effects of response bias indicators are also supported by the investigation of Green, Rohling, Iverson, and Gervais (2003), another paper not cited by McGrath et al. (2010).
Green et al. found that performance on a measure of olfactory
identification (Alberta Smell Test) was associated in dose-response fashion with four indicators of brain injury severity (e.g.,
Glasgow Coma Scale level on admission). It is important to note,
in the light of the RBH, that this relationship existed only for
patients who passed response bias measures such as the Word
Memory Test and Computerized Assessment of Response Bias. In
the group that failed the response bias measures, the Alberta Smell
Test scores were not significantly related to any of the neurological
severity criteria.
Another paper omitted from McGrath et al.’s (2010) review that demonstrated a moderator effect of an SVT was the investigation by Gervais, Ben-Porath, Wygant, and Green (2008). Gervais et al. found that the Memory Complaints Inventory showed significant correlations with six different scores from the California Verbal Learning Test (CVLT), with correlations ranging from −.19 to −.26, all significant (p < .001), in a sample of 1,550 disability claimants. When subjects who failed an SVT were excluded, the correlation between the CVLT Total score and the Memory Complaints Inventory total score was not significant (r = −.07, p = .117, n = 513). However, the correlation was significant when the sample examined included only those who had failed an effort test (r = −.47, p < .0001, n = 347). Furthermore, the difference between these two correlations was highly significant (z = 6.31, p < .0001). These data were not reported by the original authors in their published manuscript but were computed for the group failing the SVT (n = 347) using simple algebraic substitution.
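Because the subgroup ns are reported here, the z = 6.31 value can be verified directly with the same Fisher r-to-z computation; a self-contained check:

```python
import math

# Contrast reported from Gervais et al. (2008): r = -.07 (n = 513, SVT
# pass) vs. r = -.47 (n = 347, SVT fail), compared via Fisher's r-to-z.
z = (math.atanh(-0.07) - math.atanh(-0.47)) / math.sqrt(1 / 510 + 1 / 344)
print(f"z = {z:.2f}")  # prints 6.31, matching the reported value
```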
Green (2007) also demonstrated a moderator effect of a response bias measure, the WMT, in a comparison of CVLT scores in brain-injured or neurological patients who had normal computed tomography (CT) or magnetic resonance imaging (MRI) scans and patients who had abnormal scans. The mean CVLT short- and long-delayed recall score was 9.3 (SD = 3.7) for the 321 subjects in the normal scan group, which did not differ significantly from the mean score of 8.9 (SD = 3.7) for the 314 subjects in the abnormal scan group. When those who failed the WMT were removed, the mean CVLT free-recall score for the 174 subjects with normal brain scans was 11.1 (SD = 3.1). This differed significantly (p < .001) from the mean of 9.9 (SD = 3.2) for the 220 subjects with abnormal brain scans.
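This pattern can be checked from the summary statistics quoted above with a two-sample t test computed from means, SDs, and ns. The sketch below assumes a Welch-type test, which may not be the exact procedure Green (2007) used; the inputs are the values reported in the preceding paragraph:

```python
import math

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic for two independent groups, from summary stats."""
    return (m1 - m2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Full sample (normal vs. abnormal scans): small, nonsignificant difference.
print(f"t = {welch_t(9.3, 3.7, 321, 8.9, 3.7, 314):.2f}")   # ~1.36
# After removing WMT failures, the group difference emerges (p < .001).
print(f"t = {welch_t(11.1, 3.1, 174, 9.9, 3.2, 220):.2f}")  # ~3.76
```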
Of note, the Green et al. (2001) paper that prompted the Bowden et al. (2006) and Rohling and Demakis (2010) investigations described above presented evidence of a moderator effect that was
missed by McGrath et al. (2010). Green et al. compared three
groups. The first included “TBI-neuro” patients who had PTAs
greater than or equal to 1 day and/or a Glasgow Coma Scale (GCS) score less than or equal
to 12. Of these TBI patients, 88% had substantiated cerebral
abnormality as evidenced by CT or MRI scans. Other patients
included in this group were non-TBI neurological (e.g., stroke,
aneurysm) patients who had known cerebral impairment, also
evidenced by CT or MRI scans. The second consisted of mild TBI
patients who had PTAs less than 1 day and no abnormalities
evidenced on CT or MRI scans. The third consisted of psychiatric
(e.g., depression or anxiety) patients who were combined with
patients who had other orthopedic injuries, chronic pain, or fibromyalgia. The TBI-neuro group did not differ significantly from the
other two groups on a composite neuropsychological measure, the
Overall Test Battery Mean, until those who had failed the WMT
were excluded from the analyses. This is clear evidence of a
moderating effect, which supports the validity of the WMT as a
measure of response bias.
Thus, we have identified four studies not reviewed by McGrath et al. (2010), all of which demonstrate that considering evidence of negative response bias improves prediction (Gervais et al., 2008; Green, 2007; Green et al., 2003; Greiffenstein & Baker, 2003). Moreover, we have demonstrated that McGrath et al.
missed a clear moderator effect in the Green et al. (2001) paper
they reviewed.
Peer Review Process Issues
All but two of the authors of this rebuttal (GJL and MFG) were
specifically acknowledged by McGrath et al. (2010) “for their
comments on drafts of this article and/or their help identifying
manuscripts for possible inclusion” (p. 450). Because we are all
widely published in the area of response bias, we are concerned
about the possibility that the “acknowledgment” may have implied
a sense of approval of the manuscript despite our unanimous
disagreement with McGrath et al.’s conclusions, notwithstanding
the disclaimer by McGrath et al. that the views of the paper were
“those of the authors and should not be taken as representing those
of our colleagues who provided input” (p. 450).
Closing Comments
In summary of our rebuttal, McGrath et al. (2010) reviewed only
a small part of the response bias literature, yet drew inappropriately sweeping conclusions because they commingled positive and
negative response bias indicators. The two forms of response bias
are associated with different examinee motivations and goals, and
negative response bias is rare in personnel selection. They also
overlooked many articles in the forensic and neuropsychology
literature that actually support the RBH.
McGrath et al.’s (2010) erroneous conclusions could have an
unfortunate impact by calling into question the sound use of bias
indicators in clinical and forensic practice and research. Indeed,
McGrath et al. were recently cited by Libon (2010) to defend the
practice of not employing SVTs in an investigation of persons with
complex but medically unexplained pain. Libon raised this defense
when criticized by Victor, Boone, and Kulick (2010) for failure to
consider motivational factors in a potentially compensable context.
Application of the recommendations of McGrath et al. could
hamper clinicians who evaluate many important questions, such as
prospects for malingering in murder defendants making insanity
claims or the validity of cognitive deficits in a minor head trauma
lawsuit involving millions of dollars. Worse, McGrath et al.’s
conclusions may encourage misdiagnosis. Patients who are misdiagnosed may go on to develop iatrogenic illnesses and might then
receive unnecessary, ineffective, and potentially deleterious treatments.
We have detailed why we disagree with the conclusions put
forth by McGrath et al. (2010). Response bias measures have
substantial evidence of validity, and we contend that they should
commonly be used in clinical assessments and in clinical research.
The empirical evidence for their use is particularly strong in the
area of clinical neuropsychology. Finally, we recommend that
researchers submitting manuscripts to scientific journals that use
peer review seek permission from individuals whom they wish to
acknowledge. Failure to do so may influence the peer-review
process in a manner that can diminish its effectiveness.
References
Boone, K. B. (Ed.). (2007). Assessment of feigned cognitive impairment: A
neuropsychological perspective. New York, NY: Guilford Press.
Bowden, S. C., Shores, E. A., & Mathias, J. L. (2006). Does effort suppress
cognition after brain injury? A re-examination of the evidence for the
Word Memory Test. Clinical Neuropsychologist, 20, 858–872. doi:
10.1080/13854040500246935
Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin,
N. H., . . . Silver, C. H. (2005). Symptom validity assessment: Practice
issues and medical necessity. Archives of Clinical Neuropsychology, 20,
419–426. doi:10.1016/j.acn.2005.02.002
Gervais, R. O., Ben-Porath, Y. S., Wygant, D. B., & Green, P. (2008).
Differential sensitivity of the Response Bias Scale (RBS) and MMPI–2
validity scales to memory complaints. Clinical Neuropsychologist, 22,
1061–1079. doi:10.1080/13854040701756930
Green, P. (2007). The pervasive influence of effort on neuropsychological
tests. Physical Medicine and Rehabilitation Clinics of North America,
18, 43–68. doi:10.1016/j.pmr.2006.11.002
Green, P., Rohling, M. L., Iverson, G. L., & Gervais, R. O. (2003).
Relationships between olfactory discrimination and head injury severity.
Brain Injury, 17, 479–496. doi:10.1080/0269905031000070242
Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M. (2001). Effort
has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045–1060. doi:10.1080/
02699050110088254
Greiffenstein, M. F., & Baker, W. J. (2003). Premorbid clues? Preinjury
scholastic performance and present neuropsychological functioning in
late postconcussion syndrome. Clinical Neuropsychologist, 17, 561–
573. doi:10.1076/clin.17.4.561.27937
Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of
malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224. doi:10.1037/1040-3590.6.3.218
Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis,
S. R. (2009). American Academy of Clinical Neuropsychology consensus conference statement on neuropsychological assessment of effort,
response bias, and malingering. Clinical Neuropsychologist, 23, 1093–
1129. doi:10.1080/13854040903155063
Kaufman, A. S., & Lichtenberger, E. O. (2006). Assessment of adolescent and adult intelligence (3rd ed.). Hoboken, NJ: Wiley.
Larrabee, G. J. (Ed.). (2007). Assessment of malingered neuropsychological deficits. New York, NY: Oxford University Press.
Libon, D. J. (2010). Neurobiological aspects of complex regional pain
syndrome (CRPS): Reply to Victor, Boone, and Kulick (2010). Journal
of the International Neuropsychological Society, 16, 1153–1154. doi:
10.1017/S1355617710001049
McGrath, R. E., Mitchell, M., Kim, B. H., & Hough, L. (2010). Evidence
for response bias as a source of error variance in applied assessment.
Psychological Bulletin, 136, 450–470. doi:10.1037/a0019216
Morgan, J. E., & Sweet, J. J. (2009). Neuropsychology of malingering
casebook. New York, NY: Psychology Press.
Rohling, M. L., & Demakis, G. J. (2010). Bowden, Shores, & Mathias
(2006): Failure to replicate or just failure to notice. Does effort still
account for more variance in neuropsychological test scores than TBI
severity? Clinical Neuropsychologist, 24, 119–136. doi:10.1080/
13854040903307243
Victor, T. L., Boone, K. B., & Kulick, A. D. (2010). My head hurts just
thinking about it. Journal of the International Neuropsychological Society, 16, 1151–1152. doi:10.1017/S1355617710000858
Received January 24, 2011
Revision received February 3, 2011
Accepted February 7, 2011