
Psychological Bulletin, 2011, Vol. 137, No. 4, 708–712
© 2011 American Psychological Association 0033-2909/11/$12.00 DOI: 10.1037/a0023327

COMMENT

A Misleading Review of Response Bias: Comment on McGrath, Mitchell, Kim, and Hough (2010)

Martin L. Rohling, University of South Alabama
Glenn J. Larrabee, Sarasota, Florida
Manfred F. Greiffenstein, Royal Oak, Michigan
Yossef S. Ben-Porath, Kent State University
Paul Lees-Haley, Huntsville, Alabama
Paul Green, Neurobehavioural Associates, Edmonton, Alberta, Canada
Kevin W. Greve, University of New Orleans

In the May 2010 issue of Psychological Bulletin, R. E. McGrath, M. Mitchell, B. H. Kim, and L. Hough published an article entitled "Evidence for Response Bias as a Source of Error Variance in Applied Assessment" (pp. 450–470). They argued that response bias indicators used in a variety of settings typically lack sufficient data to support their use in everyday clinical practice. Furthermore, they claimed that despite 100 years of research into the use of response bias indicators, "a sufficient justification for [their] use . . . in applied settings remains elusive" (p. 450). We disagree with McGrath et al.'s conclusions. In fact, we assert that the relevant and voluminous literature that has addressed the issues of response bias substantiates the validity of these indicators. In addition, we believe that response bias measures should be used in clinical and research settings on a regular basis. Finally, the empirical evidence for the use of response bias measures is strongest in clinical neuropsychology. We argue that McGrath et al.'s erroneous perspective on response bias measures is a result of three errors in their research methodology: (a) inclusion criteria for relevant studies that are too narrow, (b) errors in interpreting results of the empirical research they did include, and (c) evidence of a confirmatory bias in selectively citing the literature, as evidence of moderation appears to have been overlooked. We also note that their acknowledgment of experts in the field who might have highlighted these errors prior to publication may have forestalled such critiques during the review process.

Keywords: response bias, suppressor variables, moderating variables, employee selection, disability evaluation
Author note: Martin L. Rohling, Department of Psychology, University of South Alabama; Glenn J. Larrabee, Independent Practice, Sarasota, Florida; Manfred F. Greiffenstein, Independent Practice, Royal Oak, Michigan; Yossef S. Ben-Porath, Department of Psychology, Kent State University; Paul Lees-Haley, Independent Practice, Huntsville, Alabama; Paul Green, Neurobehavioural Associates, Edmonton, Alberta, Canada; Kevin W. Greve, Department of Psychology, University of New Orleans. Correspondence concerning this article should be addressed to Martin L. Rohling, Department of Psychology, 331 Life Sciences Building, University of South Alabama, Mobile, AL 36688-0002. E-mail: mrohling@usouthal.edu

¹ All of the authors of the present commentary have conducted empirical research that has been published in peer-reviewed articles on symptom validity tests (SVTs) in applied neuropsychological settings. Furthermore, most of us were part of the American Academy of Clinical Neuropsychology consensus conference statement on the neuropsychological assessment of effort, response bias, and malingering (Heilbronner, Sweet, Morgan, Larrabee, & Millis, 2009). As such, we have firsthand experience in research on SVTs as well as in clinical application of these techniques.

A critical scientific review, a format in which Psychological Bulletin excels, requires authors to address fundamental issues in the scientific literature in a comprehensive and accurate manner. This process entails consideration of the complete literature to avoid engaging in confirmatory bias. Conclusions and generalizations should be clearly supported, particularly those involving rejection of alternative hypotheses. McGrath, Mitchell, Kim, and Hough's (2010) paper on the validity of response bias indicators (symptom validity tests, or SVTs) deviated from sound review practices, made erroneous characterizations of typical practice, and advanced sweeping but unfounded conclusions. Using examples from the literature on SVTs in neuropsychological settings as illustrations, we challenge the methods, literature sampling, logic, major assertions, and tacit assumptions of McGrath et al. (2010).¹ Their conclusions may be applicable to a small sample of the psychological assessment world, specifically, to measurement of social desirability in industrial–organizational settings. However, McGrath et al. ignored a substantial portion of the relevant research on negative response bias that is contrary to their conclusions. Peer-reviewed empirical evidence strongly supports the inclusion of bias indicators in psychological assessment, particularly in the context of forensic and disability evaluation.

A Summary of McGrath et al.'s (2010) Position

McGrath et al. (2010) reviewed and quantitatively analyzed selected literature to address a major question: Are response bias indicators (validity scales, response style measures) adequately validated as measures of invalid responding? Are they valid for the purpose intended? This is an important question because a conclusion of inaccurate responding has many ramifications for the assessed individual. McGrath et al. defined two types of response bias: positive impression management (i.e., exaggerating uncommon virtues or denying flaws) and negative impression management (i.e., exaggerating or faking impairment). They also recognized the distinction between measures of biased self-report, when responding to questions on a personality test, and measures of test-taking effort, when cognitive and perceptual–motor performance is at issue. We do not dispute these definitions.

McGrath et al. (2010) asserted that valid bias indicators should have two proven properties: (a) response bias suppresses or moderates the criterion-related validity of substantive psychological indicators, and (b) bias indicators are capable of detecting response bias. They proposed a standard for examining the validity of validity indicators, a testable hypothesis termed the response bias hypothesis (RBH): "A valid bias indicator should be able to enhance [emphasis added] the predictive accuracy of a valid substantive indicator" (p. 452). In other words, to be proven valid, a response bias measure should augment or improve prediction of an external criterion (e.g., diagnosis, cognitive level, or real-world behaviors) when biased respondents are removed from a sample.
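To make this criterion concrete, the following minimal sketch (ours, not McGrath et al.'s procedure; the data are simulated and every variable name is illustrative) shows one way the RBH can be examined: compute the predictor–criterion correlation for the full sample and again after removing respondents flagged by the bias indicator, then ask whether validity improves in the screened sample.

import numpy as np

rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)                           # latent construct (simulated)
biased = rng.random(n) < 0.25                          # 25% of respondents feign impairment
test_score = ability - 1.5 * biased + rng.normal(scale=0.5, size=n)   # substantive indicator
criterion = ability + rng.normal(scale=0.5, size=n)    # external criterion (e.g., functional status)
svt_fail = biased | (rng.random(n) < 0.05)             # imperfect hypothetical bias indicator

r_full = np.corrcoef(test_score, criterion)[0, 1]
keep = ~svt_fail
r_screened = np.corrcoef(test_score[keep], criterion[keep])[0, 1]

print(f"criterion validity, full sample:      r = {r_full:.2f}")
print(f"criterion validity, flagged removed:  r = {r_screened:.2f}")  # higher if the RBH holds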
McGrath et al. (2010) excluded from their analyses many SVT findings that contained variables sufficient to test the RBH. Four thousand studies were identified, yet only 40 of these purportedly met their inclusion criteria, and only two of those studies examined negative response bias in neuropsychology (p. 462). (There are 40 studies identified in the references as contributing to the analyses, not 41 as stated in the abstract.) After many analyses, McGrath et al. could not find any published study that showed the expected attenuation of predictor–criterion correlations when bias indicators were significant. As a result, they concluded that in any assessment context, the support for the use of both types of response bias indicators was weak. McGrath et al. further concluded that in other settings, including neuropsychological assessment, the effects reported in the literature were too small and too unreliable to recommend that clinicians use these indicators on a day-to-day basis.

McGrath et al.'s (2010) assertions are incorrect for three reasons: (a) the literature review was too narrowly focused on social desirability, (b) McGrath et al. made errors in interpretation of the literature they did include, and (c) they overgeneralized their results. Contrary to their sweeping conclusion, the value of symptom validity testing is strongly supported by a comprehensive literature review guided by a more accurate definition of a validity scale's validity.

Conceptual Problems: Narrow Definition

McGrath et al.'s (2010) best argument rests on their analysis of positive bias indicators (i.e., "faking good," social desirability, nay-saying), which is the true focus of their paper. Our content analysis showed that 82% of the key studies were from either industrial–organizational or individual differences domains, and only 8% of them were from the forensic literature. Divided by type of bias instrument, 85% dealt with positive bias but only 18% dealt with negative bias (the total is greater than 100% because a few studies included both). We conclude that McGrath et al. conducted insufficient sampling of the literature. For this reason alone, McGrath et al. should not have made any generalizations, much less the sweeping ones they did.

McGrath et al. (2010) also claimed that there was a lack of evidence for the validity of the most commonly used measures of response bias. This claim stands in sharp contrast to a large body of literature that supports the use of such procedures (Boone, 2007; Larrabee, 2007; Morgan & Sweet, 2009). Numerous articles have been published supporting the validity of these measures, which has led the major professional associations in clinical neuropsychology to recommend use of SVTs during all evaluations, particularly when the context of evaluation involves external incentives for poor performance (Bush et al., 2005; Heilbronner, Sweet, Morgan, Larrabee, & Millis, 2009).

Obvious Errors in Interpreting Results

McGrath et al. (2010) described finding, "despite extensive searching" (p. 462), only two studies in the area of forensic neuropsychological assessment that used SVTs to evaluate cognitive malingering and met their criteria for inclusion in their review. McGrath et al. cited Bowden, Shores, and Mathias (2006) as failing to demonstrate significant interactions between an SVT, the Word Memory Test (WMT; three different indicators), and posttraumatic amnesia (PTA) as a criterion of injury severity. McGrath et al. also cited Rohling and Demakis (2010) as failing to demonstrate such an interaction in their reanalysis of Bowden et al. and of the data presented by Green, Rohling, Lees-Haley, and Allen (2001). But McGrath et al. mischaracterized the analyses conducted by Bowden et al. and by Rohling and Demakis.

Neither study evaluated the prediction of a criterion of injury severity (i.e., PTA), as stated by McGrath et al. in text or as shown in their Table 4. Rather, both Bowden et al. and Rohling and Demakis predicted a criterion of neuropsychological test performance from the WMT, head injury severity, and the interaction of WMT and head trauma severity. Consequently, the data reported by Bowden et al. and by Rohling and Demakis do not address the RBH/moderator effect as defined by McGrath et al.
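For concreteness, the analysis that Bowden et al. and Rohling and Demakis reported takes, in broad outline, the moderated-regression form sketched below. This is a schematic only, written by us with simulated data and placeholder variable names, not either study's actual code: the criterion is neuropsychological performance, and the question is whether the WMT × severity product term is significant.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "wmt": rng.normal(size=n),        # SVT score (placeholder)
    "severity": rng.normal(size=n),   # injury severity, e.g., PTA duration (placeholder)
})
df["neuro"] = 0.6 * df["wmt"] - 0.3 * df["severity"] + rng.normal(scale=0.7, size=n)

# "wmt * severity" expands to wmt + severity + wmt:severity;
# the wmt:severity coefficient is the interaction (moderation) term at issue.
fit = smf.ols("neuro ~ wmt * severity", data=df).fit()
print(fit.params)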
Furthermore, McGrath et al. (2010) did not consider the arguments made by Rohling and Demakis (2010) against treating interactions between SVTs and injury severity in the prediction of neuropsychological test scores as primary support for the validity of an SVT. As Rohling and Demakis demonstrated using the data sets of Bowden et al. (2006) and Green et al. (2001), neuropsychological test scores were associated with measures of trauma severity, with key demographic factors (e.g., age and education), and, to the largest degree, with scores from the WMT. However, scores on the WMT were associated only with performance on the substantive indicator and not with indices of injury severity (i.e., PTA and the Glasgow Coma Scale [GCS]) or with demographic measures that have clearly been shown to affect cognitive ability (e.g., age and education).

Rohling and Demakis noted that the assumption that an SVT must show an interaction between level of performance on the SVT and trauma severity (with those mildly injured performing worse than those severely injured) is not appropriate for two reasons. First, it assumes that more severely injured persons are unlikely to malinger but that most mildly injured persons do malinger. Second, it assumes that persons with a mild traumatic brain injury (TBI) who malinger on cognitive ability tests will perform worse than more severely injured persons will. Neither of these assumptions is valid. Only in special circumstances will an interaction appear: when using a less severely injured group of severe TBI patients, none of whom malinger, and a mildly injured group, the majority of whom show gross malingering.

Thus, the Bowden et al. (2006) and Rohling and Demakis (2010) investigations demonstrated that the WMT functions in a different manner than a neuropsychological test of verbal memory (as Bowden et al. had originally characterized it) or than any other type of neuropsychological test. In particular, the WMT does not show predictive relationships with loss of consciousness, PTA, age, or education, as do most tests of neuropsychological ability. This lack of significant correlation provides indirect evidence that the test measures response validity. This point does not address McGrath et al.'s (2010) definition of a moderating effect supportive of the RBH. However, such support is provided by Green et al. (2001), cited by McGrath et al., which was the impetus for the Bowden et al. investigation, as we discuss in the next section.

Confirmatory Bias: Evidence for Moderation Overlooked

McGrath et al. (2010) incorrectly denied the existence of any study proving that a bias indicator moderates accepted test–criterion associations. Many empirical studies published in peer-reviewed journals were not included by McGrath et al., despite meeting their inclusion criteria. For example, Greiffenstein and Baker (2003) calculated the association between school records and current IQ in a large sample of litigants seeking compensation for a remote mild traumatic head injury.
Historically, grade point average (GPA) and IQ show a range of correlations that center at .50 (Kaufman & Lichtenberger, 2006). In litigants who failed a cognitive response bias measure (Reliable Digit Span [RDS], which measures exaggerated inattention), the Full Scale IQ–GPA correlation was .310 (p = .102), which is below the historical range. By contrast, in litigants who passed the response bias measure, GPA correlated with Full Scale IQ at .551 (p < .0001), which is consistent with the historical range, although the difference between these two correlation coefficients fell slightly short of the traditional level of statistical significance (z = 1.21, p = .113). The discrepancy was more pronounced for Verbal IQ, which correlated with GPA at r = .323 (p = .087) in the group failing the RDS but at r = .646 (p < .0001) in the group passing the RDS; this difference in correlation coefficients was statistically significant (z = 1.75, p = .04). Clearly, in the case of the RDS (Greiffenstein, Baker, & Gola, 1994), response bias attenuated the correlation between criterion and test.

Moderating effects of response bias indicators are also supported by the investigation of Green, Rohling, Iverson, and Gervais (2003), another paper not cited by McGrath et al. (2010). Green et al. found that performance on a measure of olfactory identification (the Alberta Smell Test) was associated in dose–response fashion with four indicators of brain injury severity (e.g., Glasgow Coma Scale level on admission). It is important to note, in light of the RBH, that this relationship existed only for patients who passed response bias measures such as the Word Memory Test and the Computerized Assessment of Response Bias. In the group that failed the response bias measures, Alberta Smell Test scores were not significantly related to any of the neurological severity criteria.

Another paper omitted by McGrath et al. that demonstrated a moderator effect of an SVT is the investigation by Gervais, Ben-Porath, Wygant, and Green (2008). Gervais et al. found that the Memory Complaints Inventory showed significant correlations with six different scores from the California Verbal Learning Test (CVLT), with correlations ranging from −.19 to −.26, all significant (p < .001), in a sample of 1,550 disability claimants. When subjects who failed an SVT were excluded, the correlation between the CVLT Total score and the Memory Complaints Inventory total score was not significant (r = −.07, p = .117, n = 513). However, the correlation was significant when the sample included only those who had failed an effort test (r = −.47, p < .0001, n = 347). Furthermore, the difference between these two correlations was highly significant (z = 6.31, p < .0001). These data were not reported by the original authors in their published article but were computed for the group failing the SVT (n = 347) using simple algebraic substitution.
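The contrasts between correlations reported in this section appear to be standard Fisher r-to-z comparisons of two independent correlations. The short sketch below, written by us for illustration, shows that computation; the sample values are the Gervais et al. (2008) figures quoted in the preceding paragraph (the Greiffenstein & Baker subgroup sizes are not quoted above, so that comparison is not reproduced here).

from math import atanh, sqrt
from scipy.stats import norm

def compare_independent_rs(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two independent correlations."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return z, 2 * norm.sf(abs(z))

# Gervais et al. (2008) values quoted above: SVT-pass group vs. SVT-fail group
z, p = compare_independent_rs(-0.07, 513, -0.47, 347)
print(f"z = {z:.2f}, p = {p:.2g}")   # z is about 6.31, consistent with the value reported in the text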
Green (2007) also demonstrated a moderator effect of a response bias measure, the WMT, in a comparison of CVLT scores between brain-injured or neurological patients who had normal computed tomography (CT) or magnetic resonance imaging (MRI) scans and patients who had abnormal scans. The mean CVLT short- and long-delayed recall score was 9.3 (SD = 3.7) for the 321 subjects in the normal scan group, which did not differ significantly from the mean score of 8.9 (SD = 3.7) for the 314 subjects in the abnormal scan group. When those who failed the WMT were removed, the mean CVLT free-recall score for the 174 subjects with normal brain scans was 11.1 (SD = 3.1), which differed significantly (p < .001) from the mean of 9.9 (SD = 3.2) for the 220 subjects with abnormal brain scans.

Of note, the Green et al. (2001) paper that prompted the Bowden et al. (2006) and Rohling and Demakis (2010) papers described above presented evidence of a moderator effect that was missed by McGrath et al. (2010). Green et al. compared three groups. The first included "TBI-neuro" patients who had PTAs greater than or equal to 1 day and/or a GCS score less than or equal to 12; 88% of these TBI patients had substantiated cerebral abnormality as evidenced by CT or MRI scans. Other patients included in this group were non-TBI neurological patients (e.g., stroke, aneurysm) who had known cerebral impairment, also evidenced by CT or MRI scans. The second group consisted of mild TBI patients who had PTAs of less than 1 day and no abnormalities evidenced on CT or MRI scans. The third group consisted of psychiatric patients (e.g., with depression or anxiety) combined with patients who had orthopedic injuries, chronic pain, or fibromyalgia. The TBI-neuro group did not differ significantly from the other two groups on a composite neuropsychological measure, the Overall Test Battery Mean, until those who had failed the WMT were excluded from the analyses. This is clear evidence of a moderating effect, which supports the validity of the WMT as a measure of response bias.

Thus, we have identified five studies not reviewed by McGrath et al. (2010), all of which demonstrate that considering evidence of negative response bias improves prediction (Gervais et al., 2008; Green, 2007; Green et al., 2001, 2003; Greiffenstein & Baker, 2003). Moreover, we have demonstrated that McGrath et al. missed a clear moderator effect in the Green et al. (2001) paper they reviewed.

Peer Review Process Issues

All but two of the authors of this rebuttal (GJL and MFG) were specifically acknowledged by McGrath et al. (2010) "for their comments on drafts of this article and/or their help identifying manuscripts for possible inclusion" (p. 450). Because we are all widely published in the area of response bias, we are concerned that the acknowledgment may have implied a sense of approval of the manuscript despite our unanimous disagreement with McGrath et al.'s conclusions, notwithstanding McGrath et al.'s disclaimer that the views of the paper were "those of the authors and should not be taken as representing those of our colleagues who provided input" (p. 450).

Closing Comments

In summary, McGrath et al. (2010) reviewed only a small part of the response bias literature, yet they made inappropriately sweeping conclusions because they commingled positive and negative response bias indicators. The two forms of response bias are associated with different examinee motivations and goals, and negative response bias is rare in personnel selection. McGrath et al. also overlooked many articles in the forensic and neuropsychology literature that actually support the RBH.

McGrath et al.'s (2010) erroneous conclusions could have an unfortunate impact by calling into question the sound use of bias indicators in clinical and forensic practice and research.
Indeed, McGrath et al. were recently cited by Libon (2010) in defense of the practice of not employing SVTs in an investigation of persons with complex but medically unexplained pain. Libon raised this defense when criticized by Victor, Boone, and Kulick (2010) for failure to consider motivational factors in a potentially compensable context. Application of McGrath et al.'s recommendations could hamper clinicians who evaluate many important questions, such as the prospect of malingering in murder defendants making insanity claims or the validity of cognitive deficits claimed in a minor head trauma lawsuit involving millions of dollars. Worse, McGrath et al.'s conclusions may encourage misdiagnosis. Patients who are misdiagnosed may go on to develop iatrogenic illnesses and might then receive unnecessary, ineffective, and potentially deleterious treatments.

We have detailed why we disagree with the conclusions put forth by McGrath et al. (2010). Response bias measures have substantial evidence of validity, and we contend that they should routinely be used in clinical assessments and in clinical research. The empirical evidence for their use is particularly strong in the area of clinical neuropsychology. Finally, we recommend that researchers submitting manuscripts to scientific journals that use peer review seek permission from the individuals whom they wish to acknowledge. Failure to do so may influence the peer review process in a manner that diminishes its effectiveness.

References

Boone, K. B. (Ed.). (2007). Assessment of feigned cognitive impairment: A neuropsychological perspective. New York, NY: Guilford Press.

Bowden, S. C., Shores, E. A., & Mathias, J. L. (2006). Does effort suppress cognition after brain injury? A re-examination of the evidence for the Word Memory Test. Clinical Neuropsychologist, 20, 858–872. doi:10.1080/13854040500246935

Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., . . . Silver, C. H. (2005). Symptom validity assessment: Practice issues and medical necessity. Archives of Clinical Neuropsychology, 20, 419–426. doi:10.1016/j.acn.2005.02.002

Gervais, R. O., Ben-Porath, Y. S., Wygant, D. B., & Green, P. (2008). Differential sensitivity of the Response Bias Scale (RBS) and MMPI–2 validity scales to memory complaints. Clinical Neuropsychologist, 22, 1061–1079. doi:10.1080/13854040701756930

Green, P. (2007). The pervasive influence of effort on neuropsychological tests. Physical Medicine and Rehabilitation Clinics of North America, 18, 43–68. doi:10.1016/j.pmr.2006.11.002

Green, P., Rohling, M. L., Iverson, G. L., & Gervais, R. O. (2003). Relationships between olfactory discrimination and head injury severity. Brain Injury, 17, 479–496. doi:10.1080/0269905031000070242

Green, P., Rohling, M. L., Lees-Haley, P. R., & Allen, L. M. (2001). Effort has a greater effect on test scores than severe brain injury in compensation claimants. Brain Injury, 15, 1045–1060. doi:10.1080/02699050110088254

Greiffenstein, M. F., & Baker, W. J. (2003). Premorbid clues? Preinjury scholastic performance and present neuropsychological functioning in late postconcussion syndrome. Clinical Neuropsychologist, 17, 561–573. doi:10.1076/clin.17.4.561.27937

Greiffenstein, M. F., Baker, W. J., & Gola, T. (1994). Validation of malingered amnesia measures with a large clinical sample. Psychological Assessment, 6, 218–224. doi:10.1037/1040-3590.6.3.218

Heilbronner, R. L., Sweet, J. J., Morgan, J. E., Larrabee, G. J., & Millis, S. R. (2009). American Academy of Clinical Neuropsychology consensus conference statement on neuropsychological assessment of effort, response bias, and malingering. Clinical Neuropsychologist, 23, 1093–1129. doi:10.1080/13854040903155063
Kaufman, A. S., & Lichtenberger, E. O. (2006). The assessment of adolescent and adult intelligence (3rd ed.). Hoboken, NJ: Wiley.

Larrabee, G. J. (Ed.). (2007). Assessment of malingered neuropsychological deficits. New York, NY: Oxford University Press.

Libon, D. J. (2010). Neurobiological aspects of complex regional pain syndrome (CRPS): Reply to Victor, Boone, and Kulick (2010). Journal of the International Neuropsychological Society, 16, 1153–1154. doi:10.1017/S1355617710001049

McGrath, R. E., Mitchell, M., Kim, B. H., & Hough, L. (2010). Evidence for response bias as a source of error variance in applied assessment. Psychological Bulletin, 136, 450–470. doi:10.1037/a0019216

Morgan, J. E., & Sweet, J. J. (2009). Neuropsychology of malingering casebook. New York, NY: Psychology Press.

Rohling, M. L., & Demakis, G. J. (2010). Bowden, Shores, & Mathias (2006): Failure to replicate or just failure to notice. Does effort still account for more variance in neuropsychological test scores than TBI severity? Clinical Neuropsychologist, 24, 119–136. doi:10.1080/13854040903307243

Victor, T. L., Boone, K. B., & Kulick, A. D. (2010). My head hurts just thinking about it. Journal of the International Neuropsychological Society, 16, 1151–1152. doi:10.1017/S1355617710000858

Received January 24, 2011
Revision received February 3, 2011
Accepted February 7, 2011