Center for Statistical Research and Methodology (CSRM)

Missing Data, Edit, and Imputation

Motivation: Missing data problems are endemic to the conduct of statistical experiments and data collection projects. Investigators almost never observe all the outcomes they set out to record. In sample surveys and censuses, this means that individuals or entities fail to respond, or provide only part of the information requested. In addition, the information provided may be logically inconsistent, which is tantamount to missing data. To compute official statistics, agencies need to compensate for missing data. Available techniques include cell adjustments, imputation, and editing, possibly aided by administrative information. All of these techniques involve mathematical modeling along with subject-matter expertise.

Research Problem:

  • Compensating for missing data typically involves explicit or implicit modeling. Explicit methods include Bayesian multiple imputation, propensity score matching, and direct substitution of information extracted from administrative records. Implicit methods revolve around donor-based techniques such as hot-deck imputation and predictive mean matching. All of these techniques are subject to edit rules that ensure the logical consistency of the resulting data. Integrating statistical validity and logical requirements into the imputation process remains a challenging research problem. Another important problem is correctly quantifying the reliability of predictors computed in part from imputed values, since their variance can be substantially greater than the nominally computed variance. Specific projects consider (1) nonresponse adjustment and imputation using administrative records, based on propensity and/or multiple imputation models, and (2) simultaneous imputation of multiple survey variables to preserve their joint properties, together with methods for evaluating model-based imputation.
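To illustrate how a donor-based method interacts with edit rules, the Python sketch below performs random hot-deck imputation within imputation cells and accepts only donor values that pass an edit rule for the recipient. It is a minimal sketch under stated assumptions, not the Census Bureau's production method; the function name `hot_deck_impute`, the record fields, and the example edit ("a married person must be at least 15") are hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, cell_keys, target, edit_ok, seed=0):
    """Random hot-deck imputation within imputation cells (illustrative sketch).

    records   -- list of dicts; a missing value of `target` is None
    cell_keys -- fields that define the imputation cells (hypothetical choice)
    edit_ok   -- edit rule: edit_ok(recipient, value) -> True if consistent
    """
    rng = random.Random(seed)

    # Collect observed values (donors) for each imputation cell.
    donors = defaultdict(list)
    for r in records:
        if r[target] is not None:
            donors[tuple(r[k] for k in cell_keys)].append(r[target])

    completed = []
    for r in records:
        out = dict(r)  # copy so the input records are not modified
        if out[target] is None:
            cell = tuple(r[k] for k in cell_keys)
            # Keep only donor values that pass the edit rule for this recipient.
            candidates = [v for v in donors.get(cell, []) if edit_ok(r, v)]
            if candidates:
                out[target] = rng.choice(candidates)
            # Otherwise the value stays missing for a fallback step.
        completed.append(out)
    return completed


# Hypothetical example: impute age within sex cells, subject to the edit
# "a married person must be at least 15 years old".
people = [
    {"sex": "F", "marital": "married", "age": 34},
    {"sex": "F", "marital": "never", "age": 8},
    {"sex": "F", "marital": "married", "age": None},
    {"sex": "M", "marital": "never", "age": 12},
    {"sex": "M", "marital": "never", "age": None},
]


def married_age_edit(rec, age):
    return rec["marital"] != "married" or age >= 15


result = hot_deck_impute(people, ["sex"], "age", married_age_edit)
```

Without the edit check, the married recipient could receive the age 8 from a donor in the same cell; with it, only the donor value 34 is eligible. Recipients with no edit-consistent donor are left missing here, so a real system would need a fallback such as collapsing cells.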

Potential Applications:

  • Research on missing data leads to improved overall data quality and predictor accuracy for any census or sample survey with a substantial frequency of missing data. It also leads to methods for adjusting the variance to reflect the additional uncertainty created by the missing data. Given the continuously rising cost of conducting censuses and sample surveys, imputation and other missing-data compensation methods aided by administrative records may come to augment actual data collection in the future.
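One standard way to make the variance reflect the extra uncertainty from missing data is Rubin's combining rules for multiple imputation, which add a between-imputation component to the average nominal variance. The Python sketch below assumes that m completed datasets have already been analyzed; the function name and the input numbers are hypothetical and only illustrate the arithmetic.

```python
import statistics


def combine_multiple_imputations(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates -- point estimate from each of the m completed datasets
    variances -- nominal (complete-data) variance estimate from each dataset
    Returns (combined estimate, total variance, approx. fraction of missing info).
    """
    m = len(estimates)
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = statistics.fmean(estimates)   # combined point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (n - 1 divisor)
    t = w + (1 + 1 / m) * b               # total variance
    # Large-sample approximation; omits the degrees-of-freedom correction.
    fmi = (1 + 1 / m) * b / t
    return q_bar, t, fmi


# Hypothetical numbers: three imputations, each with nominal variance 1.0.
q_bar, total_var, fmi = combine_multiple_imputations([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

In this example a single completed dataset would report a variance of 1.0, while the combined total variance is 7/3 (about 2.33), with an approximate fraction of missing information of 4/7. This is the kind of understatement by nominal variance that the adjustment is meant to correct.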

Accomplishments (October 2015 - September 2016):

  • Researched modeling approaches for using administrative records in lieu of Decennial Census field visits in view of forthcoming design decisions. Documented methodologies in scientific papers (Public Opinion Quarterly and Statistical Journal of the IAOS).
  • Supported the implementation of this research in the 2016 Census Test and presented it in an invited session at the 2016 Joint Statistical Meetings.
  • Investigated the feasibility of using third party ("big") data from NPD Group, a major credit card, and First Data to supplement and/or enhance retail sales estimates in the Monthly/Annual Retail Trade Survey (MRTS and ARTS).
  • Designed and implemented a comparative analysis of the imputation error and fraction of missing information when applying the Sequential Regression Multivariate Imputation (SRMI) and the Ratio Expansion methods to imputing missing product data in the Economic Census.
  • Applied classification tree analysis to recommend a hot-deck imputation method for imputing missing product data in the Economic Census and documented results in an FCSM Proceedings paper.
  • Collaborated in the development of four separate alternative methods for raking balance complexes in the Standard Economic Processing System (StEPS) when detail items are negative or there is subtraction in the balance complexes.
  • Set up the problem of augmenting the exports and patents datasets with variables from the Business Register (BR) as a missing data problem and proposed two separate approaches: Statistical Matching and the multiple imputation procedure Sequential Regression Multivariate Imputation (SRMI).
  • Developed a system that generates essentially new implied edits based on given explicit edits.
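A balance complex requires detail items to sum to a reported total. Assuming the common proportional (ratio) adjustment, raking can be sketched in Python as shown here. This is not the StEPS implementation, and the function name is hypothetical. The sketch deliberately rejects negative details, which is the case the alternative raking methods listed above are meant to handle.

```python
def rake_balance_complex(details, total):
    """Proportionally adjust detail items so they sum to the reported total.

    Assumes nonnegative details with a positive sum (the simple case).
    """
    if any(d < 0 for d in details):
        raise ValueError("proportional raking assumes nonnegative detail items")
    s = sum(details)
    if s <= 0:
        raise ValueError("detail items must have a positive sum")
    factor = total / s  # one common raking factor applied to every detail item
    return [d * factor for d in details]


# Hypothetical balance complex: three detail items reported against a total of 120.
adjusted = rake_balance_complex([20.0, 30.0, 50.0], 120.0)
```

Here the factor is 1.2, giving details of 24, 36, and 60. With details [100, -90] and a total of 20, a proportional factor would be 2.0, producing [200, -180]: both items double to correct a discrepancy of only 10. This illustrates why negative items or subtraction call for different methods.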

Short-Term Activities (FY 2017):

  • Continue researching modeling approaches for using administrative records in lieu of Decennial Census field visits in view of imminent design decisions.
  • Continue to investigate the feasibility of using third party ("big") data from various available sources to supplement and/or enhance retail sales estimates in the Monthly/Annual Retail Trade Survey (MRTS and ARTS).
  • Complete implementation of separate alternative methods for raking balance complexes in the Standard Economic Processing System (StEPS) when variables are allowed to take negative values or there is subtraction in the balance equations.
  • Continue research on augmenting export transactions and patents data files by adding variables from the Business Register (BR).
  • Continue work on heuristic methods for edit generation.

Longer-Term Activities (beyond FY 2017):

  • Continue researching modeling approaches for using administrative records in lieu of Decennial Census field visits to support future design decisions.
  • Research practical ways to apply decision theoretic concepts to the use of administrative records (versus personal contact or proxy response) in the Decennial Census.
  • Research joint models for longitudinal count data and missing data (e.g., dropout) using shared random effects to measure the association between propensity for nonresponse and the count outcome of interest.
  • Research imputation methods for a Decennial Census design that incorporates adaptive design and administrative records to reduce contacts and consequently increases proxy response and nonresponse.
  • Research macro and selective editing in the context of large sets of administrative records and high-bandwidth data streams (Big Data).
  • Continue collaboration on researching methods for data integration of the exports and patents data files with the Business Register (BR).
  • Evaluate the results of data corrections in the Standard Economic Processing System (StEPS) using new raking algorithms for adjusting balance complexes.
  • Continue research on edit procedures.

Selected Publications:

Bechtel, L., Morris, D.S., and Thompson, K.J. (2015). "Using Classification Trees to Recommend Hot Deck Imputation Methods: A Case Study." In FCSM Proceedings. Washington, DC: Federal Committee on Statistical Methodology.
Klemens, B., Rodriguez, R., and Thibaudeau, Y. (2014). "Simultaneous Editing and Imputation for ACS Data." ACS Work Request RSI 4-3-0094 (supporting research project report in progress).
Garcia, M., Morris, D.S., and Diamond, L.K. (2015). "Implementation of Ratio Imputation and Sequential Regression Multivariate Imputation on Economic Census Products." Proceedings of the Joint Statistical Meetings.
Morris, D.S., Keller, A., and Clark, B. (2016). "An Approach for Using Administrative Records to Reduce Contacts in the 2020 Census." Statistical Journal of the International Association for Official Statistics, 32(2): 177-188.
Morris, D. S. (2014). "A Comparison of Methodologies for Classification of Administrative Records Quality for Census Enumeration," Public Opinion Quarterly (to appear).
Thibaudeau, Y., Slud, E., and Gottschalck, A. O. (2011). "Modeling Log-Linear Conditional Probabilities for Prediction in Surveys," Proceedings of the 2010 Joint Statistical Meetings, American Statistical Association, Alexandria, VA.
Thibaudeau, Y., Shao, J., and Mulrow, J. (2007). "A Study of Basic Calibration Estimators in Presence of Nonresponse," Proceedings of the American Statistical Association, American Statistical Association, Alexandria, VA.
Thibaudeau, Y. (2002). "Model Explicit Item Imputation for Demographic Categories," Survey Methodology, 28(2), 135-143.
Winkler, W. E. (2008). "General Methods and Algorithms for Imputing Discrete Data under a Variety of Constraints," Research Report Series (Statistics #2008-08), Statistical Research Division, U.S. Census Bureau, Washington, DC.
Winkler, W. and Garcia, M. (2009). "Determining a Set of Edits," Research Report Series (Statistics #2009-05), Statistical Research Division, U.S. Census Bureau, Washington, DC.

Contact: Yves Thibaudeau, Maria Garcia, Martin Klein, Darcy Morris, Jun Shao, Eric Slud, William Winkler, Xiaoyun Lu

Funding Sources for FY 2017:

  • 0331 - Working Capital Fund / General Research Project
    Various Decennial, Demographic, and Economic Projects


Contact

Tommy Wright, Center Chief, 301-763-1702
tommy.wright@census.gov

Kelly Taylor, Center Secretary, 301-763-4896
kelly.l.taylor@census.gov


Source: U.S. Census Bureau | Research and Methodology Directorate | Center for Statistical Research & Methodology | (301) 763-9862 (or lauren.emanuel@census.gov) |   Last Revised: January 25, 2017