
Center for Statistical Research and Methodology (CSRM)

Simulation and Statistical Modeling

Motivation: Simulation studies that are carefully designed under realistic survey conditions can be used to evaluate the quality of new statistical methodology for Census Bureau data. Furthermore, new computationally intensive statistical methodology is often beneficial because it can require less strict assumptions, offer more flexibility in sampling or modeling, accommodate complex features in the data, and enable valid inference where other methods might fail. Statistical modeling is at the core of the design of realistic simulation studies and the development of computationally intensive statistical methods. Modeling also enables one to efficiently use all available information when producing estimates. Such studies can benefit from software for data processing. Statistical disclosure avoidance methods are also developed and their properties studied.

Research Problem:

  • Systematically develop an environment for simulating complex surveys that can be used as a test-bed for new data analysis methods.
  • Develop flexible model-based estimation methods for survey data.
  • Develop new methods for statistical disclosure control that simultaneously protect confidential data from disclosure while enabling valid inferences to be drawn on relevant population parameters.
  • Investigate the bootstrap for analyzing data from complex sample surveys.
  • Develop models for the analysis of measurement errors in Demographic sample surveys (e.g., Current Population Survey or the Survey of Income and Program Participation).
  • Identify and develop statistical models (e.g., loglinear models, mixture models, and mixed-effects models) to characterize relationships between variables measured in censuses, sample surveys, and administrative records.
  • Investigate noise multiplication for statistical disclosure control.
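
As a rough illustration of noise multiplication next to top coding, the sketch below perturbs only the values above a threshold and compares the effect of each method on the sample mean. The log-normal test data, the 95th-percentile threshold, the Uniform(0.9, 1.1) noise distribution, and all function names are assumptions made for this example; they are not taken from the Census Bureau's methods.

```python
import random

def top_code(values, threshold):
    """Top coding: censor every value above the threshold at the threshold."""
    return [min(v, threshold) for v in values]

def noise_multiply(values, threshold, rng, low=0.9, high=1.1):
    """Multiply each value above the threshold by independent Uniform(low, high) noise.

    The noise has mean 1 when low and high are symmetric about 1
    (an assumption made here for illustration).
    """
    return [v * rng.uniform(low, high) if v > threshold else v for v in values]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
# Hypothetical skewed "income" data; the log-normal parameters are arbitrary.
incomes = [rng.lognormvariate(10.5, 0.8) for _ in range(10_000)]
threshold = sorted(incomes)[int(0.95 * len(incomes))]  # roughly the 95th percentile

top_coded = top_code(incomes, threshold)
noisy = noise_multiply(incomes, threshold, rng)

print(f"original mean:         {mean(incomes):,.0f}")
print(f"top-coded mean:        {mean(top_coded):,.0f}")
print(f"noise-multiplied mean: {mean(noisy):,.0f}")
```

Because the noise has mean 1, the noise-multiplied mean should stay close to the original mean, whereas top coding pulls the mean downward. Valid inference from noise-multiplied data generally still requires methods that account for the noise distribution.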

Potential Applications:

  • Simulating data collection operations using Monte Carlo techniques can help the Census Bureau make more efficient changes.
  • Use noise multiplication or synthetic data as an alternative to top coding for statistical disclosure control in publicly released data. Both noise multiplication and synthetic data have the potential to preserve more information in the released data than top coding.
  • Rigorous statistical disclosure control methods allow for the release of new microdata products.
  • Using an environment for simulating complex surveys, statistical properties of new methods for missing data imputation, model-based estimation, small area estimation, etc. can be evaluated.
  • Model-based estimation procedures enable efficient use of auxiliary information (for example, Economic Census information in business surveys), and can be applied in situations where variables are highly skewed and sample sizes are not sufficiently large to justify normal approximations. These methods may also be applicable to analyze data arising from a mechanism other than random sampling.
  • Variance estimates and confidence intervals in complex surveys can be obtained via the bootstrap.
  • Modeling approaches with administrative records can help enhance the information obtained from various sample surveys.
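
The bootstrap item above can be sketched in a deliberately simplified form: resample with replacement within each stratum of a stratified sample, recompute the weighted mean, and read a percentile confidence interval off the replicates. The strata, stratum weights, and function names are hypothetical, and this naive version ignores finite population corrections and the rescaling adjustments that bootstrap methods for complex surveys usually need.

```python
import random

def stratified_mean(strata, weights):
    """Weighted combination of stratum sample means (weights sum to 1)."""
    return sum(w * sum(s) / len(s) for s, w in zip(strata, weights))

def bootstrap_ci(strata, weights, n_boot=2000, level=0.90, seed=1):
    """Percentile bootstrap interval, resampling independently within strata."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Draw a with-replacement resample of the same size in every stratum.
        resampled = [rng.choices(s, k=len(s)) for s in strata]
        estimates.append(stratified_mean(resampled, weights))
    estimates.sort()
    alpha = 1 - level
    lo = estimates[round(n_boot * alpha / 2)]
    hi = estimates[round(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

data_rng = random.Random(42)
# Three hypothetical strata with different sample sizes, means, and spreads.
strata = [
    [data_rng.gauss(50, 5) for _ in range(40)],
    [data_rng.gauss(80, 10) for _ in range(30)],
    [data_rng.gauss(120, 20) for _ in range(20)],
]
weights = [0.5, 0.3, 0.2]  # assumed population shares of the strata

point = stratified_mean(strata, weights)
lo, hi = bootstrap_ci(strata, weights)
print(f"estimate {point:.2f}, 90% bootstrap interval ({lo:.2f}, {hi:.2f})")
```

The spread of the replicate estimates also gives a bootstrap variance estimate; the percentile interval is only one of several ways to turn the replicates into a confidence interval.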

Accomplishments (October 2015 - September 2016):

  • Continued developing model-based methods for analyzing singly imputed synthetic data under multiple linear regression and multivariate normal models.
  • Under the framework of linear regression, evaluated and compared properties of inference derived from singly and multiply imputed synthetic data when the data generation, imputation, and analysis models differ. This work assumed that synthetic data were generated via plug-in sampling.
  • Developed exact model-based methods for analyzing singly imputed synthetic data under a multivariate multiple linear regression model.
  • Evaluated several data visualization methods for comparing populations and determining if there is a statistically significant difference between two population parameters.
  • Used data collected by National Crime Victimization Survey (NCVS) Field Representatives to model daily response propensity in the NCVS over a nine-month period; identified a set of covariates that serve as strong predictors of response propensity.
  • Created new data visualizations for displaying bootstrap-based inferences for rankings of states based on American Community Survey data.
  • Constructed an entirely new and improved version of a realistic artificial population used to simulate the Monthly Wholesale Trade Survey.
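
To make "plug-in sampling" concrete, here is a toy sketch for a simple linear regression: fit the model to the original data, plug the point estimates into the model, and draw a single synthetic response for each record (a singly imputed synthetic data set). The simulated data, parameter values, and function names are assumptions for illustration only; the sketch does not implement the exact inference methods referred to above.

```python
import random

def fit_simple_regression(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual variance)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, rss / (n - 2)

def plug_in_synthetic(x, y, rng):
    """Draw one synthetic response per record from the fitted model."""
    intercept, slope, sigma2 = fit_simple_regression(x, y)
    sd = sigma2 ** 0.5
    return [rng.gauss(intercept + slope * xi, sd) for xi in x]

rng = random.Random(7)
# Hypothetical "original" data generated from y = 1 + 2x + N(0, 1) noise.
x = [rng.uniform(0, 10) for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

y_syn = plug_in_synthetic(x, y, rng)
fit_orig = fit_simple_regression(x, y)
fit_syn = fit_simple_regression(x, y_syn)
print("original  (intercept, slope, sigma^2):", fit_orig)
print("synthetic (intercept, slope, sigma^2):", fit_syn)
```

The point estimates from the synthetic data should land near those from the original data, but standard errors computed as if the synthetic data were the original data generally do not reflect the extra variability introduced by the synthesis step, which is the motivation for dedicated inference methods.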

Short-Term Activities (FY 2017):

  • Develop exact model-based methods for analyzing multiply imputed synthetic data under a linear regression model and compare with inferences derived using the current state-of-the-art combination formulas.
  • Develop methodology that produces synthetic data whose distribution is the same as or similar to that of the original data.
  • Evaluate bootstrap confidence intervals for unknown population ranks and continue evaluating visualization methods for comparing populations.
  • Evaluate properties of synthetic data under formal privacy definitions.
  • Expand the evaluation of properties of synthetic data when the data generation, imputation, and analysis models differ.
  • Evaluate bootstrap inference on sample survey data under some specific scenarios.
  • Refine the artificial population used to simulate Monthly Wholesale Trade Survey data.

Longer-Term Activities (beyond FY 2017):

  • Develop likelihood-based methods for analyzing singly and multiply imputed synthetic data under various realistic scenarios; develop noise infusion methods for statistical disclosure control.
  • Develop bootstrap methods for analyzing synthetic and noise infused data.
  • Study ways of quantifying the privacy protection/data utility tradeoff in statistical disclosure control.
  • Develop and study bootstrap methods for sample survey data.
  • Create an environment for simulating complex aspects of economic/demographic surveys.
  • Study properties of bootstrap methodology for quantifying uncertainty in statistical rankings and refine visualizations.

Selected Publications:

Moura, R., Klein, M., Coelho, C. and Sinha, B. (2016). "Inference for Multivariate Regression Model based on Synthetic Data generated under Fixed-Posterior Predictive Sampling: Comparison with Plug-in Sampling." To appear in REVSTAT - Statistical Journal.
Klein, M., and Sinha, B. (2016). "Likelihood Based Finite Sample Inference for Singly Imputed Synthetic Data Under the Multivariate Normal and Multiple Linear Regression Models," Journal of Privacy and Confidentiality, 7, 43-98.
Klein, M., and Sinha, B. (2015). "Inference for Singly Imputed Synthetic Data Based on Posterior Predictive Sampling under Multivariate Normal and Multiple Linear Regression Models," Sankhya B: The Indian Journal of Statistics 77-B, 293-311.
Klein, M., and Sinha, B. (2015). "Likelihood-Based Inference for Singly and Multiply Imputed Synthetic Data under a Normal Model," Statistics and Probability Letters, 105, 168-175.
Klein, M., and Sinha, B. (2015). "Likelihood-Based Finite Sample Inference for Synthetic Data Based on Exponential Model," Thailand Statistician: Journal of The Thai Statistical Association, 13, 33-47.
Wright, T., Klein, M., and Wieczorek, J. (2014). "Ranking Populations Based on Sample Survey Data," Center for Statistical Research and Methodology, Research and Methodology Directorate Research Report Series (Statistics #2014-12). U.S. Census Bureau. Available online: http://www.census.gov/csrm/papers/pdf/rrs2014-12.pdf.
Klein, M., Lineback, J.F., and Schafer, J. (2014). "Evaluating Imputation Techniques in the Monthly Wholesale Trade Survey," Proceedings of the Joint Statistical Meetings, Alexandria, VA: American Statistical Association.
Klein, M., Mathew, T., and Sinha, B. (2014). "Noise Multiplication for Statistical Disclosure Control of Extreme Values in Log-normal Regression Samples." Journal of Privacy and Confidentiality, 6, 77-125.
Klein, M., Mathew, T., and Sinha, B. (2014). "Likelihood Based Inference Under Noise Multiplication," Thailand Statistician: Journal of The Thai Statistical Association, 12, 1-23.
Wright, T., Klein, M., and Wieczorek, J. (2013). "An Overview of Some Concepts for Potential Use in Ranking Populations Based on Sample Survey Data," The 59th International Statistical Institute World Statistics Congress, Hong Kong, China.
Klein, M., and Sinha, B. (2013). "Statistical Analysis of Noise Multiplied Data Using Multiple Imputation," Journal of Official Statistics, 29, 425-465.
Klein, M., and Linton, P. (2013). "On a Comparison of Tests of Homogeneity of Binomial Proportions," Journal of Statistical Theory and Applications, 12, 208-224.
Klein, M., Mathew, T., and Sinha, B. (2013). "A Comparison of Statistical Disclosure Control Methods: Multiple Imputation Versus Noise Multiplication." Center for Statistical Research and Methodology, Research and Methodology Directorate Research Report Series (Statistics #2013-02). U.S. Census Bureau. Available online: http://www.census.gov/csrm/papers/pdf/rrs2013-02.pdf.
Shao, J., Klein, M., and Xu, J. (2012). "Imputation for Nonmonotone Nonresponse in the Survey of Industrial Research and Development," Survey Methodology, 38, 143-155.
Klein, M., and Wright, T. (2011). "Ranking Procedures for Several Normal Populations: An Empirical Investigation," International Journal of Statistical Sciences, 11, 37-58.
Klein, M., and Creecy, R. (2010). "Steps Toward Creating a Fully Synthetic Decennial Census Microdata File," Proceedings of the Joint Statistical Meetings, Alexandria, VA: American Statistical Association.

Contact: Martin Klein, Isaac Dompreh, Brett Moran, Bimal Sinha

Funding Sources for FY 2017:

  • 0331 - Working Capital Fund / General Research Project
    Various Decennial, Demographic, and Economic Projects

Contact

Tommy Wright, Center Chief, 301-763-1702
tommy.wright@census.gov

Kelly Taylor, Center Secretary, 301-763-4896
kelly.l.taylor@census.gov

Source: U.S. Census Bureau | Research and Methodology Directorate | Center for Statistical Research & Methodology | (301) 763-9862 (or lauren.emanuel@census.gov) |   Last Revised: January 25, 2017