
10 types of regressions. Which one to use?

Should you use linear or logistic regression? In what contexts? There are hundreds of types of regressions. Here is an overview for data scientists and other analytic practitioners, to help you decide which regression to use depending on your context. Many of the referenced articles are much better written (fully edited) in my data science Wiley book.


  • Linear regression: Oldest type of regression, designed 250 years ago; computations (on small data) could easily be carried out by a human being, by design. Can be used for interpolation, but not suitable for predictive analytics; has many drawbacks when applied to modern data, e.g. sensitivity to both outliers and cross-correlations (both in the variable and observation domains), and subject to over-fitting. A better solution is piecewise-linear regression, in particular for time series. (A short code sketch after this list compares several of the linear-family methods.)
  • Logistic regression: Used extensively in clinical trials, scoring and fraud detection, when the response is binary (chance of succeeding or failing, e.g. for a new tested drug or a credit card transaction). Suffers from the same drawbacks as linear regression (not robust, model-dependent), and computing the regression coefficients involves a complex, iterative, numerically unstable algorithm. Can be well approximated by linear regression after transforming the response (logit transform). Some versions (Poisson or Cox regression) have been designed for a non-binary response: categorical data (classification), ordered integer response (age groups), and even continuous response (regression trees).
  • Ridge regression: A more robust version of linear regression, putting constraints on the regression coefficients to make them much more natural, less subject to over-fitting, and easier to interpret. Click here for source code.
  • Lasso regression: Similar to ridge regression, but automatically performs variable reduction (allowing regression coefficients to be zero). 
  • Ecologic regression: Consists of performing one regression per stratum, if your data is segmented into several rather large core strata, groups, or bins. Beware of the curse of big data in this context: if you perform millions of regressions, some will be totally wrong, and the best ones will be overshadowed by noisy ones with great but artificial goodness-of-fit: a big concern if you try to identify extreme events and causal relationships (global warming, rare diseases or extreme flood modeling). Here's a fix to this problem.
  • Regression in unusual spaces: click here for details. Example: to detect whether meteorite fragments come from the same celestial body, or to reverse-engineer the Coca-Cola formula.
  • Logic regression: Used when all variables are binary, typically in scoring algorithms. It is a specialized, more robust form of logistic regression (useful for fraud detection where each variable is a 0/1 rule), where all variables have been binned into binary variables.
  • Bayesian regression: see the entry in Wikipedia. It's a kind of penalized likelihood estimator, and thus somewhat similar to ridge regression: more flexible and stable than traditional linear regression. It assumes that you have some prior knowledge about the regression coefficients and the error term, relaxing the assumption that the error must have a normal distribution (the error must still be independent across observations). However, in practice, the prior knowledge is translated into artificial (conjugate) priors, a weakness of this technique.
  • Quantile regression: Used in connection with extreme events, read Common Errors in Statistics page 238 for details.
  • LAD regression: Similar to linear regression, but using absolute values (L1 space) rather than squares (L2 space). More robust, see also our L1 metric to assess goodness-of-fit (better than R^2) and our L1 variance (one version of which is scale-invariant).
  • Jackknife regression: This is a new type of regression, also used as a general clustering and data reduction technique. It solves all the drawbacks of traditional regression. It provides an approximate, yet very accurate and robust, solution to regression problems, and works well with "independent" variables that are correlated and/or non-normal (for instance, data distributed according to a mixture model with several modes). Ideal for black-box predictive algorithms. It approximates linear regression quite well, but it is much more robust, and works when the assumptions of traditional regression (uncorrelated variables, normal data, homoscedasticity) are violated.
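
To make the list concrete, here is a minimal sketch (not the author's code) fitting several of the linear-family methods above, namely ordinary linear, ridge, lasso, Bayesian ridge and LAD/median regression, side by side with scikit-learn; the synthetic data and parameter values are illustrative assumptions.

# Minimal sketch (assumption: scikit-learn is available) comparing several of
# the linear-family regressions discussed above on synthetic, heavy-tailed data.
import numpy as np
from sklearn.linear_model import (LinearRegression, Ridge, Lasso,
                                  BayesianRidge, QuantileRegressor)

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.0])          # true coefficients
y = X @ beta + rng.standard_t(df=3, size=n)          # heavy-tailed noise (outliers)

models = {
    "linear (OLS, L2 loss)":                  LinearRegression(),
    "ridge (L2 penalty)":                     Ridge(alpha=1.0),
    "lasso (L1 penalty, variable reduction)": Lasso(alpha=0.1),
    "Bayesian ridge":                         BayesianRidge(),
    "LAD / median regression (L1 loss)":      QuantileRegressor(quantile=0.5, alpha=0.0),
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:40s} coefficients: {np.round(model.coef_, 2)}")

QuantileRegressor with quantile=0.5 and no penalty acts as LAD (L1-loss) regression; other quantiles (e.g. 0.95) give the quantile regression mentioned above for extreme events, and the lasso drives the truly-zero coefficients toward zero, a simple form of variable reduction.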

Note: Jackknife regression has nothing to do with Bradley Efron's Jackknife, bootstrap and other re-sampling techniques published in 1982; indeed it has nothing to do with re-sampling techniques.

Other Solutions

  • Data reduction can also be performed with our feature selection algorithm.
  • It's always a good idea to blend multiple techniques together to improve your regression, clustering or segmentation algorithms. An example of such blending is hidden decision trees.
  • Categorical independent variables, such as race, are sometimes coded using multiple (binary) dummy variables (see the sketch below).
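
As a small illustrative sketch (the data and column names below are made up), one common way to create such dummy variables in Python is pandas.get_dummies:

import pandas as pd

# Hypothetical data: 'race' is a categorical independent variable.
df = pd.DataFrame({"race": ["A", "B", "C", "B"], "income": [50, 60, 55, 65]})

# One binary (0/1) dummy column per category; drop_first=True avoids the
# dummy-variable trap (perfect collinearity with the intercept).
dummies = pd.get_dummies(df["race"], prefix="race", drop_first=True)
X = pd.concat([df.drop(columns="race"), dummies], axis=1)
print(X)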

Before working on any project, read our article on the lifecycle of a data science project.

Comments


Comment by Alan Dunham on June 2, 2016 at 6:03am

Nice thumbnail outline. FYI, the term 'jackknife' was also used by Bottenberg and Ward, Applied Multiple Linear Regression, in the '60s and '70s, but in the context of segmenting. As mentioned by Kalyanaraman in this thread, econometrics offers other approaches to addressing multicollinearity, autocorrelation in time series data, solving simultaneous equation systems, heteroskedasticity, and over- and under-identification.

Comment by Jamie Lawson on January 9, 2016 at 6:07pm

I'm puzzled why there isn't more attention here to the underlying model. If you have strong reason to believe that the underlying model is linear, then linear regression is fine. If you have strong reason to believe it's sigmoidal, then linear regression is an unlikely candidate. What it usually boils down to, in my experience, is defining the model, and defining the norm. Answers to those two questions pretty much define the problem that you are solving, and given that, there is a (usually) unique solution. It is frustrating to me when I see people typing stuff in at the keyboard but they don't have a solid description of the problem they are solving. Once you have that problem definition, the specific method of solution is often pretty clear.

Comment by imry kissos on April 7, 2015 at 9:48am

Using scikit-learn's diabetes dataset, I created a visualization to show the support vector positions in SVR.

I also created a visualization of different regression methods on the same data set, using non-optimized hyper-parameters.



Comment by imry kissos on April 7, 2015 at 9:24am

Another type of regression that I find very useful is Support Vector Regression, proposed by Vapnik, which comes in two flavors:

SVR (Python: sklearn.svm.SVR): the regression depends only on support vectors from the training data. The cost function for building the model ignores any training data epsilon-close to the model prediction.

NuSVR (Python: sklearn.svm.NuSVR): lets you limit the number of support vectors used by the SVR.

As in support vector classification, in SVR different kernels can be used in order to build more complex models using the kernel trick.
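
For readers who want to try this, here is a minimal sketch along the lines of imry's description, using scikit-learn's diabetes dataset; the kernel and hyper-parameter values are illustrative, not tuned.

from sklearn.datasets import load_diabetes
from sklearn.svm import SVR, NuSVR

X, y = load_diabetes(return_X_y=True)

# epsilon-SVR: training points within epsilon of the prediction do not
# contribute to the loss; only the remaining support vectors define the fit.
svr = SVR(kernel="rbf", C=10.0, epsilon=5.0).fit(X, y)

# Nu-SVR: nu upper-bounds the fraction of margin errors and lower-bounds
# the fraction of training points kept as support vectors.
nu_svr = NuSVR(kernel="rbf", C=10.0, nu=0.3).fit(X, y)

print("SVR support vectors:  ", svr.support_vectors_.shape[0])
print("NuSVR support vectors:", nu_svr.support_vectors_.shape[0])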


Comment by J.T. Radman on July 31, 2014 at 7:42pm

What are folks' thoughts on MARS (Multivariate Adaptive Regression Splines) as far as regression techniques go?  R: earth. Python: py-earth. Salford Systems owns the MARS implementation.

http://www.slideshare.net/salfordsystems/evolution-of-regression-ol...
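
A minimal sketch, assuming the py-earth package mentioned above is available (its Earth estimator follows the scikit-learn fit/predict convention); this is illustrative and not Salford Systems' commercial MARS:

import numpy as np
from pyearth import Earth   # assumption: py-earth installed (scikit-learn-contrib project)

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.abs(X[:, 0]) + rng.normal(scale=0.2, size=300)   # piecewise-linear ground truth

model = Earth(max_degree=1)      # additive model built from hinge basis functions
model.fit(X, y)
print(model.summary())           # lists the selected basis functions (knots)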

Comment by Iga Korneta on July 30, 2014 at 8:48am

I'd love to see a case study, to show how different methods provide different results.

Comment by Vincent Granville on July 24, 2014 at 6:37am

About R implementations, here is a comment by Alan Parker (see also Amy's comment below):

The CRAN task view “Robust statistical methods” gives a long list of regression methods, including many that Vincent mentions. Here are some that are not mentioned there:

Regression in unusual spaces. This subject is old. It is usually addressed under the title “Compositional data” (see Wikipedia entry). The late John Aitchison founded this area of statistics. Googling his name + “compositional data” gives access to a number of his articles. The R package “compositions” deals with it comprehensively. Another package treats the problem using robust statistics: “robCompositions”. 

Bayesian regression. I find Bayesian stuff conceptually hard, so I am using John Kruschke’s friendly book: “Doing Bayesian data analysis”. Chapter 16 is on linear regression. He provides a free R package to carry out all the analyses in the book. The CRAN view “Bayesian” has many other suggestions. Package BMA does linear regression, but packages for Bayesian versions of many other types of regression are also mentioned. 

Comment by Kalyanaraman K on July 24, 2014 at 5:00am
Yes. ARIMA is one among the models I considered.
Comment by Mirko Krivanek on July 23, 2014 at 8:09am

I think what Kalyanaraman has in mind is auto-regressive models for time series, like ARIMA processes and Box & Jenkins types of tools to estimate the parameters. A simple form is x(t) = a * x(t-1) + b * x(t-2) + error, where t is the time and a, b are the "regression" coefficients: positive numbers satisfying a + b < 1, otherwise the time series is non-stationary or explodes.
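
A tiny simulation sketch of this model (the coefficient values are illustrative, not from the thread):

import numpy as np

def simulate_ar2(a, b, n=200, seed=0):
    # Simulate x(t) = a*x(t-1) + b*x(t-2) + error with Gaussian noise.
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(2, n):
        x[t] = a * x[t - 1] + b * x[t - 2] + rng.normal()
    return x

stationary = simulate_ar2(a=0.5, b=0.3)   # a + b < 1: fluctuates around zero
explosive  = simulate_ar2(a=0.8, b=0.4)   # a + b > 1: blows up
print("max |x|, stationary:", np.abs(stationary).max())
print("max |x|, explosive: ", np.abs(explosive).max())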

Comment by Kalyanaraman K on July 23, 2014 at 5:05am
Hi Vincent
I was thinking about the class of regressions where the data vary over time, say time series. You may know that econometric methods contain a lot of alternative versions of regression, depending upon the type of violation of the basic assumptions of the linear model. You are right when you say jackknife and transformations may address some of these issues, but not all. Thus there are regressions with appropriate transformations to control heteroscedasticity; regressions with AR(1) disturbances; regressions with distributed lags or a geometric lag structure of explanatory variables; regressions with lagged explained variables, leading to the partial adjustment and adaptive expectation models; regressions with stochastic regressors; and regressions with errors in measurement, leading to regression with instrumental variables. Above all, there is the problem of co-integrated models in regression. I was just adding to your list.
Kalyanaraman
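
As a hedged illustration of two of the fixes Kalyanaraman mentions (heteroscedasticity-robust inference and AR(1) disturbances), here is a small statsmodels sketch; the simulated data and settings are assumptions for demonstration only.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
X = sm.add_constant(x)

# Heteroscedastic errors: the noise variance grows with |x|.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + np.abs(x), size=n)
ols = sm.OLS(y, X).fit(cov_type="HC1")    # White/HC-robust standard errors
print(ols.bse)                             # robust standard errors of the coefficients

# AR(1) disturbances: feasible GLS with an estimated autoregressive error term.
glsar = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=5)
print(glsar.params)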
