Stream classification methods classify a continuous stream of data as new labelled samples arrive. They often also have to deal with concept drift. This paper focuses on seasonal drift in stream classification, which can be found in many... more
Manifold learning has been successfully used for finding dominant factors (low-dimensional manifold) in a high-dimensional data set. However, most existing manifold learning algorithms only consider one manifold based on one dissimilarity... more
Contemporary biological technologies produce extremely high-dimensional data sets from which to design classifiers, with 20,000 or more potential features being commonplace. In addition, sample sizes tend to be small. In such settings,... more
This paper focuses on feature selection for problems dealing with high-dimensional data. We discuss the benefits of adopting a regularized approach with L1 or L1–L2 penalties in two different applications—microarray data analysis in... more
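The penalties mentioned in the abstract above can be illustrated through their proximal operators: the L1 penalty soft-thresholds coefficients (setting weak ones exactly to zero, which is what performs feature selection), while the mixed L1–L2 (elastic net) penalty adds an extra shrinkage factor. A minimal sketch, assuming an orthonormal design so the closed forms apply; the function names, `lam` values, and toy coefficients are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of the L1 penalty: shrink z toward 0 by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def elastic_net_prox(z, lam1, lam2):
    """Prox of the mixed L1-L2 penalty: soft-threshold, then shrink."""
    return soft_threshold(z, lam1) / (1.0 + lam2)

# Least-squares coefficients: two strong features, three weak ones.
beta_ols = np.array([4.0, -3.0, 0.3, -0.2, 0.1])
beta_lasso = soft_threshold(beta_ols, lam=0.5)
print(beta_lasso)                   # weak coefficients are zeroed out
print(np.flatnonzero(beta_lasso))   # indices of the selected features
```

With the L1 penalty, any coefficient whose magnitude falls below `lam` is set exactly to zero, which is why these penalties select features rather than merely shrinking them.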
Given a data set consisting of a large number of predictors plus a response, the problem addressed in this work is to select a minimal model which correctly predicts the response. Methods for achieving this subsetting of the predictors... more
Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. Often in high dimensional data, many dimensions are irrelevant and can mask existing clusters in noisy... more
Dimension reduction and variable selection are two types of effective methods that deal with high-dimensional data. In particular, variable selection techniques are of widespread use and essentially consist of individual selection... more
Data compression is the most important step in many signal processing and pattern recognition applications. We come across very high dimensional data in such applications. Before processing of large-dimensional datasets, we need to... more
This article introduces the sparse group fused lasso (SGFL) as a statistical framework for segmenting sparse regression models with multivariate time series. To compute solutions of the SGFL, a nonsmooth and nonseparable convex program,... more
GAMLSS is a general framework for fitting regression type models where the distribution of the response variable does not have to belong to the exponential family and includes highly skewed and kurtotic continuous and discrete distributions.... more
This PhD thesis addresses recent issues in solving high-dimensional problems. We present methods designed to solve them, and their applications to feature selection problems in data mining. In the first... more
MXM is an R package which offers variable selection for high-dimensional data in cases of regression and classification. Many regression models are offered. In addition some functions for Bayesian Networks and graphical models are... more
We study the asymptotic properties of the Adaptive LASSO (adaLASSO) in sparse, high-dimensional, linear time-series models. The adaLASSO is a one-step implementation of the family of folded concave penalized least-squares. We assume that... more
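The adaptive LASSO studied above can be sketched through its core reweighting idea: a second-stage lasso whose per-coefficient penalties are scaled by a first-stage estimate, so coefficients that were large initially are penalized less. A minimal sketch under that assumption; `gamma`, `eps`, and the toy numbers are illustrative, not taken from the paper:

```python
import numpy as np

def adalasso_step(beta_init, z, lam, gamma=1.0, eps=1e-8):
    """One adaptive soft-thresholding pass with weights 1/|beta_init|^gamma.

    beta_init : first-stage (e.g. plain lasso or ridge) coefficient estimate
    z         : coefficients to be shrunk in the second stage
    """
    weights = 1.0 / (np.abs(beta_init) + eps) ** gamma
    return np.sign(z) * np.maximum(np.abs(z) - lam * weights, 0.0)

beta_init = np.array([5.0, 0.05])   # one strong, one weak first-stage estimate
z = np.array([4.0, 0.04])
print(adalasso_step(beta_init, z, lam=0.1))
# the strong coefficient is barely shrunk; the weak one is zeroed exactly
```

The feature-specific weights are what give the adaptive LASSO its oracle-type behavior relative to the plain lasso, which shrinks all coefficients by the same amount.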
Hypothesis tests in models whose dimension far exceeds the sample size can be formulated much like the classical studentized tests only after the initial bias of estimation is removed successfully. The theory of debiased estimators can be... more
This paper develops robust confidence intervals in high-dimensional and left-censored regression. Type-I censored regression models are extremely common in practice, where a competing event makes the variable of interest unobservable.... more
Cluster analysis divides data into groups (clusters) for the purposes of summarization or improved understanding. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have... more
Possible companion paper to How to Build a 21-Dimensional Universe. I'm not sure it's compatible, but there is some similarity. This is a later development, though not necessarily an improvement.