Kernel-penalized regression

How to install?

  • This R package implements estimation and inference methods for kernel-penalized regression (KPR) models, designed for two-way structured high-dimensional data. Here, two-way structured data refers to data with auxiliary structure on both the rows and the columns.
  • The development version of this package is available on GitHub. A vignette on how to use the package is also available here.
  • Credit to Yue Wang and Parker Knight.

Why should someone use KPR?

  • KPR provides estimation and inference for two-way structured high-dimensional data, such as microbiome data. Structure among microbiome samples is often available in the form of non-Euclidean distances, including the popular UniFrac distances. Structure among microbial taxa is often available via a phylogenetic tree.
  • KPR results can be visualized using biplots, similar to how principal component analysis results are presented. Biplots for two-way structured data were studied in detail here.
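
As a rough illustration of how a sample-by-sample distance matrix (such as UniFrac) can be turned into a similarity kernel, the sketch below applies Gower double centering, the same transformation that underlies PCoA. It is written in plain Python with no dependencies and does not call the R package. The function name `gower_center` and the toy data are hypothetical, and whether the package performs exactly this step internally is an assumption.

```python
def gower_center(dist):
    """Turn an n x n distance matrix into a similarity kernel by double centering.

    Computes H = -1/2 * J * D2 * J, where D2 holds the squared distances and
    J = I - (1/n) * 1 1^T is the centering matrix.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]  # squared distances
    row_mean = [sum(row) / n for row in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]


# Toy example (made up): three samples on a line at positions 0, 1 and 3.
toy_dist = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
kernel = gower_center(toy_dist)
```

For Euclidean distances the result equals the matrix of inner products of the centered coordinates, so every row of `kernel` sums to zero. For non-Euclidean distances the centered matrix can have negative eigenvalues, which may need separate handling before it is used as a kernel.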

How does it compare to other methods?

  • KPR relies on the idea of generalized matrix decomposition to estimate the association between structured high-dimensional predictors and a continuous outcome. It is the first method to allow for correlated samples, whereas existing methods assume independent samples.
  • KPR provides statistical inference for the association between each predictor and the outcome, using recent inference procedures based on de-biasing.
  • KPR does not assume sparsity of the association between high-dimensional predictors and the outcome. Instead, the auxiliary structures on variable similarities inform how the predictors are associated with the outcome.
  • In ecology, principal coordinate analysis (PCoA) is often used for exploratory analysis. However, a PCoA plot does not reveal which taxa are related to the observed sample clustering, because the configuration of samples is not based on a coordinate system in which both the samples and the variables can be represented. In contrast, the KPR biplot displays both the samples and the taxa in the same coordinate system, which also allows future samples to be placed in that configuration.
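
To make the contrast with sparsity-based methods more concrete, the following is a sketch of the generalized ridge form in which kernel-penalized regression is commonly written. The notation is introduced here and is not taken from this README: $X$ is the $n \times p$ predictor matrix, $y$ the outcome, $H$ an $n \times n$ sample kernel, $Q$ a $p \times p$ variable kernel (assumed invertible), and $\lambda > 0$ a tuning parameter. Whether the package uses exactly this parameterization is an assumption.

```latex
\hat{\beta}
  = \arg\min_{\beta} \;
    \| y - X\beta \|_{H}^{2} + \lambda \, \| \beta \|_{Q^{-1}}^{2},
\qquad \text{where } \| v \|_{A}^{2} = v^{\top} A v .

% Setting the gradient to zero gives the closed form
\hat{\beta}
  = \left( X^{\top} H X + \lambda \, Q^{-1} \right)^{-1} X^{\top} H y .
```

With $H = I$ and $Q = I$ this reduces to ordinary ridge regression. The penalty shrinks coefficients according to the variable kernel rather than forcing most of them to zero, which is how the auxiliary structure can inform the estimate without a sparsity assumption.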