Introduction

XMRF is an R package that enables biomedical researchers to discover complex interactions between genes from multi-dimensional genomics data. These interactions are captured mathematically as Markov networks. The methods in the package support data produced by several high-throughput genomic profiling technologies, including gene expression from DNA microarrays, genomic mutations from SNP arrays, copy number variation from array-CGH, DNA methylation profiles from methylation arrays, and read counts from next-generation sequencing such as RNA-Seq.


This package implements the recently proposed parametric family of graphical models built from node-conditional univariate exponential family distributions (Yang et al., 2012; Yang et al., 2013a). Specifically, the package provides methods for estimating Gaussian graphical models (Meinshausen and Bühlmann, 2006), Ising models (Ravikumar et al., 2010), and Poisson-family graphical models (Allen and Liu, 2013; Yang et al., 2013b; Allen and Liu, 2012). This lets the user choose the graphical model that matches the native distribution of the genomics data, for example a Poisson-family model for RNA-Seq read counts or an Ising model for binary mutation calls.
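For reference, the general form of this family can be written as follows. This is a sketch of the formulation in Yang et al. (2012) in our own notation, not taken from the XMRF source: $B(\cdot)$ is the sufficient statistic, $C(\cdot)$ the base measure, $N(s)$ the set of neighbors of node $s$, and $\bar{D}(\cdot)$ and $A(\theta)$ are log-normalizing constants. Each node, conditioned on all the others, follows a univariate exponential family distribution:

$$
P\left(X_s \mid X_{V\setminus s}\right) = \exp\Big\{ \theta_s B(X_s) + \sum_{t \in N(s)} \theta_{st} B(X_s) B(X_t) + C(X_s) - \bar{D}\big(X_{V\setminus s}\big) \Big\}
$$

and these node-conditionals are consistent with the joint Markov network

$$
P(X) = \exp\Big\{ \sum_{s \in V} \theta_s B(X_s) + \sum_{(s,t) \in E} \theta_{st} B(X_s) B(X_t) + \sum_{s \in V} C(X_s) - A(\theta) \Big\}.
$$

Choosing $B$ and $C$ picks the model: the Ising model has $X_s \in \{-1, +1\}$, $B(x) = x$, $C(x) = 0$; a unit-variance Gaussian has $B(x) = x$, $C(x) = -x^2/2$; the Poisson has $X_s \in \{0, 1, 2, \dots\}$, $B(x) = x$, $C(x) = -\log(x!)$. An edge $(s,t)$ is present exactly when $\theta_{st} \neq 0$, so estimating the network amounts to estimating which $\theta_{st}$ are nonzero. The plain Poisson graphical model is normalizable only when every $\theta_{st} \le 0$, i.e. it can only represent negative dependence, and this restriction is what Poisson-family variants such as those of Yang et al. (2013b) are designed to relax.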


To choose the sparsity of a fitted network in a data-driven way, the package implements two approaches: a stability-based approach that uses a single regularization value over many bootstrap resamples (Meinshausen and Bühlmann, 2010), and an averaged stability-based approach computed over a range of regularization values via StARS (Liu et al., 2010).
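The two selection criteria can be sketched as below. This is a minimal Python illustration of the criteria as described by Meinshausen and Bühlmann (2010) and Liu et al. (2010), not XMRF's R code: the function names, the 0.9 selection threshold and the β = 0.05 instability cutoff are hypothetical illustrative choices, and each resample is assumed to have already been fitted and reduced to a set of edges (s, t) with s < t. StARS scores each edge by 2f(1 − f), where f is its selection frequency, averages over all edges, makes the curve monotone, and keeps the densest network whose instability stays below β.

```python
from itertools import combinations


def edge_frequencies(edge_sets, p):
    """Fraction of resampled fits in which each edge (s, t), s < t, was selected.

    edge_sets: one set of (s, t) tuples per resample (assumed already fitted).
    """
    return {
        (s, t): sum((s, t) in es for es in edge_sets) / len(edge_sets)
        for s, t in combinations(range(p), 2)
    }


def stable_edges(freq, threshold=0.9):
    """Stability selection at one lambda: keep edges selected in >= threshold of resamples.

    The 0.9 default is a hypothetical illustrative value.
    """
    return {edge for edge, f in freq.items() if f >= threshold}


def stars_instability(freq):
    """StARS total instability: mean of 2 f (1 - f) over all p-choose-2 edges."""
    return sum(2 * f * (1 - f) for f in freq.values()) / len(freq)


def stars_select(lambdas, freqs, beta=0.05):
    """Return the smallest lambda whose monotonized instability is <= beta.

    lambdas must be in decreasing order (sparsest network first); freqs[k] holds
    the edge frequencies at lambdas[k]. Falls back to lambdas[0] if even the
    sparsest fit is too unstable (an assumption of this sketch).
    """
    chosen = lambdas[0]
    running_max = 0.0  # max instability over all lambdas >= the current one
    for lam, freq in zip(lambdas, freqs):
        running_max = max(running_max, stars_instability(freq))
        if running_max > beta:
            break
        chosen = lam
    return chosen


if __name__ == "__main__":
    # Toy edge sets for p = 3 nodes and 4 resamples at three lambdas.
    runs = [
        [set(), set(), set(), set()],
        [{(0, 1)}] * 4,
        [{(0, 1), (1, 2)}, {(0, 1)}, {(0, 1), (1, 2)}, {(0, 1), (0, 2)}],
    ]
    freqs = [edge_frequencies(r, 3) for r in runs]
    print(stars_select([0.5, 0.3, 0.1], freqs))  # → 0.3
    print(stable_edges(freqs[2]))  # → {(0, 1)}
```

In the toy run, the network at λ = 0.1 is unstable because edges (1, 2) and (0, 2) appear in only some resamples, so StARS stops at λ = 0.3, while stability selection at λ = 0.1 keeps only the edge that appears in every resample.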


The graph estimation in this package follows the neighborhood selection approach: the neighborhood of each node is estimated independently by regressing that node on all other nodes with proximal or projected gradient descent, using warm starts along the range of regularization parameters (Liu et al., 2010; Meinshausen and Bühlmann, 2006). Because these node-wise fits are independent, they can run in parallel, and the package gains computational efficiency by distributing them across processes with functions from the parallel (R Core Team, 2013) and snowfall (Knaus, 2013) packages. The package also depends on the glmnet package (Friedman et al., 2010) for $L_1$-norm regularized linear regression, the igraph package (Csardi and Nepusz, 2006) for graph manipulation, and the TCGA2STAT package () to import and process high-throughput genomics data.
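The node-wise estimation can be sketched for the Gaussian case as follows. This is a minimal, hypothetical Python illustration, not XMRF's implementation: each node is regressed on all others with an $L_1$ penalty solved by proximal gradient descent (ISTA: a gradient step followed by soft-thresholding), each fit along a decreasing λ path is warm-started from the previous solution, and the per-node neighborhoods are combined with the OR rule of Meinshausen and Bühlmann (2006). The OR rule, centered data without an intercept, the fixed iteration count, and the thread pool (standing in for R's parallel/snowfall) are all assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def soft_threshold(z, a):
    """Proximal operator of a * |b|: shrink z toward zero by a."""
    if z > a:
        return z - a
    if z < -a:
        return z + a
    return 0.0


def lasso_prox_grad(X, y, lam, beta0, iters=500):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 with ISTA, starting from beta0.

    Assumes centered columns, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    # Step 1/L, where L = ||X||_F^2 / n bounds the Lipschitz constant of the gradient.
    L = sum(v * v for row in X for v in row) / n or 1.0
    b = list(beta0)
    for _ in range(iters):
        resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
        grad = [-sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step on the smooth loss, then soft-thresholding (the L1 prox).
        b = [soft_threshold(b[j] - grad[j] / L, lam / L) for j in range(p)]
    return b


def node_path(data, s, lambdas):
    """Regress node s on all other nodes along a decreasing lambda path with warm starts."""
    others = [t for t in range(len(data[0])) if t != s]
    X = [[row[t] for t in others] for row in data]
    y = [row[s] for row in data]
    b = [0.0] * len(others)
    path = []
    for lam in lambdas:  # largest lambda first
        b = lasso_prox_grad(X, y, lam, b)  # warm start from the previous solution
        path.append({others[j] for j, v in enumerate(b) if v != 0.0})
    return path


def neighborhood_selection(data, lambdas):
    """Return one edge set per lambda; data is a list of n rows with p values each."""
    p = len(data[0])
    # Node-wise fits are independent, so they can be mapped over nodes in parallel.
    with ThreadPoolExecutor() as pool:
        paths = list(pool.map(lambda s: node_path(data, s, lambdas), range(p)))
    # Hypothetical OR rule: keep (s, t) if either node selects the other.
    return [
        {(min(s, t), max(s, t)) for s in range(p) for t in paths[s][k]}
        for k in range(len(lambdas))
    ]


if __name__ == "__main__":
    # Toy data: columns x0 and x1 are correlated; x2 is orthogonal to both.
    data = [[1, 1, 1], [1, 2, -1], [-1, -1, 1], [-1, -2, -1]]
    print(neighborhood_selection(data, [2.0, 0.5]))  # → [set(), {(0, 1)}]
```

For the Poisson or Ising models, the squared-error loss would be replaced by the corresponding node-conditional negative log-likelihood; the soft-thresholding step, the warm starts and the independence across nodes stay the same.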



2015-05-29