Choosing the Right Graphical Model for Genomics Data

High-throughput technologies such as microarrays, SNP arrays, array-CGH, methylation arrays, exome sequencing, and RNA sequencing have generated a wide variety of genomic data. To estimate the underlying network structure accurately from these data types, one needs to apply a network inference algorithm that matches the platform-specific data distribution.


In XMRF, data are modeled using their native distribution instead of being normalized to follow a Gaussian distribution. To accomplish this, our package implements methods for three families: Gaussian graphical models (GGM), Ising models (ISM), and Poisson family graphical models. The Poisson family includes the regular Poisson graphical model (PGM) as well as several variants: the truncated Poisson (TPGM), sub-linear Poisson (SPGM), and local Poisson (LPGM) (Allen and Liu, 2013). These variants were proposed (Yang et al., 2013b) because the regular Poisson graphical model permits only negative conditional dependencies between nodes; each variant relaxes this restriction so that both positive and negative conditional dependencies can be modeled. For genomic networks based on sequencing data, we recommend the LPGM variant proposed in (Allen and Liu, 2013), noting that this local model closely approximates the proper MRF distribution of the SPGM formulation (Yang et al., 2013b). In the following table, we give our recommendations for which distributional family to use for data from different high-throughput platforms.


Data                       Data type        XMRF method
-------------------------  ---------------  -----------
RNA-Sequencing             Count data       LPGM, SPGM
Microarray / Methylation   Gaussian data    GGM
Mutation / CNV             Binary data      ISM
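
The recommendations in the table above can be sketched as a small lookup, shown here in Python purely for illustration. This is not part of XMRF: the names `RECOMMENDED_METHODS`, `recommend_by_platform`, and `guess_data_type` are hypothetical, and the data-type check is a rough heuristic that assumes a complete numeric matrix given as a list of rows. Only the method labels (LPGM, SPGM, GGM, ISM) come from the table.

```python
# Hypothetical helper, not part of XMRF: encodes the recommendation table
# as a dictionary keyed by lower-cased platform name.
RECOMMENDED_METHODS = {
    "rna-sequencing": ("count", ["LPGM", "SPGM"]),
    "microarray": ("gaussian", ["GGM"]),
    "methylation": ("gaussian", ["GGM"]),
    "mutation": ("binary", ["ISM"]),
    "cnv": ("binary", ["ISM"]),
}

# Data type -> recommended method labels, derived from the table above.
METHODS_BY_DATA_TYPE = {
    data_type: methods for data_type, methods in RECOMMENDED_METHODS.values()
}


def recommend_by_platform(platform):
    """Return the recommended method labels for a platform name."""
    key = platform.strip().lower()
    if key not in RECOMMENDED_METHODS:
        raise KeyError(f"no recommendation recorded for platform {platform!r}")
    return RECOMMENDED_METHODS[key][1]


def guess_data_type(rows):
    """Roughly classify a numeric matrix as 'binary', 'count', or 'gaussian'.

    Assumption: `rows` is a non-empty list of rows with no missing values.
    A matrix of 0/1 values is treated as binary, non-negative integers as
    counts, and anything else as continuous (Gaussian) data.
    """
    values = [float(v) for row in rows for v in row]
    if not values:
        raise ValueError("empty matrix")
    if all(v in (0.0, 1.0) for v in values):
        return "binary"
    if all(v >= 0 and v.is_integer() for v in values):
        return "count"
    return "gaussian"


def recommend_by_data(rows):
    """Return the recommended method labels based on the guessed data type."""
    return METHODS_BY_DATA_TYPE[guess_data_type(rows)]
```

For example, `recommend_by_platform("RNA-Sequencing")` returns the two count-data labels from the table, and `recommend_by_data` applied to a matrix of read counts gives the same answer. A heuristic like this cannot replace knowledge of the platform: a count matrix that happens to contain only zeros and ones would be classified as binary.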




2015-05-29