High-throughput technologies such as microarrays, SNP arrays, array-CGH, methylation arrays, exome sequencing, and RNA sequencing have generated a wide variety of genomic data. To accurately estimate the underlying network structure from these data types, one needs to apply a network inference algorithm suited to the platform-specific data distribution.
In XMRF, data are modeled using their native distribution instead of being normalized to follow a Gaussian distribution. To accomplish this, our package implements methods for three families: Gaussian graphical models (GGM), Ising models (ISM), and Poisson family graphical models. The Poisson family includes the regular Poisson graphical model (PGM) as well as several variants: the truncated Poisson (TPGM), sub-linear Poisson (SPGM), and local Poisson (LPGM) (Allen and Liu, 2013) graphical models. Yang et al. (2013b) proposed these variants of the Poisson family because the regular Poisson graphical model permits only negative conditional dependencies between nodes; each variant relaxes this restriction so that both positive and negative conditional dependencies can be modeled. For genomic networks based on sequencing data, we recommend the LPGM variant proposed in Allen and Liu (2013), noting that this local model closely approximates the proper MRF distribution of the SPGM formulation (Yang et al., 2013b). The following table gives our recommendations for which distributional family to use for data from different high-throughput platforms.
| Platform | Data type | XMRF method |
|---|---|---|
| RNA-sequencing | Count data | LPGM, SPGM |
| Microarray / Methylation | Gaussian data | GGM |
| Mutation / CNV | Binary data | ISM |
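
The restriction noted above, that the regular Poisson graphical model permits only negative conditional dependencies, can be checked numerically. The sketch below is not part of XMRF. It assumes the standard pairwise Poisson MRF form for two nodes, with unnormalized log-mass θ1·x1 + θ2·x2 + θ12·x1·x2 − log(x1!) − log(x2!). The function name `log_partial_mass` and the parameter values are illustrative choices only. For a negative interaction the partial sums of the mass settle to a finite value, whereas for a positive interaction they keep growing, so no proper normalizing constant exists.

```python
import math


def log_partial_mass(theta_12, max_count, theta_1=0.0, theta_2=0.0):
    """Log of the unnormalized two-node Poisson MRF mass summed over counts 0..max_count.

    Assumed form (not taken from the XMRF source):
        theta_1*x1 + theta_2*x2 + theta_12*x1*x2 - log(x1!) - log(x2!)
    """
    log_terms = []
    for x1 in range(max_count + 1):
        for x2 in range(max_count + 1):
            log_terms.append(
                theta_1 * x1
                + theta_2 * x2
                + theta_12 * x1 * x2
                - math.lgamma(x1 + 1)  # log(x1!)
                - math.lgamma(x2 + 1)  # log(x2!)
            )
    # Log-sum-exp: subtract the largest term so exp() never overflows.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


# Compare a negative and a positive interaction as the count range widens.
for theta_12 in (-0.5, 0.5):
    values = [log_partial_mass(theta_12, k) for k in (10, 20, 40)]
    print(theta_12, [round(v, 3) for v in values])
```

With the negative interaction the three printed values agree to the displayed precision, while with the positive interaction each value is far larger than the previous one. This is the behaviour that the TPGM, SPGM, and LPGM variants are designed to avoid while still allowing positive dependencies.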