A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data

DNA methylation is an important epigenetic modification involved in regulating gene expression. It can be measured at base-pair resolution, on a genome-wide scale, by coupling sodium bisulfite conversion with high-throughput sequencing (a technique known as ‘bisulfite sequencing’). However, the data generated by such methods present several challenges for statistical analysis. In particular, while the raw data generated from bisulfite sequencing experiments are read counts, they are often converted to proportions for ease of modeling, resulting in loss of information. Furthermore, although DNA methylation levels are known to be heritable—and are thus affected by kinship and population structure—existing approaches for modeling bisulfite sequencing data fail to account for this covariance. Such failure can lead to spurious associations and reduced power. Here, we present a new approach that models bisulfite sequencing data using raw read counts, while also taking into account population structure and other sources of data over-dispersion. Using simulations and two real data sets (publicly available data from Arabidopsis thaliana and newly generated data from Papio cynocephalus), we demonstrate that our model provides well-calibrated p-values and improves power compared with previous methods. In addition, the DNA methylation patterns identified by our method agree with those reported in previous studies.
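
The abstract's point about proportions can be made concrete. Converting counts to proportions maps 1 methylated read out of 2 and 50 out of 100 to the same value, 0.5. However, the binomial standard error of that estimate, sqrt(p(1 − p)/r) for coverage r, is about seven times larger at a coverage of 2. The Python sketch below shows this and also simulates counts from a binomial mixed model of the general kind described here. This abstract does not give the exact model specification, so the form used is an assumption: a logit link, a genetic random effect with covariance proportional to a kinship matrix K, and an independent residual term for extra over-dispersion. The function names `binomial_se`, `cholesky` and `simulate_counts`, and all parameter values, are hypothetical.

```python
import math
import random


def binomial_se(methylated: int, total: int) -> float:
    """Standard error of the estimated methylation proportion under a binomial model."""
    p = methylated / total
    return math.sqrt(p * (1.0 - p) / total)


def cholesky(a):
    """Lower-triangular Cholesky factor L (with L L^T = a) of a small SPD matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def simulate_counts(kinship, coverage, x, beta, sigma2_g, sigma2_e,
                    intercept=0.0, seed=0):
    """Simulate methylated read counts for one site across individuals.

    Assumed model form (illustrative, not taken from the source):
        y_i ~ Binomial(r_i, pi_i)
        logit(pi_i) = intercept + beta * x_i + g_i + e_i
        g ~ MVN(0, sigma2_g * K),  e_i ~ N(0, sigma2_e) independently
    """
    rng = random.Random(seed)
    n = len(coverage)
    L = cholesky(kinship)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # g = sqrt(sigma2_g) * L z has covariance sigma2_g * K (kinship-induced correlation)
    g = [math.sqrt(sigma2_g) * sum(L[i][k] * z[k] for k in range(i + 1))
         for i in range(n)]
    counts = []
    for i in range(n):
        # Residual term e_i adds over-dispersion beyond the binomial
        eta = intercept + beta * x[i] + g[i] + rng.gauss(0.0, math.sqrt(sigma2_e))
        pi = 1.0 / (1.0 + math.exp(-eta))
        # Binomial draw as a sum of r_i Bernoulli trials (adequate for small coverage)
        counts.append(sum(rng.random() < pi for _ in range(coverage[i])))
    return counts
```

For example, `binomial_se(1, 2)` returns about 0.35 while `binomial_se(50, 100)` returns 0.05. With a hypothetical three-individual kinship matrix, `simulate_counts(K, [20, 35, 12], [0.0, 1.0, 1.0], beta=0.8, sigma2_g=0.5, sigma2_e=0.2)` yields one methylated-read count per individual, each bounded by its coverage. Drawing the binomial as a sum of Bernoulli trials keeps the sketch dependency-free. Analysing real data would mean fitting, rather than simulating, such a model with a numerical library.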

Published in: A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data. PLoS Genet 11(11): e1005650. doi:10.1371/journal.pgen.1005650
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1005650


