An Immune Response Network Associated with Blood Lipid Levels
While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci which are associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. Thus, this study generated genome-wide profiles of both genetic and transcriptional variation from the total blood extracts of over 500 randomly-selected, unrelated individuals. Using measurements of blood lipids, key players in the progression of atherosclerosis, three levels of biological information are integrated in order to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pair-wise correlations between gene expression and lipid concentration indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels, a key antibody in allergy. Structural Equation Modeling (SEM) indicated that LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types and offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and the potential control of atherogenesis.
Published in the journal:
An Immune Response Network Associated with Blood Lipid Levels. PLoS Genet 6(9): e32767. doi:10.1371/journal.pgen.1001113
Category:
Research Article
doi:
https://doi.org/10.1371/journal.pgen.1001113
Summary
While recent scans for genetic variation associated with human disease have been immensely successful in uncovering large numbers of loci, far fewer studies have focused on the underlying pathways of disease pathogenesis. Many loci which are associated with disease and complex phenotypes map to non-coding, regulatory regions of the genome, indicating that modulation of gene transcription plays a key role. Thus, this study generated genome-wide profiles of both genetic and transcriptional variation from the total blood extracts of over 500 randomly-selected, unrelated individuals. Using measurements of blood lipids, key players in the progression of atherosclerosis, three levels of biological information are integrated in order to investigate the interactions between circulating leukocytes and proximal lipid compounds. Pair-wise correlations between gene expression and lipid concentration indicate a prominent role for basophil granulocytes and mast cells, cell types central to powerful allergic and inflammatory responses. Network analysis of gene co-expression showed that the top associations function as part of a single, previously unknown gene module, the Lipid Leukocyte (LL) module. This module replicated in T cells from an independent cohort while also displaying potential tissue specificity. Further, genetic variation driving LL module expression included the single nucleotide polymorphism (SNP) most strongly associated with serum immunoglobulin E (IgE) levels, a key antibody in allergy. Structural Equation Modeling (SEM) indicated that LL module is at least partially reactive to blood lipid levels. Taken together, this study uncovers a gene network linking blood lipids and circulating cell types and offers insight into the hypothesis that the inflammatory response plays a prominent role in metabolism and the potential control of atherogenesis.
Introduction
Blood lipid levels have long been known to be important markers of coronary artery disease and the underlying pathology of atherosclerosis [1], [2]. High-density lipoprotein cholesterol (HDL) is a small, dense complex of phospholipids and apolipoproteins, including apolipoprotein A1 (APOA1), which is synthesized in the liver and has been shown to be negatively associated with atherogenesis. Low-density lipoprotein cholesterol (LDL) displays a positive association with atherogenesis and contains one apolipoprotein, apolipoprotein B (APOB), as well as numerous fatty acids, lipids, and cholesterols. Atherosclerosis entails the buildup of LDL deposits in the arterial wall where they undergo oxidation and subsequent internalization by macrophages, an inflammatory response, leading to the formation of foam cells and further inflammatory signals which can exacerbate arterial LDL adhesion, leading to stenosis [3].
Genome-wide association studies (GWAS) have yielded many successes in the search for the common genetic variants underlying blood lipid levels and other metabolic traits [4]–[7], however systematic functional investigation of pathways, particularly lipid pathways, has lagged behind. Recently, the link between the inflammatory response and metabolism has been the subject of intense research [8], [9]. Chronic inflammation has been shown to lead to the activation of c-Jun amino-terminal kinases [10], [11], and plasma triglyceride levels have been associated with various mediators of NF-ΚB, a key component of the immune response [12]–[15]. Further, it has been shown that postprandial triglyceride increase activates monocytes and neutrophils and the cardioprotective properties of HDL might be partially mediated by activation of the complement cascade [16], [17]. Recently, two companion studies demonstrated both an enrichment of immune pathways in metabolic syndrome and the utility of integrating genomic and transcriptional variation [18], [19]. In particular, they identify a gene expression network of macrophage origin which is likely to be causative of various metabolic traits.
The proximity of lipid compounds and leukocytes in peripheral blood offers a uniquely accessible system in which to study their interactions. We utilized total blood samples from a population-based cohort of 518 unrelated individuals (240 males and 278 females, aged 25–74 years) from the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study which have undergone both genome-wide expression profiling and genome-wide genotyping with imputation. After quality filtering (Materials and Methods), 35,419 expression probes and 541,654 SNPs (2,061,516 SNPs after imputation) were taken forward for further analyses. We first assessed how single gene expression correlated with both the specific lipid measurements of the DILGOM cohort and overall variation in lipids (Table 1) then performed network analyses to identify and characterize clusters of tightly co-expressed genes, modules, which showed strong association with lipids. Replication and tissue specificity of a particular module, termed the Lipid Leukocyte (LL) module, was investigated in an independent cohort of B cells and T cells. Finally, genetic variation was assessed both to identify expression quantitative trait loci (eQTLs) driving expression of LL module, thus connecting our findings with that of a previously published GWAS, and to construct an edge-oriented network to elucidate the chain of causality.
Results and Discussion
To assess how each lipid trait associated with gene expression, levels of HDL, LDL, APOB, APOA1, total serum cholesterol (TC), triglycerides (TG), and free fatty acids (FFA) were modeled using multiple linear regression with appropriate covariates (Table 2) and a Bonferroni adjusted significance level for each trait, equivalent to a nominal P = 1.41×10−6. Models were fitted with and without hypertension and cholesterol lowering medications as covariates; no difference in the results was found. Since there are known gender-specific effects, traits were gender-stratified and standardized to Z-scores. Overall, 49 significant associations with gene expression were found (Table 3), however none were observed for TC, LDL, or APOA1. All reported P values are Bonferroni adjusted.
Mediators of inflammation and allergy are associated with APOB, HDL, and TG levels
The strongest signals for FFA were from genes previously known to be involved in β-oxidation and lipolysis. During fatty acid metabolism, long-chain acyl groups are transported from the cytosol into the mitochondrial matrix by carnitine. At the outer mitochondrial membrane, acyl groups are attached to carnitine by carnitine palmitoyltransferase 1A (CPTA1, P = 1.05×10−8 for FFA) and internalized by carnitine/acylcarnitine translocase (SLC25A20, P = 3.63×10−37) [20]. Pyruvate dehydrogenase kinase 4 (PDK4, P = 5.51×10−21) resides in the mitochondrial matrix and downregulates the activity of the pyruvate dehydrogenase complex, a process important to the substrate competition between fatty acids and glucose [21]. Further, two strongly associated probes lie within adipose differentiation-related protein (ADFP, P = 4.74×10−4 and P = 4.79×10−4) which encodes adipophilin [22], [23], and electron-transferring-flavoprotein dehydrogenase (ETFDH, P = 6.88×10−4) has been previously linked to multiple acyl-CoA dehydrogenation deficiency disorders [24].
While lipid traits were correlated with each other (Figure S1), it was of particular interest that the top associations across APOB, HDL and TG were largely shared (Table 3). These genes included histidine decarboxylase (HDC), the alpha subunit from the Fc fragment of high affinity IgE receptor (FCER1A), prostein (SLC45A3), GATA binding protein 2 (GATA2), and carboxypeptidase A3 (CPA3). These genes were also significant predictors of the APOB-APOA1 ratio, the strongest cholesterol-based risk factor for atherosclerosis and coronary artery disease [25] (Table 4). Differences in transcript levels between samples can arise from the relative expansion or contraction of cell populations, thus to test whether the associations could be due to variation in the relative abundance of a range of blood cell types, previously identified cell type expression markers [26] were added as covariates in the model; significance was unchanged (Table 5). Given inter-trait correlations, multivariate approaches may offer better power to detect relationships between lipids and gene expression by incorporating information from cross-trait covariance [27], [28] (Materials and Methods). When predicting multiple traits simultaneously (termed meta-lipids), 85 unique associations were observed at an equivalent Bonferroni-corrected significance level and the above genes remained strongly associated (Table S1). This represented an almost two-fold increase in the number of significant associations using single lipid traits and offered a unified ranking for assessing each gene's involvement in lipid levels.
The most strongly associated genes for APOB, HDL, and TG present intriguing candidate genes for metabolic dysfunction, inflammation, and atherosclerosis. HDC encodes the catalyst for the conversion of histidine to histamine, a well-known pro-inflammatory molecule that is secreted by basophils and mast cells (BMCs). Histamine plays a role in atherogenesis and HDC expression has been previously associated with atherosclerotic status [29]. Importantly, lipoproteins, in particular very low-density lipoprotein, have been shown to cause secretion of histamine from basophils [30]. HDC may also play a more general role in metabolic syndrome as murine knockouts display hyperleptinemia, obesity, and glucose intolerance [31], [32]. On the cell surfaces of BMCs, FCER1A plays a powerful role in the immune response and in histamine release as the encoded receptor subunit directly interacts with antigen-bound IgE, an antibody isotype capable of the most potent immune reactions [33]. FCER1A was also found to be the strongest signal in a recent GWAS of serum IgE levels [34]. Interestingly, biochemical studies of mast cell specific CPA3 have shown its involvement in the degradation of APOB from LDL thus leading to the potential for LDL fusion [35]–[37]. Our observation of a negative correlation between CPA3 expression and APOB concentrations was consistent with these findings. The transcription factor GATA2 has been shown to both attenuate inflammation in murine adipose tissue and allow for normal mast cell development [38]. Weidinger et al. previously observed the co-expression of GATA2 and FCER1A [34]. The correlation of FCER1A and GATA2 expression in the DILGOM cohort was also extremely strong (Spearman's ρ = 0.664), therefore we investigated the hypothesis that HDC, FCER1A, SLC45A3, GATA2, and CPA3 function as part of the same pathway.
Network analysis of gene co-expression and module replication
In biological pathways, many genes tend to co-express thus it is natural to incorporate these correlations into a network-based framework. Within this framework, pairwise correlations between genes are used to describe the connectedness of the network, and clusters of tightly correlated genes (modules) can define pathways. To construct a co-expression network that characterizes lipid traits, the method of Horvath and Langfelder [39], [40] was used to assess the top 10% of expression signals for meta-lipids (3,520 unique signals, Materials and Methods). Twenty-three modules were identified and each module's summary expression profile (defined by its first principal component) was tested for correlation with individual lipid traits (Figure 1). The strongest expression associations identified above for HDL, APOB, and TG clustered into the same pathway, module K, hereafter referred to as the Lipid Leukocyte (LL) module (Figure 2). The strongest signals for FFA did not cluster into a module. Summary expression of LL module was associated with HDL (P = 5.62×10−7), APOB (P = 3.06×10−6), and TG levels (P = 2.44×10−29), results which were significant after correcting for the estimated number of co-expression modules in the whole gene set (Materials and Methods). It is composed of 11 genes (12 probes) including HDC, FCER1A, GATA2, CPA3, MS4A2 (the beta subunit of high affinity IgE receptor's Fc fragment), SPRYD5 and SLC45A3 (Table S2). Module membership, a measure of intramodular connectivity, showed that the afore mentioned genes constitute the core of the module and are the most correlated with lipid traits (Figure 3).
In order to replicate LL module's existence and investigate tissue specificity, we utilized expression data from the GenCord cohort [41], a unique resource which includes both EBV-immortalized B cell lines (LCLs) and primary T cells drawn from individual umbilical cord blood. LL module co-expression was highly significant in T cells, however LCLs from the same individuals showed a marked absence of any co-expression (Table 6). This suggests that this co-expression module is tissue specific among blood cell types, however it is not clear whether, or to what extent, laboratory treatment might also contribute to the obscurity of co-expression networks. The possibility exists that significant changes in host-cell gene expression patterns occur upon both infection of B cells with EBV, which binds to complement receptors thus initiating the complement system, and the selection of B cells which have successfully integrated episomal EBV. We therefore emphasize caution when interpreting correlations in gene expression from non-primary tissues and encourage further studies into the effects of laboratory treatments.
eQTL analysis of the Lipid Leukocyte module
Gene expression itself is a quantitative trait of genetic variation [42]–[44]. Using genome-wide SNP genotypes from individuals in DILGOM, we investigated the genetic effects on expression for each gene in LL module and for the LL module as a whole (Materials and Methods). For those SNPs in cis, within 1 Mb of the expression probe midpoint, a simple linear regression was performed. In order to determine significance, a permutation procedure was implemented [43]. For trans SNPs, greater than 5 Mb away or on a different chromosome, the non-parametric Spearman rank correlation was used [42], offering a more robust test of association since permutation across the whole genome would be computationally prohibitive. To determine the significance of the nominal Spearman P value, a threshold of 5.0×10−7 was implemented.
At a permutation threshold of 0.05, only two cis SNP associations associated with genes in LL module (Table S3), SLC45A3 expression was associated with variation at rs12569123 and rs12569261, however there was insufficient evidence for either SNP's association with overall expression of the LL module (P = 0.18 and P = 0.057 respectively). It was of note that rs2251746, an experimentally verified eQTL of FCER1A and the strongest signal in a recent GWAS for serum IgE levels [34], [45], nominally influenced FCER1A expression (Figure 4, nominal P = 1.83×10−4). In testing association with LL module expression, rs2251746 showed strong evidence (P = 4.28×10−6). For trans SNPs, only three significant associations were observed, all between MS4A3 expression and a haploblock on chromosome 6 containing PNRC1 and SRrp35. These SNPs also strongly predicted LL module expression (Table S4). Overall, the strongest signal for LL module expression corresponded to rs2251746, evidence that LL module contains both transcriptional and genetic components of the immune response.
Structural equation modeling shows Lipid Leukocyte module is reactive to lipids
Genetic variation can be used to orient network edges and infer causality [46]–[48]. Since we have identified genetic variation driving expression of LL module, we can construct a directed network of core LL module and other lipid measures which have been strongly associated with genetic variants (TG and HDL). To do this, we use SEM as implemented in Network Edge Orienting (NEO) methods [48]. A Local Edge Orienting (LEO) score was calculated to infer edge orientation (Materials and Methods). Simulation studies have previously shown that a LEO score threshold of 0.3 corresponds to a false positive rate less than 0.05 [48]. With this approach, we show that both HDL and TG may be causative of LL module by driving expression of SPRYD5 (LEO score = 0.67) and MS4A2 (LEO score = 0.33) respectively (Figure 5). Interestingly, HDL also appears to influence TG levels (LEO score = 0.75). In addition, core LL module genes were predicted to drive expression of FCER1A (minimum LEO score = 1.4) with the exception of MS4A2, high affinity IgE receptor's beta subunit. However, for these particular edges it should be noted that while deviation from the causal model was at least 25 times less likely than all other models considered, the causal model P values indicated that the causal model itself was likely a poor fit (Table 7). As more eQTLs are uncovered for LL module genes it is likely that model fitting will improve and the chain of causation within LL module will become clearer.
Conclusions
In this report, a previously uncharacterized, potentially tissue-specific gene network (LL module) has been shown to be associated with blood lipid levels. The LL module not only harbors key components of inflammation and allergy which strongly suggest a role for basophils and mast cells but also associates with the SNP that most strongly regulates serum IgE levels. BMCs themselves have been previously associated with atherosclerosis and myocardial infarction [25], [29], [49], [50], however their precise role remains to be elucidated. The LL module described here offers genomic evidence in support of previous functional studies that LL module genes are linked to lipids and metabolism and, importantly, shows that these genes operate as a single gene module. This work provides a general framework to understand how lipid levels might activate cellular pathways in circulating nucleated peripheral blood cells contributing to cascades potentially resulting in atherosclerosis. Our findings should stimulate further, better-targeted molecular experiments to characterize details of this link.
Materials and Methods
Ethics statement
The DILGOM participants provided written informed consent. The protocol was designed and performed according to the principles of the Helsinki Declaration and was approved by the Coordinating Ethical Committee of the Helsinki and Uusimaa Hospital District.
Trait measurements and sample collection
The study samples included a total of 631 unrelated Finnish individuals aged 25–74 years from the Helsinki area, recruited during 2007 as part of the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, an extension of the FINRISK 2007 study. Extensive trait information was collected, including lifestyle factors. Study participants were asked to fast overnight (at least 10 hours) prior to giving a blood sample. After extraction, the blood samples were left at room temperature for 45 minutes then centrifuged to separate the serum and plasma. Samples were kept in a −70°C freezer.
Total serum cholesterol (TC), high density lipoprotein cholesterol (HDL), low density lipoprotein cholesterol (LDL), apolipoprotein B (APOB), apolipoprotein A1 (APOA1), triglycerides (TG), and fasting free fatty acid (FFA) levels were determined in the Laboratory of Analytical Biochemistry of the Institute of Health and Welfare (Helsinki, Finland). TC measurements were carried out with the CHODPAP-assay (Abbott Laboratories, Abbott Park, Illinois, USA). HDL measurements used a direct enzymatic assay (Abbott Laboratories, Abbott Park, Illinois, USA). TG was measured with the enzymatic GPO assay (Abbott Laboratories, Abbott Park, Illinois, USA). APOB and APOA1 levels were determined using an immunoturbidometric method (Abbott Laboratories, Abbott Park, Illinois, USA). For APOB, the coefficients of variation (CVs) were 3.8%, 3.4% and 2.1% at the levels 0.35 g/L, 0.90 g/L and 1.66 g/L respectively. For APOA1, the CVs were 2.0%, 1.4% and 1.6% at the levels 0.91 g/L, 1.19 g/L and 2.15 g/L respectively. All methods used manufacturer protocols. FFA was determined using the enzymatic colorimetric ACS-ACOD method, as implemented in the NEFA-C assay kit, using the Architect c8000 (Abbott Laboratories, Abbott Park, Illinois, USA). Between series repeatability were 0.73 mmol/L, CV = 2.4% (n = 143) for level 1 and 0.99 mmol/L, CV = 2.3% (n = 139) for level 2. All methods used manufacturer protocols.
Genotyping and expression microarrays
DNA was extracted from 10 ml EDTA whole blood samples with salt precipitation using Autopure (Qiagen GmbH, Hilden, Germany). DNA purity and quantity were assessed with PicoGreen (Invitrogen, Carlsbad, CA, USA). Genotyping used 250 ng of DNA and proceeded on the Illumina 610-Quad SNP array (Illumina Inc., San Diego, CA, USA) using standard protocols. After excluding chip failures and poor quality samples (as determined by visual inspection of a 0.75% agarose gel or low Sequenom call rate), 555 samples were successfully genotyped.
To obtain stabilized total RNA, we used the PAXgene Blood RNA System (PreAnalytiX GMbH, Hombrechtikon, Switzerland), which included collection of 2.5 ml peripheral blood into PAXgene Blood RNA Tubes (Becton Dickinson and Co., Franklin Lakes, NJ, USA) and total RNA extraction with PAXgene Blood RNA Kit (Qiagen GmbH, Hilden, Germany). We used the protocol as recommended by the manufacturer.
The integrity and quantity of the RNA sample was evaluated with the 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Biotinylated cRNA was produced from 200 ng of total RNA with Ambion Illumina TotalPrep RNA Amplification Kit (Applied Biosystems, Foster City, CA, USA), using the protocol specified by the manufacturer. 750 ng of biotinylated cRNA were hybridized onto Illumina HumanHT-12 Expression BeadChips (Illumina Inc., San Diego, CA, USA), using standard protocol. For each sample, biotinylated cRNA preparation and hybridization onto BeadChip were done in duplicates. For expression arrays, 585 samples were successfully completed.
Data quality, processing and imputation
After each expression array was scanned, background corrected probe signal intensities and bead counts were outputted from Illumina's BeadStudio software in order to undergo further processing. Strip-level quantile normalization was then used to force probe intensity distributions for all samples on all arrays to be the same [51]. Since each sample was technically replicated, the normalized values were then used to measure their correlation via Pearson's product moment correlation coefficient (Ρ) and Spearman's rank correlation coefficient (ρ). Generally, reproducibility was high (Figure S2). To further assess data quality, we also generated MA plots between replicate arrays after normalization [52]. We manually inspected each sample's MA plot for curvature or overt deviation from the M = 0 axis, none exhibited these characteristics. A sample was removed from further analysis if its Ρ was <0.94 or ρ was <0.60 (9 samples fail).
To combine raw signal intensities from corresponding replicates, the signals (S) were weighted by the number of beads (b) contributing to each signal and summed to arrive at one measure of signal intensity (δ) for each sample at each probe:Probes that did not meet certain criteria were removed from further analysis: (a) non-autosomal (b) complementary to cDNA from erythrocyte globin components (c) map to more than one genomic position.
For each genotyping array, Cy3 and Cy5 signal intensities were exported from BeadStudio and pooled together for clustering with the Illuminus genotype calling algorithm [53]. Samples were removed from further analysis if they showed low quality (call rate <0.95, 19 samples removed), failed to match Sequenom genotype fingerprinting (concordance <0.90 for at least 10 genotypes, 0 samples removed), or were a previously unknown close relation or duplication (pairwise identity by descent pi-hat >0.10, 1 sample removed). SNPs failing to meet the following quality thresholds were also removed from further analysis: call rate >0.95, minor allele frequency >0.01, and Hardy-Weinberg equilibrium P value >1.0×10−6. 37,558 SNPs were removed in total.
Un-observed SNPs were imputed with the software IMPUTE version 5 using phased HapMap release 22 haplotypes from the CEU panel [54]. A genotype was assigned if its posterior probability was >0.95 or missing if not, and all SNPs underwent the same filtering as those above. 249,345 SNPs were removed in total, leaving 2,061,516 SNPs for further analysis.
Population structure
To control for structure in the Finnish population, we used principal components analysis (PCA) on the genotypic data in order to identify outliers who descend from outside the Helsinki region (Figure S3). All SNPs underwent PCA with the EIGENSOFT software [55]; regression residuals involving the 2 previous SNPs were used as inputs to correct for SNP linkage disequilibrium. Samples exceeding eight standard deviations along any statistically significant principal component were removed from further analysis (Figure S4, 17 samples removed). A principal component was considered significant if its Tracy-Widom P value was <0.01.
Trait distributions and correlations
Trait distributions and inter-trait correlations were also examined. Given well-known gender differences between many of the traits, distributions for males and females were treated separately. If a trait was not normally distributed as determined by an Anderson-Darling test (P<0.01), a Box-Cox power transformation was implemented to achieve normality and each trait measurement was converted to a Z score. The Z scores for males and females were then combined for further analyses. Inter-trait correlations were calculated via Spearman's rank correlation coefficient, see Figure S1.
Association analysis and multiple test correction
All univariate statistical tests and permutations were done using PopGenomix, a C++ package developed in the Dermitzakis laboratory for the analysis of gene expression data. To test the association of a transcript's expression with a given SNP, linear regression was used. A simple model was constructed where Yi is the probe's log2-normalized expression for individual i, Xi is the genotype of the individual at a given SNP (encoded as 0, 1, or 2 for the number of minor alleles), and εi is a normally distributed random variable with mean equal to zero and constant variance:Nominal P values were calculated for the test of no association, b = 0.
In the case of Spearman's ρ, the coefficient is a function of ranks, xi is the rank of the log2-normalized expression value for individual i, yi is the genotypic rank (0, 1, or 2), and n is the corresponding number of measurements:Since sample sizes were large, a t-test with n-2 degrees of freedom was used to determine a nominal P value.
Null distributions of P values were generated in order to evaluate the significance of the observed P value [43], [56], with expression levels permuted relative to genotypes. Unless otherwise specified, 10,000 permutations were performed, and each test was considered at an alpha level of 0.05.
Multiple and multivariate modeling was done using the R statistical computing language (http://www.r-project.org/). To test the association of a transcript's expression with a given trait, linear regression was used with appropriate covariates that include age, gender, or other correlated traits (see Table 2).
Given the highly correlated nature of the trait measurements, the construction of meta-traits was investigated. The meta-lipids (TC, FFA, HDL, LDL, TG, APOB, APOA1) trait was treated as the response variable in a multivariate linear model with probe expression, age, hypertension medication, and cholesterol medication as regressors (Table 2).where Y is a matrix of normalized individual lipid trait values for genes; X is a matrix of log2-normalized expression values, age values, hypertension and cholesterol medication (as factors) for each individual; and E is a data matrix of error terms. Similarly, when testing SNP association with expression of all LL module genes simultaneously, a simple multivariate linear model was used. In which case Y is a matrix of log2-normalized individual expression values for genes in LL module, X is a vector of individual SNP genotypes (encoded 0, 1, 2), and E is a data matrix of error terms.
Reported P values are from the Wilks' lambda test statistic [57]. Multiple and multivariate modeling use the Bonferroni correction to control for multiple tests. All reported P values are corrected unless otherwise noted.
Correction for cell type expression markers
To correct for relative cell type numbers, we use the expression markers defined in Whitney et al. [26]. The cell type proportions corrected for include lymphocytes, neutrophils, reticulocytes, B cells, cytotoxic T lymphocytes/natural killer cells, erythrocytes, myeloid cells, and Myc-regulated cells (profiles for the time of day were also included), however correction for T cells (uncovered on HT-12 array), mast cells (not assessed in Whitney et al.), and basophils specific markers (not assessed in Whitney et al.) was not possible. Covariates for the cell types were defined as the average standard score across all cell-type specific markers for each sample.
Network analysis
Network analysis was done using the R packages, WGCNA [39], [40], [58] and NEO [48].
The undirected transcription network considered the top 10% of expression signals for meta-lipids (3,520 unique signals). The correlation matrix was constructed via all against all Pearson correlation coefficient calculations and the adjacency matrix was calculated with a soft threshold power of nine (Figure S5). Genes were then hierarchically clustered and visualized in a dendrogram (Figure S6), where a ‘leaf’ constitutes an individual gene and ‘branches’ are clusters of tightly correlated genes. The dynamic tree cut function in WGCNA with a minimum module size of 10 genes was used to determine initial modules. Individual module expression profiles underwent singular value decomposition and the summary module profiles from the vector corresponding to the first singular value were clustered to identify modules that were highly correlated (those less than a dendrogram height of 0.20). These modules were then merged.
To correlate module summary profiles with lipid traits, a t-test of Spearman's rank correlation was used. The corresponding Spearman correlation coefficients and P values can be observed in Figure 1. Statistical significance was determined by estimating the number of co-expression modules in the entire dataset. Given the 23 modules calculated from 1000 expression probes, we estimated the total number of modules to be (23×35419/1000) = 814.637. Therefore, the appropriate alpha level was determined to be (0.05/814.637) = 6.14×10−5. Calculations of module membership and individual gene significance (Figure 3) have been previously defined [39]. Only module K (the Lipid Leukocyte, LL, module) was used in further analyses.
NEO was used to predict the directedness of the network using causal SNPs as anchors. Of the lipid traits associated with LL module expression, HDL and TG were selected because the genetic variation underlying them has been studied extensively. Since the choice of SNPs can have a large impact on the directedness of the network (non-causal SNPs can introduce noise) and the DILGOM dataset (N = 518) is not sufficiently powered to significantly detect many of the known variants, we use only the strongest signals from recent genome-wide association studies [4], [59]; rs3764261 (CETP) was included for HDL and rs1260326 (GCKR) was included for TG. In our dataset, the strongest signal previously found for TG, rs964184 (APOA1-C3-A4-A5), did not pass quality control filters. Since rs2251746 has been shown to be an eQTL for FCER1A and LL module expression, we also include it as a causal anchor. To further verify that these loci can be considered causal anchors in the DILGOM dataset, we adopt the automatic SNP selection approach in NEO using both a greedy method and forward-stepwise regression [48]. We observed that all SNPs were correctly assigned to their respective nodes. An edge exists if the edge score (the absolute value of the Pearson correlation between nodes A and B) exceeds a threshold of 0.3. Since all nodes have a causal anchor, the NEO score (the log10 ratio of a fitted causal model P value to the next best causal model P value) defined in the main text is the NEO.NB.OCA score. An edge is considered significantly oriented if the NEO score exceeds a threshold of 0.3. Simulation studies have shown that a NEO.NB.OCA score of 0.3 or more corresponds to a false positive rate of 5% or less (cite NEO). We further considered the path coefficient for A→B (Z test statistic >1.96 or <−1.96) and, since the NEO score is a ratio of model P values, the fit of the primary model MA→A→B←MB (P value should be >0.05). See Table 7 for directed network edge statistics.
Data availability
The expression data for the individuals analyzed in this study has been made publicly available through the ArrayExpress database (accession number E-TABM-1036).
Supporting Information
Zdroje
1. KannelWB
DawberTR
KaganA
RevotskieN
StokesJIII
1961 Factors of risk in the development of coronary heart disease–six year follow-up experience. the framingham study. Ann Intern Med 55 33 50
2. MillerNE
MillerGJ
1975 Letter: High-density lipoprotein and atherosclerosis. Lancet 1 7914 1033
3. RossR
1999 Atherosclerosis–an inflammatory disease. N Engl J Med 340 2 115 126
4. AulchenkoYS
RipattiS
LindqvistI
BoomsmaD
HeidIM
2009 Loci influencing lipid levels and coronary heart disease risk in 16 european population cohorts. Nat Genet 41 1 47 55
5. ProkopenkoI
LangenbergC
FlorezJC
SaxenaR
SoranzoN
2009 Variants in MTNR1B influence fasting glucose levels. Nat Genet 41 1 77 81
6. SabattiC
ServiceSK
HartikainenAL
PoutaA
RipattiS
2009 Genome-wide association analysis of metabolic traits in a birth cohort from a founder population. Nat Genet 41 1 35 46
7. SandhuMS
WaterworthDM
DebenhamSL
WheelerE
PapadakisK
2008 LDL-cholesterol concentrations: A genome-wide association study. Lancet 371 9611 483 491
8. HanssonGK
2005 Inflammation, atherosclerosis, and coronary artery disease. N Engl J Med 352 16 1685 1695
9. HotamisligilGS
2006 Inflammation and metabolic disorders. Nature 444 7121 860 867
10. HirosumiJ
TuncmanG
ChangL
GorgunCZ
UysalKT
2002 A central role for JNK in obesity and insulin resistance. Nature 420 6913 333 336
11. BaudV
LiuZG
BennettB
SuzukiN
XiaY
1999 Signaling by proinflammatory cytokines: Oligomerization of TRAF2 and TRAF6 is sufficient for JNK and IKK activation and target gene induction via an amino-terminal effector domain. Genes Dev 13 10 1297 1308
12. YuC
ChenY
ClineGW
ZhangD
ZongH
2002 Mechanism by which fatty acids inhibit insulin activation of insulin receptor substrate-1 (IRS-1)-associated phosphatidylinositol 3-kinase activity in muscle. J Biol Chem 277 52 50230 50236
13. ArkanMC
HevenerAL
GretenFR
MaedaS
LiZW
2005 IKK-beta links inflammation to obesity-induced insulin resistance. Nat Med 11 2 191 198
14. YuC
ChenY
ClineGW
ZhangD
ZongH
2002 Mechanism by which fatty acids inhibit insulin activation of insulin receptor substrate-1 (IRS-1)-associated phosphatidylinositol 3-kinase activity in muscle. J Biol Chem 277 52 50230 50236
15. PerseghinG
PetersenK
ShulmanGI
2003 Cellular mechanism of insulin resistance: Potential links with inflammation. Int J Obes Relat Metab Disord 27 Suppl 3 S6 11
16. AlipourA
van OostromAJ
IzraeljanA
VerseydenC
CollinsJM
2008 Leukocyte activation by triglyceride-rich lipoproteins. Arterioscler Thromb Vasc Biol 28 4 792 797
17. VaisarT
PennathurS
GreenPS
GharibSA
HoofnagleAN
2007 Shotgun proteomics implicates protease inhibition and complement activation in the antiinflammatory properties of HDL. J Clin Invest 117 3 746 756
18. ChenY
ZhuJ
LumPY
YangX
PintoS
2008 Variations in DNA elucidate molecular networks that cause disease. Nature 452 7186 429 435
19. EmilssonV
ThorleifssonG
ZhangB
LeonardsonAS
ZinkF
2008 Genetics of gene expression and its effect on disease. Nature 452 7186 423 428
20. BremerJ
1983 Carnitine–metabolism and functions. Physiol Rev 63 4 1420 1480
21. SugdenMC
2003 PDK4: A factor in fatness? Obes Res 11 2 167 169
22. WolinsNE
BrasaemleDL
BickelPE
2006 A proposed model of fat packaging by exchangeable lipid droplet proteins. FEBS Lett 580 23 5484 5491
23. BildiriciI
RohCR
SchaiffWT
LewkowskiBM
NelsonDM
2003 The lipid droplet-associated protein adipophilin is expressed in human trophoblasts and is regulated by peroxisomal proliferator-activated receptor-gamma/retinoid X receptor. J Clin Endocrinol Metab 88 12 6056 6062
24. OlsenRK
OlpinSE
AndresenBS
MiedzybrodzkaZH
PourfarzamM
2007 ETFDH mutations as a major cause of riboflavin-responsive multiple acyl-CoA dehydrogenation deficiency. Brain 130 Pt 8 2045 2054
25. McQueenMJ
HawkenS
WangX
OunpuuS
SnidermanA
2008 Lipids, lipoproteins, and apolipoproteins as risk markers of myocardial infarction in 52 countries (the INTERHEART study): A case-control study. Lancet 372 9634 224 233
26. WhitneyAR
DiehnM
PopperSJ
AlizadehAA
BoldrickJC
2003 Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci U S A 100 4 1896 1901
27. AllisonDB
ThielB
St JeanP
ElstonRC
InfanteMC
1998 Multiple phenotype modeling in gene-mapping studies of quantitative traits: Power advantages. Am J Hum Genet 63 4 1190 1201
28. FerreiraMA
PurcellSM
2009 A multivariate test of association. Bioinformatics 25 1 132 133
29. TanimotoA
SasaguriY
OhtsuH
2006 Histamine network in atherosclerosis. Trends Cardiovasc Med 16 8 280 284
30. GonenB
O'DonnellP
PostTJ
QuinnTJ
SchulmanES
1987 Very low density lipoproteins (VLDL) trigger the release of histamine from human basophils. Biochim Biophys Acta 917 3 418 424
31. JorgensenEA
VogelsangTW
KniggeU
WatanabeT
WarbergJ
2006 Increased susceptibility to diet-induced obesity in histamine-deficient mice. Neuroendocrinology 83 5–6 289 294
32. FulopAK
FoldesA
BuzasE
HegyiK
MiklosIH
2003 Hyperleptinemia, visceral adiposity, and decreased glucose tolerance in mice with a targeted disruption of the histidine decarboxylase gene. Endocrinology 144 10 4306 4314
33. KraftS
KinetJP
2007 New developments in FcepsilonRI regulation, function and inhibition. Nat Rev Immunol 7 5 365 378
34. WeidingerS
GiegerC
RodriguezE
BaurechtH
MempelM
2008 Genome-wide scan on total serum IgE levels identifies FCER1A as novel susceptibility locus. PLoS Genet 4 8 e1000166
35. PaananenK
KovanenPT
1994 Proteolysis and fusion of low density lipoprotein particles independently strengthen their binding to exocytosed mast cell granules. J Biol Chem 269 3 2023 2031
36. KokkonenJO
VartiainenM
KovanenPT
1986 Low density lipoprotein degradation by secretory granules of rat mast cells. sequential degradation of apolipoprotein B by granule chymase and carboxypeptidase A. J Biol Chem 261 34 16067 16072
37. PejlerG
KnightSD
HenningssonF
WernerssonS
2009 Novel insights into the biological function of mast cell carboxypeptidase A. Trends Immunol 30 8 401 408
38. TsaiFY
OrkinSH
1997 Transcription factor GATA-2 is required for proliferation/survival of early hematopoietic cells and mast cell formation, but not for erythroid and myeloid terminal differentiation. Blood 89 10 3636 3643
39. HorvathS
DongJ
2008 Geometric interpretation of gene coexpression network analysis. PLoS Comput Biol 4 8 e1000117
40. LangfelderP
HorvathS
2008 WGCNA: An R package for weighted correlation network analysis. BMC Bioinformatics 9 559
41. DimasAS
DeutschS
StrangerBE
MontgomerySB
BorelC
2009 Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325 5945 1246 1250
42. StrangerBE
NicaAC
ForrestMS
DimasA
BirdCP
2007 Population genomics of human gene expression. Nat Genet 39 10 1217 1224
43. StrangerBE
ForrestMS
ClarkAG
MinichielloMJ
DeutschS
2005 Genome-wide associations of gene expression variation in humans. PLoS Genet 1 6 e78
44. GoringHH
CurranJE
JohnsonMP
DyerTD
CharlesworthJ
2007 Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nat Genet 39 10 1208 1216
45. HasegawaM
NishiyamaC
NishiyamaM
AkizawaY
MitsuishiK
2003 A novel -66T/C polymorphism in fc epsilon RI alpha-chain promoter affecting the transcription activity: Possible relationship to allergic diseases. J Immunol 171 4 1927 1933
46. SchadtEE
LambJ
YangX
ZhuJ
EdwardsS
2005 An integrative genomics approach to infer causal associations between gene expression and disease. Nat Genet 37 7 710 717
47. LiR
TsaihSW
ShockleyK
StylianouIM
WergedalJ
2006 Structural model analysis of multiple quantitative traits. PLoS Genet 2 7 e114
48. AtenJE
FullerTF
LusisAJ
HorvathS
2008 Using genetic markers to orient the edges in quantitative trait networks: The NEO software. BMC Syst Biol 2 34
49. KaartinenM
PenttilaA
KovanenPT
1994 Accumulation of activated mast cells in the shoulder region of human coronary atheroma, the predilection site of atheromatous rupture. Circulation 90 4 1669 1678
50. KovanenPT
KaartinenM
PaavonenT
1995 Infiltrates of activated mast cells at the site of coronary atheromatous erosion or rupture in myocardial infarction. Circulation 92 5 1084 1088
51. BolstadBM
IrizarryRA
AstrandM
SpeedTP
2003 A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19 2 185 193
52. IrizarryRA
HobbsB
CollinF
Beazer-BarclayYD
AntonellisKJ
2003 Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4 2 249 264
53. TeoYY
InouyeM
SmallKS
GwilliamR
DeloukasP
2007 A genotype calling algorithm for the illumina BeadArray platform. Bioinformatics 23 20 2741 2746
54. MarchiniJ
HowieB
MyersS
McVeanG
DonnellyP
2007 A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39 7 906 913
55. PattersonN
PriceAL
ReichD
2006 Population structure and eigenanalysis. PLoS Genet 2 12 e190
56. ChurchillGA
DoergeRW
1994 Empirical threshold values for quantitative trait mapping. Genetics 138 3 963 971
57. MardiaKV
1979 Multivariate analysis London Academic Press
58. ZhangB
HorvathS
2005 A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4 Article17
59. WillerCJ
SannaS
JacksonAU
ScuteriA
BonnycastleLL
2008 Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nat Genet 40 2 161 169
60. FreemanTC
GoldovskyL
BroschM
van DongenS
MaziereP
2007 Construction, visualisation, and clustering of transcription networks from microarray expression data. PLoS Comput Biol 3 10 2032 2042
Štítky
Genetika Reprodukčná medicínaČlánok vyšiel v časopise
PLOS Genetics
2010 Číslo 9
- Je „freeze-all“ pro všechny? Odborníci na fertilitu diskutovali na virtuálním summitu
- Gynekologové a odborníci na reprodukční medicínu se sejdou na prvním virtuálním summitu
Najčítanejšie v tomto čísle
- Synthesizing and Salvaging NAD: Lessons Learned from
- Optimal Strategy for Competence Differentiation in Bacteria
- Long- and Short-Term Selective Forces on Malaria Parasite Genomes
- Identifying Signatures of Natural Selection in Tibetan and Andean Populations Using Dense Genome Scan Data