Deciphering Normal Blood Gene Expression Variation—The NOWAC Postgenome Study

Download PDF České info

There is growing evidence that gene expression profiling of peripheral blood cells is a valuable tool for assessing gene signatures related to exposure, drug-response, or disease. However, the true promise of this approach can not be estimated until the scientific community has robust baseline data describing variation in gene expression patterns in normal individuals. Using a large representative sample set of postmenopausal women (N = 286) in the Norwegian Women and Cancer (NOWAC) postgenome study, we investigated variability of whole blood gene expression in the general population. In particular, we examined changes in blood gene expression caused by technical variability, normal inter-individual differences, and exposure variables at proportions and levels relevant to real-life situations. We observe that the overall changes in gene expression are subtle, implying the need for careful analytic approaches of the data. In particular, technical variability may not be ignored and subsequent adjustments must be considered in any analysis. Many new candidate genes were identified that are differentially expressed according to inter-individual (i.e. fasting, BMI) and exposure (i.e. smoking) factors, thus establishing that these effects are mirrored in blood. By focusing on the biological implications instead of directly comparing gene lists from several related studies in the literature, our analytic approach was able to identify significant similarities and effects consistent across these reports. This establishes the feasibility of blood gene expression profiling, if they are predicated upon careful experimental design and analysis in order to minimize confounding signals, artifacts of sample preparation and processing, and inter-individual differences.

Published in the journal: Deciphering Normal Blood Gene Expression Variation—The NOWAC Postgenome Study. PLoS Genet 6(3): e32767. doi:10.1371/journal.pgen.1000873
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1000873

Summary

Introduction

There is growing evidence that transcriptome analysis of peripheral blood cells is a valuable tool for determining signatures related to disease [1]–[5] and drug-response [6]. Differences in blood gene expression may also reflect the effects of a particular exposure, such as smoking [7], metal fumes [8], or ionizing radiation [9]. In our previous research, we studied gene expression profiles from whole blood related to hormone therapy (HT) use in postmenopausal women [10] and identified specific challenges raised by inter-individual variability when isolating signals associated with defined exposure levels. Although blood gene expression profiling promises molecular-level insight into disease mechanisms, there remains a lack of baseline data describing the nature and extent of variability in blood gene expression in the general population. Characterizations of this variation and the underlying factors that most influence gene expression amongst healthy individuals will play an important role in the feasibility, design and analysis of future blood-based studies investigating biomarkers for exposure, disease progression, diagnosis or prognosis [11].

Several studies [12]–[18] have reported that technical variables such as collection, transportation, storage of blood samples, RNA isolation method and choice of microarray platform, in addition to biological effects, can influence gene expression profiles. These technical factors associated with the processing and preparation of human blood and subsequent microarray hybridization represent significant challenges in the analysis of variability.

Furthermore, a few previous studies have used microarrays to analyze blood from healthy volunteers and found that inter-individual sample variation was associated with sex [18], age [13],[18], the time of day the sample was taken [18],[19], and the proportion of the different cell populations comprising the blood sample [13],[18],[20]. However to date, all such studies have focused on gene expression profiles generated from a small set of samples not representative of the general population using different blood cell subtypes. For several reasons including the small sample sizes, these studies have been restricted to the analysis of a small number of variables simultaneously, thus ignoring possible interaction and confounder effects.

Finally, an understanding of these causes of variability would represent a significant step forward in the identification and evaluation of the disease and disease risk biomarkers. Most if not all genes are involved in molecular pathways that provide mechanistic insight in response to exposure or disease development. Pathway depictions are usually simplified, ignoring interactions with other pathways, and we often have incomplete knowledge about the specific interplay of the many elements in almost any particular system.

Using a large representative sample set of postmenopausal women in the Norwegian Women and Cancer (NOWAC) postgenome study [21],[22] processed via a standardized blood collection procedure and via an experimentally validated microarray platform [23], we investigate here the baseline variability of whole blood gene expression profiles. This represents the first comprehensive cross-sectional analysis of blood gene expression changes related to multiple inter-individual and exposure variables, and opens the new research discipline of systems epidemiology [24]. In this setting, we investigated blood gene expression changes due to technical variability, normal inter-individuality, and exposure variables at proportions and levels relevant to real life situations, and establish that these effects are mirrored in the blood transcriptome.

Results

Study design

Population characteristics

Characteristics of women included in the analyses are described in Table S1A. Most of the women were non-smokers, not using HT, and 41.8% were not using any other medication (MED). In average, smokers consumed 2.8 cigarettes (sd 3.8) before blood sampling and 10.2 cigarettes (sd 6.3) the day before. The mean body mass index (BMI) was 25.6 kg/m² (SD 4.2) with most women either of normal BMI (51.7%) or overweight/obese (45.8%). Women in our study range from 48 to 62 years of age (mean 55.7; SD 3.6). Age was significantly associated with smoking (Chi-square p-value = 0.01).

Data analysis strategy

Using the data analysis strategy outlined in Figure 1, three among eight reported technical variables found significant in multivariate global analysis of covariance (ANCOVA) [25] (Figure 1A), as well as three biological (age, body mass index, fasting status), and three exposure variables (smoking, HT and MED) were included in the forward-backward variable selection by the mixed linear model run for each probe (Figure 1B). Additionally, as an interaction between HT and MED use was significant in the blood expression profiles, we also included an interaction variable to account for this in our model. Since model selection based on Bayesian information criterion (BIC) does not take into account issues of multiple testing, we filtered the gene sets based on the z-score value from global test [26] and set a threshold which maximizes the discovery of true positives (weight = 2) versus false positives (weight = 1) associated to each variable (Figure 1D). The z-score obtained from the global test [26] is a useful analytical tool to reduce probes that have previously been found differentially expressed to a core set by estimating the contribution of each probe to the overall measure of association for this set to a specific variable. Throughout this report, we refer to probes filtered by global test z-score more likely to be true positives as core probes. We applied functional clustering via the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [27] (http://david.abcc.ncifcrf.gov/) and gene network predictions via HEFalMp [28] (http://function.princeton.edu/hefalmp) to the resultant core gene list for each phenotype, in order to identify molecular pathways and processes that are variable across our panel of healthy subjects (Figure 1E).

**Fig. 1. Schematic representation of data analysis.**

As a complementary approach (Figure 1C), we curated gene sets from published articles focusing on normal variability in blood or gene expression profiles related to exposure, and subsequently conducted gene set enrichment analysis via the global test [26]. In total, 42 such gene sets were identified from 14 published papers and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [29] (Table S2).

Investigation of baseline variation in gene expression changes in blood

Global analysis of covariance

We investigated the effects of technical variability by searching for associations between the global blood gene expression profiles and eight reported technical variables (Table S1B) that catalog day-to-day RNA processing, RNA/cRNA purity and concentration. All technical variables were highly significant in the univariate global ANCOVA [25] but three variables (i.e. array lot number, RNA extraction date, time between blood collection and freezing) remain significant the multivariate analysis with permuted p-value less than 0.0001 (Table S3).

EigenR2 analysis and probe sets variability. Via an eigenR2 analysis [30] which is a high-dimensional version of the classic R2 statistic, we estimated that the three above-mentioned technical variables and the six biological/exposure variables explained 46.5% and 8.1% of the overall variation in gene expression, respectively. These results suggest that the contributions of technical variability result in a level of random noise which is deemed to be high for this large sample set even after standard normalization.

Under gene-wise linear model selected by BIC criterion, each probe was found to be associated with 3.4 variables on average (total 10 variables considered, SD 1.2). As complementary analysis, we considered only those probes that were uniquely associated with a single variable to capture specific signals related to one biological variable. However, since most probes (77%) showed expression patterns that associate with array lot, we did not filter probes from this subsequent analysis if they were associated with this technical variable. Throughout this report, we refer to probes that meet this criterion as biologically uniquely associated with a variable of interest (Table S4).

Investigation of variation in gene expression changes in blood associated with biological and exposure variables

Molecular effects of smoking mirrored in blood

Gene-wise mixed linear analysis identified 3,024 probes related to smoking of which 98.1% are core probes (FDR = 0.01; Table 1). Via DAVID, we identified several biological processes significantly over-represented in the smoking-associated set of genes (Table 2) including enrichments for “rhodopsin-like, G protein coupled receptor activity” (DAVID cluster of 5 biological processes, median FDR = 1.60 10⁻⁶; Table 2) and “olfactory receptor activity, sensory perception of smell/chemical stimulus” (DAVID cluster of 6 biological processes, median FDR = 0.46%, Table 2). Two sub-endothelial adhesive proteins (fibronectin and thrombospondin, Table 2) were significantly deregulated by smoking. Finally, specific (e.g. monoamine oxidoreductase activity) as well as more general processes (e.g. substrate-specific/ion trans-membrane transporter activity and receptor activity) were significantly enriched in the smoking-associated genes (Table 2). When investigating core genes biologically uniquely associated by smoking (N = 174; Table S4), we identified one consistent significant enrichment in “oxidoreductase activity acting on NAD(P)H” (DAVID cluster of 5 biological processes, median FDR = 2.65%). The genes biologically uniquely up-regulated in non smokers includes ARHGEF19 encoding a Rho GTPase involved in regulation of small GTPase and signal transduction processes.

**Tab. 1. Gene-wise linear analysis conducted for each probe (N = 16185) and global test z-score filtering conducted for gene sets associated to each biological variable.**

**Tab. 2. Functional enrichment of core probes associated with smoking status in gene-wise mixed linear model based on BIC criterion and filtered based on global test z-score (N = 2966).**

Seventeen of the 42 curated gene sets were found to be significantly enriched (FDR<0.02) in our dataset with respect to smoking status (Table 3). Two studies [7],[31] have previously investigated exposure effect of smoking on blood gene expression and identified two signatures overlapping by only a single gene. In our dataset, 34 and 19 probes on our microarray could be matched to the 26-gene and 17-gene signatures from [7] and [31], respectively. Both of the gene sets induced the most significant enrichment scores associated with smoking status (Table 3). The comparative p-value indicates that only 0.2% and 8.2% respectively of random gene sets of the same size as the two signatures would have a larger standardized test statistic. In the first gene set [7], we identified the 13 core genes constitutive of a gene network predicted by HEFalMp involved in response to wounding, acute inflammatory response and cell chemotaxis (Figure S1). Other curated gene sets were significantly enriched according to smoking status with a FDR <0.02 and comparative p-value <0.50. In non-smokers, two gene sets related to growth factor and stress response signaling due to exercise [32] were up-regulated (Table 3). Several signatures of blood cell subtype were significantly enriched with respect to smoking status. In particular, monocyte-specific genes were up-regulated whereas red blood cell -⁠ and natural killer cell -⁠ specific genes were down-regulated in smokers [18]. The seven-gene signature related to age [13] was enriched with smoking status in our dataset, as were two hormone-related gene sets [10],[33]. Using the core genes of these two hormone-related gene sets up-regulated in non smokers, HEFalMp revealed a gene network (Figure 2) enriched in neuroactive ligand-receptor interactions. It identifies increased expression of sphingosine 1-phosphate receptor (EDG8) and predicted interactions of the query core genes with prolactin (PRLR), glucagon (GL1PR), and prostaglandin E2 (PTGER3) receptors (Figure 2).

Fig. 2. Network between top genes (in grey) in the two hormone-related gene sets up-regulated in non-smokers and genes (in white) predicted by HEFalMp in relation to this query considering all genes in all biological processes.

**Tab. 3. Significant gene sets curated from the literature associated with smoking status using the global test.**

BMI class and mirrored metabolic effect on the blood transcriptome

Among the probes associated with BMI class in the gene-wise linear analysis (N = 3098), 678 were core probes (FDR = 0.20; Table 1). We identified enrichment for several biological processes involved in adaptive immune related responses (Table 4). Of particular note is the identification of a signature for diabetes type I (DAVID cluster of 9 biological processes, median FDR = 5.60 10⁻⁶; Table 4). In women with normal BMI, two curated gene sets related to inflammatory and stress response signaling due to exercise [32] were up-regulated (Table 5).

**Tab. 4. Functional enrichment of core probes associated with BMI class in gene-wise mixed linear model based on BIC criterion and filtered based on global test z-score (N = 678).**

**Tab. 5. Significant gene sets curated from the literature associated with BMI class using global test.**

Genes related to fasting status, medication, and hormone therapy use: correlation and interaction of complex signals

The biological variables fasting, MED, and HT use induced the most significant probes under our gene-wise mixed linear models (84.1%, 62.7% and 44.5% of all genes, respectively; Table 1). In fact, there was a high degree of overlap between all three variables (40.0%, 5,775 genes in total), and 74.5% are associated with at least two variables.

As noted earlier, an interaction between HT and MED in relation to the blood gene expression profiles was statistically significant (permuted p-value = 0.03). HT use was associated with the blood gene expression profiles with a multivariate permuted p-value of 0.09 and 0.44 in users and non-users of other medications, respectively. MED was associated with the blood gene expression profiles with a multivariate permuted p-value of 0.06 and 0.38 in non-users and users of HT, respectively. Further analyses are required in order to investigate the different categories of MED, HT regimens informed by questionnaire and hormone levels measured in plasma, as well as their interactions in relation to blood gene expression profiles.

Of the 13611 probes identified as related to fasting (Table 1), 269 were identified as core probes (FDR = 0.23). This latter probe list was significantly enriched in regulation of transcription and RNA metabolic process (DAVID cluster of 11 biological processes; median FDR = 8.30%, Table 6) partly involving deregulation of zinc finger proteins (DAVID cluster of 3 biological processes; median FDR = 0.65%, Table 6) or bromo-domain containing proteins (DAVID cluster of 1 biological process; median FDR = 7.7%, Table 6) involved in chromatin modification. In accordance with these results, the core probes (N = 36, Table S4) biologically uniquely associated with fasting women were significantly enriched in chromatin modification and control of gene expression by vitamin D receptor (DAVID cluster of 2 biological processes; median FDR = 10.3%).

**Tab. 6. Functional enrichment of core probes associated with fasting status in gene-wise mixed linear model based on BIC criterion and filtered based on global test z-score (N = 269).**

Finally, none of the 14 gene sets curated from the literature were significantly enriched in our dataset with respect to fasting status. A similar absence of significant enrichment was observed for a list of 1356 genes associated with fasting in peripheral blood mononuclear cells [34], of which 784 probes were identifiable in our dataset.

Age difference in postmenopausal women and its weak effect on blood gene expression profiles

No significant enrichment of biological processes was observed for the 40 probes including 36 core probes associated with age group (FDR = 0.01; Table 1). With respect to gene set enrichment analysis, the immunoglobulin gene set (N = 36) had the lowest global test enrichment p-value (p-value = 0.03), but a high false discovery rate (FDR = 0.92). One publication [13] found a gene list (N = 14 genes; N = 9 after mapping to our Applied Biosystems probe IDs) derived from blood and associated with age but was not significantly enriched in our dataset (global test p-value = 0.45).

Discussion

Peripheral blood is an ideal surrogate tissue as it has the potential to reflect responses to changes in the immediate and distant environments by alterations of gene expression levels. Given the number of factors that influence gene regulation and expression, it is not surprising that often more than one strong signal is present in any given high-dimensional dataset. The external validity of NOWAC as a representative sample of the Norwegian female population has been verified in several methodological analyses and found to be acceptable [35]. Studies of the internal validity, including reliability, have been undertaken for dietary questions [36],[37], menopausal status, and use of HT [36],[38], whereas validation of variables measuring physical activity remain ongoing.

Inter-individual variability

In addition to technical variability, substantial differences in gene expression profiles were identified between individuals with respect to exposure. Overall, the functional enrichment of significant single genes and gene set enrichment analyses show that high-throughput gene expression studies implicate similar (although not identical) underlying biology across several studies. Whereas age did not induce a large effect in blood gene expression for our cohort of postmenopausal women aged from 48 to 62 years, pathways and gene sets affected by smoking and, to a lesser extent both BMI and fasting, are numerous and interconnected. Some expression profiles associated with these variables may also be associated with other factors (e.g., lower levels of exercise, age). A host of new candidate genes for regulation by inter-individual (fasting, BMI) and exposure (smoking) factors were identified which could be used as a basis for hypothesis development.

Several processes associated with smoking were involved in cardiovascular regulation by G-coupled receptors (i.e. purinergic, adrenergic beta-1, urotensin II or thromboxan A2 receptors) or protein activity (i.e. thrombospondin type-1, fibronectin type-3). Consistent with previous observations that smoking reduces olfactory sensitivity in a dose -⁠ and time-dependent manner [39],[40], we find that smoking significantly impairs blood gene expression of olfactory receptors. We also observed that smokers have deregulated gene expressions of several P450 cytochromes which catalyse mono-oxygenase activity that can both toxify and detoxify carcinogenic compounds. As established in normal lung [41] and rats [42], smokers tend to have a small increase in NAD(P)H:(quinone-acceptor) oxidoreductase compared to non-smokers.

Two previous studies [7],[31] have examined the effects of cigarette smoking on leukocyte gene expression in circulation and both of the associated signatures had the most significant enrichment scores over all gene sets considered here. Inflammatory responses previously associated with smoking [7] were up-regulated in the blood expression of smokers in our dataset. Lending support that smoking has immune and inflammatory effects, specific blood cell gene signatures [13],[18] (i.e. increased monocytes and decreased red blood cell and natural killer cell signalling) were differentially expressed according to smoking status. This is consistent with previous observations showing that the total numbers of peripheral leukocytes differ by smoking status [43],[44]. Core genes up-regulated in non-smokers from the enriched hormone-related gene sets [10],[33] were predicted to be involved in neuroactive ligand-receptor interactions like prostaglandin receptors. Elevated prostaglandin E2 synthesis has been previously reported in smokers in comparison with non-smokers [45],[46]. The predicted gene network also reflects the effect of smoking on hormone levels with increased secretion of prolactin and glucagon [47]. Two pathways related to exercise [32] were also found up-regulated in non-smokers, which may simply be due to an underlying prevalence of active exercisers in non-smokers [48].

In our study, we found BMI class associated with blood gene expression changes involved in several immune processes including diabetes type I. It has been reported that several immune functions are dysregulated in obesity [49],[50] and both genetic and environmental factors such as obesity have been implicated as triggers in the pathogenesis of diabetes. The role of autoimmunity in the origins of type I diabetes is well-known, including a role in latent autoimmune diabetes in adults [51] and several observations suggest that autoimmunity may be part of type II diabetes [52]–[55]. Finally, two pathways related to exercise [32] were also up-regulated in women with normal BMI which may be due to a higher prevalence of physical exercise than in overweight/obese women.

Of all the variables considered, fasting was associated with the largest number of genes, but few genes were identified as core genes possibly due to the limited number of fasting women (N = 28) at the time of blood sampling. Selection of core genes aims to select a subset of true positives which work together (possibly in similar pathways) towards significance of the set. The significant core genes associated with fasting were generally involved in gene expression regulation and chromatin modification [56]–[58]. Much of our understanding of the effects of nutrition on chromatin structure has been gleaned from model organisms, especially S. cerevisiae, C. elegans, Drosophila, and mice [59]. In humans, two previous studies were unable to characterize acute effects of food intake in blood gene expression profiles [13],[18]. One putative 784-gene signature exists [34], however only 49 genes associated with fasting overlap with this signature. This may simply be due to chance.

Due to a significant interaction between HT and MED within our profiles, further analyses with a larger sample size are needed in order to investigate the different categories of medications, HT regimens and hormone levels, as well as their interactions in blood.

Consistency with other studies

Differences between the genes identified and the interpretation of results in the various studies discussed here are likely to have resulted from technical differences in the array platforms used, the subset of blood cells analyzed, and the chosen analytical procedures. Several studies [12]–[18] examined how gene expression profiles of blood samples are affected by technical variables. Specific blood sample collection methods result in the isolation of different blood cell subpopulations. White blood cells have been defined as the most transcriptionally active of all cell types in blood and may give the most sensitive gene expression profiles in response to defined factors [60]. In large epidemiological studies, RNA stabilization is compulsory and PAXgene tubes have been found satisfactory to stabilize and enable RNA extraction from whole blood cells [61]. While high proportions of globin RNA could reduce sensitivity with respect to certain microarray platforms [60],[62],[63], we previously investigated two globin reduction protocols and determined that they were not beneficial when Applied Biosystems (AB) microarrays are used [23]. We found that RNA extraction and one variable related to RNA degradation (i.e. time between blood collection and freezing) had a significant global effect on blood gene expression profiles. In addition to normalization preprocessing, our results suggest that technical variability should not be ignored and possible adjustment for technical sources of variability should be considered in any analysis. Techniques such as surrogate variable analysis [64] may adjust for hidden sources of heterogeneity and large-scale dependence in gene expression studies [65]. As an example in our study, 25 significant surrogate variables were highly correlated to the strongest identified technical sources of noise, array lot number (canonical correlation r² = 0.95), time between blood collection and freezing (canonical correlation r² = 0.62) and RNA extraction (canonical correlation r² = 0.43).

After adjustment for technical variability, our analysis demonstrates the ability to find significant similarities between studies by focusing on the biological implications of the gene sets from each individual study, rather than the specific single genes that met the criteria for significant differential expression in each individual study. They lend support to the idea that blood gene expression studies can indeed detect exposure-specific differences and that failure to consider this type of biological variation can result in the misidentification of genes when investigating predictive, diagnostic or prognostic signatures in blood.

In conclusion, this study extends the limited baseline information currently available that describes normal patterns of variation in blood gene expression. The data generated have been made freely available and should represent a useful resource for the design of future studies including power calculations. Our results confirm the feasibility of identifying signatures of inter-individual factors (e.g. fasting, BMI) and exposure factors (e.g. smoking) in blood-based gene expression profiles, and reinforces the need for proper study design, sample preparation, and technical analysis.

Methods

Ethics statement

We have received approval from the Regional Committee for Medical Research Ethics for the collection and storing of questionnaire information and blood samples. The informed consent formula explicitly mentions that the blood samples can be used for gene expression analyses as well as large-scale genotyping.

All data are stored and handled according to the permission given by the Norwegian Data Inspectorate. The Directorate of Health and Social affairs (SHD) has given us an exemption from the confidentiality of information in national registers.

Before use of the biological material, a request has been sent to the regional ethical committee for Northern-Norway. Use of biological material requires permission according to laws pertaining to biotechnology and gene technology, both of which are administered by the SHD.

Subjects

The women are part of the Norwegian Women and Cancer (NOWAC) study (http://uit.no/kk/NOWAC/) consisting of 172471 women who were 30 to 70 years of age at recruitment from 1991 to 2006 [22]. The NOWAC postgenome cohort study [21] consists of approximately 50,000 women born between 1943 and 1957, randomly drawn in groups of 500 from the NOWAC registers, who gave blood samples between 2003 and 2006 and filled in a two-page questionnaire. The two-page questionnaire included questions regarding menopausal status, weight, height; past week exposure to smoking, HT, oral contraceptives, other MED, omega-3 fatty acid, soy or other dietary supplements; and details concerning blood specimen collection (date, hour, posture). Women included in the present study received a blood collection kit and an accompanying two-page questionnaire by mail in April 2005. Among the group of 500 women, 444 (89%) returned both citrate and PAXgene blood RNA (PreAnalytiX GmbH, Hembrechtikon, Switzerland) tubes; 3.3% declined to participate, 0.7% had died or migrated and 7% did not respond. Samples were included in the study according to the following inclusion criteria: the donor was postmenopausal (99 donors excluded), blood was successfully collected in one PAXgene tube and in two plasma collection tubes (8 donors excluded), and the samples were frozen within 3 days from blood collection (9 donors excluded). Based on these criteria, 328 PAXgene blood samples were included for RNA extraction.

RNA isolation and quality control

PAXgene blood RNA tubes were thawed at room temperature for 4 h. 500 µL of blood was removed and stored on −70°C for future use. Total RNA was isolated using the PAXgene Blood RNA Isolation Kit, according to the manufacturer's manual. RNA quantity and purity was assessed using the NanoDrop ND-1000 spectrophotometer (ThermoFisher Scientific, Wilmington, Delaware, USA). The absorbance ratio of 260 nm and 280 nm (A260/A280) was between 1.93 and 2.1 for all samples included for further analysis. The Experion automated electrophoresis system (BioRad, Hercules, CA, USA) and the RNA StdSens Analysis Kit was used to evaluate RNA integrity of a randomized 32% of the samples, according to the instruction manual. The electropherograms were inspected for clear ribosomal peaks. We were not able to analyze any numerical criteria corresponding to electrophoresis patterns, because this information was not available. Thirty nine samples were excluded from further analysis due to insufficient RNA purity, yield or integrity. RNA samples were kept at −70°C until further use.

Microarray-based profiling and image analysis

After exclusion based on study design and RNA quality and quantity criteria, samples were analyzed using the Applied Biosystems (AB) expression array system (Foster City, Lousiana, USA). 500 ng total RNA was used for amplification by the NanoAmp RT-IVT labeling kit from AB for one round of amplification, in accordance with the manufacturer's manual. Briefly, the 1^st strand of cDNA was synthesized by reverse transcription using the T7-oligo (dT) primer, followed by 2^nd strand synthesis. The double-stranded cDNA was purified, and used as template for in vitro transcription (IVT). During IVT, digoxigenin (DIG)-labeled UTP was incorporated into the cRNA. The quantity and purity of the cRNA was measured on the NanoDrop ND-1000, and the cRNA was stored on −70°C until further use. 10 µg of DIG-labeled cRNA was fragmented and hybridized to AB Human Genome Survey Microarray V2.0, in accordance with the Chemiluminescence Detection Kit Protocol.

The AB Human Genome Survey Microarray V2.0 contains 277 control probes and 32,878 probes for the interrogation of 29,098 genes. AB Expression System software was used to extract signal intensities, signal to noise ratios (S/N) and flagging.

Data analysis

A total of 304 arrays including 15 technical replicates were analyzed. Data analysis was performed using R (http://cran.r-project.org), an open-source-interpreted computer language for statistical computation and graphics, and tools from the Bioconductor project (http://www.bioconductor.org), adapted to our needs. Using R, we set the expression intensity to “missing” for genes with flagging value >8191 (threshold recommended by the microarray manufacturer). For a set of technical replicate arrays from the same subject, we excluded the array with the least number of probes that had a S/N exceeding 3. Furthermore, arrays (N = 3) where less than 40% of the probes had a S/N≥3 were also removed from the analysis. Individual probes were not considered, if the S/N exceeded 3 in less than 50% of the samples. After sample and probe filtration, we proceeded with a log2 transformation, quantile normalization and imputation of missing values using 10-nearest neighbourhood method [66]. A total of 286 arrays and 16185 probes are analyzed. Microarray data have been deposited at Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo) accession number GSE15289.

The global ANCOVA [25] was carried out by comparison of linear models via the extra sum of squares principle to test for the univariate and multivariate association between global expression values and technical variables. All significant technical variables with a permuted p-value <0.001 identified in the ANCOVA multivariate analysis were included in the gene-wise linear model selection as random (array lot number, RNA extraction date) and fixed (time between blood collection and freezing) variables.

Forward-backward variable selection was used to select gene-wise model based on BIC. Linear mixed models were used to test the association of each gene with the significant technical and all biological variables. The z-score from the global test [26] was used to select core probes that most strongly explain the difference between groups setting a FDR [67] threshold which maximizes the discovery of true positives (weight = 2) versus false positives (weight = 1) associated with each variable. Gene set enrichment analysis was conducted using the global test [26], which offers the opportunity to compare two or more groups while taking into account the association between probe sets as well as their individual effects. When testing several gene sets curated from the literature, we adjusted for multiple testing using FDR [67]. Functional clustering and gene networks prediction were performed with the Database for Annotation, Visualization, and Integrated Discovery (DAVID) at http://david.abcc.ncifcrf.gov/ [27], and the Human Experimental/Functional Mapper (HEFalMp) [28] at http://function.princeton.edu/hefalmp, respectively.

Supporting Information

Zdroje

1. TwineNC

StoverJA

MarshallB

DukartG

HidalgoM

2003 Disease-associated expression profiles in peripheral blood mononuclear cells from patients with advanced renal cell carcinoma. Cancer Res 63 6069 6075

2. TangY

NeeAC

LuA

RanR

SharpFR

2003 Blood genomic expression profile for neuronal injury. J Cereb Blood Flow Metab 23 310 319

3. GladkevichA

KauffmanHF

KorfJ

2004 Lymphocytes as a neural probe: potential for studying psychiatric disorders. Prog Neuropsychopharmacol Biol Psychiatry 28 559 576

4. AchironA

GurevichM

FriedmanN

KaminskiN

MandelM

2004 Blood transcriptional signatures of multiple sclerosis: unique gene expression of disease activity. Ann Neurol 55 410 417

5. SharmaP

SahniNS

TibshiraniR

SkaaneP

UrdalP

2005 Early detection of breast cancer based on gene-expression patterns in peripheral blood cells. Breast Cancer Res 7 R634 R644

6. BurczynskiME

TwineNC

DukartG

MarshallB

HidalgoM

2005 Transcriptional profiles in peripheral blood mononuclear cells prognostic of clinical outcomes in patients with advanced renal cell carcinoma. Clin Cancer Res 11 1181 1189

7. LampeJW

StepaniantsSB

MaoM

RadichJP

DaiH

2004 Signatures of environmental exposures using peripheral leukocyte gene expression: tobacco smoke. Cancer Epidemiol Biomarkers Prev 13 445 453

8. WangZ

NeuburgD

LiC

SuL

KimJY

2005 Global gene expression profiling in whole-blood samples from individuals exposed to metal fumes. Environ Health Perspect 113 233 241

9. AmundsonSA

DoKT

ShahabS

BittnerM

MeltzerP

2000 Identification of potential mRNA biomarkers in peripheral blood lymphocytes for human exposure to ionizing radiation. Radiat Res 154 342 346

10. DumeauxV

JohansenJ

Borresen-DaleAL

LundE

2006 Gene expression profiling of whole-blood samples from women exposed to hormone replacement therapy. Mol Cancer Ther 5 868 876

11. WildCP

2009 Environmental exposure measurement in cancer epidemiology. Mutagenesis 24 117 125

12. DebeyS

SchoenbeckU

HellmichM

GathofBS

PillaiR

2004 Comparison of different isolation techniques prior gene expression profiling of blood derived cells: impact on physiological responses, on overall expression and the role of different cell types. Pharmacogenomics J 4 193 207

13. EadyJJ

WortleyGM

WormstoneYM

HughesJC

AstleySB

2005 Variation in gene expression profiles of peripheral blood mononuclear cells from healthy volunteers. Physiol Genomics 22 402 411

14. FeezorRJ

BakerHV

MindrinosM

HaydenD

TannahillCL

2004 Whole blood and leukocyte RNA isolation for gene expression analyses. Physiol Genomics 19 247 254

15. McPhailS

GoralskiTJ

2005 Overcoming challenges of using blood samples with gene expression microarrays to advance patient stratification in clinical trials. Drug Discov Today 10 1485 1487

16. TannerMA

BerkLS

FeltenDL

BlidyAD

BitSL

2002 Substantial changes in gene expression level due to the storage temperature and storage duration of human whole blood. Clin Lab Haematol 24 337 341

17. ThachDC

LinB

WalterE

KruzelockR

RowleyRK

2003 Assessment of two methods for handling blood in collection tubes with RNA stabilizing agent for surveillance of gene expression profiles with high density microarrays. J Immunol Methods 283 269 279

18. WhitneyAR

DiehnM

PopperSJ

AlizadehAA

BoldrickJC

2003 Individuality and variation in gene expression patterns in human blood. Proc Natl Acad Sci U S A 100 1896 1901

19. RadichJP

MaoM

StepaniantsS

BieryM

CastleJ

2004 Individual-specific variation of gene expression in peripheral blood leukocytes. Genomics 83 980 988

20. CobbJP

MindrinosMN

Miller-GrazianoC

CalvanoSE

BakerHV

2005 Application of genome-wide expression analysis to human health and disease. Proc Natl Acad Sci U S A 102 4801 4806

21. DumeauxV

Borresen-DaleAL

FrantzenJO

KumleM

KristensenVN

2008 Gene expression analyses in breast cancer epidemiology: the Norwegian Women and Cancer postgenome cohort study. Breast Cancer Res 10 R13

22. LundE

DumeauxV

BraatenT

HjartakerA

EngesetD

2008 Cohort profile: The Norwegian Women and Cancer Study–NOWAC–Kvinner og kreft. Int J Epidemiol 37 36 41

23. DumeauxV

LundE

Borresen-DaleAL

2008 Comparaison of globin RNA processnig methods for genome-wide transcriptome analysis from whole blood. Biomarkers in Medicine 2 11 21

24. LundE

DumeauxV

2008 Systems epidemiology in cancer. Cancer Epidemiol Biomarkers Prev 17 2954 2957

25. MansmannU

MeisterR

2005 Testing differential gene expression in functional groups. Goeman's global test versus an ANCOVA approach. Methods Inf Med 44 449 453

26. GoemanJJ

van de GeerSA

deKF

van HouwelingenHC

2004 A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 20 93 99

27. DennisGJr

ShermanBT

HosackDA

YangJ

GaoW

2003 DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4 3

28. HuttenhowerC

HaleyEM

HibbsMA

DumeauxV

BarrettDR

2009 Exploring the human genome with functional maps. Genome Res 19 1093 1106

29. KanehisaM

GotoS

2000 KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28 27 30

30. ChenLS

StoreyJD

2008 Eigen-R2 for dissecting variation in high-dimensional studies. Bioinformatics 24 2260 2262

31. van LeeuwenDM

vanAE

GottschalkRW

VlietinckR

GielenM

2007 Cigarette smoke-induced differential gene expression in blood cells from monozygotic twin pairs. Carcinogenesis 28 691 697

32. ConnollyPH

CaiozzoVJ

ZaldivarF

NemetD

LarsonJ

2004 Effects of exercise on gene expression in human peripheral blood mononuclear cells. J Appl Physiol 97 1461 1469

33. FrasorJ

StossiF

DanesJM

KommB

LyttleCR

2004 Selective estrogen receptor modulators: discrimination of agonistic versus antagonistic activities by gene expression profiling in breast cancer cells. Cancer Res 64 1522 1533

34. BouwensM

AfmanLA

MullerM

2007 Fasting induces changes in peripheral blood mononuclear cell gene expression profiles related to increases in fatty acid beta-oxidation: functional role of peroxisome proliferator activated receptor alpha in human peripheral blood mononuclear cells. Am J Clin Nutr 86 1515 1523

35. LundE

KumleM

BraatenT

HjartakerA

BakkenK

2003 External validity in a population-based national prospective study–the Norwegian Women and Cancer Study (NOWAC). Cancer Causes Control 14 1001 1008

36. HjartakerA

AndersenLF

LundE

2007 Comparison of diet measures from a food-frequency questionnaire with measures from repeated 24-hour dietary recalls. The Norwegian Women and Cancer Study. Public Health Nutr 1 10

37. ParrCL

VeierodMB

LaakeP

LundE

HjartakerA

2006 Test-retest reproducibility of a food frequency questionnaire (FFQ) and estimated effects on disease risk in the Norwegian Women and Cancer Study (NOWAC). Nutr J 5 4

38. WaasethM

BakkenK

DumeauxV

OlsenKS

RylanderC

2008 Hormone replacement therapy use and plasma levels of sex hormones in the Norwegian Women and Cancer postgenome cohort -⁠ a cross-sectional analysis. BMC Womens Health 8 1

39. FryeRE

SchwartzBS

DotyRL

1990 Dose-related effects of cigarette smoking on olfactory function. JAMA 263 1233 1236

40. KatotomichelakisM

BalatsourasD

TripsianisG

DavrisS

MaroudiasN

2007 The effect of smoking on the olfactory function. Rhinology 45 273 280

41. SchlagerJJ

PowisG

1990 Cytosolic NAD(P)H:(quinone-acceptor)oxidoreductase in human normal and tumor tissue: effects of cigarette smoking and alcohol. Int J Cancer 45 403 409

42. OroszZ

CsiszarA

LabinskyyN

SmithK

KaminskiPM

2007 Cigarette smoke-induced proinflammatory alterations in the endothelial phenotype: role of NAD(P)H oxidase activation. Am J Physiol Heart Circ Physiol 292 H130 H139

43. JensenEJ

PedersenB

FrederiksenR

DahlR

1998 Prospective study on the effect of smoking and nicotine substitution on leucocyte blood counts and relation between blood leucocytes and lung function. Thorax 53 784 789

44. SchwartzJ

WeissST

1994 Cigarette smoking and peripheral blood leukocyte differentials. Ann Epidemiol 4 236 242

45. BadawiAF

HabibSL

MohammedMA

AbadiAA

MichaelMS

2002 Influence of cigarette smoking on prostaglandin synthesis and cyclooxygenase-2 gene expression in human urinary bladder cancer. Cancer Invest 20 651 656

46. BeloquiO

ParamoJA

OrbeJ

BenitoA

ColinaI

2005 Monocyte cyclooxygenase-2 overactivity: a new marker of subclinical atherosclerosis in asymptomatic subjects with cardiovascular risk factors? Eur Heart J 26 153 158

47. KapoorD

JonesTH

2005 Smoking and hormones in health and endocrine disorders. Eur J Endocrinol 152 491 499

48. KvaavikE

MeyerHE

TverdalA

2004 Food habits, physical activity and body mass index in relation to smoking status in 40–42 year old Norwegian women and men. Prev Med 38 1 5

49. TanakaS

InoueS

IsodaF

WasedaM

IshiharaM

1993 Impaired immunity in obesity: suppressed but reversible lymphocyte responsiveness. Int J Obes Relat Metab Disord 17 631 636

50. NiemanDC

HensonDA

Nehlsen-CannarellaSL

EkkensM

UtterAC

1999 Influence of obesity on immune function. J Am Diet Assoc 99 294 299

51. ZimmetP

TurnerR

McCartyD

RowleyM

MackayI

1999 Crucial points at diagnosis. Type 2 diabetes or slow type 1 diabetes. Diabetes Care 22 Suppl 2 B59 B64

52. SyedMA

Barinas-MitchellE

PietropaoloSL

ZhangYJ

HendersonTS

2002 Is type 2 diabetes a chronic inflammatory/autoimmune disease? Diabetes Nutr Metab 15 68 83

53. SchmidtMI

DuncanBB

SharrettAR

LindbergG

SavagePJ

1999 Markers of inflammation and prediction of diabetes mellitus in adults (Atherosclerosis Risk in Communities study): a cohort study. Lancet 353 1649 1652

54. PietropaoloM

Barinas-MitchellE

PietropaoloSL

KullerLH

TruccoM

2000 Evidence of islet cell autoimmunity in elderly patients with type 2 diabetes. Diabetes 49 32 38

55. LewisMR

TracyRP

2002 The role of the immune system in the insulin resistance syndrome. Curr Diab Rep 2 96 99

56. DhalluinC

CarlsonJE

ZengL

HeC

AggarwalAK

1999 Structure and ligand of a histone acetyltransferase bromodomain. Nature 399 491 496

57. ZengL

ZhouMM

2002 Bromodomain: an acetyl-lysine binding domain. FEBS Lett 513 124 128

58. KlugA

1999 Zinc finger peptides for the regulation of gene expression. J Mol Biol 293 215 218

59. GarfinkelMD

RudenDM

2004 Chromatin effects in nutrition, cancer, and obesity. Nutrition 20 56 62

60. LiL

YingL

NaesensM

XiaoW

SigdelT

2008 Interference of globin genes with biomarker discovery for allograft rejection in peripheral blood samples. Physiol Genomics 32 190 197

61. RainenL

OelmuellerU

JurgensenS

WyrichR

BallasC

2002 Stabilization of mRNA expression in whole blood samples. Clin Chem 48 1883 1890

62. DebeyS

ZanderT

BrorsB

PopovA

EilsR

2006 A highly standardized, robust, and cost-effective method for genome-wide transcriptome analysis of peripheral blood applicable to large-scale clinical trials. Genomics 87 653 664

63. LiuJ

WalterE

StengerD

ThachD

2006 Effects of globin mRNA reduction methods on gene expression profiles from whole blood. J Mol Diagn 8 551 558

64. LeekJT

StoreyJD

2007 Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet 3 e161 doi:10.1371/journal.pgen.0030161

65. LeekJT

StoreyJD

2008 A general framework for multiple testing dependence. Proc Natl Acad Sci U S A 105 18718 18723

66. TroyanskayaO