A phenome-wide association study (PheWAS) in the Population Architecture using Genomics and Epidemiology (PAGE) study reveals potential pleiotropy in African Americans

Authors: Sarah A. Pendergrass ^aff001; Steven Buyske ^aff002; Janina M. Jeff ^aff004; Alex Frase ^aff005; Scott Dudek ^aff005; Yuki Bradford ^aff005; Jose-Luis Ambite ^aff006; Christy L. Avery ^aff007; Petra Buzkova ^aff008; Ewa Deelman ^aff006; Megan D. Fesinmeyer ^aff009; Christopher Haiman ^aff010; Gerardo Heiss ^aff007; Lucia A. Hindorff ^aff012; Chun-Nan Hsu ^aff013; Rebecca D. Jackson ^aff014; Yi Lin ^aff015; Loic Le Marchand ^aff016; Tara C. Matise ^aff003; Kristine R. Monroe ^aff010; Larry Moreland ^aff017; Kari E. North ^aff007; Sungshim L. Park ^aff010; Alex Reiner ^aff018; Robert Wallace ^aff019; Lynne R. Wilkens ^aff016; Charles Kooperberg ^aff015; Marylyn D. Ritchie ^aff005; Dana C. Crawford ^aff020
Authors place of work: Genentech, Inc., South San Francisco, California, United States of America ^aff001; Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America ^aff002; Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America ^aff003; Illumina, Inc., San Diego, California, United States of America ^aff004; Department of Genetics, Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America ^aff005; Information Sciences Institute, University of Southern California, Marina del Rey, California, United States of America ^aff006; Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, United States of America ^aff007; Department of Biostatistics, University of Washington, Seattle, Washington, United States of America ^aff008; Amgen, Thousand Oaks, California, United States of America ^aff009; Department of Preventive Medicine, Keck School of Medicine, University of Southern California/Norris Comprehensive Cancer Center, Los Angeles, California, United States of America ^aff010; Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, North Carolina, United States of America ^aff011; National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America ^aff012; Center for Research in Biological Systems, Department of Neurosciences, University of California, San Diego, La Jolla, California, United States of America ^aff013; The Ohio State University, Columbus, Ohio, United States of America ^aff014; Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America ^aff015; Epidemiology Program, University of Hawaii Cancer Center, Honolulu, Hawaii, United States of America ^aff016; University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America ^aff017; Department of Epidemiology, University of Washington, Seattle, Washington, United States of America ^aff018; Departments of Epidemiology and Internal Medicine, University of Iowa, Iowa City, Iowa, United States of America ^aff019; Cleveland Institute for Computational Biology, Cleveland, Ohio, United States of America ^aff020; Departments of Population and Quantitative Health Sciences and Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, United States of America ^aff021
Published in the journal: PLoS ONE 14(12)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0226771

Summary

We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype. To do this, we performed a PheWAS, testing each SNP on the Metabochip for an association with up to 273 phenotypes in the participating PAGE I study sites. We identified 133 putative pleiotropic variants, defined as SNPs associated at an empirically derived p-value threshold of p<0.01 in two or more PAGE study sites for two or more phenotype classes. We further annotated these PheWAS-identified variants using publicly available functional data and local genetic ancestry. Amongst our novel findings is SPARC rs4958487, associated with increased glucose levels and hypertension. SPARC has been implicated in the pathogenesis of diabetes and is also known to have a potential role in fibrosis, a common consequence of multiple conditions including hypertension. The SPARC example and others highlight the potential that PheWAS approaches have in improving our understanding of complex disease architecture by identifying novel relationships between genetic variants and an array of common human phenotypes.

Keywords:

Genome-wide association studies – insulin – Smoking habits – hypertension – myocardial infarction – Cell binding assay – African American people – Hematocrit

Introduction

Pleiotropy, however defined, has long been recognized as a feature of genomes with respect to their relationships to the individual traits and outcomes that characterize phenomes [1–3]. Interest in human pleiotropy has grown sharply in the last decade owing to the availability of large genotype-phenotype datasets generated by genome-wide association studies (GWAS). Analyses and catalogs of these one-phenotype, many-genotype studies revealed that a sizable proportion of common genetic variants are associated with multiple related and independent phenotypes [4, 5]. These observations have led to the development of more systematic approaches to identify variant-level pleiotropy [6, 7], many of which have been applied to populations of mostly European-descent individuals ascertained in clinical settings (e.g., [8]).

Here, we describe a phenome-wide association study (PheWAS), a systematic approach to identify cross-phenotype associations, in the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was established by the National Human Genome Research Institute (NHGRI) in 2008 to characterize GWAS-identified variants discovered in European populations in more diverse populations drawn from epidemiologic [9] and clinical [10] studies. The scope of the PAGE I study was subsequently expanded to include discovery and fine-mapping efforts using the Metabochip [11], a fixed-content array of ~200,000 variants designed to interrogate previously identified GWAS variants as well as select genome regions related to cardio-metabolic traits for fine-mapping [12].

In this PheWAS, we investigated the associations between the 144,740 common genetic variants assayed on the Metabochip and 273 phenotypes collected in 5,897 African Americans participating in three epidemiologic PAGE I studies: the Atherosclerosis Risk in Communities (ARIC) study [13]; the Multiethnic Cohort (MEC) [14]; and the Women’s Health Initiative (WHI) [15]. We identified 133 potentially pleiotropic variants, defined as variants associated with two or more phenotype classes at p<0.01 in two or more PAGE I study sites. We functionally annotated the PheWAS-identified variants and characterized local genetic ancestry in this admixed population. From these data, we highlight variants likely to be pleiotropic and worthy of further statistical and functional studies. These data also underscore the need for diversity in study populations and study designs in PheWAS to ensure that all possible human genotype-phenotype relationships are considered.

Results

For this PheWAS (Fig 1), we comprehensively tested for associations between 144,740 SNPs assayed on the Metabochip and up to 273 phenotypes (S1 Table) available for 5,897 African American participants from three PAGE I studies: Atherosclerosis Risk in Communities (ARIC); Multiethnic Cohort (MEC); and the Women’s Health Initiative (WHI) (Table 1). Because the data collected varied across these epidemiologic studies, some phenotypes were available in more than one study, such as C-reactive protein (CRP) and low density lipoprotein cholesterol (LDL-C), while other phenotypes, such as albumin level measurements, were available in only a single study. In Methods we further describe the studies included in this PheWAS, the details of Metabochip genotyping and quality control, and the PheWAS approach, including phenotype classification and filtering by statistical significance.

**Fig. 1. Overview of the Metabochip PheWAS study.**

**Tab. 1. Population Architecture using Genomics and Epidemiology (PAGE) I studies available for PheWAS and their characteristics.**

Replication of previously described genotype-phenotype associations

We first performed single-SNP tests of association within each PAGE I study for all Metabochip SNPs that passed quality control and had a minor allele frequency >1%, across all available phenotypes (Fig 2). Two association peaks, on chromosomes 1 and 19, stand out. Both represent previously known genotype-phenotype associations, and their recovery here supports the validity of this high-throughput PheWAS approach. The first peak, on chromosome 1, is the association between OLFML2B rs6676438 and natural log-transformed white blood cell count (Table 2), which recapitulates a known association in African Americans in this chromosomal region. OLFML2B rs6676438 is located on the long arm of chromosome 1, within a 90 Mb region known to be in linkage disequilibrium with the Duffy null allele (DARC rs2814778) and associated with hematological traits in African Americans [16]. The second most significant peak, on chromosome 19 (Fig 2), represents the known association between APOE rs7412 and natural log-transformed apolipoprotein B (Table 2) [17–19]. Apolipoprotein B is the primary apolipoprotein of LDL particles, and LDL-C is a phenotype heavily scrutinized by candidate gene, GWAS, and sequencing studies. From these studies, APOE rs7412 is known to be associated with LDL-C in multiple populations [20–27], including European Americans [18, 19, 28–30] and African Americans [18, 19, 28, 30–32], as well as with related phenotypes such as response to statin therapy [33–37], small dense LDL-C [38], and lipid metabolism phenotypes for LDL-C and free cholesterol [39]. In the present PheWAS, APOE rs7412 and nearby SNPs, all within 100 kb of previously reported GWAS associations, were associated with the following lipid-related traits in a single PAGE I study (at p<1.0x10^-4): total cholesterol, LDL-C, response to statin therapy, lipid metabolism phenotypes, and hypertriglyceridemia (Fig 3).
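A single-SNP test of the kind described above can be sketched as an additive-model linear regression of a (optionally natural log-transformed) quantitative trait on coded-allele count. This is a minimal, hypothetical sketch, not the PAGE analysis code: the helper name `additive_snp_test` is illustrative, covariate adjustment (which a real analysis would include) is omitted, and the p-value uses a large-sample normal approximation rather than a t distribution.

```python
import math
from statistics import fmean


def additive_snp_test(dosages, phenotype, log_transform=False):
    """Hypothetical unadjusted additive-model test of one SNP vs one trait.

    dosages: coded-allele counts (0, 1, or 2) per participant.
    phenotype: trait values (must be > 0 when log_transform is True).
    Returns (beta, se, p); p is a two-sided normal-approximation p-value.
    """
    y = [math.log(v) for v in phenotype] if log_transform else list(phenotype)
    x = list(dosages)
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need matching genotype/phenotype vectors with n >= 3")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("monomorphic SNP: genotype has no variance")
    # Ordinary least-squares slope and intercept.
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    alpha = my - beta * mx
    # Residual variance (n - 2 degrees of freedom) gives the slope's SE.
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p
```

In a PheWAS this function would simply be called once per SNP, phenotype, and study, with the results collected for the cross-study filtering step.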

**Fig. 2. All genetic tests of association results, by PAGE study.**

**Fig. 3. *APOE* rs7412 and nearby single nucleotide polymorphisms associated with lipid-related traits in a single Population Architecture using Genomics and Epidemiology (PAGE) study.**

**Tab. 2. Most significant and previously reported genotype-phenotype associations identified in the Population Architecture using Genomics and Epidemiology (PAGE) I study.**

In addition to the strongly associated chromosome 1 and 19 peaks, this PheWAS replicated other previously reported GWAS findings. For example, LDLR rs6511720 was significantly associated with lipid measurements, including LDL-C (p = 1.13x10^-08, beta(SE) = -9.0(1.60) in ARIC; Fig 4), an association reported in previous GWAS and genetic association studies of European Americans and African Americans [20, 40–43], in which the A allele is associated with lower LDL-C levels. Likewise, CETP rs3764261 was associated with HDL-C levels in African Americans (p = 1.13x10^-13, beta(SE) = 3.48(0.47) in ARIC; Fig 5), as previously reported [20, 40, 43].

**Fig. 4. *LDLR* rs6511720 and nearby single nucleotide polymorphisms associated with lipid-related traits in a single Population Architecture using Genomics and Epidemiology (PAGE) study.**

**Fig. 5. *CETP* rs3764261 and nearby single nucleotide polymorphisms associated with lipid-related traits in a single Population Architecture using Genomics and Epidemiology (PAGE) study.**

Evidence of pleiotropy in African Americans

A total of 5,424 tests of association were significant at p<0.01 in two or more PAGE I studies with the same direction of effect for the same phenotype (S2 Table). To facilitate the identification of potential pleiotropy in African Americans, we grouped similar phenotypes measured in PAGE I studies into 30 phenotype classes, independently of the association results (Methods and Table 3). A PheWAS-identified variant was then defined as a variant associated with two or more phenotype classes at this significance threshold (Methods and S1 File). After binning phenotypes into classes, we identified 133 SNPs associated with two or more distinct phenotype classes, with a consistent direction of effect within each phenotype class (S3 Table). As expected, 53 of these SNPs were associated with the phenotype class combination ‘LDL-C/total cholesterol levels’. In addition, 37 SNPs on chromosome 1 were associated with white blood cell count (WBC) together with other phenotype classes, results likely driven by the Duffy polymorphism [16, 44].
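The filtering rule above can be sketched in Python. This is a minimal sketch under stated assumptions, not the PAGE pipeline: the `Result` record and the `pleiotropic_snps` helper are hypothetical, and dropping a class whose replicated phenotypes disagree in direction is one assumed reading of "consistent direction of effect within each phenotype class". The per-study p<0.01 threshold and the two-study and two-class requirements come from the text.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """One per-study test of association (hypothetical record layout)."""
    snp: str
    phenotype: str
    study: str
    beta: float
    p: float


def pleiotropic_snps(results, phenotype_class, p_threshold=0.01,
                     min_studies=2, min_classes=2):
    """Return {snp: {phenotype_class: direction}} for flagged SNPs."""
    # 1. For each SNP-phenotype pair, collect significant studies by effect sign.
    studies_by_sign = defaultdict(lambda: {1: set(), -1: set()})
    for r in results:
        if r.p < p_threshold and r.beta != 0:
            sign = 1 if r.beta > 0 else -1
            studies_by_sign[(r.snp, r.phenotype)][sign].add(r.study)

    # 2. Keep phenotypes significant in >= min_studies studies with the same
    #    sign, and record that sign under the phenotype's class.
    class_signs = defaultdict(lambda: defaultdict(set))
    for (snp, phenotype), by_sign in studies_by_sign.items():
        for sign, studies in by_sign.items():
            if len(studies) >= min_studies:
                class_signs[snp][phenotype_class[phenotype]].add(sign)

    # 3. Count a class only if its replicated phenotypes agree in direction;
    #    flag the SNP when at least min_classes classes remain.
    flagged = {}
    for snp, classes in class_signs.items():
        consistent = {c: next(iter(s)) for c, s in classes.items() if len(s) == 1}
        if len(consistent) >= min_classes:
            flagged[snp] = consistent
    return flagged
```

For example, a SNP significant for LDL-C in two studies and for a smoking variable in two studies (same signs) is flagged, whereas one whose two significant studies disagree in sign is not.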

**Tab. 3. Phenotype classes represented in the Population Architecture using Genomics and Epidemiology (PAGE) I study PheWAS in African Americans.**

The remaining 43 PheWAS-identified associations (Table 4; Fig 6) represent 38 independent associations after grouping variants in linkage disequilibrium (r²≥0.80), based on African population data from the 1000 Genomes Project [45]. Of these, seven (18.4%) PheWAS-identified variants had opposite directions of effect for different phenotype classes. Approximately half (20) of the phenotype-class combinations were associated with a single variant; the remainder were associated with more than one variant (Table 4): two variants each for insulin/height, body mass index/C-reactive protein, smoking/myocardial infarction, and hypertension/smoking, and three variants each for smoking/LDL-C, hemoglobin/hematocrit, and smoking/alcohol consumption. One PheWAS-identified variant (rs9349379) was associated with three phenotype classes (smoking/diabetes/hypertension; Table 4 and Fig 6).
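Collapsing correlated variants into independent signals at r²≥0.80 can be illustrated with a greedy grouping, similar in spirit to LD clumping. This is a hypothetical sketch: the text does not specify the exact grouping procedure, and `ld_groups` and its input format (SNPs pre-sorted by p-value, a dictionary of pairwise r² values such as might be computed from 1000 Genomes African samples) are assumptions.

```python
def ld_groups(snps_by_significance, r2, threshold=0.80):
    """Greedily group SNPs into independent signals (hypothetical helper).

    snps_by_significance: SNP IDs sorted from smallest to largest p-value.
    r2: dict mapping frozenset({snp1, snp2}) -> pairwise r^2 (missing = 0).
    The most significant unassigned SNP becomes an index SNP and absorbs
    every later unassigned SNP with r^2 >= threshold to it.
    Returns {index_snp: [index_snp, member, ...]}.
    """
    groups = {}
    assigned = set()
    for i, snp in enumerate(snps_by_significance):
        if snp in assigned:
            continue
        groups[snp] = [snp]
        assigned.add(snp)
        for other in snps_by_significance[i + 1:]:
            if other not in assigned and r2.get(frozenset((snp, other)), 0.0) >= threshold:
                groups[snp].append(other)
                assigned.add(other)
    return groups
```

Because grouping is anchored on index SNPs, a variant correlated with a non-index member (but not with the index SNP itself) starts its own group; `len(ld_groups(...))` gives the number of independent signals.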

**Tab. 4. Concomitant PheWAS results in African Americans from the Population Architecture using Genomics and Epidemiology (PAGE) study.**

**Fig. 6. Phenotype classes associated with the same single nucleotide polymorphism in African Americans from the Population Architecture using Genomics and Epidemiology (PAGE) study.**

Apart from the expected pleiotropic associations represented by the LDL-C/total cholesterol and white blood cell phenotype classes, this PheWAS in African Americans from PAGE revealed potentially novel pleiotropic relationships, notably with phenotype classes that represent common exposure, lifestyle, or environmental variables. For example, rs568938 was associated with both the LDL-C and smoking phenotype classes (Table 4 and Fig 6). The LDL-C/rs568938 association has been previously described in diverse populations [20, 46]; however, the association with the smoking phenotype class is novel in any population. The directions of effect suggest that the coded allele of rs568938 is associated with both higher LDL-C and longer reported smoking duration (Table 4), results consistent with epidemiological studies describing a relationship between smoking and increased LDL-C [47]. Likewise, DOCK7 rs10889334, previously associated with total cholesterol [48] and cardiovascular disease [49] via linkage disequilibrium, was also associated with the LDL-C and smoking phenotype classes in the same direction (Table 4). Among the non-LDL-C associations, PheWAS-identified PHACTR1 rs9349379 was associated with three phenotype classes (smoking, diabetes, and hypertension), with opposing directions of effect. The PHACTR1 association with hypertension in this PheWAS is supported by recent GWAS of blood pressure [49–51]. In contrast, the opposite directions of effect observed for smoking and diabetes are not yet supported by genetic data, but are consistent with some of the epidemiologic literature, in which people who report current smoking have lower blood pressure and less hypertension than non-smokers (e.g., [52]). Other exposure, lifestyle, and environmental phenotype classes implicated in this PheWAS include alcohol consumption and hormone use (Table 4).

Functional and ancestral annotation of potentially pleiotropic SNPs

To better understand the functional impact of the 38 potentially pleiotropic SNPs (Table 4), we annotated them, along with proxy SNPs in linkage disequilibrium (r²≥0.8), using several in silico approaches and public resources, including HaploReg v4.1 [53], RegulomeDB v1.1 [54], and the SNP and CNV Annotation Database (SCAN) [55]. Almost all (36 of 38; 94.7%) of these PheWAS-identified variants are intronic (20) or intergenic (16); the remaining two are synonymous (rs114374279) and missense (rs76394293) (Table 5). Most (24) PheWAS-identified variants were annotated as expression quantitative trait loci (eQTLs) in at least one of the resources used here (Table 5).

**Tab. 5. Functional and ancestral annotation of potentially pleiotropic single nucleotide polymorphisms in the Population Architecture using Genomics and Epidemiology (PAGE) study.**

We also estimated local genetic ancestry at these 38 PheWAS-identified loci because African Americans are admixed, with varying proportions of African, European, and other ancestral alleles throughout the genome (Table 5). Consistent with reported global estimates of African and European ancestry proportions [56–59], PAGE African Americans have on average 78.8% African ancestry and 21.1% European ancestry based on Metabochip variants (S1 Fig). At most PheWAS-identified loci, such as the SNPs within CELSR2, local ancestry proportions were consistent with the global proportions (S2 Fig). However, ancestral proportions at some loci deviated substantially from the global proportions. For example, PheWAS-identified loci in DARC, JAZF1, MTMR11, and TLL2 have greater proportions of European ancestry than expected (S3 Fig). Conversely, the PheWAS-identified RBKS region has a significantly greater proportion of African ancestry than the global proportion (S4 Fig).
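One simple way to flag loci whose local ancestry departs from the genome-wide average is a z-score across loci. The sketch below is a hypothetical heuristic for illustration only; the local ancestry inference and deviation test actually used in this study are not reproduced here. It assumes that, for each locus, the mean proportion of African-ancestry haplotypes across participants has already been estimated. The `ancestry_deviations` helper and the default |z| ≥ 3 cutoff are assumptions.

```python
from statistics import fmean, pstdev


def ancestry_deviations(local_african, z_threshold=3.0):
    """Flag loci whose mean local African ancestry departs from the average
    across all loci (hypothetical z-score heuristic).

    local_african: {locus: mean proportion of African-ancestry haplotypes
                    across participants at that locus}
    Returns {locus: z} for loci with |z| >= z_threshold. Negative z means
    more European ancestry than average; positive z means more African.
    """
    values = list(local_african.values())
    mu, sd = fmean(values), pstdev(values)
    if sd == 0:
        return {}  # no variation across loci, so nothing can deviate
    flagged = {}
    for locus, value in local_african.items():
        z = (value - mu) / sd
        if abs(z) >= z_threshold:
            flagged[locus] = z
    return flagged
```

Because the flagged loci themselves contribute to the standard deviation, this heuristic is conservative when only a few loci are examined; in practice it would be applied across many loci genome-wide.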

Discussion

We conducted a large-scale PheWAS in more than 5,000 African Americans using dense array data and carefully collected and curated epidemiologic data. With these data, we replicated previous GWAS findings from mostly European-descent populations and identified novel pleiotropic associations. Because the PAGE study and other efforts have focused or are focusing on multi-population discovery efforts [20, 50, 60–65] as well as replication, generalization, and fine-mapping of GWAS-identified signals [43, 66–77], we focus the remainder of our Discussion on the potential novel pleiotropic associations identified in this African American PheWAS. Potentially pleiotropic common variants were identified via single-SNP tests of association within each PAGE I study, followed by statistical significance filtering and comparison across phenotype classes. For tests of association with consistent statistical evidence across PAGE I studies, we further characterized the PheWAS-identified variants using functional and local genetic ancestry annotations to better understand possible mechanisms or explanations underlying the evidence for pleiotropy in this population. Of the 133 PheWAS-identified findings, we highlight those with the most statistical and in silico functional evidence.

Three PheWAS-identified variants were consistently associated with two or more phenotype classes in two or more PAGE study sites at p<0.01; these variants or their proxies were identified as possible eQTLs and were previously associated with one of the phenotype classes in GWAS: DOCK7 rs10889334, APOB rs568938, and PHACTR1 rs9349379. All three were associated with the smoking phenotype class, and none of the three has been implicated in GWAS for any of the smoking categories curated by the NHGRI-EBI GWAS Catalog or in recent gene-environment studies of lipid traits [78]. The other phenotype classes represented in these associations (LDL-C, hypertension, and diabetes) all have complex relationships with smoking, and these PheWAS data do not provide a clear causal pathway that defines the potentially pleiotropic variants’ relationships with the phenotype classes or between the phenotype classes themselves.

Among those variants without evidence of previous GWAS relationships, one example of a novel and potentially pleiotropic variant is intronic SPARC rs4958487-A, associated with increased glucose levels and hypertension. The product of the secreted protein acidic and rich in cysteine (SPARC) gene modulates the interaction between the extracellular matrix and surrounding cells and is highly expressed in fibrotic tissues [79]. Fibrosis is a clinical feature of hypertension, and both human studies and animal models support a relationship between SPARC and type 2 diabetes pathogenesis [80, 81]. Although SPARC rs4958487 is intronic, annotation suggests that it is a significant eQTL in tibial artery (GTEx p = 7.0x10^-11) and coronary artery (GTEx p = 5.0x10^-7) tissues, among others, with the A allele associated with higher SPARC expression than the ancestral G allele. Local genetic ancestry estimates suggest no deviation from the expected proportions of European and African ancestry at this locus. Although SPARC rs4958487 has not yet been associated with any phenotype (including glucose or hypertension) at p<10⁻⁸ in the NHGRI-EBI GWAS Catalog, it was included on the Metabochip for replication based on early meta-analyses of mean platelet volume in European-descent populations at p<1.0x10^-3 [82–84]. To our knowledge, the present PheWAS in African Americans provides the first statistical and in silico evidence of pleiotropy for this locus, which has already been noted as likely pleiotropic based on its possible roles in type 2 diabetes, obesity, cardiovascular disease, bone strength, tendinopathies, and cancers [81, 85].

Among the annotations examined for these PheWAS-identified associations, local genetic ancestry was among the least informative. Genetic ancestry and admixture are widely recognized as useful markers of human migration [56, 58] and disease associations [86], including potential genetic interactions [87]. Here we note several PheWAS-identified variants with fewer (DARC “Duffy” locus, JAZF1 rs216922, MTMR11 rs2205303, and TLL2 rs94208) or more (RBKS locus) African-derived alleles than expected. While some have interpreted such deviations as evidence of natural selection since admixture [88], recent large-scale studies suggest that most local ancestry deviations are due to chance [89].

The present study has several limitations as well as strengths. A major limitation of this and other PheWAS is limited sample size and power for any individual test of association, a limitation compounded by the multiple-testing penalty. An ideal PheWAS would be conducted in hundreds of thousands of uniformly genotyped (or sequenced) and phenotyped participants. The PAGE I study PheWAS represents a collaboration across several independent epidemiologic cohorts, each genotyped on the Metabochip, which necessitated a strategy emphasizing within-study tests of association and across-study patterns of consistent results. The phenotype class assignments made here, while facilitating within- and across-study comparisons, were based mostly on study data labels interpreted by human curators rather than on formal statistical examination of the phenotypic data. As a result, some correlated phenotypes were treated as separate phenotype classes rather than as a single larger class. It is unclear, however, how best to classify multiply related phenotypes, given that phenotypic correlations are imperfect and current GWAS evidence points to overlapping but not identical genetic architectures for many of the phenotypes considered here.

A second major limitation of this and other PheWAS is the interpretation of the observed associations. These data include only genetic variants targeted by the Metabochip [11, 12], a fixed-content array of GWAS-identified variants and fine-mapping regions from cardio-metabolic studies of mostly European populations. Other population-specific and trans-population variants not assayed here are likely associated with many of the phenotypes tested. For the significant associations identified in the present study, a PheWAS-identified association can be interpreted as evidence of true pleiotropy, true comorbidity, or confounding, among other explanations [90]. The PAGE I study PheWAS-identified associations involving the smoking phenotype class illustrate this limitation: LDL-C is associated with smoking, and the genetic variants are associated with both phenotype classes. These PheWAS results may highlight the correlation between phenotype classes, reveal a novel causal pathway, or reflect confounding; they could also be due to chance. Further statistical (e.g., independent replication, mediation analysis, effect modification) and functional data will be required to properly interpret complex PheWAS associations.

While we acknowledge that this PheWAS has major limitations, it also has considerable strengths that complement other reported PheWAS. This PheWAS was conducted in African Americans using all available Metabochip variants and phenotypes, whereas some previous PheWAS were conducted in European Americans, used specific variants or classes of variants, and/or used a limited set of phenotypes [90–96]. The few reported genome-wide, phenome-wide PheWAS are based on clinical data extracted from electronic health records (EHRs) [8, 97, 98]. EHR-based PheWAS rely on structured phenotype data such as International Classification of Diseases (ICD) codes, otherwise known as billing codes, and laboratory values. While EHR data represent real-world clinical phenotypes, these data are not uniformly collected across all patients and are subject to known and unknown biases [99]. Also, EHR-based PheWAS have yet to consider unstructured exposure, behavioral, or lifestyle variables, which are known to be highly relevant to human health and disease risk but are notoriously difficult to extract from clinical free text [99]. The PAGE study is the first to introduce exposure, behavioral, and lifestyle data to the PheWAS landscape, and the results suggest these variables may be relevant in describing the complex genetic architecture of traits and disease risk in humans. These results also provide useful data for Mendelian randomization studies, which use genetic variants as instrumental variables to establish causal relationships. The ideal instrumental variable is free of pleiotropy; thus, PheWAS could serve as a test of this important assumption of Mendelian randomization [100].
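Why pleiotropy matters for Mendelian randomization can be illustrated with a toy simulation of the Wald ratio estimator, cov(G, Y) / cov(G, X). This is a hypothetical sketch and is not part of the PAGE analysis: the data-generating model (an instrument G acting on exposure X with effect `gamma`, a causal X to Y effect, an unmeasured confounder U, and an optional direct G to Y path `pleio`) and all parameter values are assumptions chosen for illustration. Without the direct path, the ratio recovers the causal effect; with it, the estimate is shifted by roughly `pleio / gamma`.

```python
import random


def wald_ratio(g, x, y):
    """Wald ratio: cov(G, Y) / cov(G, X), the G-Y slope over the G-X slope."""
    n = len(g)
    mg, mx, my = sum(g) / n, sum(x) / n, sum(y) / n
    cov_gx = sum((gi - mg) * (xi - mx) for gi, xi in zip(g, x))
    cov_gy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    return cov_gy / cov_gx  # the 1/n factors cancel in the ratio


def simulate(n, gamma, causal, pleio, maf=0.3, seed=1):
    """Toy data: G -> X (gamma), X -> Y (causal), confounder U of X and Y,
    and an optional direct G -> Y path (pleio) that violates the
    no-pleiotropy assumption of the instrument."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < maf) + int(rng.random() < maf)  # 0/1/2 alleles
        u = rng.gauss(0, 1)
        xi = gamma * gi + u + rng.gauss(0, 1)
        yi = causal * xi + pleio * gi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y
```

With `gamma=0.5` and `causal=0.2`, the estimate should be close to 0.2 when `pleio=0.0` and close to 0.4 when `pleio=0.1`, because the direct path adds about 0.1 / 0.5 to the ratio.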

Conclusions

Our work reinforces the potential of PheWAS in epidemiologically collected, diverse populations. We confirm known genetic associations and identify potentially pleiotropic common variants across the genome in African Americans. These data reveal complex genetic relationships between common, complex disorders and, in some cases, exposures not yet detected in the univariate analyses common in GWAS, underscoring the need for phenome-wide studies to better understand the multiple dimensions of genotype-phenotype relationships in humans.

Methods and materials

PAGE study sites: Designs and populations

Summary descriptions for each PAGE study site are presented in Table 1. All study protocols were approved by Institutional Review Boards at their respective study sites (S2 File).

Causal Variants Across the Life Course (CALiCo) and the Atherosclerosis Risk in Communities (ARIC) study

CALiCo is a consortium of six demographically diverse population-based studies, together comprising 58,000 men and women ranging in age from childhood to older adulthood, and a central laboratory. The ARIC study is one of the six studies included in CALiCo and is a multi-center prospective investigation of atherosclerotic disease in a predominantly bi-racial population. European American and African American men and women aged 45–64 years at baseline were recruited from four communities: Forsyth County, North Carolina; Jackson, Mississippi; suburban areas of Minneapolis, Minnesota; and Washington County, Maryland. A total of 15,792 individuals participated in the baseline examination in 1987–1989, with follow-up examinations at approximately 3-year intervals, during 1990–1992, 1993–1995, and 1996–1998. After the institutional review board at every participating university approved the ARIC Study protocol, written informed consent was obtained from each participant. A subset of ARIC participants was selected for genotyping and inclusion in these PAGE analyses. Data dictionaries for ARIC are available on their website (https://sites.cscc.unc.edu/aric/) as well as in the database of Genotypes and Phenotypes (dbGaP) [101].

Multiethnic Cohort (MEC)

The MEC is a population-based prospective cohort study of 215,251 men and women drawn mainly from five self-reported racial/ethnic populations: African Americans, Japanese Americans, Latinos, Native Hawaiians, and European Americans [14]. The MEC was designed to provide prospective data on exposures and biomarkers potentially involved in cancer initiation and progression across groups with distinct cultural and dietary patterns. Between 1993 and 1996, adults between 45 and 75 years old were enrolled by completing a 26-page, self-administered questionnaire requesting detailed information about dietary habits, demographic factors, level of education, personal behaviors, and history of prior medical conditions (e.g., diabetes). Between 1995 and 2004, blood specimens were collected from ~67,000 MEC participants, at which time a short questionnaire was administered to update certain exposures and collect current information about medication use. Study protocols and consent forms were approved by the institutional review boards at all participating institutions. A subset of MEC participants was selected for genotyping and inclusion in these PAGE analyses. Data dictionaries for MEC (https://www.uhcancercenter.org/mec) are available in dbGaP.

Women’s Health Initiative (WHI)

WHI is a long-term national health study that focuses on strategies for preventing heart disease, breast and colorectal cancer, and fracture in postmenopausal women. A total of 161,838 women aged 50–79 years were recruited from 40 clinical centers in the US between 1993 and 1998 [102]. WHI consists of an observational study, two clinical trials of postmenopausal hormone therapy (estrogen alone or estrogen plus progestin), a calcium and vitamin D supplement trial, and a dietary modification trial. Trial exclusion criteria have been described previously [15]. Study protocols and consent forms were approved by the institutional review boards at all participating institutions. A subset of WHI women was selected for genotyping and inclusion in these PAGE analyses. Data dictionaries for WHI are available on their website (https://www.whi.org/researchers/data/WHIStudies/StudySites/Pages/home.aspx) as well as in dbGaP.

Metabochip content and genotyping

The Metabochip includes SNPs selected as GWAS replication targets for cardiometabolic traits, as well as SNPs in fine-mapping regions around these target SNPs [12]. The remaining SNPs on the Metabochip include coverage of the HLA region, SNPs associated at genome-wide significance with any human trait in the NHGRI GWAS Catalog at the time of chip development, mitochondrial SNPs, SNPs on the X and Y chromosomes (not used in this study), and a series of “wild card” SNPs. Further details of this chip are available at http://www.sph.umich.edu/csg/kang/Metabochip/.

Full Metabochip genotyping and quality control details are available in Buyske et al. [11]. Briefly, DNA samples were genotyped at the Human Genetics Center of the University of Texas-Houston (ARIC), the University of Southern California Genomics Core (MEC), and the Translational Genomics Research Institute (TGen) (WHI). Ninety HapMap YRI (Yoruba in Ibadan, Nigeria) samples were genotyped at each of the three sites for cross-site quality control. Genotypes were called separately for each PAGE study site at the PAGE Coordinating Center under a common protocol, using both the GenomeStudio GenCall 2.0 algorithm and the GenoSNP genotyping algorithm [103], a sample-based approach that captures some of the rarer genotypes represented on the Metabochip. Discordance between the results of the two algorithms was used as a quality control filter. In total, 0.9% of samples were removed by sample quality control measures. A total of 14,328 (7.3%) SNPs were considered technical failures on the basis of GenCall or cluster separation score, call rate, Mendelian error rate, replication error rate, or deviation from Hardy-Weinberg equilibrium. An additional 5,248 (2.7%) SNPs were not used in this study because their probe sequences matched poorly to the reference genome. Related individuals, up to second-degree relatives, were identified in PLINK [104] from identity-by-descent (IBD) estimates calculated for all pairs. For each related pair, the individual with the higher call rate was retained and the other was excluded from further analysis. Overall, 5,897 samples and 161,097 SNPs on the Metabochip passed the quality control criteria of the PAGE I study, and 144,740 of these SNPs passed the allele frequency threshold (1%) of the present study.
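The relatedness rule above (keep the member of each related pair with the higher call rate) can be sketched as a greedy pass over pairwise IBD estimates. This is a minimal illustration, not the PAGE Coordinating Center pipeline: it assumes the pairs have already been parsed into (id1, id2, PI_HAT) tuples, for example from PLINK `--genome` output, and the 0.1875 cutoff for second-degree or closer relatives is a hypothetical choice (the midpoint between expected second- and third-degree sharing), not a value reported here.

```python
def prune_related(pairs, call_rate, pi_hat_cutoff=0.1875):
    """Return the set of sample IDs to exclude.

    pairs: iterable of (id1, id2, pi_hat) tuples (hypothetical parsed input).
    call_rate: dict mapping sample ID -> genotype call rate.
    pi_hat_cutoff: assumed threshold for second-degree or closer relatives.
    """
    removed = set()
    # Resolve the most closely related pairs first.
    for id1, id2, pi_hat in sorted(pairs, key=lambda p: p[2], reverse=True):
        if pi_hat < pi_hat_cutoff:
            continue  # not related closely enough to act on
        if id1 in removed or id2 in removed:
            continue  # pair already broken by an earlier exclusion
        # Keep the sample with the higher call rate; exclude the other.
        drop = id1 if call_rate[id1] < call_rate[id2] else id2
        removed.add(drop)
    return removed


# Hypothetical example: S1/S2 are first-degree, S2/S3 second-degree, S3/S4 unrelated.
call_rate = {"S1": 0.995, "S2": 0.981, "S3": 0.990, "S4": 0.999}
pairs = [("S1", "S2", 0.49), ("S3", "S4", 0.05), ("S2", "S3", 0.26)]
print(sorted(prune_related(pairs, call_rate)))  # ['S2']
```

Excluding S2 also resolves the S2/S3 pair, so only one sample is lost; processing pairs in a different order could exclude more samples than necessary.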

To adjust for population stratification, principal components were computed separately for each PAGE I study using the smartpca program of the EIGENSOFT software [105]. The first two principal components were used as covariates in all analyses. Full details of the ancestry adjustments are also available in Buyske et al. [11].

Genetic tests of association

All tests of association were performed separately for each PAGE I study site in PLINK [104] and were adjusted for the first two principal components and for sex (except in WHI, which enrolled only women). A total of 144,740 SNPs were tested for 273 phenotypes; S1 Table lists the 273 phenotypes used in this study. Linear or logistic regression was performed for continuous or categorical dependent variables, respectively, assuming an additive genetic model (0, 1, or 2 copies of the coded allele). For variables with multiple categories, binning was used to create new variables of the form “A versus not A” for each category, and logistic regression was used to model each new binary variable. Linear regressions were repeated after a log(y + 1) transformation of the response variable; adding 1 to all continuous measurements before transformation prevents values recorded as zero, for which the logarithm is undefined, from being omitted from analysis. The total numbers of associations calculated for this PheWAS with coded allele frequency greater than 1% were 22 × 10⁶ for ARIC, 8 × 10⁶ for MEC, and 26 × 10⁶ for WHI. Data were visualized using PhenoGram [106].
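The variable construction described above (additive genotype coding, the log(y + 1) transformation, and “A versus not A” binning) can be sketched in a few lines. This is an illustrative sketch only: it assumes genotypes arrive as two-letter strings and categorical responses as text labels, both hypothetical formats, and it does not reproduce the regressions themselves, which were run in PLINK.

```python
import math


def additive_code(genotype, coded_allele):
    """Additive coding: number of copies (0, 1, or 2) of the coded allele."""
    return sum(1 for allele in genotype if allele == coded_allele)


def log_plus_one(values):
    """Apply y -> log(y + 1); a value recorded as 0 maps to 0.0 instead of log(0)."""
    return [math.log1p(y) for y in values]


def one_vs_rest(labels):
    """Bin a multi-category variable into one 'A versus not A' indicator per category."""
    return {
        category: [1 if label == category else 0 for label in labels]
        for category in sorted(set(labels))
    }


# Hypothetical example data for four individuals.
print([additive_code(g, "G") for g in ["AA", "AG", "GG", "GA"]])  # [0, 1, 2, 1]
print(one_vs_rest(["never", "former", "current", "never"]))
```

Each indicator list returned by `one_vs_rest` would serve as the binary response of a separate logistic regression, so a k-category variable yields k tests.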

Phenotype class matching

All 273 individual PAGE study phenotypes were grouped into categories within sites and then across sites, without reference to genetic association results. As an example of within-study collapsing, WHI had four separate phenotypic measurements related to diabetes, including “Diabetes ever (Y/N)” and “treated diabetes (Y/N)”, all binned together in the same phenotype class. Across PAGE, some phenotypes, such as “Hemoglobin”, were clearly collected by more than one study site. Other groups of phenotypes that fell within similar phenotypic domains but were not represented in the same form across PAGE study sites (e.g., hormone use, smoking) were also collapsed into phenotype classes. Phenotype classes were developed by one curator, and a second curator reviewed the resulting phenotypes and phenotype classes for consistency and accuracy. Neither curator used genetic association results in developing or reviewing the phenotype classes. The end result was a total of 30 phenotype classes.
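Once the curated mapping exists, finding the phenotype classes in which cross-study replication is even possible is a simple grouping step. A minimal sketch, assuming the mapping is stored as a dictionary keyed by (site, phenotype); the two WHI diabetes entries and “Hemoglobin” come from the text above, while the MEC entry and the exact key format are hypothetical.

```python
from collections import defaultdict

# Hypothetical curated mapping: (site, phenotype) -> phenotype class.
PHENOTYPE_CLASS = {
    ("WHI", "Diabetes ever (Y/N)"): "Diabetes",
    ("WHI", "treated diabetes (Y/N)"): "Diabetes",
    ("MEC", "Diabetes"): "Diabetes",  # hypothetical entry
    ("ARIC", "Hemoglobin"): "Hemoglobin",
    ("WHI", "Hemoglobin"): "Hemoglobin",
}


def sites_per_class(mapping):
    """Collect, for each phenotype class, the set of sites that measured it."""
    sites = defaultdict(set)
    for (site, _phenotype), phenotype_class in mapping.items():
        sites[phenotype_class].add(site)
    return dict(sites)


def shared_classes(mapping, min_sites=2):
    """Classes measured at min_sites or more sites, where replication is possible."""
    return sorted(
        phenotype_class
        for phenotype_class, sites in sites_per_class(mapping).items()
        if len(sites) >= min_sites
    )


print(shared_classes(PHENOTYPE_CLASS))  # ['Diabetes', 'Hemoglobin']
```

Note that the two WHI diabetes phenotypes count as a single site for the “Diabetes” class; collapsing within sites first is what keeps them from inflating the replication count.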

Permutation testing

To determine an empirically derived p-value threshold, we used permutation testing. PheWAS is exploratory and thus incurs a substantial multiple hypothesis testing burden that grows with the number of associations calculated. Depending on the PAGE study, a Bonferroni correction would have resulted in an adjusted p-value threshold between ~4 × 10⁻⁹ and ~7 × 10⁻⁹ (S1 File). Bonferroni correction is not suitable for this and other PheWAS because it treats all tests as independent, whereas phenotypes are correlated with one another and SNPs are correlated through linkage disequilibrium. The many associations of this PheWAS therefore cannot be considered independent, and a Bonferroni threshold would be overly conservative. To determine an empirical p-value threshold, we took a two-step approach. The first step was to permute the data within each study separately (ARIC, MEC, WHI):

1. Randomize the association between the genotype matrix and the phenotype matrix 1000 times, generating 1000 permuted datasets.
   - This preserved the relationships between genotypes.
   - This preserved the relationships between phenotypes.
2. Perform a PheWAS (comprehensive tests of association between all phenotypes and genotypes) for each of the 1000 permuted datasets.
   - Output: results for 1000 permuted PheWAS datasets.
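The key property of the permutation step is that whole phenotype rows are shuffled against whole genotype rows, so linkage disequilibrium among SNPs and correlation among phenotypes survive while any genotype-phenotype link is broken. A minimal sketch, assuming each study's data are held as per-individual row lists in matching order; the function names, the toy data, and the fixed seed are illustrative assumptions.

```python
import random


def permute_phenotypes(genotypes, phenotypes, rng):
    """Return one permuted dataset.

    genotypes and phenotypes are per-individual row lists in matching order.
    Shuffling whole phenotype rows against fixed genotype rows breaks any
    genotype-phenotype link while keeping every row intact, so correlations
    among SNPs (linkage disequilibrium) and among phenotypes are preserved.
    """
    order = list(range(len(phenotypes)))
    rng.shuffle(order)
    return genotypes, [phenotypes[i] for i in order]


def permuted_datasets(genotypes, phenotypes, n_permutations=1000, seed=1):
    """Yield n_permutations permuted datasets from one seeded generator."""
    rng = random.Random(seed)
    for _ in range(n_permutations):
        yield permute_phenotypes(genotypes, phenotypes, rng)


# Hypothetical toy data: 3 individuals, 2 SNPs and 2 phenotypes each.
genotypes = [[0, 1], [1, 1], [2, 0]]
phenotypes = [[5.1, 0], [6.3, 1], [4.8, 0]]
for geno, pheno in permuted_datasets(genotypes, phenotypes, n_permutations=3):
    print(pheno)
```

Each permuted dataset would then pass through the same association tests as the real data; the 1000 sets of results form the empirical null distribution.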

The second step was to determine how often SNP-phenotype associations were significant across two or more studies by chance alone (in the permuted null data). These results were then compared to the results from the unpermuted data. We defined replication as an association with the same phenotype class in two or more studies at a given p-value threshold; S1 File presents results across 1000 permutations at various p-value thresholds. When replication was required, no single permuted dataset, at any of our p-value thresholds, had a total number of results equal to or greater than the total number of results in the unpermuted data. This indicated that requiring replication at a p-value threshold of 0.01 would allow us to explore the data while maintaining a threshold stringent enough to limit the type I error rate. Because our aim was to explore pleiotropy, we also asked how many different phenotype classes would be expected for a single SNP by chance alone. We therefore compared the permuted and non-permuted data when requiring replication for each individual phenotype class and association with more than one phenotype class for a single SNP, at different p-value thresholds; these results across 1000 permutations are also presented in S1 File. At a p-value threshold of 0.01, we found only three permuted datasets with more SNPs associated with more than one phenotype class, compared with the 188 results of the non-permuted data. The 188 results in the non-permuted data were then further refined by removing any results that did not have the same direction of effect across studies.
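The replication and pleiotropy counts described above can be sketched as follows. This is a hedged illustration rather than the study's code: it assumes results are available as (study, SNP, phenotype class, p-value, beta) tuples, and the SNP IDs and class names in the example are placeholders. Because the text applies the direction-of-effect filter as a refinement of the non-permuted results, `require_same_direction=False` gives the unrefined count.

```python
from collections import defaultdict


def replicated_pairs(results, p_threshold=0.01, require_same_direction=True):
    """Return (snp, phenotype_class) pairs significant in two or more studies.

    results: iterable of (study, snp, phenotype_class, p_value, beta) tuples,
    where beta is the regression coefficient (log odds for logistic models).
    """
    # (snp, class) -> study -> set of effect signs among significant results
    signs = defaultdict(lambda: defaultdict(set))
    for study, snp, phenotype_class, p_value, beta in results:
        if p_value < p_threshold:
            # A beta of exactly 0 is counted as negative (edge case ignored).
            signs[(snp, phenotype_class)][study].add(1 if beta > 0 else -1)

    replicated = set()
    for key, by_study in signs.items():
        if len(by_study) < 2:
            continue  # significant in fewer than two studies
        all_signs = set().union(*by_study.values())
        if require_same_direction and len(all_signs) > 1:
            continue  # effects point in different directions
        replicated.add(key)
    return replicated


def pleiotropic_snps(replicated):
    """SNPs with replicated associations in more than one phenotype class."""
    classes_per_snp = defaultdict(set)
    for snp, phenotype_class in replicated:
        classes_per_snp[snp].add(phenotype_class)
    return {snp for snp, classes in classes_per_snp.items() if len(classes) > 1}


def n_permutations_at_least(observed, permuted_counts):
    """Number of permuted datasets whose count matched or exceeded the observed one."""
    return sum(1 for count in permuted_counts if count >= observed)


# Hypothetical results; SNP IDs and class names are placeholders.
example_results = [
    ("ARIC", "rs1", "Lipids", 0.001, 0.20),
    ("WHI", "rs1", "Lipids", 0.004, 0.10),
    ("ARIC", "rs1", "Diabetes", 0.002, -0.30),
    ("MEC", "rs1", "Diabetes", 0.009, -0.10),
    ("ARIC", "rs2", "Lipids", 0.001, 0.20),
    ("WHI", "rs2", "Lipids", 0.003, -0.20),
]
print(sorted(pleiotropic_snps(replicated_pairs(example_results))))  # ['rs1']
```

Applied to each permuted dataset, `len(pleiotropic_snps(...))` gives one null count; `n_permutations_at_least` then reports how many of the 1000 null counts reached the observed value.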

Functional annotation

For each independent PheWAS-identified variant, as well as SNPs in linkage disequilibrium with it (r² ≥ 0.80), we annotated individual variants using HaploReg v4.1 [53], RegulomeDB v1.1 [54], and the SNP and CNV Annotation Database (SCAN) [55] (http://www.scandb.org/newinterface/index.html). HaploReg v4.1 [53] (http://www.broadinstitute.org/mammals/haploreg/haploreg.php) annotates SNPs with ENCODE, GENCODE, GTEx [107], and NHGRI-EBI GWAS Catalog [108] data. We supplemented the GWAS annotations using a more recent version (June 20, 2019) of the NHGRI-EBI GWAS Catalog. RegulomeDB [54] annotates variants based on evidence for transcription factor binding. SCAN is a database that provides summary information from eQTL experiments, mapping HapMap SNPs to gene expression in Utah residents with Northern and Western European ancestry (CEU) and Yoruba in Ibadan, Nigeria (YRI). For each SNP, the database lists genes showing local and distant associations in these two HapMap populations, along with p-values calculated using the quantitative trait linkage disequilibrium test (QTDT) method. The database also provides functional summary information from other databases, as well as other GWAS summary information for the SNPs queried.
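The r² ≥ 0.80 rule for selecting LD proxies can be illustrated with a short sketch. HaploReg derives LD from reference-panel haplotypes; this sketch instead assumes unphased 0/1/2 genotype dosages and approximates r² as their squared Pearson correlation, a common shortcut rather than the method used by the annotation tools. The SNP names and dosages are hypothetical.

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two SNPs' 0/1/2 dosage vectors."""
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP has no defined correlation; treat as no LD
    return cov * cov / (var_a * var_b)


def ld_proxies(index_dosages, candidates, threshold=0.80):
    """Names of candidate SNPs whose r^2 with the index SNP is at least threshold."""
    return sorted(
        name for name, dosages in candidates.items()
        if r_squared(index_dosages, dosages) >= threshold
    )


# Hypothetical dosages for five individuals.
index_snp = [0, 1, 2, 1, 0]
candidates = {
    "rsA": [0, 1, 2, 1, 0],  # identical pattern
    "rsB": [2, 1, 0, 1, 2],  # perfectly inverted pattern, also r^2 = 1
    "rsC": [1, 1, 0, 2, 0],  # weakly correlated
}
print(ld_proxies(index_snp, candidates))  # ['rsA', 'rsB']
```

Because r² ignores the sign of the correlation, a SNP whose alleles track the index SNP in the opposite orientation (rsB) is still a proxy.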

Genetic ancestry

We estimated both global and local genetic ancestry for all PAGE African Americans in this study. Global ancestry was estimated with the ADMIXTURE software [109] using ~196k SNPs on the Metabochip array and assuming K = 2 ancestral populations. Although this analysis was unsupervised, HapMap YRI samples were included, and 5-fold cross-validation was used to evaluate accuracy. Local ancestry was estimated using LAMP-LD [110] for ~175,600 SNPs remaining after LD pruning in PLINK [104], with phased haplotypes of CEU and YRI reference samples from the 1000 Genomes Project. We calculated local ancestry using a sliding window of 50 SNPs (200 kb) and 10 states per SNP, as recommended by the LAMP-LD manual for maximal accuracy and minimal computational time [110].
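Global and local estimates can be cross-checked, since averaging an individual's local ancestry calls over the genome should roughly approximate that individual's global ancestry proportion. The sketch below is an assumed consistency check, not an analysis reported here: it takes per-locus counts (0, 1, or 2) of African-derived haplotypes, weights every locus equally (a simplification), and uses a hypothetical 0.05 tolerance against a global estimate such as an ADMIXTURE proportion.

```python
def global_from_local(local_calls):
    """Genome-wide proportion of African-derived alleles for one individual.

    local_calls: per-locus counts (0, 1, or 2) of African-derived haplotypes.
    Every locus is weighted equally, which is a simplifying assumption.
    """
    if not local_calls:
        raise ValueError("need at least one locus")
    return sum(local_calls) / (2 * len(local_calls))


def agrees_with_global(local_calls, global_fraction, tolerance=0.05):
    """Hypothetical consistency check against a global (e.g. ADMIXTURE) estimate."""
    return abs(global_from_local(local_calls) - global_fraction) <= tolerance


calls = [2, 2, 1, 2, 0, 2, 1, 2]  # hypothetical local ancestry calls
print(global_from_local(calls))  # 0.75
print(agrees_with_global(calls, 0.78))  # True
```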

Supporting information

S1 Table [xlsx]
List of phenotypes included in the Population Architecture using Genomics and Epidemiology (PAGE) I phenome-wide association study (PheWAS), by PAGE study.

S2 Table
All PheWAS tests of association for the Population Architecture using Genomics and Epidemiology (PAGE) I study in African Americans.

S3 Table [xlsx]
Significant PheWAS tests of association for the Population Architecture using Genomics and Epidemiology (PAGE) I study in African Americans.

S1 File [docx]
Supporting text and tables for deriving the p-value threshold.

S2 File [docx]
Individual institutional review boards that approved the current study.

S1 Fig
Global genetic ancestry estimated for African Americans in the Population Architecture using Genomics and Epidemiology (PAGE) I study.

S2 Fig
PheWAS-identified admixed loci based on local genetic ancestry estimates.

S3 Fig
PheWAS-identified European-derived loci based on local genetic ancestry estimates.

S4 Fig
PheWAS-identified African-derived locus based on local genetic ancestry estimates.

References

1. Stearns FW. One Hundred Years of Pleiotropy: A Retrospective. Genetics. 2010;186(3):767–73. doi: 10.1534/genetics.110.122549 21062962

2. Paaby AB, Rockman MV. The many faces of pleiotropy. Trends in Genetics. 2013;29(2):66–73. doi: 10.1016/j.tig.2012.10.010 23140989

3. Tyler AL, Crawford DC, Pendergrass SA. The detection and characterization of pleiotropy: discovery, progress, and promise. Brief Bioinform. 2016;17(1):13–22. doi: 10.1093/bib/bbv050 26223525

4. Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences. 2009;106(23):9362–7.

5. Sivakumaran S, Agakov F, Theodoratou E, Prendergast JG, Zgaga L, Manolio T, et al. Abundant pleiotropy in human complex diseases and traits. Am J Hum Genet. 2011;89(5):607–18. doi: 10.1016/j.ajhg.2011.10.004 22077970

6. Denny JC. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene-disease associations. Bioinformatics. 2010;26. doi: 10.1093/bioinformatics/btq126 20335276

7. Pendergrass SA, Brown-Gentry K, Dudek SM, Torstenson ES, Ambite JL, Avery CL, et al. The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genetic Epidemiology. 2011;35(5):410–22. doi: 10.1002/gepi.20589 21594894

8. Verma A, Bang L, Miller JE, Zhang Y, Lee MTM, Zhang Y, et al. Human-Disease Phenotype Map Derived from PheWAS across 38,682 Individuals. The American Journal of Human Genetics. 2019;104(1):55–64. doi: 10.1016/j.ajhg.2018.11.006 30598166

9. Matise TC, Ambite JL, Buyske S, Carlson CS, Cole SA, Crawford DC, et al. The Next PAGE in Understanding Complex Traits: Design for the Analysis of Population Architecture Using Genetics and Epidemiology (PAGE) Study. American Journal of Epidemiology. 2011;174(7):849–59. doi: 10.1093/aje/kwr160 21836165

10. Crawford DC, Goodloe R, Farber-Eger E, Boston J, Pendergrass SA, Haines JL, et al. Leveraging epidemiologic and clinical collections for genomic studies of complex traits. Human Heredity. 2015;79(3–4):137–46. doi: 10.1159/000381805 26201699

11. Buyske S, Wu Y, Carty CL, Cheng I, Assimes TL, Dumitrescu L, et al. Evaluation of the Metabochip Genotyping Array in African Americans and Implications for Fine Mapping of GWAS-Identified Loci: The PAGE Study. PLoS ONE. 2012;7(4):e35651. doi: 10.1371/journal.pone.0035651 22539988

12. Voight BF, Kang HM, Ding J, Palmer CD, Sidore C, Chines PS, et al. The Metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits. PLoS Genet. 2012;8(8):e1002793. doi: 10.1371/journal.pgen.1002793 22876189

13. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators. Am J Epidemiol. 1989;129(4):687–702. 2646917

14. Kolonel LN, Henderson BE, Hankin JH, Nomura AMY, Wilkens LR, Pike MC, et al. A Multiethnic Cohort in Hawaii and Los Angeles: Baseline Characteristics. American Journal of Epidemiology. 2000;151(4):346–57. doi: 10.1093/oxfordjournals.aje.a010213 10695593

15. The Women’s Health Initiative. Design of the Women’s Health Initiative clinical trial and observational study. Control Clin Trials. 1998;19(1):61–109.

16. Reiner AP. Genome-wide association study of white blood cell count in 16,388 African Americans: the continental origins and genetic epidemiology network (COGENT). PLoS Genet. 2011;7. doi: 10.1371/journal.pgen.1002108 21738479

17. Chiba-Falek O, Linnertz C, Guyton J, Gardner SD, Roses AD, McCarthy JJ, et al. Pleiotropy and allelic heterogeneity in the TOMM40-APOE genomic region related to clinical and metabolic features of hepatitis C infection. Human Genetics. 2012;131(12):1911–20. doi: 10.1007/s00439-012-1220-0 22898894

18. Radwan ZH, Wang X, Waqar F, Pirim D, Niemsiri V, Hokanson JE, et al. Comprehensive Evaluation of the Association of APOE Genetic Variation with Plasma Lipoprotein Traits in U.S. Whites and African Blacks. PLOS ONE. 2014;9(12):e114618. doi: 10.1371/journal.pone.0114618 25502880

19. Pirim D, Radwan ZH, Wang X, Niemsiri V, Hokanson JE, Hamman RF, et al. Apolipoprotein E-C1-C4-C2 gene cluster region and inter-individual variation in plasma lipoprotein levels: a comprehensive genetic association study in two ethnic groups. PLOS ONE. 2019;14(3):e0214060. doi: 10.1371/journal.pone.0214060 30913229

20. Hoffmann TJ, Theusch E, Haldar T, Ranatunga DK, Jorgenson E, Medina MW, et al. A large electronic-health-record-based genome-wide study of serum lipids. Nature Genetics. 2018;50(3):401–13. doi: 10.1038/s41588-018-0064-5 29507422

21. Verma A, Bradford Y, Verman SS, Pendergrass SA, Daar ES, Venuto C, et al. Multiphenotype association study of patients randomized to initiate antiretroviral regimens in AIDS Clinical Trials Group protocol A5202. Pharmacogenet Genomics. 2017;27(3):101–11. doi: 10.1097/FPC.0000000000000263 28099408

22. Takeuchi F, Isono M, Katsuya T, Yokota M, Yamamoto K, Nabika T, et al. Association of Genetic Variants Influencing Lipid Levels with Coronary Artery Disease in Japanese Individuals. PLOS ONE. 2012;7(9):e46385. doi: 10.1371/journal.pone.0046385 23050023

23. Burman D, Mente A, Hegele RA, Islam S, Yusuf S, Anand SS. Relationship of the ApoE polymorphism to plasma lipid traits among South Asians, Chinese, and Europeans living in Canada. Atherosclerosis. 2009;203(1):192–200. doi: 10.1016/j.atherosclerosis.2008.06.007 18656198

24. Larifla L, Armand C, Bangou J, Blanchet-Deverly A, Numeric P, Fonteau C, et al. Association of APOE gene polymorphism with lipid profile and coronary artery disease in Afro-Caribbeans. PLOS ONE. 2017;12(7):e0181620. doi: 10.1371/journal.pone.0181620 28727855

25. Natarajan P, Peloso GM, Zekavat SM, Montasser M, Ganna A, Chaffin M, et al. Deep-coverage whole genome sequences and blood lipids among 16,324 individuals. Nature Communications. 2018;9(1):3391. doi: 10.1038/s41467-018-05747-8 30140000

26. Sanna S, Li B, Mulas A, Sidore C, Kang HM, Jackson AU, et al. Fine Mapping of Five Loci Associated with Low-Density Lipoprotein Cholesterol Detects Variants That Double the Explained Heritability. PLOS Genetics. 2011;7(7):e1002198. doi: 10.1371/journal.pgen.1002198 21829380

27. Kanoni S, Masca NGD, Stirrups KE, Varga TV, Warren HR, Scott RA, et al. Analysis with the exome array identifies multiple new independent variants in lipid loci. Human Molecular Genetics. 2016;25(18):4094–106. doi: 10.1093/hmg/ddw227 27466198

28. Chang MH, Ned RM, Hong Y, Yesupriya A, Yang Q, Liu T, et al. Racial/Ethnic Variation in the Association of Lipid-Related Genetic Variants With Blood Lipids in the US Adult Population. Circulation: Cardiovascular Genetics. 2011;4(5):523–33.

29. Talmud PJ, Drenos F, Shah S, Shah T, Palmen J, Verzilli C, et al. Gene-centric Association Signals for Lipids and Apolipoproteins Identified via the HumanCVD BeadChip. The American Journal of Human Genetics. 2009;85(5):628–42. doi: 10.1016/j.ajhg.2009.10.014 19913121

30. Lange LA, Hu Y, Zhang H, Xue C, Schmidt EM, Tang Z-Z, et al. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol. The American Journal of Human Genetics. 2014;94(2):233–45. doi: 10.1016/j.ajhg.2014.01.010 24507775

31. Chang MH, Yesupriya A, Ned RM, Mueller PW, Dowling NF. Genetic variants associated with fasting blood lipids in the US population: Third National Health and Nutrition Examination Survey. BMC Med Genet. 2010;11:62. doi: 10.1186/1471-2350-11-62 20406466

32. Rasmussen-Torvik LJ, Pacheco JA, Wilke RA, Thompson WK, Ritchie MD, Kho AN, et al. High Density GWAS for LDL Cholesterol in African Americans Using Electronic Medical Records Reveals a Strong Protective Variant in APOE. Clinical and Translational Science. 2012;5(5):394–9. doi: 10.1111/j.1752-8062.2012.00446.x 23067351

33. Chasman DI, Giulianini F, MacFadyen J, Barratt BJ, Nyberg F, Ridker PM. Genetic Determinants of Statin-Induced Low-Density Lipoprotein Cholesterol Reduction. Circulation: Cardiovascular Genetics. 2012;5(2):257–64.

34. Ciuculete DM, Bandstein M, Benedict C, Waeber G, Vollenweider P, Lind L, et al. A genetic risk score is significantly associated with statin therapy response in the elderly population. Clinical Genetics. 2017;91(3):379–85. doi: 10.1111/cge.12890 27943270

35. Lagos J, Zambrano T, Rosales A, Salazar LA. APOE polymorphisms contribute to reduced atorvastatin response in Chilean Amerindian subjects. Int J Mol Sci. 2015;16(4):7890–9. doi: 10.3390/ijms16047890 25860945

36. Thompson JF, Hyde CL, Wood LS, Paciga SA, Hinds DA, Cox DR, et al. Comprehensive Whole-Genome and Candidate Gene Analysis for Response to Statin Therapy in the Treating to New Targets (TNT) Cohort. Circulation: Cardiovascular Genetics. 2009;2(2):173–81. doi: 10.1161/CIRCGENETICS.108.818062 20031582

37. Mega JL, Morrow DA, Brown A, Cannon CP, Sabatine MS. Identification of Genetic Variants Associated With Response to Statin Therapy. Arteriosclerosis, Thrombosis, and Vascular Biology. 2009;29(9):1310–5. doi: 10.1161/ATVBAHA.109.188474 19667110

38. Morrison AC, Huang Z, Yu B, Metcalf G, Liu X, Ballantyne C, et al. Practical Approaches for Whole-Genome Sequence Analysis of Heart- and Blood-Related Traits. The American Journal of Human Genetics. 2017;100(2):205–15. doi: 10.1016/j.ajhg.2016.12.009 28089252

39. Kettunen J, Tukiainen T, Sarin A-P, Ortega-Alonso A, Tikkanen E, Lyytikäinen L-P, et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nature Genetics. 2012;44:269. doi: 10.1038/ng.1073 22286219

40. Dumitrescu L, Carty CL, Taylor K, Schumacher FR, Hindorff LA, Ambite J-L, et al. Genetic Determinants of Lipid Traits in Diverse Populations from the Population Architecture using Genomics and Epidemiology (PAGE) Study. PLoS Genet. 2011;7(6):e1002138. doi: 10.1371/journal.pgen.1002138 21738485

41. Kathiresan S, Melander O, Guiducci C, Surti A, Burtt NP, Rieder MJ, et al. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet. 2008;40(2):189–97. doi: 10.1038/ng.75 18193044

42. Fairoozy RH, White J, Palmen J, Kalea AZ, Humphries SE. Identification of the Functional Variant(s) that Explain the Low-Density Lipoprotein Receptor (LDLR) GWAS SNP rs6511720 Association with Lower LDL-C and Risk of CHD. PLOS ONE. 2016;11(12):e0167676. doi: 10.1371/journal.pone.0167676 27973560

43. Zubair N, Graff M, Ambite JL, Bush WS, Kichaev G, Lu Y, et al. Fine-mapping of lipid regions in global populations discovers ethnic-specific signals and refines previously identified lipid loci. Human Molecular Genetics. 2016;25(24):5500–12. doi: 10.1093/hmg/ddw358 28426890

44. Crosslin D, McDavid A, Weston N, Nelson S, Zheng X, Hart E, et al. Genetic variants associated with the white blood cell count in 13,923 subjects in the eMERGE Network. Human Genetics. 2012;131(4):639–52. doi: 10.1007/s00439-011-1103-9 22037903

45. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393 26432245

46. Wu Y, Waite LL, Jackson AU, Sheu WH-H, Buyske S, Absher D, et al. Trans-Ethnic Fine-Mapping of Lipid Loci Identifies Population-Specific Signals and Allelic Heterogeneity That Increases the Trait Variance Explained. PLoS Genet. 2013;9(3):e1003379. doi: 10.1371/journal.pgen.1003379 23555291

47. Craig WY, Palomaki GE, Haddow JE. Cigarette smoking and serum lipid and lipoprotein concentrations: an analysis of published data. British Medical Journal. 1989;298(6676):784–8. doi: 10.1136/bmj.298.6676.784 2496857

48. Nagy R, Boutin TS, Marten J, Huffman JE, Kerr SM, Campbell A, et al. Exploration of haplotype research consortium imputation for genome-wide association studies in 20,032 Generation Scotland participants. Genome Medicine. 2017;9(1):23. doi: 10.1186/s13073-017-0414-4 28270201

49. Kichaev G, Bhatia G, Loh P-R, Gazal S, Burch K, Freund MK, et al. Leveraging Polygenic Functional Enrichment to Improve GWAS Power. The American Journal of Human Genetics. 2019;104(1):65–75. doi: 10.1016/j.ajhg.2018.11.008 30595370

50. Giri A, Hellwege JN, Keaton JM, Park J, Qiu C, Warren HR, et al. Trans-ethnic association study of blood pressure determinants in over 750,000 individuals. Nature Genetics. 2019;51(1):51–62. doi: 10.1038/s41588-018-0303-9 30578418

51. Surendran P, Drenos F, Young R, Warren H, Cook JP, Manning AK, et al. Trans-ancestry meta-analyses identify rare and common variants associated with blood pressure and hypertension. Nat Genet. 2016;48(10):1151–61. doi: 10.1038/ng.3654 27618447

52. Liu X, Byrd JB. Cigarette Smoking and Subtypes of Uncontrolled Blood Pressure Among Diagnosed Hypertensive Patients: Paradoxical Associations and Implications. American Journal of Hypertension. 2017;30(6):602–9. doi: 10.1093/ajh/hpx014 28203691

53. Ward LD, Kellis M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Research. 2012;40(D1):D930–D4. doi: 10.1093/nar/gkr917 22064851

54. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Research. 2012;22(9):1790–7. doi: 10.1101/gr.137323.112 22955989

55. Gamazon ER, Zhang W, Konkashbaev A, Duan S, Kistner EO, Nicolae DL, et al. SCAN: SNP and copy number annotation. Bioinformatics. 2010;26(2):259–62. doi: 10.1093/bioinformatics/btp644 19933162

56. Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, et al. The Genetic Structure and History of Africans and African Americans. Science. 2009;324(5930):1035–44. doi: 10.1126/science.1172257 19407144

57. Dumitrescu L, Restrepo NA, Goodloe R, Boston J, Farber-Eger E, Pendergrass SA, et al. Towards a phenome-wide catalog of human clinical traits impacted by genetic ancestry. BioData Mining. 2015;8(35). doi: 10.1186/s13040-015-0068-y 26566401

58. Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics. 2015;96(1):37–53. doi: 10.1016/j.ajhg.2014.11.010 25529636

59. Baharian S, Barakatt M, Gignoux CR, Shringarpure S, Errington J, Blot WJ, et al. The Great Migration and African-American Genomic Diversity. PLoS Genet. 2016;12(5):e1006059. doi: 10.1371/journal.pgen.1006059 27232753

60. Klarin D, Damrauer SM, Cho K, Sun YV, Teslovich TM, Honerlaw J, et al. Genetics of blood lipids among ~300,000 multi-ethnic participants of the Million Veteran Program. Nature Genetics. 2018;50(11):1514–23. doi: 10.1038/s41588-018-0222-9 30275531

61. Evangelou E, Warren HR, Mosen-Ansorena D, Mifsud B, Pazoki R, Gao H, et al. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits. Nature Genetics. 2018;50(10):1412–25. doi: 10.1038/s41588-018-0205-x 30224653

62. Lin BM, Nadkarni GN, Tao R, Graff M, Fornage M, Buyske S, et al. Genetics of Chronic Kidney Disease Stages Across Ancestries: The PAGE Study. Frontiers in Genetics. 2019;10(494). doi: 10.3389/fgene.2019.00494 31178898

63. Wyss AB, Sofer T, Lee MK, Terzikhan N, Nguyen JN, Lahousse L, et al. Multiethnic meta-analysis identifies ancestry-specific and cross-ancestry loci for pulmonary function. Nature Communications. 2018;9(1):2976. doi: 10.1038/s41467-018-05369-0 30061609

64. Locke AE, Kahali B, Berndt SI, Justice AE, Pers TH, Day FR, et al. Genetic studies of body mass index yield new insights for obesity biology. Nature. 2015;518(7538):197–206. doi: 10.1038/nature14177 25673413

65. Wojcik GL, Graff M, Nishimura KK, Tao R, Haessler J, Gignoux CR, et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature. 2019. doi: 10.1038/s41586-019-1310-4 31217584

66. Fernández-Rhodes L, Malinowski JR, Wang Y, Tao R, Pankratz N, Jeff JM, et al. The genetic underpinnings of variation in ages at menarche and natural menopause among women from the multi-ethnic Population Architecture using Genomics and Epidemiology (PAGE) Study: A trans-ethnic meta-analysis. PLOS ONE. 2018;13(7):e0200486. doi: 10.1371/journal.pone.0200486 30044860

67. Hodonsky CJ, Schurmann C, Schick UM, Kocarnik J, Tao R, van Rooij FJA, et al. Generalization and fine mapping of red blood cell trait genetic associations to multi-ethnic populations: The PAGE study. American Journal of Hematology. 2018;93(8):1061–73. doi: 10.1002/ajh.25161 29905378

68. Kocarnik JM, Richard M, Graff M, Haessler J, Bien S, Carlson C, et al. Discovery, fine-mapping, and conditional analyses of genetic variants associated with C-reactive protein in multiethnic populations using the Metabochip in the Population Architecture using Genomics and Epidemiology (PAGE) study. Human Molecular Genetics. 2018;27(16):2940–53. doi: 10.1093/hmg/ddy211 29878111

69. Gong J, Nishimura KK, Fernandez-Rhodes L, Haessler J, Bien S, Graff M, et al. Trans-ethnic analysis of metabochip data identifies two new loci associated with BMI. International Journal Of Obesity. 2018;42:384. doi: 10.1038/ijo.2017.304 29381148

70. Bien SA, Pankow JS, Haessler J, Lu YN, Pankratz N, Rohde RR, et al. Transethnic insight into the genetics of glycaemic traits: fine-mapping results from the Population Architecture using Genomics and Epidemiology (PAGE) consortium. Diabetologia. 2017;60(12):2384–98. doi: 10.1007/s00125-017-4405-1 28905132

71. Ng MCY, Graff M, Lu Y, Justice AE, Mudgal P, Liu C-T, et al. Discovery and fine-mapping of adiposity loci using high density imputation of genome-wide association studies in individuals of African ancestry: African Ancestry Anthropometry Genetics Consortium. PLOS Genetics. 2017;13(4):e1006719. doi: 10.1371/journal.pgen.1006719 28430825

72. Fernández-Rhodes L, Gong J, Haessler J, Franceschini N, Graff M, Nishimura KK, et al. Trans-ethnic fine-mapping of genetic loci for body mass index in the diverse ancestral populations of the Population Architecture using Genomics and Epidemiology (PAGE) Study reveals evidence for multiple signals at established loci. Human Genetics. 2017;136(6):771–800. doi: 10.1007/s00439-017-1787-6 28391526

73. Avery CL, Wassel CL, Richard MA, Highland HM, Bien S, Zubair N, et al. Fine mapping of QT interval regions in global populations refines previously identified QT interval loci and identifies signals unique to African and Hispanic descent populations. Heart Rhythm. 2017;14(4):572–80. doi: 10.1016/j.hrthm.2016.12.021 27988371

74. Yoneyama S, Yao J, Guo X, Fernandez-Rhodes L, Lim U, Boston J, et al. Generalization and fine mapping of European ancestry-based central adiposity variants in African ancestry populations. International Journal Of Obesity. 2016;41:324. doi: 10.1038/ijo.2016.207 27867202

75. Evans DS, Avery CL, Nalls MA, Li G, Barnard J, Smith EN, et al. Fine-mapping, novel loci identification, and SNP association transferability in a genome-wide association study of QRS duration in African Americans. Human Molecular Genetics. 2016;25(19):4350–68. doi: 10.1093/hmg/ddw284 27577874

76. Franceschini N, Carty CL, Lu Y, Tao R, Sung YJ, Manichaikul A, et al. Variant Discovery and Fine Mapping of Genetic Loci Associated with Blood Pressure Traits in Hispanics and African Americans. PLOS ONE. 2016;11(10):e0164132. doi: 10.1371/journal.pone.0164132 27736895

77. Liu C-T, Raghavan S, Maruthur N, Kabagambe EK, Hong J, Ng MCY, et al. Trans-ethnic Meta-analysis and Functional Annotation Illuminates the Genetic Architecture of Fasting Glucose and Insulin. The American Journal of Human Genetics. 2016;99(1):56–75. doi: 10.1016/j.ajhg.2016.05.006 27321945

78. Bentley AR, Sung YJ, Brown MR, Winkler TW, Kraja AT, Ntalla I, et al. Multi-ancestry genome-wide gene–smoking interaction study of 387,272 individuals identifies new loci associated with serum lipids. Nature Genetics. 2019;51(4):636–48. doi: 10.1038/s41588-019-0378-y 30926973

79. Trombetta-Esilva J, Bradshaw AD. The function of SPARC as a mediator of fibrosis. Open Rheumatol J. 2012;6:146–55. doi: 10.2174/1874312901206010146 22802913

80. Atorrasagasti C, Onorato A, Gimeno ML, Andreone L, Garcia M, Malvicini M, et al. SPARC is required for the maintenance of glucose homeostasis and insulin secretion in mice. Clinical Science. 2019;133(2):351–65. doi: 10.1042/CS20180714 30626728

81. Kos K, Wilding JPH. SPARC: a key player in the pathologies associated with obesity and diabetes. Nature Reviews Endocrinology. 2010;6:225. doi: 10.1038/nrendo.2010.18 20195270

82. Preuss M, König IR, Thompson JR, Erdmann J, Absher D, Assimes TL, et al. Design of the Coronary ARtery DIsease Genome-Wide Replication And Meta-Analysis (CARDIoGRAM) Study. Circulation: Cardiovascular Genetics. 2010;3(5):475–83. doi: 10.1161/CIRCGENETICS.109.899443 20923989

83. Meisinger C, Prokisch H, Gieger C, Soranzo N, Mehta D, Rosskopf D, et al. A Genome-wide Association Study Identifies Three Loci Associated with Mean Platelet Volume. The American Journal of Human Genetics. 2009;84(1):66–71. doi: 10.1016/j.ajhg.2008.11.015 19110211

84. Soranzo N, Spector TD, Mangino M, Kühnel B, Rendon A, Teumer A, et al. A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium. Nature Genetics. 2009;41:1182. doi: 10.1038/ng.467 19820697

85. Gehwolf R, Wagner A, Lehner C, Bradshaw AD, Scharler C, Niestrawska JA, et al. Pleiotropic roles of the matricellular protein Sparc in tendon maturation and ageing. Scientific Reports. 2016;6:32635. doi: 10.1038/srep32635 27586416

86. Winkler CA, Nelson GW, Smith MW. Admixture Mapping Comes of Age. Annual Review of Genomics and Human Genetics. 2010;11(1):65–89.

87. Fish AE, Crawford DC, Capra JA, Bush WS. Local ancestry transitions modify SNP-trait associations. Pac Symp Biocomput. 2018;23:424–35. 29218902

88. Bryc K, Velez C, Karafet T, Moreno-Estrada A, Reynolds A, Auton A, et al. Genome-wide patterns of population structure and admixture among Hispanic/Latino populations. Proceedings of the National Academy of Sciences. 2010;107(Supplement 2):8954–61.

89. Bhatia G, Tandon A, Patterson N, Aldrich MC, Ambrosone CB, Amos C, et al. Genome-wide Scan of 29,141 African Americans Finds No Evidence of Directional Selection since Admixture. The American Journal of Human Genetics. 2014;95(4):437–44. doi: 10.1016/j.ajhg.2014.08.011 25242497

90. Bush WS, Oetjens MT, Crawford DC. Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat Rev Genet. 2016;17(3):129–45. doi: 10.1038/nrg.2015.36 26875678

91. Oetjens MT, Bush WS, Denny JC, Birdwell K, Kodaman N, Verma A, et al. Evidence for extensive pleiotropy among pharmacogenes. Pharmacogenomics. 2016;17(8):853–66. doi: 10.2217/pgs-2015-0007 27249515

92. Chami N, Chen M-H, Slater AJ, Eicher JD, Evangelou E, Tajuddin SM, et al. Exome Genotyping Identifies Pleiotropic Variants Associated with Red Blood Cell Traits. The American Journal of Human Genetics. 2016;99(1):8–21. doi: 10.1016/j.ajhg.2016.05.007 27346685

93. Safarova MS, Satterfield BA, Fan X, Austin EE, Ye Z, Bastarache L, et al. A phenome-wide association study to discover pleiotropic effects of PCSK9, APOB, and LDLR. NPJ Genom Med. 2019;4:3. doi: 10.1038/s41525-019-0078-7 30774981

94. Verma SS, Frase AT, Verma A, Pendergrass SA, Mahony SA, Haas DW, et al. Phenome-wide interaction study (PheWIS) in AIDS Clinical Trials Group Data (ACTG). Pac Symp Biocomput. 2016;21:57–68.

95. Verma A, Verma SS, Pendergrass SA, Crawford DC, Crosslin DR, Kuivaniemi H, et al. eMERGE Phenome-Wide Association Study (PheWAS) identifies clinical associations and pleiotropy for stop-gain variants. BMC Medical Genomics. 2016;9(1):19–25. doi: 10.1186/s12920-016-0191-8 27535653

96. Verma A, Basile AO, Bradford Y, Kuivaniemi H, Tromp G, Carey D, et al. Phenome-Wide Association Study to Explore Relationships between Immune System Related Genetic Loci and Complex Traits and Diseases. PLOS ONE. 2016;11(8):e0160573. doi: 10.1371/journal.pone.0160573 27508393

97. Verma A, Leader JB, Verma SS, Frase A, Wallace J, Dudek S, et al. Integrating clinical laboratory measures and ICD-9 code diagnoses in phenome-wide association studies. Pac Symp Biocomput. 2016;21:168–79. 26776183

98. Denny JC. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013;31. doi: 10.1038/nbt.2749 24270849

99. Pendergrass SA, Crawford DC. Using Electronic Health Records To Generate Phenotypes For Research. Current Protocols in Human Genetics. 2019;100(1):e80. doi: 10.1002/cphg.80 30516347

100. Emdin CA, Khera AV, Kathiresan S. Mendelian Randomization. JAMA. 2017;318(19):1925–6.

101. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007;39(10):1181–6. doi: 10.1038/ng1007-1181 17898773

102. Anderson GL, Manson J, Wallace R, Lund B, Hall D, Davis S, et al. Implementation of the Women’s Health Initiative study design. Ann Epidemiol. 2003;13(9 Suppl):S5–S17. doi: 10.1016/s1047-2797(03)00043-7 14575938

103. Giannoulatou E, Yau C, Colella S, Ragoussis J, Holmes CC. GenoSNP: a variational Bayes within-sample SNP genotyping algorithm that does not require a reference population. Bioinformatics. 2008;24(19):2209–14. doi: 10.1093/bioinformatics/btn386 18653518

104. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. doi: 10.1086/519795 17701901

105. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904–9. doi: 10.1038/ng1847 16862161

106. Wolfe D, Dudek S, Ritchie M, Pendergrass S. Visualizing genomic information across chromosomes with PhenoGram. BioData Mining. 2013;6(1):18. doi: 10.1186/1756-0381-6-18 24131735

107. Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–5. doi: 10.1038/ng.2653 23715323

108. MacArthur J, Bowler E, Cerezo M, Gil L, Hall P, Hastings E, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Research. 2017;45(D1):D896–D901. doi: 10.1093/nar/gkw1133 27899670

109. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Research. 2009;19(9):1655–64. doi: 10.1101/gr.094052.109 19648217

110. Baran Y, Pasaniuc B, Sankararaman S, Torgerson DG, Gignoux C, Eng C, et al. Fast and accurate inference of local ancestry in Latino populations. Bioinformatics. 2012;28(10):1359–67. doi: 10.1093/bioinformatics/bts144 22495753