Enhancing genomic selection by fitting large-effect SNPs as fixed effects and a genotype-by-environment effect using a maize BC1F3:4 population

Authors: Dongdong Li ^aff001; Zhenxiang Xu ^aff002; Riliang Gu ^aff002; Pingxi Wang ^aff002; Demar Lyle ^aff002; Jialiang Xu ^aff001; Hongwei Zhang ^aff001; Guoying Wang ^aff001
Authors place of work: National Key Facility for Crop Gene Resources and Genetic Improvement, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, P. R. China ^aff001; Center for Seed Science and Technology, College of Agronomy and Biotechnology, China Agricultural University, Beijing, P. R. China ^aff002
Published in the journal: PLoS ONE 14(10)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0223898

Summary

The popularity of genomic selection (GS) has increased owing to its prospects in commercial breeding. It is necessary to enhance GS to increase its efficiency. In this study, a maize BC₁F_3:4 population, consisting of 481 families, was evaluated for days to anthesis in four environments and genotyped with DNA chips including 55,000 single nucleotide polymorphisms (SNPs). This population was used to investigate whether GS could be enhanced by exploiting information on the genetic basis of the trait and on genotype-by-environment (G × E) interaction. The results showed that: 1) fitting the top four large-effect SNPs as fixed effects increased prediction accuracy, even though three of them were minor-effect SNPs each explaining less than 10% of the phenotypic variance; 2) the increase in prediction accuracy when fitting large-effect SNPs as fixed effects was related to the decrease in genetic variance; 3) in general, the GS model that fitted large-effect SNPs as fixed effects together with a G × E component enhanced GS. Therefore, we propose fitting large-effect markers as fixed effects together with a G × E effect in crop breeding projects, in order to predict phenotypes accurately and select desired plants efficiently.

Keywords:

Plant genomics – Heredity – Maize – Genetic loci – Quantitative trait loci – Genome-wide association studies – Variant genotypes – Plant breeding

Introduction

Plant quantitative genetics is a burgeoning field, enabling the identification of a great number of quantitative trait loci (QTL) and genes in crops. The QTL or gene information (including position and effect) should be summarized and transferred to molecular markers to better serve crop breeding [1, 2]. The conventional use of QTL or gene information in plant breeding typically involves marker-assisted selection (MAS), which requires the identification of significant QTL and the selection of desired plants in advanced populations [3]. In MAS, the target QTL are usually major QTL, and minor QTL are often missed because of false negatives, so the QTL information is not fully exploited [3]. Genomic selection (GS), which was introduced in animals and later applied to crop genetic research, can make use of minor-effect QTL for the improvement of target traits [4, 5].

GS entails the prediction of genomic estimated breeding values (GEBVs) of a validation population based on a training population, for which both phenotypic and genotypic data are available [4]. Factors influencing the prediction accuracy (PA) of GS should be considered and optimally controlled to accurately estimate GEBVs. These factors include, but are not limited to, population size, marker density, heritability, linkage disequilibrium, the genetic architecture of the target traits (QTL number, QTL effects, and QTL interactions), genetic relatedness between the training and validation populations, and GS models [6–8]. Some factors are difficult to modify when both genotypic and phenotypic data are available, whereas others can be optimized using statistical approaches. There might be differences in PAs among GS models, including ridge regression best linear unbiased prediction (rrBLUP), genomic best linear unbiased prediction (GBLUP), or the Bayesian alphabet [5, 9, 10]. However, the selection of an optimal GS model has proven difficult, as there is no clear indication of which model improves PA in all cases [9, 11, 12].

Examining the components of GS models may provide insight into improving their PAs. In an rrBLUP model, the linear model is y = Xβ + Zu + ε, which is composed of the fixed effect β and the random effect u. Modifying the fixed and random effects might improve GS models. For example, a simulation study found that fitting major-effect SNPs as fixed effects could enhance genomic prediction [13, 14]. A study on wheat stem rust resistance found that the PA of a GBLUP model using markers linked to Sr2 (involved in stem rust resistance) as fixed effects was larger than that of an ordinary GBLUP [14]. Generally, it is beneficial to use known genetic or gene information to improve GS models. In maize, it is common to select breeding materials from the offspring of F₁ plants [15]. Therefore, it is necessary to confirm the effect of fitting large-effect markers as fixed effects in GS models using a maize biparental population.

The use of additional random variables might help to enhance the prediction of GS models. As suggested by a study on wheat, the PA of a GS model including a genotype-by-environment (G × E) effect was higher than that of other GS models [16]. In another study on dairy cattle, the PA of a G × E GS model was higher than that of other models for predicting metabolic body weight [17]. It is important to study the effect of modeling G × E in maize, which is adapted to various environments ranging from tropical to temperate regions. Flowering time is crucial for the adaptation of maize to diverse ecological regions, and is easily influenced by the environment, especially temperature and photoperiod [18, 19]. QTL controlling flowering time have been shown to be closely related to environmental conditions, and some QTL showed significant QTL × environment interaction effects [18, 20]. Therefore, flowering time is suitable for analyzing the contribution of G × E to the PA of GS models.

In this study, 481 maize BC₁F_3:4 families were constructed using elite inbred lines. Meanwhile, genotypic and phenotypic data of the 481 families were obtained. The objectives of this study were to investigate whether fitting large-effect SNPs as fixed effects could increase the PAs of GS models, and whether modeling G × E interaction could increase the PAs of GS models. Afterward, we assessed the performance of the GS models fitting large-effect SNPs as fixed effects and G × E interaction. This analysis would provide insight into improving the PAs of GS models using available phenotypic and genotypic data.

Materials and methods

Plant materials and phenotyping

A biparental population was constructed using the elite inbred lines Zheng58 and PH4CV. Zheng58 is the female parent of Zhengdan958 and PH4CV is the male parent of Xianyu335. Zhengdan958 and Xianyu335 are popular hybrids in China [21, 22]. The F₁ plants were backcrossed to PH4CV to produce the BC₁F₁ seeds. Each BC₁F₁ plant was pollinated with bulked pollen collected from at least ten other BC₁F₁ plants in the summer of 2014 in Shunyi, Beijing. The offspring of these BC₁F₁ plants were defined as bulk-BC₁F₂. In the summer of 2015 in Shunyi, Beijing, forty-three bulk-BC₁F₂ families were sown, with each family in a one-row plot. Three plants in each row were self-pollinated to produce the BC₁F₃ seeds. In the winter of 2015 in Sanya, Hainan, 481 BC₁F₃ plants from three BC₁F₂ ears were sown and self-pollinated to produce 481 BC₁F_3:4 families. The flowchart for the construction of the materials used in this study is shown in S1 Fig. The BC₁F_3:4 families were sown in Shunyi, Beijing, and Changji, Xinjiang, in the summers of 2016 and 2017; the four environments were designated 16BJ and 17BJ (Beijing, 2016 and 2017) and 16XJ and 17XJ (Xinjiang, 2016 and 2017). In each environment, the BC₁F_3:4 families were planted in a randomized complete design with two replicates. Within each replicate, each family was sown in a one-row plot. The row spacing was 50 cm and the distance between two neighboring plants was 25 cm. Days to anthesis (DA) was recorded when 50% of the plants in each plot reached anthesis. The phenotypic data of DA are included in S1 File.

Phenotype data analysis

The best linear unbiased estimates (BLUEs) of the 481 BC₁F_3:4 families were estimated following the model:

y_ijm = μ + g_i + e_j + ge_ij + δ_(j)m + ε_ijm

where y_ijm is the phenotype of the i^th (i = 1, 2, …, 481) genotype in the m^th (m = 1, 2) replicate of the j^th (j = 1, 2, 3, 4) environment, with replicates nested within environments. μ is the overall mean, g_i is the genotype effect, e_j is the environmental effect, ge_ij is the G × E effect, δ_(j)m is the replicate effect, and ε_ijm ~ N(0, σ_ε²) is the error term, where N stands for the normal distribution. To compute BLUEs, g_i was treated as a fixed effect, and the other effects were treated as random effects, each following its own normal distribution. The model was fitted using the R package lme4 [23].

To calculate broad-sense heritability on an entry-mean basis (H²), all variables were treated as random effects to estimate their variances using the above model, which was fitted using the R package lme4 [23]. The variances of genotype, G × E and the error term were denoted σ_g², σ_ge², and σ_ε², respectively. The formula for calculating H² is [24]:

H² = σ_g² / (σ_g² + σ_ge²/N_e + σ_ε²/(N_e × r))

where N_e is the number of environments, and r is the number of replicates.
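The entry-mean heritability calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard entry-mean form H² = σ_g² / (σ_g² + σ_ge²/N_e + σ_ε²/(N_e × r)); the function name and the variance values are hypothetical and are not taken from the study.

```python
def entry_mean_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Broad-sense heritability on an entry-mean basis.

    var_g, var_ge, var_e: genotype, G x E and error variance components.
    n_env, n_rep: number of environments and replicates per environment.
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    # Phenotypic variance of an entry mean across environments and replicates
    var_p = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    return var_g / var_p


# Hypothetical variance components, for illustration only
h2 = entry_mean_heritability(var_g=4.0, var_ge=2.0, var_e=4.0, n_env=4, n_rep=2)
print(round(h2, 3))
```

With four environments and two replicates, the G × E variance is divided by 4 and the error variance by 8, which is why multi-environment trials tend to give a higher entry-mean H² than single trials.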

Genotyping and data preprocessing

Fresh leaf tissues of the 481 BC₁F₃ plants were collected and the DNA of each plant was extracted using a cetyltrimethyl ammonium bromide method [25]. DNA samples were sent to CapitalBio Corporation for a DNA chip assay, which included 55,000 SNP loci covering the whole genome [26]. The physical positions of the SNP markers were based on the B73 RefGen_V3 sequence. SNPs with a calling rate larger than 97% were used. The genotyping data were filtered by removing SNPs with missing data in any parent, SNPs that were non-polymorphic between the parents, and SNPs with a missing rate larger than 0.05. Missing markers were imputed with the expected values calculated from estimates of allele frequencies [10]. The processed genotypic data are included in S2 File.
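The filtering and imputation steps can be illustrated with a short Python sketch. It assumes genotypes coded -1, 0, 1 with `None` for missing calls; under that coding the expected value given the allele frequency p is 2p − 1, which equals the mean of the observed codes. The function name, the toy data and the relaxed threshold in the example are hypothetical.

```python
def filter_and_impute(genotypes, parent_a, parent_b, max_missing=0.05):
    """Filter SNPs and impute missing calls with the marker mean.

    genotypes: list of rows (one per plant); entries are -1, 0, 1 or None.
    parent_a, parent_b: parental calls per marker (None if missing).
    Returns (kept marker indices, imputed genotype rows).
    """
    n_plants = len(genotypes)
    kept, means = [], {}
    for j in range(len(parent_a)):
        if parent_a[j] is None or parent_b[j] is None:
            continue  # missing data in a parent
        if parent_a[j] == parent_b[j]:
            continue  # non-polymorphic between parents
        observed = [row[j] for row in genotypes if row[j] is not None]
        if (n_plants - len(observed)) / n_plants > max_missing:
            continue  # too many missing calls
        kept.append(j)
        means[j] = sum(observed) / len(observed)  # equals 2p - 1
    imputed = [
        [row[j] if row[j] is not None else means[j] for j in kept]
        for row in genotypes
    ]
    return kept, imputed


# Toy data (hypothetical): 4 plants x 4 markers
geno = [[1, 0, None, 1], [1, -1, 1, 0], [-1, 1, 1, 1], [1, 1, 0, 1]]
pa, pb = [-1, -1, -1, 1], [1, 1, 1, 1]
# A relaxed threshold is used here only because the toy data set is tiny
kept, imputed = filter_and_impute(geno, pa, pb, max_missing=0.3)
print(kept, imputed[0])
```

In the toy example the fourth marker is dropped because the parents share the same call, and the single missing call at the third marker is replaced by the mean of the three observed calls.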

GWAS and selection of large-effect SNPs

GWAS was performed using the R package sommer [27] following the model:

y* = Xβ + Zg + Wτ + ε

where y* is an N×1 vector of the BLUEs, β is a vector of fixed effects, g is the genetic effect, treated as a random effect with the normal distribution g ~ N(0, Kσ_u²), τ is the vector of additive marker effects, and ε is the residual, which follows the normal distribution ε ~ N(0, Iσ_ε²). X, Z, and W are the corresponding design matrices. K was estimated using the A.mat function in the R package rrBLUP with the following formula [5]:

K = WWᵀ / (2 Σ_j p_j q_j)

For the j^th marker, p_j and q_j are the allele frequencies of “A” and “a”, respectively. SNP markers were coded as -1, 0, 1 for the genotypes “aa”, “Aa”, and “AA”, where “aa”, “Aa”, and “AA” were homozygous Zheng58, heterozygous and homozygous PH4CV alleles, respectively. W was computed by subtracting P from M as suggested by VanRaden, where the i^th column of P is 2(p_i − 0.5), M is the genotype matrix, and p_i is the minor allele frequency of locus i [28].
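A minimal pure-Python sketch of a VanRaden-type additive relationship matrix is given here for illustration. It assumes the form K = WWᵀ / (2 Σ_j p_j q_j) with W = M − P, and it takes p_j to be the frequency of the allele coded +1 (an assumption made so that the centering is well defined). It is not the rrBLUP `A.mat` implementation, and the function name and toy genotypes are hypothetical.

```python
def additive_relationship(M):
    """VanRaden-type relationship matrix from genotypes coded -1, 0, 1.

    M: list of rows (one per individual), no missing values.
    """
    n, m = len(M), len(M[0])
    # Frequency of the allele coded +1 at each marker
    p = [(sum(row[j] for row in M) / n + 1.0) / 2.0 for j in range(m)]
    # Center each column by 2(p_j - 0.5)
    W = [[M[i][j] - 2.0 * (p[j] - 0.5) for j in range(m)] for i in range(n)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic")
    return [
        [sum(W[i][k] * W[l][k] for k in range(m)) / scale for l in range(n)]
        for i in range(n)
    ]


# Toy genotypes (hypothetical): 4 individuals x 2 markers
M = [[1, -1], [-1, 1], [0, 0], [0, 0]]
K = additive_relationship(M)
for row in K:
    print(row)
```

For thousands of markers a matrix library would be used instead of nested lists; the loops are kept explicit here so that each step of the formula is visible.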

Considering that flowering time is controlled by a small number of QTL in most biparental populations [2], we selected the top 50 SNPs with the largest -log₁₀(P) values to find the SNPs with the largest effects. The 50 SNPs were fitted in a multiple linear model, from which SS_reg and SS_tol were computed for each SNP. Here, SS_reg is the sum of squares of each selected SNP, and SS_tol is the total sum of squares of the linear model. The phenotypic variance explained (PVE) by each SNP was calculated by dividing SS_reg by SS_tol [29].

The effect of fitting large-effect SNPs as fixed effects on the PAs of GS models

The BLUEs were used to test how many large-effect SNPs should be used as fixed effects. PA, calculated as the correlation coefficient between predicted and observed phenotypic data, was obtained by running 100 five-fold cross validations (CVs). The linear mixed model was as follows [30]:

y = Xβ + Zu + ε

where y is the vector of BLUEs, β is a vector containing the fixed effects, u is the genetic effect treated as a random effect with u ~ N(0, Kσ_u²), and ε is the error term with the distribution ε ~ N(0, Iσ_ε²). σ_u² and σ_ε² are the genetic and error variances, respectively. The additive relationship matrix K was calculated according to a previous report [31]. X and Z are the corresponding design matrices. The above model was fitted using the R package BGLR, and the Gaussian processes (RKHS) model was used for estimating the variances of the random effects. The number of iterations and burn-in were set to 20,000 and 5,000, respectively [10].
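The repeated five-fold cross validation used to obtain PA can be sketched as follows. This is an illustrative Python outline, not the authors' code: the predictor is a hypothetical placeholder (a simple regression on one covariate) standing in for the mixed model, and averaging the per-fold correlations within each CV run is an assumption, since the pooling rule is not stated in the text.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_pa(y, fit_predict, k=5, seed=0):
    """Mean per-fold correlation between predicted and observed values."""
    folds = k_fold_indices(len(y), k, seed)
    fold_pas = []
    for f, test_idx in enumerate(folds):
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        preds = fit_predict(train_idx, test_idx)
        fold_pas.append(pearson(preds, [y[i] for i in test_idx]))
    return sum(fold_pas) / k


# Hypothetical data and placeholder predictor, for illustration only
x = [float(i % 7) + 0.1 * i for i in range(50)]
y = [3.0 * xi + 1.0 for xi in x]


def simple_regression(train_idx, test_idx):
    xt = [x[i] for i in train_idx]
    yt = [y[i] for i in train_idx]
    mx, my = sum(xt) / len(xt), sum(yt) / len(yt)
    slope = sum((a - mx) * (b - my) for a, b in zip(xt, yt)) / sum(
        (a - mx) ** 2 for a in xt
    )
    return [my + slope * (x[i] - mx) for i in test_idx]


# 100 repeated five-fold CVs, each with a different random partition
pas = [cross_validated_pa(y, simple_regression, k=5, seed=s) for s in range(100)]
mean_pa = sum(pas) / len(pas)
print(len(pas), round(mean_pa, 3))
```

Replacing `simple_regression` with a function that fits the mixed model on the training lines and returns predictions for the test lines gives the set of 100 PA values that are compared between models.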

When the top large-effect SNPs were fitted as fixed effects, β included the intercept and the effects of the large-effect SNPs, and the genotypic data of the large-effect SNPs were added as columns of the X matrix. Meanwhile, the top SNPs were removed from the overall marker set when calculating the K matrix [32]. A two-tailed Student’s t-test was used to test whether fitting one more SNP as a fixed effect could increase PA, by comparing the 100 PAs calculated by fitting the top n SNPs with the 100 PAs calculated by fitting the top n−1 SNPs (n ≥ 1).
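How the marker data are split between the fixed-effect design matrix and the marker set used for K can be shown with a small Python sketch. The helper name and the genotype values are hypothetical; the two SNP names are those reported in this study, and `other_snp` is a made-up placeholder.

```python
def split_fixed_and_random(M, marker_names, fixed_names):
    """Build X (intercept + fixed SNP columns) and the remaining markers.

    M: genotype rows coded -1, 0, 1 (one row per family).
    marker_names: names of the columns of M.
    fixed_names: SNPs to fit as fixed effects.
    """
    fixed_cols = [marker_names.index(name) for name in fixed_names]
    fixed_set = set(fixed_cols)
    # X: a column of ones for the intercept, then one column per fixed SNP
    X = [[1.0] + [float(row[j]) for j in fixed_cols] for row in M]
    # Markers left for computing K exclude the SNPs fitted as fixed effects
    M_rest = [
        [row[j] for j in range(len(marker_names)) if j not in fixed_set]
        for row in M
    ]
    return X, M_rest


# Hypothetical genotypes for three families and three markers
names = ["Chr3_159867173", "other_snp", "Chr2_56238969"]
M = [[1, 0, -1], [-1, 1, 1], [0, -1, 1]]
X, M_rest = split_fixed_and_random(M, names, ["Chr3_159867173", "Chr2_56238969"])
print(X)
print(M_rest)
```

Removing the fixed SNPs from the marker set before computing K avoids counting their contribution twice, once through β and once through the random genetic effect.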

The above t-test analysis revealed that fitting the top four large-effect SNPs as fixed effects was optimal. To test the effect of adding the four large-effect SNPs on PA, four randomly selected markers were chosen as fixed effects and PA was calculated correspondingly. This process was repeated 200 times, and the 200 PAs were then compared with the PA calculated using the four large-effect SNPs as fixed effects.

To calculate the PA of MAS using the top four large-effect SNPs, the four SNPs were fitted in a multiple regression model using the lm function in R. The phenotype was estimated using the predict function [33]. The PAs were calculated using 100 CVs. To assess the effect of MAS using the top four SNPs, we also calculated the PAs of MAS using four randomly selected SNPs.

GS using three environment models, with and without large-effect SNPs fitted as fixed effects

(1) Single environment (SE) model

The SE model can be expressed as:

y_i = μ_i + Xβ_i + ε_i

where y_i is a vector of phenotypic data in the i^th environment, μ_i is the overall mean, β_i is a vector of marker effects, X is the genotype matrix, and ε_i is the residual.

(2) A-E model

In this model, the effect of each SNP is assumed to be constant across environments. Supposing that there are n environments [16, 34, 35], the model is:

y_i = μ_i + X_iβ + ε_i

where y_i is the phenotype in the i^th (i = 1, 2, …, n) environment, μ_i is the overall mean in the i^th environment, X_i is the genotype matrix, β is the vector of marker effects shared by all environments, and ε_i is the residual error.

(3) G × E model

In the G × E model, y_i and μ_i are the same as in the A-E model, and the marker effect is decomposed into two parts: a main effect β₀ that is constant across environments and an environment-specific effect β_i. The mixed linear model is:

y_i = μ_i + X_i(β₀ + β_i) + ε_i

The three environment models were analyzed in the R package BGLR [10]. The code for implementing A-E and G × E was revised from a previous report [36].
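The difference between the A-E and G × E models can be made concrete by looking at how the marker matrices are stacked across environments. The Python sketch below shows one common construction, in which the G × E model uses a main-effect block shared by all environments plus one environment-specific block per environment. It is an illustration under that assumption, not the code adapted from [36]; the function names and toy matrices are hypothetical.

```python
def stack_across_env(X_by_env):
    """A-E model: stack environments row-wise; one effect per marker."""
    return [list(row) for X in X_by_env for row in X]


def stack_gxe(X_by_env):
    """G x E model: main-effect block followed by one block per environment.

    For two environments the result has the layout
        [X1  X1  0 ]
        [X2  0   X2]
    so the effect vector is (b0, b1, b2): main plus environment-specific.
    """
    n_env = len(X_by_env)
    p = len(X_by_env[0][0])
    rows = []
    for e, X in enumerate(X_by_env):
        for row in X:
            new_row = list(row)  # main-effect block
            for k in range(n_env):
                # environment-specific block: markers if k == e, zeros otherwise
                new_row.extend(row if k == e else [0] * p)
            rows.append(new_row)
    return rows


# Hypothetical marker matrices: 2 lines in environment 1, 1 line in environment 2
X1 = [[1, 0], [-1, 1]]
X2 = [[0, 1]]
print(stack_across_env([X1, X2]))
print(stack_gxe([X1, X2]))
```

Under this layout the SE model corresponds to analysing each X_i separately, the A-E model forces the same effect vector on every environment, and the G × E model lets the environment-specific blocks absorb deviations from the shared main effect.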

Cross-validation strategies

The variance components were estimated by fitting the full data set to each of the three models (the SE, A-E, and G × E models). The full data were standardized to a mean of zero and a variance of one. In all cases, the number of iterations and burn-in were set to 20,000 and 5,000, respectively.

In the SE analysis, prediction accuracy was calculated using 100 five-fold CVs.

In the multi-environment GS models (the A-E and G × E models), two different CV schemes (CV1 and CV2, S1 Table) were used according to different breeding practices [16, 35, 37]. Briefly, CV1 was designed to predict the performance of newly developed or untested lines that had not been evaluated in any environment. CV2 was designed to predict the performance of lines that were evaluated in some environments but were missing or not evaluated in others. Because one pair of environments was used to perform multi-environment GS each time, and the number of families in the two environments was different, the CV was performed based on the minimum number of families evaluated in the pair of environments.

PA was calculated as the correlation coefficient between the predicted and observed phenotypes for each of the three models.
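The two CV schemes differ only in which (line, environment) records are masked in each fold. The Python sketch below illustrates this under stated assumptions: in CV1 a line is masked in every environment of a fold, and in CV2 the records of a line are assigned to different folds so that the line is always observed in at least one environment. The exact partitioning used in the study is described in S1 Table; the function name and sizes here are hypothetical.

```python
import random


def cv_folds(n_lines, n_env, n_folds=5, scheme="CV1", seed=0):
    """Return a list of folds; each fold is a set of (line, env) cells to mask."""
    rng = random.Random(seed)
    folds = [set() for _ in range(n_folds)]
    if scheme == "CV1":
        # Untested lines: a masked line is missing in all environments
        lines = list(range(n_lines))
        rng.shuffle(lines)
        for pos, line in enumerate(lines):
            for env in range(n_env):
                folds[pos % n_folds].add((line, env))
    elif scheme == "CV2":
        # Incomplete trials: each record of a line goes to a different fold
        if n_env > n_folds:
            raise ValueError("this sketch needs n_env <= n_folds")
        for line in range(n_lines):
            for env, fold in enumerate(rng.sample(range(n_folds), n_env)):
                folds[fold].add((line, env))
    else:
        raise ValueError("scheme must be 'CV1' or 'CV2'")
    return folds


# Hypothetical example: 20 lines evaluated in a pair of environments
cv1 = cv_folds(20, 2, scheme="CV1")
cv2 = cv_folds(20, 2, scheme="CV2")
print(sorted(cv1[0])[:4])
print(sorted(cv2[0])[:4])
```

In CV2 the model can borrow information from the same line in the other environment, which is why CV2 accuracies are typically expected to be higher than CV1 accuracies.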

Results

Phenotypic data analysis

The DA of the 481 BC₁F_3:4 families was evaluated in four environments over two years (16BJ, 17BJ, 16XJ, and 17XJ), and the families flowered earlier in 17BJ (Table 1, Fig 1A and 1B). The correlation coefficients between each pair of environments varied from 0.48 to 0.63, suggesting that DA shared a common genetic basis across all environments (Fig 1A). The heritability estimated across multiple environments and the coefficients of variation indicated that DA was stable (Table 1).

**Fig. 1. Distribution of days to anthesis (DA) and the correlation of DA between each pair of environments.**

**Tab. 1. Basic description of days to anthesis (DA) of the BC₁F_3:4 population.**

Genotypic data analysis

In total, 11,781 polymorphic SNP markers were obtained after filtering. These markers were distributed across the whole genome at a sufficiently high density for GS analysis (Fig 2A). Genotypic analysis of the 481 BC₁F₃ plants revealed that the background of most plants was the homozygous PH4CV genotype, which covered 65.4% of the genome on average. The average coverages of the homozygous Zheng58 and heterozygous genotypes were 16.0% and 18.6%, respectively (Fig 2B; S2 Fig; S2 Table). Zheng58 alleles were present across the whole genome even though Zheng58 was the donor parent (Fig 2B), suggesting that the BC₁F₃ population was segregating across the whole genome.

**Fig. 2. Distribution of 11,781 polymorphic SNPs in the maize genome, and genetic composition of each of the 481 BC₁F₃ plants.**

GWAS and multiple linear regression analysis identified large-effect SNPs

GWAS was used to identify the genetic basis of DA. The QQ plot revealed that the GWAS model was well fitted in the population under study. The Manhattan plot revealed that the highest peak was on chromosome 2, followed by chromosome 9 (Fig 3A and 3B). To identify the loci with large effects, the top 50 SNPs with the largest -log₁₀(P) values were selected and fitted using a multiple linear regression model, and the PVE of each SNP was then calculated. Chr3_159867173, an SNP on chromosome 3, had the largest PVE of 11.88%, followed by Chr2_56238969, Chr9_154782803 and Chr3_23119818, explaining 7.52%, 4.81% and 4.59% of the total phenotypic variance, respectively (Fig 3C).

**Fig. 3. Identification of large-effect SNPs and comparison of the PAs of various models.**

The GS model fitting the top four large-effect SNPs as fixed effects outperformed the other models

The BLUEs were used to determine how many large-effect SNPs should be fitted as fixed effects. The PAs of the GS models increased with the number of top SNPs fitted as fixed effects. Student’s t-test revealed that the increase in PA was not significant when the number of large-effect SNPs increased from four to five (Fig 3D). Therefore, fitting the top four SNPs with the largest effects was optimal. To further demonstrate that the PA increase did not happen by chance, four randomly selected SNPs were fitted as fixed effects using the GBLUP model. The PA calculated by fitting the four large-effect SNPs as fixed effects was larger than each of the 200 PAs of GS models fitting four randomly selected SNPs as fixed effects (Fig 3E). Comparing the PA of the GS models with the PA of MAS, we found that the PA of MAS using the top four SNPs was lower than the PA of the GS models. The PA of MAS using the top four SNPs was higher than that of MAS using four randomly selected SNPs (Fig 3F). The above analysis indicated that the four large-effect SNPs likely represent real QTL and could be fitted as fixed effects in the following analysis.

Fitting the four large-effect SNPs as fixed effects generally decreased genetic variances and increased PA

To investigate the effects of fitting the four large-effect SNPs as fixed effects, the variance components were dissected using the full data. For the SE model, the genetic variances of the GS models decreased when the four large-effect SNPs were fitted as fixed effects (Table 2; S3 Fig). For the A-E and G × E models, the most evident differences were the decreases in the genetic variances when the four large-effect SNPs were fitted as fixed effects. For each of the two G × E interaction variances (σ_u1² and σ_u2² in Table 2), no consistent differences were found when the four SNPs were fitted as fixed effects (Table 2). The analysis demonstrated that fitting the four large-effect SNPs as fixed effects generally decreased the genetic variances, and that the four loci had constant effects across the four environments.

**Tab. 2. The genetic and error variances of the SE, A-E and G × E models and G × E interaction variances of the G × E model.**

The PAs of the three models (the SE, A-E, and G × E models) were calculated to demonstrate the effect of fitting the four large-effect SNPs as fixed effects. For the SE model, the PA increased in each environment when the four large-effect SNPs were fitted as fixed effects (S3 Fig). Fitting the top four SNPs identified in each environment as fixed effects also increased PA (S4 Fig). For the multi-environment GS models (the A-E and G × E models), two cross-validation (CV) schemes, named CV1 and CV2, were used (S1 Table). For the A-E model, fixing the four SNPs generally resulted in higher PAs for the CV1 and CV2 schemes (excluding one case in CV1 and three cases in CV2, Table 3). The results were similar for the G × E model. Overall, the results suggested that fitting the four large-effect SNPs as fixed effects was advisable for each of the three models.

**Tab. 3. Fitting the four large-effect SNPs as fixed effects and a G × E component generally enhanced genomic prediction.**

The G × E models with the four large-effect SNPs fitted as fixed effects generally had better performance

When comparing the two multi-environment models (the A-E and G × E models) without fitting any SNP as a fixed effect, the G × E models performed better than the A-E models in ten of the twelve cases for the CV1 scheme, and in eight of the twelve cases for the CV2 scheme (Table 3). When comparing the two multi-environment models with the four large-effect SNPs fitted as fixed effects, the G × E models outperformed the A-E models in nine of the twelve cases for the CV1 scheme and in eight of the twelve cases for the CV2 scheme (Table 3). These results indicate that the PAs of the G × E models were generally higher than those of the A-E models.

Because both fitting large-effect SNPs as fixed effects and modeling G × E interaction could increase PA, we expected that the best prediction would be achieved by models including both factors. A row-by-row inspection of Table 3 showed that the G × E models fitting the four large-effect SNPs as fixed effects had the highest PAs in ten of the twelve cases for the CV1 scheme and in eight of the twelve cases for the CV2 scheme. Therefore, including the two factors in the GS models should be a powerful strategy for enhancing GS efficiency.

Discussion

With the rapid development of genome sequencing technology and the continual decrease in genotyping cost, efficient selection is becoming increasingly important for any commercial breeding programme. GS can increase breeding efficiency by making predictions at the seedling stage, as soon as DNA of the prediction population is available, thus helping breeders to exclude undesired genotypes. GS can also increase breeding efficiency by making the best prediction and selecting the desired plants at the decision-making stage of a GS breeding programme. This study was designed to examine how to make the best use of available genotypic and phenotypic data by adding components to the GS models. The analysis suggested that, compared with the use of basic GS models, exploiting existing data through statistical approaches enhanced genomic prediction at no additional cost.

The finding that using known genetic loci as fixed effects could increase PA highlights the importance of obtaining and assessing these data [14, 38]. There are two general approaches to obtaining these data: summarizing the chromosome positions of QTL and genes from published articles, and performing QTL analysis using an established training population with both genotypic and phenotypic data. It should be noted that collected historical QTL and gene information might not be useful if it is not validated in the breeding population. However, even when QTL and gene information are validated in the training population, this information should be carefully examined. The QTL detected for a specific trait can be influenced by heritability and genetic architecture: the target trait might be controlled by one or two major QTL, or by many minor-effect QTL. Simulation data showed that selection efficiency increased with heritability for a given genetic architecture where only one locus with a major effect was fitted as a fixed effect, and that the increase in prediction accuracy was negligible when the effect of the locus fitted as a fixed effect was 5% [13]. However, in this study, we showed using real data that the increase in prediction accuracy was significant even though the effects of three loci were less than 10% (Fig 3D), which might be related to the relatively high heritability of DA in this study.

Our analysis showed that fitting large-effect SNPs as fixed effects enhances GS. The prerequisite is that each of the large-effect SNPs should be in linkage disequilibrium with a real QTL. The four SNPs were in the chromosome regions of maize bins 3.05, 2.04, 9.07 and 3.04 according to ISU Integrated IBM 2009 (https://www.maizegdb.org/data_center/map). These regions contained consensus QTL according to QTL meta-analysis [2, 39], suggesting that the four SNPs detected in this study likely represent real QTL.

Marker effects estimated using mixed models might not reflect the real genetic effects, because the genetic variance of each SNP is assumed to follow some prior distribution, and the modeling of this prior distribution might affect the estimation of marker effects, especially for large-effect SNPs [10, 13, 40]. When markers with large effects are fitted as fixed effects, only the effects of the remaining SNP markers are estimated as random effects, so strong shrinkage of the large-effect SNPs is avoided when solving GS models [5, 10]. Thus, the GEBVs can be estimated more accurately when major genes are fitted as fixed effects.
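The shrinkage argument can be illustrated with a simplified worked equation. The sketch assumes an rrBLUP-type model with a single common marker variance and mutually orthogonal marker columns z_j; the symbols σ_m² (per-marker variance) and λ are introduced here for illustration and do not appear in the models fitted in this study.

```latex
% Assumptions (illustrative): common marker variance \sigma_m^2,
% orthogonal marker columns z_j, residual variance \sigma_\varepsilon^2.
\hat{\beta}_j^{\mathrm{fixed}} = \frac{z_j^{\top} y}{z_j^{\top} z_j},
\qquad
\hat{u}_j^{\mathrm{random}} = \frac{z_j^{\top} y}{z_j^{\top} z_j + \lambda},
\qquad
\lambda = \frac{\sigma_\varepsilon^{2}}{\sigma_m^{2}} > 0
\quad\Longrightarrow\quad
\frac{\hat{u}_j^{\mathrm{random}}}{\hat{\beta}_j^{\mathrm{fixed}}}
  = \frac{z_j^{\top} z_j}{z_j^{\top} z_j + \lambda} < 1 .
```

Because λ is shared by all markers and is driven mostly by the many small-effect markers, a large-effect SNP treated as random is pulled toward zero by the same proportion as every other SNP, whereas fitting it as a fixed effect leaves its estimate unshrunk.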

In maize breeding programmes, a frequently used strategy is to select lines from the advanced generations formed by crossing two elite inbred lines. However, GS studies modeling large-effect SNPs as fixed effects and G × E interaction effects using this kind of breeding population are relatively few. Therefore, it is necessary to examine how the PAs of GS models change when the two factors are included in GS models using a maize biparental population. Our study was conducted to address this question, and we showed that fitting large-effect SNPs as fixed effects in the GS models increased PA in a maize biparental population, even though the effects of some SNPs were less than 5%. Furthermore, GS models fitting large-effect SNPs as fixed effects together with G × E effects generally had the best performance. Our results should be useful for molecular crop breeding.

Conclusion

GWAS and multiple linear regression analysis were successfully applied to identify large-effect SNPs. Using the BLUEs, it was demonstrated that fitting the four large-effect SNPs as fixed effects increased PA and decreased genetic variance. We further demonstrated that combining G × E interaction with fitting large-effect SNPs as fixed effects could generally increase PA.

Supporting information

S1 File [zip]
Phenotypic data.

S2 File [zip]
Genotypic data.

S1 Fig [tif]
Flowchart for the construction of the 481 BC₁F_3:4 families used in this study.

S2 Fig [tif]
Frequency distribution of the three genotypes in the 481 BC₁F₃ plants.

S3 Fig [a]
Fitting the four large-effect SNPs as fixed effects could decrease genetic variance and increase PA for each environment.

S4 Fig [tif]
Fitting the four SNPs identified in each environment as fixed effects could increase PA.

S1 Table [docx]
The two cross-validation schemes adopted to test the PA of the A-E and G × E GS models.

S2 Table [docx]
The proportion of each of the three genotypes in the 481 BC₁F₃ plants.

References

1. Zhang HW, Uddin MS, Zou C, Xie CX, Xu YB, Li WX. Meta-analysis and candidate gene mining of low-phosphorus tolerance in maize. J Integr Plant Biol. 2014;56(3):262–70. doi: 10.1111/jipb.12168 24433531

2. Xu J, Liu Y, Liu J, Cao M, Wang J, Lan H, et al. The genetic architecture of flowering time and photoperiod sensitivity in maize as revealed by QTL review and meta analysis. J Integr Plant Biol. 2012;54(6):358–373. doi: 10.1111/j.1744-7909.2012.01128.x 22583799

3. Hospital F. Challenges for effective marker-assisted selection in plants. Genetica. 2009;136(2):303–310. doi: 10.1007/s10709-008-9307-1 18695989

4. Bernardo R, Yu J. Prospects for genomewide selection for quantitative traits in maize. Crop Sci. 2007;47(3):1082–1090.

5. Endelman JB. Ridge regression and other kernels for genomic selection with R package rrBLUP. Plant Genome. 2011;4(3):250–255.

6. Zhong SQ, Dekkers JCM, Fernando RL, Jannink JL. Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics. 2009;182(1):355–364. doi: 10.1534/genetics.108.098277 19299342

7. Zhang A, Wang H, Beyene Y, Semagn K, Liu Y, Cao S, et al. Effect of trait heritability, training population size and marker density on genomic prediction accuracy estimation in 22 bi-parental tropical maize populations. Front Plant Sci. 2017;8:1916. doi: 10.3389/fpls.2017.01916 29167677

8. Lenz PRN, Beaulieu J, Mansfield SD, Clement S, Desponts M, Bousquet J. Factors affecting the accuracy of genomic selection for growth and wood quality traits in an advanced-breeding population of black spruce (Picea mariana). BMC Genomics. 2017;18(1):335. doi: 10.1186/s12864-017-3715-5 28454519

9. Habier D, Fernando RL, Kizilkaya K, Garrick DJ. Extension of the bayesian alphabet for genomic selection. BMC Bioinformatics. 2011;12:186. doi: 10.1186/1471-2105-12-186 21605355

10. Perez P, de los Campos G. Genome-wide regression and prediction with the BGLR statistical package. Genetics. 2014;198(2):483–495. doi: 10.1534/genetics.114.164442 25009151

11. Riedelsheimer C, Technow F, Melchinger AE. Comparison of whole-genome prediction models for traits with contrasting genetic architecture in a diversity panel of maize inbred lines. BMC Genomics. 2012;13(1):452.

12. Heslot N, Yang HP, Sorrells ME, Jannink JL. Genomic selection in plant breeding: A comparison of models. Crop Sci. 2012;52(1):146–160.

13. Bernardo R. Genomewide selection when major genes are known. Crop Sci. 2014;54(1):68–75.

14. Rutkoski JE, Poland JA, Singh RP, Huerta-Espino J, Bhavani S, Barbier H, et al. Genomic selection for quantitative adult plant stem rust resistance in wheat. Plant Genome. 2014; doi: 10.3835/plantgenome2014.02.0006

15. Hallauer AR, Carena MJ, Miranda Filho JB. Quantitative genetics in maize breeding. Springer Science & Business Media; 2010.

16. Lopez-Cruz M, Crossa J, Bonnett D, Dreisigacker S, Poland J, Jannink JL, et al. Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3. 2015;5(4):569–582. doi: 10.1534/g3.114.016097 25660166

17. Yao C, de los Campos G, Vandehaar MJ, Spurlock DM, Armentano LE, Coffey M, et al. Use of genotype × environment interaction model to accommodate genetic heterogeneity for residual feed intake, dry matter intake, net energy in milk, and metabolic body weight in dairy cattle. J Dairy Sci. 2017;100(3):2007–2016. doi: 10.3168/jds.2016-11606 28109605

18. Wang CL, Chen YH, Ku LX, Wang TG, Sun ZH, Cheng FF, et al. Mapping QTL associated with photoperiod sensitivity and assessing the importance of QTL × environment interaction for flowering time in maize. PLoS ONE. 2010;5(11):e14068. doi: 10.1371/journal.pone.0014068 21124912

19. Jung C, Muller AE. Flowering time control and applications in plant breeding. Trends Plant Sci. 2009;14(10):563–573. doi: 10.1016/j.tplants.2009.07.005 19716745

20. Nakagawa H, Yamagishi J, Miyamoto N, Motoyama M, Yano M, Nemoto K. Flowering response of rice to photoperiod and temperature: a QTL analysis using a phenological model. Theor Appl Genet. 2005;110(4):778–786. doi: 10.1007/s00122-004-1905-4 15723276

21. Ma J, Zhang DF, Cao YY, Wang LF, Li JJ, Lubberstedt T, et al. Heterosis-related genes under different planting densities in maize. J Exp Bot. 2018;69(21):5077–5087. doi: 10.1093/jxb/ery282 30085089

22. Song W, Shi Z, Xing JF, Duan MX, Su AG, Li CH, et al. Molecular mapping of quantitative trait loci for grain moisture at harvest in maize. Plant Breeding. 2017;136(1):28–32.

23. Bates D, Machler M, Bolker BM, Walker SC. Fitting linear mixed-effects models using lme4. J Stat Softw. 2015;67(1):1–48.

24. Han S, Miedaner T, Utz HF, Schipprack W, Schrag TA, Melchinger AE. Genomic prediction and GWAS of Gibberella ear rot resistance traits in dent and flint lines of a public maize breeding program. Euphytica. 2018;214(1):6.

25. Senior ML, Heun M. Mapping maize microsatellites and polymerase chain reaction confirmation of the targeted repeats using a CT primer. Genome. 1993;36(5):884–889. doi: 10.1139/g93-116 7903654

26. Xu C, Ren Y, Jian Y, Guo Z, Zhang Y, Xie C, et al. Development of a maize 55 K SNP array with improved genome coverage for molecular breeding. Mol Breeding. 2017;37(3):20.

27. Covarrubias-Pazaran G. Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE. 2016;11(6):e0156744. doi: 10.1371/journal.pone.0156744 27271781

28. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–4423. doi: 10.3168/jds.2007-0980 18946147

29. Zhao K, Tung CW, Eizenga GC, Wright MH, Ali ML, Price AH, et al. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun. 2011;2(1):467.

30. Su G, Christensen OF, Ostersen T, Henryon M, Lund MS. Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS ONE. 2012;7(9):e45293. doi: 10.1371/journal.pone.0045293 23028912

31. Jiang Y, Schmidt RH, Zhao YS, Reif JC. A quantitative genetic framework highlights the role of epistatic effects for grain-yield heterosis in bread wheat. Nat Genet. 2017;49(12):1741–1746. doi: 10.1038/ng.3974 29038596

32. Spindel JE, Begum H, Akdemir D, Collard B, Redoña E, Jannink JL, et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity. 2016;116(4):395–408. doi: 10.1038/hdy.2015.113 26860200

33. Hadasch S, Simko I, Hayes RJ, Ogutu JO, Piepho HP. Comparing the predictive abilities of phenotypic and marker-assisted selection methods in a biparental lettuce population. Plant Genome. 2016;9(1). doi: 10.3835/plantgenome2015.03.0014 27898769

34. Sousa MBE, Cuevas J, Couto EGDO, Pérez-Rodríguez P, Jarquín D, Fritsche-Neto R, et al. Genomic-enabled prediction in maize using kernel models with genotype × environment interaction. G3. 2017;7(6):1995–2014. doi: 10.1534/g3.117.042341 28455415

35. Crossa J, de los Campos G, Maccaferri M, Tuberosa R, Burgueño J, Pérez-Rodríguez P. Extending the marker × environment interaction model for genomic-enabled prediction and genome-wide association analysis in durum wheat. Crop Sci. 2016;56:2193–2209.

36. Lopez-Cruz M, Crossa J, Bonnett D, Dreisigacker S, Poland J, Jannink JL, et al. Increased prediction accuracy in wheat breeding trials using a marker × environment interaction genomic selection model. G3. 2015;5(4):569–582. doi: 10.1534/g3.114.016097 25660166

37. Burgueño J, de los Campos G, Weigel K, Crossa J. Genomic prediction of breeding values when modeling genotype × environment interaction using pedigree and dense molecular markers. Crop Sci. 2012;52(2):707–719.

38. Spindel JE, Begum H, Akdemir D, Collard B, Redoña E, Jannink JL, et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity. 2016;116(4):395–408. doi: 10.1038/hdy.2015.113 26860200

39. Salvi S, Castelletti S, Tuberosa R. An updated consensus map for flowering time QTLs in maize. Maydica. 2009;54(4):501–512.

40. de los Campos G, Naya H, Gianola D, Crossa J, Legarra A, Manfredi E, et al. Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics. 2009;182(1):375–385. doi: 10.1534/genetics.109.101501 19293140