Statistical methods for analysis of multienvironment trials in plant breeding

Buntaran, Harimurti

Doctoral Thesis

2021

Statistical methods for analysis of multienvironment trials in plant breeding: accuracy and precision


Dissertation_Harimurti_Buntaran.pdf (1.68 MB)

Abstract (English)

Multienvironment trials (MET) are carried out every year under different environmental conditions to evaluate a large number of cultivars, e.g., for yield, because cultivars perform differently across environments, a phenomenon known as genotype×environment interaction. MET aim to provide accurate information on cultivar performance so that growers can be given a recommendation on which cultivar performs best under their field conditions. MET data are often analysed with mixed models, which allow the cultivar effect to be random. A random cultivar effect makes it possible to exploit genetic correlation across zones and to account for heterogeneity among trials. A zone can be viewed as a larger target population of environments. Both the accuracy and the precision of cultivar predictions need to be evaluated. Prediction accuracy can be evaluated in a cross-validation (CV) study, and the model with the lowest mean squared error of prediction (MSEP) can be selected. In addition, because trial locations rarely coincide with growers’ fields, the precision of predictions needs to be evaluated via the standard errors of predictions of cultivar values (SEPV) and the standard errors of predictions of pairwise differences of cultivar values (SEPD). The central objective of this thesis is to assess model performance and conduct model selection via a CV study for zone-based cultivar predictions. Chapter 2 compared the performance of empirical best linear unbiased estimation (EBLUE) and empirical best linear unbiased prediction (EBLUP) for zone-based prediction. Different CV schemes were used for the single-year and multi-year datasets to mimic practice. A complex covariance structure, the factor-analytic (FA) structure, was imposed to account for heterogeneity of the cultivar×zone (CZ) effect. The MSEP showed that the EBLUP models outperformed the EBLUE models.
Zonation was necessary because it improved accuracy and is preferable for making cultivar recommendations. The FA structure did not improve accuracy compared to the simpler covariance structure, so the EBLUP model with a simple covariance structure is sufficient for the single-year and multi-year datasets. Chapter 3 assessed the single-stage and stagewise analyses. Three weighting methods were compared in the stagewise analysis, namely two diagonal approximation methods and the fully efficient method, together with the unweighted analysis. The assessment was based on the MSEP rather than on Pearson’s and Spearman’s correlation coefficients, since the correlation coefficients are often very close between the compared models. The MSEP showed that the single-stage EBLUP and the stagewise weighted EBLUP strategies performed very similarly. Thus, the loss of information due to the diagonal approximation is minor. In fact, the MSEP distinguished the single-stage and stagewise weighted analyses from the unweighted EBLUE more clearly than the correlation coefficients did. The simple compound-symmetric covariance structure was sufficient for the CZ effect compared with the more complex structures. The choice between the single-stage and the stagewise weighted analysis thus depends on the computational resources and the practicality of data handling. Chapter 4 assessed the accuracy and precision of predictions for new locations. Environmental covariates were combined with EBLUP in random coefficient (RC) models, since the covariates provide more information for new locations. The RC models did not have the smallest MSEP, but they had the lowest SEPV and SEPD. Thus, model selection can be based on joint consideration of the MSEP, SEPV, and SEPD. The models with EBLUE and covariate interaction effects performed poorly in terms of the MSEP.
The EBLUP models without RC performed best in terms of the MSEP, but their SEPV and SEPD were large and were therefore considered unreliable. The scaling and selection of covariates are essential to obtain a positive definite covariance matrix. Employing an unstructured covariance in the RC models is crucial to maintaining their invariance property. The RC framework is suitable for use with GIS data to provide accurate and precise projections of cultivar performance for new locations or environments. To conclude, the EBLUP model for zone-based predictions should be preferred in order to obtain predictions and rankings closer to the true values and rankings. The stagewise weighted analysis can be recommended for its practicality and computational efficiency. Furthermore, cultivar performance should be projected to new locations to provide more targeted information for growers. Within the RC model framework, the available environmental covariates can be used to improve the accuracy and precision of predictions for new locations. Such information is certainly more valuable for growers and breeders than means across a whole target population of environments.
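
The cross-validation idea described in the abstract (hold out part of the trials, predict them, and prefer the model with the lowest MSEP) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the data are simulated, the function names are hypothetical, plain cultivar means stand in for fixed-effect estimates, and a fixed shrinkage weight stands in for the shrinkage that a fitted mixed model would derive from estimated variance components.

```python
import random


def msep(predicted, observed):
    """Mean squared error of prediction over paired values."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def cultivar_means(rows):
    """Plain per-cultivar means of (cultivar, trial, yield) rows."""
    totals = {}
    for cultivar, _trial, y in rows:
        s, n = totals.get(cultivar, (0.0, 0))
        totals[cultivar] = (s + y, n + 1)
    return {c: s / n for c, (s, n) in totals.items()}


def shrunken_means(rows, weight=0.5):
    """Cultivar means pulled toward the overall mean.

    `weight` is a made-up constant (assumption); a mixed model would
    obtain the amount of shrinkage from estimated variance components.
    """
    overall = sum(y for _c, _t, y in rows) / len(rows)
    return {c: overall + weight * (m - overall)
            for c, m in cultivar_means(rows).items()}


def cv_msep(rows, fit):
    """Leave-one-trial-out CV: fit on the other trials, predict the held-out one."""
    trials = sorted({t for _c, t, _y in rows})
    fold_errors = []
    for held_out in trials:
        train = [r for r in rows if r[1] != held_out]
        test = [r for r in rows if r[1] == held_out]
        pred = fit(train)
        fold_errors.append(
            msep([pred[c] for c, _t, _y in test], [y for _c, _t, y in test])
        )
    return sum(fold_errors) / len(fold_errors)


# Simulated balanced data: 10 cultivars in 6 trials (all values hypothetical).
rng = random.Random(1)
cultivars = [f"C{i}" for i in range(10)]
effects = {c: rng.gauss(0, 1) for c in cultivars}
rows = [(c, t, 80 + effects[c] + rng.gauss(0, 3))
        for t in range(6) for c in cultivars]

scores = {
    "plain means": cv_msep(rows, cultivar_means),
    "shrunken means": cv_msep(rows, shrunken_means),
}
best = min(scores, key=scores.get)  # model selection by lowest MSEP
print(scores, "->", best)
```

The sketch only shows the mechanics of comparing candidate predictors by cross-validated MSEP; it does not compute SEPV or SEPD, which require the prediction error variances from a fitted mixed model.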

Abstract (German, translated)

Multienvironment trials (MET) are conducted under different environmental conditions to evaluate a large number of cultivars, e.g., for yield, because cultivars perform differently under different environmental conditions, which is known as genotype×environment interaction. The aim of MET is to provide accurate information on cultivar performance so that growers can be given a recommendation. MET data are often analysed using mixed models in which the cultivar effect is random. The random cultivar effect makes it possible to exploit the genetic correlation between zones and to account for the heterogeneity of the trials. A zone can be viewed as a larger target population of environments. The accuracy and precision of the cultivar predictions need to be evaluated. Prediction accuracy can be evaluated in a cross-validation (CV) study, and model selection can be based on the lowest mean squared error of prediction (MSEP). Since the trial locations rarely coincide with growers’ fields, the precision of the predictions also needs to be evaluated via the standard errors of predictions of cultivar values (SEPV) and the standard errors of predictions of pairwise differences of cultivar values (SEPD). The main objective of this thesis is to assess model performance and conduct model selection via a CV study for zone-based cultivar predictions. Chapter 2 assessed the performance of empirical best linear unbiased estimation (EBLUE) and empirical best linear unbiased prediction (EBLUP) for zone-based prediction. Different CV schemes were applied to the single-year and multi-year datasets to mimic practice.
A complex covariance structure, the factor-analytic (FA) structure, was introduced to account for the heterogeneity of the cultivar×zone (CZ) effect. The MSEP showed that the EBLUP models outperformed the EBLUE models. Zonation was necessary because it improved accuracy and was preferable for making cultivar recommendations. The FA structure did not improve accuracy compared to the simpler covariance structure. The EBLUP model with a simple covariance structure is therefore sufficient. Chapter 3 assessed the single-stage and stagewise analyses. Three weighting methods were compared in the stagewise analysis. The assessment was based on the MSEP rather than on Pearson’s and Spearman’s correlation coefficients, since the correlation coefficients are often very close between the compared models. The MSEP showed that the single-stage EBLUP and the stagewise weighted EBLUP strategies were very similar. The loss of information due to the diagonal approximation is therefore minor. The MSEP distinguished the single-stage and stagewise weighted analyses from the unweighted EBLUE more clearly than the correlation coefficients did. The simple compound-symmetric covariance structure was sufficient for the CZ effect compared with the more complex structures. The choice between the single-stage and the stagewise weighted analysis thus depends on the computational resources and the practicality of data handling. Chapter 4 assessed the accuracy and precision of predictions for new locations. Environmental covariates were combined with EBLUP in random coefficient (RC) models, since the covariates provide more information for new locations. The RC models did not have the smallest MSEP, but they had the lowest SEPV and SEPD.
Model selection can therefore be based on joint consideration of the MSEP, SEPV, and SEPD. The models with EBLUE and covariate interaction effects performed poorly in terms of the MSEP. The EBLUP models without RC performed best, but their SEPV and SEPD were large and were considered unreliable. The scaling and selection of covariates are essential to obtain a positive definite covariance matrix. Using an unstructured covariance in the RC models is crucial to maintaining their invariance. The RC framework is suitable for use with GIS data to obtain accurate and precise projections of cultivar performance for new locations or environments. In summary, the analysis of MET can be improved by using EBLUP models and by including environmental covariates in the models.

Publication license

Copyright

Faculty

Faculty of Agricultural Sciences

Institute

Institute of Crop Science

Examination date

2021-07-26

Supervisor

Piepho, Hans-Peter

Cite this publication

Buntaran, H. (2021). Statistical methods for analysis of multienvironment trials in plant breeding : accuracy and precision. https://hohpublica.uni-hohenheim.de/handle/123456789/6637

Identification

https://hohpublica.uni-hohenheim.de/handle/123456789/6637

Language

English

Classification (DDC)

630 Agriculture

Collections

Institut für Kulturpflanzenwissenschaften

Free keywords

Plant breeding; Biostatistics; Mixed models; Genotype-environment interaction; Cross-validation; Gemischte Modelle; Genotyp-Umwelt-Interaktion; Kreuzvalidierung

Standardized keywords (GND)

Pflanzenzüchtung; Biostatistik; Landwirtschaft

BibTeX

@phdthesis{Buntaran2021,
url = {https://hohpublica.uni-hohenheim.de/handle/123456789/6637},
author = {Buntaran, Harimurti},
title = {Statistical methods for analysis of multienvironment trials in plant breeding : accuracy and precision},
year = {2021},
school = {Universität Hohenheim},
}
