Genomic selection in synthetic populations

Müller, Dominik

Doctoral Thesis

2017

PhD_Thesis_Dominik_Mueller_Final.pdf (561.91 KB)

Abstract (English)

The foundation of genomic selection was laid at the beginning of this century, and it has since developed into a very active field of research. Although it was originally developed in dairy cattle breeding, it rapidly attracted the attention of the plant breeding community and has by now (2017) become an integral component of the breeding armamentarium of international companies. Despite its practical success, numerous open questions remain that are highly important to plant breeders.

The recent development of large-scale, cost-efficient genotyping platforms was the prerequisite for the rise of genomic selection. Its functional principle is based on information shared between individuals. Genetic similarities between individuals are assessed with genomic fingerprints. These similarities provide information beyond mere family relationships and allow phenotypic information to be pooled across individuals. In practice, a training set of phenotyped individuals is first established and used to calibrate a statistical model. The model is then used to predict the genomic values of individuals lacking phenotypic information. Using these predictions can save time by accelerating the breeding program and reduce cost by lowering the resources spent on phenotyping. A large body of literature has investigated the accuracy of genomic selection for unphenotyped individuals. However, in plant breeding the training individuals are often selection candidates themselves, and there is no conceptual obstacle to applying genomic selection to them, making use of information obtained via marker-based similarities. It is therefore also highly important to assess prediction accuracy in the training set and the possibilities for improving it. Our results demonstrate that accuracy in the training set can be increased by shrinkage estimation of marker-based relationships, which reduces the associated noise. The success of this approach depends on the marker density and the population structure. The potential is largest for broad-based populations and under a low marker density.

Synthetic populations are produced by intermating a small number of parental components. They have played an important role in the history of plant breeding, both for improving germplasm pools through recurrent selection and as actual varieties, as well as in research on quantitative genetics. The properties of genomic selection have so far not been assessed in synthetics. Moreover, synthetics are an ideal population type for assessing the relative importance of the three factors by which markers provide information about the state of alleles at quantitative trait loci (QTL), namely (i) pedigree relationships, (ii) co-segregation and (iii) linkage disequilibrium (LD) in the source germplasm. Our results show that the number of parents is a crucial factor for prediction accuracy. For a very small number of parents, prediction accuracy in a single cycle is highest and is mainly determined by co-segregation between markers and QTL. For a larger number of parents, prediction accuracy is reduced and the main source of information is LD within the source germplasm of the parents. Across multiple selection cycles, information from pedigree relationships rapidly vanishes, while co-segregation and ancestral LD remain stable sources of information. Long-term genetic gain from genomic selection in synthetics is relatively unaffected by the number of parents, because information from co-segregation and from ancestral LD compensate for each other. Altogether, our results contribute to a better understanding of the factors underlying genomic selection, the cases in which it works, and the information that contributes to prediction accuracy.
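The workflow described in the abstract (build marker-based relationships, shrink them to reduce noise, calibrate a model on a training set, predict genomic values) can be illustrated with a minimal sketch. It is not the implementation from the thesis. Everything in it is an assumption made for illustration: the simulated data, the VanRaden-style relationship matrix, the scaled-identity shrinkage target, the fixed shrinkage intensity `delta`, the assumed heritability `h2`, and the function names.

```python
import numpy as np


def genomic_relationship(markers):
    """Marker-based relationship matrix from an (n x m) matrix of 0/1/2 scores."""
    p = markers.mean(axis=0) / 2.0           # estimated allele frequencies
    centered = markers - 2.0 * p             # center each marker column
    scale = 2.0 * np.sum(p * (1.0 - p))      # expected marker variance, summed
    return centered @ centered.T / scale


def shrink_relationship(G, delta):
    """Linear shrinkage of G toward a scaled identity (illustrative target)."""
    n = G.shape[0]
    target = np.eye(n) * np.trace(G) / n     # keeps the average diagonal of G
    return (1.0 - delta) * G + delta * target


def gblup_predict(G, y, train, h2):
    """Predict genomic values of all individuals from training phenotypes."""
    lam = (1.0 - h2) / h2                    # ratio of error to genetic variance
    mu = y[train].mean()
    G_tt = G[np.ix_(train, train)]
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train)), y[train] - mu)
    return mu + G[:, train] @ alpha


# Simulated example data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
n, m, n_train, h2 = 60, 200, 40, 0.5
freqs = rng.uniform(0.1, 0.9, size=m)
markers = rng.binomial(2, freqs, size=(n, m)).astype(float)
g_true = markers @ rng.normal(size=m)        # true genomic values
noise_sd = np.sqrt(g_true.var() * (1.0 - h2) / h2)
y = g_true + rng.normal(scale=noise_sd, size=n)
train = np.arange(n_train)                   # first 40 individuals are phenotyped

G = genomic_relationship(markers)
G_shrunk = shrink_relationship(G, delta=0.2)
pred = gblup_predict(G_shrunk, y, train, h2)

# Accuracy = correlation between predicted and true genomic values.
acc_train = np.corrcoef(pred[train], g_true[train])[0, 1]
acc_new = np.corrcoef(pred[n_train:], g_true[n_train:])[0, 1]
print(f"accuracy in training set: {acc_train:.2f}, unphenotyped: {acc_new:.2f}")
```

Predictions are returned for all individuals, so accuracy can be evaluated both within the training set and for unphenotyped individuals, the two cases the abstract distinguishes. In practice the shrinkage intensity would be estimated from the data rather than fixed.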

Abstract (German, translated)

The recent development of large, cost-efficient genotyping platforms is a basic prerequisite for the success of genomic selection. Its functional principle rests on exploiting information shared between individuals. Existing genetic similarities are captured by the genomic fingerprint. These similarities provide information that goes beyond pure family relationships and allow phenotypic data to be used across individuals. In practice, a calibration set of phenotyped individuals must first be established, which serves to estimate a statistical model. This model is subsequently used to predict the genomic value of individuals without phenotypic data. Using these predictions can save time by accelerating the breeding program, and can also lower costs by reducing the resources devoted to phenotyping. The prediction accuracy of genomic selection among non-phenotyped individuals has already been the subject of numerous studies. In plant breeding, however, the training individuals used to calibrate the model are often potential selection candidates as well, and there is no obstacle in principle to applying genomic selection to them too and exploiting the information from marker-based similarities. It is therefore important to examine the prediction accuracy in the training set and the possibilities for improving it. Our results show that it is in principle possible to reduce the noise in marker-based relationships through shrinkage estimation and thereby increase accuracy in the training set. Success depends on the marker density and the population structure. The potential is greatest for broad populations at a low marker density.
Synthetic populations are produced by crossing a small number of parental components and have played an important role in the history of plant breeding. This applies to the improvement of breeding material through recurrent selection, to the development of varieties, and to quantitative-genetic breeding research. The properties of genomic selection have not previously been studied in synthetics. Moreover, synthetics are an ideal population type for studying the importance of the three factors through which markers provide information about the state at QTL, namely (i) pedigree relationships, (ii) co-segregation and (iii) linkage disequilibrium (LD) in the breeding material. Our results show that the number of parents is a decisive factor for prediction accuracy. With a very small number of parents, prediction accuracy within a cycle is highest and is mainly determined by co-segregation between markers and QTL. With a large number of parents, by contrast, LD in the source material of the parents emerges as the principal source of information. When genomic selection is practiced over several cycles, the information from pedigree relationships disappears very quickly, whereas co-segregation and LD prove to be stable sources of information. The long-term selection gain from genomic selection in a synthetic depends only slightly on the number of parents, because information from co-segregation and from LD offset each other. Overall, our results make an important contribution to a better understanding of the foundations of genomic selection, the cases in which it promises success, and the information that influences prediction accuracy.

Publication license

CC BY 3.0

Faculty

Faculty of Agricultural Sciences

Institute

Institute of Plant Breeding, Seed Science and Population Genetics

Examination date

2018-03-25

Supervisor

Melchinger, Albrecht E.

Cite this publication

Müller, D. (2017). Genomic selection in synthetic populations. https://hohpublica.uni-hohenheim.de/handle/123456789/6341

Identification

https://hohpublica.uni-hohenheim.de/handle/123456789/6341

Language

English

Classification (DDC)

630 Agriculture

Collections

Institut für Pflanzenzüchtung, Saatgutforschung und Populationsgenetik

Free keywords

Genom Selection Synthetic Prediction Synthetik

Standardized keywords (GND)

Genom Auslese Population Prognose

BibTeX

@phdthesis{Müller2017,
url = {https://hohpublica.uni-hohenheim.de/handle/123456789/6341},
author = {Müller, Dominik},
title = {Genomic selection in synthetic populations},
year = {2017},
school = {Universität Hohenheim},
}
