Institut für Pflanzenzüchtung, Saatgutforschung und Populationsgenetik
Permanent URI for this collection: https://hohpublica.uni-hohenheim.de/handle/123456789/13
Browsing Institut für Pflanzenzüchtung, Saatgutforschung und Populationsgenetik by Sustainable Development Goals "9"
Publication: DeepCob: precise and high-throughput analysis of maize cob geometry using deep learning with an application in genebank phenomics (2021)
Kienbaum, Lydia; Correa Abondano, Miguel; Blas, Raul; Schmid, Karl
Background: Maize cobs are an important component of crop yield that exhibit a high diversity in size, shape and color in native landraces and modern varieties. Various phenotyping approaches were developed to measure maize cob parameters in a high-throughput fashion. More recently, deep learning methods like convolutional neural networks (CNNs) became available and were shown to be highly useful for high-throughput plant phenotyping. We aimed at comparing classical image segmentation with deep learning methods for maize cob image segmentation and phenotyping using a large image dataset of native maize landrace diversity from Peru. Results: Comparison of three image analysis methods showed that a Mask R-CNN trained on a diverse set of maize cob images was highly superior to classical image analysis using the Felzenszwalb-Huttenlocher algorithm and a Window-based CNN due to its robustness to image quality and object segmentation accuracy (r = 0.99). We integrated Mask R-CNN into a high-throughput pipeline to segment both maize cobs and rulers in images and perform an automated quantitative analysis of eight phenotypic traits, including diameter, length, ellipticity, asymmetry, aspect ratio and average values of red, green and blue color channels for cob color. Statistical analysis identified key training parameters for efficient iterative model updating. We also show that a small number of 10–20 images is sufficient to update the initial Mask R-CNN model to process new types of cob images.
To demonstrate an application of the pipeline, we analyzed phenotypic variation in 19,867 maize cobs extracted from 3449 images of 2484 accessions from the maize genebank of Peru to identify phenotypically homogeneous and heterogeneous genebank accessions using multivariate clustering. Conclusions: The single Mask R-CNN model and its associated analysis pipeline are widely applicable tools for maize cob phenotyping in contexts like genebank phenomics or plant breeding.

Publication: Effects of using deep learning to predict the geographic origin of barley genebank accessions on genome–environment association studies (2025)
Chang, Che-Wei; Schmid, Karl
Genome–environment association (GEA) is an approach for identifying adaptive loci by combining genetic variation with environmental parameters, offering potential for improving crop resilience. However, its application to genebank accessions is limited by missing geographic origin data. To address this limitation, we explored the use of neural networks to predict the geographic origins of barley accessions and integrate imputed environmental data into GEA. Neural networks demonstrated high accuracy in cross-validation but occasionally produced ecologically implausible predictions as models solely considered geographical proximity. For example, some predicted origins were located within non-arable regions, such as the Mediterranean Sea. Using barley flowering time genes as benchmarks, GEA integrating imputed environmental data (N = 11,032) displayed partially concordant yet complementary detection of genomic regions near flowering time genes compared to regular GEA (N = 1,626), highlighting the potential of GEA with imputed data to complement regular GEA in uncovering novel adaptive loci. Also, contrary to our initial hypothesis anticipating a significant improvement in GEA performance by increasing sample size, our simulations yield unexpected insights.
Our study suggests potential limitations in the sensitivity of GEA approaches to the considerable expansion in sample size achieved through predicting missing geographical data. Overall, our study provides insights into leveraging incomplete geographical origin data by integrating deep learning with GEA. Our findings indicate the need for further development of GEA approaches to optimize the use of imputed environmental data, such as incorporating regional GEA patterns instead of solely focusing on global associations between allele frequencies and environmental gradients across large-scale landscapes.

Publication: Genetic architecture underlying the expression of eight α-amylase trypsin inhibitors (2021)
El Hassouni, Khaoula; Sielaff, Malte; Curella, Valentina; Neerukonda, Manjusha; Leiser, Willmar; Würschum, Tobias; Schuppan, Detlef; Tenzer, Stefan; Longin, C. Friedrich H.
Amylase trypsin inhibitors (ATIs) are important allergens in baker’s asthma and suspected triggers of non-celiac wheat sensitivity (NCWS) inducing intestinal and extra-intestinal inflammation. As studies on the expression and genetic architecture of ATI proteins in wheat are lacking, we evaluated 149 European old and modern bread wheat cultivars grown at three different field locations for their content of eight ATI proteins. Large differences in the content and composition of ATIs in the different cultivars were identified ranging from 3.76 pmol for ATI CM2 to 80.4 pmol for ATI 0.19, with up to 2.5-fold variation in CM-type and up to sixfold variation in mono/dimeric ATIs. Generally, heritability estimates were low except for ATI 0.28 and ATI CM2. ATI protein content showed a low correlation with quality traits commonly analyzed in wheat breeding. Similarly, no trends were found regarding ATI content in wheat cultivars originating from numerous countries and decades of breeding history.
Genome-wide association mapping revealed a complex genetic architecture built of many small, few medium and two major quantitative trait loci (QTL). The major QTL were located on chromosomes 3B for ATI 0.19-like and 6B for ATI 0.28, explaining 70.6 and 68.7% of the genotypic variance, respectively. Within close physical proximity to the medium and major QTL, we identified eight potential candidate genes on the wheat reference genome encoding structurally related lipid transfer proteins. Consequently, selection and breeding of wheat cultivars with low ATI protein amounts appear difficult, requiring other strategies to reduce ATI content in wheat products.

Publication: Genomic prediction in hybrid breeding: I. Optimizing the training set design (2023)
Melchinger, Albrecht E.; Fernando, Rohan; Stricker, Christian; Schön, Chris-Carolin; Auinger, Hans-Jürgen
Genomic prediction holds great promise for hybrid breeding but optimum composition of the training set (TS) as determined by the number of parents ($n_{TS}$) and crosses per parent ($c$) has received little attention. Our objective was to examine prediction accuracy ($r_a$) of GCA for lines used as parents of the TS (I1 lines) or not (I0 lines), and H0, H1 and H2 hybrids, comprising crosses of type I0 × I0, I1 × I0 and I1 × I1, respectively, as a function of $n_{TS}$ and $c$. In the theory part, we developed estimates for $r_a$ of GBLUPs for hybrids: (i) $\hat{r}_a$ based on the expected prediction accuracy, and (ii) $\tilde{r}_a$ based on $r_a$ of GBLUPs of GCA and SCA effects. In the simulation part, hybrid populations were generated using molecular data from two experimental maize data sets. Additive and dominance effects of QTL borrowed from literature were used to simulate six scenarios of traits differing in the proportion ($\tau_{SCA}$ = 1%, 6%, 22%) of SCA variance in $\sigma_G^2$ and heritability ($h^2$ = 0.4, 0.8). Values of $\tilde{r}_a$ and $\hat{r}_a$ closely agreed with $r_a$ for hybrids. For given size $N_{TS} = n_{TS} \times c$ of TS, $r_a$ of H0 hybrids and GCA of I0 lines was highest for $c = 1$.
Conversely, for GCA of I1 lines and H1 and H2 hybrids, $c = 1$ yielded the lowest $r_a$, with concordant results across all scenarios for both data sets. In view of these opposite trends, the optimum choice of $c$ for maximizing selection response across all types of hybrids depends on the size and resources of the breeding program.

Publication: Hybrid transcriptome sequencing approach improved assembly and gene annotation in Cynara cardunculus (L.) (2020)
Puglia, Giuseppe D.; Prjibelski, Andrey D.; Vitale, Domenico; Bushmanova, Elena; Schmid, Karl J.; Raccuia, Salvatore A.
Background: The investigation of transcriptome profiles using short reads in non-model organisms, which lack well-annotated genomes, is limited by partial gene reconstruction and isoform detection. In contrast, long-read sequencing techniques have shown their potential to generate complete transcript assemblies even when a reference genome is lacking. Cynara cardunculus var. altilis (DC) (cultivated cardoon) is a perennial hardy crop adapted to dry environments with many industrial and nutraceutical applications due to the richness of secondary metabolites mostly produced in flower heads. The investigation of this species benefited from the recent release of a draft genome, but the transcriptome profile during capitulum formation remains unexplored. In the present study we show a transcriptome analysis of vegetative and inflorescence organs of cultivated cardoon through a novel hybrid RNA-seq assembly approach utilizing both long and short RNA-seq reads. Results: The inclusion of a single Nanopore flow-cell output in a hybrid sequencing approach yielded an increase of 15% in completely assembled genes and 18% in transcript isoforms with respect to short reads alone. Among 25,463 assembled unigenes, we identified 578 new genes and updated 13,039 gene models, 11,169 of which were alternatively spliced isoforms.
During capitulum development, 3424 genes were differentially expressed and approximately two-thirds were identified as transcription factors including bHLH, MYB, NAC, C2H2 and MADS-box, which were highly expressed especially after capitulum opening. We also show the expression dynamics of key genes involved in the production of valuable secondary metabolites in which the capitulum is rich, such as phenylpropanoids, flavonoids and sesquiterpene lactones. Most of their biosynthetic genes were strongly transcribed in the flower heads, with alternative isoforms exhibiting differential expression levels across the tissues. Conclusions: This novel hybrid sequencing approach made it possible to improve the transcriptome assembly, to update more than half of the annotated genes and to identify many novel genes and different alternatively spliced isoforms. This study provides new insights into the flowering cycle in an Asteraceae plant, a valuable resource for plant biology and breeding in Cynara and an effective method for improving gene annotation.

Publication: Optimum breeding strategies using genomic and phenotypic selection for the simultaneous improvement of two traits (2021)
Marulanda, Jose J.; Mi, Xuefei; Utz, H. Friedrich; Melchinger, Albrecht E.; Würschum, Tobias; Longin, C. Friedrich H.
Selection indices using genomic information have been proposed in crop-specific scenarios. Routine use of genomic selection (GS) for simultaneous improvement of multiple traits requires information about the impact of the available economic and logistic resources and genetic properties (variances, trait correlations, and prediction accuracies) of the breeding population on the expected selection gain. We extended the R package “selectiongain” from single trait to index selection to optimize and compare breeding strategies for simultaneous improvement of two traits.
We focused on the expected annual selection gain (ΔGa) for traits differing in their genetic correlation, economic weights, variance components, and prediction accuracies of GS. For all scenarios considered, breeding strategy GSrapid (one-stage GS followed by one-stage phenotypic selection) achieved higher ΔGa than classical two-stage phenotypic selection, regardless of the index chosen to combine the two traits and the prediction accuracy of GS. The Smith–Hazel or base index delivered higher ΔGa for net merit and individual traits compared to selection by independent culling levels, whereas the restricted index led to lower ΔGa in net merit and divergent results for selection gain of individual traits. The differences among the indices depended strongly on the correlation of traits, their variance components, and economic weights, underpinning the importance of choosing the selection indices according to the goal of the breeding program. We demonstrate our theoretical derivations and extensions of the R package “selectiongain” with an example from hybrid wheat by designing indices to simultaneously improve grain yield and grain protein content or sedimentation volume.

Publication: Order from entropy: big data from FAIR data cohorts in the digital age of plant breeding (2025)
Gogna, Abhishek; Arend, Daniel; Beier, Sebastian; Rezaei, Ehsan Eyshi; Würschum, Tobias; Zhao, Yusheng; Chu, Jianting; Reif, Jochen C.
Lack of interoperable datasets in plant breeding research creates an innovation bottleneck, requiring additional effort to integrate diverse datasets, if access is possible at all. Handling of plant breeding data and metadata must, therefore, change toward adopting practices that promote openness, collaboration, standardization, ethical data sharing, sustainability, and transparency of provenance and methodology.
FAIR Digital Objects, which build on research data infrastructures and FAIR principles, offer a path to address this interoperability crisis, yet their adoption remains in its infancy. In the present work, we identify data sharing practices in the plant breeding domain as Data Cohorts and establish their connection to FAIR Digital Objects. We further link these cohorts to broader research infrastructures and propose a Data Trustee model for federated data sharing. With this we aim to push the boundaries of data management, often viewed as the last step in plant breeding research, to an ongoing process to enable future innovations in the field.

Publication: Using landscape genomics to infer genomic regions involved in environmental adaptation of soybean genebank accessions (2025)
Haupt, Max; Schmid, Karl
Background: Understanding how crops adapt to specific environmental conditions is becoming increasingly important in the face of accelerating climate change, but the genetics of local adaptation remains little understood for many crops. Landscape genomics can reveal patterns of genetic variation that indicate adaptive diversification during crop evolution and dispersal. Here, we examine genetic differentiation and association signatures with environmental gradients in soybean (Glycine max) germplasm groups from China that were inferred from the USDA Soybean Germplasm Collection (N = 17,019 accessions) based on population structure and passport information. Results: We recover genes previously known to be involved in soybean environmental adaptation and report numerous new candidate genes in adaptation signatures implicated by genomic resources such as the genome annotation and gene expression datasets to function in flowering regulation, photoperiodism and stress reaction cascades.
Linkage disequilibrium network analysis suggested functional relationships between genomic regions with signatures of genetic differentiation, consistent with a polygenic nature of environmental adaptation. We tested whether haplotypes associated with environmental adaptation in China were present in 843 North American and 160 European soybean cultivars and found that haplotypes in major genes for early maturity have been selected during breeding, but also that a large number of haplotypes exhibiting putative adaptive variation for cold regions at high latitudes are underrepresented in modern cultivars. Conclusions: Our results demonstrate the value of landscape genomics analysis of genebank accessions for studying crop environmental adaptation and for informing future research and breeding efforts for improved adaptation of soybean and other crops to future climates.
