Analysis of wheat spike characteristics using image analysis, machine learning, and genomics
Date
2022
Authors
Hammers, Mikayla, author
Mason, Esten, advisor
Ben-Hur, Asa, committee member
Mueller, Nathan, committee member
Rhodes, Davina, committee member
Abstract
Understanding the genetics regulating yield component and spike traits can contribute to the development of new wheat cultivars. The flowering pathway in wheat is not entirely known, but spike architecture and its relationship with yield component traits could provide valuable information for crop improvement. Spikelets per spike (SPS) has previously been positively associated with kernel number per spike (KNS) and negatively correlated with thousand kernel weight, meaning a further understanding of SPS could help unlock full yield potential. While genomics research has become more efficient over time with the development of techniques such as genotyping by sequencing (GBS), phenotyping remains a labor- and time-intensive process, limiting the amount of phenomic data available for research. Recently, there has been more interest in developing high-throughput methods for collecting and analyzing phenotypic data. Imaging is a cheap and easily reproducible way to collect data at a specific maturity point or over time, and is a promising candidate for implementing deep learning algorithms to extract traits of interest. For this study, a population of 594 soft red winter wheat (SRWW) inbred lines was evaluated for wheat spike characteristics over two years. Images of wheat spikes were taken in a controlled environment and used to train deep learning algorithms to count SPS. A total of 12,717 images were prepared for analysis and used to train, test, and validate a basic classification convolutional neural network (CNN) and a basic regression CNN, as well as VGG16 and VGG19 regression models. Classification had low accuracy and did not allow for an assessment of error margins. Regression models were more accurate. Of the regression models, VGG16 had the lowest mean absolute error (MAE = 1.09) and mean squared error (MSE = 2.08) and the highest coefficient of determination (R² = 0.53), meaning it had the best fit of all models.
The basic CNN was the next best-fitting model (MAE = 1.27, MSE = 2.61, r = 0.48), followed by the VGG19 (MAE = 1.32, MSE = 2.98, r = 0.45). With an average error of just above one spikelet, it is possible that image-based counting methods could provide enough data at an accuracy high enough for use in statistical analyses such as genome-wide association studies (GWAS) or genomic selection (GS). A GWAS was used to identify markers associated with SPS and yield component traits, and the use of GS was demonstrated for prediction and screening of individuals across multiple breeding programs. The GWAS results indicated similar markers and genomic regions underpinning both KNS and SPS on chromosome 6A, and both spike length and SPS on chromosome 7A. Favorable alleles at each locus were associated with higher KNS and SPS on chromosome 6A, and with longer wheat spikes and higher SPS on chromosome 7A. Significant markers on 7A were observed in the region near WAPO1, the causal gene for SPS on the long arm of chromosome 7A, indicating they could be associated with that gene. GS results showed promise for whole-genome selection, with the lowest prediction accuracy observed for heading date (rgs = 0.30) and the highest for spike area (rgs = 0.62). SPS showed prediction accuracies ranging from 0.33 to 0.42, high enough to aid in the selection process. These results indicate that knowledge of the flowering pathway and wheat spike architecture, and of how they relate to yield components, could be beneficial for making selections and increasing grain yield.
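For readers unfamiliar with the fit statistics reported above, the following is a minimal sketch of how MAE, MSE, and the coefficient of determination are conventionally computed from observed and predicted spikelet counts. It is not the thesis code: the function name `regression_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for paired observed and predicted values.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    # Per-sample prediction errors (predicted minus observed).
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread of the
    # observed values around their own mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan

    return mae, mse, r2


# Made-up spikelet counts, for illustration only.
observed = [18, 20, 22, 24]
predicted = [19, 20, 21, 26]
mae, mse, r2 = regression_metrics(observed, predicted)
print(f"MAE={mae:.2f}  MSE={mse:.2f}  R2={r2:.2f}")
```

Because MAE is expressed in the same units as the trait, a value near 1 corresponds to an average miss of about one spikelet per image, which is how the abstract interprets the VGG16 result.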