Genetic selection for resistance to bovine respiratory disease using pooled DNA approaches
dc.contributor.author | Boldt, Ryan J., author | |
dc.contributor.author | Enns, R. Mark, advisor | |
dc.contributor.author | Speidel, Scott, advisor | |
dc.contributor.author | Keele, John, committee member | |
dc.contributor.author | Kuehn, Larry, committee member | |
dc.contributor.author | McDaneld, Tara, committee member | |
dc.contributor.author | Holt, Tim, committee member | |
dc.date.accessioned | 2025-06-02T15:21:05Z | |
dc.date.available | 2025-06-02T15:21:05Z | |
dc.date.issued | 2025 | |
dc.description.abstract | Bovine respiratory disease (BRD) is the costliest disease affecting the beef cattle industry. However, the only methods currently available to reduce the incidence of the microbial organisms (viruses and bacteria) that cause BRD are vaccination and antibiotic treatment. Examples from other species and diseases have shown that selection for disease resistance is an effective way to reduce the economic burden of a disease on an industry. Because phenotypes for a trait like BRD resistance are difficult to collect, one of the best methods for selection could be genomic selection. To capture a representative sample of the commercial genetic makeup of the beef industry, samples for the study were collected from commercial harvest facilities. To reduce overall genotyping costs, samples were genotyped using a pooled DNA approach. While pooled DNA has been used previously to identify genomic regions that differ by disease status, this has not been done for animals that showed symptoms of BRD during the post-weaning period. Therefore, the objectives of this research were to 1) examine different analysis techniques for pooled DNA information, and 2) identify across-breed SNP that are significant for identifying animals more likely to develop clinical signs of BRD. To investigate the first objective of the dissertation, two separate analyses were done. The first analysis evaluated the number of SNP used to calculate a genomic relationship matrix. While DNA pooling reduces the cost of genotyping by grouping samples, the cost could potentially be reduced further by using SNP chips with lower density. For the analysis, 106 pools of 96 individuals each were genotyped using a high-density genomic panel that contained 777,962 SNP. To evaluate the use of lower-density SNP chips in pooled DNA analyses, SNP subsets ranging in size from 500 to 770,000 were sampled randomly, with 50 replications at each size. 
For each level and replication, the resulting genomic relationship matrix was compared to the full relationship matrix calculated from 776,749 SNP, the number remaining after SNP with minor allele frequency <0.05 were removed. To assess the equivalence of the matrices, the genomic relationship matrix calculated from the reduced number of SNP was multiplied by the eigenvalues and eigenvectors of the genomic relationship matrix formed from all SNP. After this multiplication, in which the reduced matrix was standardized by the full matrix, the variance of the eigenvalues of the resulting matrix was calculated. The closer the resulting variance was to 0, the more nearly proportional the two matrices were considered to be. Beyond 2,000 SNP, further reductions in the variance of the eigenvalues became smaller in magnitude. These results suggest that a low-density panel may be used for pooled DNA data and for calculating genomic relationship matrices. The second analysis conducted to address the first objective compared alternative analysis techniques for identifying simulated important SNP at varying levels of allelic prevalence and effect size. For the analysis, 100 random SNP across all chromosomes were selected to act as the significant SNP among the approximately 770,000 SNP available on the BovineHD chip. All SNP pooling allele frequencies (PAF) were simulated using a beta distribution. For the 100 significant SNP, the PAF were then modified based on differing levels of prevalence and the effect of the disease-causing SNP. Prevalence was simulated from 0.10 to 0.90 in increments of 0.10, and SNP effect was simulated from 0.01 to 0.50 in increments of 0.01. For each of the 450 combinations of prevalence and effect, two different models were applied to the same dataset. The first model type was a GWAS analysis that has previously been applied to this data type. 
Under this model, each SNP is tested via an F-test. The dependent variable for this analysis was the PAF, and the fixed effect was a binary classification of whether a pool was a case or a control. Additionally, a relationship matrix was calculated to account for any population stratification in the simulated dataset. For each F-test, a p-value was calculated. The second type of analysis was a Random Forest analysis. For the Random Forest, the same number of trees, terminal node size, and number of explanatory variables to try at each node were applied to all combinations. The optimal settings were determined to be 2,000 trees, a terminal node size of 1, and 60,000 explanatory variables tried at each node. For each combination, the results were ranked by lowest p-value for the GWAS analysis and by highest variable importance factor for the Random Forest analysis. From there, the top 100 most significant SNP were compared, and the number of pre-identified significant SNP within that subset was counted. Across all levels of prevalence, each model was able to identify a subset of the most significant SNP. Across all levels of prevalence, the Random Forest model started identifying significant SNP at lower levels of effect of the disease-causing allele. At low (0.10, 0.20, 0.30) and high (0.70, 0.80, 0.90) prevalence levels, the traditional GWAS model identified a higher number of significant SNP at high effect levels, whereas at moderate prevalence levels (0.40, 0.50, 0.60) the Random Forest model correctly identified a larger number of the significant SNP. To address objective two, several analyses were run to estimate SNP effects and identify informative variants for selection against development of bovine respiratory disease complex (BRDC). 
For this analysis, samples were collected from three large commercial processing plants in Colorado and Nebraska. DNA samples were collected from ears when the animals were harvested. Samples for the study were collected over a four-year period. For pooling, punches were removed from each ear, and animals were sorted into either a case or a control pool. Each pool represented 96 animals. For each case, a corresponding control from the same feedlot group was also collected. In total, 106 pools representing 10,176 animals were constructed using a matched case-control strategy. DNA was extracted using a Qiagen kit, and pools were sent to Neogen (Lincoln, NE) for genotyping on a bovine SNP chip that contained approximately 770,000 SNP. For each SNP and each pool, a PAF was calculated. To account for population stratification in the analysis, a covariance matrix among pool PAF was calculated. Mixed model methodology was used to solve for effects in the model. In the first analysis, each SNP was examined individually, and an F-test was performed to test for significance. Additionally, analyses were performed using SNP groups, formed from regions of 100, 500, and 1,000 SNP. For each region, a distance matrix based on the PAF of the SNP in the region was calculated and then used as the response variable in an ANOVA. Fixed effects were the A matrix, to account for population stratification, and a 2 x 106 matrix indicating whether the animals were in a case or a control pool. For all analysis types, no significant SNP were discovered. Additionally, several regions previously reported to be significantly associated with BRDC were examined. To see whether a similar signal was being detected, SNP were ranked from most significant to least significant and compared to previous results. 
Among the previously reported results there were regions on BTA16 (70-71), BTA16 (70-71), BTA14 (9-10), and BTA8 (63-64) that were among the top 1% of most significant SNP in the single SNP analyses. However, in the grouped SNP analyses none of these regions were in the top 1% of significant SNP. Other regions that have been previously identified in other papers were either not in the top 1% of SNP in any analysis or had p-values that were 0.85 or greater. | |
dc.format.medium | born digital | |
dc.format.medium | doctoral dissertations | |
dc.identifier | Boldt_colostate_0053A_18628.pdf | |
dc.identifier.uri | https://hdl.handle.net/10217/241011 | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Colorado State University. Libraries | |
dc.relation.ispartof | 2020- | |
dc.rights | Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright. | |
dc.title | Genetic selection for resistance to bovine respiratory disease using pooled DNA approaches | |
dc.type | Text | |
dcterms.rights.dpla | This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). | |
thesis.degree.discipline | Animal Sciences | |
thesis.degree.grantor | Colorado State University | |
thesis.degree.level | Doctoral | |
thesis.degree.name | Doctor of Philosophy (Ph.D.) |
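The single-SNP case/control comparison described in the abstract can be illustrated with a small sketch. This is not the dissertation's code. It is a simplified illustration under stated assumptions: the beta parameters, the `concentration` value, the pool counts, the causal SNP indices, and the function names `f_statistic` and `simulate_and_rank` are all hypothetical. Prevalence and the relationship matrix used to account for population stratification are left out. SNP are ranked by F statistic, which gives the same order as ranking by p-value when every test has the same degrees of freedom.

```python
import random


def f_statistic(case, control):
    """One-way ANOVA F statistic for a two-level (case/control) fixed effect."""
    n1, n2 = len(case), len(control)
    m1, m2 = sum(case) / n1, sum(control) / n2
    grand = (sum(case) + sum(control)) / (n1 + n2)
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_within = (sum((x - m1) ** 2 for x in case)
                 + sum((x - m2) ** 2 for x in control))
    if ss_within == 0.0:
        return float("inf") if ss_between > 0.0 else 0.0
    # 1 numerator degree of freedom, n1 + n2 - 2 denominator degrees of freedom
    return ss_between / (ss_within / (n1 + n2 - 2))


def simulate_and_rank(n_snp=200, n_pools=10, causal=(3, 57, 101, 150, 199),
                      effect=0.2, concentration=200.0, seed=1):
    """Simulate pooling allele frequencies (PAF) and rank SNP by F statistic.

    Assumptions (illustrative only): each SNP's baseline frequency is drawn
    from Beta(2, 2); each pool's PAF is drawn from a beta distribution centred
    on the group mean; causal SNP shift the case-pool mean by `effect`,
    flipping the sign when needed to stay inside (0, 1).
    """
    rng = random.Random(seed)
    causal = set(causal)
    scores = []
    for snp in range(n_snp):
        base = rng.betavariate(2.0, 2.0)
        case_mean = base
        if snp in causal:
            case_mean = base + effect if base + effect < 1.0 else base - effect
        control = [rng.betavariate(base * concentration,
                                   (1.0 - base) * concentration)
                   for _ in range(n_pools)]
        case = [rng.betavariate(case_mean * concentration,
                                (1.0 - case_mean) * concentration)
                for _ in range(n_pools)]
        scores.append((f_statistic(case, control), snp))
    scores.sort(reverse=True)  # largest F first, i.e. smallest p-value first
    return [snp for _, snp in scores]


if __name__ == "__main__":
    ranked = simulate_and_rank()
    print("Top-ranked SNP indices:", ranked[:5])
```

Counting how many of the designated causal indices appear among the top-ranked SNP mirrors, on a much smaller scale, the comparison the abstract describes for the top 100 SNP.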
Files
Original bundle
- Name:
- Boldt_colostate_0053A_18628.pdf
- Size:
- 1.1 MB
- Format:
- Adobe Portable Document Format