A case for context in quantitative ecology: statistical techniques to increase efficiency, accuracy, and equity in biodiversity research

Abstract

The current era of ecological research is characterized by rapid technological innovation, large datasets, and numerous computational and quantitative techniques. Together, big data and advanced computing are expanding our understanding of natural systems, allowing us to capture more complexity in our models, and helping us find solutions for salient challenges facing modern ecology and conservation, including climate change and biodiversity loss. However, large datasets are often characterized by noise, complex observational processes, and other challenges that can impede our ability to apply these data to address ecological research gaps. In each chapter of this dissertation, I seek to address a data problem inherent to the 'big data' that characterizes modern ecological research. Together, these chapters extend the strategies available for addressing a problem facing many ecologists: how to make use of the large volumes of data we are collecting given (1) current computational limitations and (2) specific sampling biases that characterize various methods for data collection. In the first chapter, I present a recursive Bayesian computing (RB) method that can be used to fit Bayesian hierarchical models in sequential MCMC stages to ease computation and streamline hierarchical inference. I also demonstrate the application of transformation-assisted RB (TARB) to a hierarchical animal movement model to create unsupervised MCMC algorithms and obtain inference about individual- and population-level migratory characteristics. This recursive procedure halved the computation time for fitting our hierarchical movement model compared to fitting the model with a single MCMC algorithm. Transformation-assisted RB is a relatively accessible method for reducing the computational demands of fitting complex ecological statistical models, like those for animal movement, multi-species systems, or large spatial and temporal scales.
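To make the idea of fitting a model in sequential MCMC stages concrete, the following is a minimal sketch of a generic two-stage recursive Bayesian update on a toy problem. It is not the dissertation's TARB algorithm or its movement model. It assumes a normal mean with known variance, simulated data split into two batches that are conditionally independent given the parameter, and stage-one draws taken from the analytic posterior as a stand-in for a first-stage MCMC run. All function and variable names are hypothetical. Because the stage-one posterior serves as both the prior and the proposal in stage two, the Metropolis-Hastings ratio reduces to a likelihood ratio on the new batch only.

```python
import math
import random


def log_lik(theta, ys, sigma=1.0):
    """Gaussian log-likelihood of ys given mean theta (additive constants dropped)."""
    return sum(-0.5 * ((y - theta) / sigma) ** 2 for y in ys)


def normal_posterior(ys, prior_mean, prior_var, sigma=1.0):
    """Conjugate posterior (mean, variance) for a normal mean with known sigma."""
    precision = 1.0 / prior_var + len(ys) / sigma ** 2
    mean = (prior_mean / prior_var + sum(ys) / sigma ** 2) / precision
    return mean, 1.0 / precision


def stage_two(stage_one_draws, new_data, n_iter, rng):
    """Independence Metropolis-Hastings using stage-one draws as proposals.

    The stage-one posterior is both the prior and the proposal here, so it
    cancels and the acceptance ratio involves only the new batch's likelihood.
    """
    theta = rng.choice(stage_one_draws)
    current_ll = log_lik(theta, new_data)
    chain = []
    for _ in range(n_iter):
        proposal = rng.choice(stage_one_draws)
        proposal_ll = log_lik(proposal, new_data)
        # 1 - random() lies in (0, 1], which keeps the log finite.
        if math.log(1.0 - rng.random()) < proposal_ll - current_ll:
            theta, current_ll = proposal, proposal_ll
        chain.append(theta)
    return chain


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(40)]  # simulated, hypothetical data
batch_one, batch_two = data[:20], data[20:]

# Stage one: posterior from the first batch (analytic stand-in for an MCMC fit).
m1, v1 = normal_posterior(batch_one, prior_mean=0.0, prior_var=100.0)
stage_one_draws = [rng.gauss(m1, math.sqrt(v1)) for _ in range(5000)]

# Stage two: update with the second batch without revisiting the first.
chain = stage_two(stage_one_draws, batch_two, n_iter=20000, rng=rng)
recursive_mean = sum(chain) / len(chain)

# Reference: analytic posterior from all the data at once.
full_mean, full_var = normal_posterior(data, prior_mean=0.0, prior_var=100.0)
print(round(recursive_mean, 3), round(full_mean, 3))
```

In this toy setting the two printed means should agree closely, since the second stage never re-evaluates the likelihood of the first batch. That is the general source of computational savings in staged fitting, though the savings reported above are specific to the dissertation's own model and algorithm.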
Biodiversity monitoring projects that rely on collaborative, crowdsourced data collection are characterized by huge volumes of data that represent a major facet of 'big data ecology,' and quantitative methods designed to use these data for ecological research and conservation represent a leading edge of contemporary quantitative ecology. However, because participants select where to observe biodiversity, crowdsourced data are often influenced by sampling bias, including being biased toward affluent, white neighborhoods in urban areas. Despite the growing evidence of social sampling bias, research has yet to explore how socially driven sampling bias impacts inference and prediction informed by crowdsourced data, or whether existing data pre-processing or analytical methods can effectively mitigate this bias. Thus, in Chapters 2 and 3, I explored social sampling bias in data from the crowdsourced avian biodiversity platform eBird. In Chapter 2, I studied patterns of social sampling bias in the locations of eBird "hotspots" to determine whether hotspots in Fresno, California, U.S.A. are more biased by social factors than the locations of Fresno eBird observations overall. My findings support previous work showing that eBird locations are biased by demographics. Further, I found that demographic bias is most pronounced in the locations of hotspots specifically, with hotspots being more likely to occur in areas with higher proportions of non-Hispanic white residents than eBird locations overall. This relationship is reinforced because hotspots in these predominantly white areas also amass more eBird checklists overall than hotspots in areas with more demographic diversity. These findings raise concerns that the eBird hotspot system may be exacerbating spatial bias in sampling and reinforcing patterns of inequity in data availability and eBird participation, by leading to datasets and user-facing maps of birding hotspots that mostly represent predominantly white neighborhoods.
In Chapter 3, I investigated the impacts of not accounting for socially biased sampling when using eBird data to study patterns of urban biodiversity. The luxury effect has emerged as a prominent hypothesis in urban ecology, describing a pattern of higher biodiversity associated with greater socioeconomic status observed in many cities. Using eBird data from 2015-2019, I tested whether an avian luxury effect is observed in Raleigh-Durham, North Carolina, U.S.A. before and after accounting for social sampling bias. By jointly modeling sampling intensity and species richness, I found that sampling intensity and species richness are positively correlated and that sampling bias influences the estimated relationship between species richness and income. Thus, failing to account for sampling bias can hinder our ability to accurately observe social-ecological dynamics. Additionally, I found that randomly spatially subsampling eBird data prior to analysis, as recommended by existing guidelines to mitigate sampling bias in eBird data, does not reduce biased sampling related to demographics, because there are data gaps in communities of color and low-income communities that cannot be addressed via spatial subsampling. Therefore, it is paramount that crowdsourced and contributory science projects prioritize more equitable participation in their platforms, both for more ethical, equitable practice and because current sampling inequity negatively impacts data quality and project goals.
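The point about spatial subsampling can be illustrated with a minimal sketch. It assumes a simple square latitude/longitude grid and keeps one randomly chosen checklist per occupied cell; the grid shape, cell size, coordinates, and checklist records are all hypothetical and are not taken from the dissertation or from eBird's own tooling. Thinning evens out effort among cells that already have data, but a cell with no checklists stays empty.

```python
import math
import random
from collections import defaultdict


def grid_cell(lat, lon, cell_deg):
    """Index of the square grid cell (in degrees) containing a point."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def spatial_subsample(checklists, cell_deg, rng):
    """Keep one randomly chosen checklist from every occupied grid cell."""
    by_cell = defaultdict(list)
    for checklist in checklists:
        by_cell[grid_cell(checklist["lat"], checklist["lon"], cell_deg)].append(checklist)
    return [rng.choice(group) for group in by_cell.values()]


rng = random.Random(0)
CELL_DEG = 0.05  # hypothetical cell size


def simulate(n, lat_lo, lat_hi):
    """Simulate n hypothetical checklist locations within one latitude band."""
    return [
        {"lat": rng.uniform(lat_lo, lat_hi), "lon": rng.uniform(-78.69, -78.66)}
        for _ in range(n)
    ]


# A heavily sampled cell, a lightly sampled cell, and (between them) a cell
# that no participant visited.
checklists = simulate(50, 35.81, 35.84) + simulate(3, 35.91, 35.94)

kept = spatial_subsample(checklists, CELL_DEG, rng)
sampled_cells = {grid_cell(c["lat"], c["lon"], CELL_DEG) for c in kept}
unsampled_cell = grid_cell(35.87, -78.67, CELL_DEG)

print(len(checklists), len(kept), unsampled_cell in sampled_cells)
```

Subsampling reduces the 50-to-3 imbalance between the two visited cells to one checklist each, yet the unvisited cell contributes nothing before or after. Only additional sampling in that cell could close the gap.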
Quantitative techniques can help us understand the complex observational processes influencing ecological data, and each chapter of this dissertation highlights how tailoring statistical or computing methods to these observational contexts can advance ecological knowledge, either by extending the complexity of models we can feasibly fit, as in Chapter 1, or by acknowledging and accounting for sampling inequity, as in Chapters 2 and 3. We are all participants actively shaping the ecological processes we observe, and the actions, approaches, and assumptions used in our research reflect societal systems and biases. Data are never objective, and it is dangerous and false to assume that quantitative techniques can take data out of the contexts in which they were collected. Instead, quantitative frameworks that embrace, reflect, and seek to improve the ways in which social and observational contexts inform what is observed can turn analytical techniques into tools for more just, inclusive, and transparent ecological research and conservation.

Rights Access

Embargo expires: 05/20/2025.

Subject

computational statistics
sampling bias
citizen science
urban ecology
quantitative ecology
