THESIS FROM OCCURRENCES TO PREDICTORS: MODELING SPECIES DISTRIBUTION AND ENVIRONMENTAL DRIVERS OF NYMPHALIDAE Submitted by Melissa Morales Graduate Degree Program in Ecology In partial fulfillment of the requirements For the Degree of Master of Science Colorado State University Fort Collins, Colorado Fall 2025 Master’s Committee: Advisor: Greg Newman Ruth Hufbauer Seth Davis Copyright by Melissa Morales 2025 All Rights Reserved ii ABSTRACT FROM OCCURRENCES TO PREDICTORS: MODELING SPECIES DISTRIBUTION AND ENVIRONMENTAL DRIVERS OF NYMPHALIDAE Anthropogenic climate change poses growing threats to global biodiversity. These threats put pollinators at risk, which affects their ecological and agricultural roles. Butterflies in the Nymphalidae family act as both pollinators and indicators of environmental change, making them valuable models for ecological research. The reliability of occurrence data--from professional or participatory--significantly shapes species distribution modeling outcomes. Chapter 1 evaluates the comparative performance of professional, participatory, and combined occurrence datasets in modeling Nymphalidae distributions across two climatically sensitive North American ecoregions, the Western Cordillera and South Central Semi-Arid Prairies. Using MaxEnt and Random Forest algorithms across four temporal bins (2008-2022), we demonstrate that Random Forest consistently outperforms MaxEnt in predictive performance. Spatial autocorrelation analyses revealed fundamental differences between data sources: participatory records exhibited strong spatial clustering (Moran’s I = 0.36-0.65), while professional records showed weak to moderate clustering (Moran’s I = 0.13-0.32), reflecting systematic differences in sampling efforts. Random Forest models built with professional datasets achieved the highest performance (AUC = 0.984; TSS = 0.944), while combined datasets offered the best trade-off between spatial coverage and predictive strength. iii Chapter 2 applies these methodological insights to Vanessa cardui (Nymphalidae), a migratory butterfly and ecological generalist, examining species distribution across the same ecoregions. Random Forest models again achieved strong predictive performance (AUC = 0.968; TSS = 0.895). Variable importance analyses identified precipitation seasonality, maximum temperature of the warmest month, and mean diurnal range as the strongest predictors of species distribution. Residual diagnostics revealed systematic deviations at low and high suitability values, highlighting challenges in predicting rare outcomes. However, removing abundance- related bias shifted environmental relationships: population change was more strongly associated with temperature seasonality and diurnal variation, while consistently warm conditions showed negative correlations. Spatial predictions revealed persistent but patchy areas of suitability from 2008-2022, concentrated in both urban-adjacent and remote landscapes, with overall suitable habitat comprising a small proportion of the total study area. Together, this research demonstrates both the potential of integrating diverse datasets with machine learning to assess species distributions and the importance of selecting appropriate datasets, as outcomes and interpretations highly depend on data source and quality. Simultaneously, it emphasizes the methodological and interpretive challenges posed by sampling bias and residual patterns. By identifying key environmental drivers and spatial hotspots, these models inform future monitoring and conservation strategies for butterflies in the face of accelerating environmental change. iv ACKNOWLEDGEMENTS I want to express my gratitude to my advisor, Dr. Greg Newman, for his guidance and support. I appreciate the incredible opportunities he has provided me with. You are truly exceptional at what you do, and I thank you for inspiring me to become a better scientist and person. I would like to sincerely thank my committee members, Dr. Ruth Hufbauer and Dr. Seth Davis, for their expertise and guidance throughout my research, it has helped me grow in ways that I will carry with me throughout my career. Additional thanks to those at CSU's Geospatial Centroid and the C.P. Gillette Museum for their resources, space, and time. To Dr. John Leiser, thank you for being an incredible first mentor in this field and for being the first educator to ever offer me genuine support; your example continues to inspire me today. To Dr. Joshua Lord, thank you for your mentorship and for the opportunities that have shaped my path and the scientist I am today. To the National Parks, whose beauty and presence continues to guide my journey, it has been one of my greatest privileges to work alongside and for you. These lands kindled my love for the natural world, and in return, I feel a deep responsibility to protect public lands and honor the irreplaceable wonders of Mother Nature and all the species who call her home. Thank you to my family for your endless love and unwavering support. To my Mom and Dad, thank you for your embrace and for always being my guiding light. From the bottom of my heart, thank you for letting me discover my path and for supporting v and cheering me on every step of the way. My path would have been much more difficult without your sacrifice in moving our family from New York City to the Pocono Mountains. I’m lovingly indebted to you both--thank you for being the most amazing parents that I could ask for. To my younger siblings, thank you for the countless laughs and the much-needed tears, no matter the hour. To my brother, I’m so excited to see you stepping into the environmental science field and can’t wait to see all the incredible opportunities that come your way. This field urgently needs passionate, capable minds like yours, and I know you’ll give your absolute best to make a real difference. To my sister, if I could struggle through high school and still end up where I am today, then you--of all people-- can achieve anything you set your heart and mind to. Be patient and kind with yourself; I’m excited to see where your wildest dreams take you. You both have my eternal love, support, and friendship. Last but certainly not least, I owe endless thanks to my best friend and the love of my life, Tommy, for your unwavering love, support, and companionship. I could not have completed this marathon without you. Thank you for believing in me and guiding me forward with patience, care, and encouragement at every turn. I’m grateful for every breather we took during each paddle, in all the cabins, at every picnic, and on each trail from the mountains to the deserts and everywhere in between. We’ve shared so many incredible chapters since high school, and I’m so excited for all the ones still to come. vi TABLE OF CONTENTS ABSTRACT ................................................................................................................. ii ACKNOWLEDGEMENTS ............................................................................................ iv CHAPTER 1: BUTTERFLIES AND BIAS: UNDERSTANDING NYMPHALIDAE DISTRIBUTIONS THROUGH DIVERSE DATA LENSES ……………………………....….1 Introduction .................................................................................................................. 1 Methods ....................................................................................................................... 4 Ecoregions and Occurrence Data ............................................................................. 4 Population Density .................................................................................................... 6 Environmental Variables ........................................................................................... 7 Fishnet Grid with Zonal Statistics ............................................................................ 10 Background points .................................................................................................. 10 Spatial Autocorrelation Analysis .............................................................................. 11 SDM ....................................................................................................................... 12 MaxEnt ............................................................................................................... 12 Random Forest ................................................................................................... 12 Model Performance Metrics .................................................................................... 13 Results ...................................................................................................................... 14 Trend Analysis for Occurrences ........................................................................... 14 Spatial Autocorrelation Analysis for Occurrence Datasets .................................... 14 SDM: MaxEnt ...................................................................................................... 15 Participatory Dataset Performance ...................................................................... 16 Professional Dataset Performance ...................................................................... 16 Combined Dataset Performance .......................................................................... 16 MaxEnt Heatmaps ............................................................................................... 17 SDM: Random Forest .......................................................................................... 19 Participatory Dataset Performance ...................................................................... 19 Professional Dataset Performance ...................................................................... 19 vii Combined Dataset Performance .......................................................................... 20 Random Forest Heatmaps .................................................................................. 20 Discussion ................................................................................................................. 23 REFERENCES .......................................................................................................... 27 CHAPTER 2: PATTERNS IN PAINTED LADIES: RANDOM FOREST INSIGHTS INTO DISTRIBUTION AND ENVIRONMENTAL DRIVERS ...………………………….………. 34 Introduction ................................................................................................................ 34 Methods ..................................................................................................................... 36 Ecoregions and Occurrence Data ........................................................................... 36 Environmental Variables ......................................................................................... 36 Background Points and Data Preparation ................................................................ 37 Random Forest Modeling ........................................................................................ 37 Model Evaluation .................................................................................................... 37 Species Distribution ................................................................................................ 38 Visualization and Output ......................................................................................... 38 Results ...................................................................................................................... 38 Discussion ................................................................................................................. 46 REFERENCES .......................................................................................................... 50 1 CHAPTER 1: BUTTERFLIES AND BIAS: UNDERSTANDING NYMPHALIDAE DISTRIBUTIONS THROUGH DIVERSE DATA LENSES Introduction Species occurrence data, which record the presence or absence of a species at a specific location and time, are fundamental to ecological research, management, and policy. These georeferenced datasets allow researchers to study biodiversity patterns, community assembly, ecosystem functioning, and species-environment relationships, and they are essential for conservation planning, including identifying biodiversity hotspots, evaluating vulnerability to climate change, and designing protected areas (Franklin, 2023; Jetz et al., 2019; Petersen et al., 2021; Zhang et al., 2022). Occurrence data are widely accessible through networks such as the Global Biodiversity Information Facility (GBIF), which compiles contributions from natural history museums, research institutions, and participatory science platforms like iNaturalist. While museums remain important, participatory contributions are rapidly increasing and, in some cases, surpassing professional collections, particularly for birds (Wagner et al., 2021; Galván, Barrientos et al., 2022). Recent advances in machine-based pattern recognition have enabled the creation of occurrence maps for over 600,000 species using GBIF data, with strong alignment to expert-generated maps across taxa (Dasgupta et al., 2024), reinforcing the value of occurrence data in biodiversity research and conservation. As global climates shift, species distribution models (SDM), also known as habitat suitability or climate envelope models, have become essential for forecasting where species are likely to persist under environmental change (Guisan & Thuiller, 2 2005; Elith & Leathwick, 2009). Using occurrence data and abiotic variables to define ecological niches, SDM estimate past, present, and future ranges. When aggregated across species, they inform broader richness and diversity analyses, making them crucial to segments of conservation planning (Franklin, 2013). Despite their usefulness, SDM carry uncertainty. Biases in species selection, study region, and uneven sampling can influence results (Thomas et al., 2004; Loiselle et al., 2003). Data sources also shape outcomes: participatory science platforms such as iNaturalist, eBird, and Zooniverse provide vast coverage but can overrepresent accessible areas, whereas museum collections mainly reflect historical sampling patterns (Beck et al., 2014). Most SDM emphasize abiotic variables while neglecting biotic interactions (Araújo & Luoto, 2007) or microhabitat features (Dormann et al., 2018; Guisan et al., 2017), resulting in less reliable predictions in some situations. Advances in machine learning have expanded SDM accessibility, but these methods highlight correlations without demonstrating causality, unlike mechanistic approaches that require more data and expertise (Arif & MacNeil, 2022; Baker et al., 2017). To strengthen predictive capacity, this study applies two widely used algorithms: MaxEnt and Random Forest. MaxEnt is a presence-only method that contrasts environmental conditions at observed occurrence points with background locations, making it effective when only opportunistic or spatially biased presence data are available (Syfert et al., 2013). Random Forest is a presence-absence classifier that integrates pseudo-absence points into an ensemble of decision trees, capturing nonlinear relationships and remaining robust to noise and correlation among predictors 3 (Zhao et al., 2022). Their complementary strengths make them valuable for evaluating the reliability of participatory versus professional datasets in SDM. Professional data sources--such as museum collections, research sites, and public lands--provide decades of spatially and temporally rich information with high taxonomic accuracy, which is especially important for groups that are difficult to identify, such as invertebrates (Suarez & Tsutsui, 2004; Spear et al., 2017; Shultz et al., 2021; Konowalik & Nosol, 2021; Turley et al., 2024). However, they are often biased toward areas of scientific interest and remain only partially digitized (Lazagabaster et al., 2024; Fisher-Phelps et al., 2017; Barends et al., 2020). Participatory science platforms such as iNaturalist and CitSci.org broaden datasets through casual to structured observations, often with photo documentation and ID support (Pocock et al., 2014). They provide broad coverage and long-term monitoring (e.g., Bald Eagle Watch; Bove et al., 2024), while also engaging the public in biodiversity stewardship. Yet challenges remain, including observer bias, inconsistent sampling, and retention of participants (Kosmala et al., 2016; Johnston et al., 2023). Professional science offers taxonomic rigor and historical depth, while participatory science expands spatial and temporal coverage; integrating both with quality assurance can maximize their complementary strengths (Araújo et al., 2019; Rocchini et al., 2023). Given the growing use of participatory science in ecological research, it is essential to evaluate how different data sources influence SDM performance, particularly for Lepidoptera, which are sensitive to environmental change, well-studied, and widely observed. Few studies have directly compared SDM built from participatory versus professional sources for butterflies in the Western United States, leaving an 4 important gap. To address this, this study asks: Do SDM built with participatory data differ in accuracy from those built with professional data? Using Nymphalidae butterflies as a focal group, I compare models based on participatory and professionally collected records, applying both MaxEnt and Random Forest, to evaluate whether participatory data can achieve predictive accuracy and ecological reliability comparable to professional data. Methods Ecoregions and Occurrence Data Understanding ecological changes benefits from analyzing species within ecoregions rather than political boundaries such as counties or states (Omernik, 2004). This study focused on the Western Cordillera and South Central Semi-Arid Prairies Level II ecoregions in the United States, extracted from the U.S. EPA Ecoregion shapefile (using coordinate system WGS 84) employing ArcGIS Pro (Figure 1). I excluded the Canadian portion of the Western Cordillera because the available PRISM climate data (1961-1990) did not align with this study’s temporal scope (2008-2022). 5 Figure 1: Map of the study area showing the South Central Semi-Arid Prairies and Western Cordillera ecoregions, delineated using the U.S. EPA Level II Ecoregions shapefile. These two distinct ecological regions form the spatial extent for the SDM in this study. I downloaded occurrence data from GBIF (GBIF.org, 2023), querying Nymphalidae records within a polygon covering both ecoregions from 2008 through 2022. This query returned records from 70 datasets, however, datasets outside of the United States were removed. Using GBIF metadata, I classified datasets as either participatory initiatives (no formal training for collectors) or professional initiatives (involving some form of training) (Table 1). 6 I created three datasets: professional, participatory, and combined (Table 1), each subdivided into four fixed time bins. Using identical temporal bins allowed direct comparison between all datasets, since each model represented the same time frame. Table 1: Number of Nymphalidae occurrence records from professional and participatory datasets, as well as a separate combined dataset integrating both sources, summarized across four temporal bins (2008-2011, 2012-2015, 2016-2019, and 2020- 2022). Fixed year bins Professional Records Participatory Records Combined Records 2008-2011 1,226 1,343 2,569 2012-2015 3,961 5,852 9,813 2016-2019 2,505 46,547 49,052 2020-2022 2,810 79,145 81,955 Sum of all bins 10,502 132,887 143,389 Population Density I downloaded the 2020 human population density data for North America from ArcGIS Online. The Center for International Earth Science Information Network at Columbia University developed the source dataset, the Gridded Population of the World, Version 4 (GPWv4), in 2018, with a spatial resolution of 30 arc-seconds per 7 pixel. I then imported the population density layer into ArcGIS Pro and overlaid it with ecoregions to highlight areas of highest population density. Environmental Variables Land cover data provides critical information on habitat availability, quality, and connectivity for Nymphalidae butterflies (Tzortzakaki et al., 2019; Diengdoh et al., 2023). As a categorical variable, land cover represents vegetation type, urban areas, and water bodies, all of which influence butterfly population dynamics and distribution (Tong et al., 2025; Shrestha et al., 2024; Senapathi et al., 2015; Huang et al., 2024; Munisi et al., 2024). I obtained land cover data from the National Land Cover Dataset (NLCD) via the U.S. Geological Survey (USGS) through the Living Atlas Portal in ArcGIS Pro, and I processed them to restrict coverage to the study regions. The dataset spans a time series of land cover categories from 2001 to 2021. I included topographic variables (i.e., elevation, slope, and aspect) because they shape microclimatic conditions, habitat structure, and resource availability (Popović et al., 2021; Shrestha et al., 2024; Svancara et al., 2019; Rodríguez-Castañeda et al., 2019). Elevation strongly influences temperature, moisture, and sunlight exposure, which in turn affect butterfly survival and reproduction (Mahata et al., 2023). I obtained elevation data from USGS Earth Explorer, mosaicked the raster’s in ArcGIS Pro, and clipped them to the study ecoregions. From this layer, I derived slope and aspect using ArcGIS spatial analyst tools. I also included bioclimatic variables, which describe patterns of temperature and precipitation that directly influence butterfly physiology, life cycles, and host plant 8 availability (Hill et al., 2021; Wells & Tonkyn, 2018). These variables are essential for modeling species distribution under climate change scenarios. To generate them, I downloaded precipitation (PPT), maximum temperature (TMAX), and minimum temperature (TMIN) data (2008-2022) from the PRISM database at Oregon State University. After importing and clipping the datasets to the ecoregions in ArcGIS Pro, I used the R package bioclim (Booth et al., 2014) to calculate 19 bioclimatic variables (Table 2). 9 Table 2: Bioclimatic variables using PPT, TMIN and TMAX from PRISM. Variable Name Description BIO1 Annual Mean Temperature Mean of all monthly temperatures BIO2 Mean Diurnal Range Mean of monthly (max temp - min temp) BIO3 Isothermality (BIO2 / BIO7) × 100; day-night temp oscillation vs annual range BIO4 Temperature Seasonality Standard deviation of monthly temps ×100; measures temperature variability BIO5 Max Temperature of Warmest Month Highest average maximum temperature in the warmest month BIO6 Min Temperature of Coldest Month Lowest average minimum temperature in the coldest month BIO7 Temperature Annual Range BIO5 - BIO6; overall temperature range BIO8 Mean Temperature of Wettest Quarter Average temp of the 3-month period with highest precipitation BIO9 Mean Temperature of Driest Quarter Average temp of the 3-month period with lowest precipitation BIO10 Mean Temperature of Warmest Quarter Average temperature during the warmest quarter BIO11 Mean Temperature of Coldest Quarter Average temperature during the coldest quarter BIO12 Annual Precipitation Total yearly precipitation BIO13 Precipitation of Wettest Month Total precipitation in the wettest month BIO14 Precipitation of Driest Month Total precipitation in the driest month BIO15 Precipitation Seasonality Coefficient of variation (CV) of monthly precipitation BIO16 Precipitation of Wettest Quarter Total precipitation during the wettest quarter BIO17 Precipitation of Driest Quarter Total precipitation during the driest quarter BIO18 Precipitation of Warmest Quarter Total precipitation during the warmest quarter BIO19 Precipitation of Coldest Quarter Total precipitation during the coldest quarter 10 Fishnet Grid with Zonal Statistics I created a uniform fishnet grid using the WGS 84 coordinate system, with cells measuring 0.054° × 0.054° (≈6 km × 6 km at the equator), an appropriate resolution for regional-scale ecological modeling. To align the grid with ecologically relevant boundaries, I clipped it to an ecoregion shapefile covering the Western Cordillera and South Central Semi-Arid Prairies. I then exported the clipped grid as a shapefile to ensure consistent spatial referencing and integration with other datasets. For each grid cell, I extracted zonal statistics of all environmental variables using the exactextractr package in R (Baston, 2020). This method computed minimum, mean, and maximum values of raster-based covariates overlapping each cell, allowing precise aggregation of spatial data while preserving environmental variation across heterogeneous landscapes. Background points I generated background points for MaxEnt through simple random sampling across the entire study area using a predefined fishnet grid. This approach ensured an unbiased representation of the available environmental space and avoided spatial or environmental bias that can arise from non-random sampling or observer behavior. By distributing background points uniformly and independently of presence data, the model more accurately distinguished suitable habitat conditions from the broader landscape (Merow et al., 2013). This method established a neutral baseline for comparison, allowing the model to contrast presence locations against a consistent environmental background without overrepresenting heavily sampled or easily accessible regions. For 11 Random Forest, I generated pseudo-absence points from the fishnet grid at a ratio of five background points for every presence point. Spatial Autocorrelation Analysis I evaluated the spatial independence of model predictions and residual errors for both MaxEnt and Random Forest models using Moran’s I statistics, which quantify the degree to which residuals are spatially clustered or dispersed. In SDM, significant residual autocorrelation can signal model misspecification, sampling bias, or unaccounted environmental structures. I applied Moran’s I to residuals from each fixed year bin to assess whether participatory or professional collections produced spatially biased errors. I hypothesized that models developed from participatory collections would differ significantly from those based on professional collections, while the alternative hypothesis predicted no significant difference between data sources. Variable Selection and Multicollinearity Assessment I initially ran MaxEnt with all 24 environmental variables to assess their contributions, then applied variable selection to reduce multicollinearity and improve interpretability. For MaxEnt, I used Variance Inflation Factor (VIF) analysis and retained only predictors with VIF < 10; for Random Forest, I applied correlation-based filtering with the findCorrelation function (r > 0.85). From this process, I selected ten biologically relevant and statistically independent predictors, all with exceptionally low VIF values (< 1), ensuring independent contributions to the models. The final predictors included climatic variables (BIO3, BIO4, BIO7, BIO14, BIO15, BIO18, BIO19), topography (elevation, slope), and land cover. This approach produced ecologically robust, 12 interpretable models for both MaxEnt and Random Forest with minimal redundancy among predictors. SDM MaxEnt I implemented MaxEnt models using linear, quadratic, and hinge features (lqh) with a regularization multiplier (β) of 1.0, randomly sampling 10,000-20,000 background points depending on dataset size (combined datasets used 20,000). I excluded background points from fishnet grids missing required variables (e.g., bioclimatic, elevation, land cover). I ran separate models for each dataset (participatory, professional, combined) and fixed year bin (2008-2011, 2012-2015, 2016-2019, 2020- 2022) to evaluate temporal changes in performance, variable importance, and predicted distributions. I assessed model performance using 5-fold spatial block cross-validation with 6 × 6 km stratified blocks and repeated fold assignments over 100 iterations. Performance metrics included Area Under the Curve, True Skill Statistic and Akaike Information Criterion. Random Forest I implemented Random Forest (RF) models to predict Nymphalidae distributions and compare results with MaxEnt. For each dataset (participatory, professional, combined) and fixed year bin, I ran RF using the same predictors selected for MaxEnt, with 500 trees, default node size, and cross-validated mtry optimization. I capped sampling at 50,000 points and used a fixed seed to ensure reproducibility of pseudo- absence selection. I evaluated model performance with Area Under the Curve, True 13 Skill Statistic, and Cohen’s Kappa, and calculated Moran’s I on residuals using k- nearest neighbors (k = 10). I generated predictions, performance metrics, and variable rankings for each fixed year bin to allow direct comparisons with MaxEnt results. Model Performance Metrics I assessed the predictive performance and reliability of MaxEnt and Random Forest models using multiple evaluation metrics. Both models were evaluated with Area Under the Curve (AUC) and True Skill Statistic (TSS) to provide standardized measures for cross-model comparison. AUC quantifies a model’s ability to distinguish between presence and absence (or pseudo-absence), ranging from 0.5 (random) to 1.0 (perfect discrimination). TSS accounts for both sensitivity and specificity, ranging from -1 to 1, where 1 indicates flawless classification and values below 0 indicate no better than random performance. For each model, I included a third, model-specific metric to complement AUC and TSS: for MaxEnt, I calculated the corrected Akaike Information Criterion (AICc), which balances goodness-of-fit against model complexity and small sample sizes, with lower values indicating more parsimonious models and reduced overfitting risk. For Random Forest, I calculated Cohen’s Kappa, which measures agreement between predicted and observed presence/absence values beyond chance, ranging from -1 (complete disagreement) to 1 (perfect agreement). AICc does not apply to Random Forest because it is a non-parametric ensemble method, while Cohen’s Kappa could be applied to MaxEnt only by imposing an arbitrary threshold on continuous suitability scores, introducing subjectivity. In addition to Cohen’s Kappa, Random Forest performance was also evaluated using out-of-bag (OOB) error. This measure is calculated during model training by predicting data excluded from each 14 bootstrap sample, thereby providing an unbiased estimate of prediction error without requiring a separate test dataset, although such datasets were also employed in this study. AUC remains the preferred metric for presence-only models because it evaluates predictive performance without threshold dependence. Taken together, these metrics provide a robust evaluation of model performance and allow me to test the hypothesis of whether SDM based on participatory data differ in accuracy from those based on professional collection data. Results Trend Analysis for Occurrences Professional records declined modestly across time periods, decreasing by 17% from 2,423 records in 2008-2011 to 2,014 in 2012-2015, then by 11% to 1,782 records in 2016-2019, before slightly recovering with a 9% increase to 1,948 records in 2020- 2022. In contrast, participatory records expanded rapidly, rising by 229% from 2,215 (2008-2011) to 7,276 (2012-2015), then by 168% to 19,470 (2016-2019), and finally by 236% to 65,402 records in 2020-2022. The combined dataset mirrors the participatory trajectory after 2011, reaching 67,350 records in the most recent period. Participatory became the dominant driver of data availability for Nymphalidae, contributing more than 97% of records by 2020-2022. Spatial Autocorrelation Analysis for Occurrence Datasets Moran's I analysis revealed significant spatial clustering (p < 0.001) across all datasets and time periods, with distinct patterns between data sources (Table 3). Professional data exhibited weak to moderate spatial autocorrelation (Moran's I = 0.127- 0.319), while participatory data showed moderate to strong clustering (Moran's I = 15 0.362-0.655). Professional data maintained weak autocorrelation during 2008-2019, shifting to moderate clustering in 2020-2022 (Moran's I = 0.319). The combined dataset closely mirrored participatory patterns as its dominance increased, ranging from Moran's I = 0.298 in 2008-2011 to a maximum of 0.711 in 2016-2019. Despite large differences in sampling intensity (1,226-79,145 presences), background points remained relatively constant across datasets and periods, indicating stable geographic extent. Standard errors of Moran's I estimates were inversely related to sample size across the range of 1,226 to 79,145 observations. These results indicate that the spatial distribution of errors differs between data sources, with participatory data exhibiting systematically different spatial patterns compared to professional data. Table 3: Moran’s I results for participatory, professional, and combined datasets. Dataset Time period Moran's I Interpretation Participatory 2008-2011 0.3615 Moderate spatial clustering 2012-2015 0.6546 Strong spatial clustering 2016-2019 0.6507 Strong spatial clustering 2020-2022 0.6142 Strong spatial clustering Professional 2008-2011 0.1266 Weak spatial clustering 2012-2015 0.1902 Weak spatial clustering 2016-2019 0.1851 Weak spatial clustering 2020-2022 0.3188 Moderate spatial clustering Combined 2008-2011 0.2981 Moderate spatial clustering 2012-2015 0.6192 Strong spatial clustering 2016-2019 0.7106 Strong spatial clustering 2020-2022 0.6797 Strong spatial clustering SDM: MaxEnt MaxEnt models exhibited distinct performance trajectories across data sources. Participatory models achieved the highest overall performance levels and demonstrated 16 the most consistent temporal improvement, while professional models showed greater variability with notable performance fluctuations mid-series. Combined dataset models displayed intermediate characteristics, benefiting from increased sample sizes while maintaining stable performance metrics. All models achieved AUC values above 0.7, indicating acceptable to excellent discriminatory ability, with TSS values approaching or exceeding the 0.6 threshold for good model performance in later periods across most datasets. Participatory Dataset Performance These MaxEnt models showed progressively improved performance across time periods (Table 4). AUC values increased from 0.804 in 2008-2011 to 0.873 in 2020- 2022. TSS values improved from 0.494 to 0.625, with the final two periods meeting or exceeding the 0.6 threshold for good model performance. AICc values declined from 107.1 to 72.1, reflecting more parsimonious models. A temporary AICc increase during 2012-2015 was followed by substantial declines. Professional Dataset Performance These MaxEnt models showed variable performance across periods (Table 4). AUC values fluctuated from 0.719 in 2008-2011 to a peak of 0.863 in 2020-2022, with a dip to 0.752 in 2016-2019. TSS values followed a similar pattern, reaching 0.668 in the final period. AICc values peaked at 117.5 in 2016-2019. Combined Dataset Performance These MaxEnt models demonstrated progressively improving performance (Table 4). AUC values increased from 0.773 in 2008-2011 to a peak of 0.859 in 2016- 17 2019, then slightly declined to 0.851 in 2020-2022. TSS values consistently improved from 0.447 to 0.585, approaching the 0.6 threshold in later periods. AICc values decreased steadily from 106.4 to 73.4. MaxEnt Heatmaps MaxEnt species distribution predictions showed substantial temporal variation across datasets and periods (Figure 2). Participatory models exhibited progressive spatial expansion, with high-probability areas in 2008-2011 spreading broadly across the western regions by 2020-2022. In contrast, professional models displayed range contraction, shifting from extensive Pacific and Rocky Mountain distributions early on to spatially restricted Western Cordillera predictions later, specifically in Idaho. Combined dataset predictions were intermediate, gradually expanding over time with the most extensive suitable habitat in 2020-2022. Unlike Random Forest models, MaxEnt predictions reflected considerable temporal shifts in suitable habitat, with changing probability distributions over the 15-year study period. 18 19 Figure 2: MaxEnt species distribution predictions for Nymphalidae across four time periods (2008-2011, 2012-2015, 2016-2019, 2020-2022). Each row demonstrates results from participatory (left), professional (center), and combined datasets (right). Colors represent species distribution probabilities from 0.00 (dark blue, unsuitable) to 1.00 (red, highly suitable) across the Western Cordillera and South Central Semi-Arid Prairies ecoregions. SDM: Random Forest All Random Forest models demonstrated strong performance with Moran's I values remaining low across datasets and time periods (-0.003 to 0.055), indicating minimal spatial autocorrelation in residuals and effective capture of spatial patterns. Professional models exhibited slightly higher Moran's I values (0.021-0.037) than participatory models, though all remained well below the threshold of concern (>0.1). Participatory Dataset Performance These Random Forest models performed strongly and improved over time (Table 4). AUC values rose from 0.811 in 2008-2011 to a peak of 0.953 in 2016-2019, declining slightly to 0.946 in 2020-2022. TSS values increased from 0.527 to 0.804-0.812 in later periods. Cohen's Kappa rose from 0.593 to 0.799-0.813. OOB error rates remained below 10%, with the lowest in 2012-2015 (6.0%). Professional Dataset Performance These Random Forest models consistently produced the highest model performance across all periods (Table 4). AUC values rose from 0.895 to a peak of 0.984 in 2020-2022. TSS values improved from 0.745 to 0.944, reaching exceptional performance above 0.8 in the latter three periods. Cohen's Kappa increased from 0.808 20 to 0.943. OOB error rates declined from 4.7% to 1.7%, confirming superior performance. Combined Dataset Performance These Random Forest models displayed strong performance with distinct temporal patterns (Table 4). AUC values increased from 0.885 to 0.953 in 2016-2019, with slight decline to 0.948 in 2020-2022. TSS values rose to 0.855 in 2012-2015, then declined but remained above the 0.6 threshold in all periods. Cohen's Kappa peaked at 0.839 in 2012-2015. OOB error rates showed a bimodal pattern across time periods, with the lowest error rate in 2012-2015 (4.8%) and higher rates in 2008-2011 (7.2%) and the two later periods (9.4% and 9.7%). Moran's I declined over time from modest early values to near zero in recent years, reflecting improved spatial structure capture. Random Forest Heatmaps Random Forest species distribution predictions showed remarkable temporal stability across all datasets and time periods (Figure 3). Participatory models maintained consistent spatial patterns throughout all periods. Professional models exhibited similar stability, displaying nearly identical spatial distributions across time. Combined dataset predictions mirrored this pattern of temporal consistency, showing stable species distributions with minimal variation between periods. Unlike MaxEnt models, Random Forest predictions demonstrated little temporal expansion or contraction of suitable habitat, maintaining consistent geographic boundaries and probability distributions across the 15-year study period regardless of increasing sample sizes or changing data composition. 21 22 Figure 3: Random Forest species distribution predictions for Nymphalidae across four time periods (2008-2011, 2012-2015, 2016-2019, 2020-2022). Each row demonstrates results from participatory (left), professional (center), and combined datasets (right). Colors represent probability percentiles from lowest suitability (dark blue) to highest suitability (red, top 1%) across the Western Cordillera and South Central Semi-Arid Prairies ecoregions. Table 4: Comparative performance metrics for MaxEnt and Random Forest (RF) SDM evaluating Nymphalidae species distribution across four time periods (2008-2011, 2012- 2015, 2016-2019, 2020-2022) using participatory, professional, and combined datasets. Metrics include AUC, TSS, and AICc for MaxEnt, and AUC, TSS, Kappa, Moran’s I, and OOB Error for Random Forest. Dataset Period MaxEnt AUC MaxEnt TSS MaxEnt AICc RF AUC RF TSS RF Kappa RF Moran's I RF OOB Error Participatory 2008- 2011 0.804 0.494 107.1 0.8115 0.5278 0.5928 0.054537 0.08835 2012- 2015 0.859 0.594 111.3 0.9292 0.8047 0.8 0.032446 0.06042 2016- 2019 0.867 0.6 92.3 0.9535 0.8129 0.8128 -0.00383 0.09504 2020- 2022 0.873 0.625 72.1 0.9463 0.7778 0.7806 0.006225 0.0978 Professional 2008- 2011 0.719 0.4 107 0.8959 0.7456 0.8081 0.021986 0.04737 2012- 2015 0.759 0.463 110.3 0.969 0.8767 0.8839 0.028512 0.02938 2016- 2019 0.752 0.421 117.5 0.9569 0.8844 0.8934 0.033361 0.03068 2020- 2022 0.863 0.668 111.6 0.9849 0.9444 0.9439 0.037674 0.01753 Combined 2008- 2011 0.773 0.447 106.4 0.8853 0.735 0.7619 0.035022 0.07227 2012- 2015 0.814 0.517 101.5 0.9482 0.8555 0.8393 0.043175 0.04818 2016- 2019 0.859 0.579 84.2 0.9532 0.8108 0.8108 0.008574 0.09441 2020- 2022 0.851 0.585 73.4 0.948 0.7846 0.7868 0.007289 0.09702 23 Discussion This study demonstrates that participatory and professional data contribute differently to SDM for Nymphalidae, and that combining them can provide complementary benefits. Participatory data increased nearly 80-fold from 1,343 records in 2008-2011 to over 79,000 in 2020-2022, reflecting heightened public engagement. This growth largely reflects the scale and distributed participation of citizen science programs, which often generate more occurrence records than professional surveys (Goldstein et al., 2024; Theobald et al., 2015; Di Cecco et al., 2021), as seen in eBird (Sullivan et al., 2014). Beyond size, participatory science enhances monitoring capacity and public engagement (Cooper et al., 2014; Kays et al., 2020; Bonney, 2021; Callaghan et al., 2021). Professional datasets remained stable (1,226-3,961 records), potentially constrained by funding and staffing, and by 2020-2022 participatory records comprised 96.5% of the combined dataset, highlighting spatial bias and the need for structured participatory approaches (Johnston et al., 2023). Spatial analyses revealed that participatory and combined datasets were moderately to strongly clustered (Moran's I = 0.36-0.65), reflecting possible bias toward accessible areas, whereas professional data were more evenly distributed (Moran's I = 0.12-0.31). Strong spatial autocorrelation can violate independence assumptions and inflate apparent model performance (Santos-Fernandez & Mengersen, 2021; Aiello- Lammens et al., 2015; Steen et al., 2021; Boria et al., 2013), and participatory data tend to cluster in urban or accessible regions (Leong & Trautwein, 2019). MaxEnt models mirrored these differences. Participatory models improved over time (AUC: 0.804→0.873; TSS: 0.494→0.625) but became diffuse, reflecting 24 overgeneralization from spatial bias and rapid sample growth. Professional models produced restricted, ecologically targeted predictions with high performance (AUC = 0.863, TSS = 0.668). Combined datasets yielded geographically stable predictions capturing both fine-scale habitat associations and broader coverage (AUC = 0.773- 0.859, TSS = 0.447-0.585), though heatmaps still showed overgeneralization from participatory bias. These results indicate that participatory and professional datasets complement rather than replicate each other, with participatory data influenced by accessibility rather than systematic design (Mandeville et al., 2022). Random Forest outperformed MaxEnt. Professional Random Forest models achieved the highest performance (AUC = 0.984, TSS = 0.944, Cohen's Kappa = 0.944), underscoring the value of structured, high-quality data. Expertise remains important: trained observers often detect more species per survey than volunteers, particularly those difficult to identify, due to skills untrained observers may lack (Farr et al., 2023). However, validated participatory data can produce Random Forest models comparable to professional data (Matutini et al., 2021). Participatory Random Forest models improved over time, peaking in 2016-2019 (AUC = 0.953) before minor declines from clustered, noisy observations. Combined datasets leveraged complementary strengths, providing broader coverage while mitigating some spatial bias, though Random Forest’s correlation-based approach may reduce generalizability with heterogeneous sources. Visual inspection of the heatmaps highlight that MaxEnt predictions from participatory data expanded temporally, likely reflecting sampling artifacts, while professional models remained restricted and high confidence. Combined models 25 produced intermediate patterns that integrated both sampling approaches, though they maintained the excessive generalizability that obscured fine-scale habitat distinctions. Random Forest predictions showed temporal stability in core habitat identification (e.g., Pacific coast, Rockies, Texas Prairies), indicating reduced susceptibility to sampling bias and better spatial specificity than the overgeneralized MaxEnt distributions. In this study, high-probability areas in participatory models often coincide with dense human populations (Figure 4), indicating observation density may reflect accessibility rather than species distribution. Debates about participatory data quality persist. Aceves-Bueno et al. (2017) argued that fewer than two-thirds of participatory datasets meet minimum accuracy standards, though critiques note substantial variation across projects and taxa (Specht & Lewandowski, 2018). Integrating participatory and professional data provides broader ecological coverage than either source alone, as highlighted in comparative analyses of insect datasets (Díaz-Calafat et al., 2024). Overall, SDM differs by data source. Professional data yield high-quality, ecologically realistic models even with limited samples, while participatory data expand coverage, improving performance but potentially introducing clustered and spatial bias (Pocock et al., 2014; Pocock et al., 2023). Integrating datasets with bias-correction techniques (e.g., spatial thinning or block cross-validation) merges broad coverage with high-confidence observations. Future research should refine integration strategies, address spatial autocorrelation, and evaluate ecological realism using independent validation and spatially explicit frameworks. Model choice should align with data: MaxEnt suits presence-only or smaller datasets, capturing broad patterns, whereas 26 Random Forest handles larger, heterogeneous datasets, producing stable predictions robust to spatial bias. Figure 4: Human population density map for 2020 across the Western Cordillera and South Central Semi-Arid Prairies ecoregions. Colors indicate population density from low (white, 0-1 people/km²) to high (dark red, 1,000-30,000 people/km²). 27 REFERENCES Aceves-Bueno, E., Adeleye, A. S., Feraud, M., Huang, Y., Tao, M., Yang, Y., & Anderson, S. E. (2017). The accuracy of citizen science data: a quantitative review. Bulletin of the Ecological Society of America, 98(4), 278-290. Aiello‐Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541-545. Allouche, O., Tsoar, A., & Kadmon, R. (2006). Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43(6), 1223-1232. Araújo, M. B., & Luoto, M. (2007). The importance of biotic interactions for modelling species distributions under climate change. Global Ecology and Biogeography, 16(6), 743-753. Araújo, M. B., Anderson, R. P., Márcia Barbosa, A., Beale, C. M., Dormann, C. F., Early, R., ... & Rahbek, C. (2019). Standards for distribution models in biodiversity assessments. Science Advances, 5(1), eaat4858. Arenas-Castro, S., Goncalves, J., Alves, P., Alcaraz-Segura, D., & Honrado, J. P. (2018). Assessing the multi-scale predictive ability of ecosystem functional attributes for species distribution modelling. PLoS ONE, 13(6), e0199292. Arif, S., & MacNeil, M. A. (2022). Predictive models aren't for causal inference. Ecology Letters, 25(8), 1741-1745. Baker, B., Gupta, O., Raskar, R., & Naik, N. (2017). Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823. Baker, E., Drury, J. P., Judge, J., Roy, D. B., Smith, G. C., & Stephens, P. A. (2021). The verification of ecological citizen science data: current approaches and future possibilities. Citizen Science: Theory and Practice, 6(1). Barends, J. M., Pietersen, D. W., Zambatis, G., Tye, D. R., & Maritz, B. (2020). Sampling bias in reptile occurrence data for the Kruger National Park. Koedoe: African Protected Area Conservation and Science, 62(1), 1-9. Baston, D. (2020). exactextractr: Fast Extraction from Raster Datasets using Polygons. R package version 0.5.0. https://CRAN.R-project.org/package=exactextractr Beck, J., Böller, M., Erhardt, A., & Schwanghart, W. (2014). Spatial bias in the GBIF database and its effect on modeling species' geographic distributions. Ecological Informatics, 19, 10-15. Booth, T. H., Nix, H. A., Busby, J. R., & Hutchinson, M. F. (2014). BIOCLIM: the first species distribution modelling package, its early applications and relevance to most current MAXENT studies. Diversity and Distributions, 20(1), 1-9. Bonney, R. (2021). Expanding the impact of citizen science. BioScience, 71(5), 448-451. https://cran.r-project.org/package=exactextractr 28 Boria, R. A., Olson, L. E., Goodman, S. M., & Anderson, R. P. (2014). Spatial filtering to reduce sampling bias can improve the performance of ecological niche models. Ecological Modelling, 275, 73-77. Bove, D. J., Anderson, H. A., Smith, M. A., & Kuhn, T. A. (2024). Nesting bald eagle population numbers, density, territorial resources, and relationship to human development in northern Colorado's Front Range. Western Birds, 55(1). Brown, E. D., & Williams, B. K. (2019). The potential for citizen science to produce reliable and useful information in ecology. Conservation Biology, 33(3), 561-569. Callaghan, C. T., Poore, A. G., Mesaglio, T., Moles, A. T., Nakagawa, S., Roberts, C., ... & Cornwell, W. K. (2021). Three frontiers for the future of biodiversity research using citizen science data. BioScience, 71(1), 55-63. Castañeda, S., Botello, F., Sánchez-Cordero, V., & Sarkar, S. (2019). Spatio-temporal distribution of monarch butterflies along their migratory route. Frontiers in Ecology and Evolution, 7, 400. Chandler, M., See, L., Copas, K., Bonde, A. M., López, B. C., Danielsen, F., ... & Turak, E. (2017). Contribution of citizen science towards international biodiversity monitoring. Biological Conservation, 213, 280-294. Cooper, C. B., Shirk, J., & Zuckerberg, B. (2014). The invisible prevalence of citizen science in global research: migratory birds and climate change. PLoS ONE, 9(9), e106508. Couvet, D., & Prevot, A. C. (2015). Citizen-science programs: Towards transformative biodiversity governance. Environmental Development, 13, 39-45. Dasgupta, S., Blankespoor, B., & Wheeler, D. (2024). Toward Better Conservation: A Spatial Analysis of Species Occurrence Data from the Global Biodiversity Information Facility. Earth System Science Data Discussions, 2024, 1-27. Díaz-Calafat, J., Jaume-Ramis, S., Soacha, K., Álvarez, A., & Piera, J. (2024). Revealing biases in insect observations: A comparative analysis between academic and citizen science data. PLoS ONE, 19(7), e0305757. Di Cecco, G. J., Barve, V., Belitz, M. W., Stucky, B. J., Guralnick, R. P., & Hurlbert, A. H. (2021). Observing the observers: How participants contribute data to iNaturalist and implications for biodiversity science. BioScience, 71(11), 1179-1188. Dickinson, J. L., Shirk, J., Bonter, D., Bonney, R., Crain, R. L., Martin, J., ... & Purcell, K. (2012). The current state of citizen science as a tool for ecological research and public engagement. Frontiers in Ecology and the Environment, 10(6), 291-297. Diengdoh, V. L., Ondei, S., Amin, R. J., Hunt, M., & Brook, B. W. (2023). Landscape functional connectivity for butterflies under different scenarios of land-use, land-cover, and climate change in Australia. Biological Conservation, 277, 109825. Dormann, C. F., Bobrowski, M., Dehling, D. M., Harris, D. J., Hartig, F., Lischke, H., ... & Kraan, C. (2018). Biotic interactions in species distribution modelling: 10 questions to guide 29 interpretation and avoid false conclusions. Global Ecology and Biogeography, 27(9), 1004-1016. Elith, J., & Leathwick, J. R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics, 40(1), 677-697. Farr, C. M., Ngo, F., & Olsen, B. (2023). Evaluating data quality and changes in species identification in a citizen science bird monitoring project. Citizen Science: Theory and Practice, 8(1). Feldman, M. J., Imbeau, L., Marchand, P., Mazerolle, M. J., Darveau, M., & Fenton, N. J. (2021). Trends and gaps in the use of citizen science derived data as input for species distribution models: A quantitative review. PLoS ONE, 16(3), e0234587. Ferro, M. L., & Flick, A. J. (2015). "Collection Bias" and the importance of natural history collections in species habitat modeling: A case study using Thoracophorus costalis Erichson (Coleoptera: Staphylinidae: Osoriinae), with a critique of GBIF.org. The Coleopterists Bulletin, 69(3), 415-425. Fisher-Phelps, M., Cao, G., Wilson, R. M., & Kingston, T. (2017). Protecting bias: across time and ecology, open-source bat locality data are heavily biased by distance to protected area. Ecological Informatics, 40, 22-34. Franklin, J. (2013). Species distribution models in conservation biogeography: developments and challenges. Diversity and Distributions, 19(10), 1217-1223. Franklin, J. (2023). Species distribution modelling supports the study of past, present and future biogeographies. Journal of Biogeography, 50(9), 1533-1545. Fritz, S., See, L., & Grey, F. (2022). The grand challenges facing environmental citizen science. Frontiers in Environmental Science, 10, 1019628. Galván, S., Barrientos, R., & Varela, S. (2022). No bird database is perfect: citizen science and professional datasets contain different and complementary biodiversity information. Ardeola, 69(1), 97-114. GBIF.org (11 December 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.nrgwxj Geldmann, J., Heilmann‐Clausen, J., Holm, T. E., Levinsky, I., Markussen, B. O., Olsen, K., ... & Tøttrup, A. P. (2016). What determines spatial bias in citizen science? Exploring four recording schemes with different proficiency requirements. Diversity and Distributions, 22(11), 1139-1149. Goldstein, B. R., Stoudt, S., Lewthwaite, J. M., Shirey, V., Mendoza, E., & Guzman, L. M. (2024). Logistical and preference bias in participatory science butterfly data. Frontiers in Ecology and the Environment, 22(8), e2783. Guisan, A., & Thuiller, W. (2005). Predicting species distribution: offering more than simple habitat models. Ecology Letters, 8(9), 993-1009. https://doi.org/10.15468/dl.nrgwxj 30 Guisan, A., Thuiller, W., & Zimmermann, N. E. (2017). Habitat suitability and distribution models: with applications in R. Cambridge University Press. Hansen, S. E., Monfils, M. J., Hackett, R. A., Goebel, R. T., & Monfils, A. K. (2024). Data‐ centric species distribution modeling: Impacts of modeler decisions in a case study of invasive European frog‐bit. Applications in Plant Sciences, 12(3), e11573. Hill, G. M., Kawahara, A. Y., Daniels, J. C., Bateman, C. C., & Scheffers, B. R. (2021). Climate change effects on animal ecology: butterflies and moths as a case study. Biological Reviews, 96(5), 2113-2126. Huang, S., Lin, Y., Dong, J., Lin, Y., Su, Z., Li, J., ... & Fu, W. (2024). Relationship between Plant Habitat Types and Butterfly Diversity in Urban Mountain Parks. Forests, 15(8), 1390. Jetz, W., McGeoch, M. A., Guralnick, R., Ferrier, S., Beck, J., Costello, M. J., ... & Turak, E. (2019). Essential biodiversity variables for mapping and monitoring species populations. Nature Ecology & Evolution, 3(4), 539-551. Johnston, A., Matechou, E., & Dennis, E. B. (2023). Outstanding challenges and future directions for biodiversity monitoring using citizen science data. Methods in Ecology and Evolution, 14(1), 103-116. Kays, R., Lasky, M., Parsons, A. W., Pease, B., & Pacifici, K. (2021). Evaluation of the spatial biases and sample size of a statewide citizen science project. Citizen Science: Theory and Practice, 6(1). Konowalik, K., & Nosol, A. (2021). Evaluation metrics and validation of presence-only species distribution models based on distributional maps with varying coverage. Scientific Reports, 11(1), 1482. Kosmala, M., Wiggins, A., Swanson, A., & Simmons, B. (2016). Assessing data quality in citizen science. Frontiers in Ecology and the Environment, 14(10), 551-560. Lazagabaster, I. A., Thomas, C. D., Spedding, J. V., Ikram, S., Solano‐Regadera, I., Snape, S., & Bro‐Jørgensen, J. (2024). Evaluating species distribution model predictions through time against paleozoological records. Ecology and Evolution, 14(10), e70288. Leong, M., & Trautwein, M. (2019). A citizen science approach to evaluating US cities for biotic homogenization. PeerJ, 7, e6879. Loiselle, B. A., Howell, C. A., Graham, C. H., Goerck, J. M., Brooks, T., Smith, K. G., & Williams, P. H. (2003). Avoiding pitfalls of using species distribution models in conservation planning. Conservation Biology, 17(6), 1591-1600. Mahata, A., Panda, R. M., Dash, P., Naik, A., Naik, A. K., & Palita, S. K. (2023). Microclimate and vegetation structure significantly affect butterfly assemblages in a tropical dry forest. Climate, 11(11), 220. Mandeville, C. P., Nilsen, E. B., & Finstad, A. G. (2022). Spatial distribution of biodiversity citizen science in a natural area depends on area accessibility and differs from other recreational area use. Ecological Solutions and Evidence, 3(4), e12185. 31 Matutini, F., Baudry, J., Pain, G., Sineau, M., & Pithon, J. (2021). How citizen science could improve species distribution models and their independent assessment. Ecology and Evolution, 11(7), 3028-3039. Merow, C., Smith, M. J., & Silander Jr, J. A. (2013). A practical guide to MaxEnt for modeling species' distributions: what it does, and why inputs and settings matter. Ecography, 36(10), 1058-1069. Munisi, E. J., Masenga, E. H., Nkwabi, A. K., Kiwango, H. R., & Mjingo, E. E. (2024). Butterfly Abundance and Diversity in Different Habitat Types in the Usangu Area, Ruaha National Park. Psyche: A Journal of Entomology, 2024(1), 8833655. Omernik, J. M. (2004). Perspectives on the nature and definition of ecological regions. Environmental Management, 34(Suppl 1), S27-S38. Paseka, R. E., White, L. A., Van de Waal, D. B., Strauss, A. T., González, A. L., Everett, R. A., ... & Borer, E. T. (2020). Disease-mediated ecosystem services: pathogens, plants, and people. Trends in Ecology & Evolution, 35(8), 731-743. Petersen, T. K., Speed, J. D., Grøtan, V., & Austrheim, G. (2021). Species data for understanding biodiversity dynamics: The what, where and when of species occurrence data collection. Ecological Solutions and Evidence, 2(1), e12048. Pinto-Ledezma, J. N., & Cavender-Bares, J. (2021). Predicting species distributions and community composition using satellite remote sensing predictors. Scientific Reports, 11(1), 16448. Pocock, M. J., Chapman, D. S., Sheppard, L. J., & Roy, H. E. (2014). Choosing and using citizen science: A guide to when and how to use citizen science to monitor biodiversity and the environment. NERC/Centre for Ecology & Hydrology. https://nora.nerc.ac.uk/id/eprint/510644/ Pocock, M. J., Hamlin, I., Christelow, J., Passmore, H. A., & Richardson, M. (2023). The benefits of citizen science and nature‐noticing activities for well‐being, nature connectedness and pro‐nature conservation behaviours. People and Nature, 5(2), 591- 606. Popović, M., Micevski, B., & Verovnik, R. (2021). Effects of elevation gradient and aspect on butterfly diversity on Galičica Mountain in the Republic of Macedonia (south-eastern Europe). Animal Biodiversity and Conservation, 44(1), 67-78. Rocchini, D., Tordoni, E., Marchetto, E., Marcantonio, M., Barbosa, A. M., Bazzichetto, M., ... & Malavasi, M. (2023). A quixotic view of spatial bias in modelling the distribution of species and their diversity. npj Biodiversity, 2(1), 10. Rodríguez-Castañeda, G., Hof, A. R., Jansson, R., & Harding, L. E. (2012). Predicting the fate of biodiversity using species' distribution models: Enhancing model comparability and repeatability. PLoS ONE, 7(9), e44402. https://doi.org/10.1371/journal.pone.0044402 https://nora.nerc.ac.uk/id/eprint/510644/ https://doi.org/10.1371/journal.pone.0044402 32 Santos‐Fernandez, E., & Mengersen, K. (2021). Understanding the reliability of citizen science observational data using item response models. Methods in Ecology and Evolution, 12(8), 1533-1548. Senapathi, D., Carvalheiro, L. G., Biesmeijer, J. C., Dodson, C. A., Evans, R. L., McKerchar, M., ... & Potts, S. G. (2015). The impact of over 80 years of land cover changes on bee and wasp pollinator communities in England. Proceedings of the Royal Society B: Biological Sciences, 282(1806), 20150294. Shrestha, B. R., Baral, S., Budha‐Magar, S., Thapa Magar, K., Gaudel, P., Suwal, S. P., ... & Shrestha, P. (2024). Vegetation productivity determines the response of butterflies along elevation gradients in the trans‐Himalayas, Nepal. Ecosphere, 15(10), e70019. Shirey, V., Larsen, E., Doherty, A., Kim, C. A., Al-Sulaiman, F. T., Hinolan, J. D., ... & Ries, L. (2022). LepTraits 1.0 A globally comprehensive dataset of butterfly traits. Scientific Data, 9(1), 382. Shultz, A. J., Adams, B. J., Bell, K. C., Ludt, W. B., Pauly, G. B., & Vendetti, J. E. (2021). Natural history collections are critical resources for contemporary and future studies of urban evolution. Evolutionary Applications, 14(1), 233-247. Spear, D. M., Pauly, G. B., & Kaiser, K. (2017). Citizen science as a tool for augmenting museum collection data from urban areas. Frontiers in Ecology and Evolution, 5, 86. Specht, H., & Lewandowski, E. (2018). Biased assumptions and oversimplifications in evaluations of citizen science data quality. Bulletin of the Ecological Society of America, 99(2), 251-256. Steen, V. A., Tingley, M. W., Paton, P. W., & Elphick, C. S. (2021). Spatial thinning and class balancing: Key choices lead to variation in the performance of species distribution models with citizen science data. Methods in Ecology and Evolution, 12(2), 216-226. Suarez, A. V., & Tsutsui, N. D. (2004). The value of museum collections for research and society. BioScience, 54(1), 66-74. Sullivan, B. L., Aycrigg, J. L., Barry, J. H., Bonney, R. E., Bruns, N., Cooper, C. B., ... & Kelling, S. (2014). The eBird enterprise: An integrated approach to development and application of citizen science. Biological Conservation, 169, 31-40. Svancara, L. K., Abatzoglou, J. T., & Waterbury, B. (2019). Modeling current and future potential distributions of milkweeds and the monarch butterfly in Idaho. Frontiers in Ecology and Evolution, 7, 168. Syfert, M. M., Smith, M. J., & Coomes, D. A. (2013). The effects of sampling bias and model complexity on the predictive performance of MaxEnt species distribution models. PLoS ONE, 8(2), e55158. Theobald, E. J., Ettinger, A. K., Burgess, H. K., DeBey, L. B., Schmidt, N. R., Froehlich, H. E., ... & Parrish, J. K. (2015). Global change and local solutions: Tapping the unrealized potential of citizen science for biodiversity research. Biological Conservation, 181, 236- 244. 33 Thomas, C. D., Cameron, A., Green, R. E., Bakkenes, M., Beaumont, L. J., Collingham, Y. C., & Williams, S. E. (2004). Extinction risk from climate change. Nature, 427(6970), 145- 148. Thuiller, W., Pollock, L. J., Gueguen, M., & Münkemüller, T. (2015). From species distributions to meta‐communities. Ecology Letters, 18(12), 1321-1328. Tong, X. Y., Dong, R., & Zhu, X. X. (2025). Global high categorical resolution land cover mapping via weak supervision. ISPRS Journal of Photogrammetry and Remote Sensing, 220, 535-549. Turley, N. E., Kania, S. E., Petitta, I. R., Otruba, E. A., Biddinger, D. J., Butzler, T. M., ... & López-Uribe, M. M. (2024). Bee monitoring by community scientists: comparing a collections-based program with iNaturalist. Annals of the Entomological Society of America, 117(4), 220-233. Tzortzakaki, O., Kati, V., Panitsa, M., Tzanatos, E., & Giokas, S. (2019). Butterfly diversity along the urbanization gradient in a densely-built Mediterranean city: Land cover is more decisive than resources in structuring communities. Landscape and Urban Planning, 183, 79-87. Wagner, D. L. (2020). Insect declines in the Anthropocene. Annual Review of Entomology, 65(1), 457-480. WallisDeVries, M. F., Baxter, W., & Van Vliet, A. J. (2011). Beyond climate envelopes: effects of weather on regional population trends in butterflies. Oecologia, 167(2), 559-571. Wells, C. N., & Tonkyn, D. (2018). Changes in the geographic distribution of the Diana fritillary (Speyeria diana: Nymphalidae) under forecasted predictions of climate change. Insects, 9(3), 94. Zhang, W., Bussmann, R. W., Li, J., Liu, B., Xue, T., Yang, X., ... & Yu, S. (2022). Biodiversity hotspots and conservation efficiency of a large drainage basin: Distribution patterns of species richness and conservation gaps analysis in the Yangtze River Basin, China. Conservation Science and Practice, 4(4), e12653. Zhao, Z., Xiao, N., Shen, M., & Li, J. (2022). Comparison between optimized MaxEnt and Random Forest modeling in predicting potential distribution: A case study with Quasipaa boulengeri in China. Science of the Total Environment, 842, 156867. 34 CHAPTER 2: PATTERNS IN PAINTED LADIES: RANDOM FOREST INSIGHTS INTO DISTRIBUTION AND ENVIRONMENTAL DRIVERS Introduction Vanessa cardui, commonly known as the Painted Lady, belongs to the Nymphalidae family and is one of the world’s most widespread butterflies, found on every continent except Antarctica, while mainly absent from Australia and South America (Talavera & Vila, 2017). As a generalist pollinator, Vanessa cardui contributes to biodiversity by visiting a wide variety of flowers, particularly species in the Asteraceae family (e.g., rabbitbrush, purple coneflower, dandelion, and Canada thistle), along with species of Fabaceae and Scrophulariaceae (Scott, 1992). The caterpillars and adults serve as food for numerous animals and host parasites and parasitoids, while larval host plants, such as thistles, hold additional ecological significance (Stefanescu, 2023; Lampert et al., 2014; Muchoney et al., 2022; Hernandez & Bowers, 2024). The species’ mobility, ecological breadth, and availability of occurrence data make it an ideal focal taxon for modeling species distribution across diverse ecoregions in the United States. Importantly, Vanessa cardui also functions as a sensitive indicator of ecosystem change, reflecting the effects of habitat fragmentation, drought, agricultural intensification, and climate-driven range shifts (Menchetti et al., 2019; Stefanescu et al., 2007; Peterson et al., 2019; Hu et al., 2021; Mesler & Mabry, 2024; Granato et al., 2024). Like many insects (Sánchez-Bayo & Wyckhuys, 2019), Vanessa cardui is vulnerable to intensifying effects of anthropogenic change. Urban development and 35 agricultural intensification eliminate essential habitat patches, concentrating populations into fewer locations where disease transmission and demographic stress increase (Chowdhury et al., 2021; Di Mauro, 2004). Climate change exacerbates these pressures by altering temperature and precipitation cycles, with extremes impairing butterfly development and movement (Tüzün et al., 2018; Bristow et al., 2024). Winter periods (December-March) already impose demographic constraints that climate shifts may further intensify (Menchetti et al., 2019). Meanwhile, herbicide-driven thistle declines threaten larval food plants and long-term stability (Chowdhury et al., 2021). Studying a mobile pollinator like Vanessa cardui is therefore crucial for understanding landscape connectivity, ecosystem resilience, and how multiple stressors interact across prairie, alpine, and subalpine ecosystems. Despite its ecological importance, significant gaps remain in understanding Vanessa cardui habitat distribution patterns within the United States. While migration behaviors and general distributions are documented (Abbott, 1951; Stefanescu et al., 2007; Talavera & Vila, 2017; Stefanescu et al., 2017), no studies to our current knowledge have applied machine learning approaches such as Random Forest to predict Vanessa cardui species distribution across anywhere in Western North America. In contrast, the monarch butterfly (Danaus plexippus) has been the focus of numerous modeling efforts that leverage advanced statistical methods (Jepsen et al., 2015; Dingle et al., 2005; Belsky & Joshi, 2018; Svancara et al., 2019; Reppert & de Roode, 2018; Brower et al., 2006; Bartel et al., 2011). This imbalance represents a critical knowledge gap, limiting our ability to assess climate change impacts on a globally important migratory species. 36 Building on the established effectiveness of Random Forest modeling for species distribution, this study asks: How can Random Forests best predict Vanessa cardui species distribution across two diverse landscapes? To answer this, we modeled species distribution in two ecologically distinct regions--the Western Cordillera and South Central Semi-Arid Prairies--while identifying key environmental drivers. Methods Ecoregions and Occurrence Data I prepared ecoregion data following the same procedures described in Chapter 1 (see Methods, Chapter 1). I restricted occurrence records to professionally verified Vanessa cardui observations spanning 2008-2022 (GBIF.org, 2023) and excluded all non-focal species. To assess temporal change, I divided records into two periods: Early (2008-2014) and Late (2016-2022), excluding intermediate years to maximize temporal contrast. Environmental Variables I assembled environmental covariates that largely mirrored those described in Chapter 1, including land cover, bioclimatic variables, elevation, slope, and aspect. To capture additional ecological drivers, I incorporated soil and anthropogenic variables. I obtained soil characteristics from the Soil Survey Geographic Database (SSURGO) via the United States Department of Agriculture Geospatial Data Gateway, derived road density from the 2014 U.S. Census TIGER dataset (accessed through ArcGIS Pro's Living Atlas) and obtained impervious surface data from the National Land Cover Database (NLCD) 2019 Impervious Product. I imported all layers into ArcGIS Pro and 37 clipped them to the Western Cordillera and South Central Semi-Arid Prairies ecoregions using the "Extract by Mask" tool. Background Points and Data Preparation I generated background points using stratified random sampling with a 5:1 background-to-presence ratio to balance model efficiency and reliability. To reduce temporal sampling bias, I assigned random years within the study range to background points. I screened environmental predictors for multicollinearity and removed variables with Pearson's r > 0.85 using the caret package (Kuhn, 2008). I retained only complete cases for modeling. I stratified occurrence and background data by time period to preserve temporal representation during training and evaluation. Random Forest Modeling I implemented Random Forest modeling in R using the randomForest package (Liaw, 2002). I trained models with 500 decision trees, setting the number of variables considered at each split (mtry) to the square root of the total number of predictors. I applied bootstrap sampling and used out-of-bag (OOB) samples for internal validation. I also partitioned data into training (80%) and testing (20%) subsets using stratified sampling to balance presence and background points across time periods. Model Evaluation I assessed model performance using both OOB estimates and independent test sets. I calculated accuracy, sensitivity, specificity, precision, F1-score, the Area Under the Curve (AUC), and the True Skill Statistic (TSS) and Cohen’s Kappa. I quantified 38 variable importance using mean decrease in accuracy and mean decrease in Gini impurity and ranked predictors accordingly. Species Distribution I generated predicted species distribution across the study area on a continuous scale from 0 to 1 and categorized values into five classes: Very Low (0.0-0.2), Low (0.2- 0.4), Moderate (0.4-0.6), High (0.6-0.8), and Very High (0.8-1.0). I calculated distributional change between Early (2008-2014) and Late (2016-2022) periods as absolute and proportional differences in predicted abundance per grid cell. To control for sampling bias, I extracted residuals from linear regression models and used them as bias-corrected measures of distributional change. I then calculated correlations between environmental predictors and both raw and residual change metrics to identify potential ecological drivers. Visualization and Output I visualized results using heatmaps of model performance, predictor correlations, and variable importance. I applied hexagonal aggregation to highlight spatial patterns in predicted suitability and reduce ambiguity (Birch et al., 2007). I conducted analyses in R using the randomForest, caret, pROC, dplyr, and ggplot2 packages (Liaw, 2002; Kuhn, 2008; Robin et al., 2011; Wickham et al., 2025; Wickham, 2016). Results Residual diagnostics revealed systematic patterns in model performance (Figures 5 and 6). The distribution of residuals was approximately normal and centered near zero (Figure 5), indicating that the model captured central tendencies effectively for 39 most predictions. However, in the residuals versus fitted values plot (Figure 6), systematic deviations were evident at the extremes, with underestimation at low suitability values and overestimation at high values. The few extreme residual values visible in the distribution (Figure 5) correspond to these systematic deviations, indicating limitations in predicting rare distributional outcomes. Figure 5: Distribution of population change residuals for Vanessa cardui. Histogram shows the frequency distribution of unexplained population changes (residuals) from the SDM. The red dashed line indicates zero change, with most residuals clustered near zero and few extreme values at the distribution tails. 40 . Figure 6: Residuals vs. fitted values plot for Vanessa cardui population change model. Scatter plot showing model residuals (with abundance bias removed) against fitted values, with locally weighted smoothing line (blue) and 95% confidence interval (gray shading). The red dashed line indicates zero residuals. Removing abundance-related bias through residual extraction revealed altered environmental relationships with Vanessa cardui population change compared to raw abundance data (Figure 7). Several bioclimatic variables emerged as positively associated with residuals, with temperature seasonality (BIO4_max, r = 0.17) showing the strongest positive correlation, followed by diurnal temperature range (BIO7_min, r = 0.14) and mean diurnal range (BIO2_min, r = 0.09). Minimum temperature of the coldest month (BIO6_min) shifted from near-zero correlation (r = 0.09) in raw data to a negative 41 correlation (r = -0.24) in residuals, while annual mean temperature (BIO1_min and BIO1_max) also showed stronger negative correlations (r = -0.19 and r = -0.18, respectively). Conversely, maximum temperature of the warmest month (BIO5_max) decreased in correlation strength from r = -0.26 to r = -0.16, and isothermality (BIO3_max) weakened from r = -0.24 to r = -0.15. Figure 7: Environmental correlations with Vanessa cardui population change, comparing raw abundance data versus residuals with abundance bias removed. Correlation coefficients are shown for the top 15 environmental variables, with green indicating positive correlations and red indicating negative correlations. The residual 42 analysis (right column) reveals environmental relationships after controlling for abundance-related biases. RF variable importance analysis identified key environmental predictors driving species distribution patterns (Figure 8). Precipitation seasonality (BIO15_max) and maximum temperature of the warmest month (BIO5_max) emerged as the most important variables based on mean decrease in accuracy, followed by minimum mean diurnal range (BIO2_min). Road density variables (both minimum and maximum) also ranked highly, indicating the importance of anthropogenic landscape features. The standardized importance heatmap revealed that Gini-based importance measures generally aligned with accuracy-based measures, though with some notable differences in ranking, particularly for slope and soil characteristics which showed higher Gini importance relative to their accuracy contributions. 43 44 Figure 8: Variable importance heatmap for Vanessa cardui species distribution model showing standardized importance metrics. Variables are ranked by mean decrease in accuracy (left column) and mean decrease in Gini (right column). Red colors indicate high importance while blue indicates lower relative importance. The top variables include precipitation seasonality (BIO15_max), temperature variables (BIO5_max, BIO2_min), and anthropogenic features (road density). Model evaluation metrics indicated strong predictive performance. The Random Forest model achieved an AUC of 0.968, a True Skill Statistic of 0.895, a Kappa of 0.924, an out-of-bag error of 1.27%, and a test accuracy of 97.93%. Table 6: Evaluation metrics for the Random Forest model of Vanessa cardui species distribution, including AUC, TSS, Cohen’s Kappa, out-of-bag error, and test accuracy. Metric Value AUC 0.968 TSS 0.895 Kappa 0.924 OOB Error (%) 1.27 Test Accuracy (%) 97.93 Between early (2008-2014) and late (2016-2022) periods, a map of species distribution revealed persistence of Vanessa cardui occurrence across the Western Cordillera and South Central Semi-Arid Prairies (Figure 9). Much of the study area shows low suitability (0.0-0.2), with suitable habitat representing a small proportion of 45 the total landscape. Additionally, some suitable areas overlap densely populated regions (such as California's Central Valley, Denver metropolitan area, and the Texas Triangle which encompasses cities like Dallas and Houston), while others occur in less developed regions (e.g., Yellowstone National Park). Figure 9: Species distribution model predictions for Vanessa cardui across western North America. The rescaled species distribution heatmap demonstrates predicted occurrence probability from Random Forest modeling, with values compressed (0-0.2) and expanded (0.2-1.0) to enhance visualization of moderate to high suitability areas. Red areas indicate highest species distribution (0.8-1.0), while blue areas represent lowest species distribution (0.0-0.2). 46 Discussion The Random Forest models effectively captured major distributional dynamics of Vanessa cardui across the Western Cordillera and South Central Semi-Arid Prairies, with high accuracy and discriminatory power underscoring the strength of the framework. However, the resulting systematic residual patterns highlight challenges in predicting rare outcomes. To better interpret these systematic errors and disentangle true ecological signals from site and data specific influences, we examined residual correlations in relation to population changes. The correlation analysis of population change residuals provides compatible insights into these patterns. The weaker residual correlations suggest that raw abundance values partly reflect site-level density differences, inflating apparent climate- population links. This agrees with recent work showing that abundance indices often misrepresent true population size (Numminen et al., 2023). Residuals help correct this bias, isolating subtler and more ecologically meaningful climate relationships. The residual-based correlations align with ecological expectations for Vanessa cardui. A positive response to temperature seasonality and diurnal variation is typical for migratory butterflies, as marked temporal variability generates windows of suitable conditions such as nectar resources and host plant availability (Hu et al., 2021; Saldivar et al., 2022; Stefanescu et al., 2017). Negative associations with warmer baselines suggest that consistently warm conditions may constrain growth, likely via phenological mismatches, reduced diapause cues, or physiological costs (Huang et al., 2024; von Schmalensee et al., 2024; Granato et al., 2024). Residuals also indicate that maximum heat is less limiting than abundance patterns suggested, while thermal stability plays a 47 minor role for this thermally flexible, migratory species. Overall, Vanessa cardui appears to perform better in regions where temperature fluctuates substantially around cooler baselines, favoring environments with pronounced climatic variability over consistently warm or thermally stable areas such as tropical lowlands (Mesler & Mabry, 2024; Saldivar et al., 2022; Stefanescu et al., 2017). These patterns suggest that migratory life history strategies are closely tied to environments with high seasonality, relationships that become clearer once abundance-related biases are removed. Mapping these ecological tendencies geographically, the combined heatmap (2008–2014 and 2016–2022) revealed distinct spatial patterns: some suitable areas overlap densely populated regions (e.g., California’s Central Valley, Denver, Dallas, Houston, Oklahoma City), while many others occur in less developed landscapes (e.g., Yellowstone National Park and the Sangre de Cristo Mountains). These hotspots in remote mountainous and prairie regions suggest monitoring captured ecologically representative habitats, but the concentration of records near urban centers indicates that spatial sampling bias likely remains a limitation affecting predictions in less surveyed regions. Variable importance analyses show that both climatic factors (i.e., precipitation, temperature) and anthropogenic variables (i.e., impervious surface coverage, road density) shaped suitability, along with soil characteristics. The apparent positive association with impervious surfaces likely reflects how human-modified landscapes create the microclimatic and resource heterogeneity that aligns with this species' preference for variable environments, through ornamental plantings, green spaces, and edge habitats (Schueller et al., 2023; Kc, 2025; Nason, 2021) not a direct benefit of impervious surfaces themselves. 48 These results emphasize the importance of maintaining connectivity across the broader landscape to support migratory movements, with persistent high suitability areas providing clear conservation priorities. Areas of declining suitability may benefit from restoration or management to mitigate anthropogenic pressures, particularly where climate remains suitable, but connectivity may have been compromised. Important limitations must also be acknowledged. The analysis does not directly model abundance or density, and occurrence records reflect variation in reporting effort and detection probability in addition to potential spatial sampling bias. Some of the systematic residual patterns indicate reduced reliability in predicting rare outcomes, which is particularly relevant for conservation focused on range margins or refugia. Overall, this study demonstrates how methodological rigor can sharpen ecological understanding and improve conservation outcomes. By employing residual analysis to correct for abundance-related biases, we uncovered climate-population relationships that raw occurrence data obscured: specifically, that Vanessa cardui thrives in environments with high thermal variability rather than consistently warm or stable conditions. This insight, made visible through diagnostic approaches, has direct conservation implications. It suggests that maintaining landscape connectivity across regions with pronounced seasonality should be prioritized over focusing solely on warm, thermally stable areas that superficially appear suitable based on uncorrected abundance patterns. Furthermore, the identification of persistent high-suitability corridors in both remote mountainous terrain and human-modified landscapes underscores the need for conservation strategies that span the rural-urban gradient. As climate change potentially reduces thermal variability in some regions, while increasing 49 it in others, this framework provides a replicable approach for identifying refugia and migration corridors for other wide-ranging species. Ultimately, these findings illustrate that rigorous analytical frameworks--particularly those that account for observational biases--are essential for translating species distribution models into actionable conservation priorities that reflect true ecological requirements rather than artifacts of sampling design. 50 REFERENCES Abbott, C. H. (1951). A quantitative study of the migration of the painted lady butterfly, Vanessa cardui L. Ecology, 32(2), 155-171. Ahlström, A., Raupach, M. R., Schurgers, G., Smith, B., Arneth, A., Jung, M., ... & Zeng, N. (2015). The dominant role of semi-arid ecosystems in the trend and variability of the land CO2 sink. Science, 348(6237), 895-899. Bauer, D. M., & Wing, I. S. (2010). Economic consequences of pollinator declines: a synthesis. Agricultural and Resource Economics Review, 39(3), 368-383. Bartel, R. A., Oberhauser, K. S., De Roode, J. C., & Altizer, S. M. (2011). Monarch butterfly migration and parasite transmission in eastern North America. Ecology, 92(2), 342-351. Belsky, J., & Joshi, N. K. (2018). Assessing role of major drivers in recent decline of monarch butterfly population in North America. Frontiers in Environmental Science, 6, 86. Birch, C. P., Oom, S. P., & Beecham, J. A. (2007). Rectangular and hexagonal grids used for observation, experiment and simulation in ecology. Ecological Modelling, 206(3-4), 347-359. Bonebrake, T. C., Ponisio, L. C., Boggs, C. L., & Ehrlich, P. R. (2010). More than just indicators: a review of tropical butterfly ecology and conservation. Biological Conservation, 143(8), 1831-1841. Boyes, D. H., Evans, D. M., Fox, R., Parsons, M. S., & Pocock, M. J. (2021). Street lighting has detrimental impacts on local insect populations. Science Advances, 7(35), eabi8322. Boyle, J. H., Dalgleish, H. J., & Puzey, J. R. (2019). Monarch butterfly and milkweed declines substantially predate the use of genetically modified crops. Proceedings of the National Academy of Sciences, 116(8), 3006-3011. Bradford, J. B., Schlaepfer, D. R., Lauenroth, W. K., & Palmquist, K. A. (2020). Robust ecological drought projections for drylands in the 21st century. Global Change Biology, 26(7), 3906-3919. Bristow, L. V., Grundel, R., Dzurisin, J. D., Wu, G. C., Li, Y., Hildreth, A., & Hellmann, J. J. (2024). Warming experiments test the temperature sensitivity of an endangered butterfly across life history stages. Journal of Insect Conservation, 28(1), 1-13. Brower, L. P., Brower, J. V. Z., & Cranston, F. P. (1968). Courtship behavior of the queen butterfly, Danaus gilippus berenice (Cramer). Zoologica, 53(1), 1-39. Brower, L. P., Fink, L. S., & Walford, P. (2006). Fueling the fall migration of the monarch butterfly. Integrative and Comparative Biology, 46(6), 1123-1142. 51 Brower, L. P., Taylor, O. R., Williams, E. H., Slayback, D. A., Zubieta, R. R., & Ramirez, M. I. (2012). Decline of monarch butterflies overwintering in Mexico: is the migratory phenomenon at risk?. Insect Conservation and Diversity, 5(2), 95-100. CaraDonna, P. J., Iler, A. M., & Inouye, D. W. (2014). Shifts in flowering phenology reshape a subalpine plant community. Proceedings of the National Academy of Sciences, 111(13), 4916-4921. Cowie, R. H., Bouchet, P., & Fontaine, B. (2022). The Sixth Mass Extinction: fact, fiction or speculation?. Biological Reviews, 97(2), 640-663. Crewe, T. L., & McCracken, J. D. (2015). Long-term trends in the number of monarch butterflies (Lepidoptera: Nymphalidae) counted on fall migration at Long Point, Ontario, Canada (1995-2014). Annals of the Entomological Society of America, 108(5), 707-717. Crossley, M. S., Smith, O. M., Berry, L. L., Phillips‐Cosio, R., Glassberg, J., Holman, K. M., ... & Snyder, W. E. (2021). Recent climate change is creating hotspots of butterfly increase and decline across North America. Global Change Biology, 27(12), 2702-2714. Dalton, R. M., Underwood, N. C., Inouye, D. W., Soulé, M. E., & Inouye, B. D. (2023). Long‐term declines in insect abundance and biomass in a subalpine habitat. Ecosphere, 14(8), e4620. Dell, D. E. N. N. I. S., Sparks, T. H., & Dennis, R. L. (2005). Climate change and the effect of increasing spring temperatures on emergence dates of the butterfly Apatura iris (Lepidoptera: Nymphalidae). European Journal of Entomology, 102(2), 161-167. DeVries, P. J. (1987). The butterflies of Costa Rica and their natural history: Papilionidae, Pieridae, Nymphalidae. Dingle, H., Zalucki, M. P., Rochester, W. A., & Armijo-Prewitt, T. (2005). Distribution of the monarch butterfly, Danaus plexippus (L.)(Lepidoptera: Nymphalidae), in western North America. Biological Journal of the Linnean Society, 85(4), 491- 500. Ehrlich, P. R., & Raven, P. H. (1964). Butterflies and plants: A study in coevolution. Evolution, 18(4), 586-608. https://doi.org/10.2307/2406212 Feinsinger, P. (1983). Coevolution and pollination. Fiedler, A. K., Landis, D. A., & Arduser, M. (2012). Rapid shift in pollinator communities following invasive species removal. Restoration Ecology, 20(5), 593-602. Forister, M. L., McCall, A. C., Sanders, N. J., Fordyce, J. A., Thorne, J. H., O'Brien, J., ... & Shapiro, A. M. (2010). Compounded effects of climate change and habitat alteration shift patterns of butterfly diversity. Proceedings of the National Academy of Sciences, 107(5), 2088-2092. https://doi.org/10.1073/pnas.0909686107 https://doi.org/10.2307/2406212 https://doi.org/10.1073/pnas.0909686107 52 Fox, R., Brereton, T. M., Asher, J., August, T. A., Botham, M. S., Bourn, N. A. D., Cruickshanks, K. L., Bulman, C. R., Ellis, S., Harrower, C. A., Middlebrook, I., Noble, D. G., Powney, G. D., Randle, Z., Warren, M. S., & Roy, D. B. (2015). The State of the UK's Butterflies 2015. Butterfly Conservation and the Centre for Ecology & Hydrology. Fox, R., Dennis, E. B., Purdy, K. M., Middlebrook, I., Roy, D. B., Noble, D. G., Botham, M. S., & Bourn, N. A. D. (2023). The State of the UK's Butterflies 2022. Butterfly Conservation. Flockhart, D. T., Brower, L. P., Ramirez, M. I., Hobson, K. A., Wassenaar, L. I., Altizer, S., & Norris, D. R. (2017). Regional climate on the breeding grounds predicts variation in the natal origin of monarch butterflies overwintering in Mexico over 38 years. Global Change Biology, 23(7), 2565-2576. Furlong, M. J., & Zalucki, M. P. (2017). Climate change and biological control: the consequences of increasing temperatures on host-parasitoid interactions. Current Opinion in Insect Science, 20, 39-44. GBIF.org (11 December 2023) GBIF Occurrence Download https://doi.org/10.15468/dl.nrgwxj Gibbons, S. M., Lekberg, Y., Mummey, D. L., Sangwan, N., Ramsey, P. W., & Gilbert, J. A. (2017). Invasive plants rapidly reshape soil properties in a grassland ecosystem. mSystems, 2(2), 10-1128. Granato, C., Campera, M., & Bulbert, M. (2024). Sensitivity of Vanessa cardui to temperature variations: A cost-effective experiment for environmental education. Insects, 15(4), 221. Hald-Mortensen, C. (2023). The Main Drivers of Biodiversity Loss: A Brief Overview. Journal of Ecology and Natural Resources, 7(3), 000346. Hernandez, K., & Bowers, M. D. (2024). Food and time: dietary plasticity of different sources of a generalist insect herbivore. Journal of Insect Science, 24(2), 15. Hu, G., Stefanescu, C., Oliver, T. H., Roy, D. B., Brereton, T., Van Swaay, C., ... & Chapman, J. W. (2021). Environmental drivers of annual population fluctuations in a trans-Saharan insect migrant. Proceedings of the National Academy of Sciences, 118(26), e2102762118. Inamine, H., Ellner, S. P., Springer, J. P., & Agrawal, A. A. (2016). Linking the continental migratory cycle of the monarch butterfly to understand its population decline. Oikos, 125(8), 1081-1091. Inouye, D. W. (2022). Climate change and phenology. Wiley Interdisciplinary Reviews: Climate Change, 13(3), e764. https://doi.org/10.15468/dl.nrgwxj 53 Jepsen, S., Berglundh, T., Genco, R., Aass, A. M., Demirel, K., Derks, J., ... & Zitzmann, N. U. (2015). Primary prevention of peri‐implantitis: Managing peri‐implant mucositis. Journal of Clinical Periodontology, 42, S152-S157. Júnior, G. D. B. F., & Diniz, I. R. (2015). Temporal dynamics of fruit-feeding butterflies (Lepidoptera: Nymphalidae) in two habitats in a seasonal Brazilian environment. Florida Entomologist, 98(4), 1207-1216. Kc, S. (2025). An Untapped and Undocumented Butterfly Diversity in a Rapidly Urbanizing and Fragmenting Forest Habitat in Pokhara, Nepal: First Checklist and Implications for Conservation and Ecotourism. Ecology and Evolution, 15(8), e71937. Kogan, F. (2023). The IPCC Reports on Global Warming and Land Changes. In Remote Sensing Land Surface Changes: The 1981-2020 Intensive Global Warming (pp. 67-79). Cham: Springer International Publishing. Kuhn, M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28, 1-26. Kulmatiski, A., & Beard, K. H. (2011). Long-term plant growth legacies overwhelm short- term plant growth effects on soil microbial community structure. Soil Biology and Biochemistry, 43(4), 823-830. Lampert, T., Müters, S., Stolzenberg, H., Kroll, L. E., & KiGGS Study Group. (2014). Messung des sozioökonomischen Status in der KiGGS-Studie. Bundesgesundheitsblatt-Gesundheitsforschung-Gesundheitsschutz, 57(7), 762- 770. Liang, H., He, Y. D., Theodorou, P., & Yang, C. F. (2023). The effects of urbanization on pollinators and pollination: A meta‐analysis. Ecology Letters, 26(9), 1629-1642. Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22. Malcolm, S. B. (2018). Anthropogenic impacts on mortality and population viability of the monarch butterfly. Annual Review of Entomology, 63, 277-302. Markevich, A., Jones, S. R., Markevich, C. G., & Seastedt, T. R. (2023). Cheatgrass, Mammals, Birds, Butterflies, and Wildfire: A Study of Ecosystem Interactions. McKinney, M. L. (2006). Urbanization as a major cause of biotic homogenization. Biological Conservation, 127(3), 247-260. Menchetti, M., Guéguen, M., & Talavera, G. (2019). Spatio-temporal ecological niche modelling of multigenerational insect migrations. Proceedings of the Royal Society B, 286(1910), 20191583. Mesler, S. P., & Mabry, K. E. (2024). Effects of temperature experienced across life stages on morphology and flight behavior of painted lady butterflies (Vanessa cardui). Movement Ecology, 12(1), 76. 54 Middleton, A. D., Kauffman, M. J., McWhirter, D. E., Cook, J. G., Cook, R. C., Nelson, A. A., ... & Klaver, R. W. (2013). Animal migration amid shifting patterns of phenology and predation: lessons from a Yellowstone elk herd. Ecology, 94(6), 1245-1256. Mirzabaev, A., Stringer, L. C., Benjaminsen, T. A., Gonzalez, P., Harris, R., Jafari, M., Stevens, N., Tirado, C. M., & Zakieldeen, S. (2022). Cross-Chapter Paper 3: Deserts, semiarid areas and desertification. In H.-O. Pörtner, D. C. Roberts, M. Tignor, E. S. Poloczanska, K. Mintenbeck, A. Alegría, M. Craig, S. Langsdorf, S. Löschke, V. Möller, A. Okem