Program | Workshop on Applied Statistics in Agriculture & Natural Resources

Monday, May 11, 2026	Pre-Workshop on Causal Inference in R, Lucy D'Agostino McGowan, Associate Professor of Statistics at Wake Forest University
8:00 am - 5:00 pm	Registration Open and Poster & Sponsor Display Setup
8:00 - 9:00 am	Morning Refreshments
Workshop Schedule
9:00 am	Begin Workshop
10:30 - 10:45 am	Short break
11:45 - 1:30 pm	Luncheon (Workshop Attendees)
3:00 - 3:15 pm	Short break
4:30 pm	Workshop End (with the opportunity for folks to stay until 5 with questions)

Tuesday, May 12, 2026		Day 1
8:00 - 8:30 am		Morning Refreshments (Registration Open and Poster & Sponsor Display Setup)
Welcome
8:30 - 8:45 am	8:30 - 8:40 am	Opening Remarks Derek McLean, Dean/Director Agricultural Research Division, University of Nebraska-Lincoln, USA
8:30 - 8:45 am	8:40-8:45 am	Welcome Reka Howard, Associate Professor in Statistics, University of Nebraska-Lincoln, USA
Session 1: Causal Inference
8:45-10:10 am Moderator: Guilherme Rosa	8:45-9:45 am	Mind the Gap: Causal Inference is Not Just a Statistics Problem, Lucy D'Agostino McGowan, Associate Professor, Department of Statistical Sciences, Wake Forest University
8:45-10:10 am Moderator: Guilherme Rosa	9:45-10:10 am	Smooth Optimization for Forbidden-Edge Construction in Causal Structural Learning Promate Nakkirt, Department of Statistics, Iowa State University, Ames, IA, USA
10:10-10:40 am		Morning Beverages
Session 2: AI in Agriculture
10:40 - 11:35 am Moderator: Nora Bello	10:40-11:15 am	Multi-modal machine learning for the early detection of subclinical ketosis in dairy cattle Rafael Ferreira, University of Wisconsin-Madison, USA
	11:15 - 11:40 am	FruitPhenoNet and Hyperprobe Insight: A Deep Learning Toolbox for Automated Fruit Detection, Tracking, and Temporal Phenotyping from Time-Series Hyperspectral Imagery Sruti Das Choudhury, School of Natural Resources, University of Nebraska - Lincoln, USA
	11:40 am - 12:05 pm	Detection of Pig Health Status Using Longitudinal Feeding Behavior Data Maksuda Aktar Toma, Department of Statistics, University of Nebraska–Lincoln, Lincoln, USA
12:05 - 1:30 pm		Group Luncheon
Session 3: Statistical Advancements in Agriculture
1:30 - 2:45 pm Moderator: Hans-Peter Piepho	1:30 - 1:55 pm	Mapping Genetic Effects in Dynamic Populations Brian Rice, Department of Agronomy and Horticulture, University of Nebraska-Lincoln, USA
	1:55 - 2:20 pm	Mixed-Model Approaches for Genotype-by-Environment Analysis: Enviromics-Based Multi-trait and Multidimensional Reaction Norm Modeling Guilherme Rosa, University of Wisconsin, Madison, USA
	2:20 - 2:45 pm	A Service-Based Joint Model Used for Distributed Learning: Application for Smart Agriculture Dixon Vimalajeewa, Department of Statistics, University of Nebraska-Lincoln, USA
2:45 - 3:10 pm		Afternoon Beverages
Session 4: Experimental Designs
3:10 - 5:00 pm Moderator: Nora Bello	3:10-3:45 pm	The Design (and Analysis) of Long-Term Agricultural Experiments Andrew Mead, Head of Statistics and Data Science, Rothamsted Research, UK
	3:45 - 4:10 pm	Comprehensive multi-nutrient response surface analysis of global polyhalite trials indicates context-dependent yield benefits for corn and wheat Hans-Peter Piepho, Biostatistics Unit, Institute of Crop Science, University of Hohenheim
	4:10 - 4:35 pm	Power Approximations with Non-Normal Data in Generalized Linear Mixed Models in R using Steep Priors on Variance Components Carlie Prinster, Utah State University, Logan, USA
4:35 - 6:00 pm		Break
6:00 - 8:00 pm		Dinner + Networking Social (Great Plains Room C, Nebraska East Union)

Wednesday, May 13, 2026		Day 2
8:00 - 8:30 am		Morning Refreshments (Registration Open and Poster & Sponsor Display Setup)
Session 5: Spatial Modeling
8:30 - 10:05 am Moderator: Hans-Peter Piepho	8:30 - 9:05 am	A new model selection approach based on local economically optimal input rate Tarro Mieno, Associate Professor in Agricultural Economics, University of Nebraska-Lincoln
	9:05 - 9:30 am	Spatial Variation and Environmental Factors Associated with Detection of Ring-Necked Pheasant Broods During August Roadside Surveys Philip Dixon, Iowa State University, Ames, USA
	9:30 - 9:55 am	An EM Method for Regularized Spatial Temporal Linear Model Selection for Lattice with Missing Values Francis Jo, Kansas State University, Manhattan, USA
	9:55 - 10:20 am	Long‑Term Crop Trials: A Practical Approach to Improving Trend Analysis Jixiang Wu, USDA-ARS, Genetics and Sustainable Agriculture Research Unit
	10:20 - 10:45 am	How Much Location Information Do We Need in Distance Sampling? Abraham Arbelaez, Department of Statistics, Kansas State University, Manhattan, USA
10:45 - 11:15 am		Break
Session 6: Statistical Consulting
11:15 am - 12:15 pm Moderator: Réka Howard	11:15 - 11:50 am	Democratizing Statistical Consulting via Large Language Models: Progress, Pitfalls, and Practical Constraints in an Academic Setting, Bruce A. Craig, Department of Statistics, Purdue University, with Tadd Colver and Arman Sabbaghi
11:15 am - 12:15 pm Moderator: Réka Howard	11:50 am - 12:15 pm	Concordance in Hibernal Phenology of Dirca occidentalis (Western Leatherwood) Rebekah Scott, Department of Statistics and William Graves, Department of Horticulture, Iowa State University, Ames, USA
12:15 - 1:45 pm		Group Luncheon
Session 7: Statistical Consulting Panel
1:45 - 3:10 pm Moderator: Josefina Lacasa	1:45 - 2:10 pm	Statistical Consulting at UNL Reka Howard, Department of Statistics, University of Nebraska Lincoln, USA
1:45 - 3:10 pm Moderator: Josefina Lacasa	2:10 - 3:10 pm	Panel Discussion
3:10 - 3:40 pm		Break
Session 8: Poster Lightning Talks
3:40 - 5:00 pm Moderator: Guilherme Rosa	3:40 - 3:55 pm	Poster Lightning Talks
3:40 - 5:00 pm Moderator: Guilherme Rosa	3:55 - 5:00 pm	Poster Session

Thursday, May 14, 2026		Day 3
8:00 - 8:30 am		Morning Refreshments
Session 9: NCCC-170
8:30 - 10:00 am Moderator: Philip Dixon		These Are the Voyages of NCCC-170: Past, present, and future Bruce Craig, Walter Stroup, Julia Piaskowski, Josefina Lacasa
8:30 - 10:00 am Moderator: Philip Dixon		West by Midwest: Package Maintenance, Transition and Handover Julia Piaskowski, Director of Statistical Programs, University of Idaho
10:00 - 10:30 am		Break
Session 10
10:30 - 11:50 am Moderator: Philip Dixon	10:30 - 11:30 am	You can't do that with Statistics! Chris Bilder, Department of Statistics, University of Nebraska - Lincoln
	11:30 - 11:50 am	Awards and Closing Remarks
11:50 - 1:00 pm		Lunch

Monday, May 11, 2026	Pre-Workshop on Causal Inference in R
8:00 am - 5:00 pm	Registration Open and Poster & Sponsor Display Setup
8:00 - 9:00 am	Morning Refreshments
Mind the Gap: Causal Inference is Not Just a Statistics Problem, Lucy D'Agostino McGowan, Associate Professor of Statistics at Wake Forest University Abstract: In this talk we will discuss some of the major challenges in causal inference, and why statistical tools alone cannot uncover the data-generating mechanism when attempting to answer causal questions. We will showcase the Causal Quartet, which consists of four datasets that have the same statistical properties, but different true causal effects due to different ways in which the data were generated. These examples illustrate the limitations of relying solely on statistical tools in data analyses and highlight the crucial role of domain-specific knowledge.
Workshop Schedule
9:00 am	Begin Workshop
10:30 - 10:45 am	Short break
11:45 - 1:30 pm	Luncheon (Workshop Attendees)
3:00 - 3:15 pm	Short break
4:30 pm	Workshop End (with the opportunity for folks to stay until 5 with questions)


Tuesday, May 12, 2026		Day 1
8:00 - 8:30 am		Morning Refreshments (Registration Open and Poster & Sponsor Display Setup)
Welcome
8:30 - 8:45 am	8:30 - 8:40 am	Opening Remarks Derek McLean, Dean/Director Agricultural Research Division, University of Nebraska-Lincoln, USA
8:30 - 8:45 am	8:40-8:45 am	Welcome Reka Howard, Associate Professor in Statistics, University of Nebraska-Lincoln, USA
Session 1: Causal Inference
8:45-10:10 am Moderator: Guilherme Rosa	8:45-9:45 am	Mind the Gap: Causal Inference is Not Just a Statistics Problem, Lucy D'Agostino McGowan, Associate Professor, Department of Statistical Sciences, Wake Forest University Abstract: In this talk we will discuss some of the major challenges in causal inference, and why statistical tools alone cannot uncover the data-generating mechanism when attempting to answer causal questions. We will showcase the Causal Quartet, which consists of four datasets that have the same statistical properties, but different true causal effects due to different ways in which the data were generated. These examples illustrate the limitations of relying solely on statistical tools in data analyses and highlight the crucial role of domain-specific knowledge.
8:45-10:10 am Moderator: Guilherme Rosa	9:45-10:10 am	Smooth Optimization for Forbidden-Edge Construction in Causal Structural Learning, Promate Nakkirt, Department of Statistics, Iowa State University, Ames, IA, USA Abstract: Directed Acyclic Graph (DAG) causal structural learning seeks to uncover the true causal relationships among observed variables by representing them as directed graphs without cycles. Many existing methods, however, cannot easily incorporate domain knowledge, particularly forbidden-edge constraints, which specify that certain edges must not exist. Ignoring these constraints can introduce spurious associations, as some variables may be determined externally and cannot be influenced by others; any incoming edges to such variables are invalid and must be excluded. Recent algorithms have attempted to integrate prior knowledge by adding constraints via continuous optimization. While this approach can improve the detection of forbidden edges, its structural constraint function contains non-smooth components, limiting the numerical stability and efficiency of gradient-based continuous optimization. To overcome these challenges, we propose a structural learning algorithm that directly incorporates gradient information from forbidden-edge constraints. This reformulation preserves differentiability, enhances numerical stability, and strictly enforces domain-specific exclusions. Simulation studies on both synthetic and real-world data demonstrate that our method outperforms existing benchmarks by accurately eliminating forbidden edges and recovering the true causal structure.
10:10-10:40 am		Morning Beverages
Session 2: AI in Agriculture
10:40 - 11:35 am Moderator: Nora Bello	10:40-11:15 am	Multi-modal machine learning for the early detection of subclinical ketosis in dairy cattle, Rafael Ferreira, University of Wisconsin-Madison, USA Abstract: Subclinical ketosis (SCK) is a prevalent metabolic disorder in transition dairy cows causing significant economic and welfare impacts, yet early detection remains challenging due to absent clinical signs and disease complexity. Two complementary projects developed multi-modal machine learning systems for prepartum prediction of postpartum SCK risk. The first integrated depth image-derived body shape features (extracted via CNN, anatomical keypoint sampling, and CNN-RNN), wearable sensor behavioral data, cow history, and text embeddings from farm management notes using generic and fine-tuned LLMs; Random Forest models combining image and tabular data achieved F₁ = 0.706, with fine-tuned LLM text embeddings further improving sensor-based models (F₁ 0.681 vs. 0.655). The second project added genomic data (78,964 SNPs reduced to 128 dimensions via UMAP) and introduced a cloud-based modular pipeline for automated image processing and data integration; among three fusion strategies tested, late fusion achieved the highest performance (F₁ up to 0.750) for binary SCK classification. Together, these projects show that integrating phenomic and genomic data through modern machine learning and fusion techniques substantially improves early SCK detection, supporting preventive health management in dairy farms.
	11:15 - 11:40 am	FruitPhenoNet and Hyperprobe Insight: A Deep Learning Toolbox for Automated Fruit Detection, Tracking, and Temporal Phenotyping from Time-Series Hyperspectral Imagery, Sruti Das Choudhury, School of Natural Resources, University of Nebraska - Lincoln, USA Abstract: A phenotype represents the observable expression of a genome for specific traits within a given environment. When phenotypes are analyzed as a function of time using time-series imagery, they are referred to as temporal phenotypes, which provide critical insights into plant vigor and development. This research introduces a novel framework, FruitPhenoNet, designed for automated fruit detection and temporal phenotyping using time-series visible light and hyperspectral imagery. The proposed method employs a multimodal image analysis approach to address the challenge of detecting fruits that share similar color characteristics with surrounding foliage (e.g., peppers), thereby enabling accurate computation of temporal phenotypes. We first demonstrate that the reflectance spectra of fruits exhibit distinct patterns compared to other plant components. Leveraging this observation, FruitPhenoNet—a deep neural network—is trained to learn fruit-specific spectral characteristics derived from hyperspectral imagery. The model is capable of detecting fruits at early visibility stages, enabling precise estimation of fruit emergence timing, followed by continuous growth tracking. Given that visible light imagery provides significantly higher spatial resolution than hyperspectral data, fruit regions identified in hyperspectral images are spatially aligned with corresponding regions in visible images. This integration facilitates the extraction of detailed color features, which are essential for capturing color transition events in crops such as pepper. 
FruitPhenoNet generates a comprehensive status report comprising key temporal phenotypes, including: (i) the day of first fruit emergence, (ii) the number of fruits present at any given time, (iii) the total fruit yield per plant, and (iv) the timing of fruit color transitions. The model is trained and evaluated on a benchmark dataset, FruitPheno, which consists of multimodal image sequences capturing the full life cycle of pepper plants. These data are acquired from a high-throughput plant phenotyping platform across multiple viewpoints, demonstrating the robustness and effectiveness of the proposed approach. In addition, this research presents HyperProbe Insight, an interactive toolbox for the exploration and analysis of hyperspectral image sequences. The toolbox supports comprehensive pre-processing of hyperspectral data, including dimensionality reduction, automated band selection, and image segmentation, followed by the application of machine learning techniques and the computation of key analytical metrics. The efficacy of HyperProbe Insight is demonstrated using hyperspectral image sequences of bell pepper plants from the FruitPheno dataset, through the computation of temporal phenotypes.
	11:40 am - 12:05 pm	Detection of Pig Health Status Using Longitudinal Feeding Behavior Data, Maksuda Aktar Toma, Department of Statistics, University of Nebraska–Lincoln, Lincoln, USA Abstract: Early detection of illness in group-housed pigs remains a major challenge in precision livestock systems, as behavioral changes preceding clinical diagnosis are often subtle and difficult to quantify. This study develops a statistical and machine learning framework to identify health status using longitudinal feeding behavior data collected from a multi-year grow–finish swine study conducted at the U.S. Meat Animal Research Center (USDA MARC) between 2008 and 2015. The work demonstrates how AI-driven monitoring can strengthen precision livestock management and improve animal welfare and production efficiency. A total of 311,806 daily feeding records from twelve experimental batches were analyzed. Key behavioral measures included meal frequency, total feeding time, and feeding span. Feeding trajectories were aligned relative to clinical diagnosis (Day 0), and segmented regression was used to characterize temporal changes and define four health stages: Healthy (≤ –20 days), Pre-symptom (–20 to 0 days), Sick and early recovery (0 to +90 days), and Recovery (≥ +90 days). Multiclass classification models—including Random Forest, multinomial LASSO, and Ridge regression—were trained and evaluated on held-out data to predict daily health status. Results revealed clear behavioral signatures associated with disease progression. Meal frequency peaked approximately 17 days prior to diagnosis, while total feeding time declined steadily during illness. Random Forest achieved the highest performance, with accuracy of 0.64 and balanced accuracy of 0.70 when including Week 22 body weight, compared to 0.53–0.54 accuracy for penalized regression models.
When body weight was excluded to reflect real-time monitoring, performance decreased across all models (Random Forest accuracy = 0.58), though LASSO and Ridge remained relatively stable. Across all approaches, detection of the Pre-symptom stage remained limited, reflecting the gradual and heterogeneous nature of early disease onset. These findings demonstrate that feeding behavior provides meaningful early indicators of illness, with detectable changes emerging approximately 2–3 weeks prior to clinical diagnosis. While tree-based methods offer higher predictive performance when full information is available, penalized regression models provide more stable and interpretable results under real-time constraints. By combining segmented time-series modeling with machine learning, this framework links behavioral dynamics with health status and supports the development of practical early-warning tools for disease surveillance in precision livestock systems.
12:05 - 1:30 pm		Group Luncheon
Session 3: Statistical Advancements in Agriculture
1:30 - 2:45 pm Moderator: Hans-Peter Piepho	1:30 - 1:55 pm	Mapping Genetic Effects in Dynamic Populations, Brian Rice, Department of Agronomy and Horticulture, University of Nebraska-Lincoln, USA Abstract: Identifying causal genetic variants underlying complex traits remains a central challenge in plant breeding, particularly when genotype-by-environment interactions and population structure confound statistical inference. Conventional genome-wide association studies (GWAS) are typically conducted in static mapping populations, limiting their ability to capture the dynamic nature of allele frequency change and selection that governs real breeding programs. Here, we evaluate an alternative framework in which causal inference is embedded directly within an active breeding population undergoing rapid recurrent selection. Using a sorghum breeding population as a case study, we demonstrate that a highly intermated population can simultaneously support genetic gain and enable robust detection of quantitative trait loci (QTL). Empirical analyses show that repeated intercrossing maintains effective population size, accelerates linkage disequilibrium decay, and reduces confounding from major adaptive loci, improving the identifiability of genetic effects. Multi-environment GWAS further reveals environment-dependent genetic architectures for delayed leaf senescence, highlighting the importance of modeling context-specific effects in causal inference. Through forward-time simulations, we compare rapid cycling breeding designs to conventional pure line development and show that increased recombination and reduced drift improve both QTL detection power and localization accuracy across a range of genetic architectures, including low-heritability and rare-variant scenarios. Importantly, these designs enable continuous updating of causal estimates as selection proceeds, rather than treating inference and application as separate steps. 
We propose a “Flywheel Genomics” framework in which breeding and inference operate as a coupled system, allowing causal signals to be iteratively refined and directly translated into selection decisions. This approach reframes causal inference in agriculture as a dynamic process embedded within population improvement, with implications for accelerating genetic gain under changing environmental conditions.
	1:55 - 2:20 pm	Mixed-Model Approaches for Genotype-by-Environment Analysis: Enviromics-Based Multi-trait and Multidimensional Reaction Norm Modeling, Guilherme Rosa, University of Wisconsin, Madison, USA Abstract: Genotype-by-environment interaction (G×E) presents a statistical challenge in large-scale livestock breeding programs because genetic effects may vary across heterogeneous production environments. Modern breeding datasets increasingly combine phenotypic, genomic, environmental, and management information, creating opportunities to study G×E using large, high-dimensional datasets within mixed-model frameworks. This presentation illustrates two complementary statistical approaches for modeling G×E that are widely used in livestock genetic evaluation: multi-trait mixed models and reaction norm mixed models. The first example applies a multi-trait single-step genomic BLUP (ssGBLUP) framework to model genotype-by-environment-by-management interactions. The dataset integrates nearly 400,000 phenotypic records across multiple traits, pedigree information spanning over two million animals, genomic data from more than 70,000 individuals, and farm-level descriptors of environmental and management conditions. Environmental conditions were defined using a data-driven hierarchical clustering strategy based on Gower distances computed from climate, soil, elevation, and management variables. A bi-variate mixed model treating performance in different environmental clusters as correlated traits was used to estimate genetic (co)variance components and genomic breeding values. Genetic correlations between environmental clusters provided statistical evidence of heterogeneous genetic effects and re-ranking of individuals across environments. The second example illustrates the reaction norm approach using more than 360,000 phenotypic records collected across thousands of farms. 
Environmental descriptors were constructed using an enviromics strategy that integrates geographic, climatic, and soil information obtained from external databases. Principal component analysis was used to summarize these environmental variables, and the resulting components were used as continuous environmental gradients in a multidimensional reaction norm mixed model. Random regression coefficients were estimated within a Bayesian mixed-model framework, allowing genetic effects to be expressed as functions of multiple environmental covariates and enabling estimation of environment-specific genetic variances and genetic correlations across the environmental space. These studies highlight two emerging directions in statistical modeling of G×E: (1) the use of enviromics to construct quantitative environmental descriptors from heterogeneous environmental and management data, and (2) the development of multidimensional reaction norm models that combine mixed-model methodology with dimension-reduction techniques such as principal component analysis. Together, these approaches illustrate how modern mixed-model methodology can leverage large, heterogeneous datasets to quantify G×E interaction and enable environment-specific prediction in complex biological systems.
	2:20 - 2:45 pm	A Service-Based Joint Model Used for Distributed Learning: Application for Smart Agriculture, Dixon Vimalajeewa, Department of Statistics, University of Nebraska-Lincoln, USA Abstract: Distributed analytics can make data-driven services smarter across a wide range of applications in many domains, including agriculture. The key to producing services at this level is timely analysis that derives insights from reliable data. Centralized data analytic services are becoming infeasible due to limitations in the Information and Communication Technologies (ICT) infrastructure, timeliness of the information, and data ownership. Distributed Machine Learning (DML) platforms facilitate efficient data analysis and overcome such limitations effectively. Federated Learning (FL) is a DML methodology that enables optimizing resource consumption while performing privacy-preserving, timely analytics. Creating such services through FL requires innovative machine learning (ML) models, as data complexity and application requirements limit the applicability of existing ML models. Even though Neural Network (NN)-based models are highly advantageous, the use of NNs in FL settings is limited on clients (with less computational capability) and with high-dimensional data (with a large number of model parameters). Therefore, in this paper, we propose a novel joint FL model based on NN and Partial Least Squares (PLS) regression (FL-NNPLS). Its predictive performance is evaluated under FL algorithms based on sequential and parallel updating in a smart farming context for milk quality analysis. Smart farming is a fast-growing industrial sector that requires effective analytics platforms to enable sustainable farming practices. However, the use of advanced ML techniques is still at an early stage for improving the effectiveness of farming practices.
Our FL-NNPLS approach compares well with a centralized approach and demonstrates state-of-the-art performance.
2:45 - 3:10 pm		Afternoon Beverages
Session 4: Experimental Designs
3:10 - 5:00 pm Moderator: Nora Bello	3:10-3:45 pm	The Design (and Analysis) of Long-Term Agricultural Experiments, Andrew Mead, Head of Statistics and Data Science, Rothamsted Research, UK Abstract: Rothamsted Research is famous for its long-term agricultural experiments (LTEs), established from 1843 onwards. An interest in analysing the data from these experiments stimulated the appointment of R.A. Fisher as the institute’s first statistician in 1919, leading to the development of the principles for the design and analysis of experiments that still underpin the statistical approaches we use today. Many of these ideas were first presented in Fisher’s seminal paper on “The arrangement of field experiments” in 1926. A century later, LTEs remain an increasingly important component of modern agricultural research, helping us to address the twin threats of climate change and population growth on the effective and efficient use of resources. I will describe how developments from Fisher’s design principles support the design of modern LTEs, illustrated through the design of a new multidisciplinary LTE, established in 2017/8, to address modern agri-environmental questions about the sustainability and resilience of farming systems. As time allows, I will then briefly present approaches to the analysis of data from this new LTE, the development of a resource (the Global Long-Term Experiments Network (GLTEN)) to support the integrated use of LTE data, a meta-analysis approach developed to illustrate the value of such integration, and the development of an ontology to better describe statistical designs for experiments.
	3:45 - 4:10 pm	Comprehensive multi-nutrient response surface analysis of global polyhalite trials indicates context-dependent yield benefits for corn and wheat, Hans-Peter Piepho, Biostatistics Unit, Institute of Crop Science, University of Hohenheim Abstract: Efficient nutrient management is critical for improving crop productivity, yet the interactions among multiple nutrients remain poorly understood. Quantifying rate-dependent interactions requires analytical approaches that can handle non-linear, multi-nutrient response surfaces. The objective of this study was to implement Response Surface Methodology (RSM) within mixed-effects models to analyze the response of corn and wheat yields to polyhalite and to identify nutrient interaction patterns across diverse regions. RSM was applied to a database of 28 680 field trials conducted in 29 countries over 10 years. It incorporated the application rates of 7 variables (nutrients N, P, K, S, Ca, Mg and polyhalite) and accounted for environmental variation through hierarchical random effects. The results documented yield benefits with polyhalite compared to conventional nutrient sources. The models identified statistically significant polyhalite interactions with K and P in corn and with N in wheat. In corn, polyhalite increased yield by 1.4-2.8% with benefits diminishing as K and P application rates increased. Wheat exhibited consistent polyhalite benefits across the entire K rate gradient. In wheat, benefits ranged from substantial gains (+16.7%) at zero N application to modest gains (0.6%) at medium N rates, with slight yield penalties (-1.0%) observed at high N rates. These interaction patterns indicate that polyhalite’s benefits vary with the crop, the nutrient, and the application rate. Multi-nutrient interactions were successfully identified by RSM. However, further research is needed to convert these insights into practical crop nutrition strategies.
	4:10 - 4:35 pm	Power Approximations with Non-Normal Data in Generalized Linear Mixed Models in R using Steep Priors on Variance Components, Carlie Prinster, Utah State University, Logan, USA Abstract: Estimating statistical power is often an important element of designing an experiment. The probability distribution method, with an exemplary data set showing the magnitude of hypothesized effects, and with variance components held constant, provides a flexible way to do this for generalized linear mixed models, and can be implemented in SAS’s PROC GLIMMIX. Corresponding modeling approaches in R do not currently allow variance components to be held constant in the same way, but previous work has shown that they can be essentially held constant by placing steep priors on them. However, this approach in R can encounter challenges when dealing with non-normal data. This project presents some approaches to resolve those challenges so that these power approximations are more accessible to researchers.
4:35 - 6:00 pm		Break
6:00 - 8:00 pm		Dinner + Networking Social

Wednesday, May 13, 2026		Day 2
8:00 - 8:30 am		Morning Refreshments (Registration Open and Poster & Sponsor Display Setup)
Session 5: Spatial Modeling
8:30 - 10:05 am Moderator: Hans-Peter Piepho	8:30 - 9:05 am	A new model selection approach based on local economically optimal input rate, Tarro Mieno Abstract: In on-farm precision experiments (OFPE), machine learning models for estimating site-specific yield response functions are typically selected based on yield prediction accuracy. However, the ultimate objective is not predicting yield levels but determining economically optimal nitrogen rates (EONR). This study proposes an alternative model selection approach that evaluates candidate models based on their ability to predict local EONR, estimated via a GAM during spatial cross-validation. Using Monte Carlo simulations with 500 iterations across five candidate models (linear, spatial error, random forest, boosted regression forest, and causal forest), we compare the proposed approach against conventional yield-based selection. Results show that yield-based selection consistently favors models that predict yield well but perform poorly in EONR estimation, leading to substantial profit losses. The proposed local-EONR-based approach more accurately identifies models that generate profitable variable-rate nitrogen recommendations, demonstrating that model selection criteria should align with the ultimate decision objective rather than intermediate prediction tasks.
	9:05 - 9:30 am	Spatial Variation and Environmental Factors Associated with Detection of Ring-Necked Pheasant Broods During August Roadside Surveys, Philip Dixon, Iowa State University, Ames, USA Abstract: The August roadside survey is a population index used to monitor status and trends of ring-necked pheasant (Phasianus colchicus) in several U.S. states. Inter-annual population changes from roadside surveys have occasionally implied biologically implausible outcomes, hinting that roadside surveys may be biased and imprecise because of unaccounted-for variation in detection probability. We organized a large-scale study of detectability of pheasant broods across 11 states where roadside surveys are an important monitoring tool. State wildlife resource agencies conducted 1000 August roadside surveys on 174 unique route-by-year combinations during 2019 – 2021. We estimated detection probability using a single-species N-mixture model in a Bayesian framework. We evaluated 1) associations with environmental characteristics and 2) variability at two spatial and two temporal scales. Estimated detection probability was negatively associated with wind speed, cloud cover, and dewpoint depression, a proxy for morning dew conditions where higher values indicate less dew. Soil moisture was positively associated with detection probability. Variation in detection probability was greatest between states and between repeated visits to the same route. Variability between routes and between years was much smaller. Our results indicate that surveys should target mornings with few clouds, low winds, and favorable dew conditions. This will both increase detection probability and reduce variability between visits on the same route. However, soil moisture may be difficult to control over larger scales of space and time.
	9:30 - 9:55 am	An EM Method for Regularized Spatial Temporal Linear Model Selection for Lattice with Missing Values, Francis Jo, Kansas State University, Manhattan, USA Abstract: A motivating problem comes from a real data application of corn production in Kansas. Because some observations at county level in Kansas are missing, the application consists of incomplete data. Incomplete data is a common problem, which is often exacerbated by ignoring or deleting the missing values rather than treating them. Lattice data observed over time are not exempt. Removing a lattice cell that is absent for a period of time may not be appropriate or possible. Linear regression models are one of the most useful tools to not only find relationships between variables, but also to quantify effect sizes. Regularization methods have been developed to do variable selection and parameter estimation through a penalized approach. The literature contains attempts to address missing data for spatial lattices and to perform efficient variable selection, but they are limited to one time period or penalty. We propose a method that can handle missing spatial and temporal lattice data with the option to choose the penalty used to optimize the selection of variables and dependence structure. A simulation study will compare the proposed method with others in the literature. Evaluation will focus on prediction performance and on the precision of estimates of parameters and missing values. The method will also be evaluated with a real data application of corn production in Kansas.
	9:55 - 10:20 am	Long‑Term Crop Trials: A Practical Approach to Improving Trend Analysis, Jixiang Wu, USDA-ARS, Genetics and Sustainable Agriculture Research Unit Abstract: Long-term crop trial datasets are valuable resources for understanding genetic progress and improving prediction of genotype performance. However, these datasets are often highly unbalanced because entries and locations change frequently across years. This lack of continuity makes it difficult to analyze multi‑year data in a single statistical framework without introducing bias. In this presentation, we describe a recently developed stepwise adjustment method that strengthens year‑to‑year connectivity within long-term trials. Using Monte Carlo simulations, we show that this approach improves the prediction of genotypic values by better accounting for environmental variation among years. We demonstrate the method’s usefulness using two real datasets: a 16‑year soybean yield trial from South Dakota and a 47‑year regional cotton high‑quality trial. In both cases, statistical model fit for estimating genetic gain trends improved, and the resulting estimates were more consistent with documented historical genetic gains. This work highlights how appropriate statistical adjustments can unlock the full value of long-term public trial data for genetics and crop improvement.
	10:20 - 10:45 am	How Much Location Information Do We Need in Distance Sampling?, Abraham Arbelaez, Department of Statistics, Kansas State University, Manhattan, USA Abstract: Distance sampling is widely used to estimate wildlife population abundance while accounting for imperfect detection. In line transect surveys, observers typically record detections and their perpendicular distances from the transect line to model how detection probability declines with distance. Advances in survey technology increasingly allow observers to record the full spatial location of each detection rather than distance measurements alone. Although richer spatial information is often assumed to improve statistical inference, collecting precise location data can increase survey complexity, cost, and logistical effort. This raises an important question for survey design: how much information is actually needed to obtain efficient abundance estimates in distance sampling? We investigate how different levels of information collected during distance sampling surveys affect the efficiency of abundance estimators. Specifically, we compare estimators based on counts alone, unsigned distances, signed distances, and full spatial locations of detections. Because analytical comparisons are difficult under realistic ecological conditions, we conduct a simulation study based on spatial point processes that vary spatial heterogeneity in abundance, detection decay with distance, and survey intensity. We also illustrate the practical implications of these approaches using line transect surveys of Grasshopper Sparrows collected at the Konza Prairie Biological Station. Our results highlight clear tradeoffs between different levels of information in distance sampling surveys and provide practical guidance on when collecting full spatial location data meaningfully improves inference.
10:45 - 11:15 am		Break
Session 6: Statistical Consulting
11:15 - 12:15 pm Moderator: Réka Howard	11:15 - 11:50 am	Democratizing Statistical Consulting via Large Language Models: Progress, Pitfalls, and Practical Constraints in an Academic Setting, Bruce A Craig, Department of Statistics, Purdue University, Tadd Colver, and Arman Sabbaghi Abstract: As large language models (LLMs) continue to evolve in their ability to mimic statistical reasoning and assist with tasks involving design, analysis, and interpretation, statistical consulting services face both new opportunities and emerging challenges. About a year ago, Purdue’s Statistical Consulting Service (SCS) started the development and adoption of a custom LLM platform based on retrieval-augmented generation (RAG) techniques. The goal for this platform was to enhance the accessibility, efficiency, and scalability of the statistical support offered by the SCS. In this talk, I will discuss our ever-evolving efforts to tailor the platform for our consulting purposes, the necessary computational infrastructure that we identified to support the workflows in the platform, and the challenges and pitfalls associated with our efforts in an academic setting. In particular, I will highlight the difficulty of maintaining the rapidly evolving methodology given the breakneck speed of advances in LLMs, as well as the efforts required to establish inputs such as (but not limited to) system prompts, number of retrieved documents, and similarity weights in order to improve the reliability and utility of the platform. By reflecting on the ups and downs of our internal deployment experience, this presentation aims to illuminate a path toward responsible and effective integration of LLMs in academic statistical services.
11:15 - 12:15 pm Moderator: Réka Howard	11:50 - 12:15 pm	Concordance in Hibernal Phenology of Dirca occidentalis (Western Leatherwood), Rebekah Scott, Department of Statistics and William Graves, Department of Horticulture, Iowa State University, Ames, USA Abstract: Dirca occidentalis A.Gray (western leatherwood) is a deciduous shrub endemic in small portions of six counties of California near the San Francisco Bay. Reproduction of this obscure species, which blooms and initiates seed development during winter, may be threatened by changes in the Mediterranean climate where it occurs. We therefore seek to understand the phenology of blooming of D. occidentalis. Previous research indicated that individual shrubs of D. occidentalis bloom on different schedules during late autumn and winter at Stanford University’s Jasper Ridge Biological Preserve in San Mateo County, California. We build on those findings by addressing several questions: Are individual shrubs that bloom early in one year consistently early across years? How do the timing and quantity of rainfall influence the order in which shrubs bloom? Do shrubs in certain geographic clusters bloom earlier than others, and if so, are these differences driven by spatial or environmental factors? To answer these questions, we use a 4-year dataset representing 101 shrubs and a 14-year dataset representing 30 shrubs at the Jasper Ridge preserve. We apply concordance-based methods to assess the stability of phenological rankings of bloom order across years. Concordance, traditionally used to evaluate agreement among judges, is used to measure the consistency of bloom timing across years. Our approaches include ANACONDA (Analysis of Concordance), which quantifies single- and multi-group concordance with Kendall’s W and Schucany’s ℒ, as well as a novel extension of Kendall’s W. Our findings contribute to a deeper understanding of phenological patterns in D. occidentalis and provide support for biological theory regarding the drivers of bloom timing and consistency, particularly in relation to environmental factors.
12:15 - 1:45 pm		Group Luncheon
Session 7: Statistical Consulting Panel
1:45 - 3:10 pm Moderator: Josefina Lacasa	1:45 - 2:10 pm	Statistical Consulting at UNL, Reka Howard, Department of Statistics, University of Nebraska–Lincoln, USA Abstract: Statistical consulting plays an important role in improving study design, data analysis, and interpretation in interdisciplinary research. This presentation describes the Statistical Cross-disciplinary Collaboration and Consulting Lab (SC3L) at the University of Nebraska–Lincoln and its model for combining statistical support with student training. SC3L provides consulting services, workshops, and collaborative problem solving for researchers across disciplines, while also creating structured opportunities for students to develop consulting skills through coursework and real client interactions. The talk highlights UNL’s consulting training pipeline, examples of collaborative activities, and the broader impact of SC3L on research quality, communication, and cross-disciplinary scholarship.
1:45 - 3:10 pm Moderator: Josefina Lacasa	2:10 - 3:10 pm	Panel Discussion
3:10 - 3:40 pm		Break
Session 8: Poster Lightning Talks
3:40 - 5:00 pm Moderator: Guilherme Rosa	3:40 - 3:55 pm	Poster Lightning Talks
3:40 - 5:00 pm Moderator: Guilherme Rosa	3:55 - 5:00 pm	Poster Session

Thursday, May 14, 2026		Day 3
8:00 - 8:30 am		Morning Refreshments
Session 9: NCCC170
8:30 - 10:00 am Moderator: Guilherme Rosa		These Are the Voyages of NCCC-170: Past, present, and future, Bruce Craig, Walter Stroup, Julia Piaskowski, Josefina Lacasa Abstract: The NCCC-170 “Research Advances in Agricultural Statistics” group is a multistate research coordinating committee and information exchange group that connects applied statisticians from academia, government, and industry. Participants discuss and work on research advances in agricultural statistics to help other statisticians and practitioners. We will talk about the history of the group, highlight past and present accomplishments, and explore the future direction of the committee's initiatives.
8:30 - 10:00 am Moderator: Guilherme Rosa		West by Midwest: Package Maintenance, Transition and Handover, Julia Piaskowski, Director of Statistical Programs, University of Idaho Abstract: The open-source statistical programming universe is bolstered by many contributed libraries from individual, usually unpaid programmers who may lack formal training in software development. This is true for R, Python, Julia and many other programming languages. These contributed libraries have differing scopes and quality of code and documentation; likewise, the adoption of these contributed libraries also varies. Regardless of popularity of a contributed package, every package creator or maintainer eventually faces the issue of whether to continue maintenance of their library, pass it on to another person or team, or discontinue the library. For those that choose to pass their project on to another person or team, challenges remain. These include how to keep a package compatible with R overall and package dependencies (both regular and reverse), handling user feedback and bug reports, feature extension, and learning how to collaborate with a new person. For the new maintainer, taking on a legacy package of high popularity can be intimidating and overwhelming. For the original creator, one is suddenly accountable to another person, and the development process (multiple branches, timing of updates, etc.) changes. We share what we have learned from this transition and encourage the new or seasoned programmer to become involved in software development and maintenance.
10:00 - 10:30 am		Break
Session 10
10:30 - 11:50 am Moderator: Guilherme Rosa	10:30 - 11:30 am	You can't do that with Statistics!, Chris Bilder, Department of Statistics, University of Nebraska - Lincoln Abstract: On December 5, 2025, the University of Nebraska Board of Regents voted 7-1 to eliminate the University of Nebraska-Lincoln’s Department of Statistics as part of a $27.5M budget cut. The vote concluded the department’s three-month-long fight for its survival. How did this happen‽ The purpose of this presentation is to describe the conditions that led to the original proposal for elimination and to examine the department’s efforts to remain. University administrators touted their “data-informed approach” to justify the elimination of multiple departments. However, Statistics faculty discovered numerous problems with the data used and the statistical analysis performed by these administrators. Further analyses performed by the faculty showed that Statistics (and other departments proposed for elimination) actually ranked as top 20 programs at the university, rather than “under performing” as characterized by the administration. While all shared-governance committees voted overwhelmingly to retain Statistics, the elimination proposal went forward, leading to the Board of Regents vote. SaveOurStats.com contains a timeline of events that includes news media coverage; campuswide presentations; and testimony by faculty, students, family, and friends at Board of Regents meetings.
	11:30 - 11:50 am	Awards and Closing Remarks
11:50 - 1:00 pm		Lunch
