Researcher(s)
- Talha Mahmood, Computer Science, University of Delaware
Faculty Mentor(s)
- Dr. Xu Yuan, Computer & Information Science, University of Delaware
Abstract
Accurate crop yield prediction is crucial for effective agricultural planning and decision-making. However, timely prediction remains challenging due to the sensitivity of crop growth to seasonal weather variations and climate change. This work aims to replicate and validate the results of a deep learning-based approach, the Multi-Modal Spatial-Temporal Vision Transformer (MMST-ViT), which predicts county-level crop yield across the United States. The MMST-ViT model integrates a Multi-Modal Transformer, a Spatial Transformer, and a Temporal Transformer. The Multi-Modal Transformer utilizes visual remote sensing data and short-term meteorological data to assess the impact of seasonal weather variations on crop yield. The Spatial Transformer captures high-resolution spatial dependencies among counties for precise agricultural tracking, while the Temporal Transformer identifies long-range temporal dependencies to evaluate the effect of long-term climate change on crops. A novel multi-modal contrastive learning technique pre-trains the model with minimal human supervision. By leveraging satellite imagery and meteorological data, the MMST-ViT effectively accounts for both short-term weather fluctuations and long-term climate impacts on crop yield. Extensive experiments conducted on over 200 counties in the United States show that the MMST-ViT outperforms its counterparts on three performance metrics: root mean square error (RMSE), the coefficient of determination (R²), and the Pearson correlation coefficient (Corr). These results demonstrate the approach's potential to aid agricultural stakeholders in making informed decisions.
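As a supplement to the abstract, the three reported evaluation metrics (RMSE, R², and Corr) could be computed as sketched below. This is not code from the MMST-ViT work itself; the function name and the toy yield values are hypothetical, and the sketch assumes predictions and ground truth are given as per-county arrays.

```python
import numpy as np

def evaluate_yield_predictions(y_true, y_pred):
    """Compute RMSE, R^2, and Pearson correlation for per-county yields.

    A minimal sketch of the three metrics named in the abstract; not the
    original evaluation code from the MMST-ViT paper.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Root mean square error: average magnitude of prediction error.
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

    # Coefficient of determination: fraction of variance explained.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    # Pearson correlation between predicted and observed yields.
    corr = np.corrcoef(y_true, y_pred)[0, 1]

    return rmse, r2, corr

# Toy example with hypothetical county-level yields (e.g., bushels/acre).
truth = [160.0, 175.0, 150.0, 180.0]
preds = [158.0, 178.0, 149.0, 176.0]
rmse, r2, corr = evaluate_yield_predictions(truth, preds)
```

Lower RMSE and higher R² and Corr indicate better agreement between predicted and observed yields, which is the sense in which the abstract reports MMST-ViT outperforming its counterparts.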