CROPINVEST - Crop Yield Estimator
Project Summary
CROPINVEST - Crop Yield Estimator for the State of North Dakota, USA
Problem Statement
North Dakota, a state in the Midwestern United States, is popular with agricultural investors for its abundant agricultural resources and vast farmland. However, the agencies like – banks, crop insurance companies, etc need a tool that can predict the crop yield value of a particular farm to ensure that they can safeguard their investment. In this regard, we create CROPINVEST, a GEE application that monitors the environment to predict crop yields, and based on these yields, we obtain the market value of the crop, which in turn can help end users (banks, crop insurance companies ,etc) to accurately and scientifically capture the farms potential to pay the crop loan amount or the risk associated with the insured crops and enable stakeholders to make informed decisions that optimize agricultural productivity, market strategies, and economic outcomes.
End User
- Insurance Companies
- Based on potential crop yields, charge Insurance premiums to farmers thus mitigating financial losses from decreased production.
- Managing financial exposure if low yield or drought conditions prevail, which could lead to substantial claims from farmers.
- Banks
- Utilize it to evaluate loan feasibility for farmers and adjust interest rates based on predicted yields and associated risks.
- Farmer Associations
- Utilize it to forecast yields, plan for adverse conditions, and identify areas needing more irrigation.
- Governments
- To anticipate and prepare for potential crop deficits or surpluses at a county level and thus developing policies/infrastructure for the same.
Data
Datasets
- A1 - United States Census Bureau TIGER Dataset: Boundary information for U.S. counties.
- A2 - USDA NASS Cropland Data Layers: Crop-specific Land cover data.
- A3 - MOD16A2GF.061 Terra Net Evapotranspiration: Total evapotranspiration over land.
- A4 - MOD11A1.061 Terra Land Surface Temperature: Land surface temperature data.
- A5 - MOD13Q1.061 Terra Vegetation Indices 16-Day Global: Vegetation indices.
- A6 - GRIDMET: Gridded Surface Meteorological Dataset: Meteorological data.
- A7 - MCD18C2.061 Photosynthetically Active Radiation Daily 3 hour: Solar Radiation Levels.
- A8 - SPL3SMP_E.005 SMAP L3 Radiometer Global Daily 9km Soil Moisture: Soil moisture.
- A9 - Crop Yield Data: Contains county-wise crop yield data.USDA
- A10 - Current Crop Price Data: Uses current crop price data from Financial Times
Data Variables Overview
Data Type | Dataset | Description | Data Used |
---|---|---|---|
GEE DATA | A1 | County wise Boundary Information | STATEFP |
GEE DATA | A2 | Crop-specific Land Cover Data Type | cropland |
GEE DATA | A3 | Evapo-Transpiration | ET |
GEE DATA | A4 | Land Surface Temperature Day | LST_Day_1km |
GEE DATA | A4 | Land Surface Temperature Night | LST_Night_1km |
GEE DATA | A5 | Normalized Difference Vegetation Index | NDVI |
GEE DATA | A6 | Precipitation | pr |
GEE DATA | A7 | Photosynthetically Active Radiation | GMT_1200_PAR |
GEE DATA | A8 | Day Soil Moisture | soil_moisture_am |
GEE DATA | A8 | Night Soil Moisture | soil_moisture_pm |
OTHER DATA | A9 | USDA Crop Yield (Year Wise) | Yield |
OTHER DATA | A10 | Crop Day Price $ - Financial Times | Price |
Methodology
We explored machine learning techniques to forecast crop yields, evaluating models like Linear Regression, Random Forest, Gradient Boosting, and Neural Networks based on their R2 and RMSE scores ,and found RF best fit for each crop. Previous studies showed that RF is an effective and universal machine-learning method for crop yield prediction on a regional and global scale with high accuracy, precision, and ease of use (Jeong et al., 2016.; Prasad et al., 2021).
Building Random Forest Model:
Prepare original CSV file
- including the three crops (corn, wheat, soybean) among several X variables and Y variable (the crop yield).
Prepare Training Data(80%) / Validation Data(20%)
Use the Training data to train three different Random Forest Regression Models in GEE
Validation
To get the performance of our models, we can use the test data from the previous split. We used R square and Root Mean Squared Error (RMSE) to validate our models. Some graphs showing these metrics are below:
Interface
CROPINVEST supports insurance companies, banks, and farmer associations with advanced features for forecasting crop yields in North Dakota for wheat, corn, and soybeans. Users can select different Crops and Years and utilize two key functionalities: a County feature to display crop yields, planting area, total production, and county-wide crop price totals via clickable maps; and a custom Area feature allowing users to draw specific regions to analyze crop yields, areas planted, and aggregate pricing. This interface ensures banks and insurance companies can assess financial risks effectively, while farmers gain crucial insights into expected yields and market conditions. Here is brief overview of how our application works:
The Application
How it Works
Data Extraction Code:
- Here we use the Python environment to extract the data from different datasets using Google Earth Engine API
- Afterwards, we first use the crop-specific land cover data to distinguish different crops- wheat, soybean or corn for each county in North Dakota.
- Further, NDVI, Precipitation, SAR, Soil Moisture & other values are used to get the county-wise values from the year 2000-2024.
- Afterwards, the Yield data is obtained from the United States Department of Agriculture for each of the years and a final dataset is obtained which has all the X Variables (NDVI,PA,SMS_AM,LST_DAY,SMS_PM,LST_NIGHT,PAR,ET) & Y variable (YIELD).
-api
pip install earthengineimport ee
ee.AuthcessYear(year):# Load the CDL dataset for the given year
= ee.ImageCollection('USDA/NASS/CDL')\
dataset filter(ee.Filter.date(f'{year}-01-01', f'{year}-12-31'))\
.
.first()= dataset.select('cropland')
crop_landcover
# Filter for North Dakota counties
#`STATEFP` parameter of the dataset which provides the State FIPS code & the North Dakota value is used.
= ee.FeatureCollection('TIGER/2016/Counties')
counties = counties.filter(ee.Filter.eq('STATEFP', '38'))
nd
# Identify corn areas in North Dakota
#`cropland` values for different crops of our study are used Wheat, Corn & Soybean Values provided from the Cropland Table.
= crop_landcover.eq(1).Or(crop_landcover.eq(12)).Or(crop_landcover.eq(13))
corn = crop_landcover.updateMask(corn).clipToCollection(nd)
masked_corn
# Calculate NDVI for corn areas using MODIS data
#`NDVI` parameter of the dataset and we obtain the mean over the growth period of the crop
= ee.ImageCollection('MODIS/061/MOD13Q1')\
NDVI_dataset filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))
.= NDVI_dataset.select('NDVI')
ndvi = ndvi.mean().rename('NDVI')
mean_ndvi = mean_ndvi.updateMask(masked_corn)
cornNDVI
# Calculate precipitation using GRIDMET data
#`pr` parameter of the dataset which provides the 'Precipitation amount' in mm (daily total)
= ee.ImageCollection("IDAHO_EPSCOR/GRIDMET")\
precipitation_dataset filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))\
.'pr')
.select(= precipitation_dataset.mean().rename('PA')
mean_precipitation
# Load Radiometer Global Daily 9 km Soil Moisture AM
#`soil_moisture_am` & `soil_moisture_pm` parameter of the dataset which provides 'Retrieved soil moisture estimate from the
# disaggregated/downscaled vertical polarization brightness temperature at 9-km grid cell one at AM overpass & other at PM overpass. in dB.
= ee.ImageCollection("NASA/SMAP/SPL3SMP_E/005")\
smap_dataset filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))\
.'soil_moisture_am')
.select(= smap_dataset.mean().rename('SMS_AM')
mean_soil_moisture
# Load Radiometer Global Daily 9 km Soil Moisture PM
= ee.ImageCollection("NASA/SMAP/SPL3SMP_E/005")\
smapDataset_pm filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))\
.'soil_moisture_pm')
.select(= smapDataset_pm.mean().rename('SMS_PM')
meanSoilMoisture_pm
# Load MODIS Land Surface Temperature DAY
#`LST_Day_1km` & `LST_Night_1km` parameter of the dataset which provides 'Daytime Land Surface Temperature' &
# Nighttime Land Surface Temperature' both in Kelvin (K).
= ee.ImageCollection("MODIS/061/MOD11A1")\
lstDataset filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))
.
= lstDataset.select('LST_Day_1km')\
lstmean_celsius \
.mean()0.02)\
.multiply(273.15)\
.subtract('LST_DAY')
.rename(# Load MODIS Land Surface Temperature NIGHT
= ee.ImageCollection("MODIS/061/MOD11A1")\
lstDataset_night filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))
.
= lstDataset_night.select('LST_Night_1km')\
lstmean_celsius_night \
.mean()0.02)\
.multiply(273.15)\
.subtract('LST_NIGHT')
.rename(
# Photosynthetically Active Radiation Daily 3-Hour
#`GMT_1200_PAR` parameter of the dataset which provides 'Total PAR at GMT 12:00'. PAR is incident solar radiation in
# the visible spectrum (400-700 nanometers) and is an important variable in land-surface models having use in agriculture &
# other scientific applications.
= ee.ImageCollection("MODIS/061/MCD18C2")\
par_12 filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))\
.'GMT_1200_PAR')
.select(
= par_12.mean().rename('PAR'); # Calculate the Photosynthetically Active Radiation at 12
mean_par_12
# Net Evapotranspiration
# `ET` parameter of the dataset which provides 'Total evapotranspiration' in kg/m^2/8day.s.
= ee.ImageCollection("MODIS/061/MOD16A2GF")\
netevapo filter(ee.Filter.date(f'{year}-05-01', f'{year}-10-01'))\
.'ET')
.select(
= netevapo.mean().rename('ET') # Calculate the mean Soil Moisture
mean_netevapo
# Combine all layers
= cornNDVI.addBands(mean_precipitation).addBands(mean_s1_vv).addBands(mean_soil_moisture).addBands(lstmean_celsius).addBands(meanSoilMoisture_pm).addBands(lstmean_celsius_night).addBands(mean_par_12).addBands(mean_netevapo)
combinedDataset
# Reduce regions and calculate mean values over the specified areas
= combinedDataset.reduceRegions(
combined_mean =nd,
collection=ee.Reducer.mean(),
reducer=30,
scale=4,
tileScale
)
# Define export parameters
= {
export_params 'collection': combined_mean,
'description': f'combined_{year}',
'folder': 'GEE_Folder',
'fileNamePrefix': f'Combined_{year}',
'fileFormat': 'CSV',
'selectors': ['NAME', 'GEOID', 'NDVI', 'PA', 'SAR', 'SMS_AM', 'LST_DAY', 'SMS_PM', 'LST_NIGHT', 'PAR', 'ET']
}
# Commented the line below as I have got the data in my drive already
#ee.batch.Export.table.toDrive(**export_params).start()
# Example of processing each year
for year in range(2000, 2024):
processYear(year)
Methodology Code:
- First, we experimented with various machine learning methods in Python, including Linear Regression (LR), Random Forest Regressor (RF), Gradient Boosting Regressor (XGB), and Artificial Neural Network (ANN). By comparing their R2 and RMSE scores, we identified the most suitable method for each of the three crops.
- Considering the impact of environmental factors on crop growth within a specified time frame, we initially explored all nine parameters. Subsequently, based on the importance tests from the Random Forest model, five to six key variables were selected as independent variables in the predictive models for each crop.
- In GEE, we trained Random Forest (RF) predictive models separately for three types of crops using data from six years, spanning 2018 to 2023, and split them into training/validation datasets. After obtaining the models, we assessed their performance using R2 and RMSE.
- Using the historical average method to make a simple prediction for the crop yield of the coming year.
Part 1:
#install packages
!pip install tensorflow
#load packages
import pandas as pd
from sklearn.preprocessing import StandardScaler
#packages for manipulating dataframe
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import sklearn
#packages for machine learning
##train-test-split
from sklearn.model_selection import train_test_split, GridSearchCV, validation_curve
##method 1: Linear Regression (LR)
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score
##method 2: Random Forest Regressor (RF)
import rfpimp
from sklearn.ensemble import RandomForestRegressor
##method 3: Gradient Boosting Regressor (XGB)
import xgboost
from xgboost import XGBRegressor
##method 4: Artificial Neural Network (ANN)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
##cross validation
##evaluation metrics (R2 and RMSE)
from sklearn.metrics import r2_score, mean_squared_error
#data visualization
import matplotlib.pyplot as plt
'display.max_rows', 300) # specifies number of rows to show
pd.set_option(= '{:40,.4f}'.format # specifies default number format to 4 decimal places
pd.options.display.float_format 'ggplot') # specifies that graphs should use ggplot styling
plt.style.use(%matplotlib inline
# 1. Load & Cleaning Data
#load data
= pd.read_csv('https://www.dropbox.com/scl/fi/ixibqk9pyuehwvoz7mr1h/2018_County_Summary_Merged.csv?rlkey=wueodqdvzsht5or4eftwdd3ys&dl=1')
soybean_2018 = pd.read_csv('https://www.dropbox.com/scl/fi/x3oiqrikr0xagqrh5tiqu/2019_County_Summary_Merged.csv?rlkey=zr5wru8sg5iybp43lnjf7ls5a&dl=1')
soybean_2019 = pd.read_csv('https://www.dropbox.com/scl/fi/m4r74ydgw3yno93nakagl/2020_County_Summary_Merged.csv?rlkey=kmd8bozo9z9jxznoa775gg33z&dl=1')
soybean_2020 = pd.read_csv('https://www.dropbox.com/scl/fi/2vmvsyjloz9hvzejtdez4/2021_County_Summary_Merged.csv?rlkey=ossgr4x3fvzgebhqprchntt2f&dl=1')
soybean_2021 = pd.read_csv('https://www.dropbox.com/scl/fi/x61224yvwtq4idnqphwjy/2022_County_Summary_Merged.csv?rlkey=s3zm9ooap8pjqom3jr0aklc6t&dl=1')
soybean_2022 = pd.read_csv('https://www.dropbox.com/scl/fi/zrjnmeqysrokfsua24hb3/2023_County_Summary_Merged.csv?rlkey=jb4w1pt295zeagbvgm7c2t7pk&dl=1')
soybean_2023
= [soybean_2018, soybean_2019, soybean_2020, soybean_2021, soybean_2022, soybean_2023]
soybean_list = pd.concat(soybean_list)
soybean_df = soybean_df.drop(['NAME','GEOID','SMS_PM','SMS_AM','SAR'], axis=1)
soybean_df
soybean_df.info()
# 2.1 Train & Test Data Split
#split the dataset
= soybean_df.drop('YIELD', axis=1)
X = soybean_df['YIELD']
y
= train_test_split(X, y, test_size=0.25, random_state=100)
X_train, X_test, y_train, y_test
# 2.2 Train & Test Data Standarderlized
def standardize_columns(file_name, columns_to_standardize):
= StandardScaler()
scaler
= file_name
df = scaler.fit_transform(df[columns_to_standardize])
df[columns_to_standardize] return df
def standardize_series(series):
= StandardScaler()
scaler = scaler.fit_transform(series.values.reshape(-1, 1)).flatten()
series return series
= ['LST_DAY','PA','NDVI','ET','LST_NIGHT','PAR']
X_columns = ['YIELD']
y_columns
= standardize_columns(X_train, X_columns)
X_train = standardize_columns(X_test, X_columns)
X_test
# 3. Model Training and Parameter Tuning
# 3.1. Linear Regression (LR)
= LinearRegression()
model_lr
# Cross validation
= cross_val_score(model_lr, X, y, cv=5, scoring='neg_mean_squared_error')
scores
= np.mean(scores)
mean_mse = np.std(scores)
std_mse
print(f'Mean MSE: {mean_mse}')
print(f'Standard Deviation of MSE: {std_mse}')
model_lr.fit(X_train, y_train)
# 3.2. Random Forest Regressor (RF)
# values of max_depth and min_samples_split
= {'max_depth':[3,5,10,20,30], 'min_samples_split':[2,4,6,8,10]}
hyperparameters
= 10000
randomState_dt = RandomForestRegressor(random_state=randomState_dt)
model_rf
# cv=5 by default, which means 5-fold cross-validation
= GridSearchCV(model_rf, hyperparameters)
clf
clf.fit(X_train, y_train)
# we can query the best parameter value and its accuracy score
print ("The best parameter value is: ")
print (clf.best_params_)
print ("The best score is: ")
print (clf.best_score_)
# Train the final RF
= RandomForestRegressor(max_depth=clf.best_params_['max_depth'], min_samples_split=clf.best_params_['min_samples_split'], random_state=randomState_dt)
rf_final
rf_final.fit(X_train, y_train)
# 3.3. Gradient Boosting Regressor (XGB)
import warnings
='ignore', category=FutureWarning)
warnings.simplefilter(action# values of max_depth and min_samples_split
= {'max_depth':[2,4,6,8,10], 'n_estimators':[4,8,12,16,20]}
hyperparameters
= 125
randomState_xgb = XGBRegressor(random_state=randomState_xgb)
xgb
# cv=5 by default, which means 5-fold cross-validation
= GridSearchCV(xgb, hyperparameters)
gscv_xgb
gscv_xgb.fit(X_train, y_train)
# we can query the best parameter value and its accuracy score
print ("The best parameter value is: ")
print (gscv_xgb.best_params_)
print ("The best score is: ")
print (gscv_xgb.best_score_)
# 3.4. Artificial Neural Network (ANN)
= keras.Sequential([
model_ann =(6,)), # Input layer
layers.Input(shape128, activation='relu'), # Hidden layer with ReLU activation
layers.Dense(0.5), # Dropout layer for regularization
layers.Dropout(64, activation='relu'), # Additional hidden layer
layers.Dense(0.3), # Another dropout layer
layers.Dropout(1) # Output layer
layers.Dense(
])
#measuring the training with certain metrics
compile(optimizer='adam', loss='mean_squared_error', metrics=['accuracy'])
model_ann.
#train the model
=100, validation_data=(X_test, y_test))
model_ann.fit(X_train, y_train, epochs
# 4. Model Evaluation and Performance Comparison
# 4.1. E. Linear Regression (LR)
= model_lr.predict(X_train)
train_predictions = model_lr.predict(X_test)
test_predictions
= r2_score(y_train, train_predictions)
r2_train_lr = r2_score(y_test, test_predictions)
r2_test_lr
= mean_squared_error(y_train, train_predictions, squared=False)
rmse_train_lr = mean_squared_error(y_test, test_predictions, squared=False)
rmse_test_lr
print(f"Training R^2: {r2_train_lr:.4f}")
print(f"Test R^2: {r2_test_lr:.4f}")
print(f"Training RMSE: {rmse_train_lr:.4f}")
print(f"Test RMSE: {rmse_test_lr:.4f}")
# 4.2. E. Random Forest Regressor (RF)
= rf_final.score(X=X_train, y=y_train)
r2_train_rf = rf_final.score(X=X_test, y=y_test)
r2_test_rf
print("R2 on the training data:")
print(r2_train_rf)
print("R2 on the testing data:")
print(r2_test_rf)
= mean_squared_error(y_train, rf_final.predict(X_train), squared=False)
rmse_train_rf = mean_squared_error(y_test, rf_final.predict(X_test), squared=False)
rmse_test_rf
print("RMSE on the training data:")
print(rmse_train_rf)
print("RMSE on the testing data:")
print(rmse_test_rf)
# Calculate and plot the feature importance of the RF model
= rfpimp.importances(rf_final, X_test, y_test)
imp print(imp)
= rfpimp.plot_importances(imp)
viz
viz.view()
# 4.3. E. Gradient Boosting Regressor (XGB)
= XGBRegressor(max_depth=gscv_xgb.best_params_['max_depth'], n_estimators=gscv_xgb.best_params_['n_estimators'], random_state=randomState_xgb)
model_xgb
model_xgb.fit(X_train, y_train)
# r2_train_xgb, r2_test_xgb, rmse_train_xgb, rmse_test_xgb
= model_xgb.score(X=X_train, y=y_train)
r2_train_xgb = model_xgb.score(X=X_test, y=y_test)
r2_test_xgb = mean_squared_error(y_train, model_xgb.predict(X_train), squared=False)
rmse_train_xgb = mean_squared_error(y_test, model_xgb.predict(X_test), squared=False)
rmse_test_xgb
print("R2 on the training data:")
print(r2_train_xgb)
print("R2 on the testing data:")
print(r2_test_xgb)
print("RMSE on the training data:")
print(rmse_train_xgb)
print("RMSE on the testing data:")
print(rmse_test_xgb)
= rfpimp.importances(model_xgb, X_test, y_test) # permutation
imp_xgb print(imp_xgb)
= rfpimp.plot_importances(imp_xgb)
viz_xgb
viz_xgb.view()
# 4.4. E. Artificial Neural Network (ANN)
#predictions
= model_ann.predict(X_train).flatten()
y_pred_train_ann = model_ann.predict(X_test).flatten()
y_pred_test_ann
#Compute R2 and RMSE
= np.round(r2_score(y_train, y_pred_train_ann),2)
r2_train_ann = np.round(r2_score(y_test, y_pred_test_ann),2)
r2_test_ann = np.round(np.sqrt(mean_squared_error(y_train, y_pred_train_ann)),2)
rmse_train_ann = np.round(np.sqrt(mean_squared_error(y_test, y_pred_test_ann)),2)
rmse_test_ann
#print the result
print("Train R2:", r2_train_ann)
print("Test R2:", r2_test_ann)
print("Train RMSE:", rmse_train_ann)
print("Test RMSE:", rmse_test_ann)
#crosscheck the y value between real and predicted
= {
crosscheck_y_dict 'y_test' : y_test,
'y_pred' : np.round(y_pred_test_ann,0),
'delta' : np.abs(np.round((y_test - y_pred_test_ann),0))
}
#plotting histogram
= pd.DataFrame(crosscheck_y_dict)
crosscheck_y_df 'delta'], bins=10)
plt.hist(crosscheck_y_df[f'Delta\nR2_test = {r2_test_ann}, RMSE_test = {rmse_test_ann}, delta_max = {crosscheck_y_df.delta.max()}')
plt.xlabel('Frequency (Number of Data)')
plt.ylabel(f"Accuracy of ANN Model Based on Delta of Y Test and Y Pred)")
plt.title(
plt.show()
# 4.5. Model Performance Comparison
#please input your metrics in here
= {
metrics_dict 'metrics': ["Train R2","Test R2","Train RMSE","Test RMSE"],
'LR': [r2_train_lr, r2_test_lr, rmse_train_lr, rmse_test_lr],
'RF': [r2_train_rf, r2_test_rf, rmse_train_rf, rmse_test_rf],
'XGB': [r2_train_xgb, r2_test_xgb, rmse_train_xgb, rmse_test_xgb],
'ANN': [r2_train_ann, r2_test_ann, rmse_train_ann, rmse_test_ann]
}
#create dataframe
= pd.DataFrame(metrics_dict)
metrics_df 'metrics') metrics_df.set_index(
Part 2:
// ========================== GEE ========================================
// Import CSV data
// Load data from CSV file
var csvFile = ee.FeatureCollection('projects/ee-ucfnfli/assets/corn_df');
// ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'B10', 'B11', 'BQA']
var bands = ['NDVI', 'LST_DAY', 'ET', 'PAR', 'PA'];
// ========================== train the RF classification model ========================================
// Split the training and testing data
var withRandom = csvFile.randomColumn({
columnName:'random',
seed:2,
distribution: 'uniform',
;
})
var split = 0.8;
var trainingData = withRandom.filter(ee.Filter.lt('random', split));
var validationData = withRandom.filter(ee.Filter.gte('random', split));
// var classifier = ee.Classifier.smileRandomForest(100, null, 8, 0.5, null, 125)
var classifier = ee.Classifier.smileRandomForest(100, null, 1, 0.7, null, 0).setOutputMode('REGRESSION')
.train({
features: trainingData,
classProperty: 'YIELD',
inputProperties: bands
;
})//print(classifier);
// =================================== img ======================
var featureTest = trainingData.select(['NDVI', 'LST_DAY', 'ET', 'PAR', 'PA','YIELD']);
var classifiedFeatures = featureTest.classify(classifier,'YIELDPredicted');
// ============================ Validation =================================
// ============================ variable importance =================================
var dict = classifier.explain();
print("Classifier information:", dict);
var variableImportance = ee.Feature(null, ee.Dictionary(dict)
.get('importance'));
// Make chart, print it
var chart =
.Chart.feature.byProperty(variableImportance)
ui.setChartType('ColumnChart')
.setOptions({
title: 'Random Forest Variable Importance',
legend: {
position: 'none'
,
}hAxis: {
title: 'Bands'
,
}vAxis: {
title: 'Importance'
};
})print(chart);
// =================================== Compute RSME R2 =====================
// Separate the observed (REDOX_CM) and predicted (regression) properties
// var sampleTraining = predictedTraining.select(['Yield', 'predicted']);
// Create chart, print it
var chartTraining = ui.Chart.feature.byFeature({features:classifiedFeatures, xProperty:'YIELD', yProperties:['YIELDPredicted']})
.setChartType('ScatterChart')
.setOptions({
title: 'Predicted vs Observed - Training data ',
hAxis: {
'title': 'observed'
,
}vAxis: {
'title': 'predicted'
,
}pointSize: 3,
trendlines: {
0: {
showR2: true,
visibleInLegend: true
,
}1: {
showR2: true,
visibleInLegend: true
}
};
})print(chartTraining);
// Training data
// R2 = 0.964; RMSE = 6.601 in python
// R2 = 0.948; RMSE = 8.396 in GEE
// Compute RSME
// Get array of observation and prediction values
var observationTraining = ee.Array(classifiedFeatures.aggregate_array('YIELD'));
var predictionTraining = ee.Array(classifiedFeatures.aggregate_array('YIELDPredicted'));
//print('observationTraining', observationTraining)
//print('predictionTraining', predictionTraining)
// Compute residuals
var residualsTraining = observationTraining.subtract(predictionTraining);
//print('residualsTraining', residualsTraining)
// Compute RMSE with equation, print it
var rmseTraining = residualsTraining.pow(2)
.reduce('mean', [0])
.sqrt();
print('Training RMSE', rmseTraining);
// =============================== Validation =================================
var featureValidation = validationData.select(['NDVI', 'LST_DAY', 'ET', 'PAR', 'PA','YIELD']);
// Validation data
var classifiedValidation = featureValidation.classify(classifier,'YIELDPredicted');
// print result
//print('ValidRegressionResult:', classifiedValidation);
var sampleValidation = classifiedValidation.select(['YIELD', 'YIELDPredicted']);
// Create chart, print it
var chartValidation = ui.Chart.feature.byFeature(sampleValidation, 'YIELD', 'YIELDPredicted')
.setChartType('ScatterChart')
.setOptions({
title: 'Predicted vs Observed - Validation data',
hAxis: {
'title': 'predicted'
,
}vAxis: {
'title': 'observed'
,
}pointSize: 3,
trendlines: {
0: {
showR2: true,
visibleInLegend: true
,
}1: {
showR2: true,
visibleInLegend: true
}
};
})print(chartValidation);
// Get array of observation and prediction values
var observationValidation = ee.Array(sampleValidation.aggregate_array('YIELD'));
var predictionValidation = ee.Array(sampleValidation.aggregate_array('YIELDPredicted'));
// Compute residuals
var residualsValidation = observationValidation.subtract(predictionValidation);
// Compute RMSE with equation, print it
var rmseValidation = residualsValidation.pow(2)
.reduce('mean', [0])
.sqrt();
print('Validation RMSE', rmseValidation);
// Testing data
// R2 = 0.812; RMSE = 15.166 in python
// R2 = 0.841; RMSE = 12.990 in GEE
// ============================== merge raster data ============================
// Merge
var yield2021 = processYear(2021);
var yield2022 = processYear(2022);
var yield2023 = processYear(2023);
// Concatenate two images into one multi-band image.
var mergeYield = ee.Image.cat([yield2021, yield2022, yield2023]);
print('mergeImg:', mergeYield);
// Obtain the union of all images in the image collection and sum them
// sum all three years yields
var summedImage = mergeYield.reduce(ee.Reducer.sum());
print(summedImage)
// Print the results
print('sumImg:', summedImage);
// Map.addLayer(summedImage,{}, 'TotalYield2024');
var yield2024 = summedImage.divide(3);
print('yield2024:', yield2024);
Map.addLayer(yield2024,{}, 'yield2024');
Interface Code:
- In this section, we perform calculations on the crop layers obtained from the random forest prediction model to generate various outputs. By defining global variables, we dynamically update the crop layers based on the user’s selected year and crop type. The core functionality lies in the computation logic for the County level and Area level modes.
- In the County level mode, a click event listener is created to provide real-time feedback on the user-selected county. The number of pixels within the county is calculated, enabling the estimation of the crop area. Utilizing historical and predicted average yield data and the current year’s average market price, the total production and total value are computed.
- In the Area level mode extends the calculation scope to a user-defined polygon region, with a similar computation method to the County level mode. The “Clear Selected Area Button” allows users to remove unnecessary polygons in the Area level mode, enhancing usability.
- The Interface Code is as follows :
// ——————————————————————————define crop layers————————————————————————————————
var cropLayers = {
Corn: {
// Add layers of corn
'2018': ee.Image("projects/ee-songzimeng/assets/corn2018"),
'2019': ee.Image("projects/ee-songzimeng/assets/corn2019"),
'2020': ee.Image("projects/ee-songzimeng/assets/corn2020"),
'2021': ee.Image("projects/ee-songzimeng/assets/corn2021"),
'2022': ee.Image("projects/ee-songzimeng/assets/corn2022"),
'2023': ee.Image("projects/ee-songzimeng/assets/corn2023"),
'2024': ee.Image("projects/ee-songzimeng/assets/corn2024")
,
}
Soybean: {
// Add layers of soybean
'2018': ee.Image("projects/ee-songzimeng/assets/soybean2018"),
'2019': ee.Image("projects/ee-songzimeng/assets/soybean2019"),
'2020': ee.Image("projects/ee-songzimeng/assets/soybean2020"),
'2021': ee.Image("projects/ee-songzimeng/assets/soybean2021"),
'2022': ee.Image("projects/ee-songzimeng/assets/soybean2022"),
'2023': ee.Image("projects/ee-songzimeng/assets/soybean2023"),
'2024': ee.Image("projects/ee-songzimeng/assets/soybean2024")
,
}
Wheat: {
// Add layers of wheat
'2018': ee.Image("projects/ee-songzimeng/assets/wheat2018"),
'2019': ee.Image("projects/ee-songzimeng/assets/wheat2019"),
'2020': ee.Image("projects/ee-songzimeng/assets/wheat2020"),
'2021': ee.Image("projects/ee-songzimeng/assets/wheat2021"),
'2022': ee.Image("projects/ee-songzimeng/assets/wheat2022"),
'2023': ee.Image("projects/ee-songzimeng/assets/wheat2023"),
'2024': ee.Image("projects/ee-songzimeng/assets/wheat2024")
};
}
// -------------------------- Data ------------------------------
Map.setCenter(-100.55, 47.5, 7);
Map.setOptions('SATELLITE');
// clip the north dakota
var counties = ee.FeatureCollection('TIGER/2016/Counties');
var nd = counties.filter(ee.Filter.eq('STATEFP', '38'));
// Formatted county name function
var nd = nd.map(function(feature) {
var name = ee.String(feature.get('NAME')).toUpperCase().replace(' ', '', 'g');
return feature.set('NAME', name);
;
})
// Show the county boundary
var ndCounties = ee.Image().byte().paint({
featureCollection: nd,
color: null,
width: 1
;
})
// Add the counties layer
Map.addLayer(ndCounties, {}, 'ND Counties');
/// ——————————————Function and global variables——————————————————————————
// Function to read csv
function readCsvFile(selectedYear, selectedCrop) {
var fileName = selectedYear +'_'+ selectedCrop;
var csvFile = ee.FeatureCollection('projects/ee-songzimeng/assets/' + fileName);
return csvFile;
}
// Function to fomat county name
function processCountyColumn(table) {
var countyColumnName = 'County';
function processCountyName(countyName) {
return ee.String(countyName).toUpperCase().replace('\\s+', '');
}
var processedCountyColumn = table.map(function(feature) {
var countyName = feature.get(countyColumnName);
var processedCountyName = processCountyName(countyName);
return feature.set(countyColumnName, processedCountyName);
;
})
// return FeatureCollection
return processedCountyColumn;
}
var selectedCrop='Select...';
var selectedYear='Select...';
var soybeanPrice = 11.90; // 2024 average
var CornPrice = 41.68; // 2024 average
var wheatPrice = 6.07; // 2024 average
var cropPrice = 0; //
var crops = {
'Corn': 1,
'Wheat': 23,
'Soybean': 5
;
}
// ————————————————interface——————————————————————————
// set default year
var defaultYear = '2018';
var cropYieldLayer = null;
var statsLabel_1 = ui.Label('Click on County to see info:');
var statsLabel_2 = ui.Label('Select an area to see info:');
// set original info status
.style().set('shown', true);
statsLabel_1.style().set('shown', false);
statsLabel_2
// Clear button to remove all selected layers
var drawingTools = Map.drawingTools();
var clearButton = ui.Button({
label: 'Clear Selected Area',
onClick: function() {
var layers = drawingTools.layers();
.forEach(function(layer) {
layers.layers().remove(layer);
drawingTools;
})
.clear();
resultsPanel,
}style: {margin: '10px'}
;
})
// the main panel to select mode, year, croptype
var panel = ui.Panel({
widgets: [
.Label('North Dakota Crop Yield', {
uifontWeight: 'bold',
fontSize: '22px',
textAlign: 'center',
stretch: 'horizontal'
,
})
.Label('Select Mode:'),
ui.Select({
uiitems: ['Select...','County Level', 'Area Level'],
value: 'Select...',
onChange: function(mode) {
// operate different
if (mode === 'County Level') {
// County Level
.style().set('shown', true);
statsLabel_1.style().set('shown', false);
statsLabel_2
// reset button
.remove(clearButton);
panel.add(clearButton);
panel
// ban polygon drawing selection
var drawingTools = Map.drawingTools();
.setShown(false);
drawingTools
//Function for getting value from image
var getCalculation = function(countyName, cropYieldLayer) {
var county = nd.filter(ee.Filter.eq('NAME', countyName)).first();
var countyGeometry = county.geometry();
//print(selectedYear, selectedCrop);
var countyData=readCsvFile(selectedYear, selectedCrop);
// print(countyData);
= processCountyColumn(countyData);
countyData
.clear();
resultsPanel
var countStats = cropYieldLayer.reduceRegion({
reducer: ee.Reducer.count(),
geometry: countyGeometry,
scale: 30,
maxPixels: 1e9
;
})//print(countStats);
var selectedCounty = countyData.filter(ee.Filter.eq('County', countyName));
var averYield = selectedCounty.reduceColumns({
reducer: ee.Reducer.mean(),
selectors: ['Value']
;
})//print(averYield);
// create labels
var countyLabel = ui.Label({
value: 'County: ' + countyName,
style: {fontSize: '13px', padding: '0px 50px'}
;
})
var count_sumLabel = ui.Label({
value: 'Calculating...',
style: {fontSize: '13px', padding: '0px 50px'}
;
})
// update labels by calculating
// get the mean yield data
.evaluate(function(result) {
averYieldvar meanYield = result.mean;
var count_averYieldLabel = ui.Label({
value: 'Average Yield: ' + meanYield.toFixed(2) + ' BU/Acre',
style: {fontSize: '13px', padding: '0px 50px'}
;
}).add(count_averYieldLabel);
resultsPanel;
})
// calculate the area and total yield
.get('YIELDpredicted').evaluate(function(value){
countStats
var areaInSqKm = (value / 1e6) * 900;
var areaInAcres = areaInSqKm * 247.105;
.setValue('Crop Area: ' + areaInSqKm.toFixed(2) +
count_sumLabel' km² (' + areaInAcres.toFixed(2) + ' Acres)');
.evaluate(function(result) {
averYieldvar meanYield = result.mean;
var totalYield = areaInAcres * meanYield;
var count_totalYieldLabel = ui.Label({
value: 'Total Yield: ' + totalYield.toFixed(2) + ' BU',
style: {fontSize: '13px', padding: '0px 50px'}
;
})var yieldPrice = totalYield * cropPrice;
var yieldPriceLabel = ui.Label({
value: 'Total Yield Value: ' + yieldPrice.toFixed(2) + ' $',
style: {fontSize: '13px', padding: '0px 50px'}
;
}).add(count_totalYieldLabel);
resultsPanel.add(yieldPriceLabel);
resultsPanel;
});
})
// add the new label to sub-panel
.add(countyLabel);
resultsPanel.add(count_sumLabel);
resultsPanel;
}
Map.unlisten()
// create onclick function
Map.onClick(function(coords) {
var point = ee.Geometry.Point(coords.lon, coords.lat);
var county = ee.Feature(nd.filterBounds(point).first());
var countyName = county.get('NAME');
.evaluate(function(name) {
countyNamegetCalculation(name, cropYieldLayer);
;
})
})
// Area level
else if (mode === 'Area Level') {
}
.style().set('shown', false);
statsLabel_1.style().set('shown', true);
statsLabel_2
// delet onclick monitor
Map.unlisten()
//reset button
.remove(clearButton);
panel.add(clearButton);
panel
// draw polygon
var drawingTools = Map.drawingTools();
.setShown(true);
drawingTools
// function under area level
function initializeAreaLevelMode() {
// create a new drawing tools
var drawingTools = Map.drawingTools();
.setShown(true);
drawingTools
.onDraw(function(geometry) {
drawingTools// get the polygon user drawing
var userPolygon = geometry;
// calculate pixels number inside the polygon user draw
var pixelCount = cropYieldLayer.reduceRegion({
reducer: ee.Reducer.count(),
geometry: userPolygon,
scale: 30,
maxPixels: 1e9
;
})
//calculate average yield user draw
var meanStats = cropYieldLayer.reduceRegion({
reducer: ee.Reducer.mean(),
geometry: userPolygon,
scale: 30,
maxPixels: 1e9
;
})// print(meanStats)
// combined 2 results
var results = ee.Dictionary({
meanYield: meanStats.get('YIELDpredicted'),
pixelCount: pixelCount.get('YIELDpredicted')
;
})
// calculate average yield, crop area, total yield, and update labels
.evaluate(function(values) {
results.clear();
resultsPanel
var area_sumLabel = ui.Label({
value: 'Calculating...',
style: {fontSize: '14px', padding: '0px 50px'}
;
})
var meanYield_sumLabel = ui.Label({
value: 'Calculating...',
style:{fontSize: '14px', padding: '0px 50px'}
;
})
var count_totalYieldLabel = ui.Label({
value: 'Calculating...',
style:{fontSize: '14px', padding: '0px 50px'}
;
})
.add(area_sumLabel);
resultsPanel.add(meanYield_sumLabel);
resultsPanel.add(count_totalYieldLabel);
resultsPanel
.setValue('Average Yield: ' + values.meanYield.toFixed(2) + ' BU/Acre');
meanYield_sumLabel
var areaInSqKm = (values.pixelCount / 1e6) * 900;
var areaInAcres = areaInSqKm * 247.105;
.setValue('Crop Area: ' + areaInSqKm.toFixed(2) +
area_sumLabel' km² (' + areaInAcres.toFixed(2) + ' Acres)');
var totalYield = areaInAcres * values.meanYield;
.setValue('Total Yield: ' + totalYield.toFixed(2) + ' BU');
count_totalYieldLabel
var yieldPrice = totalYield * cropPrice;
var yieldPriceLabel = ui.Label({
value: 'Total Yield Value: ' + yieldPrice.toFixed(2) + ' $',
style: {fontSize: '13px', padding: '0px 50px'}
;
}).add(yieldPriceLabel);
resultsPanel
;
})
;
})
}initializeAreaLevelMode();
}
},
})
.Label('Select Year:'),
ui.Select({
uiitems: ['Select...', '2018', '2019', '2020',
'2021', '2022', '2023', '2024'],
value: 'Select...',
onChange: function(year) {
// update global variable selectedYear, the year user chose
= year;
selectedYear updateMap();
},
})
.Label('Select Crop:'),
ui.Select({
uiitems: ['Select...', 'Soybean', 'Corn', 'Wheat'],
value: 'Select...',
onChange: function(crop) {
= crop;
selectedCrop
// set cropPrice according to selected
if (selectedCrop === 'Soybean') {
= 11.90;
cropPrice else if (selectedCrop === 'Wheat') {
} = 6.07;
cropPrice else if (selectedCrop === 'Corn') {
} = 5.80;
cropPrice else {
} = 0;
cropPrice
}
updateMap();
},
})
,
statsLabel_1
statsLabel_2,
]style: {position: 'top-right'}
;
})
Map.add(panel);
// Add a sub-panel to show calculation info
var resultsPanel = ui.Panel({
layout: ui.Panel.Layout.Flow('vertical'),
style: {width: '310px'}
;
}).add(resultsPanel);
panel
// update new layers accoording to user's selection
function updateMap() {
// // Remove particular layers
// Map.layers().forEach(function(layer) {
// var layerName = layer.getName();
// if (layerName.indexOf('YIELD_') === 0) {
// Map.remove(layer);
// }
// });
Map.layers().reset();
// Show layers if user choose both selections
if (selectedYear !== 'Select...' && selectedCrop !== 'Select...') {
= cropLayers[selectedCrop][selectedYear];
cropYieldLayer
if (cropYieldLayer) {
var layerName = selectedCrop + '_' + selectedYear;
Map.addLayer(cropYieldLayer, {}, 'YIELD_' + layerName);
}
}
// add the counties layer
Map.addLayer(ndCounties, {}, 'ND Counties');
}