Ukraine GPP Analysis
Michael Wellington, Roger Lawes and Petra Kuhnert
25 January 2023
Ukraine-GPP-analysis.Rmd
Overview
The following script accompanies the paper by Wellington et al. (2022) that explores crop production and grain exports in Ukraine. This first vignette focuses on trends in crop production using Gross Primary Productivity (GPP).
Reading in relevant datasets
We load in GPP data clipped to Ukraine and load in the relevant shape
file. These datasets are also available in the /data
and
/inst/extdata
directories and can be read in using the
readRDS
and st_read
commands respectively.
library(UkraineCrops)
data(GPP_mod_df)
data(Ukr_bnds)
data(Ukr_FAOSTAT)
Fitting the Generalised Additive Model (GAM)
We fit a Generalised Additive Model (GAM) (Wood 2006) assuming a Gaussian distribution for the response variable, NPP and include smooth terms for year, season, space and relevant interactions, which consist of season within year, and space and time.
all_f <- GPP ~
# smooth term for year
s(year, bs="cr", k=12) +
# cyclic term for season
s(month, bs="cc", k=12) +
# smooth term for space
s(x,y, bs='gp', k=50) +
# seasonal within year
ti(month, year, bs = c("cc", "cr"), k = c(12,12)) +
# space x time
ti(x, y, year, d = c(2, 1), bs = c("gp", "cr"),
k = c(50,13), m=list(2, NA))
We then add in the factor inv
which allows us to
investigate a pre and post invasion effect.
We also explore a GAM structure that fits a term reflecting the pre
(2012-21) and post war (2022) periods. This term which we refer to as
war
represents a binary variable where 1 indicates year
2022 when the war began.
all_f_war <- GPP ~
# smooth term for year
war +
# cyclic term for season
s(month, bs="cc", k=12, by=war) +
# smooth term for space
s(x,y, bs='gp', k=50, by=war) +
# space x time
ti(x, y, month, d = c(2, 1), bs = c("gp", "cr"),
k = c(50,12), m=list(2, NA), by=war)
We now fit each model using the mgcv
package in
R
and store the results. To accommodate temporal
dependencies we fit a correlation term in each model that was the result
of an iterative process as outlined in Wood et al. (2017). The final
model fit for the invasion model is presented here from which we extract
effect direction and size.
# Run GAMs for large datasets
all_gam <- bam(all_f, data=GPP_mod_df, discrete=TRUE, nthreads=8, rho=0.8)
all_gam_war <- bam(all_f_war, data=GPP_mod_df, discrete=TRUE, nthreads=8, rho=0.6)
# Output shows effect of the invasion period on cropland GPP
summary(all_gam_war)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## GPP ~ war + s(month, bs = "cc", k = 12, by = war) + s(x, y, bs = "gp",
## k = 50, by = war) + ti(x, y, month, d = c(2, 1), bs = c("gp",
## "cr"), k = c(50, 12), m = list(2, NA), by = war)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.911e-02 9.465e-06 2018.918 < 2e-16 ***
## warPost-invasion -2.472e-04 4.176e-05 -5.919 3.24e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(month):warPre-invasion 9.998 10.00 388543.14 <2e-16 ***
## s(month):warPost-invasion 9.987 10.00 30197.84 <2e-16 ***
## s(x,y):warPre-invasion 48.530 48.92 3377.65 <2e-16 ***
## s(x,y):warPost-invasion 44.607 45.92 22.75 <2e-16 ***
## ti(month,x,y):warPre-invasion 490.729 519.96 630.80 <2e-16 ***
## ti(month,x,y):warPost-invasion 366.474 419.60 86.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.862 Deviance explained = 86.2%
## fREML = -5.3045e+06 Scale est. = 2.4779e-05 n = 1292171
Model Checking
We examine the fit of the model and check for temporal and spatial autocorrelation in the residuals. These diagnostics are examined below and sufficiently account for correlation in the data with the given model structure.
# Temporal autocorrelation plot
check_resid(all_gam, ask=FALSE)
# Variogram for spatial autocorrelation
resids <- residuals.gam(all_gam)
data_pred <- data.frame(resids = resids, long = GPP_mod_df$x, lat = GPP_mod_df$y)
coordinates(data_pred) <- ~long+lat
# Select a sample for compute efficiency
var_plot <- variogram (resids ~ 1, data = data_pred[sample(1:nrow(data_pred), 10000),])
plot(var_plot)
Investigating trends and agreement with ground data
We generate a summed effect of year plot, extract data from it, and pass it to a ggplot2 object.
Ukr_year_smplot <- plot_smooth(all_gam, view='year', rm.ranef = F)
## Summary:
## * year : numeric predictor; with 30 values ranging from 2010.000000 to 2022.000000.
## * month : numeric predictor; set to the value(s): 7.
## * x : numeric predictor; set to the value(s): 31.7105295.
## * y : numeric predictor; set to the value(s): 48.9581829.
Ukr_year_sm_ggplot <- ggplot() + geom_line(aes(y=Ukr_year_smplot$fv$fit,
x=Ukr_year_smplot$fv$year)) +
geom_ribbon(aes(x=Ukr_year_smplot$fv$year,
ymax= Ukr_year_smplot$fv$ul, ymin=Ukr_year_smplot$fv$ll), alpha=0.2) +
scale_x_continuous(breaks=seq(2010,2022,1), limits = c(2009.5,2022)) + theme_bw() + ylim(c(0,0.055)) +
labs(x="Year", y=bquote('GPP'~(kgC/m^2)))
We will also generate a bar plot showing the total crop production tonnage in Ukraine from 2010 to 2020 and the crops comprising it. These data were taken from the FAOSAT database.
Ukr_grain_dat <- Ukr_FAOSTAT %>% filter(Element == "Production")
Ukr_real_grain <- ggplot() + geom_bar(data=Ukr_grain_dat, aes(fill=Item, y=Value, x=Year),
position="stack", stat="identity") + scale_fill_npg() +
theme_bw() + labs(fill='Crop Type', y= 'Total tonnes produced') +
scale_y_continuous(labels = unit_format(unit = "M", scale = 1e-6)) +
scale_x_continuous(breaks=seq(2010,2022, 1), limit=c(2009.5, 2022))
Before we format the final plot, we’ll perform a correlation test to check agreement between total crop production from FAOSTAT data and our NPP model. First of all, we extract the prediction terms for the year effect from our GAM.
Ukr_annual_preds_terms <- predict(all_gam, type="iterms", ## Predict without new data
se.fit=TRUE)
head(Ukr_annual_preds_terms$fit)
## s(year) s(month) s(x,y) ti(month,year) ti(year,x,y)
## 1 -0.0007233708 -0.01661777 -0.001468356 0.001645004 -0.0027265129
## 2 -0.0007233708 -0.01661777 -0.001402574 0.001645004 -0.0023444105
## 3 -0.0007233708 -0.01661777 -0.001000935 0.001645004 -0.0007011057
## 4 -0.0007233708 -0.01661777 -0.001042145 0.001645004 -0.0009392004
## 5 -0.0007233708 -0.01661777 -0.001069893 0.001645004 -0.0011056382
## 6 -0.0007233708 -0.01661777 -0.001463295 0.001645004 -0.0018297377
Ukr_annual_preds_fit <- as.data.frame(Ukr_annual_preds_terms$fit)
Ukr_annual_preds_fit$year <- GPP_mod_df$year
names(Ukr_annual_preds_fit) <- c('year_hat', 'month_hat', 'space_hat', 'yxm_hat', 'yxs_hat', 'year')
Ukr_annual_terms <- Ukr_annual_preds_fit %>% group_by(year) %>%
summarise(year_term=mean(year_hat)) %>% filter(year<2021)
We’ll also summarise the crop production data and create a dataframe prior to the correlation test.
Ukr_grainprod_annual <- Ukr_grain_dat %>% group_by(Year) %>% filter(Element =="Production") %>%
summarise(tonnes=sum(Value))
Ukr_corr_df <- bind_cols(Ukr_grainprod_annual$tonnes, Ukr_annual_terms$year_term)
## New names:
## • `` -> `...1`
## • `` -> `...2`
Let’s inspect the relationship to see if it is linear before formally conducting the test.
ggscatter(data = Ukr_corr_df, x = 'YearTerms', y = 'RealTonnes',
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "Predicted GPP term", ylab = "Real T crop produced")
## `geom_smooth()` using formula 'y ~ x'
Check both columns for normality and conduct a correlation test.
shapiro.test(Ukr_grainprod_annual$tonnes)
##
## Shapiro-Wilk normality test
##
## data: Ukr_grainprod_annual$tonnes
## W = 0.91013, p-value = 0.2447
shapiro.test(Ukr_annual_terms$year_term)
##
## Shapiro-Wilk normality test
##
## data: Ukr_annual_terms$year_term
## W = 0.93534, p-value = 0.4674
cor.test(Ukr_grainprod_annual$tonnes, Ukr_annual_terms$year_term, method="pearson", alternative="greater")
##
## Pearson's product-moment correlation
##
## data: Ukr_grainprod_annual$tonnes and Ukr_annual_terms$year_term
## t = 2.7352, df = 9, p-value = 0.01151
## alternative hypothesis: true correlation is greater than 0
## 95 percent confidence interval:
## 0.2317292 1.0000000
## sample estimates:
## cor
## 0.6737394
Finally, we can plot the summed effect of year alongside the real production data.
Ukr_realagree <- ggarrange(Ukr_year_sm_ggplot,Ukr_real_grain,
labels = NULL,
ncol = 1, nrow = 2,
common.legend = TRUE, legend = "bottom",
align = "hv",
font.label = list(size = 10, color = "black", face = "bold", family = NULL, position = "top"))
Ukr_realagree
Monthly effect and crop calendar
We’d also like to inspect the effect of the month on cropland NPP pre and post invasion. We can do this with the plot_smooth function again. The code chunk below also produces a crop calendar adapted from the UN FAO GIEWS Factsheet.
month_war_plot <- plot_smooth(all_gam_war, view="month",plot_all="war", rm.ranef = F)
## Summary:
## * war : factor; set to the value(s): Post-invasion, Pre-invasion.
## * month : numeric predictor; with 30 values ranging from 1.000000 to 12.000000.
## * x : numeric predictor; set to the value(s): 31.7105295.
## * y : numeric predictor; set to the value(s): 48.9581829.
month_war_plot_dat <- month_war_plot$fv %>% select(war, month, fit, ll, ul) %>%
filter(war=="Pre-invasion" | month<12&month>2.8)
month_war_plot_dat$month_abb <- month.abb[month_war_plot_dat$month]
month_comp_plot <- ggplot(data=month_war_plot_dat) + geom_line(aes(x=month, y=fit, col=war)) +
geom_ribbon(aes(ymin=ll, ymax=ul, x=month, fill=war), alpha=0.2) + theme_bw() +
scale_x_continuous(breaks = 1:12,
labels = c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')) +
labs(y=bquote('GPP'~(kgC/m^2)), x='Month') +
theme(legend.title=element_blank()) + scale_fill_npg()
colors <- c("Sowing" = "brown", "Growing" = "darkgreen", "Harvest" = "orange")
Ukr_crop_cal <- ggplot() +
geom_linerange(aes(x= "Winter Cereals", ymin=as.Date("2013-01-01"), ymax=as.Date("2013-06-30"), color="Growing"), size=5) +
geom_linerange(aes(x= "Winter Cereals", ymin=as.Date("2013-06-30"), ymax=as.Date("2013-08-20"), color="Harvest"), size=5) +
geom_linerange(aes(x= "Winter Cereals", ymin=as.Date("2013-09-01"), ymax=as.Date("2013-10-30"), color="Sowing"), size=5) +
geom_linerange(aes(x= "Winter Cereals", ymin=as.Date("2013-10-30"), ymax=as.Date("2013-12-31"), color="Growing"), size=5) +
geom_linerange(aes(x= "Maize", ymin=as.Date("2013-05-01"), ymax=as.Date("2013-06-10"), color="Sowing"), size=5) +
geom_linerange(aes(x= "Maize", ymin=as.Date("2013-06-10"), ymax=as.Date("2013-10-01"), color="Growing"), size=5) +
geom_linerange(aes(x= "Maize", ymin=as.Date("2013-10-01"), ymax=as.Date("2013-11-30"), color="Harvest"), size=5) +
geom_linerange(aes(x= "Sunflower", ymin=as.Date("2013-04-01"), ymax=as.Date("2013-06-01"), color="Sowing"), size=5) +
geom_linerange(aes(x= "Sunflower", ymin=as.Date("2013-06-01"), ymax=as.Date("2013-08-31"), color="Growing"), size=5) +
geom_linerange(aes(x= "Sunflower", ymin=as.Date("2013-08-31"), ymax=as.Date("2013-10-30"), color="Harvest"), size=5) +
geom_linerange(aes(x= "Rapeseed", ymin=as.Date("2013-01-01"), ymax=as.Date("2013-06-30"), color="Growing"), size=5) +
geom_linerange(aes(x= "Rapeseed", ymin=as.Date("2013-06-30"), ymax=as.Date("2013-08-31"), color="Harvest"), size=5) +
geom_linerange(aes(x= "Rapeseed", ymin=as.Date("2013-08-31"), ymax=as.Date("2013-10-01"), color="Sowing"), size=5) +
geom_linerange(aes(x= "Rapeseed", ymin=as.Date("2013-10-01"), ymax=as.Date("2013-12-31"), color="Growing"), size=5) +
geom_linerange(aes(x= "Potatoes", ymin=as.Date("2013-04-01"), ymax=as.Date("2013-06-01"), color="Sowing"), size=5) +
geom_linerange(aes(x= "Potatoes", ymin=as.Date("2013-06-01"), ymax=as.Date("2013-09-01"), color="Growing"), size=5) +
geom_linerange(aes(x= "Potatoes", ymin=as.Date("2013-09-01"), ymax=as.Date("2013-10-31"), color="Harvest"), size=5) +
coord_flip() + scale_y_date(lim = c(as.Date("2013-01-01"), as.Date("2013-12-31")),breaks=date_breaks(width = "1 month"), labels = date_format("%b"))+
labs(x="", y="Month") + theme_bw()+ scale_color_manual(values = colors) + theme(legend.title=element_blank())
Now we can combine the plots.
Ukr_cropplots <- ggdraw() +
draw_plot(month_comp_plot, x=0.075, y=0.5, width=0.91, height=0.5) +
draw_plot(Ukr_crop_cal, x=0, y=0, width=1, height=0.5)+
theme(plot.background = element_rect(fill="white", color = NA))
Ukr_cropplots
Generate a spatial plot with uncertainty
We will use the Vizumap package to compare the distribution of NPP pre and post invasion and visualise uncertainty associated with the estimate (Lucchesi, 2021). First, we extract data from the GAMs.
space_nowar <- fvisgam(all_gam_war, view = c("x", "y"),
cond = list(war = "Pre-invasion"),
main = "Pre-invasion", rm.ranef = T,
n.grid=100,
too.far=0.01)
## Summary:
## * war : factor; set to the value(s): Pre-invasion.
## * month : numeric predictor; set to the value(s): 7.
## * x : numeric predictor; with 100 values ranging from 22.188388 to 40.154693.
## * y : numeric predictor; with 100 values ranging from 44.466606 to 52.371781.
## * NOTE : The following random effects columns are canceled: s(month):warPre-invasion,s(month):warPost-invasion
##
# Extract data for plotting spatial term
space_nowar_dat <- space_nowar$fv
space_nowar_dat$SE <- space_nowar_dat$CI/1.96
# Rpt for war
space_war <- fvisgam(all_gam_war, view = c("x", "y"),
cond = list(war = "Post-invasion"),
main = "Post-invasion", rm.ranef = T,
n.grid=100,
too.far=0.01)
## Summary:
## * war : factor; set to the value(s): Post-invasion.
## * month : numeric predictor; set to the value(s): 7.
## * x : numeric predictor; with 100 values ranging from 22.188388 to 40.154693.
## * y : numeric predictor; with 100 values ranging from 44.466606 to 52.371781.
## * NOTE : The following random effects columns are canceled: s(month):warPre-invasion,s(month):warPost-invasion
##
# Extract data for plotting spatial term
space_war_dat <- space_war$fv
space_war_dat$SE <- space_war_dat$CI/1.96
Next, we use Vizumap functions to build a palette and colour key.
# Create palette
gpp_pal <- build_palette(name = "usr",
colrange = list(colour = c("chartreuse4", "darkblue"),
difC = c(3, 4)))
view(gpp_pal)
# creating df for Vizumap plotting
space_gpp_dat <- bind_rows(space_nowar_dat, space_war_dat)
gpp_war_df <- read.uv(data = space_gpp_dat, estimate = "fit",
error = "SE")
names(gpp_war_df)[c(1, 5,6)] <- c("Estimate", "long", "lat")
# build key
UKey <- build_bkey(data = gpp_war_df, terciles = T, palette = gpp_pal)
k <- view(UKey)
nppBivMap <- build_bmap(data = gpp_war_df, terciles = T,
palette = gpp_pal)
Finally, we can build the Vizumap plot.
obj <- nppBivMap
obj_cropped <- exclude.too.far(obj$output_data$long, obj$output_data$lat,
GPP_mod_df$x, GPP_mod_df$y, dist=0.015)
m <- ggplot() +
geom_raster(data = obj$output_data[!obj_cropped,],
aes(x = long, y = lat, fill = hex_code)) +
facet_wrap(~war, nrow=2) + scale_fill_identity() + coord_quickmap() +
xlab("") + ylab("") + geom_sf(data=Ukr_bnds, alpha=0) + theme_bw()
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
mk <- ggdraw() +
draw_plot(m, x=0, y=0, width=0.6, height=1) +
draw_plot(k, x=0.6, y=0, width = 0.4, height=1)+
theme(plot.background = element_rect(fill="white", color = NA))
mk
References
Lucchesi, L., Kuhnert, P., Wikle, C. (2021). Vizumap: an R package for visualising uncertainty in spatial data. Journal of Open Source Software, 6(59), 2409, https://doi.org/10.21105/joss.02409
Wellington, M., Lawes, R., Kuhnert, P. (2022) Rapid monitoring of crop growth, grain exports, and fire patterns in Ukraine.
Wood, S. N., Li, Z., Shaddick, G. & Augustin, N. H. (2017) Generalized Additive Models for Gigadata: Modeling the U.K. Black 166 Smoke Network Daily Data. J. Am. Stat. Assoc. 112, 1199–1210, DOI: 10.1080/01621459.2016.1195744. Wood, S. N., Li, Z., Shaddick, G. & Augustin, N. H. (2017) Generalized Additive Models for Gigadata: Modeling the U.K. Black 166 Smoke Network Daily Data. J. Am. Stat. Assoc. 112, 1199–1210, DOI: 10.1080/01621459.2016.1195744.
Wood, S. N. (2006) Generalized additive models: an introduction with R, Chapman and Hall/CRC.