Cereal yield in West Africa¶

Introduction¶

This case study follows the paper entitled “Cereal yield response to conservation agriculture practices in drylands of West Africa: A quantitative synthesis” by Bayala et al. (2012) which was published in the Journal of Arid Environments. You can access the article here (it is behind a paywall). The data was made available on the World Agroforestry Center data repository.

This study provides a quantitative synthesis of the effect of conservation agriculture practices on crop yield in the drylands of West Africa.

Get the data¶

We can download the data with the get_data_from_uri from the agro package.

ff <- agro::get_data_from_uri("doi:10.34725/DVN/1BLSQY", ".")
ff
## [1] "./doi_10.34725_DVN_1BLSQY/Data of CAWT Septembe 2010_2.xls"

You can see that we have a single Excel file that we can read with readxl::read_excel. But first we look at the names of the “sheets” in the file.

readxl::excel_sheets(ff)
## [1] "Rawdata"      "Grain"        "Straw"        "Grainfor SAS" "StrawforSAS"

There are five sheets. We need the sheets that have the grain and straw data.

grain <- readxl::read_excel(ff, sheet = "Grain")
straw <- readxl::read_excel(ff, sheet = "Straw")
dim(straw)
## [1] 443  22
dim(grain)
## [1] 613  22
colnames(grain)
##  [1] "Technology"       "PubN"             "Author"           "Year"
##  [5] "Journal"          "Country"          "Region"           "Site"
##  [9] "Elevation"        "Rainfall"         "Rainfall pattern" "Research type"
## [13] "Soil type"        "Season"           "Species"          "Treatment"
## [17] "Crop"             "Component"        "Treatment yield"  "Control yield"
## [21] "Yield Gap"        "% Yield Increase"

We wanted to quickly look at the “Rawdata” sheet too, to understand the difference.

raw <- readxl::read_excel(ff, sheet = "Rawdata")
dim(raw)
## [1] 1155   22
dim(straw) + dim(grain)
## [1] 1056   44
table(raw$Component)
##
## Biomass  Calice  Cotton   Fruit   grain   Grain  Haulms  Leaves    Pods  Stover
##      38       2      11       7      20     593      21       3      12       2
##  Straws   Tuber
##     443       3

It appears that records in Rawdata that are not straw or grain have been ignored. Creating multiple tables with subsets of the data is generally a bad idea because it may lead to errors due to copying, or when an error is corrected in one of the tables but not everywhere. We could have created the grain and straw subsets like this (showing for grain only)

grain2 <- raw[tolower(raw$Component) == "grain", ]

To test if this is indeed gives us the same values, we first sort the both datasets.

grain  <- grain[order(grain$PubN, grain$Treatment), ]
grain2 <- grain2[order(grain2$PubN, grain2$Treatment), ]

And then we can compare them

all(grain == grain2, na.rm=TRUE)
## [1] TRUE

Quod erat demonstrandum

Organize data¶

We need to fix some values in the Technology variable in both tables.

fixme <- function(x) {
  x$Technology[x$Technology == "Coppiced G. manure"] <- "Coppicing"
  x$Technology[x$Technology == "Rot-Ass"] <- "Rotations"
  x$Technology[x$Technology == "SWC"] <- "Soil water"
  x
}
grain <- fixme(grain)
straw <- fixme(straw)

And we compute two new variables, to be used below in the analysis. D is the difference between the treatment and control yields. productivity classifies the control yield into 5 groups.

newvars <- function(x) {
  x$D <- x$`Treatment yield` - x$`Control yield`
  x$productivity = cut(x$`Control yield`, c(0, 0.5, 1, 1.5, 2, Inf))
}
grain <- newvars(grain)
straw <- newvars(straw)

Tables 1 and 2¶

Now we can attempt to reproduce Table 1 from the article: Summary of statistics of mean difference in grain yield (D, t/ha) across all crops in different practices of conservation agriculture in four West African Sahelian countries (Burkina Faso, Mali, Niger and Senegal).

First we compute the summary statistics. We use a function, so that we can use the same code for the next table.

make_table <- function(x) {
  # unique number of publications by technology
  N <- tapply(x$PubN, x$Technology, function(i) length(unique(i)))
  # observations by technology
  k <- table(x$Technology)
  # mean yield gain by technology
  mean <- tapply(x$D, x$Technology, mean)
  # quartiles of yield gain by technology
  q <- tapply(x$D, x$Technology, function(i) quantile(i, c(.75, .5, .25)))
  q <- do.call(cbind, q)
  iqr <- q[1,] - q[3,]
  ## count cases within classes
  # classify
  ycls <- cut(x$D, c(-Inf, 0, 1, 2, Inf))
  # tabulate
  z <- table(ycls, x$Technology)
  # compute percentage. Note that we need to transpose to divide
  # correctly, and then transpose back
  z <- 100 * t(t(z) / colSums(z))
  ## combine results
  tab <- rbind(N, k, mean, q, iqr, z)
  tab <- round(tab, 2)
  ## ordering
  tab <- tab[, c("Parkland", "Coppicing", "Green manure", "Mulching", "Rotations", "Soil water")]
  tab
}

And we use the function

tab1 <- make_table(grain)
## Error in x$Technology: $ operator is invalid for atomic vectors
tab1
## Error in eval(expr, envir, enclos): object 'tab1' not found

Results look very similar. Except for the classes counts. But the published table is odd. There are intervals missing, and it is unclear what the exact intervals are; so we’ll just leave it at that.

Recreating Table 2, Summary of statistics of mean difference in straw biomass yield (D, t/ha) in different practices of conservation agriculture in four West African Sahelian countries (Burkina Faso, Mali, Niger and Senegal) is now straightforward, because we can re-use the function defined above for Table 1.

tab2 <- make_table(straw)
## Error in x$Technology: $ operator is invalid for atomic vectors
tab2
## Error in eval(expr, envir, enclos): object 'tab2' not found

Results are similar. But note, for example, that we find k=28 for Parkland, whereas the paper reports 26.

Figure 1.¶

Figure 1 shows Variation in mean difference in yield with management practices in four West African Sahelian countries (Burkina Faso, Mali, Niger and Senegal). Vertical bars indicate 95% confidence bands.

# Check the crop types
sort(unique(grain$Crop))
## Error in grain$Crop: $ operator is invalid for atomic vectors

We need to create a column with “clean” crop names. I left “Millet-sorghum” observations out, as we do not know if it is millet or sorghum.

### Select data for sorghum
grain$cropname <- ""
## Warning in grain$cropname <- "": Coercing LHS to a list
grain$cropname[substr(grain$Crop, 1, 7) == "Sorghum"] <- "Sorghum"
grain$cropname[grain$Crop %in% c("White sorghum", "Red sorghum")] <- "Sorghum"
grain$cropname[substr(grain$Crop, 1, 5) == "Maize"] <- "Maize"
grain$cropname[grain$Crop == "Millet"] <- "Millet"
# check by making a table
tab <- table(grain$Crop, grain$cropname)
## Error in table(grain$Crop, grain$cropname): all arguments must have the same length
head(tab, 15)
## Error in eval(expr, envir, enclos): object 'tab' not found

Now aggregate for the three crops of interest

par(mfrow=c(1,3))
for (crop in c("Sorghum", "Millet", "Maize")) {
    plotmeans(D~Technology, data=grain, subset=cropname==crop, connect=FALSE,
          xlab= "", ylab="Grain yield", las=2, ylim=c(-1, 1.6))
    text(1, 1.5, crop)
}
## Error in plotmeans(D ~ Technology, data = grain, subset = cropname == : could not find function "plotmeans"

Figs 2 and 3¶

As figure 2 and 3 are similar, Here figure 3 is replicated. Variation in mean difference in yield (across cereal crops) with soil and water conservation practices and site productivity in four West African Sahelian countries (Burkina Faso, Mali, Niger and Senegal). The broken line represents the point where the treatment and control yields are equal. 1 and 2 defined low potential, 3 and 4 medium potential and above 4 defined high potential sites. Vertical bars represent standard errors.

g <- grain[grain$cropname != "", ]
## Error in grain[grain$cropname != "", ]: incorrect number of dimensions
g$productivity <- as.integer(g$productivity)
## Error in eval(expr, envir, enclos): object 'g' not found
sciplot::lineplot.CI(productivity, D, group=cropname, data=g,
          ci.fun=function(x) c(mean(x)-1.96 * sciplot::se(x), mean(x)+1.96 * sciplot::se(x)),
          xlab= "site productivity", ylab="Mean difference in yield", lwd=2)
## Error in loadNamespace(x): there is no package called 'sciplot'
abline(h=0, lwd=2, col="gray")
## Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...): plot.new has not been called yet

—————————–RMEL MODEL NOT SOLVED———————————-

3.5. Table 3. Effect of rainfall on cereal grain yield (across crops) response to conservation agriculture practices in four West African Sahelian countries (Burkina Faso, Mali, Niger and Senegal).¶

## Seperate data for grain yield of parkland based on rainfall class
The methods section of the paper states that "Rainfall (long-term average of total annual) of sites was also classified as low (<600 mm), medium (600-800 mm) and high (>800 mm)."
## Error: <text>:4:5: unexpected symbol
## 3:
## 4: The methods
##        ^

rain <- cut(grain_pt$Rainfall, c(0, 600, 800, Inf))
## Error in eval(expr, envir, enclos): object 'grain_pt' not found
levels(rain) <- c("low", "medium", "high")
## Error: object 'rain' not found
grain_pt$raincls <- rain
## Error in eval(expr, envir, enclos): object 'rain' not found

The authors also state that the effects of factors such as rainfall and site condition were determined by a mixed model analysis that allowed for average correlation of observations from the same study. That information is so sparse that we did not attempt to reproduce this model. But it would look like this, perhaps.

library(lme4)
lm_pt <- lmer(`Yield Gap` ~ Rainfall + (Rainfall|Site), data = grain_pt)