Soil compaction effect on rice

1. Background and introduction

This case study shows how to do principal component analysis and stepwise multiple regression in R.

Here we follow the analysis of the study by Singh et al (2017), entitled “Depth of soil compaction predominantly affects rice yield reduction by reproductive-stage drought at varietal screening sites in Bangladesh, India, and Nepal”. The study was published in the journal Plant Soil. You can access the article (it is behind a paywall) online. The data was also made available on Harvard Dataverse and can be downloaded.

— Things to be noted here: Variables released in the dataset are not the same as in the paper. Results in the case study are different from the paper.—

2. Get the data

library(agro)
ff <- get_data_from_uri("https://doi.org/10.7910/DVN/SNQZTT", ".")
ff
## [1] "./doi_10.7910_DVN_SNQZTT/all data for PCA w imputed.xlsx"
## [2] "./doi_10.7910_DVN_SNQZTT/penetrometer.csv"
## [3] "./doi_10.7910_DVN_SNQZTT/rainfall tensiom water table.csv"
## [4] "./doi_10.7910_DVN_SNQZTT/soil texture.csv"
## [5] "./doi_10.7910_DVN_SNQZTT/Variable Descriptions.txt"

2.1. Import data

The data is provided as a tab file. This file can be read by read.table function in R.

library(readxl)
# Import table of data
wdata <- read_excel(ff[1])
# Check column names to see the variable provided
colnames(wdata)
##  [1] "Year"                        "location"
##  [3] "Ave yield control"           "Ave yield stress"
##  [5] "% yield reduction"           "Rainfall 60-100 DAS"
##  [7] "ave tensiom 60-100"          "max tensiom 60-100"
##  [9] "aver water table 60-100 DAS" "bd 5-10 cm"
## [11] "bd 25-30 cm"                 "pH"
## [13] "Avail-P ppm"                 "Exch-K ppm"
## [15] "% Clay"                      "% Silt"
## [17] "% Sand"                      "0.1 bar 5-10 cm"
## [19] "3 bar 5-10 cm"               "5 bar 5-10 cm"
## [21] "15 bar 5-10 cm"              "max penetrom"
## [23] "depth of max penetrom"       "penetrom 30 cm"

2.2. Subset data

Subset data for use and delete the information not use in the following analysis

wdata1 <- wdata[, !(colnames(wdata) %in% c("Year", "location", "Ave yield control", "Ave yield stress"))]

3. Explore data

3.1. Step-wise multiple regression for table 5 in the paper

Yield reduction by drought across three years of study as a function of rainfall and soil-related parameters as determined by step-wise multiple regression

3.1.1 First, use regsubsets(), which is a function in r for model selection

# Make a multiple linear regression model with all explanatory variables and yield reduction as the response variable
# Load packages
library(car)
## Loading required package: carData
# Using the dot notation for shorter code, which means everything else in the data frame
model_full <- lm(`% yield reduction` ~ ., wdata1)
# Start with all variables and develop a good model using regsubsets() from the leaps package. Select the model with the minimum BIC
# Load packages
library(leaps)
## Warning: package 'leaps' was built under R version 4.3.1
# Using regsubset
model_regsub <- regsubsets(`% yield reduction` ~ ., data = wdata1)
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 9 linear dependencies found
# View results
summary(model_regsub)
## Subset selection object
## Call: eval(expr, envir, enclos)
## 19 Variables  (and intercept)
##                               Forced in Forced out
## `Rainfall 60-100 DAS`             FALSE      FALSE
## `ave tensiom 60-100`              FALSE      FALSE
## `max tensiom 60-100`              FALSE      FALSE
## `aver water table 60-100 DAS`     FALSE      FALSE
## `bd 5-10 cm`                      FALSE      FALSE
## `bd 25-30 cm`                     FALSE      FALSE
## pH                                FALSE      FALSE
## `Avail-P ppm`                     FALSE      FALSE
## `Exch-K ppm`                      FALSE      FALSE
## `% Clay`                          FALSE      FALSE
## `% Silt`                          FALSE      FALSE
## `% Sand`                          FALSE      FALSE
## `0.1 bar 5-10 cm`                 FALSE      FALSE
## `3 bar 5-10 cm`                   FALSE      FALSE
## `5 bar 5-10 cm`                   FALSE      FALSE
## `15 bar 5-10 cm`                  FALSE      FALSE
## `max penetrom`                    FALSE      FALSE
## `depth of max penetrom`           FALSE      FALSE
## `penetrom 30 cm`                  FALSE      FALSE
## 1 subsets of each size up to 8
## Selection Algorithm: exhaustive
##          `Rainfall 60-100 DAS` `ave tensiom 60-100` `max tensiom 60-100`
## 1  ( 1 ) " "                   " "                  "*"
## 2  ( 1 ) " "                   " "                  "*"
## 3  ( 1 ) "*"                   " "                  "*"
## 4  ( 1 ) "*"                   " "                  "*"
## 5  ( 1 ) " "                   " "                  " "
## 6  ( 1 ) "*"                   " "                  " "
## 7  ( 1 ) "*"                   "*"                  " "
## 8  ( 1 ) "*"                   "*"                  "*"
##          `aver water table 60-100 DAS` `bd 5-10 cm` `bd 25-30 cm` pH
## 1  ( 1 ) " "                           " "          " "           " "
## 2  ( 1 ) " "                           " "          " "           "*"
## 3  ( 1 ) " "                           " "          " "           "*"
## 4  ( 1 ) " "                           " "          " "           " "
## 5  ( 1 ) "*"                           " "          " "           " "
## 6  ( 1 ) "*"                           "*"          "*"           "*"
## 7  ( 1 ) "*"                           "*"          " "           "*"
## 8  ( 1 ) "*"                           "*"          " "           "*"
##          `Avail-P ppm` `Exch-K ppm` `% Clay` `% Silt` `% Sand`
## 1  ( 1 ) " "           " "          " "      " "      " "
## 2  ( 1 ) " "           " "          " "      " "      " "
## 3  ( 1 ) " "           " "          " "      " "      " "
## 4  ( 1 ) " "           " "          " "      " "      " "
## 5  ( 1 ) " "           "*"          " "      "*"      " "
## 6  ( 1 ) " "           " "          " "      " "      " "
## 7  ( 1 ) " "           " "          "*"      " "      " "
## 8  ( 1 ) " "           " "          "*"      " "      " "
##          `0.1 bar 5-10 cm` `3 bar 5-10 cm` `5 bar 5-10 cm` `15 bar 5-10 cm`
## 1  ( 1 ) " "               " "             " "             " "
## 2  ( 1 ) " "               " "             " "             " "
## 3  ( 1 ) " "               " "             " "             " "
## 4  ( 1 ) " "               " "             " "             "*"
## 5  ( 1 ) " "               " "             " "             "*"
## 6  ( 1 ) " "               " "             " "             " "
## 7  ( 1 ) " "               " "             " "             " "
## 8  ( 1 ) " "               " "             " "             " "
##          `max penetrom` `depth of max penetrom` `penetrom 30 cm`
## 1  ( 1 ) " "            " "                     " "
## 2  ( 1 ) " "            " "                     " "
## 3  ( 1 ) " "            " "                     " "
## 4  ( 1 ) " "            " "                     "*"
## 5  ( 1 ) "*"            " "                     " "
## 6  ( 1 ) " "            "*"                     " "
## 7  ( 1 ) " "            "*"                     " "
## 8  ( 1 ) " "            "*"                     " "
# Visually determine best model by BIC (subset( ) are bic, cp, adjr2, and rss.)
summary(model_regsub)$bic
## [1] -24.36176 -29.64764 -31.70518 -34.89176 -40.10164 -44.31511 -47.67645
## [8] -62.07997
subsets(model_regsub, statistic = "bic", legend = F)

image0

##                               Abbreviation
## `Rainfall 60-100 DAS`                  `6D
## `ave tensiom 60-100`            `vt60-100`
## `max tensiom 60-100`            `mt60-100`
## `aver water table 60-100 DAS`        `wt6D
## `bd 5-10 cm`                           `5c
## `bd 25-30 cm`                          `2c
## pH                                       p
## `Avail-P ppm`                          `Ap
## `Exch-K ppm`                           `Ep
## `% Clay`                                `C
## `% Silt`                             `%Sl`
## `% Sand`                             `%Sn`
## `0.1 bar 5-10 cm`                    `0b5c
## `3 bar 5-10 cm`                      `3b5c
## `5 bar 5-10 cm`                      `5b5c
## `15 bar 5-10 cm`                     `1b5c
## `max penetrom`                         `p`
## `depth of max penetrom`               `omp
## `penetrom 30 cm`                       `3c
# Creat the best model
model_reduced <- lm(`% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` + `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` + pH + `% Clay` + `depth of max penetrom`, data = wdata1)
summary(model_reduced)
##
## Call:
## lm(formula = `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     pH + `% Clay` + `depth of max penetrom`, data = wdata1)
##
## Residuals:
##        1        2        9       11       16       24       26       31
## -0.57505  0.37881 -0.27115  0.67720  0.54332  1.42357  0.08142  0.22219
##       33       36       38       40
## -0.55984 -0.33308 -1.24719 -0.34020
##
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    8.901e+02  1.005e+02   8.858  0.00303 **
## `Rainfall 60-100 DAS`         -4.606e-02  7.291e-03  -6.318  0.00802 **
## `ave tensiom 60-100`           1.518e+00  2.724e-01   5.572  0.01141 *
## `max tensiom 60-100`          -6.051e-01  1.989e-01  -3.042  0.05577 .
## `aver water table 60-100 DAS` -9.579e-01  1.218e-01  -7.861  0.00429 **
## `bd 5-10 cm`                  -3.486e+02  4.253e+01  -8.197  0.00380 **
## pH                            -2.171e+01  1.982e+00 -10.951  0.00163 **
## `% Clay`                      -5.970e-01  8.378e-02  -7.125  0.00569 **
## `depth of max penetrom`       -5.618e-01  5.863e-02  -9.582  0.00241 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.351 on 3 degrees of freedom
##   (28 observations deleted due to missingness)
## Multiple R-squared:  0.9991, Adjusted R-squared:  0.9968
## F-statistic: 426.4 on 8 and 3 DF,  p-value: 0.0001707
# Compare the best model with the full model
anova(model_full, model_reduced)
## Analysis of Variance Table
##
## Model 1: `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm` +
##     `5 bar 5-10 cm` + `15 bar 5-10 cm` + `max penetrom` + `depth of max penetrom` +
##     `penetrom 30 cm`
## Model 2: `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     pH + `% Clay` + `depth of max penetrom`
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1      1 0.0243
## 2      3 5.4796 -2   -5.4553 112.03 0.06666 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.1.2. Second, use stepAIC() function from MASS package perform step model selection by AIC including forward, backward and both

library(MASS)
# Stepwise Regression
model_step <- stepAIC(model_full, direction = "both")
## Start:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm` +
##     `5 bar 5-10 cm` + `15 bar 5-10 cm` + `max penetrom` + `depth of max penetrom` +
##     `penetrom 30 cm`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm` +
##     `5 bar 5-10 cm` + `15 bar 5-10 cm` + `max penetrom` + `depth of max penetrom`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm` +
##     `5 bar 5-10 cm` + `15 bar 5-10 cm` + `max penetrom`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm` +
##     `5 bar 5-10 cm` + `15 bar 5-10 cm`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm` +
##     `5 bar 5-10 cm`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt`
##
##
## Step:  AIC=-52.4
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay`
##
##                                 Df Sum of Sq     RSS     AIC
## <none>                                         0.024 -52.402
## - `bd 25-30 cm`                  1     2.541   2.566   1.488
## - `Rainfall 60-100 DAS`          1     8.517   8.542  15.920
## - `max tensiom 60-100`           1    20.159  20.183  26.239
## - `ave tensiom 60-100`           1    36.034  36.058  33.203
## - `Avail-P ppm`                  1    36.725  36.750  33.431
## - `aver water table 60-100 DAS`  1    62.781  62.806  39.862
## - `bd 5-10 cm`                   1   123.621 123.646  47.990
## - `% Clay`                       1   125.970 125.995  48.216
## - pH                             1   136.495 136.520  49.179
## - `Exch-K ppm`                   1   147.668 147.693  50.123
# Display results
model_step$anova
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay` +
##     `% Silt` + `% Sand` + `0.1 bar 5-10 cm` + `3 bar 5-10 cm` +
##     `5 bar 5-10 cm` + `15 bar 5-10 cm` + `max penetrom` + `depth of max penetrom` +
##     `penetrom 30 cm`
##
## Final Model:
## `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay`
##
##
##                         Step Df Deviance Resid. Df Resid. Dev       AIC
## 1                                                1 0.02434837 -52.40236
## 2         - `penetrom 30 cm`  0        0         1 0.02434837 -52.40236
## 3  - `depth of max penetrom`  0        0         1 0.02434837 -52.40236
## 4           - `max penetrom`  0        0         1 0.02434837 -52.40236
## 5         - `15 bar 5-10 cm`  0        0         1 0.02434837 -52.40236
## 6          - `5 bar 5-10 cm`  0        0         1 0.02434837 -52.40236
## 7          - `3 bar 5-10 cm`  0        0         1 0.02434837 -52.40236
## 8        - `0.1 bar 5-10 cm`  0        0         1 0.02434837 -52.40236
## 9                 - `% Sand`  0        0         1 0.02434837 -52.40236
## 10                - `% Silt`  0        0         1 0.02434837 -52.40236
# Final model
model_final <- lm(`% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` + `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` + `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay`, data = wdata1)
# Compare models (best reduced model from the previous analysis vs. the final model from this analysis)
anova(model_reduced, model_final)
## Analysis of Variance Table
##
## Model 1: `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     pH + `% Clay` + `depth of max penetrom`
## Model 2: `% yield reduction` ~ `Rainfall 60-100 DAS` + `ave tensiom 60-100` +
##     `max tensiom 60-100` + `aver water table 60-100 DAS` + `bd 5-10 cm` +
##     `bd 25-30 cm` + pH + `Avail-P ppm` + `Exch-K ppm` + `% Clay`
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1      3 5.4796
## 2      1 0.0243  2    5.4553 112.03 0.06666 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

3.2. Principal component analysis for table 6

Eigenvector values from a principal component analysis of yield reduction by drought a various soil characteristics at the research station drought screening sites characterized in this study.

# Perform  a PCA on the 20 varaibles using the correlation matrix
# Load package
library(stats)
# Pricipal Component Analysis (When scale=TRUE, the correlation matrix is used, omit data wich is NA)
sr.pca <- prcomp(na.omit(wdata1), scale = TRUE)
# Display results
summary(sr.pca)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6     PC7
## Standard deviation     2.6393 2.1736 1.8476 1.4634 1.11394 1.05625 0.56808
## Proportion of Variance 0.3483 0.2362 0.1707 0.1071 0.06204 0.05578 0.01614
## Cumulative Proportion  0.3483 0.5845 0.7552 0.8623 0.92433 0.98011 0.99625
##                            PC8     PC9    PC10      PC11      PC12
## Standard deviation     0.18562 0.17563 0.09851 0.0009984 1.112e-15
## Proportion of Variance 0.00172 0.00154 0.00049 0.0000000 0.000e+00
## Cumulative Proportion  0.99797 0.99951 1.00000 1.0000000 1.000e+00
# Using a scree plot to check how many PC's to retain (find elbow)
plot(sr.pca, type = "l", main ="PCA of dataset")

image1

# Check eigenvalues for each PC
summary(sr.pca)$sdev^2
##  [1] 6.966041e+00 4.724590e+00 3.413529e+00 2.141598e+00 1.240864e+00
##  [6] 1.115660e+00 3.227133e-01 3.445467e-02 3.084572e-02 9.703758e-03
## [11] 9.968291e-07 1.236526e-30
# Check Eigenvector values for each PC's
re <- eigen(cor(na.omit(wdata1)))

3.3. Principal component analysis for figure 5

Biplot from a principal component analysis of yield reduction by drought a various soil characteristics at the research station drought screening sites characterized in this study. Parameter abbreviations are described in Table 5

# Make a biplot
biplot(sr.pca)

image2