This document is part of the supplementary material of “A practical guide to descriptive and statistical analysis of R. solanacearum infection data using R”. This file explores more complex survival analysis, that contain multiple levels of Plants (genotype) and Strains. This unpublished work by Santatra Ravelomanantsoa and Philippe Prior, both with CIRAD / INRA UMR PVBMT (Réunion Island), has been pseudonymized.
Initially, data needs to be read into R.
###Name of the file to be read
table <- c("S4_Ravelomanantsoa_Prior.csv") disease_index <- as.data.frame(read.table(table, header=T,
sep=";" , ###Sets the seperator of the csv file
dec=","), ###Sets the decimal operatore of the csv file
stringsAsFactors=T) ###Using str() one can see if the table was properly imported.
str(disease_index)## 'data.frame': 3920 obs. of 12 variables:
## $ Strain : Factor w/ 14 levels "RS1","RS10","RS11",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Plant : Factor w/ 8 levels "A","B","C","D",..: 1 1 1 1 1 1 1 2 2 2 ...
## $ X1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X8 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X11 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X15 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X18 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X22 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ X25 : int 0 0 0 1 1 0 1 0 1 1 ...
## $ X32 : int 0 0 0 1 1 0 1 0 1 1 ...
## $ X43 : Factor w/ 3 levels "0","1","na": 1 1 1 2 2 1 2 1 2 2 ...
## $ ROTOPLAN: int 1 1 1 1 1 1 1 1 1 1 ...
This table is a survival table already, go directly into surv_table.
library("tidyr")surv_table<- gather(disease_index, key=DPI,
value=Diseased,
c(X1,X8,X11,X15,X18,X22,X25,X32, X43),
na.rm=T)## Warning: attributes are not identical across measure variables; they will
## be dropped
str(surv_table)## 'data.frame': 34186 obs. of 5 variables:
## $ Strain : Factor w/ 14 levels "RS1","RS10","RS11",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Plant : Factor w/ 8 levels "A","B","C","D",..: 1 1 1 1 1 1 1 2 2 2 ...
## $ ROTOPLAN: int 1 1 1 1 1 1 1 1 1 1 ...
## $ DPI : chr "X1" "X1" "X1" "X1" ...
## $ Diseased: chr "0" "0" "0" "0" ...
###Do stuff to surv_table that makes it more useable
surv_table$DPI <- as.numeric(
na.omit(
as.numeric(
unlist(
strsplit(
as.character(
surv_table$DPI
),
"X" )
)
)
)
)
surv_table <- surv_table[(surv_table$Diseased %in% c(0,1)),] ##Drop Disease recordings that are not binary
surv_table$Strain <- as.factor(surv_table$Strain)
surv_table$Plant <- as.factor(surv_table$Plant)
surv_table$ROTOPLAN <- as.factor(surv_table$ROTOPLAN)
surv_table$StrainPlant <- interaction(surv_table$Strain,surv_table$Plant)
surv_table$StrainRoto <- interaction(surv_table$Strain, surv_table$ROTOPLAN)
surv_table$PlantRoto <- interaction(surv_table$Plant, surv_table$ROTOPLAN)
contrasts(surv_table$Strain) <- "contr.treatment"
contrasts(surv_table$Plant) <- "contr.treatment"
contrasts(surv_table$ROTOPLAN) <- "contr.sum"
contrasts(surv_table$StrainPlant) <- "contr.treatment"In this dataset, Strain is the bacterial strain, Plant is the plant genotype that was infected, and ROTOPLAN, is the rotoplan (a kind of greenhouse) that this infection was carried out in. In other words, ROTOPLAN is an indicator that descriminates replicates.
Above, i am generating interaction columns. In the R notation interactions are denoted using * in a formula. Strain*Plant in a formula, is equivalent to interaction(Strain,Plant). For me, predifining the interaction terms, makes it easier to tangle them apart for visualization.I also find it easier to interpret.
Initially, one may enjoy generating Kaplan-Meier estimates of the survival data. This is done using the survfit function. Here, the rotoplan is not included as a predictor.
library("survival")
surv_fit <- survfit(Surv(DPI, as.numeric(Diseased), type="right") ~Strain + Plant, data=surv_table)
###Make Dataframe for plotting
strata_dummy <-NULL
for(i in 1:length(surv_fit$strata)){
# add vector for one strata according to number of rows of strata
strata_dummy <- c(strata_dummy, rep(names(surv_fit$strata)[i], surv_fit$strata[i]))
}
###Data frame generation inspired by a post by Hadley Wickham to the ggplot2 Googlegroup
surv_fit.df <- data.frame(
time = surv_fit$time,
n.risk = surv_fit$n.risk,
n.event = surv_fit$n.event,
surv = surv_fit$surv,
strata = strata_dummy,
upper = surv_fit$upper,
lower = surv_fit$lower
)
zeros <- data.frame(time = 0, surv = 1, strata = names((surv_fit$strata)),
upper = 1, lower = 1)
surv_fit.df <- plyr::rbind.fill(zeros, surv_fit.df) ###I dont want to load plyr because i guess it will interfere with dplyr...
rm(strata_dummy)
rm(zeros)
stratas <- matrix( nrow=length(surv_fit.df$strata),ncol=2, unlist(strsplit(as.character(surv_fit.df$strata),", ")), byrow=T )
surv_fit.df$Plant <- as.factor(matrix(nrow=length(surv_fit.df$strata), ncol=2,unlist(strsplit(stratas[,2],"=")), byrow=T )[,2])
surv_fit.df$Strain <- as.factor(matrix(nrow=length(surv_fit.df$strata), ncol=2,unlist(strsplit(stratas[,1],"=")), byrow=T )[,2])
###End of data frame generation
library("ggplot2")
###Start plotting
ggplot(surv_fit.df,aes(time, surv, colour = Strain)) +
facet_grid(Strain~Plant) +
stat_summary(aes(y = surv), fun.y=mean, geom="step") +
ggtitle("All KM survival estimates")If one is interested between all pairwise differences of ALL StrainPlant combinations (6216 pairs), run the below. This will perform a logrank test for every possible combination of strain and plant and test against all others. This is stratified by ROTOPLAN. This is not evaluated in the rmarkdown file because the output is huge and not particularily helpful. The below does not correct for multiple comparisons.
###Pairwise--- http://stackoverflow.com/questions/17338774/r-formula-how-to-constrain-calculations-to-two-groups-using-formula
library("gtools")
levels<-(unique(surv_table$StrainPlant))
groups<-combinations(length(levels),2,levels)
pairWise_StrainPlant <- plyr::alply(groups, 1,
function(pair) {
survdiff(Surv(DPI, as.numeric(Diseased), type="right")~StrainPlant + frailty(ROTOPLAN), surv_table , subset=surv_table$StrainPlant %in% c(pair[[1]], pair[[2]]) )
})
names(pairWise_StrainPlant) <- plyr::alply(groups, 1,
function(pair) {
c(pair[[1]], pair[[2]])
})library("survcomp")## Loading required package: prodlim
###Cox-Proportional hazards####
#Build model
srv_coxph <- coxph(Surv(DPI, as.numeric(Diseased),type="right") ~Strain+Plant+ROTOPLAN, data=surv_table)
###Check porportionality of hazards
cox.zph(srv_coxph, transform = "log")## Warning: contrasts dropped from factor Strain
## Warning: contrasts dropped from factor Plant
## Warning: contrasts dropped from factor ROTOPLAN
## rho chisq p
## StrainRS10 -0.19033 3.72e+02 0.00e+00
## StrainRS11 -0.17451 3.12e+02 0.00e+00
## StrainRS12 -0.06034 3.73e+01 1.04e-09
## StrainRS13 -0.01777 3.23e+00 7.21e-02
## StrainRS14 0.00438 1.98e-01 6.57e-01
## StrainRS2 0.00809 6.71e-01 4.13e-01
## StrainRS3 0.01729 3.06e+00 8.03e-02
## StrainRS4 0.02189 4.91e+00 2.67e-02
## StrainRS5 0.01829 3.43e+00 6.41e-02
## StrainRS6 -0.01722 3.04e+00 8.12e-02
## StrainRS7 0.01656 2.81e+00 9.37e-02
## StrainRS8 -0.00281 8.09e-02 7.76e-01
## StrainRS9 -0.01052 1.13e+00 2.87e-01
## PlantB 0.01789 3.28e+00 7.01e-02
## PlantC -0.01067 1.17e+00 2.80e-01
## PlantD -0.00924 8.75e-01 3.50e-01
## PlantE -0.00789 6.41e-01 4.23e-01
## PlantF -0.01989 4.05e+00 4.42e-02
## PlantG -0.01084 1.20e+00 2.73e-01
## PlantH -0.04584 2.15e+01 3.50e-06
## ROTOPLAN1 -0.05891 3.63e+01 1.72e-09
## ROTOPLAN2 0.01473 2.23e+00 1.35e-01
## ROTOPLAN3 0.01926 3.84e+00 5.01e-02
## ROTOPLAN4 -0.06416 4.28e+01 6.07e-11
## GLOBAL NA 1.78e+03 0.00e+00
###Hazard ration analysis, not appropriate if cox.zph above returns a low pvalue
hazrat <- hazard.ratio(surv_table$Strain, surv_table$DPI, surv.event = as.numeric(surv_table$Diseased))The proportional hazards assumption is violated.
As the hazards are non-proportional, and generating a ton of pairwise tests as done above without correcting for multiple comparisons is a really bad idea, survival regression could be handy.
library("rms")psm_gaus <- psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Strain * Plant + ROTOPLAN, surv_table, dist = "gaussian")
psm_logistic <- psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Strain * Plant + ROTOPLAN, surv_table, dist = "logistic")
psm_lnorm <- psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Strain * Plant + ROTOPLAN, surv_table, dist = "lognormal")
psm_wei <- psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Strain * Plant + ROTOPLAN, surv_table, dist = "weibull")## Warning in survreg.fit(X, Y, weights, offset, init = init, controlvals =
## control, : Ran out of iterations and did not converge
aic.scores.psm <- rbind(
extractAIC(psm_wei),
extractAIC(psm_gaus),
extractAIC(psm_logistic),
extractAIC(psm_lnorm))
###Make useable AIC table
rownames(aic.scores.psm) <- c("Weibull", "Gaussian", "Logist", "Lognorm")
colnames(aic.scores.psm) <- c("df", "AIC")
aic.scores.psm## df AIC
## Weibull 117 113645.10
## Gaussian 117 79600.29
## Logist 117 80607.59
## Lognorm 117 80392.17
Gaussian is the best fit. Explore the gaussian psm
library("multcomp")## Loading required package: mvtnorm
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
summary(glht(psm_gaus, linfct=mcp(Strain="Tukey")))## Warning in mcp2matrix(model, linfct = linfct): covariate interactions found
## -- default contrast might be inappropriate
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
## Warning in RET$pfunction("adjusted", ...): Completion with error > abseps
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: psm(formula = Surv(DPI, as.numeric(Diseased), type = "right") ~
## Strain * Plant + ROTOPLAN, data = surv_table, dist = "gaussian")
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## RS10 - RS1 == 0 -10.95176 1.04690 -10.461 <0.01 ***
## RS11 - RS1 == 0 -8.71967 1.07746 -8.093 <0.01 ***
## RS12 - RS1 == 0 -1.14039 1.21223 -0.941 0.9996
## RS13 - RS1 == 0 2.94921 1.34993 2.185 0.6354
## RS14 - RS1 == 0 -2.49739 1.20444 -2.073 0.7141
## RS2 - RS1 == 0 -0.65822 1.21126 -0.543 1.0000
## RS3 - RS1 == 0 0.67626 1.24148 0.545 1.0000
## RS4 - RS1 == 0 -0.01345 1.21878 -0.011 1.0000
## RS5 - RS1 == 0 -0.26534 1.21387 -0.219 1.0000
## RS6 - RS1 == 0 -1.17259 1.18939 -0.986 0.9994
## RS7 - RS1 == 0 -0.27662 1.21800 -0.227 1.0000
## RS8 - RS1 == 0 -0.41951 1.21248 -0.346 1.0000
## RS9 - RS1 == 0 -1.72957 1.18244 -1.463 0.9731
## RS11 - RS10 == 0 2.23208 0.86330 2.586 0.3443
## RS12 - RS10 == 0 9.81137 1.02700 9.553 <0.01 ***
## RS13 - RS10 == 0 13.90097 1.18639 11.717 <0.01 ***
## RS14 - RS10 == 0 8.45436 1.01736 8.310 <0.01 ***
## RS2 - RS10 == 0 10.29354 1.02573 10.035 <0.01 ***
## RS3 - RS10 == 0 11.62801 1.06140 10.955 <0.01 ***
## RS4 - RS10 == 0 10.93830 1.03459 10.573 <0.01 ***
## RS5 - RS10 == 0 10.68642 1.02863 10.389 <0.01 ***
## RS6 - RS10 == 0 9.77917 0.99965 9.783 <0.01 ***
## RS7 - RS10 == 0 10.67514 1.03361 10.328 <0.01 ***
## RS8 - RS10 == 0 10.53225 1.02710 10.254 <0.01 ***
## RS9 - RS10 == 0 9.22219 0.99143 9.302 <0.01 ***
## RS12 - RS11 == 0 7.57929 1.05813 7.163 <0.01 ***
## RS13 - RS11 == 0 11.66888 1.21347 9.616 <0.01 ***
## RS14 - RS11 == 0 6.22228 1.04907 5.931 <0.01 ***
## RS2 - RS11 == 0 8.06146 1.05708 7.626 <0.01 ***
## RS3 - RS11 == 0 9.39593 1.09150 8.608 <0.01 ***
## RS4 - RS11 == 0 8.70622 1.06550 8.171 <0.01 ***
## RS5 - RS11 == 0 8.45433 1.05977 7.977 <0.01 ***
## RS6 - RS11 == 0 7.54709 1.03170 7.315 <0.01 ***
## RS7 - RS11 == 0 8.44305 1.06461 7.931 <0.01 ***
## RS8 - RS11 == 0 8.30016 1.05831 7.843 <0.01 ***
## RS9 - RS11 == 0 6.99010 1.02371 6.828 <0.01 ***
## RS13 - RS12 == 0 4.08959 1.33441 3.065 0.1140
## RS14 - RS12 == 0 -1.35701 1.18709 -1.143 0.9973
## RS2 - RS12 == 0 0.48217 1.19424 0.404 1.0000
## RS3 - RS12 == 0 1.81664 1.22485 1.483 0.9700
## RS4 - RS12 == 0 1.12693 1.20173 0.938 0.9997
## RS5 - RS12 == 0 0.87505 1.19676 0.731 1.0000
## RS6 - RS12 == 0 -0.03220 1.17184 -0.027 1.0000
## RS7 - RS12 == 0 0.86376 1.20082 0.719 1.0000
## RS8 - RS12 == 0 0.72088 1.19543 0.603 1.0000
## RS9 - RS12 == 0 -0.58918 1.16495 -0.506 1.0000
## RS14 - RS13 == 0 -5.44660 1.32740 -4.103 <0.01 **
## RS2 - RS13 == 0 -3.60742 1.33362 -2.705 0.2719
## RS3 - RS13 == 0 -2.27295 1.36115 -1.670 0.9246
## RS4 - RS13 == 0 -2.96266 1.34046 -2.210 0.6160
## RS5 - RS13 == 0 -3.21455 1.33589 -2.406 0.4697
## RS6 - RS13 == 0 -4.12180 1.31368 -3.138 0.0942 .
## RS7 - RS13 == 0 -3.22583 1.33973 -2.408 0.4681
## RS8 - RS13 == 0 -3.36872 1.33468 -2.524 0.3848
## RS9 - RS13 == 0 -4.67878 1.30755 -3.578 0.0234 *
## RS2 - RS14 == 0 1.83918 1.18609 1.551 0.9569
## RS3 - RS14 == 0 3.17365 1.21712 2.608 0.3306
## RS4 - RS14 == 0 2.48394 1.19384 2.081 0.7104
## RS5 - RS14 == 0 2.23206 1.18872 1.878 0.8352
## RS6 - RS14 == 0 1.32481 1.16368 1.138 0.9974
## RS7 - RS14 == 0 2.22077 1.19292 1.862 0.8437
## RS8 - RS14 == 0 2.07788 1.18733 1.750 0.8955
## RS9 - RS14 == 0 0.76782 1.15668 0.664 1.0000
## RS3 - RS2 == 0 1.33447 1.22377 1.090 0.9983
## RS4 - RS2 == 0 0.64476 1.20076 0.537 1.0000
## RS5 - RS2 == 0 0.39288 1.19570 0.329 1.0000
## RS6 - RS2 == 0 -0.51437 1.17086 -0.439 1.0000
## RS7 - RS2 == 0 0.38160 1.19995 0.318 1.0000
## RS8 - RS2 == 0 0.23871 1.19423 0.200 1.0000
## RS9 - RS2 == 0 -1.07135 1.16380 -0.921 0.9997
## RS4 - RS3 == 0 -0.68971 1.23118 -0.560 1.0000
## RS5 - RS3 == 0 -0.94159 1.22632 -0.768 1.0000
## RS6 - RS3 == 0 -1.84884 1.20215 -1.538 0.9597
## RS7 - RS3 == 0 -0.95288 1.23055 -0.774 1.0000
## RS8 - RS3 == 0 -1.09577 1.22498 -0.895 0.9998
## RS9 - RS3 == 0 -2.40583 1.19523 -2.013 0.7555
## RS5 - RS4 == 0 -0.25188 1.20329 -0.209 1.0000
## RS6 - RS4 == 0 -1.15913 1.17862 -0.983 0.9994
## RS7 - RS4 == 0 -0.26317 1.20755 -0.218 1.0000
## RS8 - RS4 == 0 -0.40606 1.20195 -0.338 1.0000
## RS9 - RS4 == 0 -1.71612 1.17162 -1.465 0.9728
## RS6 - RS5 == 0 -0.90725 1.17341 -0.773 1.0000
## RS7 - RS5 == 0 -0.01128 1.20250 -0.009 1.0000
## RS8 - RS5 == 0 -0.15417 1.19684 -0.129 1.0000
## RS9 - RS5 == 0 -1.46423 1.16644 -1.255 0.9931
## RS7 - RS6 == 0 0.89597 1.17776 0.761 1.0000
## RS8 - RS6 == 0 0.75308 1.17204 0.643 1.0000
## RS9 - RS6 == 0 -0.55698 1.14099 -0.488 1.0000
## RS8 - RS7 == 0 -0.14289 1.20111 -0.119 1.0000
## RS9 - RS7 == 0 -1.45295 1.17084 -1.241 0.9938
## RS9 - RS8 == 0 -1.31006 1.16504 -1.124 0.9977
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
summary(glht(psm_gaus, linfct=mcp(Plant="Tukey")))## Warning in mcp2matrix(model, linfct = linfct): covariate interactions found
## -- default contrast might be inappropriate
## Warning in mcp2matrix(model, linfct = linfct): Completion with error >
## abseps
## Warning in mcp2matrix(model, linfct = linfct): Completion with error >
## abseps
## Warning in mcp2matrix(model, linfct = linfct): Completion with error >
## abseps
## Warning in mcp2matrix(model, linfct = linfct): Completion with error >
## abseps
## Warning in mcp2matrix(model, linfct = linfct): Completion with error >
## abseps
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: psm(formula = Surv(DPI, as.numeric(Diseased), type = "right") ~
## Strain * Plant + ROTOPLAN, data = surv_table, dist = "gaussian")
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## B - A == 0 -0.7901 1.2049 -0.656 0.99801
## C - A == 0 -2.4904 1.1628 -2.142 0.38577
## D - A == 0 -3.7013 1.1459 -3.230 0.02680 *
## E - A == 0 1.6412 1.2791 1.283 0.90451
## F - A == 0 0.6841 1.2635 0.541 0.99942
## G - A == 0 2.3893 1.3163 1.815 0.60753
## H - A == 0 -2.9529 1.1568 -2.553 0.17143
## C - B == 0 -1.7004 1.1371 -1.495 0.80895
## D - B == 0 -2.9112 1.1198 -2.600 0.15461
## E - B == 0 2.4313 1.2559 1.936 0.52384
## F - B == 0 1.4741 1.2399 1.189 0.93467
## G - B == 0 3.1793 1.2937 2.458 0.21194
## H - B == 0 -2.1628 1.1309 -1.912 0.54007
## D - C == 0 -1.2109 1.0743 -1.127 0.95058
## E - C == 0 4.1316 1.2155 3.399 0.01545 *
## F - C == 0 3.1745 1.1991 2.647 0.13757
## G - C == 0 4.8797 1.2547 3.889 0.00240 **
## H - C == 0 -0.4625 1.0859 -0.426 0.99988
## E - D == 0 5.3425 1.1993 4.455 < 0.001 ***
## F - D == 0 4.3854 1.1828 3.708 0.00497 **
## G - D == 0 6.0905 1.2389 4.916 < 0.001 ***
## H - D == 0 0.7484 1.0678 0.701 0.99695
## F - E == 0 -0.9571 1.3123 -0.729 0.99609
## G - E == 0 0.7481 1.3630 0.549 0.99937
## H - E == 0 -4.5941 1.2097 -3.798 0.00353 **
## G - F == 0 1.7052 1.3483 1.265 0.91109
## H - F == 0 -3.6370 1.1933 -3.048 0.04682 *
## H - G == 0 -5.3422 1.2491 -4.277 < 0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Maybe, it would be interesting to only know which strain performed how within each Plant genotype used.
options(warn = -1)
plants_array <- array(levels(surv_table$Plant))
plant_within <- sapply(plants_array,
function(plant) {
psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Strain + ROTOPLAN, data=surv_table[surv_table$Plant==plant,], dist="gaussian")
}, simplify=F)
names(plant_within) <- levels(surv_table$Plant)
plant_within_cld <- sapply(plants_array,
function(plant) {
cld( glht(
psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Strain + ROTOPLAN, data=surv_table[surv_table$Plant==plant,], dist="gaussian"),
linfct=mcp(Strain="Tukey")
)
)
}
, simplify=F)
names(plant_within_cld) <- levels(surv_table$Plant)
plant_within_cld## $A
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7 RS8 RS9
## "ab" "c" "c" "ab" "a" "b" "ab" "ab" "ab" "ab" "ab" "ab" "ab" "b"
##
## $B
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7 RS8 RS9
## "ab" "c" "c" "ab" "ab" "ab" "ab" "a" "ab" "ab" "ab" "ab" "ab" "b"
##
## $C
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7 RS8 RS9
## "bc" "e" "e" "c" "ab" "d" "ab" "ab" "ab" "ab" "bc" "a" "bc" "bc"
##
## $D
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7
## "ad" "f" "e" "acd" "b" "ab" "bc" "ab" "b" "ab" "ae" "ab"
## RS8 RS9
## "bd" "ab"
##
## $E
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7
## "bc" "d" "d" "ab" "ab" "bcd" "bc" "bc" "c" "bc" "ab" "ac"
## RS8 RS9
## "bc" "b"
##
## $F
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7
## "ac" "f" "f" "cd" "bce" "ab" "ae" "ac" "a" "abd" "bc" "ac"
## RS8 RS9
## "ac" "c"
##
## $G
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7
## "bc" "a" "a" "cd" "b" "abc" "bd" "bc" "bc" "bc" "bc" "cd"
## RS8 RS9
## "bd" "c"
##
## $H
## RS1 RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7
## "be" "f" "f" "b" "b" "a" "cde" "c" "c" "ce" "bc" "ce"
## RS8 RS9
## "cde" "bd"
strain_array <- array(levels(surv_table$Strain))
strain_within <- sapply(strain_array,
function(strain) {
psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Plant + ROTOPLAN, data=surv_table[surv_table$Strain==strain,], dist="gaussian")
}, simplify=F)
names(strain_within) <- levels(surv_table$Strain)
###Strain4 / Rs14, can not be fit. This is probably because it is flatlining (not causing symptoms) for some plants. This strain is excluded.
###I would consider this a relevant finding, but it is also quite obvious from all analysis in this file that this strain is different from the others
strain_within_cld <- sapply(strain_array[c(1:5,7:length(strain_array))],
function(strain) {
cld( glht(
psm(Surv(DPI, as.numeric(Diseased), type="right") ~ Plant + ROTOPLAN, data=surv_table[surv_table$Strain==strain,], dist="gaussian"),
linfct=mcp(Plant="Tukey")
)
)
}
, simplify=F)
names(strain_within_cld) <- levels(surv_table$Strain)[ c(1:5,7:length(strain_array)) ]
strain_within_cld## $RS1
## A B C D E F G H
## "ac" "ad" "bcd" "d" "a" "ab" "a" "cd"
##
## $RS10
## A B C D E F G H
## "bc" "ac" "ac" "c" "a" "ab" "ab" "c"
##
## $RS11
## A B C D E F G H
## "ab" "ab" "ab" "a" "ab" "b" "ab" "c"
##
## $RS12
## A B C D E F G H
## "ac" "ac" "b" "ab" "c" "bc" "c" "b"
##
## $RS13
## A B C D E F G H
## "ab" "bc" "bc" "bc" "bc" "c" "a" "d"
##
## $RS2
## A B C D E F G H
## "bc" "c" "ac" "ac" "ab" "ab" "a" "ac"
##
## $RS3
## A B C D E F G H
## "ab" "ab" "ab" "a" "b" "ab" "ab" "ab"
##
## $RS4
## A B C D E F G H
## "bc" "c" "bc" "bc" "a" "ab" "ab" "ac"
##
## $RS5
## A B C D E F G H
## "bc" "b" "ab" "b" "a" "ab" "ac" "ab"
##
## $RS6
## A B C D E F G H
## "ab" "ab" "bc" "c" "ab" "ac" "b" "bc"
##
## $RS7
## A B C D E F G H
## "ab" "ab" "ac" "b" "c" "ab" "bc" "bc"
##
## $RS8
## A B C D E F G H
## "bc" "c" "c" "bc" "ab" "ac" "a" "c"
##
## $RS9
## A B C D E F G H
## "a" "a" "a" "a" "a" "a" "a" "a"
contrasts(surv_table$Strain)## RS10 RS11 RS12 RS13 RS14 RS2 RS3 RS4 RS5 RS6 RS7 RS8 RS9
## RS1 0 0 0 0 0 0 0 0 0 0 0 0 0
## RS10 1 0 0 0 0 0 0 0 0 0 0 0 0
## RS11 0 1 0 0 0 0 0 0 0 0 0 0 0
## RS12 0 0 1 0 0 0 0 0 0 0 0 0 0
## RS13 0 0 0 1 0 0 0 0 0 0 0 0 0
## RS14 0 0 0 0 1 0 0 0 0 0 0 0 0
## RS2 0 0 0 0 0 1 0 0 0 0 0 0 0
## RS3 0 0 0 0 0 0 1 0 0 0 0 0 0
## RS4 0 0 0 0 0 0 0 1 0 0 0 0 0
## RS5 0 0 0 0 0 0 0 0 1 0 0 0 0
## RS6 0 0 0 0 0 0 0 0 0 1 0 0 0
## RS7 0 0 0 0 0 0 0 0 0 0 1 0 0
## RS8 0 0 0 0 0 0 0 0 0 0 0 1 0
## RS9 0 0 0 0 0 0 0 0 0 0 0 0 1
Finally, check for a rotoplan effect across the full dataset, using the same distribution as earlier
cld(glht(
survreg(Surv(DPI, as.numeric(Diseased), type="right") ~ Strain + ROTOPLAN, data=surv_table, dist="gaussian"),
linfct=mcp(ROTOPLAN="Tukey")
))## 1 2 3 4 5
## "c" "b" "b" "c" "a"
As can be seen, ROTOPLAN has a signficant effect. Rotoplans 2 and 3 are in the same signficance group, meaning they are not significantly different. The same is true for 1 and 4. One could explore this further by subsetting the dataset into two groups, one made of ROTOPLANs (1,4) and one of (2,3) and analyze those individually.
library("stringr")
library("dplyr")##
## Attaching package: 'dplyr'
## The following object is masked from 'package:MASS':
##
## select
## The following objects are masked from 'package:Hmisc':
##
## combine, src, summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
###The StrainPlant variable was generated earlier.
###The below assumes interactions between Strain and Plant, meaning we could write
### Surv(DPI, as.numeric(Diseased), type="right") ~ Strain * Plant
###and obtain the same results. I prefer the way it is currently done, because i can split it back into the original things more conveniently.
s_reg_gaus <- survreg(Surv(DPI, as.numeric(Diseased), type="right") ~ StrainPlant, surv_table, dist="gaussian")
s_reg_logistic <- survreg(Surv(DPI, as.numeric(Diseased), type="right") ~ StrainPlant, surv_table, dist="logistic")
s_reg_lnorm <- survreg(Surv(DPI, as.numeric(Diseased), type="right") ~ StrainPlant, surv_table, dist="lognormal")
s_reg_wei <- survreg(Surv(DPI, as.numeric(Diseased), type="right") ~ StrainPlant, surv_table, dist="weibull")
###Step 1, extract the coefficients. These are relative to StrainPlant combination RS1.A because StrainPlant is treatment contrasted.
surv_fit.df$StrainPlant <- interaction(surv_fit.df$Strain,surv_fit.df$Plant)
for (i in 1:nlevels(surv_fit.df$StrainPlant)) { #For loop through strains
if(i==1) { #Strain1 is relative to itself, so no change
coef_wei <- list()
coef_logistic <- list()
coef_gaus <- list()
coef_lnorm <- list()
coef_wei[i] <- coef(s_reg_wei)[i]
coef_logistic[i] <- coef(s_reg_logistic)[i]
coef_gaus[i] <- coef(s_reg_gaus)[i]
coef_lnorm[i] <- coef(s_reg_lnorm)[i]
} else { ###Other strains are relative to 1
coef_wei[i] <- coef(s_reg_wei)[1] + coef(s_reg_wei)[i]
coef_logistic[i] <- coef(s_reg_logistic)[1] + coef(s_reg_logistic)[i]
coef_gaus[i] <- coef(s_reg_gaus)[1] + coef(s_reg_gaus)[i]
coef_lnorm[i] <- coef(s_reg_lnorm)[1] + coef(s_reg_lnorm)[i]
}
}
##Step 2
####Store the coefficients and the scale in a new data frame, of parameters
### Keep in mind that survreg.distributions$weibull is different from rweibull, hence the difference in names.
sregparams <- data.frame(
StrainPlant = rep(levels(surv_table$StrainPlant),4 ), #Fill with strains
scale.wei = exp(unlist(coef_wei)), #weibull fit scale parameters
scale.logistic = rep(s_reg_logistic$scale, nlevels(surv_table$StrainPlant)), #fill with logis scales
scale.gaus = rep(s_reg_gaus$scale, nlevels(surv_table$StrainPlant)), #fill with gaus scales
scale.lnorm = rep(s_reg_lnorm$scale, nlevels(surv_table$StrainPlant)), #fill with lnorm scale
shape.wei = rep(1/s_reg_wei$scale, nlevels(surv_table$StrainPlant)), #shape for weibull
shape.logistic = unlist(coef_logistic), #shape for logistic
shape.gaus = unlist(coef_gaus), #shape for gaus
shape.lnorm = unlist(coef_lnorm) #shape for lnorm
)
##Step 3
###Calculate the "daily" value of each curve
for (i in 1:nlevels(surv_fit.df$StrainPlant)){
if(i==1) {
wei <- list()
logis <- list()
gaus <- list()
lnorm <- list()
}
x <- levels(surv_fit.df$StrainPlant)[i]
n <- c(1:max(surv_table$DPI))
data <- dplyr::filter(sregparams, StrainPlant==x) ###Watch out here. It is crucial to use dplyr::filter and not base::filter
time <- n
wei <- cbind(wei, pweibull(
q=n,
scale=data$scale.wei,
shape=data$shape.wei,
lower.tail=FALSE))
logis <- cbind(logis,plogis(
q=n,
scale=data$scale.logistic,
location=data$shape.logistic,
lower.tail=FALSE ))
gaus <- cbind(gaus,pnorm(
q=n,
sd=data$scale.gaus,
mean=data$shape.gaus,
lower.tail = F))
lnorm <- cbind(lnorm, plnorm(
q=n,
sd=data$scale.lnorm,
mean=data$shape.lnorm,
lower.tail=F))
}
##Step 4
###Put all the curves into a data.frame that contains information on "time" and also "Strain", for compatibility with other data.frames
sreg_curves <- data.frame(
wei.sreg = cbind(unlist(wei)),
logis.sreg = cbind(unlist(logis)),
gaus.sreg = cbind(unlist(gaus)),
lnorm.sreg = cbind(unlist(lnorm)),
StrainPlant = rep(unlist(levels(surv_fit.df$StrainPlant)),each=max(surv_table$DPI)),
time = rep(c(1:max(surv_table$DPI)), nlevels(surv_fit.df$StrainPlant))
)
sreg_curves$Strain <- str_split_fixed(sreg_curves$StrainPlant, "\\.",2)[,1]
sreg_curves$Plant <- str_split_fixed(sreg_curves$StrainPlant, "\\.",2)[,2]
##Step 5
###Turn that data.frame into a long data.frame (not used here but for other figures.)
sreg_long <- sreg_curves %>% gather(., key="Distribution",values = c(lnorm.sreg, wei.sreg,gaus.sreg,logis.sreg) )
sreg_long$Distribution <- as.factor(sreg_long$Distribution)
##Levels: gaus.sreg wei.sreg
levels(sreg_long$Distribution) <- c("Gaussian","Lognormal","Loglogistic","Weibull")
sreg_long$Strain <- str_split_fixed(sreg_long$StrainPlant, "\\.",2)[,1]
sreg_long$Plant <- str_split_fixed(sreg_long$StrainPlant, "\\.",2)[,2]Now, these can be plotted and inspected visually..
###Plot of KM+Weibull
ggplot(surv_fit.df, aes(time, surv, colour = Strain)) +
geom_step() +
geom_line(data=sreg_curves,aes(y=wei.sreg),color="black") +
facet_grid(Plant~Strain)+
ggtitle("Kaplan-Meier estimates and fit to\nWeibull distribution")###Plot of KM+Gaussian
ggplot(surv_fit.df, aes(time, surv, colour = Strain)) +
geom_step() +
geom_line(data=sreg_curves,aes(y=gaus.sreg),color="black") +
facet_grid(Plant~Strain) +
ggtitle("Kaplan-Meier estimates and fit to\nGaussian distribution")##Logis
ggplot(surv_fit.df, aes(time, surv, colour = Strain)) +
geom_step() +
geom_line(data=sreg_curves,aes(y=logis.sreg),color="black") +
facet_grid(Plant~Strain) +
ggtitle("Kaplan-Meier estimates and fit to\nLogistic distribution") ##Lnorm
ggplot(surv_fit.df, aes(time, surv, colour = Strain)) +
geom_step() +
geom_line(data=sreg_curves,aes(y=lnorm.sreg),color="black") +
facet_grid(Plant~Strain) +
ggtitle("Kaplan-Meier estimates and fit to\nLognormal distribution")sessionInfo()## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_0.5.0 stringr_1.1.0 multcomp_1.4-6 TH.data_1.0-7
## [5] MASS_7.3-45 mvtnorm_1.0-5 rms_5.1-0 SparseM_1.74
## [9] Hmisc_4.0-2 Formula_1.2-1 lattice_0.20-34 survcomp_1.24.0
## [13] prodlim_1.5.7 ggplot2_2.2.1 survival_2.40-1 tidyr_0.6.0
##
## loaded via a namespace (and not attached):
## [1] zoo_1.7-14 reshape2_1.4.2 splines_3.3.2
## [4] colorspace_1.3-2 htmltools_0.3.5 yaml_2.1.14
## [7] base64enc_0.1-3 foreign_0.8-67 DBI_0.5-1
## [10] RColorBrewer_1.1-2 plyr_1.8.4 lava_1.4.6
## [13] MatrixModels_0.4-1 munsell_0.4.3 survivalROC_1.0.3
## [16] gtable_0.2.0 codetools_0.2-15 evaluate_0.10
## [19] labeling_0.3 latticeExtra_0.6-28 knitr_1.15.1
## [22] quantreg_5.29 htmlTable_1.8 Rcpp_0.12.8
## [25] acepack_1.4.1 KernSmooth_2.23-15 scales_0.4.1
## [28] backports_1.0.4 checkmate_1.8.2 rmeta_2.16
## [31] bootstrap_2015.2 gridExtra_2.2.1 digest_0.6.11
## [34] polspline_1.1.12 stringi_1.1.2 SuppDists_1.1-9.4
## [37] grid_3.3.2 rprojroot_1.1 tools_3.3.2
## [40] sandwich_2.3-4 magrittr_1.5 lazyeval_0.2.0
## [43] tibble_1.2 cluster_2.0.5 Matrix_1.2-7.1
## [46] data.table_1.10.0 assertthat_0.1 rmarkdown_1.3
## [49] R6_2.2.0 rpart_4.1-10 nlme_3.1-128
## [52] nnet_7.3-12