Part A
A statistical analysis of the impact of drought on algal growth in experimental
mesocosm channels.
Introduction and Data Collection:
Algal biofilms play an important trophic role in riverine systems (Ledger and Hildrew, 1998).
The composition and biomass of biofilms is dependent on light availability, nutrient supply
and grazing pressure (Caramujo, et al. 2008). Therefore, Biofilms are at risk from the direct
impacts of drought and top down effects from grazers.
The DriStream experimental facility in Fobdown, Hampshire, consists of 21 channels
supplied with water from a borehole. The channels are 10m long and 50cm wide, with an
equal amount and size of substrate distributed in a pool riffle sequence that is consistent
across all channels. Channels were colonised with matching macroinvertebrate and
macrophyte populations. The macroinvertebrate assemblage is based on the nearby chalk
stream river, where the invertebrates were originally collected from. Once populations had
stabilised, channel depths were altered. In total there are three replicates of seven different
depths (0, 2, 5, 7, 10, 25 and 35 cm).
Biofilms were grown on terracotta tiles with an area of 100 cm2
. The Biofilm was
washed off using a toothbrush and distilled water before being stored for laboratory analysis.
Four measurements of both grazed and ungrazed tiles were taken from each channel.
Grazed tiles were placed on the substrate in the pool sections and ungrazed tiles were
located alongside, suspended in the water column in order to minimise the impact of grazing
benthic invertebrates. Samples were brought back to the laboratory where the Ash Free Dry
Mass (AFDM) and the mass of Chlorophyll A (CHLA) were tested. The data used in this
analysis was collected in November, 2014.
Data Preparation:
The response variables consisted of: CHLA and AFDM. Explanatory variables included:
Depth, Grazed/Ungrazed, Flow and Plant Volume. Average response values for grazed and
ungrazed tiles were used for each channel in order to reduce the impact of variability in the
explanatory variables (depth). The data was organised in a long format (Appendix 2).Rstudio
was used to conduct all statistical analyses (R Core Team, 2013). The following
commands were used to apply necessary attributes to the data in order to allow the
undertaking of particular techniques:
Algae <- read.csv(file.choose()) # Load data file
Part A
str (Algae) # check structure
Algae$Depth.Num <- Algae$Depth # creates an extra Depth column which will remain as
an integer
Algae$Depth <- as.factor(Algae$Depth) # convert depth from Int. to Fac.
Algae["Log10.Flow"] <- NA # creates the new column named "log10.Flow" filled with "NA"
Algae$Log10.Flow <- log10(Algae$Flow) # creates log10 flow for GAM analyses later
Algae["Log10.plant"] <- NA # creates the new column named "log10.Plant" filled with "NA"
Algae$Log10.Plant <- log10(Algae$Plant.Volume) # adds log 10 data for use with matlines
The following code was then used to subset grazed and ungrazed data so that the
relationships within each treatment may be investigated.
Ungrazed <- subset(Algae, Graze == "Ungrazed") # Ungrazed subset
Grazed <- subset(Algae, Graze == "Grazed") # Grazed subset
Data Exploration
Boxplots
A number of box plots were created to compare AFDM and CHLA with depth in order to
determine the nature of any response. Boxplots are simple but are a highly robust way of
exploring data because the inter quartile range (IQR) helps to limit the influence of outliers.
This is particularly useful for environmental data where distributions are commonly nonlinear
(Massart, et al. 2005).
1) Comparison of grazed and Ungrazed AFDM
op <- par(mfrow = c(1, 2))
boxplot(AFDM ~ Graze, data = Algae, ylab = "Ash Free Dry Mass") # hard to interpret
due to distribution needs transforming...
boxplot(log10(AFDM+0.001) ~ Graze, data = Algae, ylab = "log10 Ash Free Dry Mass") #
inspect transformed Ash Free Dry Mass (AFDM) difference between grazed and ungrazed
tiles
par(op)
Part A
Figure 1: A comparison of AFDM from grazed and ungrazed tiles. Left – before response (AFDM) was
transformed, Right – following log10+ 0.001 transformation.
Due to the distribution of the AFDM data set, a comparison was only possible following the
transformation of AFDM. There is a difference between grazed and ungrazed AFDM, which
is investigated further in this report. Log10 + 0.001 was used to transform AFDM as there
are many zero values in the data set (Zurr, et al. 2010).
2) Change in AFDM with depth
op <- par(mfrow = c(1, 2))
boxplot(log10(AFDM+0.001) ~ Depth, data = Ungrazed, ylab = "Ash Free Dry Mass", xlab =
"Depth (cm)", main = "Ungrazed") # inspect difference in AFDM with Depth for Ungrazed
boxplot(log10(AFDM+0.001) ~ Depth, data = Grazed, ylab = "Ash Free Dry Mass", xlab =
"Depth (cm)", main = "Grazed") # inspect difference in AFDM with Depth for Grazed
par(op)
Part A
Figure 2: change in AFDM with depth for both grazed and ungrazed tiles.
Changes in grazed and ungrazed tiles differ considerably. Neither patterns appear to be
linear.
3) A comparison of CHLA with grazed and ungrazed tiles.
boxplot(CHLA ~ Graze, data = Algae, ylab = "Chlorophyll A") # inspect Chlorophyll A
(CHLA) difference between grazed and ungrazed
Figure 3: CHLA for grazed and ungrazed tiles
Part A
Figure 3 shows that there is a greater spread for grazed CHLA than ungrazed. However,
there is not a significant difference between the means.
4) Change in CHLA with depth
op <- par(mfrow = c(1, 2))
boxplot(CHLA ~ Depth, data = Ungrazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =
"Ungrazed") # inspect difference in CHLA with Depth for Ungrazed
boxplot(CHLA ~ Depth, data = Grazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =
"Grazed") # inspect difference in CHLA with Depth for Grazed
par(op)
Figure 4: change in CHLA with depth for grazed and ungrazed tiles
Figure 4 highlights that there is a general increase in CHLA, for both grazed and ungrazed
tiles, with depth.
CoPlots
Based on the evidence from the boxplots it appeared that there was a difference between
grazed and ungrazed responses to Depth. Therefore coplots were created in order to
visualize potential interactions that may have been taking place (Zuur, et al. 2010).
Part A
1) AFDM change with grazing and Depth
coplot(log10(AFDM+0.001) ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplots
for AFDM with depth and Grazing
Figure 5: Coplot of grazed and ungrazed AFDM change with depth.
There is not a significant difference between the lines in the diagrams which suggests there
is not a significant interaction taking place (Zurr, et al. 2010). Further test will be needed to
determine this with more certainty.
2) CHLA change with grazing and Depth
coplot(CHLA ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplot for CHLA with
depth and Grazing
Part A
Figure 6: Coplot of grazed and ungrazed CHLA change with depth.
A difference in line shape suggest that interaction may be taking place between grazed and
ungrazed tiles or that the response of CHLA to depth differs slightly between grazed and
ungrazed tiles.
3) AFDM change with grazing and Plant Volume
It was hypothesised that light penetration may have an impact on bethic algae
photosynthesis. Therefore, the change in AFDM and CHLA (figure 8) with plant volume is
visulaised using coplots.
coplot(log10(AFDM+0.001) ~ log10(Plant.Volume)| Graze, panel = panel.smooth, data =
Algae) # coplots for AFDM with plant volume and Grazing
Part A
Figure 7: Coplot of grazed and ungrazed AFDM change with plant volume.
Figure 7 indicates that there may be interaction between grazed and ungrazed tiles with
changing plant volume. However, grazed AFDM seems to be affected by some significant
outliers which may obscure the comparison in this form.
4) CHLA change with grazing and Plant Volume
coplot(CHLA ~ log10(Plant.Volume) | Graze, panel = panel.smooth, data = Algae) # coplot
for CHLA with plant volume and Grazing - very similar to Depth
Figure 8: Coplot of grazed and ungrazed CHLA change with plant volume.
Part A
Figure 8 indicated interaction may be taking place. However, the slope patterns are similar to
figure 6 suggesting that depth and plant volume are collinear.
Scatter Matrices
Two scatter matrices were created at this stage (grazed and Ungrazed) which included all
variables. This was to help visualise the change in response variables with explanatory
variables, within the grazed or ungrazed treatments. The use of smoothers in the scatter
matrices could help to identify possible non-linear patterns (Logan, 2010).
1) Grazed Scatter Matrix
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) +
CHLA , data = Grazed, diag = "boxplot", main = "Grazed") # scatter matrix to view
responses in grazed tiles
Figure 9: Scatterplot Matrix for the Grazed tiles
Part A
Figure 9 shows that depth, flow and plant volume are highly collinear. Folllowing
transformation, the AFDM is still not normally distributed. CHLA appears to increase with
Depth, although at this stage it is hard to determine if this relationship is linear or not. CHLA
and AFDM are inversely related.
2) Ungrazed scatter matrix
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) +
CHLA, data = Ungrazed, diag = "boxplot",main = "Ungrazed") # scatter matrix to view
responses in ungrazed tiles
Figure 10: Scatterplot Matrix for the ungrazed tiles
Figure 10 shows that ungrazed CHLA increases with Depth, although the shape of both the
linear regression and smoother are considerably different to the grazed treatment. The
distribution of CHLA data is less normal than in the grazed treatment, whereas the
distribution of AFDM is more normal than the grazed treatment. The response of AFDM to
depth appears to be nonlinear.
Part A
GG Plots
The interaction between grazed and ungrazed treatments for both CHLA and log10 AFDM
were plotted using the plyr (Wickham, 2011) and ggplot2 packages (Wickham, 2009). Using
ggplot2 allows you to visualize the change in the mean of the response whilst also
considering the variance of the data.
1) Mean grazed/ungrazed CHLA against Depth
library(plyr)
AlgaeSum <- ddply(Algae, c("Depth.Num", "Graze"), summarise,
N = length(CHLA),
mean = mean(CHLA),
sd = sd(CHLA),
se = sd / sqrt(N) ) # parameters for CHLA ggplot
AlgaeSum # check the file....
require(ggplot2)
pd <- position_dodge(.3)
ggplot(AlgaeSum, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour =
"Black",
position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5)
# plot for CHLA
Part A
Figure 11: Grazed and ungrazed CHLA against depth (cm)
There is clearly interaction between grazed and ungrazed CHLA. At lower depths, grazed
tiles seem to have higher CHLA mass than ungrazed tiles. At higher depths, ungrazed CHLA
is higher than grazed. A lack of measurements between 10-35cm reduces the certainty of
predictions at larger depths.
2) Mean grazed/ungrazed log10 AFDM against Depth
AlgaeSumAFDM <- ddply(Algae, c("Depth.Num", "Graze"), summarise,
N = length(log10(AFDM+0.001)),
mean = mean(log10(AFDM+0.001)),
sd = sd(log10(AFDM+0.001)),
se = sd / sqrt(N) ) # parameters for AFDM ggplot
pd <- position_dodge(.3)
Part A
ggplot(AlgaeSumAFDM, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour =
"Black",
position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5)
# ggplot for log10AFDM
Figure 12: Grazed and ungrazed log10 AFDM against depth (cm)
Figure 12 indicates that there is limited interaction between grazed and ungrazed AFDM and
that grazed AFDM is consistently higher than ungrazed AFDM. Response of both grazed
and ungrazed AFDM is non-linear.
Testing Assumptions
Before conducting statistical analysis it is important to determine if the data fits the
distributional assumptions of parametric tests as these tests have a greater statistical power
than non-parametric tests (Logan, 2010). In order to conduct two sample parametric tests
Part A
(two-sample t-test) and a parametric analysis of variance (ANOVA), the data must be
normally distributed and have homogeneous variance.
1) Data distribution
The Shapiro-Wilks test was used to test is the data were normally distributed. The ShapiroWilks
test is a null hypothesis test against the assumption of normality. Therefore a p-value
below the confidence limit indicates data is not normally distributed (Fells Statsplyr, 2011).
shapiro.test(Algae$CHLA) # 0.000505
data: Algae$CHLA
W = 0.8843, p-value = 0.000505
shapiro.test(Algae$AFDM) # 2.422e-13 both p values are < 0.05 suggesting data is not
normally distributed need QQ plot to check
data: Algae$AFDM
W = 0.2512, p-value = 2.422e-13
CHLA and AFDM are determined to have P-values less than 0.05 therefore we reject the null
hypothesis that the data is normally distributed. However, the use of normality tests is
disputed due to their inflexibility with differing sample sizes (R-bloggers, 2011). Therefore,
further investigation is required using QQ-plots which plot a continuous data set against a
theoretical normal distribution (Zuur, 2009).
op <- par(mfrow = c(2, 1))
qqnorm(Algae$CHLA)
qqline(Algae$CHLA) # looks alright will assume CHLA isnormall distributed
qqnorm(Algae$AFDM)
qqline(Algae$AFDM) # not good. AFDM not normally distributed - must use nonparametric
Wilcox test for AFDM
par(op)
Part A
Figure 13: QQplots of (top) CHLA and (bottom) AFDM
AFDM data is clearly not normally distributed and as such will have to be analysed using
non-parametric tests. CHLA appears to be have reasonable normal distribution and
considering the robust nature of linear regression, ANOVA and t-tests to non-normal
distributions, the CHLA data is considered to be normally distributed (Zuur, et al. 2010,
Logan, 2010).
1) Homogeneity of Variance
As AFDM is not considered to be normally distributed, tests for variance were only carried
out for CHLA. Levene’s test was used to calculate homogeneity of variance. The test is
hypothesis testing, with a null hypothesis that variance is homogeneous. In order to carry out
a t-test variances between CHLA and grazed/ungrazed must be homogenous. ANOVA and
ANCOVA also require equal variance. However, the homogeneity of these statistical models
will be determined using residual validation plots (Zurr, et al.,2010)
leveneTest(Algae$CHLA~Algae$Graze) # p > 0.05 therefore variance is considered
homogeneous - can use parametric t-test CHLA
Levene's Test for Homogeneity of Variance (center = median)
Part A
Df F value Pr(>F)
group 1 1.0437 0.3131
40
leveneTest(Algae$CHLA~Algae$Depth) # p > 0.05 therefore variance is considered
homogeneous - can use ANOVA
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 6 1.6695 0.1578
35
Both Levene’s tests revealed that variances were homogeneous. Therefore, CHLA data fits
the requirements for parametric testing.
Two-sample tests
1) Wilcoxon signed-rank test
A Wilcoxon test was used to investigate the difference between grazed and ungrazed AFDM
samples because the lack of normality in the distribution prevents the use of a t-test
(Dalgaard, 2002).
library(coin) # load package needed for U test (Hawthorn, et al. 2006)
wilcox_test(AFDM ~ Graze, data = Algae, distribution = "exact") # p<0.05 (0.0004401)
therefore we can conclude there is a difference between grazed and ungrazed AFDM data
Exact Wilcoxon Mann-Whitney Rank Sum Test
data: AFDM by Graze (Grazed, Ungrazed)
Z = 3.3969, p-value = 0.0004401
alternative hypothesis: true mu is not equal to 0
A p-value < 0.05 indicates that the null hypothesis is rejected and it is considered that there
is a significant difference between grazed and ungrazed AFDM.
Part A
2) T-test
As CHLA data fits the requirements for a parametric test (Dalgaard, 2002), a two-sample ttest
was undertaken to compare the grazed and ungrazed CHLA data.
t.test(CHLA ~ Graze, data = Algae) # p> 0.05 (0.6893) therefore the null hypothesis is
accepted. The mean CHLA value is not significantly different between grazed and ungrazed
tiles...
data: CHLA by Graze
t = 0.4027, df = 39.768, p-value = 0.6893
The calculated P-value is greater than 0.05 and therefore we accept the null hypothesis that
the difference between grazed and ungrazed CHLA is not significant.
Analysis of Variance
1) One-way ANOVA – Grazed treatment
ANOVA tests were carried out on CHLA with each control variable on grazed and ungrazed
treatments. This analysis will help to determine the influence of different control variables on
CHLA (Doncaster and Davey, 2007). For those tests that were statistically significant,
verification plots were analysed and post-hoc Tukey tests were applied.
#CHLA ~ Depth
CHLAgrazedaov <- aov(CHLA ~ Depth, data = Grazed) # anova of grazed tile CHLA
against Depth
summary(CHLAgrazedaov) # P<0.05 (0.0393) significant relationship
TukeyHSD(CHLAgrazedaov) # post hoc test to identify the difference between
means that are greater than the standard error
op <- par(mfrow = c(2, 2))
plot(CHLAgrazedaov) # plot validation shows slightly wedge shaped residuals
par(op)
Part A
leveneTest(Grazed$CHLA~Grazed$Depth) # p > 0.05 so we can consider variance
homogeneous.
#CHLA ~ PLant Volume
CHLAgrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Grazed) # anova of grazed tile
CHLA against plant volume
summary(CHLAgrazedaov2) # p > 0.05 (0.595) Not significant
#CHLA ~ Flow
CHLAgrazedaov3 <- aov(CHLA ~ Flow, data = Grazed) # anova of grazed tile CHLA
against Flow
summary(CHLAgrazedaov3) # p > 0.05 (0.607) Not significant
For Grazed tiles, the only significant ANOVA test was CHLA with depth. The results are
summarised here:
Df Sum Sq Mean Sq F value Pr(>F)
Depth 6 0.03620 0.006034 3.069 0.0393 *
Residuals 14 0.02753 0.001966
A post-hoc Tukey test was then applied to identify differences between means that were
greater than the standard error (Logan, 2010):
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = CHLA ~ Depth, data = Grazed)
$Depth
diff lwr upr p adj
2-0 0.036333333 -0.087289876 0.15995654 0.9448870
5-0 0.125333333 0.001710124 0.24895654 0.0459610
7-0 0.086000000 -0.037623209 0.20962321 0.2770254
10-0 0.098000000 -0.025623209 0.22162321 0.1670851
25-0 0.023333333 -0.100289876 0.14695654 0.9937725
35-0 0.076333333 -0.047289876 0.19995654 0.3983507
5-2 0.089000000 -0.034623209 0.21262321 0.2453632
7-2 0.049666667 -0.073956543 0.17328988 0.8074029
10-2 0.061666667 -0.061956543 0.18528988 0.6248235
Part A
25-2 -0.013000000 -0.136623209 0.11062321 0.9997609
35-2 0.040000000 -0.083623209 0.16362321 0.9164273
7-5 -0.039333333 -0.162956543 0.08428988 0.9221504
10-5 -0.027333333 -0.150956543 0.09628988 0.9858612
25-5 -0.102000000 -0.225623209 0.02162321 0.1396981
35-5 -0.049000000 -0.172623209 0.07462321 0.8163400
10-7 0.012000000 -0.111623209 0.13562321 0.9998495
25-7 -0.062666667 -0.186289876 0.06095654 0.6086721
35-7 -0.009666667 -0.133289876 0.11395654 0.9999574
25-10 -0.074666667 -0.198289876 0.04895654 0.4220462
35-10 -0.021666667 -0.145289876 0.10195654 0.9958082
35-25 0.053000000 -0.070623209 0.17662321 0.7602295
This test shows that the differences in the means of different depths is not significant.
Validation plots were then analysed to determine if the model can be accepted.
Figure 14: Validation plots for ANOVA test between grazed CHLA and depth
The residuals in figure 14 appear to be slightly wedge shaped. However, this is considered
to be attributed to a select few outliers. All plots are considered acceptable. A levene’s test
was carried out to check for homogeneity of variance:
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 6 0.7662 0.6085
14
Part A
A P-value > 0.05 indicates that the null hypothesis is accepted – the variance for the test is
considered to be homogeneous.
2) One-way ANOVA – Ungrazed treatment
The same process was repeated for the ungrazed treatment.
#Ungrazed CHLA with Depth
CHLAUngrazedaov <- aov(CHLA ~ Depth, data = Ungrazed) # anova of ungrazed tile
CHLA against Depth
summary(CHLAUngrazedaov) # P>0.05 (0.18) not significant
# CHLA ~ PLant Volume
CHLAUngrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Ungrazed) # anova of ungrazed
tile CHLA against plant volume.
summary(CHLAUngrazedaov2) # p <0.05 (0.0437) significant relationship
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # plot validation shows that plant volume requires transforming...
par(op)
CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Ungrazed) # run anova
again with transformed plant volume
summary(CHLAUngrazedaov2) # p <0.05 (0.0236) significant relationship. improved
following transformation
op <- par(mfrow = c(2, 2))
Part A
plot(CHLAUngrazedaov2) # validaion plots are considerably better, apart from one major
outlier...
par(op)
Algaeno1 <- Ungrazed[(-1),] # remove influential point
CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Algaeno1) # run anova
again without influential point plant volume
summary(CHLAUngrazedaov2) # P<0.1 (0.0607)
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # validaion plots look good.
par(op)
# Tukey test not applied as control variable is required to be a factor.
# Ungrazed CHLA with Flow
CHLAUngrazedaov3 <- aov(CHLA ~ Flow, data = Ungrazed) # anova of ungrazed tile
CHLA against Flow
summary(CHLAUngrazedaov3) # p > 0.05 (0.382) not significant
The only statistically significant ANOVA test for the ungrazed treatment was with plant
volume. The first summary is shown below:
Df Sum Sq Mean Sq F value Pr(>F)
Plant.Volume 1 0.01079 0.01079 4.668 0.0437 *
Residuals 19 0.04390 0.00231
Summary plots were then examined, which revealed that plant volume required transforming
in order to centre the residuals, fixing the heterogeneity issue.
Part A
Figure 15: Verification plots for ANOVA of ungrazed CHLA with plant volume (before data
transformation).
A log 10 transformation was used to reduce the difference in variances (Logan, 2010).
Figure 16: Verification plots for ANOVA of ungrazed CHLA with log10 plant volume (after control
variable transformation).
Part A
The Log10 transformation has reduced the heterogeneity of variance and verification plots
are considered acceptable, apart from one significant outlier, which records approximately
1.0 on the cook’s distance plot and is considered highly influential (Logan, 2010). This outlier
was removed and the ANOVA was re run, recording a P-value < 0.1 (0.06):
Df Sum Sq Mean Sq F value Pr(>F)
log10(Plant.Volume) 1 0.003307 0.003307 4.004 0.0607 .
Residuals 18 0.014866 0.000826
Validation plots were examined. Removal of the outlier, significantly improves any possible
issues of heterogeneous variance.
Figure 17: Final validation plot for ANOVA of ungrazed CHLA with plant volume (influential outlier
removed).
No Post-Hoc Tukey test was carried out for this ANOVA because a factor control variable is
required.
One-way ANOVA summary
One way ANOVA analyses reveal that grazed CHLA is significantly related to depth
(P<0.05). However, ungrazed CHLA is significantly related to plant volume (P<0.1).
Part A
3) Kruskal Wallace Test - Grazed AFDM
AFDM data is not normally distributed and as a result, the Kruskal-Wallace test is used to
test the difference between the medians of the population. This rank-based test is used as
an alternative to ANOVA (Logan, 2010).
#Grazed AFDM with Depth
AFDM.graze.KW <- kruskal.test(AFDM ~ Depth, data = Grazed) # Kruskal-Wallace test
(KW test) of grazed AFDM against Depth
AFDM.graze.KW # P > 0.05 (0.3614) not significant
#PLant Volume
AFDM.graze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Grazed) # KruskalWallace
test (KW test) of AFDM against Plant volume
AFDM.graze.KW2 # P > 0.05 (0.4579) not significant
#Flow
AFDM.graze.KW3 <- kruskal.test(AFDM ~ Flow, data = Grazed) # Kruskal-Wallace test
(KW test) of AFDM against Flow
AFDM.graze.KW3 # P > 0.05 (0.3954) not significant
#Ungrazed AFDM with Depth
AFDM.ungraze.KW <- kruskal.test(AFDM ~ Depth, data = Ungrazed) # Kruskal-Wallace
test (KW test) of Ungrazed AFDM against Depth
AFDM.ungraze.KW # P>0.05 (0.284) not significant
# PLant Volume
AFDM.ungraze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Ungrazed) # KruskalWallace
test (KW test) of Ungrazed AFDM against plant volume
Part A
AFDM.ungraze.KW2 # P>0.05 (0.4579) not significant
# Flow
AFDM.ungraze.KW3 <- kruskal.test(AFDM ~ Flow, data = Ungrazed) # Kruskal-Wallace
test (KW test) of Ungrazed AFDM against Flow
AFDM.ungraze.KW3 # P>0.05 (0.3954) not significant
Kruskal-Wallace summary:
No significant relationships between either grazed or ungrazed AFDM with any of the
explanatory variables.
4) Two-Way ANOVA
From the previous one-way ANOVA analysis it has been established that grazed CHLA is
related to depth and ungrazed CHLA is related to plant volume. The two-way analysis will
test the following four hypotheses (Doncaster and Davey, 2007):
1) Variation in CHLA is explained by the variance in depth and independently by the
variation in grazing (grazed/ungrazed).
# two-way anova considering grazing and Depth
Algae.aov1 <- aov(CHLA ~ Graze + Depth, data = Algae)
summary(Algae.aov1) # p > 0.05 not significant
The variance of CHLA cannot be explained by hypothesis 1.
2) Variation in CHLA is explained by the variance in plant volume and independently by the
variation in grazing (grazed/ungrazed).
# two-way anova considering grazing and plant volume
Algae.aov3 <- aov(CHLA ~ Graze + Plant.Volume, data = Algae)
Part A
summary(Algae.aov3) # p > 0.05 not significant
The variance of CHLA cannot be explained by hypothesis 2.
3) Variation in CHLA is explained by the inter-dependent effects of variance in plant volume
and grazing (grazed/ungrazed).
# two-way interaction considering grazing*plant volume
Algae.aov4 <- aov(CHLA ~ Graze*Plant.Volume, data = Algae)
summary(Algae.aov4) # Graze:Plant.Volume P = 0.0796
Df Sum Sq Mean Sq F value Pr(>F)
Graze 1 0.00048 0.000480 0.171 0.6815
Plant.Volume 1 0.00265 0.002647 0.943 0.3377
Graze:Plant.Volume 1 0.00911 0.009107 3.245 0.0796 .
Residuals 38 0.10666 0.002807
Interaction between graze and plant volume is significant (P<0.1). Inspect validation plots:
Figure 18: Explanatory plots for CHLA ~ graze*plant volume ANOVA.
Part A
Plots indicate that the control variable (plant volume) requires transformation due to
heterogeneity in the residuals. The ANOVA was re-run with log10 transformed plant volume.
Algae.aov4 <- aov(CHLA ~ Graze*log10(Plant.Volume), data = Algae) # run with
transformed explanatory
summary(Algae.aov4) # interaction is no longer significant
Df Sum Sq Mean Sq F value Pr(>F)
Graze 1 0.00048 0.000480 0.174 0.679
log10(Plant.Volume) 1 0.00893 0.008929 3.236 0.080 .
Graze:log10(Plant.Volume) 1 0.00464 0.004638 1.681 0.203
Residuals 38 0.10485 0.002759
Interaction between Graze and Plant volume is no longer significant following the removal of
residual heterogeneity.
4) Variation in CHLA is explained by the inter-dependent effects of variance in Depth and
grazing (grazed/ungrazed).
# two-way interaction considering grazing*Depth
Algae.aov2 <- aov(CHLA ~ Graze*Depth, data = Algae)
summary(Algae.aov2) # significant relationship (P<0.1). Graze:Depth (P= 0.0584)
Df Sum Sq Mean Sq F value Pr(>F)
Graze 1 0.00048 0.000480 0.229 0.6360
Depth 6 0.03019 0.005032 2.399 0.0536 .
Graze:Depth 6 0.02950 0.004917 2.344 0.0584 .
Residuals 28 0.05872 0.002097
Interaction between grazing and depth is significant (P<0.1). Validation plots were used to
assess the reliability of the analysis:
Part A
Figure 19: Explanatory plots for CHLA ~ graze*Depth ANOVA.
Validation plots indicate that there is some heterogeneity in residuals however, this is
influenced by three outliers. This ANOVA test is accepted. Further investigation with an
ANCOVA analysis is undertaken in this report.
Linear Models
1) One-way linear models
Based on the one-way ANOVA analyses we know which one-way linear models will be
significant (grazed CHLA with Depth and ungrazed CHLA with plant volume). Linear models
were created for these variables to visualise the relationship.
Ungrazed CHLA with Plant Volume:
CHLA.ungrazed.lm <- lm(CHLA ~ Log10.Plant, data = Algaeno1) # lm of grazed tile CHLA
against plant volume
summary(CHLA.ungrazed.lm ) # P<0.1 (0.0607) significant relationship
Part A
op <- par(mfrow = c(2, 2))
plot(CHLA.ungrazed.lm) # Plot validation same as anova - looks alright one major outlier
par(op)
Verification plots are shown in figure 17. Following verification, the model was plotted with
confidence intervals of 95%.
Plot(CHLA ~ Log10.Plant , data = Ungrazed, ylim=c(0,0.3), xlab = “log10 (plant volume)”,
ylab = “Chlorophyll A”,col = “blue”, main = “Ungrazed Tiles”) # plot CHLA for
ungrazed against Plant volume.
Abline(CHLA.ungrazed.lm) # fit the linear model line.
Text(5.5,0.25, “Chlorophyll A = -0.07508Plant.Volume+ 0.02797”, pos = 2)
text(5,0.23, expression(paste(P == 0.0607)), pos = 2)
text(5,0.21, expression(paste(R^2 == 0.1365)), pos = 2) # add equation P value and R2
x <- seq(min(Ungrazed$Log10.Plant), max(Ungrazed$Log10.Plant), l=1000) # for each
value of x, calculate the upper and lower 95% confidence
y<-predict(CHLA.ungrazed.lm, data.frame(Log10.Plant=x), interval=”c”)
matlines(x,y, lty=3, col=”black”) #plot the upper and lower 95% confidence limits
Part A
Figure 20: Linear model of ungrazed CHLA change with plant volume.
The residuals were then tested:
plot(CHLA.ungrazed.lm$resid ~ log10(Algaeno1$Plant.Volume)) # residuals plot looks
good . Model can be accepted.
Figure 21: Fitted vs. predicted residuals for the linear model of ungrazed CHLA against plant volume.
It was concluded that the model was reliable because variance is considered homogeneous.
Part A
Grazed CHLA with Depth:
CHLA.Grazed.lm2 <- lm(CHLA ~ Depth, data = Grazed) # lm of grazed tile CHLA against
plant volume
summary(CHLA.Grazed.lm2 ) # significant P<0.05
Residuals:
Min 1Q Median 3Q Max
-0.083667 -0.019000 -0.001667 0.023000 0.064333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002667 0.025600 0.104 0.91852
Depth2 0.036333 0.036204 1.004 0.33262
Depth5 0.125333 0.036204 3.462 0.00381 **
Depth7 0.086000 0.036204 2.375 0.03236 *
Depth10 0.098000 0.036204 2.707 0.01703 *
Depth25 0.023333 0.036204 0.644 0.52968
Depth35 0.076333 0.036204 2.108 0.05350 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.04434 on 14 degrees of freedom
Multiple R-squared: 0.5681, Adjusted R-squared: 0.383
F-statistic: 3.069 on 6 and 14 DF, p-value: 0.03929
This model shows a significant relationship with a P- value <0.05. Model validation plots can
be seen in figure 14.The model was then plotted to visualise the relationship. As depth is a
factor, plotting the linear model required the input of the slope value for each factor, as
shown in the code.
plot(CHLA ~ Depth.Num, data = Grazed, col = "blue", xlab = "Depth (cm)", ylab =
"Chlorophyll A", main = "Grazed Tiles") # create plot
abline(coef(CHLA.Grazed.lm2)[1], coef(CHLA.Grazed.lm2)[2-3-4-5-6-7]) # plot linear model
text(35,0.13, "Chlorophyll A = 0.002667Depth + 0.445332", pos = 2)
text(35,0.12, expression(paste(P == 0.03929)), pos = 2)
Part A
text(35,0.11, expression(paste(R^2 == 0.383)), pos = 2) # add equation, R^2 and P value
#calculate slope#
0.036333+0.125333+0.086000+0.098000+0.023333+0.076333
Figure 22: Linear model of grazed CHLA change with depth.
In order to finalise plot validation, a plots of predicted vs. fitted residuals were created:
plot(CHLA.Grazed.lm2$resid ~ Grazed$Depth.Num) # looks slightly wedge shaped,
outliers mainly causing this. model is accepted.
Part A
Figure 23: Fitted vs. predicted residuals for the linear model of grazed CHLA against depth.
Figure 24: Fitted vs. predicted residuals boxplot for the linear model of grazed CHLA against depth.
Figure 23 shows the residuals are unbiased but slightly heteroscedastic (Logan, 2010).
However, as depth is a factor, Figure 24 more accurately show the residual distribution,
which is slightly humped indicating possible heterogeneity of variance. This difference in
variance is caused by an uneven distribution of measured depths. If more measurements
had been recorded at depths between 10-35cm it is likely that the variance would be less
Part A
heterogeneous. The model is accepted, although further analysis is required to produce a
better fitting model.
2) Analysis of Co Variance (ANCOVA)
In order to produce an effective model for the whole data set, without separating grazed and
ungrazed treatments it is necessary to add grazing/ungrazing to a linear model as a
covariate, similar to two way-anova (figure 19) (Logan, 2010). This will help to explain some
of the variability in the data set and show how grazed and ungrazed CHLA respond to
changes in depth.
Before undertaking the ANCOVA analysis, a Coplot was created to explore the
significance of using Grazing as a covariate (Sarkar, 2008).
require(lattice)
# Here's the plot
xyplot(CHLA ~ Depth |Graze # create plot
,xlab = "Depth", ylab = "CHLA", data = Algae)
Figure 25: Coplot for CHLA against depth divided into the respective grazing catagories.
Part A
Figure 25 indicates that adding grazing as a covariate could help to remove some variation,
despite being considered not significantly different during a t-test. The ANCOVA model was
created and validated as follow:
algae.lm <- lm(CHLA ~ Depth* Graze, data = Algae) # linear ancova for CHLA with graze
and Depth
op <- par(mfrow = c(2,2))
plot(algae.lm)
par(op) # residuals are a bit wedge shaped as earlier with 2-way anova but mainly
caused by outliers and the unequal range of factors so I'll continue
Figure 26: Validation plots for ANCOVA analysis ( CHLA ~ Graze*Depth)
Validation plot indicates, possible heterogeneity of variance, however, this could have been
caused by the outliers or unequal distribution of depths. Further investigation of residuals is
required:
hist(algae.lm$resid) # residuals histogram
Part A
Figure 27: Histogram showing residuals for the ANCOVA model.
plot(algae.lm$resid) # residuals plot
Figure 28: Scatterplot showing the residuals for the ANCOVA model.
Part A
plot(algae.lm$resid ~ Algae$Depth) # residuals here look very good.
Figure 29: Boxplot showing the residuals against measured explanatory variable.
Through the Analysis of Figures 27-29, it is concluded that homogeneity of variance is
satisfactory and therefore the model can be accepted. The model was then evaluated using
an ANOVA analysis which recorded a significant value (P<0.1). The model fit had a higher
significance (P<0.05):
anova(algae.lm) # ANOVA of linear model, the same as the two-way ANOVA earlier
Response: CHLA
Df Sum Sq Mean Sq F value Pr(>F)
Depth 6 0.030194 0.0050323 2.3995 0.05364 .
Graze 1 0.000480 0.0004801 0.2289 0.63604
Depth:Graze 6 0.029499 0.0049165 2.3443 0.05838 .
Residuals 28 0.058723 0.0020972
summary(algae.lm) # looks good model has a significant p value (0.0386)
Residuals:
Min 1Q Median 3Q Max
-0.083667 -0.018917 -0.001833 0.019000 0.119667
Part A
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.002667 0.026440 0.101 0.92038
Depth2 0.036333 0.037392 0.972 0.33953
Depth5 0.125333 0.037392 3.352 0.00231 **
Depth7 0.086000 0.037392 2.300 0.02911 *
Depth10 0.098000 0.037392 2.621 0.01401 *
Depth25 0.023333 0.037392 0.624 0.53767
Depth35 0.076333 0.037392 2.041 0.05073 .
GrazeUngrazed 0.005000 0.037392 0.134 0.89458
Depth2:GrazeUngrazed 0.019667 0.052880 0.372 0.71276
Depth5:GrazeUngrazed -0.069333 0.052880 -1.311 0.20046
Depth7:GrazeUngrazed -0.058333 0.052880 -1.103 0.27937
Depth10:GrazeUngrazed -0.051000 0.052880 -0.964 0.34308
Depth25:GrazeUngrazed 0.095333 0.052880 1.803 0.08219 .
Depth35:GrazeUngrazed -0.018667 0.052880 -0.353 0.72673
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.0458 on 28 degrees of freedom
Multiple R-squared: 0.5061, Adjusted R-squared: 0.2768
F-statistic: 2.207 on 13 and 28 DF, p-value: 0.03867
The model is therefore accepted and is statistically significant. The ANCOVA model was
plotted as follows:
coef(algae.lm)
algae.lm$coeff[1] # Grazed intercept
algae.lm$coeff[2-3-4-5-6-7] # Grazedlope
algae.lm$coeff[1+8] # Ungrazed intercept
algae.lm$coeff[9-10-11-12-13-14] # Ungrazed slope # These are the coefficients
needed to form the lines with Depth as a factor
# plot
plot(CHLA ~ Depth.Num, data = Algae, pch = 19,cex = 1.5,
col = c("Black", "Red")[Algae$Graze],
xlab = list("Depth", cex = 1.2),
Part A
ylab = list("Chlorophyll A", cex = 1.2))
# Now add a legend
legend(1, 0.25, legend = c("Grazed", "Ungrazed"),
col = c("black", "red"), pch = 19)
# Now add the coefficients from above to the abline command to get both lines
abline(coef(algae.lm)[1], coef(algae.lm)[2-3-4-5-6-7]) # This adds the Grazed
regression line
# Now add the ungrazed regression line
abline(coef(algae.lm)[1] + coef(algae.lm)[8], # Ungrazed intercept
coef(algae.lm)[9-10-11-12-13-14], # Ungrazed slope
col = "red")
text(35,0.18, "Chlorophyll A*Graze = -0.0023 + 0.3629 * Depth", pos = 2)
text(35,0.16, expression(paste(P == 0.03867)), pos = 2)
text(35,0.14, expression(paste(R^2 == 0.2768)), pos = 2)
The following indicates the calculation of the ANCOVA equation:
#calculate Grazed slope#
0.036333+0.125333+0.086000+0.098000+0.023333+0.076333 # 0.445332
#calculate Ungrazed slope
0.019667-0.069333-0.058333-0.051000+0.095333-0.018667 # - 0.082333
# calculate intercept
0.0027-0.005 # -0.0023
# calculate intercept
0.445332- 0.082333 # 0.362999 therefore CHLOROPHYLLAgrazetype = -
0.0023+0.3629*Depth
Part A
Figure 30: ANCOVA plot showing the change in CHLA on grazed and ungrazed tiles with Depth (cm).
This model is the most effective at explaining the variance in CHLA in response to changing
depth (P=0.0039). It shows that there is (i) not a significant difference in CHLA mass on
grazed and ungrazed tiles, (ii) no significant interaction between grazed and ungrazed tiles
and (iii) that on average, ungrazed tiles have higher CHLA mass than grazed tiles.
3) Multiple Linear Regression
Based on the physiological requirements of benthic algae, changes in either flow or plant
volume could affect the growth of biofilms. Multiple linear regression may help to test this
possibility. The variation inflation factors (vifs) were inspected to determine multicollinearity.
Values above 5 are considered collinear and are therefore not appropriate for multiple linear
regression (Logan, 2010).
Part A
vif(lm(CHLA ~ Depth + log10(Flow) + log10(Plant.Volume), data = Algae ))
# colinearity prevents conducting multiple linear regression.
GVIF Df GVIF^(1/(2*Df))
Depth 550.57799 6 1.692005
log10(Flow) 67.72208 1 8.229343
log10(Plant.Volume) 101.36930 1 10.068232
>
With values well in excess of 5, these control variables can be considered highly collinear
and multiple linear regression was therefore not carried out.
4) Non-linear modelling:
Non-linear modelling cannot me conducted where the control variable is a factor. Therefore
the only relationship that presented the possibility of non-linear modelling was between
ungrazed CHLA and plant volume (figure 20). The model created was a second order
polynomial relationship, however, the model was a less suitable fit than the linear model
shown in figure 20 and was therefore not accepted:
algae.lm1 <- lm(CHLA ~ I(Log10.Plant^2), data = Algaeno1)
summary(algae.lm1) # significant p value (0.076999). However not as significant as the
linear model
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0172684 0.0187194 0.922 0.368
I(Log10.Plant^2) 0.0014130 0.0007533 1.876 0.077 .
General Additive Models (GAMs)
Where trends are too complex, or their distribution does not fit the requirements of a
parametric regression analysis, GAMs can be used to show patterns in response variables
using a smoother function (Logan, 2010). As the change in AFDM was determined to be
unsuitable for linear modelling and the trends displayed in the data are clearly non-linear
(Figure 12), GAM analysis is more appropriate than general linear modelling.
Part A
GAMs require a numerical control variable and therefore, depth is unsuitable for use
with GAMs. In its place, log10 transformed flow values have been used. The relationship
between log10 flow and depth is shown to be highly significant (P< 2.2e-16) at all depths:
depth.log10flow <- lm( log10(Flow) ~ Depth , data = Algae)
summary(depth.log10flow) # linear model of depth and log10 flow to show they
are highly related. log10 flow is used for GAMS because Depth is a factor and has fewer
unique covariate combinations than specified maximum degrees of freedom. P = 2.2e-16
and R^2 = 0.978
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.08329 0.06374 -17.00 <2e-16 ***
Depth2 1.55748 0.09014 17.28 <2e-16 ***
Depth5 1.66349 0.09014 18.45 <2e-16 ***
Depth7 2.03831 0.09014 22.61 <2e-16 ***
Depth10 2.30769 0.09014 25.60 <2e-16 ***
Depth25 3.10382 0.09014 34.44 <2e-16 ***
Depth35 3.34504 0.09014 37.11 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1561 on 35 degrees of freedom
Multiple R-squared: 0.9812, Adjusted R-squared: 0.978
F-statistic: 305 on 6 and 35 DF, p-value: < 2.2e-16
AFDM was transformed using the log10+1 technique (Zuur, et al. 2010) in order to remove
some of the variance and allow for greater ease of interpretation. The analysis was carried
out on grazed and ungrazed tiles separately as follows:
1) GAM of grazed AFDM with flow
AFDM.gam1 <- gam(log10(AFDM+0.001)~ s(Log10.Flow), data = Grazed) # run GAM of
AFDM against flow for grazed tiles
plot(AFDM.gam1) # looks as if there is a decrease in AFDM with flow.
summary(AFDM.gam1) # significant smoother term (p = 0.00172)
Part A
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Log10.Flow) 1.863 2.216 8.66 0.00172 **
gam.check(AFDM.gam1) # verification plots are mostly
Figure 31: Verification plots for GAM of AFDM with log10 flow.
The verification plots for this GAM (Figure 31) are mostly satisfactory. Heterogeneity in the
residuals is likely caused by the lack of channels between10 – 25cm depth. The complete
GAM plot is seen below:
plot(log10(AFDM+0.001) ~ Log10.Flow, data = Grazed,
pch = 16,
xlab = "Log10 Flow",
ylab = "Ash Free Dry Mass", main = "GAM of Grazed AFDM ")
# predict across the data
x <- seq(min(Grazed$Log10.Flow), max(Grazed$Log10.Flow), l=100) # 100 steps....
y <- predict(AFDM.gam1, data.frame(Log10.Flow = x), se = TRUE) # using standard errors
se = TRUE
Part A
# add lines
lines(x, y$fit) # plots the fitted values
lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit
lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit
Figure 32: GAM of log10 +1 AFDM change with log10 Flow.
The model is accepted and indicates that AFDM decreases with increasing log10 flow.
2) GAM of ungrazed AFDM with flow
AFDM.gam2 <- gam(log10(AFDM+0.001) ~ s(Log10.Flow), data = Ungrazed) # run GAM of
AFDM against flow for ungrazed tiles
plot(AFDM.gam2) # plot GAM - very clearly no pattern...
Part A
Figure 33: GAM plot for ungrazed log10+1 AFDM change with log10 flow.
summary(AFDM.gam2) # smoother term not significant P>0.1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Log10.Flow) 1 1 0.004 0.948
The GAM (Figure 33) indicates that there is no relationship between ungrazed AFDM and
flow.
Summary:
1) A Wilcoxon’s test revealed that there was a significant difference between AFDM on
grazed and ungrazed tiles (P<0.05).
2) Using a T-test is was shown that CHLA mass was not significantly different on grazed and
ungrazed tiles (P>0.1).
3) One-way ANOVA analysis revealed that grazed CHLA was significantly related to depth
(P<0.05) and that ungrazed CHLA was significantly related to plant volume (P<0.01).
4) 2-way ANOVA and ANCOVA analyses were used to investigate the covariation effects of
depth and grazing. The model showed that variation in CHLA is explained by the interdependent
effects of variance in Depth and grazing (P<0.05).
5) GAMs were produced for grazed and ungrazed AFDM change with depth. The models
showed that grazed AFDM decreased non-linearly with increasing depth and that ungrazed
AFDM did not respond to changes in depth.
Part A
References:
Caramujo, M. J., Mendes, C. R. B., Cartaxana, P., Brotas, V., & Boavida, M. J. (2008).
Influence of drought on algal biofilms and meiofaunal assemblages of temperate
reservoirs and rivers. Hydrobiologia, 598(1), 77-94.
Dalgaard, P (2008). Introductory statistics with R. New York: Springer. 95-224.
Doncaster, C. P., & Davey, A. J. (2007). Analysis of variance and covariance: how to choose
and construct models for the life sciences. Cambridge University Press.
Fells Stats. (2011). Normality tests don't do what you think they do.Available:
http://blog.fellstat.com/?p=61. Last accessed 1st May 2014.
Fox, J and Weisberg, S. (2011). An {R} Companion to Applied Regression, Second Edition.
Thousand Oaks CA: Sage. URL:
http://socserv.socsci.mcmaster.ca/jfox/Books/Companion
Hothorn, T., Hornik, K., van de Wiel, M.A., and Zeileis, A. (2006). A Lego System for
Conditional Inference. The American Statistician 60(3), 257-263.
Ledger, M. E., & Hildrew, A. G. (1998). Temporal and spatial variation in the epilithic biofilm
of an acid stream. Freshwater Biology, 40(4), 655-670.
Massart, D. L., Smeyers-Verbeke, J., Capron, X., & Schlesier, K. (2005). Visual presentation
of data by means of box plots.Logan, 2010
R Core Team (2013). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.
Sarkar, D. (2008) Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN
978-0-387-75968-5
Wickham, H. (2009) ggplot2: elegant graphics for data analysis. Springer New York.
Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of
Statistical Software, 40(1), 1-29. URL: http://www.jstatsoft.org/v40/i01/.
Zuur, A. F., Ieno, E. N., & Elphick, C. S. (2010). A protocol for data exploration to avoid
common statistical problems. Methods in Ecology and Evolution,1(1), 3-14.
Part A
Appendices:
Appendix 1 – R script:
######### Algae Analysis Coursework for Environmenta Analysis and Monitoring ##############
rm(list=ls()) # Clear Memory
# set working directory
getwd() # check directory location
Algae <- read.csv(file.choose()) # Load data file
str (Algae) # check structure
Algae$Depth.Num <- Algae$Depth # creates an extra Depth column which will remain as an
integer
Algae$Depth <- as.factor(Algae$Depth) # convert depth from Int. to Fac.
Algae["Log10.Flow"] <- NA # creates the new column named "log10.Flow" filled with "NA"
Algae$Log10.Flow <- log10(Algae$Flow) # creates log10 data for flow for GAM analyses later
Algae["Log10.Plant"] <- NA # creates the new column named "log10.Plant" filled with "NA"
Algae$Log10.Plant <- log10(Algae$Plant.Volume) # adds log 10 data for use with matlines
######## subset grazed and ungrazed ############
Ungrazed <- subset(Algae, Graze == "Ungrazed") # Ungrazed subset
Ungrazed
Grazed <- subset(Algae, Graze == "Grazed") # Grazed subset
Grazed
##### plot some images to see what the data looks like ######
Part A
#Boxplots
op <- par(mfrow = c(1, 2))
boxplot(AFDM ~ Graze, data = Algae, ylab = "Ash Free Dry Mass") # hard to interpret due to
distribution needs transforming...
boxplot(log10(AFDM+0.001) ~ Graze, data = Algae, ylab = "log10 Ash Free Dry Mass") # inspect
transformed Ash Free Dry Mass (AFDM) difference between grazed and ungrazed
par(op)
op <- par(mfrow = c(1, 2))
boxplot(log10(AFDM+0.001) ~ Depth, data = Ungrazed, ylab = "log10 Ash Free Dry Mass", xlab =
"Depth (cm)", main = "Ungrazed") # inspect difference in AFDM with Depth for Ungrazed
boxplot(log10(AFDM+0.001) ~ Depth, data = Grazed, ylab = "log10 Ash Free Dry Mass", xlab =
"Depth (cm)", main = "Grazed") # inspect difference in AFDM with Depth for Grazed
par(op)
boxplot(CHLA ~ Graze, data = Algae, ylab = "Chlorophyll A") # inspect Chlorophyll A (CHLA)
difference between grazed and ungrazed
op <- par(mfrow = c(1, 2))
boxplot(CHLA ~ Depth, data = Ungrazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =
"Ungrazed") # inspect difference in CHLA with Depth for Ungrazed
boxplot(CHLA ~ Depth, data = Grazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main = "Grazed") #
inspect difference in CHLA with Depth for Grazed
par(op)
# Coplots
coplot(log10(AFDM+0.001) ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplots for AFDM
with depth and Grazing
coplot(CHLA ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplots for CHLA with depth
and Grazing
coplot(log10(AFDM+0.001) ~ log10(Plant.Volume)| Graze, panel = panel.smooth, data = Algae) #
coplots for AFDM with plant volume and Grazing
Part A
coplot(CHLA ~ log10(Plant.Volume) | Graze, panel = panel.smooth, data = Algae) # coplots for
CHLA with plant volume and Grazing - very similar to Depth
# scatterplot matrix
require(car)
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) + CHLA
, data = Grazed, diag = "boxplot", main = "Grazed") # scatter matrix to view responses
in grazed tiles
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) + CHLA
, data = Ungrazed, diag = "boxplot",main = "Ungrazed") # scatter matrix to view
responses in ungrazed tiles
###gg plot to compare the mean and errors of grazed and ungrazed CHLA ####
library(plyr)
AlgaeSum <- ddply(Algae, c("Depth.Num", "Graze"), summarise,
N = length(CHLA),
mean = mean(CHLA),
sd = sd(CHLA),
se = sd / sqrt(N) ) # parameters for CHLA ggplot
AlgaeSum # check the file....
require(ggplot2)
pd <- position_dodge(.3)
ggplot(AlgaeSum, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour = "Black",
position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5) # plot for
CHLA
Part A
##
AlgaeSumAFDM <- ddply(Algae, c("Depth.Num", "Graze"), summarise,
N = length(log10(AFDM+0.001)),
mean = mean(log10(AFDM+0.001)),
sd = sd(log10(AFDM+0.001)),
se = sd / sqrt(N) ) # parameters for AFDM ggplot
pd <- position_dodge(.3)
ggplot(AlgaeSumAFDM, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +
geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour = "Black",
position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5) # ggplot
for log10AFDM
######## testing Assumptions ########
# shapiro Wilks
shapiro.test(Algae$CHLA) # 0.000505
shapiro.test(Algae$AFDM) # 2.422e-13 both p values are < 0.05 suggesting data is not normally
distributed need QQ plot to check
# qqplots based on http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-thinkthey-do/
do not accept shapiro test completely...
op <- par(mfrow = c(2, 1))
qqnorm(Algae$CHLA)
qqline(Algae$CHLA) # looks alright will assume CHLA isnormall distributed
qqnorm(Algae$AFDM)
qqline(Algae$AFDM) # not good. AFDM not normally distributed - must use non-parametric Wilcox
test for AFDM
par(op)
# variance
Part A
var(Grazed$CHLA) # 0.003186514
var(Ungrazed$CHLA) # 0.002734262 quite similar
# levene's test
leveneTest(Algae$CHLA~Algae$Graze) # p > 0.05 therefore variance is considered homogeneous -
can use parametric t-test CHLA
leveneTest(Algae$CHLA~Algae$Depth) # p > 0.05 therefore variance is considered homogeneous -
can use ANOVA
#### comparison between grazed and ungrazed tiles #######
###AFDM Wilcoxon test ####
library(coin) # load package needed for U test
wilcox_test(AFDM ~ Graze, data = Algae, distribution = "exact") # p<0.05 (0.0004401) therefore we
can conclude there is a difference between
# grazed and ungrazed AFDM data
### CHLA t-test ######
t.test(CHLA ~ Graze, data = Algae) # p> 0.05 (0.6893) therefore the null hypothesis is accepted. The
mean CHLA value is not significantly different
# between grazed and ungrazed tiles...
###### one way ANOVA ##########
#Depth
CHLAgrazedaov <- aov(CHLA ~ Depth, data = Grazed) # anova of grazed tile CHLA against Depth
summary(CHLAgrazedaov) # P<0.05 (0.0393) significant relationship
TukeyHSD(CHLAgrazedaov) # post hoc test to identify the difference between means that are
greater than the standard error
op <- par(mfrow = c(2, 2))
Part A
plot(CHLAgrazedaov) # plot validation shows wedge shaped residuals
par(op)
leveneTest(Grazed$CHLA~Grazed$Depth) # p > 0.05 so we can consider variance homogeneous.
#PLant Volume
CHLAgrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Grazed) # anova of grazed tile CHLA against
plant volume
summary(CHLAgrazedaov2) # p > 0.05 (0.595) Not significant
#Flow
CHLAgrazedaov3 <- aov(CHLA ~ Flow, data = Grazed) # anova of grazed tile CHLA against Flow
summary(CHLAgrazedaov3) # p > 0.05 (0.607) Not significant
#Ungrazed CHLA with Depth
CHLAUngrazedaov <- aov(CHLA ~ Depth, data = Ungrazed) # anova of ungrazed tile CHLA against
Depth
summary(CHLAUngrazedaov) # P>0.05 (0.18) not significant
# PLant Volume
CHLAUngrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Ungrazed) # anova of ungrazed tile
CHLA against plant volume.
summary(CHLAUngrazedaov2) # p <0.05 (0.0437) significant relationship
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # plot validation shows that plant volume requires
transforming...
par(op)
CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Ungrazed) # run anova again with
transformed plant volume
summary(CHLAUngrazedaov2) # p <0.05 (0.0236) significant relationship.
improved following transformation
Part A
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # validaion plots are considerably better, apart from one
major outlier...
par(op)
Algaeno1 <- Ungrazed[(-1),] # remove influential point
CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Algaeno1) # run anova again
without influential point plant volume
summary(CHLAUngrazedaov2) # P<0.1 (0.0607)
op <- par(mfrow = c(2, 2))
plot(CHLAUngrazedaov2) # validaion plots look good.
par(op)
# Tukey test not applied as control variable is required to be a factor.
# Flow
CHLAUngrazedaov3 <- aov(CHLA ~ Flow, data = Ungrazed) # anova of ungrazed tile CHLA against
Flow
summary(CHLAUngrazedaov3) # p > 0.05 (0.382) not significant
### one way ANOVA summary - grazed CHLA is significantly related to Depth and ungrazed CHLA is
significantly related to plant Volume #####
####### Kruskal Wallace test for AFDM data ###### non parametric test is needed for AFDM
#Grazed AFDM with Depth
AFDM.graze.KW <- kruskal.test(AFDM ~ Depth, data = Grazed) # Kruskal-Wallace test (KW test) of
grazed AFDM against Depth
AFDM.graze.KW # P > 0.05 (0.3614) not significant
#PLant Volume
AFDM.graze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Grazed) # Kruskal-Wallace test
(KW test) of AFDM against Plant volume
AFDM.graze.KW2 # P > 0.05 (0.4579) not significant
Part A
#Flow
AFDM.graze.KW3 <- kruskal.test(AFDM ~ Flow, data = Grazed) # Kruskal-Wallace test (KW test) of
AFDM against Flow
AFDM.graze.KW3 # P > 0.05 (0.3954) not significant
#Ungrazed AFDM with Depth
AFDM.ungraze.KW <- kruskal.test(AFDM ~ Depth, data = Ungrazed) # Kruskal-Wallace test (KW
test) of Ungrazed AFDM against Depth
AFDM.ungraze.KW # P>0.05 (0.284) not significant
# PLant Volume
AFDM.ungraze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Ungrazed) # Kruskal-Wallace
test (KW test) of Ungrazed AFDM against plant volume
AFDM.ungraze.KW2 # P>0.05 (0.4579) not significant
# Flow
AFDM.ungraze.KW3 <- kruskal.test(AFDM ~ Flow, data = Ungrazed) # Kruskal-Wallace test (KW
test) of Ungrazed AFDM against Flow
AFDM.ungraze.KW3 # P>0.05 (0.3954) not significant
## KW test summary - no significant relationships found between grazed or ungrazed AFDM with
control variables ######
########## two-way anova with CHLA data ######
# two-way anova considering grazing and Depth
Algae.aov1 <- aov(CHLA ~ Graze + Depth, data = Algae)
summary(Algae.aov1) # p > 0.05 not significant
# two-way interaction considering grazing*Depth
Algae.aov2 <- aov(CHLA ~ Graze*Depth, data = Algae)
summary(Algae.aov2) # significant relationship (P<0.1). Graze:Depth (P= 0.0584)
op <- par(mfrow = c(2, 2))
Part A
plot(Algae.aov2) # Plot validation - Risiduals vs. fitted is wedge shaped, caused by a
few outliers...
par(op)
# two-way anova considering grazing and plant volume
Algae.aov3 <- aov(CHLA ~ Graze + Plant.Volume, data = Algae)
summary(Algae.aov3) # p > 0.05 not significant
# two-way interaction considering grazing*plant volume
Algae.aov4 <- aov(CHLA ~ Graze*Plant.Volume, data = Algae)
summary(Algae.aov4) # Graze:Plant.Volume 0.0796. Not as significant as with
depth
op <- par(mfrow = c(2, 2))
plot(Algae.aov4) # Plot validation - Risiduals vs. fitted is wedge shaped. Variance is
hetrogeneous - needs transforming.
par(op)
Algae.aov4 <- aov(CHLA ~ Graze*log10(Plant.Volume), data = Algae) # run with transformed
explanatory
summary(Algae.aov4) # interaction is no longer significant
#### linear models... we know which aov tests were significant - run the linear model for Ungrazed
CHLA with plant volume #####
CHLA.ungrazed.lm <- lm(CHLA ~ Log10.Plant, data = Algaeno1) # lm of grazed tile CHLA against
plant volume
summary(CHLA.ungrazed.lm ) # P<0.1 (0.0607) significant relationship
op <- par(mfrow = c(2, 2))
plot(CHLA.ungrazed.lm) # Plot validation same as anova - looks alright one major
outlier
par(op)
# plot model
plot(CHLA ~ Log10.Plant , data = Ungrazed, ylim=c(0,0.3), xlab = "log10 (plant volume)",
ylab = "Chlorophyll A",col = "blue", main = "Ungrazed Tiles") # plot CHLA for ungrazed
against Plant volume.
Part A
abline(CHLA.ungrazed.lm) # fit the linear model line.
text(5.5,0.25, "Chlorophyll A = -0.07508Plant.VOlume+ 0.02797", pos = 2)
text(5,0.23, expression(paste(P == 0.0607)), pos = 2)
text(5,0.21, expression(paste(R^2 == 0.1365)), pos = 2) # add equation P value and R2
x <- seq(min(Ungrazed$Log10.Plant), max(Ungrazed$Log10.Plant), l=1000) # #for each value of x,
calculate the upper and lower 95% confidence
y<-predict(CHLA.ungrazed.lm, data.frame(Log10.Plant=x), interval="c")
matlines(x,y, lty=3, col="black") #plot the upper and lower 95% confidence limits
plot(CHLA.ungrazed.lm$resid ~ log10(Algaeno1$Plant.Volume)) # residuals plot looks good . Model
can be accepted.
### linear model for CHLA grazed with depth ####
CHLA.Grazed.lm2 <- lm(CHLA ~ Depth, data = Grazed) # lm of grazed tile CHLA against plant
volume
summary(CHLA.Grazed.lm2 ) # significant P<0.05
plot(CHLA ~ Depth.Num, data = Grazed, col = "blue", xlab = "Depth (cm)", ylab = "Chlorophyll A",
main = "Grazed Tiles") # create plot
abline(coef(CHLA.Grazed.lm2)[1], coef(CHLA.Grazed.lm2)[2-3-4-5-6-7]) # plot linear model
text(35,0.13, "Chlorophyll A = 0.002667Depth + 0.445332", pos = 2)
text(35,0.12, expression(paste(P == 0.03929)), pos = 2)
text(35,0.11, expression(paste(R^2 == 0.383)), pos = 2) # add equation, R^2 and P value
#calculate slope#
0.036333+0.125333+0.086000+0.098000+0.023333+0.076333
op <- par(mfrow = c(2, 2))
plot(CHLA.Grazed.lm2) # Plot validation same as anova - looks alright. 3 outliers effect
the residuals
par(op)
Part A
plot(CHLA.Grazed.lm2$resid ~ Grazed$Depth.Num) # looks slightly wedge shaped, outliers mainly
causing this. model is accepted.
plot(CHLA.Grazed.lm2$resid ~ Grazed$Depth) # less wedge shaped but slightly humped
####### ANCOVA ###########
require(lattice)
# Here's the plot
xyplot(CHLA ~ Depth |Graze # create a plot of CHLA change with depth
and grazed/ungrazed.Sarkar, D. (2008) Lattice: Multivariate Data Visualisation. Springer.
,xlab = "Depth", ylab = "CHLA",
data = Algae)
plot(CHLA ~ Depth.Num, data = Algae, pch = 19,cex = 1.5,
col = c("Black", "Red")[Algae$Graze],
xlab = list("Depth", cex = 1.2),
ylab = list("Chlorophyll A", cex = 1.2))
# Now add a legend
legend(1, 0.25, legend = c("Grazed", "Ungrazed"), # creates similar to above but on the
same graph
col = c("black", "red"), pch = 19)
algae.lm <- lm(CHLA ~ Depth* Graze, data = Algae) # linear ancova for CHLA with graze and Depth
op <- par(mfrow = c(2,2))
plot(algae.lm)
par(op) # residuals are a bit wedge shaped as earlier with 2-way anova but mainly caused by
outliers so I'll continue
hist(algae.lm$resid) # residuals histogram
plot(algae.lm$resid) # residuals plot
plot(algae.lm$resid ~ Algae$Depth) # residuals here look very good!
Part A
anova(algae.lm)
summary(algae.lm) # looks good model has a significant p value (0.0386)
coplot(CHLA ~ Depth | Graze, panel = panel.smooth, data = Algae) # co plot as earlier to view
interaction
interaction.plot(Algae$Depth.Num, Algae$Graze, Algae$CHLA, col=c(2,3)) # interaction plot - similar
to gg plot earlier
###### create final final ancova plot #######
coef(algae.lm)
algae.lm$coeff[1] # Grazed intercept
algae.lm$coeff[2-3-4-5-6-7] # Grazedlope
algae.lm$coeff[1+8] # Ungrazed intercept
algae.lm$coeff[9-10-11-12-13-14] # Ungrazed slope # These are the coefficients needed to
form the lines with Depth as a factor
# plot
plot(CHLA ~ Depth.Num, data = Algae, pch = 19,cex = 1.5,
col = c("Black", "Red")[Algae$Graze],
xlab = list("Depth", cex = 1.2),
ylab = list("Chlorophyll A", cex = 1.2))
# Now add a legend
legend(1, 0.25, legend = c("Grazed", "Ungrazed"),
col = c("black", "red"), pch = 19)
# Now add the coefficients from above to the abline command to get both lines
abline(coef(algae.lm)[1], coef(algae.lm)[2-3-4-5-6-7]) # This adds the Grazed regression line
# Now add the ungrazed regression line
abline(coef(algae.lm)[1] + coef(algae.lm)[8], # Ungrazed intercept
coef(algae.lm)[9-10-11-12-13-14], # Ungrazed slope
col = "red") # This adds the ungrazed line
text(35,0.18, "Chlorophyll A*Graze = -0.0023 + 0.3629 * Depth", pos = 2)
text(35,0.16, expression(paste(P == 0.03867)), pos = 2)
text(35,0.14, expression(paste(R^2 == 0.2768)), pos = 2)
Part A
#calculate Grazed slope#
0.036333+0.125333+0.086000+0.098000+0.023333+0.076333 # 0.445332
#calculate Ungrazed slope
0.019667-0.069333-0.058333-0.051000+0.095333-0.018667 # - 0.082333
# calculate intercept
0.0027-0.005 # -0.0023
# calculate intercept
0.445332- 0.082333 # 0.362999 therefore CHLOROPHYLLAgrazetype = -0.0023+0.3629*Depth
##### multiple linear regression ######
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + CHLA
, data = Algae, diag = "boxplot")
#Check for multi colinearity by correlation
cor(Algae[,2:3]) # check for correlation between flow and plant volume.
?cor # big issues
# and again with vifs. No values below 3.
vif(lm(CHLA ~ Depth + log10(Flow) + log10(Plant.Volume), data = Algae ))
# colinearity prevents conducting multiple linear regression.
##### non linear models ###########
# following plotting of the linear model of CHLA against plant volume, it looked as if a non
linearpattern may be present, here is the model...
algae.lm1 <- lm(CHLA ~ I(Log10.Plant^2), data = Algaeno1)
summary(algae.lm1) # significant p value (0.076999). However not as
significant as the linear model
op <- par(mfrow = c(2,2))
plot(algae.lm1)
par(op) # apart from outliers, validation plots look good. Model can be accepted
Part A
plot(CHLA ~ Log10.Plant , data = Ungrazed, ylim=c(0,0.3), xlab = "log10 (plant volume)",
ylab = "Chlorophyll A",col = "blue") # plot CHLA for ungrazed against Plant volume. Not able
to apply best fit line due to log10 x axis.
x <- seq(min(Algae$Log10.Plant), max(Algae$Log10.Plant), l=1000) # 1000 steps....
y <- predict(algae.lm1, data.frame(Log10.Plant = x), interval = "c") # plots matlines of ploy2
relationship on graph
matlines(x, y, lty = 1, col = 1)
text(5.5,0.25, "Chlorophyll A = --0.007340*log10(Plant.VOlume)+ 0.002791", pos = 2)
text(5,0.23, expression(paste(P == 0.02509)), pos = 2)
text(5,0.21, expression(paste(R^2 == 0.1972)), pos = 2) # plot equation, P value and R2.
# the model is reasonable, however the linear model provides a higher P value and R^2 value. Accept
the linear model.
#### Generalised addative models for AFDM data ####### GAMS have been chosen over
generalised linear models due to non-linear patterns
depth.log10flow <- lm( log10(Flow) ~ Depth , data = Algae)
summary(depth.log10flow) # linear model of depth and log10 flow to show they
are highly related. log10 flow is used
# for GAMS because Depth is a factor and has fewer unique
covariate combinations than specified
# maximum degrees of freedom. P = 2.2e-16 and R^2 = 0.978
scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001)
, data = Algae, diag = "boxplot") # inspect variables and transform
AFDM
require(mgcv) # for GAMS
# use Log10.Flow because plotting GAMS doesn't allow transformations during the plot creation
Part A
plot(log10(AFDM+0.001) ~ Log10.Flow, data= Grazed) # plot to observe possible patterns of
grazed AFDM with flow. possible non-linear pattern
AFDM.gam1 <- gam(log10(AFDM+0.001)~ s(Log10.Flow), data = Grazed) # run GAM of AFDM
against flow for grazed tiles
plot(AFDM.gam1) # looks as if there is a decrease in AFDM with flow.
summary(AFDM.gam1) # significant smoother term (p = 0.00172)
gam.check(AFDM.gam1) # verification plots are mostly good with the
exception of a few major outliers
# plot GAM1
plot(log10(AFDM+0.001) ~ Log10.Flow, data = Grazed,
pch = 16,
xlab = "Log10 Flow",
ylab = "Ash Free Dry Mass", main = "GAM of Grazed AFDM ")
# predict across the data
x <- seq(min(Grazed$Log10.Flow), max(Grazed$Log10.Flow), l=100) # 100 steps....
y <- predict(AFDM.gam1, data.frame(Log10.Flow = x), se = TRUE) # using standard errors se =
TRUE
# add lines
lines(x, y$fit) # plots the fitted values
lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit
lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit
# plot indicates a decrease in AFDM with flow
### Ungrazed AFDM
plot(log10(AFDM+0.001) ~ Log10.Flow, data = Ungrazed) # doesn't look as if there is a significant
response of AFDM to flow
AFDM.gam2 <- gam(log10(AFDM+0.001) ~ s(Log10.Flow), data = Ungrazed) # run GAM of AFDM
against flow for ungrazed tiles
plot(AFDM.gam2) # plot GAM - very clearly no pattern...
summary(AFDM.gam2) # # smoother term not significant P>0.1
gam.check(AFDM.gam2) # humped response vs. fitted is far from linear.
further plotting not needed.
Part A
###### All models so far produced for CHLA indicate a high spread of results with low R^2 values.
GAMS are used to investigate
###### whether non-linear trends have been overlooked during the previous analysis.
### Grazed CHLA
plot(CHLA ~ Log10.Flow, data = Grazed) # check for non linear patterns using a GAM
AFDM.gam3 <- gam(CHLA ~ s(Log10.Flow), data = Grazed) # run GAM of AFDM against flow for
grazed tiles
plot(AFDM.gam3) # indicates a non linear pattern
summary(AFDM.gam3) # smoother term only significant to p<0.1 (0.0964)
gam.check(AFDM.gam3) # a bit wedge shaped
# plot Gam3
plot(CHLA ~ Log10.Flow, data = Grazed,
pch = 16,
xlab = "Log10 Flow",
ylab = "Ash Free Dry Mass")
# predict across the data
x <- seq(min(Grazed$Log10.Flow), max(Grazed$Log10.Flow), l=100) # 100 steps....
y <- predict(AFDM.gam3, data.frame(Log10.Flow = x), se = TRUE) # using standard errors se =
TRUE
# add lines
lines(x, y$fit) # plots the fitted values
lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit
lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit
## Ungrazed CHLA
plot(CHLA ~ Log10.Flow, data = Ungrazed) # patter looks like there may be non linear pattern -
check for non linear patterns using a GAM
AFDM.gam4 <- gam(CHLA ~ s(Log10.Flow), data = Ungrazed) # run GAM of AFDM against flow for
grazed tiles
plot(AFDM.gam4) # very clear non linear pattern with narrow confidence limits
summary(AFDM.gam4) # highly significant smoother term (p = 1.21e-05)
Part A
gam.check(AFDM.gam4) # residuals are slightly humped but generally look good.
# plot GAM 4
plot(CHLA ~ Log10.Flow, data = Ungrazed,
pch = 16,
xlab = "Log10 Flow",
ylab = "Ash Free Dry Mass")
# predict across the data
x <- seq(min(Ungrazed$Log10.Flow), max(Ungrazed$Log10.Flow), l=100) # 100 steps....
y <- predict(AFDM.gam4, data.frame(Log10.Flow = x), se = TRUE) # using standard errors se =
TRUE
# add lines
lines(x, y$fit) # plots the fitted values
lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit
lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit
### note on GAM 4 - major outlier caused significant pattern change in GAM
Part A
Appendix 2 - Dataset:
Depth Flow Plant.Volume AFDM CHLA Graze
25 46.56 1201786 0.0234 0.246 Ungrazed
10 20.34 72028.57 0.0084 0.058 Ungrazed
7 8.76 31196.43 0 0.017 Ungrazed
7 9.96 41871.43 0.0078 0.057 Ungrazed
2 2.41 15121.43 0.0108 0.076 Ungrazed
35 209.1 1171936 0 0.037 Ungrazed
2 3.21 20967.86 0.3114 0.05 Ungrazed
0 0.075 2650 0.006 0.012 Ungrazed
10 23.4 50732.14 0.009 0.067 Ungrazed
35 130.5 1424357 0.0294 0.045 Ungrazed
25 150 1090200 0.0114 0.054 Ungrazed
25 165 1276714 0.0678 0.079 Ungrazed
5 2.1 13485.71 0.027 0.116 Ungrazed
35 223.5 1319379 0.024 0.114 Ungrazed
0 0.075 2235.714 0.0066 0.007 Ungrazed
2 3.42 25385.71 0.0276 0.065 Ungrazed
10 9.9 55778.57 0.0108 0.039 Ungrazed
5 5.22 31492.86 0.042 0.03 Ungrazed
7 8.4 34646.43 0.0384 0.032 Ungrazed
5 5.02 32964.29 0.0216 0.045 Ungrazed
0 0.1 4878.571 0.0132 0.004 Ungrazed
25 46.56 1201786 0.0282 0.056 Grazed
10 20.34 72028.57 0.021 0.075 Grazed
7 8.76 31196.43 0.021 0.005 Grazed
7 9.96 41871.43 0.0132 0.108 Grazed
2 2.41 15121.43 0.0324 0.034 Grazed
35 209.1 1171936 0.0204 0.077 Grazed
2 3.21 20967.86 0.033 0.063 Grazed
0 0.075 2650 6.7008 0.003 Grazed
10 23.4 50732.14 0.0396 0.065 Grazed
35 130.5 1424357 0.0222 0.097 Grazed
25 150 1090200 0.0294 0 Grazed
25 165 1276714 0.0498 0.022 Grazed
5 2.1 13485.71 0.0432 0.173 Grazed
35 223.5 1319379 0.0546 0.063 Grazed
0 0.075 2235.714 3.0228 0.004 Grazed
2 3.42 25385.71 0.1176 0.02 Grazed
10 9.9 55778.57 0.0456 0.162 Grazed
5 5.22 31492.86 0.048 0.151 Grazed
7 8.4 34646.43 0.0804 0.153 Grazed
5 5.02 32964.29 0.4884 0.06 Grazed
0 0.1 4878.571 0.0384 0.001 Grazed
Part B
Modelling the stage of a river in a glacial catchment
(All relevant images, Figures 1-5, are attached at the end of the document)
# Intoduction: This model attempts to predict the stage of a river in a hypothetical glacial catchment.
The inputs of two different glaciers
# of different sizes are considered, along with the input of snow melt and rain. The model consists of
5 water stores: i. Glacier 1,
# ii. Glacier 2, iii. Snow pack, iv. catchment storage and v. River storage. The model predicts the daily
average stage of the river at a
# particular location for 364 days. The data set used is not based on a real location and assumes a
start date of January 2nd and an end date
# of December 31st.
# clear memory
rm(list=ls())
# set working directory
Climate <- read.csv(file.choose()) # load file
Climate$temp0 <- ifelse(Climate$Temperature < 0, 0, Climate$Temperature) # changes negative
temperatures to 0^C. Ice melt is assumed takes place
# at 1^C and above (Pelliciotti, et al. 2005). Negative temperatures will affect the equations for snow
and glacial melt, therefore all negative temperatures are set
# to zero, and as a result no melt takes place.
# define the model parameters and the inital conditions
t <- 0 # Time at the start of the simulations
k1 <- 7 # Model parameter for glacier 1 melt - describes the contribution of the glacier to river
discharge. A function of glacier size and melt rate
Part B
k2 <- 2 # Model parameter for small glacier melt - describes the contribution of glacier 2 to river
discharge
k3 <- 0.9 # model parameter for river flow - This constant controls the rate of flow from the river.
The constant is a reflection of the river's
# cross sectional area. A smaller cross section (due to a constriction) will lead to reduced
discharge (Q=Av). When K3 = 1, it is assumed that
# the rate of inflow is equal to the rate of outflow.
k4 <- 20 # model parameter for rainfall and snow melt contribution to river discharge. Reflects the
size of the catchment
k5 <- 5 # model parameter for snow melt. Describes the rate of snow melt
k7 <- 0.1 # decribes the storage time of water within the catchment (the flashiness of the
catchment). Where a value of 1 means that all water in the catchment
# reaches the river immediately and 0 means that no water reaches the river.
S1 <- 50000 # Inital storage of Glacier 1 (large value that will not be significantly diminished over
a year)
S2 <- 30000 # Inital storage of Glacier 2
S3 <- 10 # Inital river stage below the confluence of the two glacial streams
S4 <- 200 # initial snow cover
S5 <- 5 # intial rainwater storage in catchment
ntimesteps=365 # model runs for a year
timestep = 1 #timestep is 1 day
output <- mat.or.vec(ntimesteps,10) # creation of Output vector. 10 columns are required due for
the input of values following the model run.
# Run model
for(n in 1:ntimesteps){ # set loop to run model for each time step
dS1 <- k1*Climate[n,5]*timestep # Calculate the flow out of the Glacier 1 (K* Temp * t). Clacial
discharge is assumed to be linearly related to temperature.
# (Pelliciotti, et al. 2005)
dS2 <- k2*Climate[n,5]*timestep # Calculate the flow out of the Glacier 2 (K* Temp * t)
dS3 <- k3*S3*timestep # Calculate the flow through the river (K* water level * t)
Part B
dS4 <- (k5*Climate[n,5]*timestep) # calculate flow from snow melt (K* Temp * t). snowmelt rate is
assumed to have a linear relationship with temperature (Hock, 2005)
dS5 <- k7*S5*timestep # calculate the runoff from catchment.
if (S4<dS4){ #if the snow melt discharge is greater than the depth of the snow pack then no melt
takes place. Ensures that snow melt does not
dS4 = S4 # occur when there is no snow present.
}
S1 <- S1-dS1 # Volume of ice in Glacier 1
S2 <- S2-dS2 # Volume of ice in Glacier 2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3 # The stage of the river below the confluence of the glacial
streams. It includes the contribution
# of discharge from the glaciers, snow melt, rain and the loss of water due to river discharge.
S4 <- (S4+Climate[n,4])-dS4 # describes the average depth of the snow pack within the catchment.
S5 <- (S5+Climate[n,3]+dS4)- dS5 # decribes the average storage of rain water and snow melt in the
catchment
t <- t+timestep # New time step
output[n,1] <- t # Data is output into the matrix that was created earlier.
output[n,2] <- S1
output[n,3] <- S2
output[n,4] <- S3
Part B
output[n,5] <- S4
}
output[,6] <- append(output[,2], 0, 0)[-nrow(output)]
output[,7] <- as.matrix(output[,6]-output[,2]) # creates daily average discharge from Glacier 1.
storage - the storage from the previous day
output[,8] <- append(output[,3], 0, 0)[-nrow(output)]
output[,9] <- as.matrix(output[,8]-output[,3]) # creates daily average discharge from Glacier 2.
storage - the storage from the previous day
output2 <- output[-1,] # removes january 1st value due to inability calculate glacier discharge
without the previous day's storage.
#creates new output matrix.
climate2 <- Climate[-1,] # as a result January 1st values must also be removed from Climate data.
#-----------------
# PLot the Model
#-----------------
x <- output2[,1] # define x, y and secondary y axis
y1 <- output2[,4]
y2 <- climate2[,3]
par(mar=c(5,4,4,5)+.1) # allows for the labelling of the secondary axis
Part B
plot(x,y1,type="l",col="blue", xlab="Time (days) ",ylab= "River stage (cm), Glacial Discharge (
m^3/s^-1) and Snowpack depth (cm)",
cex.lab = 0.8, ylim= c(-87,400), main = "Glacial Catchment Model") # plots stage / time
points(x = (output2[,1]), y = output2[,7],xlab = "", ylab="", col="chartreuse",type="l") # adds
glacier 1 discharge
points(x = (output2[,1]), y = output2[,9],xlab = "", ylab="", col="chartreuse4",type="l") # adds
glacier 2 dscharge
points(x = (output2[,1]), y = output2[,5],xlab = "", ylab="", col="gold",type="l") # adds depth of
snow pack
par(new=TRUE)
g_range <- range(0, climate2[,2:3]) # define range of secondary axis
plot(x, y2,type="l",col="black",xaxt="n",yaxt="n",xlab="",ylab="", ylim=g_range ) # plot rainfall
data
axis(4)
points(x = (output2[,1]), y = climate2[,2],xlab = "", ylab="", col="red",type="l") # plot temperature
data
mtext("Rainfall (mm) and Temperature (c)",side=4,line=3)
legend("topleft",cex=0.8,col=c("blue","chartreuse","chartreuse4","gold","black","red"),lty=1, #
add legend
legend=c("River Stage","Glacier1 Q", "Glacier2 Q", "Snowpack depth","rainfall","temperature"))
# See Figure 1
#----------------------------------------------------------------------------------------------
# Scenario testing
#----------------------------------------------------------------------------------------------
# In order to test the sensitivity of the model, three different values for each parameter will be
tested in order
# to determine the significance of different factors in influencing the outcome of the model.
Part B
#----------------------------------------------------
#-- GLACIAL MELT volume --
# ---------------------------------------------------
# Parameters K1 and K2 control the amount of discharge from glaciers 1 and 2. They therefore
reflect the size of the glacial toe
# and the sensitivity to melting. This could be affected by topography, subglacial characteristics and
englacial dynamics.
# Melt rates can be assumed to be linearly related to temperature, however, the rates between
glaciers may vary (Pelliciotti, et al. 2005).
#1) K1 = 2, K2 = 1
# define the model parameters and the inital conditions
t <- 0
k1 <- 2
k2 <- 1
k3 <- 0.9
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output3 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
Part B
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output3[n,1] <- t # Now lets output the data from this time step
output3[n,2] <- S1
output3[n,3] <- S2
output3[n,4] <- S3
output3[n,5] <- S4
}
#2) K1 = 4, K2 = 2.5
# define the model parameters and the inital conditions
t <- 0
Part B
k1 <- 4
k2 <- 2.5
k3 <- 0.9
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output4 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
Part B
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output4[n,1] <- t # Now lets output the data from this time step
output4[n,2] <- S1
output4[n,3] <- S2
output4[n,4] <- S3
output4[n,5] <- S4
}
#3) K1 = 7, K2 = 4
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 4
k3 <- 0.9
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
Part B
output5 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output5[n,1] <- t # Now lets output the data from this time step
output5[n,2] <- S1
output5[n,3] <- S2
output5[n,4] <- S3
output5[n,5] <- S4
}
plot(output3[,1],output3[,4],type="l",col="deepskyblue", xlab="Time (days) ",ylab="River Stage (cm)
", ylim= c(0,420),
Part B
main = "Scenario testing - Glacial melt rate parameter (K1 and K2)")
# plots first parameter scenario (K1 = 2, k2 = 1)
points(x = (output4[,1]), y = output4[,4],xlab = "", ylab="", col="blueviolet",type="l")
# adds second parameter scenario (K1 = 4, k2 = 2.5)
points(x = (output5[,1]), y = output5[,4],xlab = "", ylab="", col="deeppink",type="l")
# adds second parameter scenario (K1 = 6, k2 = 3)
legend("topleft",cex=0.8,col=c("deepskyblue","blueviolet","deeppink"),lty=1, # add legend
legend=c("K1=2, K2=1","K1=4, K2=2.5", "K1=6, K2=3"))
# Summary:
# increases in volume are linearly related to each other. Increases in glacial melt lead to increased
river stage. If necessary, a
# lag function could be applied to the model depending on the time taken for glacial melt to reach
the river system. This would be
# dependent on subglacial lakes or the positioning of moraines. However, with a time step of one
day, it is considered that this lag
# would have a limited impact on the model.
# See Figure 2
#----------------------------------------------------------------------------
# River Flow Parameter
#----------------------------------------------------------------------------
# The river flow parameter (K3) determines the rate of flow through the river. THis parameter will be
a function of cross sectional
# area and velocity (slope). Where K3 = 1 discharge of inputs is considered to be equal to the river
disharge.
#1) K3=0.3
# define the model parameters and the inital conditions
Part B
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.3
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output6 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
Part B
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output6[n,1] <- t # Now lets output the data from this time step
output6[n,2] <- S1
output6[n,3] <- S2
output6[n,4] <- S3
output6[n,5] <- S4
}
#2) K3=0.6
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.6
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
Part B
nomelt = 1
output7 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output7[n,1] <- t # Now lets output the data from this time step
output7[n,2] <- S1
output7[n,3] <- S2
output7[n,4] <- S3
output7[n,5] <- S4
}
Part B
#3) K3= 1
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 1
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output8 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
Part B
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output8[n,1] <- t # Now lets output the data from this time step
output8[n,2] <- S1
output8[n,3] <- S2
output8[n,4] <- S3
output8[n,5] <- S4
}
plot(output6[,1],output6[,4],type="l",col="darkturquoise", xlab="Time (days) ",ylab="Dishcarge (m3
s-1) ", ylim= c(0,1000),
main = "Scenario testing - River flow parameter (River cross section) (K3) ")
# plots first parameter scenario (K3=0.3)
points(x = (output7[,1]), y = output7[,4],xlab = "", ylab="", col="dodgerblue4",type="l")
# adds second parameter scenario (K3=0.6)
points(x = (output8[,1]), y = output8[,4],xlab = "", ylab="", col="aquamarine3",type="l")
# adds second parameter scenario (K3=1)
legend("topleft",cex=0.8,col=c("darkturquoise","dodgerblue4","aquamarine3"),lty=1, # add
legend
legend=c("K3=0.3","K3=0.6", "K3=1"))
Part B
# Summary:
# Changes in K3 lead to a proportional change in the river stage. The function allows the model to be
adjusted for different types of
# river or lake stores. small changes in K3 can have significant impacts on the stage of the river.
# see Figure 3
#----------------------------------------------------------------------------
# Rain and snow melt contribution Parameter (Catchment size)
#----------------------------------------------------------------------------
# This parameter (K4) scales the average rainfall and snowmelt across the catchment to a value
which reflects the runoff contribution
# to river discharge. An increase in K4 is expected to increase the magnitude of the response to
catchment runoff.
#1) K4=1
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 1
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
Part B
nomelt = 1
output9 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output9[n,1] <- t # Now lets output the data from this time step
output9[n,2] <- S1
output9[n,3] <- S2
output9[n,4] <- S3
output9[n,5] <- S4
}
Part B
#2) K4=20
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output10 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
Part B
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output10[n,1] <- t # Now lets output the data from this time step
output10[n,2] <- S1
output10[n,3] <- S2
output10[n,4] <- S3
output10[n,5] <- S4
}
#3) K4=50
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 50
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
Part B
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output11 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output11[n,1] <- t # Now lets output the data from this time step
output11[n,2] <- S1
output11[n,3] <- S2
output11[n,4] <- S3
output11[n,5] <- S4
Part B
}
plot(output9[,1],output9[,4],type="l",col="forestgreen", xlab="Time (days) ",ylab="Stage (cm) ",
ylim= c(0,700),
main = "Scenario testing - catchment size parameter (K4) ")
# plots first parameter scenario (K4=1)
points(x = (output10[,1]), y = output10[,4],xlab = "", ylab="", col="chartreuse2",type="l")
# adds second parameter scenario (K4=20)
points(x = (output11[,1]), y = output11[,4],xlab = "", ylab="", col="cyan4",type="l")
# adds second parameter scenario (K4=50)
legend("topleft",cex=0.8,col=c("forestgreen","chartreuse2","cyan4"),lty=1, # add legend
legend=c("K4=1","K4=20", "K4=50"))
#Summary:
# The plot indicates that increasing K4 increases the stage of the river, particularly during those
periods where river flow is dominated
# by catchment runoff, rather than glacial melt.
# See Figure 4
#------------------------------------------------------------------------------------------------------
# Snow melt rate parameter
#------------------------------------------------------------------------------------------------------
# This parameter (K5) dictates the rate that snow pack decreases in depth. The melting of snow is
linearly related to
#temperature (Hock, 2005), this is reflected in the snow melt equation included in the model. The
rate of snow melt may also
# be influenced by cloud cover and topography. However, this has been simplified for the purposes
of this model.
Part B
#1) K5=1
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 20
k5 <- 1
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output12 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
Part B
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output12[n,1] <- t # Now lets output the data from this time step
output12[n,2] <- S1
output12[n,3] <- S2
output12[n,4] <- S3
output12[n,5] <- S4
}
#2) K5=5
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 20
k5 <- 5
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
Part B
ntimesteps=365
timestep = 1
nomelt = 1
output13 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output13[n,1] <- t # Now lets output the data from this time step
output13[n,2] <- S1
output13[n,3] <- S2
output13[n,4] <- S3
output13[n,5] <- S4
Part B
}
#3) K5=10
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 20
k5 <- 10
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output14 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
Part B
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output14[n,1] <- t # Now lets output the data from this time step
output14[n,2] <- S1
output14[n,3] <- S2
output14[n,4] <- S3
output14[n,5] <- S4
}
plot(output12[,1],output12[,4],type="l",col="darkslategray", xlab="Time (days) ",ylab="Stage (cm) ",
ylim= c(0,500),
main = "Scenario testing - Snow Melt Rate (K5) ")
# plots first parameter scenario (K5=1)
points(x = (output13[,1]), y = output13[,4],xlab = "", ylab="", col="firebrick1",type="l")
# adds second parameter scenario (K5=5)
points(x = (output14[,1]), y = output14[,4],xlab = "", ylab="", col="deepskyblue3",type="l")
# adds second parameter scenario (K5=10)
legend("topleft",cex=0.8,col=c("darkslategray","firebrick1","deepskyblue3"),lty=1, # add legend
Part B
legend=c("K5=1","K5=5", "K5=10"))
#Summary:
#The rate of snow melt has a significant impact on the timing of periods of high flow. The Where the
lag is highest (K5 = 1)
# snowmelt coincides with glacial melt and as a result yields the highest stage. Where K5=10, snow
melt occurs very early in the year and
# the snow pack is diminished well before significant glacial melt takes place.
# See Figure 5
#------------------------------------------------------------------------------------------------------
# run off rate parameter for rain and snow melt
#------------------------------------------------------------------------------------------------------
# This parameter (K7) describes the flahiness of the catchment. This factor will be dependent on the
geology, topogrpahy and
# vegetation cover of the catchment. Changes in K7 will affect the response rate of river stage to
runoff from both snow melt and rain.
#1) K7=1
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 20
k5 <- 5
k7 <- 0.001
S1 <- 50000
S2 <- 30000
S3 <- 10
Part B
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output15 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output15[n,1] <- t # Now lets output the data from this time step
output15[n,2] <- S1
output15[n,3] <- S2
output15[n,4] <- S3
Part B
output15[n,5] <- S4
}
#2) K5=5
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 20
k5 <- 5
k7 <- 0.01
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output16 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
Part B
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output16[n,1] <- t # Now lets output the data from this time step
output16[n,2] <- S1
output16[n,3] <- S2
output16[n,4] <- S3
output16[n,5] <- S4
}
#3) K5=10
# define the model parameters and the inital conditions
t <- 0
k1 <- 7
k2 <- 2
k3 <- 0.9
k4 <- 20
k5 <- 5
Part B
k7 <- 0.1
S1 <- 50000
S2 <- 30000
S3 <- 10
S4 <- 200
S5 <- 5
ntimesteps=365
timestep = 1
nomelt = 1
output17 <- mat.or.vec(ntimesteps,10)
for(n in 1:ntimesteps){
dS1 <- k1*Climate[n,5]*timestep
dS2 <- k2*Climate[n,5]*timestep
dS3 <- k3*S3*timestep
dS4 <- (k5*Climate[n,5]*timestep)
dS5 <- k7*S5*timestep
if (S4<dS4){
dS4 = S4
}
S1 <- S1-dS1
S2 <- S2-dS2
S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3
S4 <- (S4+Climate[n,4])-dS4
S5 <- (S5+Climate[n,3]+dS4)- dS5
t <- t+timestep
output17[n,1] <- t # Now lets output the data from this time step
Part B
output17[n,2] <- S1
output17[n,3] <- S2
output17[n,4] <- S3
output17[n,5] <- S4
}
plot(output15[,1],output15[,4],type="l",col="darkslategray", xlab="Time (days) ",ylab="Stage (cm) ",
ylim= c(0,400),
main = "Scenario testing - Run off rate (K7) ")
grid (NULL,NULL, lty = 7, col = "cornsilk2")
# plots first parameter scenario (K5=1)
points(x = (output16[,1]), y = output16[,4],xlab = "", ylab="", col="firebrick1",type="l")
# adds second parameter scenario (K5=5)
points(x = (output17[,1]), y = output17[,4],xlab = "", ylab="", col="deepskyblue3",type="l")
# adds second parameter scenario (K5=10)
legend("topleft",cex=0.8,col=c("darkslategray","firebrick1","deepskyblue3"),lty=1, # add legend
legend=c("K7=0.001","K7=0.01", "K7=0.1"))
#Summary:
# High values of K7 (0.1) result in more rapid responses to snow melt and rain events. Lower K7
values lead to more gradual release of
# water stored in the catchment and the drop in river stage following a runoff event is considerably
slower.
# See Figure 6
Part B
#-----------------------------------------------------------------------------------------------------
# Conclusion
#-----------------------------------------------------------------------------------------------------
# Despite significant simpliifaction of many catchment processes, the model produces a realistic
prediction of the response of river
# stage to contributing water sources in a glacial catchment.
# References:
#
# R Core Team (2013). R: A language and environment for statistical computing. R Foundation for
Statistical Computing, Vienna, Austria.
# URL http://www.R-project.org/.
#
#Hock, R. (2005). Glacier melt: a review of processes and their modelling. Progress in physical
geography, 29(3), 362-391.
#
# Pellicciotti, F., Brock, B., Strasser, U., Burlando, P., Funk, M., & Corripio, J. (2005). An enhanced
temperature-index glacier melt model
# including the shortwave radiation balance: Development and testing for Haut Glacier d'Arolla,
Switzerland. Journal of Glaciology, 51(175),
# 573-587.
Part B
Figure 1: Complete model
Part B
Figure 2: Scenario test investigating the impact of changing glacial melt parameters
Part B
Figure 3: Scenario test investigating the impact of changing the river flow parameter
Part B
Figure 4: Scenario test investigating the impact of changing the catchment size parameter
Part B
Figure 5: Scenario test investigating the impact of changing the catchment snow melt rate parameter
Part B
Figure 6: Scenario test investigating the impact of changing the catchment runoff rate parameter
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。