代做Data Collection、代写R编程设计、代做Java/Python语言-代写Python编程

联系方式

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-21:00
微信：codinghelp

您当前位置：首页 >> Python编程Python编程

代做Data Collection、代写R编程设计、代做Java/Python语言

日期：2019-05-20 11:21

Part A

A statistical analysis of the impact of drought on algal growth in experimental

mesocosm channels.

Introduction and Data Collection:

Algal biofilms play an important trophic role in riverine systems (Ledger and Hildrew, 1998).

The composition and biomass of biofilms is dependent on light availability, nutrient supply

and grazing pressure (Caramujo, et al. 2008). Therefore, Biofilms are at risk from the direct

impacts of drought and top down effects from grazers.

The DriStream experimental facility in Fobdown, Hampshire, consists of 21 channels

supplied with water from a borehole. The channels are 10m long and 50cm wide, with an

equal amount and size of substrate distributed in a pool riffle sequence that is consistent

across all channels. Channels were colonised with matching macroinvertebrate and

macrophyte populations. The macroinvertebrate assemblage is based on the nearby chalk

stream river, where the invertebrates were originally collected from. Once populations had

stabilised, channel depths were altered. In total there are three replicates of seven different

depths (0, 2, 5, 7, 10, 25 and 35 cm).

Biofilms were grown on terracotta tiles with an area of 100 cm2

. The Biofilm was

washed off using a toothbrush and distilled water before being stored for laboratory analysis.

Four measurements of both grazed and ungrazed tiles were taken from each channel.

Grazed tiles were placed on the substrate in the pool sections and ungrazed tiles were

located alongside, suspended in the water column in order to minimise the impact of grazing

benthic invertebrates. Samples were brought back to the laboratory where the Ash Free Dry

Mass (AFDM) and the mass of Chlorophyll A (CHLA) were tested. The data used in this

analysis was collected in November, 2014.

Data Preparation:

The response variables consisted of: CHLA and AFDM. Explanatory variables included:

Depth, Grazed/Ungrazed, Flow and Plant Volume. Average response values for grazed and

ungrazed tiles were used for each channel in order to reduce the impact of variability in the

explanatory variables (depth). The data was organised in a long format (Appendix 2).Rstudio

was used to conduct all statistical analyses (R Core Team, 2013). The following

commands were used to apply necessary attributes to the data in order to allow the

undertaking of particular techniques:

Algae <- read.csv(file.choose()) # Load data file

Part A

str (Algae) # check structure

Algae$Depth.Num <- Algae$Depth # creates an extra Depth column which will remain as

an integer

Algae$Depth <- as.factor(Algae$Depth) # convert depth from Int. to Fac.

Algae["Log10.Flow"] <- NA # creates the new column named "log10.Flow" filled with "NA"

Algae$Log10.Flow <- log10(Algae$Flow) # creates log10 flow for GAM analyses later

Algae["Log10.plant"] <- NA # creates the new column named "log10.Plant" filled with "NA"

Algae$Log10.Plant <- log10(Algae$Plant.Volume) # adds log 10 data for use with matlines

The following code was then used to subset grazed and ungrazed data so that the

relationships within each treatment may be investigated.

Ungrazed <- subset(Algae, Graze == "Ungrazed") # Ungrazed subset

Grazed <- subset(Algae, Graze == "Grazed") # Grazed subset

Data Exploration

Boxplots

A number of box plots were created to compare AFDM and CHLA with depth in order to

determine the nature of any response. Boxplots are simple but are a highly robust way of

exploring data because the inter quartile range (IQR) helps to limit the influence of outliers.

This is particularly useful for environmental data where distributions are commonly nonlinear

(Massart, et al. 2005).

1) Comparison of grazed and Ungrazed AFDM

op <- par(mfrow = c(1, 2))

boxplot(AFDM ~ Graze, data = Algae, ylab = "Ash Free Dry Mass") # hard to interpret

due to distribution needs transforming...

boxplot(log10(AFDM+0.001) ~ Graze, data = Algae, ylab = "log10 Ash Free Dry Mass") #

inspect transformed Ash Free Dry Mass (AFDM) difference between grazed and ungrazed

tiles

par(op)

Part A

Figure 1: A comparison of AFDM from grazed and ungrazed tiles. Left – before response (AFDM) was

transformed, Right – following log10+ 0.001 transformation.

Due to the distribution of the AFDM data set, a comparison was only possible following the

transformation of AFDM. There is a difference between grazed and ungrazed AFDM, which

is investigated further in this report. Log10 + 0.001 was used to transform AFDM as there

are many zero values in the data set (Zurr, et al. 2010).

2) Change in AFDM with depth

op <- par(mfrow = c(1, 2))

boxplot(log10(AFDM+0.001) ~ Depth, data = Ungrazed, ylab = "Ash Free Dry Mass", xlab =

"Depth (cm)", main = "Ungrazed") # inspect difference in AFDM with Depth for Ungrazed

boxplot(log10(AFDM+0.001) ~ Depth, data = Grazed, ylab = "Ash Free Dry Mass", xlab =

"Depth (cm)", main = "Grazed") # inspect difference in AFDM with Depth for Grazed

par(op)

Part A

Figure 2: change in AFDM with depth for both grazed and ungrazed tiles.

Changes in grazed and ungrazed tiles differ considerably. Neither patterns appear to be

linear.

3) A comparison of CHLA with grazed and ungrazed tiles.

boxplot(CHLA ~ Graze, data = Algae, ylab = "Chlorophyll A") # inspect Chlorophyll A

(CHLA) difference between grazed and ungrazed

Figure 3: CHLA for grazed and ungrazed tiles

Part A

Figure 3 shows that there is a greater spread for grazed CHLA than ungrazed. However,

there is not a significant difference between the means.

4) Change in CHLA with depth

op <- par(mfrow = c(1, 2))

boxplot(CHLA ~ Depth, data = Ungrazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =

"Ungrazed") # inspect difference in CHLA with Depth for Ungrazed

boxplot(CHLA ~ Depth, data = Grazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =

"Grazed") # inspect difference in CHLA with Depth for Grazed

par(op)

Figure 4: change in CHLA with depth for grazed and ungrazed tiles

Figure 4 highlights that there is a general increase in CHLA, for both grazed and ungrazed

tiles, with depth.

CoPlots

Based on the evidence from the boxplots it appeared that there was a difference between

grazed and ungrazed responses to Depth. Therefore coplots were created in order to

visualize potential interactions that may have been taking place (Zuur, et al. 2010).

Part A

1) AFDM change with grazing and Depth

coplot(log10(AFDM+0.001) ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplots

for AFDM with depth and Grazing

Figure 5: Coplot of grazed and ungrazed AFDM change with depth.

There is not a significant difference between the lines in the diagrams which suggests there

is not a significant interaction taking place (Zurr, et al. 2010). Further test will be needed to

determine this with more certainty.

2) CHLA change with grazing and Depth

coplot(CHLA ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplot for CHLA with

depth and Grazing

Part A

Figure 6: Coplot of grazed and ungrazed CHLA change with depth.

A difference in line shape suggest that interaction may be taking place between grazed and

ungrazed tiles or that the response of CHLA to depth differs slightly between grazed and

ungrazed tiles.

3) AFDM change with grazing and Plant Volume

It was hypothesised that light penetration may have an impact on bethic algae

photosynthesis. Therefore, the change in AFDM and CHLA (figure 8) with plant volume is

visulaised using coplots.

coplot(log10(AFDM+0.001) ~ log10(Plant.Volume)| Graze, panel = panel.smooth, data =

Algae) # coplots for AFDM with plant volume and Grazing

Part A

Figure 7: Coplot of grazed and ungrazed AFDM change with plant volume.

Figure 7 indicates that there may be interaction between grazed and ungrazed tiles with

changing plant volume. However, grazed AFDM seems to be affected by some significant

outliers which may obscure the comparison in this form.

4) CHLA change with grazing and Plant Volume

coplot(CHLA ~ log10(Plant.Volume) | Graze, panel = panel.smooth, data = Algae) # coplot

for CHLA with plant volume and Grazing - very similar to Depth

Figure 8: Coplot of grazed and ungrazed CHLA change with plant volume.

Part A

Figure 8 indicated interaction may be taking place. However, the slope patterns are similar to

figure 6 suggesting that depth and plant volume are collinear.

Scatter Matrices

Two scatter matrices were created at this stage (grazed and Ungrazed) which included all

variables. This was to help visualise the change in response variables with explanatory

variables, within the grazed or ungrazed treatments. The use of smoothers in the scatter

matrices could help to identify possible non-linear patterns (Logan, 2010).

1) Grazed Scatter Matrix

scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) +

CHLA , data = Grazed, diag = "boxplot", main = "Grazed") # scatter matrix to view

responses in grazed tiles

Figure 9: Scatterplot Matrix for the Grazed tiles

Part A

Figure 9 shows that depth, flow and plant volume are highly collinear. Folllowing

transformation, the AFDM is still not normally distributed. CHLA appears to increase with

Depth, although at this stage it is hard to determine if this relationship is linear or not. CHLA

and AFDM are inversely related.

2) Ungrazed scatter matrix

scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) +

CHLA, data = Ungrazed, diag = "boxplot",main = "Ungrazed") # scatter matrix to view

responses in ungrazed tiles

Figure 10: Scatterplot Matrix for the ungrazed tiles

Figure 10 shows that ungrazed CHLA increases with Depth, although the shape of both the

linear regression and smoother are considerably different to the grazed treatment. The

distribution of CHLA data is less normal than in the grazed treatment, whereas the

distribution of AFDM is more normal than the grazed treatment. The response of AFDM to

depth appears to be nonlinear.

Part A

GG Plots

The interaction between grazed and ungrazed treatments for both CHLA and log10 AFDM

were plotted using the plyr (Wickham, 2011) and ggplot2 packages (Wickham, 2009). Using

ggplot2 allows you to visualize the change in the mean of the response whilst also

considering the variance of the data.

1) Mean grazed/ungrazed CHLA against Depth

library(plyr)

AlgaeSum <- ddply(Algae, c("Depth.Num", "Graze"), summarise,

N = length(CHLA),

mean = mean(CHLA),

sd = sd(CHLA),

se = sd / sqrt(N) ) # parameters for CHLA ggplot

AlgaeSum # check the file....

require(ggplot2)

pd <- position_dodge(.3)

ggplot(AlgaeSum, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +

geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour =

"Black",

position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5)

# plot for CHLA

Part A

Figure 11: Grazed and ungrazed CHLA against depth (cm)

There is clearly interaction between grazed and ungrazed CHLA. At lower depths, grazed

tiles seem to have higher CHLA mass than ungrazed tiles. At higher depths, ungrazed CHLA

is higher than grazed. A lack of measurements between 10-35cm reduces the certainty of

predictions at larger depths.

2) Mean grazed/ungrazed log10 AFDM against Depth

AlgaeSumAFDM <- ddply(Algae, c("Depth.Num", "Graze"), summarise,

N = length(log10(AFDM+0.001)),

mean = mean(log10(AFDM+0.001)),

sd = sd(log10(AFDM+0.001)),

se = sd / sqrt(N) ) # parameters for AFDM ggplot

pd <- position_dodge(.3)

Part A

ggplot(AlgaeSumAFDM, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +

geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour =

"Black",

position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5)

# ggplot for log10AFDM

Figure 12: Grazed and ungrazed log10 AFDM against depth (cm)

Figure 12 indicates that there is limited interaction between grazed and ungrazed AFDM and

that grazed AFDM is consistently higher than ungrazed AFDM. Response of both grazed

and ungrazed AFDM is non-linear.

Testing Assumptions

Before conducting statistical analysis it is important to determine if the data fits the

distributional assumptions of parametric tests as these tests have a greater statistical power

than non-parametric tests (Logan, 2010). In order to conduct two sample parametric tests

Part A

(two-sample t-test) and a parametric analysis of variance (ANOVA), the data must be

normally distributed and have homogeneous variance.

1) Data distribution

The Shapiro-Wilks test was used to test is the data were normally distributed. The ShapiroWilks

test is a null hypothesis test against the assumption of normality. Therefore a p-value

below the confidence limit indicates data is not normally distributed (Fells Statsplyr, 2011).

shapiro.test(Algae$CHLA) # 0.000505

data: Algae$CHLA

W = 0.8843, p-value = 0.000505

shapiro.test(Algae$AFDM) # 2.422e-13 both p values are < 0.05 suggesting data is not

normally distributed need QQ plot to check

data: Algae$AFDM

W = 0.2512, p-value = 2.422e-13

CHLA and AFDM are determined to have P-values less than 0.05 therefore we reject the null

hypothesis that the data is normally distributed. However, the use of normality tests is

disputed due to their inflexibility with differing sample sizes (R-bloggers, 2011). Therefore,

further investigation is required using QQ-plots which plot a continuous data set against a

theoretical normal distribution (Zuur, 2009).

op <- par(mfrow = c(2, 1))

qqnorm(Algae$CHLA)

qqline(Algae$CHLA) # looks alright will assume CHLA isnormall distributed

qqnorm(Algae$AFDM)

qqline(Algae$AFDM) # not good. AFDM not normally distributed - must use nonparametric

Wilcox test for AFDM

par(op)

Part A

Figure 13: QQplots of (top) CHLA and (bottom) AFDM

AFDM data is clearly not normally distributed and as such will have to be analysed using

non-parametric tests. CHLA appears to be have reasonable normal distribution and

considering the robust nature of linear regression, ANOVA and t-tests to non-normal

distributions, the CHLA data is considered to be normally distributed (Zuur, et al. 2010,

Logan, 2010).

1) Homogeneity of Variance

As AFDM is not considered to be normally distributed, tests for variance were only carried

out for CHLA. Levene’s test was used to calculate homogeneity of variance. The test is

hypothesis testing, with a null hypothesis that variance is homogeneous. In order to carry out

a t-test variances between CHLA and grazed/ungrazed must be homogenous. ANOVA and

ANCOVA also require equal variance. However, the homogeneity of these statistical models

will be determined using residual validation plots (Zurr, et al.,2010)

leveneTest(Algae$CHLA~Algae$Graze) # p > 0.05 therefore variance is considered

homogeneous - can use parametric t-test CHLA

Levene's Test for Homogeneity of Variance (center = median)

Part A

Df F value Pr(>F)

group 1 1.0437 0.3131

leveneTest(Algae$CHLA~Algae$Depth) # p > 0.05 therefore variance is considered

homogeneous - can use ANOVA

Levene's Test for Homogeneity of Variance (center = median)

Df F value Pr(>F)

group 6 1.6695 0.1578

Both Levene’s tests revealed that variances were homogeneous. Therefore, CHLA data fits

the requirements for parametric testing.

Two-sample tests

1) Wilcoxon signed-rank test

A Wilcoxon test was used to investigate the difference between grazed and ungrazed AFDM

samples because the lack of normality in the distribution prevents the use of a t-test

(Dalgaard, 2002).

library(coin) # load package needed for U test (Hawthorn, et al. 2006)

wilcox_test(AFDM ~ Graze, data = Algae, distribution = "exact") # p<0.05 (0.0004401)

therefore we can conclude there is a difference between grazed and ungrazed AFDM data

Exact Wilcoxon Mann-Whitney Rank Sum Test

data: AFDM by Graze (Grazed, Ungrazed)

Z = 3.3969, p-value = 0.0004401

alternative hypothesis: true mu is not equal to 0

A p-value < 0.05 indicates that the null hypothesis is rejected and it is considered that there

is a significant difference between grazed and ungrazed AFDM.

Part A

2) T-test

As CHLA data fits the requirements for a parametric test (Dalgaard, 2002), a two-sample ttest

was undertaken to compare the grazed and ungrazed CHLA data.

t.test(CHLA ~ Graze, data = Algae) # p> 0.05 (0.6893) therefore the null hypothesis is

accepted. The mean CHLA value is not significantly different between grazed and ungrazed

tiles...

data: CHLA by Graze

t = 0.4027, df = 39.768, p-value = 0.6893

The calculated P-value is greater than 0.05 and therefore we accept the null hypothesis that

the difference between grazed and ungrazed CHLA is not significant.

Analysis of Variance

1) One-way ANOVA – Grazed treatment

ANOVA tests were carried out on CHLA with each control variable on grazed and ungrazed

treatments. This analysis will help to determine the influence of different control variables on

CHLA (Doncaster and Davey, 2007). For those tests that were statistically significant,

verification plots were analysed and post-hoc Tukey tests were applied.

#CHLA ~ Depth

CHLAgrazedaov <- aov(CHLA ~ Depth, data = Grazed) # anova of grazed tile CHLA

against Depth

summary(CHLAgrazedaov) # P<0.05 (0.0393) significant relationship

TukeyHSD(CHLAgrazedaov) # post hoc test to identify the difference between

means that are greater than the standard error

op <- par(mfrow = c(2, 2))

plot(CHLAgrazedaov) # plot validation shows slightly wedge shaped residuals

par(op)

Part A

leveneTest(Grazed$CHLA~Grazed$Depth) # p > 0.05 so we can consider variance

homogeneous.

#CHLA ~ PLant Volume

CHLAgrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Grazed) # anova of grazed tile

CHLA against plant volume

summary(CHLAgrazedaov2) # p > 0.05 (0.595) Not significant

#CHLA ~ Flow

CHLAgrazedaov3 <- aov(CHLA ~ Flow, data = Grazed) # anova of grazed tile CHLA

against Flow

summary(CHLAgrazedaov3) # p > 0.05 (0.607) Not significant

For Grazed tiles, the only significant ANOVA test was CHLA with depth. The results are

summarised here:

Df Sum Sq Mean Sq F value Pr(>F)

Depth 6 0.03620 0.006034 3.069 0.0393 *

Residuals 14 0.02753 0.001966

A post-hoc Tukey test was then applied to identify differences between means that were

greater than the standard error (Logan, 2010):

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = CHLA ~ Depth, data = Grazed)

$Depth

diff lwr upr p adj

2-0 0.036333333 -0.087289876 0.15995654 0.9448870

5-0 0.125333333 0.001710124 0.24895654 0.0459610

7-0 0.086000000 -0.037623209 0.20962321 0.2770254

10-0 0.098000000 -0.025623209 0.22162321 0.1670851

25-0 0.023333333 -0.100289876 0.14695654 0.9937725

35-0 0.076333333 -0.047289876 0.19995654 0.3983507

5-2 0.089000000 -0.034623209 0.21262321 0.2453632

7-2 0.049666667 -0.073956543 0.17328988 0.8074029

10-2 0.061666667 -0.061956543 0.18528988 0.6248235

Part A

25-2 -0.013000000 -0.136623209 0.11062321 0.9997609

35-2 0.040000000 -0.083623209 0.16362321 0.9164273

7-5 -0.039333333 -0.162956543 0.08428988 0.9221504

10-5 -0.027333333 -0.150956543 0.09628988 0.9858612

25-5 -0.102000000 -0.225623209 0.02162321 0.1396981

35-5 -0.049000000 -0.172623209 0.07462321 0.8163400

10-7 0.012000000 -0.111623209 0.13562321 0.9998495

25-7 -0.062666667 -0.186289876 0.06095654 0.6086721

35-7 -0.009666667 -0.133289876 0.11395654 0.9999574

25-10 -0.074666667 -0.198289876 0.04895654 0.4220462

35-10 -0.021666667 -0.145289876 0.10195654 0.9958082

35-25 0.053000000 -0.070623209 0.17662321 0.7602295

This test shows that the differences in the means of different depths is not significant.

Validation plots were then analysed to determine if the model can be accepted.

Figure 14: Validation plots for ANOVA test between grazed CHLA and depth

The residuals in figure 14 appear to be slightly wedge shaped. However, this is considered

to be attributed to a select few outliers. All plots are considered acceptable. A levene’s test

was carried out to check for homogeneity of variance:

Levene's Test for Homogeneity of Variance (center = median)

Df F value Pr(>F)

group 6 0.7662 0.6085

Part A

A P-value > 0.05 indicates that the null hypothesis is accepted – the variance for the test is

considered to be homogeneous.

2) One-way ANOVA – Ungrazed treatment

The same process was repeated for the ungrazed treatment.

#Ungrazed CHLA with Depth

CHLAUngrazedaov <- aov(CHLA ~ Depth, data = Ungrazed) # anova of ungrazed tile

CHLA against Depth

summary(CHLAUngrazedaov) # P>0.05 (0.18) not significant

# CHLA ~ PLant Volume

CHLAUngrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Ungrazed) # anova of ungrazed

tile CHLA against plant volume.

summary(CHLAUngrazedaov2) # p <0.05 (0.0437) significant relationship

op <- par(mfrow = c(2, 2))

plot(CHLAUngrazedaov2) # plot validation shows that plant volume requires transforming...

par(op)

CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Ungrazed) # run anova

again with transformed plant volume

summary(CHLAUngrazedaov2) # p <0.05 (0.0236) significant relationship. improved

following transformation

op <- par(mfrow = c(2, 2))

Part A

plot(CHLAUngrazedaov2) # validaion plots are considerably better, apart from one major

outlier...

par(op)

Algaeno1 <- Ungrazed[(-1),] # remove influential point

CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Algaeno1) # run anova

again without influential point plant volume

summary(CHLAUngrazedaov2) # P<0.1 (0.0607)

op <- par(mfrow = c(2, 2))

plot(CHLAUngrazedaov2) # validaion plots look good.

par(op)

# Tukey test not applied as control variable is required to be a factor.

# Ungrazed CHLA with Flow

CHLAUngrazedaov3 <- aov(CHLA ~ Flow, data = Ungrazed) # anova of ungrazed tile

CHLA against Flow

summary(CHLAUngrazedaov3) # p > 0.05 (0.382) not significant

The only statistically significant ANOVA test for the ungrazed treatment was with plant

volume. The first summary is shown below:

Df Sum Sq Mean Sq F value Pr(>F)

Plant.Volume 1 0.01079 0.01079 4.668 0.0437 *

Residuals 19 0.04390 0.00231

Summary plots were then examined, which revealed that plant volume required transforming

in order to centre the residuals, fixing the heterogeneity issue.

Part A

Figure 15: Verification plots for ANOVA of ungrazed CHLA with plant volume (before data

transformation).

A log 10 transformation was used to reduce the difference in variances (Logan, 2010).

Figure 16: Verification plots for ANOVA of ungrazed CHLA with log10 plant volume (after control

variable transformation).

Part A

The Log10 transformation has reduced the heterogeneity of variance and verification plots

are considered acceptable, apart from one significant outlier, which records approximately

1.0 on the cook’s distance plot and is considered highly influential (Logan, 2010). This outlier

was removed and the ANOVA was re run, recording a P-value < 0.1 (0.06):

Df Sum Sq Mean Sq F value Pr(>F)

log10(Plant.Volume) 1 0.003307 0.003307 4.004 0.0607 .

Residuals 18 0.014866 0.000826

Validation plots were examined. Removal of the outlier, significantly improves any possible

issues of heterogeneous variance.

Figure 17: Final validation plot for ANOVA of ungrazed CHLA with plant volume (influential outlier

removed).

No Post-Hoc Tukey test was carried out for this ANOVA because a factor control variable is

required.

One-way ANOVA summary

One way ANOVA analyses reveal that grazed CHLA is significantly related to depth

(P<0.05). However, ungrazed CHLA is significantly related to plant volume (P<0.1).

Part A

3) Kruskal Wallace Test - Grazed AFDM

AFDM data is not normally distributed and as a result, the Kruskal-Wallace test is used to

test the difference between the medians of the population. This rank-based test is used as

an alternative to ANOVA (Logan, 2010).

#Grazed AFDM with Depth

AFDM.graze.KW <- kruskal.test(AFDM ~ Depth, data = Grazed) # Kruskal-Wallace test

(KW test) of grazed AFDM against Depth

AFDM.graze.KW # P > 0.05 (0.3614) not significant

#PLant Volume

AFDM.graze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Grazed) # KruskalWallace

test (KW test) of AFDM against Plant volume

AFDM.graze.KW2 # P > 0.05 (0.4579) not significant

#Flow

AFDM.graze.KW3 <- kruskal.test(AFDM ~ Flow, data = Grazed) # Kruskal-Wallace test

(KW test) of AFDM against Flow

AFDM.graze.KW3 # P > 0.05 (0.3954) not significant

#Ungrazed AFDM with Depth

AFDM.ungraze.KW <- kruskal.test(AFDM ~ Depth, data = Ungrazed) # Kruskal-Wallace

test (KW test) of Ungrazed AFDM against Depth

AFDM.ungraze.KW # P>0.05 (0.284) not significant

# PLant Volume

AFDM.ungraze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Ungrazed) # KruskalWallace

test (KW test) of Ungrazed AFDM against plant volume

Part A

AFDM.ungraze.KW2 # P>0.05 (0.4579) not significant

# Flow

AFDM.ungraze.KW3 <- kruskal.test(AFDM ~ Flow, data = Ungrazed) # Kruskal-Wallace

test (KW test) of Ungrazed AFDM against Flow

AFDM.ungraze.KW3 # P>0.05 (0.3954) not significant

Kruskal-Wallace summary:

No significant relationships between either grazed or ungrazed AFDM with any of the

explanatory variables.

4) Two-Way ANOVA

From the previous one-way ANOVA analysis it has been established that grazed CHLA is

related to depth and ungrazed CHLA is related to plant volume. The two-way analysis will

test the following four hypotheses (Doncaster and Davey, 2007):

1) Variation in CHLA is explained by the variance in depth and independently by the

variation in grazing (grazed/ungrazed).

# two-way anova considering grazing and Depth

Algae.aov1 <- aov(CHLA ~ Graze + Depth, data = Algae)

summary(Algae.aov1) # p > 0.05 not significant

The variance of CHLA cannot be explained by hypothesis 1.

2) Variation in CHLA is explained by the variance in plant volume and independently by the

variation in grazing (grazed/ungrazed).

# two-way anova considering grazing and plant volume

Algae.aov3 <- aov(CHLA ~ Graze + Plant.Volume, data = Algae)

Part A

summary(Algae.aov3) # p > 0.05 not significant

The variance of CHLA cannot be explained by hypothesis 2.

3) Variation in CHLA is explained by the inter-dependent effects of variance in plant volume

and grazing (grazed/ungrazed).

# two-way interaction considering grazing*plant volume

Algae.aov4 <- aov(CHLA ~ Graze*Plant.Volume, data = Algae)

summary(Algae.aov4) # Graze:Plant.Volume P = 0.0796

Df Sum Sq Mean Sq F value Pr(>F)

Graze 1 0.00048 0.000480 0.171 0.6815

Plant.Volume 1 0.00265 0.002647 0.943 0.3377

Graze:Plant.Volume 1 0.00911 0.009107 3.245 0.0796 .

Residuals 38 0.10666 0.002807

Interaction between graze and plant volume is significant (P<0.1). Inspect validation plots:

Figure 18: Explanatory plots for CHLA ~ graze*plant volume ANOVA.

Part A

Plots indicate that the control variable (plant volume) requires transformation due to

heterogeneity in the residuals. The ANOVA was re-run with log10 transformed plant volume.

Algae.aov4 <- aov(CHLA ~ Graze*log10(Plant.Volume), data = Algae) # run with

transformed explanatory

summary(Algae.aov4) # interaction is no longer significant

Df Sum Sq Mean Sq F value Pr(>F)

Graze 1 0.00048 0.000480 0.174 0.679

log10(Plant.Volume) 1 0.00893 0.008929 3.236 0.080 .

Graze:log10(Plant.Volume) 1 0.00464 0.004638 1.681 0.203

Residuals 38 0.10485 0.002759

Interaction between Graze and Plant volume is no longer significant following the removal of

residual heterogeneity.

4) Variation in CHLA is explained by the inter-dependent effects of variance in Depth and

grazing (grazed/ungrazed).

# two-way interaction considering grazing*Depth

Algae.aov2 <- aov(CHLA ~ Graze*Depth, data = Algae)

summary(Algae.aov2) # significant relationship (P<0.1). Graze:Depth (P= 0.0584)

Df Sum Sq Mean Sq F value Pr(>F)

Graze 1 0.00048 0.000480 0.229 0.6360

Depth 6 0.03019 0.005032 2.399 0.0536 .

Graze:Depth 6 0.02950 0.004917 2.344 0.0584 .

Residuals 28 0.05872 0.002097

Interaction between grazing and depth is significant (P<0.1). Validation plots were used to

assess the reliability of the analysis:

Part A

Figure 19: Explanatory plots for CHLA ~ graze*Depth ANOVA.

Validation plots indicate that there is some heterogeneity in residuals however, this is

influenced by three outliers. This ANOVA test is accepted. Further investigation with an

ANCOVA analysis is undertaken in this report.

Linear Models

1) One-way linear models

Based on the one-way ANOVA analyses we know which one-way linear models will be

significant (grazed CHLA with Depth and ungrazed CHLA with plant volume). Linear models

were created for these variables to visualise the relationship.

Ungrazed CHLA with Plant Volume:

CHLA.ungrazed.lm <- lm(CHLA ~ Log10.Plant, data = Algaeno1) # lm of grazed tile CHLA

against plant volume

summary(CHLA.ungrazed.lm ) # P<0.1 (0.0607) significant relationship

Part A

op <- par(mfrow = c(2, 2))

plot(CHLA.ungrazed.lm) # Plot validation same as anova - looks alright one major outlier

par(op)

Verification plots are shown in figure 17. Following verification, the model was plotted with

confidence intervals of 95%.

Plot(CHLA ~ Log10.Plant , data = Ungrazed, ylim=c(0,0.3), xlab = “log10 (plant volume)”,

ylab = “Chlorophyll A”,col = “blue”, main = “Ungrazed Tiles”) # plot CHLA for

ungrazed against Plant volume.

Abline(CHLA.ungrazed.lm) # fit the linear model line.

Text(5.5,0.25, “Chlorophyll A = -0.07508Plant.Volume+ 0.02797”, pos = 2)

text(5,0.23, expression(paste(P == 0.0607)), pos = 2)

text(5,0.21, expression(paste(R^2 == 0.1365)), pos = 2) # add equation P value and R2

x <- seq(min(Ungrazed$Log10.Plant), max(Ungrazed$Log10.Plant), l=1000) # for each

value of x, calculate the upper and lower 95% confidence

y<-predict(CHLA.ungrazed.lm, data.frame(Log10.Plant=x), interval=”c”)

matlines(x,y, lty=3, col=”black”) #plot the upper and lower 95% confidence limits

Part A

Figure 20: Linear model of ungrazed CHLA change with plant volume.

The residuals were then tested:

plot(CHLA.ungrazed.lm$resid ~ log10(Algaeno1$Plant.Volume)) # residuals plot looks

good . Model can be accepted.

Figure 21: Fitted vs. predicted residuals for the linear model of ungrazed CHLA against plant volume.

It was concluded that the model was reliable because variance is considered homogeneous.

Part A

Grazed CHLA with Depth:

CHLA.Grazed.lm2 <- lm(CHLA ~ Depth, data = Grazed) # lm of grazed tile CHLA against

plant volume

summary(CHLA.Grazed.lm2 ) # significant P<0.05

Residuals:

Min 1Q Median 3Q Max

-0.083667 -0.019000 -0.001667 0.023000 0.064333

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.002667 0.025600 0.104 0.91852

Depth2 0.036333 0.036204 1.004 0.33262

Depth5 0.125333 0.036204 3.462 0.00381 **

Depth7 0.086000 0.036204 2.375 0.03236 *

Depth10 0.098000 0.036204 2.707 0.01703 *

Depth25 0.023333 0.036204 0.644 0.52968

Depth35 0.076333 0.036204 2.108 0.05350 .

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.04434 on 14 degrees of freedom

Multiple R-squared: 0.5681, Adjusted R-squared: 0.383

F-statistic: 3.069 on 6 and 14 DF, p-value: 0.03929

This model shows a significant relationship with a P- value <0.05. Model validation plots can

be seen in figure 14.The model was then plotted to visualise the relationship. As depth is a

factor, plotting the linear model required the input of the slope value for each factor, as

shown in the code.

plot(CHLA ~ Depth.Num, data = Grazed, col = "blue", xlab = "Depth (cm)", ylab =

"Chlorophyll A", main = "Grazed Tiles") # create plot

abline(coef(CHLA.Grazed.lm2)[1], coef(CHLA.Grazed.lm2)[2-3-4-5-6-7]) # plot linear model

text(35,0.13, "Chlorophyll A = 0.002667Depth + 0.445332", pos = 2)

text(35,0.12, expression(paste(P == 0.03929)), pos = 2)

Part A

text(35,0.11, expression(paste(R^2 == 0.383)), pos = 2) # add equation, R^2 and P value

#calculate slope#

0.036333+0.125333+0.086000+0.098000+0.023333+0.076333

Figure 22: Linear model of grazed CHLA change with depth.

In order to finalise plot validation, a plots of predicted vs. fitted residuals were created:

plot(CHLA.Grazed.lm2$resid ~ Grazed$Depth.Num) # looks slightly wedge shaped,

outliers mainly causing this. model is accepted.

Part A

Figure 23: Fitted vs. predicted residuals for the linear model of grazed CHLA against depth.

Figure 24: Fitted vs. predicted residuals boxplot for the linear model of grazed CHLA against depth.

Figure 23 shows the residuals are unbiased but slightly heteroscedastic (Logan, 2010).

However, as depth is a factor, Figure 24 more accurately show the residual distribution,

which is slightly humped indicating possible heterogeneity of variance. This difference in

variance is caused by an uneven distribution of measured depths. If more measurements

had been recorded at depths between 10-35cm it is likely that the variance would be less

Part A

heterogeneous. The model is accepted, although further analysis is required to produce a

better fitting model.

2) Analysis of Co Variance (ANCOVA)

In order to produce an effective model for the whole data set, without separating grazed and

ungrazed treatments it is necessary to add grazing/ungrazing to a linear model as a

covariate, similar to two way-anova (figure 19) (Logan, 2010). This will help to explain some

of the variability in the data set and show how grazed and ungrazed CHLA respond to

changes in depth.

Before undertaking the ANCOVA analysis, a Coplot was created to explore the

significance of using Grazing as a covariate (Sarkar, 2008).

require(lattice)

# Here's the plot

xyplot(CHLA ~ Depth |Graze # create plot

,xlab = "Depth", ylab = "CHLA", data = Algae)

Figure 25: Coplot for CHLA against depth divided into the respective grazing catagories.

Part A

Figure 25 indicates that adding grazing as a covariate could help to remove some variation,

despite being considered not significantly different during a t-test. The ANCOVA model was

created and validated as follow:

algae.lm <- lm(CHLA ~ Depth* Graze, data = Algae) # linear ancova for CHLA with graze

and Depth

op <- par(mfrow = c(2,2))

plot(algae.lm)

par(op) # residuals are a bit wedge shaped as earlier with 2-way anova but mainly

caused by outliers and the unequal range of factors so I'll continue

Figure 26: Validation plots for ANCOVA analysis ( CHLA ~ Graze*Depth)

Validation plot indicates, possible heterogeneity of variance, however, this could have been

caused by the outliers or unequal distribution of depths. Further investigation of residuals is

required:

hist(algae.lm$resid) # residuals histogram

Part A

Figure 27: Histogram showing residuals for the ANCOVA model.

plot(algae.lm$resid) # residuals plot

Figure 28: Scatterplot showing the residuals for the ANCOVA model.

Part A

plot(algae.lm$resid ~ Algae$Depth) # residuals here look very good.

Figure 29: Boxplot showing the residuals against measured explanatory variable.

Through the Analysis of Figures 27-29, it is concluded that homogeneity of variance is

satisfactory and therefore the model can be accepted. The model was then evaluated using

an ANOVA analysis which recorded a significant value (P<0.1). The model fit had a higher

significance (P<0.05):

anova(algae.lm) # ANOVA of linear model, the same as the two-way ANOVA earlier

Response: CHLA

Df Sum Sq Mean Sq F value Pr(>F)

Depth 6 0.030194 0.0050323 2.3995 0.05364 .

Graze 1 0.000480 0.0004801 0.2289 0.63604

Depth:Graze 6 0.029499 0.0049165 2.3443 0.05838 .

Residuals 28 0.058723 0.0020972

summary(algae.lm) # looks good model has a significant p value (0.0386)

Residuals:

Min 1Q Median 3Q Max

-0.083667 -0.018917 -0.001833 0.019000 0.119667

Part A

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.002667 0.026440 0.101 0.92038

Depth2 0.036333 0.037392 0.972 0.33953

Depth5 0.125333 0.037392 3.352 0.00231 **

Depth7 0.086000 0.037392 2.300 0.02911 *

Depth10 0.098000 0.037392 2.621 0.01401 *

Depth25 0.023333 0.037392 0.624 0.53767

Depth35 0.076333 0.037392 2.041 0.05073 .

GrazeUngrazed 0.005000 0.037392 0.134 0.89458

Depth2:GrazeUngrazed 0.019667 0.052880 0.372 0.71276

Depth5:GrazeUngrazed -0.069333 0.052880 -1.311 0.20046

Depth7:GrazeUngrazed -0.058333 0.052880 -1.103 0.27937

Depth10:GrazeUngrazed -0.051000 0.052880 -0.964 0.34308

Depth25:GrazeUngrazed 0.095333 0.052880 1.803 0.08219 .

Depth35:GrazeUngrazed -0.018667 0.052880 -0.353 0.72673

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0458 on 28 degrees of freedom

Multiple R-squared: 0.5061, Adjusted R-squared: 0.2768

F-statistic: 2.207 on 13 and 28 DF, p-value: 0.03867

The model is therefore accepted and is statistically significant. The ANCOVA model was

plotted as follows:

coef(algae.lm)

algae.lm$coeff[1] # Grazed intercept

algae.lm$coeff[2-3-4-5-6-7] # Grazedlope

algae.lm$coeff[1+8] # Ungrazed intercept

algae.lm$coeff[9-10-11-12-13-14] # Ungrazed slope # These are the coefficients

needed to form the lines with Depth as a factor

# plot

plot(CHLA ~ Depth.Num, data = Algae, pch = 19,cex = 1.5,

col = c("Black", "Red")[Algae$Graze],

xlab = list("Depth", cex = 1.2),

Part A

ylab = list("Chlorophyll A", cex = 1.2))

# Now add a legend

legend(1, 0.25, legend = c("Grazed", "Ungrazed"),

col = c("black", "red"), pch = 19)

# Now add the coefficients from above to the abline command to get both lines

abline(coef(algae.lm)[1], coef(algae.lm)[2-3-4-5-6-7]) # This adds the Grazed

regression line

# Now add the ungrazed regression line

abline(coef(algae.lm)[1] + coef(algae.lm)[8], # Ungrazed intercept

coef(algae.lm)[9-10-11-12-13-14], # Ungrazed slope

col = "red")

text(35,0.18, "Chlorophyll A*Graze = -0.0023 + 0.3629 * Depth", pos = 2)

text(35,0.16, expression(paste(P == 0.03867)), pos = 2)

text(35,0.14, expression(paste(R^2 == 0.2768)), pos = 2)

The following indicates the calculation of the ANCOVA equation:

#calculate Grazed slope#

0.036333+0.125333+0.086000+0.098000+0.023333+0.076333 # 0.445332

#calculate Ungrazed slope

0.019667-0.069333-0.058333-0.051000+0.095333-0.018667 # - 0.082333

# calculate intercept

0.0027-0.005 # -0.0023

# calculate intercept

0.445332- 0.082333 # 0.362999 therefore CHLOROPHYLLAgrazetype = -

0.0023+0.3629*Depth

Part A

Figure 30: ANCOVA plot showing the change in CHLA on grazed and ungrazed tiles with Depth (cm).

This model is the most effective at explaining the variance in CHLA in response to changing

depth (P=0.0039). It shows that there is (i) not a significant difference in CHLA mass on

grazed and ungrazed tiles, (ii) no significant interaction between grazed and ungrazed tiles

and (iii) that on average, ungrazed tiles have higher CHLA mass than grazed tiles.

3) Multiple Linear Regression

Based on the physiological requirements of benthic algae, changes in either flow or plant

volume could affect the growth of biofilms. Multiple linear regression may help to test this

possibility. The variation inflation factors (vifs) were inspected to determine multicollinearity.

Values above 5 are considered collinear and are therefore not appropriate for multiple linear

regression (Logan, 2010).

Part A

vif(lm(CHLA ~ Depth + log10(Flow) + log10(Plant.Volume), data = Algae ))

# colinearity prevents conducting multiple linear regression.

GVIF Df GVIF^(1/(2*Df))

Depth 550.57799 6 1.692005

log10(Flow) 67.72208 1 8.229343

log10(Plant.Volume) 101.36930 1 10.068232

With values well in excess of 5, these control variables can be considered highly collinear

and multiple linear regression was therefore not carried out.

4) Non-linear modelling:

Non-linear modelling cannot me conducted where the control variable is a factor. Therefore

the only relationship that presented the possibility of non-linear modelling was between

ungrazed CHLA and plant volume (figure 20). The model created was a second order

polynomial relationship, however, the model was a less suitable fit than the linear model

shown in figure 20 and was therefore not accepted:

algae.lm1 <- lm(CHLA ~ I(Log10.Plant^2), data = Algaeno1)

summary(algae.lm1) # significant p value (0.076999). However not as significant as the

linear model

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 0.0172684 0.0187194 0.922 0.368

I(Log10.Plant^2) 0.0014130 0.0007533 1.876 0.077 .

General Additive Models (GAMs)

Where trends are too complex, or their distribution does not fit the requirements of a

parametric regression analysis, GAMs can be used to show patterns in response variables

using a smoother function (Logan, 2010). As the change in AFDM was determined to be

unsuitable for linear modelling and the trends displayed in the data are clearly non-linear

(Figure 12), GAM analysis is more appropriate than general linear modelling.

Part A

GAMs require a numerical control variable and therefore, depth is unsuitable for use

with GAMs. In its place, log10 transformed flow values have been used. The relationship

between log10 flow and depth is shown to be highly significant (P< 2.2e-16) at all depths:

depth.log10flow <- lm( log10(Flow) ~ Depth , data = Algae)

summary(depth.log10flow) # linear model of depth and log10 flow to show they

are highly related. log10 flow is used for GAMS because Depth is a factor and has fewer

unique covariate combinations than specified maximum degrees of freedom. P = 2.2e-16

and R^2 = 0.978

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -1.08329 0.06374 -17.00 <2e-16 ***

Depth2 1.55748 0.09014 17.28 <2e-16 ***

Depth5 1.66349 0.09014 18.45 <2e-16 ***

Depth7 2.03831 0.09014 22.61 <2e-16 ***

Depth10 2.30769 0.09014 25.60 <2e-16 ***

Depth25 3.10382 0.09014 34.44 <2e-16 ***

Depth35 3.34504 0.09014 37.11 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1561 on 35 degrees of freedom

Multiple R-squared: 0.9812, Adjusted R-squared: 0.978

F-statistic: 305 on 6 and 35 DF, p-value: < 2.2e-16

AFDM was transformed using the log10+1 technique (Zuur, et al. 2010) in order to remove

some of the variance and allow for greater ease of interpretation. The analysis was carried

out on grazed and ungrazed tiles separately as follows:

1) GAM of grazed AFDM with flow

AFDM.gam1 <- gam(log10(AFDM+0.001)~ s(Log10.Flow), data = Grazed) # run GAM of

AFDM against flow for grazed tiles

plot(AFDM.gam1) # looks as if there is a decrease in AFDM with flow.

summary(AFDM.gam1) # significant smoother term (p = 0.00172)

Part A

Approximate significance of smooth terms:

edf Ref.df F p-value

s(Log10.Flow) 1.863 2.216 8.66 0.00172 **

gam.check(AFDM.gam1) # verification plots are mostly

Figure 31: Verification plots for GAM of AFDM with log10 flow.

The verification plots for this GAM (Figure 31) are mostly satisfactory. Heterogeneity in the

residuals is likely caused by the lack of channels between10 – 25cm depth. The complete

GAM plot is seen below:

plot(log10(AFDM+0.001) ~ Log10.Flow, data = Grazed,

pch = 16,

xlab = "Log10 Flow",

ylab = "Ash Free Dry Mass", main = "GAM of Grazed AFDM ")

# predict across the data

x <- seq(min(Grazed$Log10.Flow), max(Grazed$Log10.Flow), l=100) # 100 steps....

y <- predict(AFDM.gam1, data.frame(Log10.Flow = x), se = TRUE) # using standard errors

se = TRUE

Part A

# add lines

lines(x, y$fit) # plots the fitted values

lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit

lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit

Figure 32: GAM of log10 +1 AFDM change with log10 Flow.

The model is accepted and indicates that AFDM decreases with increasing log10 flow.

2) GAM of ungrazed AFDM with flow

AFDM.gam2 <- gam(log10(AFDM+0.001) ~ s(Log10.Flow), data = Ungrazed) # run GAM of

AFDM against flow for ungrazed tiles

plot(AFDM.gam2) # plot GAM - very clearly no pattern...

Part A

Figure 33: GAM plot for ungrazed log10+1 AFDM change with log10 flow.

summary(AFDM.gam2) # smoother term not significant P>0.1

Approximate significance of smooth terms:

edf Ref.df F p-value

s(Log10.Flow) 1 1 0.004 0.948

The GAM (Figure 33) indicates that there is no relationship between ungrazed AFDM and

flow.

Summary:

1) A Wilcoxon’s test revealed that there was a significant difference between AFDM on

grazed and ungrazed tiles (P<0.05).

2) Using a T-test is was shown that CHLA mass was not significantly different on grazed and

ungrazed tiles (P>0.1).

3) One-way ANOVA analysis revealed that grazed CHLA was significantly related to depth

(P<0.05) and that ungrazed CHLA was significantly related to plant volume (P<0.01).

4) 2-way ANOVA and ANCOVA analyses were used to investigate the covariation effects of

depth and grazing. The model showed that variation in CHLA is explained by the interdependent

effects of variance in Depth and grazing (P<0.05).

5) GAMs were produced for grazed and ungrazed AFDM change with depth. The models

showed that grazed AFDM decreased non-linearly with increasing depth and that ungrazed

AFDM did not respond to changes in depth.

Part A

References:

Caramujo, M. J., Mendes, C. R. B., Cartaxana, P., Brotas, V., & Boavida, M. J. (2008).

Influence of drought on algal biofilms and meiofaunal assemblages of temperate

reservoirs and rivers. Hydrobiologia, 598(1), 77-94.

Dalgaard, P (2008). Introductory statistics with R. New York: Springer. 95-224.

Doncaster, C. P., & Davey, A. J. (2007). Analysis of variance and covariance: how to choose

and construct models for the life sciences. Cambridge University Press.

Fells Stats. (2011). Normality tests don't do what you think they do.Available:

http://blog.fellstat.com/?p=61. Last accessed 1st May 2014.

Fox, J and Weisberg, S. (2011). An {R} Companion to Applied Regression, Second Edition.

Thousand Oaks CA: Sage. URL:

http://socserv.socsci.mcmaster.ca/jfox/Books/Companion

Hothorn, T., Hornik, K., van de Wiel, M.A., and Zeileis, A. (2006). A Lego System for

Conditional Inference. The American Statistician 60(3), 257-263.

Ledger, M. E., & Hildrew, A. G. (1998). Temporal and spatial variation in the epilithic biofilm

of an acid stream. Freshwater Biology, 40(4), 655-670.

Massart, D. L., Smeyers-Verbeke, J., Capron, X., & Schlesier, K. (2005). Visual presentation

of data by means of box plots.Logan, 2010

R Core Team (2013). R: A language and environment for statistical computing. R

Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Sarkar, D. (2008) Lattice: Multivariate Data Visualization with R. Springer, New York. ISBN

978-0-387-75968-5

Wickham, H. (2009) ggplot2: elegant graphics for data analysis. Springer New York.

Wickham, H. (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of

Statistical Software, 40(1), 1-29. URL: http://www.jstatsoft.org/v40/i01/.

Zuur, A. F., Ieno, E. N., & Elphick, C. S. (2010). A protocol for data exploration to avoid

common statistical problems. Methods in Ecology and Evolution,1(1), 3-14.

Part A

Appendices:

Appendix 1 – R script:

######### Algae Analysis Coursework for Environmenta Analysis and Monitoring ##############

rm(list=ls()) # Clear Memory

# set working directory

getwd() # check directory location

Algae <- read.csv(file.choose()) # Load data file

str (Algae) # check structure

Algae$Depth.Num <- Algae$Depth # creates an extra Depth column which will remain as an

integer

Algae$Depth <- as.factor(Algae$Depth) # convert depth from Int. to Fac.

Algae["Log10.Flow"] <- NA # creates the new column named "log10.Flow" filled with "NA"

Algae$Log10.Flow <- log10(Algae$Flow) # creates log10 data for flow for GAM analyses later

Algae["Log10.Plant"] <- NA # creates the new column named "log10.Plant" filled with "NA"

Algae$Log10.Plant <- log10(Algae$Plant.Volume) # adds log 10 data for use with matlines

######## subset grazed and ungrazed ############

Ungrazed <- subset(Algae, Graze == "Ungrazed") # Ungrazed subset

Ungrazed

Grazed <- subset(Algae, Graze == "Grazed") # Grazed subset

Grazed

##### plot some images to see what the data looks like ######

Part A

#Boxplots

op <- par(mfrow = c(1, 2))

boxplot(AFDM ~ Graze, data = Algae, ylab = "Ash Free Dry Mass") # hard to interpret due to

distribution needs transforming...

boxplot(log10(AFDM+0.001) ~ Graze, data = Algae, ylab = "log10 Ash Free Dry Mass") # inspect

transformed Ash Free Dry Mass (AFDM) difference between grazed and ungrazed

par(op)

op <- par(mfrow = c(1, 2))

boxplot(log10(AFDM+0.001) ~ Depth, data = Ungrazed, ylab = "log10 Ash Free Dry Mass", xlab =

"Depth (cm)", main = "Ungrazed") # inspect difference in AFDM with Depth for Ungrazed

boxplot(log10(AFDM+0.001) ~ Depth, data = Grazed, ylab = "log10 Ash Free Dry Mass", xlab =

"Depth (cm)", main = "Grazed") # inspect difference in AFDM with Depth for Grazed

par(op)

boxplot(CHLA ~ Graze, data = Algae, ylab = "Chlorophyll A") # inspect Chlorophyll A (CHLA)

difference between grazed and ungrazed

op <- par(mfrow = c(1, 2))

boxplot(CHLA ~ Depth, data = Ungrazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main =

"Ungrazed") # inspect difference in CHLA with Depth for Ungrazed

boxplot(CHLA ~ Depth, data = Grazed, ylab = "Chlorophyll A",xlab = "Depth (cm)", main = "Grazed") #

inspect difference in CHLA with Depth for Grazed

par(op)

# Coplots

coplot(log10(AFDM+0.001) ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplots for AFDM

with depth and Grazing

coplot(CHLA ~ Depth | Graze, panel = panel.smooth, data = Algae) # coplots for CHLA with depth

and Grazing

coplot(log10(AFDM+0.001) ~ log10(Plant.Volume)| Graze, panel = panel.smooth, data = Algae) #

coplots for AFDM with plant volume and Grazing

Part A

coplot(CHLA ~ log10(Plant.Volume) | Graze, panel = panel.smooth, data = Algae) # coplots for

CHLA with plant volume and Grazing - very similar to Depth

# scatterplot matrix

require(car)

scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) + CHLA

, data = Grazed, diag = "boxplot", main = "Grazed") # scatter matrix to view responses

in grazed tiles

scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001) + CHLA

, data = Ungrazed, diag = "boxplot",main = "Ungrazed") # scatter matrix to view

responses in ungrazed tiles

###gg plot to compare the mean and errors of grazed and ungrazed CHLA ####

library(plyr)

AlgaeSum <- ddply(Algae, c("Depth.Num", "Graze"), summarise,

N = length(CHLA),

mean = mean(CHLA),

sd = sd(CHLA),

se = sd / sqrt(N) ) # parameters for CHLA ggplot

AlgaeSum # check the file....

require(ggplot2)

pd <- position_dodge(.3)

ggplot(AlgaeSum, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +

geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour = "Black",

position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5) # plot for

CHLA

Part A

AlgaeSumAFDM <- ddply(Algae, c("Depth.Num", "Graze"), summarise,

N = length(log10(AFDM+0.001)),

mean = mean(log10(AFDM+0.001)),

sd = sd(log10(AFDM+0.001)),

se = sd / sqrt(N) ) # parameters for AFDM ggplot

pd <- position_dodge(.3)

ggplot(AlgaeSumAFDM, aes(x = Depth.Num, y = mean, colour = Graze, group = Graze)) +

geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width=.2, size = 0.25, colour = "Black",

position = pd) + geom_line(position = pd) + geom_point(position = pd, size = 2.5) # ggplot

for log10AFDM

######## testing Assumptions ########

# shapiro Wilks

shapiro.test(Algae$CHLA) # 0.000505

shapiro.test(Algae$AFDM) # 2.422e-13 both p values are < 0.05 suggesting data is not normally

distributed need QQ plot to check

# qqplots based on http://www.r-bloggers.com/normality-tests-don%E2%80%99t-do-what-you-thinkthey-do/

do not accept shapiro test completely...

op <- par(mfrow = c(2, 1))

qqnorm(Algae$CHLA)

qqline(Algae$CHLA) # looks alright will assume CHLA isnormall distributed

qqnorm(Algae$AFDM)

qqline(Algae$AFDM) # not good. AFDM not normally distributed - must use non-parametric Wilcox

test for AFDM

par(op)

# variance

Part A

var(Grazed$CHLA) # 0.003186514

var(Ungrazed$CHLA) # 0.002734262 quite similar

# levene's test

leveneTest(Algae$CHLA~Algae$Graze) # p > 0.05 therefore variance is considered homogeneous -

can use parametric t-test CHLA

leveneTest(Algae$CHLA~Algae$Depth) # p > 0.05 therefore variance is considered homogeneous -

can use ANOVA

#### comparison between grazed and ungrazed tiles #######

###AFDM Wilcoxon test ####

library(coin) # load package needed for U test

wilcox_test(AFDM ~ Graze, data = Algae, distribution = "exact") # p<0.05 (0.0004401) therefore we

can conclude there is a difference between

# grazed and ungrazed AFDM data

### CHLA t-test ######

t.test(CHLA ~ Graze, data = Algae) # p> 0.05 (0.6893) therefore the null hypothesis is accepted. The

mean CHLA value is not significantly different

# between grazed and ungrazed tiles...

###### one way ANOVA ##########

#Depth

CHLAgrazedaov <- aov(CHLA ~ Depth, data = Grazed) # anova of grazed tile CHLA against Depth

summary(CHLAgrazedaov) # P<0.05 (0.0393) significant relationship

TukeyHSD(CHLAgrazedaov) # post hoc test to identify the difference between means that are

greater than the standard error

op <- par(mfrow = c(2, 2))

Part A

plot(CHLAgrazedaov) # plot validation shows wedge shaped residuals

par(op)

leveneTest(Grazed$CHLA~Grazed$Depth) # p > 0.05 so we can consider variance homogeneous.

#PLant Volume

CHLAgrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Grazed) # anova of grazed tile CHLA against

plant volume

summary(CHLAgrazedaov2) # p > 0.05 (0.595) Not significant

#Flow

CHLAgrazedaov3 <- aov(CHLA ~ Flow, data = Grazed) # anova of grazed tile CHLA against Flow

summary(CHLAgrazedaov3) # p > 0.05 (0.607) Not significant

#Ungrazed CHLA with Depth

CHLAUngrazedaov <- aov(CHLA ~ Depth, data = Ungrazed) # anova of ungrazed tile CHLA against

Depth

summary(CHLAUngrazedaov) # P>0.05 (0.18) not significant

# PLant Volume

CHLAUngrazedaov2 <- aov(CHLA ~ Plant.Volume, data = Ungrazed) # anova of ungrazed tile

CHLA against plant volume.

summary(CHLAUngrazedaov2) # p <0.05 (0.0437) significant relationship

op <- par(mfrow = c(2, 2))

plot(CHLAUngrazedaov2) # plot validation shows that plant volume requires

transforming...

par(op)

CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Ungrazed) # run anova again with

transformed plant volume

summary(CHLAUngrazedaov2) # p <0.05 (0.0236) significant relationship.

improved following transformation

Part A

op <- par(mfrow = c(2, 2))

plot(CHLAUngrazedaov2) # validaion plots are considerably better, apart from one

major outlier...

par(op)

Algaeno1 <- Ungrazed[(-1),] # remove influential point

CHLAUngrazedaov2 <- aov(CHLA ~ log10(Plant.Volume), data = Algaeno1) # run anova again

without influential point plant volume

summary(CHLAUngrazedaov2) # P<0.1 (0.0607)

op <- par(mfrow = c(2, 2))

plot(CHLAUngrazedaov2) # validaion plots look good.

par(op)

# Tukey test not applied as control variable is required to be a factor.

# Flow

CHLAUngrazedaov3 <- aov(CHLA ~ Flow, data = Ungrazed) # anova of ungrazed tile CHLA against

Flow

summary(CHLAUngrazedaov3) # p > 0.05 (0.382) not significant

### one way ANOVA summary - grazed CHLA is significantly related to Depth and ungrazed CHLA is

significantly related to plant Volume #####

####### Kruskal Wallace test for AFDM data ###### non parametric test is needed for AFDM

#Grazed AFDM with Depth

AFDM.graze.KW <- kruskal.test(AFDM ~ Depth, data = Grazed) # Kruskal-Wallace test (KW test) of

grazed AFDM against Depth

AFDM.graze.KW # P > 0.05 (0.3614) not significant

#PLant Volume

AFDM.graze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Grazed) # Kruskal-Wallace test

(KW test) of AFDM against Plant volume

AFDM.graze.KW2 # P > 0.05 (0.4579) not significant

Part A

#Flow

AFDM.graze.KW3 <- kruskal.test(AFDM ~ Flow, data = Grazed) # Kruskal-Wallace test (KW test) of

AFDM against Flow

AFDM.graze.KW3 # P > 0.05 (0.3954) not significant

#Ungrazed AFDM with Depth

AFDM.ungraze.KW <- kruskal.test(AFDM ~ Depth, data = Ungrazed) # Kruskal-Wallace test (KW

test) of Ungrazed AFDM against Depth

AFDM.ungraze.KW # P>0.05 (0.284) not significant

# PLant Volume

AFDM.ungraze.KW2 <- kruskal.test(AFDM ~ Plant.Volume, data = Ungrazed) # Kruskal-Wallace

test (KW test) of Ungrazed AFDM against plant volume

AFDM.ungraze.KW2 # P>0.05 (0.4579) not significant

# Flow

AFDM.ungraze.KW3 <- kruskal.test(AFDM ~ Flow, data = Ungrazed) # Kruskal-Wallace test (KW

test) of Ungrazed AFDM against Flow

AFDM.ungraze.KW3 # P>0.05 (0.3954) not significant

## KW test summary - no significant relationships found between grazed or ungrazed AFDM with

control variables ######

########## two-way anova with CHLA data ######

# two-way anova considering grazing and Depth

Algae.aov1 <- aov(CHLA ~ Graze + Depth, data = Algae)

summary(Algae.aov1) # p > 0.05 not significant

# two-way interaction considering grazing*Depth

Algae.aov2 <- aov(CHLA ~ Graze*Depth, data = Algae)

summary(Algae.aov2) # significant relationship (P<0.1). Graze:Depth (P= 0.0584)

op <- par(mfrow = c(2, 2))

Part A

plot(Algae.aov2) # Plot validation - Risiduals vs. fitted is wedge shaped, caused by a

few outliers...

par(op)

# two-way anova considering grazing and plant volume

Algae.aov3 <- aov(CHLA ~ Graze + Plant.Volume, data = Algae)

summary(Algae.aov3) # p > 0.05 not significant

# two-way interaction considering grazing*plant volume

Algae.aov4 <- aov(CHLA ~ Graze*Plant.Volume, data = Algae)

summary(Algae.aov4) # Graze:Plant.Volume 0.0796. Not as significant as with

depth

op <- par(mfrow = c(2, 2))

plot(Algae.aov4) # Plot validation - Risiduals vs. fitted is wedge shaped. Variance is

hetrogeneous - needs transforming.

par(op)

Algae.aov4 <- aov(CHLA ~ Graze*log10(Plant.Volume), data = Algae) # run with transformed

explanatory

summary(Algae.aov4) # interaction is no longer significant

#### linear models... we know which aov tests were significant - run the linear model for Ungrazed

CHLA with plant volume #####

CHLA.ungrazed.lm <- lm(CHLA ~ Log10.Plant, data = Algaeno1) # lm of grazed tile CHLA against

plant volume

summary(CHLA.ungrazed.lm ) # P<0.1 (0.0607) significant relationship

op <- par(mfrow = c(2, 2))

plot(CHLA.ungrazed.lm) # Plot validation same as anova - looks alright one major

outlier

par(op)

# plot model

plot(CHLA ~ Log10.Plant , data = Ungrazed, ylim=c(0,0.3), xlab = "log10 (plant volume)",

ylab = "Chlorophyll A",col = "blue", main = "Ungrazed Tiles") # plot CHLA for ungrazed

against Plant volume.

Part A

abline(CHLA.ungrazed.lm) # fit the linear model line.

text(5.5,0.25, "Chlorophyll A = -0.07508Plant.VOlume+ 0.02797", pos = 2)

text(5,0.23, expression(paste(P == 0.0607)), pos = 2)

text(5,0.21, expression(paste(R^2 == 0.1365)), pos = 2) # add equation P value and R2

x <- seq(min(Ungrazed$Log10.Plant), max(Ungrazed$Log10.Plant), l=1000) # #for each value of x,

calculate the upper and lower 95% confidence

y<-predict(CHLA.ungrazed.lm, data.frame(Log10.Plant=x), interval="c")

matlines(x,y, lty=3, col="black") #plot the upper and lower 95% confidence limits

plot(CHLA.ungrazed.lm$resid ~ log10(Algaeno1$Plant.Volume)) # residuals plot looks good . Model

can be accepted.

### linear model for CHLA grazed with depth ####

CHLA.Grazed.lm2 <- lm(CHLA ~ Depth, data = Grazed) # lm of grazed tile CHLA against plant

volume

summary(CHLA.Grazed.lm2 ) # significant P<0.05

plot(CHLA ~ Depth.Num, data = Grazed, col = "blue", xlab = "Depth (cm)", ylab = "Chlorophyll A",

main = "Grazed Tiles") # create plot

abline(coef(CHLA.Grazed.lm2)[1], coef(CHLA.Grazed.lm2)[2-3-4-5-6-7]) # plot linear model

text(35,0.13, "Chlorophyll A = 0.002667Depth + 0.445332", pos = 2)

text(35,0.12, expression(paste(P == 0.03929)), pos = 2)

text(35,0.11, expression(paste(R^2 == 0.383)), pos = 2) # add equation, R^2 and P value

#calculate slope#

0.036333+0.125333+0.086000+0.098000+0.023333+0.076333

op <- par(mfrow = c(2, 2))

plot(CHLA.Grazed.lm2) # Plot validation same as anova - looks alright. 3 outliers effect

the residuals

par(op)

Part A

plot(CHLA.Grazed.lm2$resid ~ Grazed$Depth.Num) # looks slightly wedge shaped, outliers mainly

causing this. model is accepted.

plot(CHLA.Grazed.lm2$resid ~ Grazed$Depth) # less wedge shaped but slightly humped

####### ANCOVA ###########

require(lattice)

# Here's the plot

xyplot(CHLA ~ Depth |Graze # create a plot of CHLA change with depth

and grazed/ungrazed.Sarkar, D. (2008) Lattice: Multivariate Data Visualisation. Springer.

,xlab = "Depth", ylab = "CHLA",

data = Algae)

plot(CHLA ~ Depth.Num, data = Algae, pch = 19,cex = 1.5,

col = c("Black", "Red")[Algae$Graze],

xlab = list("Depth", cex = 1.2),

ylab = list("Chlorophyll A", cex = 1.2))

# Now add a legend

legend(1, 0.25, legend = c("Grazed", "Ungrazed"), # creates similar to above but on the

same graph

col = c("black", "red"), pch = 19)

algae.lm <- lm(CHLA ~ Depth* Graze, data = Algae) # linear ancova for CHLA with graze and Depth

op <- par(mfrow = c(2,2))

plot(algae.lm)

par(op) # residuals are a bit wedge shaped as earlier with 2-way anova but mainly caused by

outliers so I'll continue

hist(algae.lm$resid) # residuals histogram

plot(algae.lm$resid) # residuals plot

plot(algae.lm$resid ~ Algae$Depth) # residuals here look very good!

Part A

anova(algae.lm)

summary(algae.lm) # looks good model has a significant p value (0.0386)

coplot(CHLA ~ Depth | Graze, panel = panel.smooth, data = Algae) # co plot as earlier to view

interaction

interaction.plot(Algae$Depth.Num, Algae$Graze, Algae$CHLA, col=c(2,3)) # interaction plot - similar

to gg plot earlier

###### create final final ancova plot #######

coef(algae.lm)

algae.lm$coeff[1] # Grazed intercept

algae.lm$coeff[2-3-4-5-6-7] # Grazedlope

algae.lm$coeff[1+8] # Ungrazed intercept

algae.lm$coeff[9-10-11-12-13-14] # Ungrazed slope # These are the coefficients needed to

form the lines with Depth as a factor

# plot

plot(CHLA ~ Depth.Num, data = Algae, pch = 19,cex = 1.5,

col = c("Black", "Red")[Algae$Graze],

xlab = list("Depth", cex = 1.2),

ylab = list("Chlorophyll A", cex = 1.2))

# Now add a legend

legend(1, 0.25, legend = c("Grazed", "Ungrazed"),

col = c("black", "red"), pch = 19)

# Now add the coefficients from above to the abline command to get both lines

abline(coef(algae.lm)[1], coef(algae.lm)[2-3-4-5-6-7]) # This adds the Grazed regression line

# Now add the ungrazed regression line

abline(coef(algae.lm)[1] + coef(algae.lm)[8], # Ungrazed intercept

coef(algae.lm)[9-10-11-12-13-14], # Ungrazed slope

col = "red") # This adds the ungrazed line

text(35,0.18, "Chlorophyll A*Graze = -0.0023 + 0.3629 * Depth", pos = 2)

text(35,0.16, expression(paste(P == 0.03867)), pos = 2)

text(35,0.14, expression(paste(R^2 == 0.2768)), pos = 2)

Part A

#calculate Grazed slope#

0.036333+0.125333+0.086000+0.098000+0.023333+0.076333 # 0.445332

#calculate Ungrazed slope

0.019667-0.069333-0.058333-0.051000+0.095333-0.018667 # - 0.082333

# calculate intercept

0.0027-0.005 # -0.0023

# calculate intercept

0.445332- 0.082333 # 0.362999 therefore CHLOROPHYLLAgrazetype = -0.0023+0.3629*Depth

##### multiple linear regression ######

scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + CHLA

, data = Algae, diag = "boxplot")

#Check for multi colinearity by correlation

cor(Algae[,2:3]) # check for correlation between flow and plant volume.

?cor # big issues

# and again with vifs. No values below 3.

vif(lm(CHLA ~ Depth + log10(Flow) + log10(Plant.Volume), data = Algae ))

# colinearity prevents conducting multiple linear regression.

##### non linear models ###########

# following plotting of the linear model of CHLA against plant volume, it looked as if a non

linearpattern may be present, here is the model...

algae.lm1 <- lm(CHLA ~ I(Log10.Plant^2), data = Algaeno1)

summary(algae.lm1) # significant p value (0.076999). However not as

significant as the linear model

op <- par(mfrow = c(2,2))

plot(algae.lm1)

par(op) # apart from outliers, validation plots look good. Model can be accepted

Part A

plot(CHLA ~ Log10.Plant , data = Ungrazed, ylim=c(0,0.3), xlab = "log10 (plant volume)",

ylab = "Chlorophyll A",col = "blue") # plot CHLA for ungrazed against Plant volume. Not able

to apply best fit line due to log10 x axis.

x <- seq(min(Algae$Log10.Plant), max(Algae$Log10.Plant), l=1000) # 1000 steps....

y <- predict(algae.lm1, data.frame(Log10.Plant = x), interval = "c") # plots matlines of ploy2

relationship on graph

matlines(x, y, lty = 1, col = 1)

text(5.5,0.25, "Chlorophyll A = --0.007340*log10(Plant.VOlume)+ 0.002791", pos = 2)

text(5,0.23, expression(paste(P == 0.02509)), pos = 2)

text(5,0.21, expression(paste(R^2 == 0.1972)), pos = 2) # plot equation, P value and R2.

# the model is reasonable, however the linear model provides a higher P value and R^2 value. Accept

the linear model.

#### Generalised addative models for AFDM data ####### GAMS have been chosen over

generalised linear models due to non-linear patterns

depth.log10flow <- lm( log10(Flow) ~ Depth , data = Algae)

summary(depth.log10flow) # linear model of depth and log10 flow to show they

are highly related. log10 flow is used

# for GAMS because Depth is a factor and has fewer unique

covariate combinations than specified

# maximum degrees of freedom. P = 2.2e-16 and R^2 = 0.978

scatterplotMatrix(~ Depth + log10(Flow) + log10(Plant.Volume) + log10(AFDM+0.001)

, data = Algae, diag = "boxplot") # inspect variables and transform

AFDM

require(mgcv) # for GAMS

# use Log10.Flow because plotting GAMS doesn't allow transformations during the plot creation

Part A

plot(log10(AFDM+0.001) ~ Log10.Flow, data= Grazed) # plot to observe possible patterns of

grazed AFDM with flow. possible non-linear pattern

AFDM.gam1 <- gam(log10(AFDM+0.001)~ s(Log10.Flow), data = Grazed) # run GAM of AFDM

against flow for grazed tiles

plot(AFDM.gam1) # looks as if there is a decrease in AFDM with flow.

summary(AFDM.gam1) # significant smoother term (p = 0.00172)

gam.check(AFDM.gam1) # verification plots are mostly good with the

exception of a few major outliers

# plot GAM1

plot(log10(AFDM+0.001) ~ Log10.Flow, data = Grazed,

pch = 16,

xlab = "Log10 Flow",

ylab = "Ash Free Dry Mass", main = "GAM of Grazed AFDM ")

# predict across the data

x <- seq(min(Grazed$Log10.Flow), max(Grazed$Log10.Flow), l=100) # 100 steps....

y <- predict(AFDM.gam1, data.frame(Log10.Flow = x), se = TRUE) # using standard errors se =

TRUE

# add lines

lines(x, y$fit) # plots the fitted values

lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit

lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit

# plot indicates a decrease in AFDM with flow

### Ungrazed AFDM

plot(log10(AFDM+0.001) ~ Log10.Flow, data = Ungrazed) # doesn't look as if there is a significant

response of AFDM to flow

AFDM.gam2 <- gam(log10(AFDM+0.001) ~ s(Log10.Flow), data = Ungrazed) # run GAM of AFDM

against flow for ungrazed tiles

plot(AFDM.gam2) # plot GAM - very clearly no pattern...

summary(AFDM.gam2) # # smoother term not significant P>0.1

gam.check(AFDM.gam2) # humped response vs. fitted is far from linear.

further plotting not needed.

Part A

###### All models so far produced for CHLA indicate a high spread of results with low R^2 values.

GAMS are used to investigate

###### whether non-linear trends have been overlooked during the previous analysis.

### Grazed CHLA

plot(CHLA ~ Log10.Flow, data = Grazed) # check for non linear patterns using a GAM

AFDM.gam3 <- gam(CHLA ~ s(Log10.Flow), data = Grazed) # run GAM of AFDM against flow for

grazed tiles

plot(AFDM.gam3) # indicates a non linear pattern

summary(AFDM.gam3) # smoother term only significant to p<0.1 (0.0964)

gam.check(AFDM.gam3) # a bit wedge shaped

# plot Gam3

plot(CHLA ~ Log10.Flow, data = Grazed,

pch = 16,

xlab = "Log10 Flow",

ylab = "Ash Free Dry Mass")

# predict across the data

x <- seq(min(Grazed$Log10.Flow), max(Grazed$Log10.Flow), l=100) # 100 steps....

y <- predict(AFDM.gam3, data.frame(Log10.Flow = x), se = TRUE) # using standard errors se =

TRUE

# add lines

lines(x, y$fit) # plots the fitted values

lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit

lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit

## Ungrazed CHLA

plot(CHLA ~ Log10.Flow, data = Ungrazed) # patter looks like there may be non linear pattern -

check for non linear patterns using a GAM

AFDM.gam4 <- gam(CHLA ~ s(Log10.Flow), data = Ungrazed) # run GAM of AFDM against flow for

grazed tiles

plot(AFDM.gam4) # very clear non linear pattern with narrow confidence limits

summary(AFDM.gam4) # highly significant smoother term (p = 1.21e-05)

Part A

gam.check(AFDM.gam4) # residuals are slightly humped but generally look good.

# plot GAM 4

plot(CHLA ~ Log10.Flow, data = Ungrazed,

pch = 16,

xlab = "Log10 Flow",

ylab = "Ash Free Dry Mass")

# predict across the data

x <- seq(min(Ungrazed$Log10.Flow), max(Ungrazed$Log10.Flow), l=100) # 100 steps....

y <- predict(AFDM.gam4, data.frame(Log10.Flow = x), se = TRUE) # using standard errors se =

TRUE

# add lines

lines(x, y$fit) # plots the fitted values

lines(x, y$fit + 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE above the fit

lines(x, y$fit - 2 * y$se.fit, lty = 2) # plots a dashed line for 2 * SE below the fit

### note on GAM 4 - major outlier caused significant pattern change in GAM

Part A

Appendix 2 - Dataset:

Depth Flow Plant.Volume AFDM CHLA Graze

25 46.56 1201786 0.0234 0.246 Ungrazed

10 20.34 72028.57 0.0084 0.058 Ungrazed

7 8.76 31196.43 0 0.017 Ungrazed

7 9.96 41871.43 0.0078 0.057 Ungrazed

2 2.41 15121.43 0.0108 0.076 Ungrazed

35 209.1 1171936 0 0.037 Ungrazed

2 3.21 20967.86 0.3114 0.05 Ungrazed

0 0.075 2650 0.006 0.012 Ungrazed

10 23.4 50732.14 0.009 0.067 Ungrazed

35 130.5 1424357 0.0294 0.045 Ungrazed

25 150 1090200 0.0114 0.054 Ungrazed

25 165 1276714 0.0678 0.079 Ungrazed

5 2.1 13485.71 0.027 0.116 Ungrazed

35 223.5 1319379 0.024 0.114 Ungrazed

0 0.075 2235.714 0.0066 0.007 Ungrazed

2 3.42 25385.71 0.0276 0.065 Ungrazed

10 9.9 55778.57 0.0108 0.039 Ungrazed

5 5.22 31492.86 0.042 0.03 Ungrazed

7 8.4 34646.43 0.0384 0.032 Ungrazed

5 5.02 32964.29 0.0216 0.045 Ungrazed

0 0.1 4878.571 0.0132 0.004 Ungrazed

25 46.56 1201786 0.0282 0.056 Grazed

10 20.34 72028.57 0.021 0.075 Grazed

7 8.76 31196.43 0.021 0.005 Grazed

7 9.96 41871.43 0.0132 0.108 Grazed

2 2.41 15121.43 0.0324 0.034 Grazed

35 209.1 1171936 0.0204 0.077 Grazed

2 3.21 20967.86 0.033 0.063 Grazed

0 0.075 2650 6.7008 0.003 Grazed

10 23.4 50732.14 0.0396 0.065 Grazed

35 130.5 1424357 0.0222 0.097 Grazed

25 150 1090200 0.0294 0 Grazed

25 165 1276714 0.0498 0.022 Grazed

5 2.1 13485.71 0.0432 0.173 Grazed

35 223.5 1319379 0.0546 0.063 Grazed

0 0.075 2235.714 3.0228 0.004 Grazed

2 3.42 25385.71 0.1176 0.02 Grazed

10 9.9 55778.57 0.0456 0.162 Grazed

5 5.22 31492.86 0.048 0.151 Grazed

7 8.4 34646.43 0.0804 0.153 Grazed

5 5.02 32964.29 0.4884 0.06 Grazed

0 0.1 4878.571 0.0384 0.001 Grazed

Part B

Modelling the stage of a river in a glacial catchment

(All relevant images, Figures 1-5, are attached at the end of the document)

# Intoduction: This model attempts to predict the stage of a river in a hypothetical glacial catchment.

The inputs of two different glaciers

# of different sizes are considered, along with the input of snow melt and rain. The model consists of

5 water stores: i. Glacier 1,

# ii. Glacier 2, iii. Snow pack, iv. catchment storage and v. River storage. The model predicts the daily

average stage of the river at a

# particular location for 364 days. The data set used is not based on a real location and assumes a

start date of January 2nd and an end date

# of December 31st.

# clear memory

rm(list=ls())

# set working directory

Climate <- read.csv(file.choose()) # load file

Climate$temp0 <- ifelse(Climate$Temperature < 0, 0, Climate$Temperature) # changes negative

temperatures to 0^C. Ice melt is assumed takes place

# at 1^C and above (Pelliciotti, et al. 2005). Negative temperatures will affect the equations for snow

and glacial melt, therefore all negative temperatures are set

# to zero, and as a result no melt takes place.

# define the model parameters and the inital conditions

t <- 0 # Time at the start of the simulations

k1 <- 7 # Model parameter for glacier 1 melt - describes the contribution of the glacier to river

discharge. A function of glacier size and melt rate

Part B

k2 <- 2 # Model parameter for small glacier melt - describes the contribution of glacier 2 to river

discharge

k3 <- 0.9 # model parameter for river flow - This constant controls the rate of flow from the river.

The constant is a reflection of the river's

# cross sectional area. A smaller cross section (due to a constriction) will lead to reduced

discharge (Q=Av). When K3 = 1, it is assumed that

# the rate of inflow is equal to the rate of outflow.

k4 <- 20 # model parameter for rainfall and snow melt contribution to river discharge. Reflects the

size of the catchment

k5 <- 5 # model parameter for snow melt. Describes the rate of snow melt

k7 <- 0.1 # decribes the storage time of water within the catchment (the flashiness of the

catchment). Where a value of 1 means that all water in the catchment

# reaches the river immediately and 0 means that no water reaches the river.

S1 <- 50000 # Inital storage of Glacier 1 (large value that will not be significantly diminished over

a year)

S2 <- 30000 # Inital storage of Glacier 2

S3 <- 10 # Inital river stage below the confluence of the two glacial streams

S4 <- 200 # initial snow cover

S5 <- 5 # intial rainwater storage in catchment

ntimesteps=365 # model runs for a year

timestep = 1 #timestep is 1 day

output <- mat.or.vec(ntimesteps,10) # creation of Output vector. 10 columns are required due for

the input of values following the model run.

# Run model

for(n in 1:ntimesteps){ # set loop to run model for each time step

dS1 <- k1*Climate[n,5]*timestep # Calculate the flow out of the Glacier 1 (K* Temp * t). Clacial

discharge is assumed to be linearly related to temperature.

# (Pelliciotti, et al. 2005)

dS2 <- k2*Climate[n,5]*timestep # Calculate the flow out of the Glacier 2 (K* Temp * t)

dS3 <- k3*S3*timestep # Calculate the flow through the river (K* water level * t)

Part B

dS4 <- (k5*Climate[n,5]*timestep) # calculate flow from snow melt (K* Temp * t). snowmelt rate is

assumed to have a linear relationship with temperature (Hock, 2005)

dS5 <- k7*S5*timestep # calculate the runoff from catchment.

if (S4<dS4){ #if the snow melt discharge is greater than the depth of the snow pack then no melt

takes place. Ensures that snow melt does not

dS4 = S4 # occur when there is no snow present.

}

S1 <- S1-dS1 # Volume of ice in Glacier 1

S2 <- S2-dS2 # Volume of ice in Glacier 2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3 # The stage of the river below the confluence of the glacial

streams. It includes the contribution

# of discharge from the glaciers, snow melt, rain and the loss of water due to river discharge.

S4 <- (S4+Climate[n,4])-dS4 # describes the average depth of the snow pack within the catchment.

S5 <- (S5+Climate[n,3]+dS4)- dS5 # decribes the average storage of rain water and snow melt in the

catchment

t <- t+timestep # New time step

output[n,1] <- t # Data is output into the matrix that was created earlier.

output[n,2] <- S1

output[n,3] <- S2

output[n,4] <- S3

Part B

output[n,5] <- S4

}

output[,6] <- append(output[,2], 0, 0)[-nrow(output)]

output[,7] <- as.matrix(output[,6]-output[,2]) # creates daily average discharge from Glacier 1.

storage - the storage from the previous day

output[,8] <- append(output[,3], 0, 0)[-nrow(output)]

output[,9] <- as.matrix(output[,8]-output[,3]) # creates daily average discharge from Glacier 2.

storage - the storage from the previous day

output2 <- output[-1,] # removes january 1st value due to inability calculate glacier discharge

without the previous day's storage.

#creates new output matrix.

climate2 <- Climate[-1,] # as a result January 1st values must also be removed from Climate data.

#-----------------

# PLot the Model

#-----------------

x <- output2[,1] # define x, y and secondary y axis

y1 <- output2[,4]

y2 <- climate2[,3]

par(mar=c(5,4,4,5)+.1) # allows for the labelling of the secondary axis

Part B

plot(x,y1,type="l",col="blue", xlab="Time (days) ",ylab= "River stage (cm), Glacial Discharge (

m^3/s^-1) and Snowpack depth (cm)",

cex.lab = 0.8, ylim= c(-87,400), main = "Glacial Catchment Model") # plots stage / time

points(x = (output2[,1]), y = output2[,7],xlab = "", ylab="", col="chartreuse",type="l") # adds

glacier 1 discharge

points(x = (output2[,1]), y = output2[,9],xlab = "", ylab="", col="chartreuse4",type="l") # adds

glacier 2 dscharge

points(x = (output2[,1]), y = output2[,5],xlab = "", ylab="", col="gold",type="l") # adds depth of

snow pack

par(new=TRUE)

g_range <- range(0, climate2[,2:3]) # define range of secondary axis

plot(x, y2,type="l",col="black",xaxt="n",yaxt="n",xlab="",ylab="", ylim=g_range ) # plot rainfall

data

axis(4)

points(x = (output2[,1]), y = climate2[,2],xlab = "", ylab="", col="red",type="l") # plot temperature

data

mtext("Rainfall (mm) and Temperature (c)",side=4,line=3)

legend("topleft",cex=0.8,col=c("blue","chartreuse","chartreuse4","gold","black","red"),lty=1, #

add legend

legend=c("River Stage","Glacier1 Q", "Glacier2 Q", "Snowpack depth","rainfall","temperature"))

# See Figure 1

#----------------------------------------------------------------------------------------------

# Scenario testing

#----------------------------------------------------------------------------------------------

# In order to test the sensitivity of the model, three different values for each parameter will be

tested in order

# to determine the significance of different factors in influencing the outcome of the model.

Part B

#----------------------------------------------------

#-- GLACIAL MELT volume --

# ---------------------------------------------------

# Parameters K1 and K2 control the amount of discharge from glaciers 1 and 2. They therefore

reflect the size of the glacial toe

# and the sensitivity to melting. This could be affected by topography, subglacial characteristics and

englacial dynamics.

# Melt rates can be assumed to be linearly related to temperature, however, the rates between

glaciers may vary (Pelliciotti, et al. 2005).

#1) K1 = 2, K2 = 1

# define the model parameters and the inital conditions

t <- 0

k1 <- 2

k2 <- 1

k3 <- 0.9

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output3 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

Part B

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output3[n,1] <- t # Now lets output the data from this time step

output3[n,2] <- S1

output3[n,3] <- S2

output3[n,4] <- S3

output3[n,5] <- S4

}

#2) K1 = 4, K2 = 2.5

# define the model parameters and the inital conditions

t <- 0

Part B

k1 <- 4

k2 <- 2.5

k3 <- 0.9

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output4 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

Part B

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output4[n,1] <- t # Now lets output the data from this time step

output4[n,2] <- S1

output4[n,3] <- S2

output4[n,4] <- S3

output4[n,5] <- S4

}

#3) K1 = 7, K2 = 4

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 4

k3 <- 0.9

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

Part B

output5 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output5[n,1] <- t # Now lets output the data from this time step

output5[n,2] <- S1

output5[n,3] <- S2

output5[n,4] <- S3

output5[n,5] <- S4

}

plot(output3[,1],output3[,4],type="l",col="deepskyblue", xlab="Time (days) ",ylab="River Stage (cm)

", ylim= c(0,420),

Part B

main = "Scenario testing - Glacial melt rate parameter (K1 and K2)")

# plots first parameter scenario (K1 = 2, k2 = 1)

points(x = (output4[,1]), y = output4[,4],xlab = "", ylab="", col="blueviolet",type="l")

# adds second parameter scenario (K1 = 4, k2 = 2.5)

points(x = (output5[,1]), y = output5[,4],xlab = "", ylab="", col="deeppink",type="l")

# adds second parameter scenario (K1 = 6, k2 = 3)

legend("topleft",cex=0.8,col=c("deepskyblue","blueviolet","deeppink"),lty=1, # add legend

legend=c("K1=2, K2=1","K1=4, K2=2.5", "K1=6, K2=3"))

# Summary:

# increases in volume are linearly related to each other. Increases in glacial melt lead to increased

river stage. If necessary, a

# lag function could be applied to the model depending on the time taken for glacial melt to reach

the river system. This would be

# dependent on subglacial lakes or the positioning of moraines. However, with a time step of one

day, it is considered that this lag

# would have a limited impact on the model.

# See Figure 2

#----------------------------------------------------------------------------

# River Flow Parameter

#----------------------------------------------------------------------------

# The river flow parameter (K3) determines the rate of flow through the river. THis parameter will be

a function of cross sectional

# area and velocity (slope). Where K3 = 1 discharge of inputs is considered to be equal to the river

disharge.

#1) K3=0.3

# define the model parameters and the inital conditions

Part B

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.3

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output6 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

Part B

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output6[n,1] <- t # Now lets output the data from this time step

output6[n,2] <- S1

output6[n,3] <- S2

output6[n,4] <- S3

output6[n,5] <- S4

}

#2) K3=0.6

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.6

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

Part B

nomelt = 1

output7 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output7[n,1] <- t # Now lets output the data from this time step

output7[n,2] <- S1

output7[n,3] <- S2

output7[n,4] <- S3

output7[n,5] <- S4

}

Part B

#3) K3= 1

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 1

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output8 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

Part B

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output8[n,1] <- t # Now lets output the data from this time step

output8[n,2] <- S1

output8[n,3] <- S2

output8[n,4] <- S3

output8[n,5] <- S4

}

plot(output6[,1],output6[,4],type="l",col="darkturquoise", xlab="Time (days) ",ylab="Dishcarge (m3

s-1) ", ylim= c(0,1000),

main = "Scenario testing - River flow parameter (River cross section) (K3) ")

# plots first parameter scenario (K3=0.3)

points(x = (output7[,1]), y = output7[,4],xlab = "", ylab="", col="dodgerblue4",type="l")

# adds second parameter scenario (K3=0.6)

points(x = (output8[,1]), y = output8[,4],xlab = "", ylab="", col="aquamarine3",type="l")

# adds second parameter scenario (K3=1)

legend("topleft",cex=0.8,col=c("darkturquoise","dodgerblue4","aquamarine3"),lty=1, # add

legend

legend=c("K3=0.3","K3=0.6", "K3=1"))

Part B

# Summary:

# Changes in K3 lead to a proportional change in the river stage. The function allows the model to be

adjusted for different types of

# river or lake stores. small changes in K3 can have significant impacts on the stage of the river.

# see Figure 3

#----------------------------------------------------------------------------

# Rain and snow melt contribution Parameter (Catchment size)

#----------------------------------------------------------------------------

# This parameter (K4) scales the average rainfall and snowmelt across the catchment to a value

which reflects the runoff contribution

# to river discharge. An increase in K4 is expected to increase the magnitude of the response to

catchment runoff.

#1) K4=1

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 1

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

Part B

nomelt = 1

output9 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output9[n,1] <- t # Now lets output the data from this time step

output9[n,2] <- S1

output9[n,3] <- S2

output9[n,4] <- S3

output9[n,5] <- S4

}

Part B

#2) K4=20

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output10 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

Part B

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output10[n,1] <- t # Now lets output the data from this time step

output10[n,2] <- S1

output10[n,3] <- S2

output10[n,4] <- S3

output10[n,5] <- S4

}

#3) K4=50

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 50

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

Part B

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output11 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output11[n,1] <- t # Now lets output the data from this time step

output11[n,2] <- S1

output11[n,3] <- S2

output11[n,4] <- S3

output11[n,5] <- S4

Part B

}

plot(output9[,1],output9[,4],type="l",col="forestgreen", xlab="Time (days) ",ylab="Stage (cm) ",

ylim= c(0,700),

main = "Scenario testing - catchment size parameter (K4) ")

# plots first parameter scenario (K4=1)

points(x = (output10[,1]), y = output10[,4],xlab = "", ylab="", col="chartreuse2",type="l")

# adds second parameter scenario (K4=20)

points(x = (output11[,1]), y = output11[,4],xlab = "", ylab="", col="cyan4",type="l")

# adds second parameter scenario (K4=50)

legend("topleft",cex=0.8,col=c("forestgreen","chartreuse2","cyan4"),lty=1, # add legend

legend=c("K4=1","K4=20", "K4=50"))

#Summary:

# The plot indicates that increasing K4 increases the stage of the river, particularly during those

periods where river flow is dominated

# by catchment runoff, rather than glacial melt.

# See Figure 4

#------------------------------------------------------------------------------------------------------

# Snow melt rate parameter

#------------------------------------------------------------------------------------------------------

# This parameter (K5) dictates the rate that snow pack decreases in depth. The melting of snow is

linearly related to

#temperature (Hock, 2005), this is reflected in the snow melt equation included in the model. The

rate of snow melt may also

# be influenced by cloud cover and topography. However, this has been simplified for the purposes

of this model.

Part B

#1) K5=1

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 20

k5 <- 1

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output12 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

Part B

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output12[n,1] <- t # Now lets output the data from this time step

output12[n,2] <- S1

output12[n,3] <- S2

output12[n,4] <- S3

output12[n,5] <- S4

}

#2) K5=5

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 20

k5 <- 5

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

Part B

ntimesteps=365

timestep = 1

nomelt = 1

output13 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output13[n,1] <- t # Now lets output the data from this time step

output13[n,2] <- S1

output13[n,3] <- S2

output13[n,4] <- S3

output13[n,5] <- S4

Part B

}

#3) K5=10

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 20

k5 <- 10

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output14 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

Part B

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output14[n,1] <- t # Now lets output the data from this time step

output14[n,2] <- S1

output14[n,3] <- S2

output14[n,4] <- S3

output14[n,5] <- S4

}

plot(output12[,1],output12[,4],type="l",col="darkslategray", xlab="Time (days) ",ylab="Stage (cm) ",

ylim= c(0,500),

main = "Scenario testing - Snow Melt Rate (K5) ")

# plots first parameter scenario (K5=1)

points(x = (output13[,1]), y = output13[,4],xlab = "", ylab="", col="firebrick1",type="l")

# adds second parameter scenario (K5=5)

points(x = (output14[,1]), y = output14[,4],xlab = "", ylab="", col="deepskyblue3",type="l")

# adds second parameter scenario (K5=10)

legend("topleft",cex=0.8,col=c("darkslategray","firebrick1","deepskyblue3"),lty=1, # add legend

Part B

legend=c("K5=1","K5=5", "K5=10"))

#Summary:

#The rate of snow melt has a significant impact on the timing of periods of high flow. The Where the

lag is highest (K5 = 1)

# snowmelt coincides with glacial melt and as a result yields the highest stage. Where K5=10, snow

melt occurs very early in the year and

# the snow pack is diminished well before significant glacial melt takes place.

# See Figure 5

#------------------------------------------------------------------------------------------------------

# run off rate parameter for rain and snow melt

#------------------------------------------------------------------------------------------------------

# This parameter (K7) describes the flahiness of the catchment. This factor will be dependent on the

geology, topogrpahy and

# vegetation cover of the catchment. Changes in K7 will affect the response rate of river stage to

runoff from both snow melt and rain.

#1) K7=1

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 20

k5 <- 5

k7 <- 0.001

S1 <- 50000

S2 <- 30000

S3 <- 10

Part B

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output15 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output15[n,1] <- t # Now lets output the data from this time step

output15[n,2] <- S1

output15[n,3] <- S2

output15[n,4] <- S3

Part B

output15[n,5] <- S4

}

#2) K5=5

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 20

k5 <- 5

k7 <- 0.01

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output16 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

Part B

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output16[n,1] <- t # Now lets output the data from this time step

output16[n,2] <- S1

output16[n,3] <- S2

output16[n,4] <- S3

output16[n,5] <- S4

}

#3) K5=10

# define the model parameters and the inital conditions

t <- 0

k1 <- 7

k2 <- 2

k3 <- 0.9

k4 <- 20

k5 <- 5

Part B

k7 <- 0.1

S1 <- 50000

S2 <- 30000

S3 <- 10

S4 <- 200

S5 <- 5

ntimesteps=365

timestep = 1

nomelt = 1

output17 <- mat.or.vec(ntimesteps,10)

for(n in 1:ntimesteps){

dS1 <- k1*Climate[n,5]*timestep

dS2 <- k2*Climate[n,5]*timestep

dS3 <- k3*S3*timestep

dS4 <- (k5*Climate[n,5]*timestep)

dS5 <- k7*S5*timestep

if (S4<dS4){

dS4 = S4

}

S1 <- S1-dS1

S2 <- S2-dS2

S3 <- (S3+ (dS5*k4) +dS1+dS2)-dS3

S4 <- (S4+Climate[n,4])-dS4

S5 <- (S5+Climate[n,3]+dS4)- dS5

t <- t+timestep

output17[n,1] <- t # Now lets output the data from this time step

Part B

output17[n,2] <- S1

output17[n,3] <- S2

output17[n,4] <- S3

output17[n,5] <- S4

}

plot(output15[,1],output15[,4],type="l",col="darkslategray", xlab="Time (days) ",ylab="Stage (cm) ",

ylim= c(0,400),

main = "Scenario testing - Run off rate (K7) ")

grid (NULL,NULL, lty = 7, col = "cornsilk2")

# plots first parameter scenario (K5=1)

points(x = (output16[,1]), y = output16[,4],xlab = "", ylab="", col="firebrick1",type="l")

# adds second parameter scenario (K5=5)

points(x = (output17[,1]), y = output17[,4],xlab = "", ylab="", col="deepskyblue3",type="l")

# adds second parameter scenario (K5=10)

legend("topleft",cex=0.8,col=c("darkslategray","firebrick1","deepskyblue3"),lty=1, # add legend

legend=c("K7=0.001","K7=0.01", "K7=0.1"))

#Summary:

# High values of K7 (0.1) result in more rapid responses to snow melt and rain events. Lower K7

values lead to more gradual release of

# water stored in the catchment and the drop in river stage following a runoff event is considerably

slower.

# See Figure 6

Part B

#-----------------------------------------------------------------------------------------------------

# Conclusion

#-----------------------------------------------------------------------------------------------------

# Despite significant simpliifaction of many catchment processes, the model produces a realistic

prediction of the response of river

# stage to contributing water sources in a glacial catchment.

# References:

# R Core Team (2013). R: A language and environment for statistical computing. R Foundation for

Statistical Computing, Vienna, Austria.

# URL http://www.R-project.org/.

#Hock, R. (2005). Glacier melt: a review of processes and their modelling. Progress in physical

geography, 29(3), 362-391.

# Pellicciotti, F., Brock, B., Strasser, U., Burlando, P., Funk, M., & Corripio, J. (2005). An enhanced

temperature-index glacier melt model

# including the shortwave radiation balance: Development and testing for Haut Glacier d'Arolla,

Switzerland. Journal of Glaciology, 51(175),

# 573-587.

Part B

Figure 1: Complete model

Part B

Figure 2: Scenario test investigating the impact of changing glacial melt parameters

Part B

Figure 3: Scenario test investigating the impact of changing the river flow parameter

Part B

Figure 4: Scenario test investigating the impact of changing the catchment size parameter

Part B

Figure 5: Scenario test investigating the impact of changing the catchment snow melt rate parameter

Part B

Figure 6: Scenario test investigating the impact of changing the catchment runoff rate parameter

【返回顶部】【打印本稿】【关闭本页】

【上一篇】：代写TCP/IP Stack、代做Link Layer Ethernet、代写java/c/c++，Python编程...

【下一篇】：代写TCP/IP Stack、代做Link Layer Ethernet、代写java/c/c++，Python编程...

联系方式

最新辅导

热门辅导

您当前位置：首页 >> Python编程Python编程

代做Data Collection、代写R编程设计、代做Java/Python语言

日期：2019-05-20 11:21

相关文章