Pyrosequencing Data

Color Manipulation Experiment

This is the full set of pyrosequencing data for 371 samples submitted to EpigenDX. Note that two samples (18044 and 18045) were confused in the raw data. Since I couldn’t tell which was which and only had data for one I have removed that sample. I also removed samples entirely when greater than 1/2 of the assay was filled with 0s. I believe that those sample have problems with the assay and it isn’t clear if the non-zero numbers are reliably. Similarly, some sample in the dataset do include 0s that may reflect missing data rather than true 0 values. I remove those in the code below, but it is likely worth checking both with and without 0s in some cases (e.g., some particular CpGs have very low average methylation values and 0s might be real at those sites).

Read and organize the data

First I’m just reading in and organizing the data. For now, I only want to work with the 2017 color manipulation experiment. This file also includes the 2018 color-stressor experiment and the 2018 brain collection experiment, so I’ve removed those here.

library(plyr)
ds <- read.delim("Input_Samples.txt")
ds <- subset(ds, ds$Year == 2017)
ds[ds == 0] <- NA

Generate correlation matrix heatmaps

The next thing I want to do is to make plots that illustrate the degree of correlation between sequential CPG sites for each of the four genes. In this plots I’m ignoring the fact that some individuals are represented by multiple samples, but I don’t think that is really a problem for making this point.

library(reshape2)
library(ggplot2)

## Write a little function to extract upper triangle from correlation matrix
  get_upper_tri <- function(cormat){
    cormat[lower.tri(cormat)] <- NA
    return(cormat)
  }

CRHmat <- as.matrix(ds[, 7:17])
CRHRmat <- as.matrix(ds[, 30:48])
GRmat <- as.matrix(ds[, 49:62])
FKmat <- as.matrix(ds[, 19:29])
mats<-list(CRHmat, CRHRmat, GRmat, FKmat)
labs<-c("CRH","CRHR1","GR","FKBP5")

for(i in 1:4){
    USE <- mats[[i]]

    corm <- round(cor(USE, use = "pairwise.complete.obs"), 2)
    cormUpper <- get_upper_tri(corm)
    
    ## Melt the correlation matrix
      melted_cormat <- melt(cormUpper, na.rm = TRUE)
      colnames(melted_cormat) <- c("CpG_Num1", "CpG_Num2", "value")
    
    ## Plot the heatmap
      gghm <- ggplot(data = melted_cormat, aes(CpG_Num2, CpG_Num1, fill = value)) +
        geom_tile(color = "white") +
        scale_fill_gradient2(low = "red", high = "blue", mid = "white",
            midpoint = 0, limit = c(-1, 1), space = "Lab", name = "Pearson\nCorrelation") +
        theme_minimal() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1,
              size = 9, hjust = 1)) +
        coord_fixed()
      
      gghm2 <- gghm + 
        geom_text(aes(CpG_Num2, CpG_Num1, label = value), color = "white", size = 1.5) +
        theme(
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          legend.justification = c(1, 0),
          legend.position = c(0.6, 0.7),
          legend.direction = "horizontal")+
        guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
              title.position = "top", title.hjust = 0.5))
    
      assign(paste(labs[i],"gghm",sep="_"),gghm2)
    
}
CRH_gghm
CRHR1_gghm
GR_gghm
FKBP5_gghm

Plot overall methylation percentages for each gene

The methylation percentages seem to be very highly correlated among CpG sites within a single assay, both when looking just at consecutive sites (second diagonal row) and when looking at the whole assay wide level. The exceptions are GR -18 and GR -7 to -5, which seem to vary more or less independently from the other sites. Overall I think it suggests that it is probably reasonable to average over all of the sites in an assay (with possible exception of GR) and then to use that average as the main response for each individual. One problem with averaging, though, is that not all individuals have data for each CpG. Before averaging, we should check how methylation values differ across the different sites because very different values will make it trickier to do weighted averages when some data are missing.

  CRHmeth <- rep(NA, ncol(CRHmat))
  CRHsd <- CRHmeth
  for(i in 1:ncol(CRHmat)){
    CRHmeth[i] <- mean(na.omit(CRHmat[,i]))
    CRHsd[i] <- sd(na.omit(CRHmat[,i]))
    }

  CRHRmeth <- rep(NA, ncol(CRHRmat))
  CRHRsd <- CRHRmeth
  for(i in 1:ncol(CRHRmat)){
    CRHRmeth[i] <- mean(na.omit(CRHRmat[,i]))
    CRHRsd[i] <- sd(na.omit(CRHRmat[,i]))
  }
  
  GRmeth <- rep(NA, ncol(GRmat))
  GRsd <- GRmeth
  for(i in 1:ncol(GRmat)){
    GRmeth[i] <- mean(na.omit(GRmat[,i]))
    GRsd[i] <- sd(na.omit(GRmat[,i]))
  }
  
  FKmeth <- rep(NA, ncol(FKmat))
  FKsd <- FKmeth
  for(i in 1:ncol(FKmat)){
    FKmeth[i] <- mean(na.omit(FKmat[,i]))
    FKsd[i] <- sd(na.omit(FKmat[,i]))
  }
  
  plot(1,1,type="n",yaxs="i",xaxs="i",ylim=c(0,100),xlim=c(0,20),yaxt="n",xaxt="n",
       xlab="CpG Sites",ylab="Methylation",bty="n")
  axis(1,seq(-2,22,2),cex=0.9)
  axis(2,seq(-20,120,20),las=2)
  
  CRHo <- .1
  CRHRo <- -.1
  GRo <- .2
  FKo <- -.2
  
  r<-seq(1,20,1)
  
  for(i in 1:length(CRHmeth)){
    lines(rep(r[i],2)+CRHo,c(CRHmeth[i]+CRHsd[i],CRHmeth[i]-CRHsd[i]))
  }
  
  for(i in 1:length(GRmeth)){
    lines(rep(r[i],2)+GRo,c(GRmeth[i]+GRsd[i],GRmeth[i]-GRsd[i]))
  }
  
  for(i in 1:length(CRHRmeth)){
    lines(rep(r[i],2)+CRHRo,c(CRHRmeth[i]+CRHRsd[i],CRHRmeth[i]-CRHRsd[i]))
  }
  
  for(i in 1:length(FKmeth)){
    lines(rep(r[i],2)+FKo,c(FKmeth[i]+FKsd[i],FKmeth[i]-FKsd[i]))
  }
  
  lines(seq(1,length(CRHmeth),1)+CRHo,CRHmeth,lwd=2,col="orange")
  points(seq(1,length(CRHmeth),1)+CRHo,CRHmeth,pch=21,bg="orange")
  
  lines(seq(1,length(GRmeth),1)+GRo,GRmeth,lwd=2,col="lightblue")
  points(seq(1,length(GRmeth),1)+GRo,GRmeth,pch=21,bg="lightblue")
  
  lines(seq(1,length(CRHRmeth),1)+CRHRo,CRHRmeth,lwd=2,col="coral3")
  points(seq(1,length(CRHRmeth),1)+CRHRo,CRHRmeth,pch=21,bg="coral3")
  
  lines(seq(1,length(FKmeth),1)+FKo,FKmeth,lwd=2,col="chartreuse3")
  points(seq(1,length(FKmeth),1)+FKo,FKmeth,pch=21,bg="chartreuse3")
  
  legend("topright",c("Crh","Crhr1","GR","Fkbp5"),lwd=2,pch=21,bty="n",
         pt.bg=c("orange","coral3","lightblue","chartreuse3"),
         col=c("orange","coral3","lightblue","chartreuse3"))

The x-axis here doesn’t really mean anything since it is just showing position within the assay. I’ve overlaid all four genes to look at the variation, but remember that these are from totally different locatoins and the same x-axis value for the four genes here is meaningless. The point here is just to show that for each gene, some individual CpG sites have overall higher or lower average methylation values. That means that if we just do a raw average across all CpG sites, missing values at high methylation CpGs will lower the average more than missing values at low methylation CpG sites (and vice versa). A simple solution to this is to scale all the methylation values within each CpG site so that the average value is always 0 and one standard deviation is always 1 unit. Essentially what this would do is fill in missing values with the average (0) for that CpG. I’ve taken that approach below because it is easy…BUT, I think a better approach would probably be to fit this in a multi-level model with missing values imputed as parameters and then used to generate individual measures with uncertainty to use as predictors in the next level up. That is obviously a bit more complicated, but is probably something that I can think about doing. In the end it may not make too much difference since the number of zeros isn’t huge. I’m also not sure that one is necessarily more or less conservative since substituting average values could increase or decrease our chances of finding effects depending on the details of the data.

Scale and calculate assay average methylation

Here I am going back to the first data frame and scaling every individual column (after excluding 0s as above) so that the mean = 0 and SD = 1. Then I calculate average values for each individual gene, except that for GR I’m analyzing -18, -17 to -8, -7, and -6 to -5 as four separate groups because of the lack of correlation in the heatmaps above.

dss <- as.data.frame(scale(ds[,7:62]))
for(i in 1:nrow(dss)){
  dss$CRH[i] <- mean(na.omit(t(dss[i,1:12])))
  dss$CRHR1[i] <- mean(na.omit(t(dss[i,24:42])))
  dss$FK[i] <- mean(na.omit(t(dss[i,13:23])))
  dss$GRn17t8[i] <- mean(na.omit(t(dss[i,44:53])))
  dss$GRn6t5[i] <- mean(na.omit(t(dss[i,55:56])))
}

dss <- cbind(ds[,1:6],dss)

Plot values by groups

This is just a quick plot to visualize what the values look like at 1st and 3rd capture for control vs. dulled females and for each of the genes (with 4 comparisons for GR). This isn’t the formal analysis and I’m not running any stats here since we are planning to include initial color in formal models. I just want to see what the distribution of data look like and make sure nothing weird is going on before modeling.

library(scales)
plot(1,1,type="n",bty="n",yaxt="n",xaxt="n",xlab="",ylab="Methylation (sd)",xlim=c(0,21.5),ylim=c(-3,3))
axis(1,at=c(-5,1.5,4.5,9.5,12.5,17.5,20.5,30),labels=c("","Pre","Post","Pre","Post","Pre","Post",""))
axis(2,at=seq(-5,5,1))
abline(h=0,lty=2,col="gray40")
nogr <- list(subset(dss$CRH, dss$Capture == 1 & dss$Treatment == "Control"),
             subset(dss$CRH, dss$Capture == 1 & dss$Treatment == "Dulled"),
             subset(dss$CRH, dss$Capture == 3 & dss$Treatment == "Control"),
             subset(dss$CRH, dss$Capture == 3 & dss$Treatment == "Dulled"),
             subset(dss$CRHR1, dss$Capture == 1 & dss$Treatment == "Control"),
             subset(dss$CRHR1, dss$Capture == 1 & dss$Treatment == "Dulled"),
             subset(dss$CRHR1, dss$Capture == 3 & dss$Treatment == "Control"),
             subset(dss$CRHR1, dss$Capture == 3 & dss$Treatment == "Dulled"),
             subset(dss$FK, dss$Capture == 1 & dss$Treatment == "Control"),
             subset(dss$FK, dss$Capture == 1 & dss$Treatment == "Dulled"),
             subset(dss$FK, dss$Capture == 3 & dss$Treatment == "Control"),
             subset(dss$FK, dss$Capture == 3 & dss$Treatment == "Dulled"))
xs<-c(1,2,4,5,9,10,12,13,17,18,20,21)
gap<-0.17
cc<-rep(c("orange","lightblue"),8)
for(i in 1:length(xs)){
  points(rep(xs[i],length(nogr[[i]]))+runif(length(nogr[[i]]),min= -gap, max = gap-.05),nogr[[i]],pch=21,bg=alpha(cc[i],0.5))
  lines(rep(xs[i],2)+.33,c(mean(na.omit(nogr[[i]]))+sd(na.omit(nogr[[i]])),
                           mean(na.omit(nogr[[i]]))-sd(na.omit(nogr[[i]]))))
  points(xs[i]+.33,mean(na.omit(nogr[[i]])),pch=22,col=cc[i],bg="black",cex=1.2)
}
cs<-1.5
xs<- -2.5
text(3,xs,"Crh",cex=cs)
text(11,xs,"Crhr1",cex=cs)
text(19,xs,"Fkbp5",cex=cs)

legend("topright",c("Control","Dulled"),pch=21,pt.bg=c("orange","lightblue"),bty="n")


plot(1,1,type="n",bty="n",yaxt="n",xaxt="n",xlab="",ylab="Methylation (sd)",xlim=c(0,30),ylim=c(-3,3))
axis(1,at=c(-5,1.5,4.5,9.5,12.5,17.5,20.5,25.5,28.5,35),
     labels=c("","Pre","Post","Pre","Post","Pre","Post","Pre","Post",""))
axis(2,at=seq(-5,5,1))
abline(h=0,lty=2,col="gray40")
nogr <- list(subset(dss$GRn18, dss$Capture == 1 & dss$Treatment == "Control"),
             subset(dss$GRn18, dss$Capture == 1 & dss$Treatment == "Dulled"),
             subset(dss$GRn18, dss$Capture == 3 & dss$Treatment == "Control"),
             subset(dss$GRn18, dss$Capture == 3 & dss$Treatment == "Dulled"),
             subset(dss$GRn17t8, dss$Capture == 1 & dss$Treatment == "Control"),
             subset(dss$GRn17t8, dss$Capture == 1 & dss$Treatment == "Dulled"),
             subset(dss$GRn17t8, dss$Capture == 3 & dss$Treatment == "Control"),
             subset(dss$GRn17t8, dss$Capture == 3 & dss$Treatment == "Dulled"),
             subset(dss$GRn7, dss$Capture == 1 & dss$Treatment == "Control"),
             subset(dss$GRn7, dss$Capture == 1 & dss$Treatment == "Dulled"),
             subset(dss$GRn7, dss$Capture == 3 & dss$Treatment == "Control"),
             subset(dss$GRn7, dss$Capture == 3 & dss$Treatment == "Dulled"),
             subset(dss$GRn6t5, dss$Capture == 1 & dss$Treatment == "Control"),
             subset(dss$GRn6t5, dss$Capture == 1 & dss$Treatment == "Dulled"),
             subset(dss$GRn6t5, dss$Capture == 3 & dss$Treatment == "Control"),
             subset(dss$GRn6t5, dss$Capture == 3 & dss$Treatment == "Dulled"))
xs<-c(1,2,4,5,9,10,12,13,17,18,20,21,25,26,28,29)
gap<-0.17
cc<-rep(c("orange","lightblue"),8)
for(i in 1:length(xs)){
  points(rep(xs[i],length(nogr[[i]]))+runif(length(nogr[[i]]),min= -gap, max = gap-.05),nogr[[i]],pch=21,bg=alpha(cc[i],0.5))
  lines(rep(xs[i],2)+.33,c(mean(na.omit(nogr[[i]]))+sd(na.omit(nogr[[i]])),
                           mean(na.omit(nogr[[i]]))-sd(na.omit(nogr[[i]]))))
  points(xs[i]+.33,mean(na.omit(nogr[[i]])),pch=22,col=cc[i],bg="black",cex=1.2)
}
cs<-1.5
xs<- -2.5
text(3,xs,"GRn8",cex=cs)
text(11,xs,"GRn17t8",cex=cs)
text(19,xs,"GRn7",cex=cs)
text(27,xs,"GRn6t5",cex=cs)

legend("topright",c("Control","Dulled"),pch=21,pt.bg=c("orange","lightblue"),bty="n")

Nothing super obvious seems to be going on here in terms of changes, but this doesn’t yet account for initial coloration or initial methylation status. One weird thing that is visible here is that GRn7 (which was weird already in not being spatially correlated) has a very strange bimodal distribution. We saw some CpGs like this in the NGS data too. I’m not sure if it is some kind of artifact or a real case of where methylation tends to be bimodal for some reason. It is possible that more individual CpGs are like this too since I haven’t plotted these distributions for every individual site and it would get smoothed over by the averages. One other interesting thing is that some of the GR sites do seem to have an overall seasonal change in methylation, although it doesn’t seem like that slope varies by treatment.

Fit models

Now I’m going to fit the models that we proposed in the pre-registration for each of the four genes (fitting the 4 GR measures separately). The structure of these models is like this:

Methylation ~ Capture# x Treatment x Brightness + (1|Individual)

In the tables below each bold heading is one model and the coefficients, CI, and p-value are listed for each model with descriptions of the random effects below. I can make point estimates with confidence intervals for all differences or plot relationships (e.g. color vs. methylation at 3rd capture for dull vs. control), but it doesn’t look like there is much to see.

dss$Band <- as.factor(dss$Band)
dss$Capture <- as.factor(dss$Capture)
dss$BibB1 <- scale(dss$BibB1)

library(lme4)
library(lmerTest)
library(sjPlot)

CRH.m <- lmer(CRH ~ Capture*Treatment*BibB1 + (1|Band), data = dss)
CRHR1.m <- lmer(CRHR1 ~ Capture*Treatment*BibB1 + (1|Band), data = dss)
FK.m <- lmer(FK ~ Capture*Treatment*BibB1 + (1|Band), data = dss)

GRn18.m <- lmer(GRn18 ~ Capture*Treatment*BibB1 + (1|Band), data = dss)
GRn17t8.m <- lmer(GRn17t8 ~ Capture*Treatment*BibB1 + (1|Band), data = dss)
GRn7.m <- lmer(GRn7 ~ Capture*Treatment*BibB1 + (1|Band), data = dss)
GRn6t5.m <- lmer(GRn6t5 ~ Capture*Treatment*BibB1 + (1|Band), data = dss)

tab_model(CRH.m,CRHR1.m,FK.m,p.style="asterisk")

	CRH		CRHR 1		FK
Predictors	Estimates	CI	Estimates	CI	Estimates	CI
(Intercept)	0.17	-0.12 – 0.45	-0.23	-0.52 – 0.06	0.15	-0.15 – 0.44
Capture 3	-0.08	-0.26 – 0.10	0.32 ^**	0.14 – 0.51	-0.11 ^**	-0.18 – -0.03
Dulled	-0.22	-0.62 – 0.19	0.26	-0.16 – 0.67	-0.13	-0.56 – 0.30
Bib B 1	0.19	-0.08 – 0.45	0.05	-0.22 – 0.32	-0.08	-0.35 – 0.20
Capture3:TreatmentDulled	-0.03	-0.30 – 0.24	-0.19	-0.46 – 0.08	-0.10	-0.21 – 0.01
Capture3:BibB1	-0.07	-0.24 – 0.10	0.00	-0.17 – 0.17	-0.00	-0.07 – 0.07
TreatmentDulled:BibB1	-0.13	-0.55 – 0.28	0.02	-0.41 – 0.45	0.15	-0.28 – 0.59
Capture3:TreatmentDulled:BibB1	0.04	-0.24 – 0.31	-0.10	-0.38 – 0.17	-0.03	-0.14 – 0.08
Random Effects
σ²	0.12		0.12		0.02
τ₀₀	0.61 _Band		0.65 _Band		0.80 _Band
ICC	0.84 _Band		0.85 _Band		0.98 _Band
Observations	118		118		118
Marginal R² / Conditional R²	0.039 / 0.845		0.033 / 0.851		0.022 / 0.979
p<0.05 p<0.01 * p<0.001

tab_model(GRn18.m,GRn17t8.m,GRn7.m,GRn6t5.m,p.style="asterisk")

	G Rn 18		G Rn 17 t 8		G Rn 7		G Rn 6 t 5
Predictors	Estimates	CI	Estimates	CI	Estimates	CI	Estimates	CI
(Intercept)	-0.06	-0.43 – 0.32	-0.21	-0.48 – 0.06	-0.17	-0.52 – 0.17	-0.66 ^***	-0.91 – -0.42
Capture 3	0.24	-0.07 – 0.55	0.14	-0.05 – 0.34	0.25 ^**	0.10 – 0.41	0.94 ^***	0.71 – 1.18
Dulled	-0.19	-0.71 – 0.33	0.27	-0.12 – 0.65	0.11	-0.39 – 0.61	0.54 ^**	0.19 – 0.89
Bib B 1	0.10	-0.25 – 0.46	-0.10	-0.35 – 0.16	0.07	-0.25 – 0.40	-0.08	-0.30 – 0.15
Capture3:TreatmentDulled	0.29	-0.17 – 0.74	0.09	-0.20 – 0.37	0.01	-0.21 – 0.24	-0.09	-0.44 – 0.26
Capture3:BibB1	0.11	-0.18 – 0.40	0.02	-0.16 – 0.20	-0.07	-0.22 – 0.08	0.03	-0.19 – 0.26
TreatmentDulled:BibB1	-0.14	-0.66 – 0.39	0.15	-0.24 – 0.55	-0.15	-0.66 – 0.36	0.16	-0.20 – 0.52
Capture3:TreatmentDulled:BibB1	-0.11	-0.55 – 0.34	0.01	-0.28 – 0.30	0.12	-0.12 – 0.35	-0.30	-0.66 – 0.06
Random Effects
σ²	0.28		0.13		0.08		0.21
τ₀₀	0.74 _Band		0.53 _Band		1.01 _Band		0.33 _Band
ICC	0.72 _Band		0.80 _Band		0.93 _Band		0.62 _Band
Observations	100		118		116		118
Marginal R² / Conditional R²	0.052 / 0.737		0.056 / 0.813		0.020 / 0.929		0.325 / 0.741
p<0.05 p<0.01 * p<0.001

So…there doesn’t seem to be much going on. A few of the values change between 1st and 3rd capture but don’t differ between groups. There is a main effect of dulling for GRn6t5, but that seems be driven largely by the fact that the dulled group had higher pre and post treatment methylation values rather than any kind of different change between captures across the groups. Of course there are a lot of other things we can look at with these data, but for these main effects of coloring there doesn’t seem to be much happening. One other interesting thing here is that the ICC values for individuals seem to be very high. Perhaps that suggests that these values are really just not changing much on the time scale that we are looking at here. I’m plotting individual pairs of 1st to 3rd capture values (ignoring treatment for now) below to take a look at this.

Individual consistency in methylation

These are just some plots to ask how similar individuals are in their 1st and 3rd capture methylation scores for the different assays that we are analyzing above.

Note that I am plotting these on the standardized scale for convenience so that the mean is always 0 and SD is always 1 unit. In reality these different assays vary between each other in the average methylation (see plot above).

library(plyr)
dssj <- dss[,c("Band","Capture","Treatment","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
dss1 <- subset(dssj,dssj$Capture == 1)
dss3 <- subset(dssj,dssj$Capture == 3)
dss3 <- dss3[,c("Band","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
colnames(dss3) <- c("Band",paste(colnames(dss3)[2:8],".3",sep=""))

d.j <- join(dss1,dss3,"Band")

palette(c("lightblue","orange"))
p.comp<-function(x,y,cols,tit="GENE",X="Pre-Methylation",Y="Post-Methylation"){
  pal<-c()
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(-3,3),ylim=c(-3,3),main=tit)
  abline(lm(y~x))
  axis(1,seq(-5,5))
  axis(2,seq(-5,5),las=2)
  points(x,y,pch=21,bg=cols)
  text(2,-2.5,paste("r = ",round(cor(x,y,use="pairwise.complete.obs"),2)))
}
p.comp(d.j$CRH,d.j$CRH.3,d.j$Treatment,tit="Crh")
p.comp(d.j$CRHR1,d.j$CRHR1.3,d.j$Treatment,tit="Crhr1")
p.comp(d.j$FK,d.j$FK.3,d.j$Treatment,tit="Fkbp5")
p.comp(d.j$GRn18,d.j$GRn18.3,d.j$Treatment,tit="GRn18")
p.comp(d.j$GRn17t8,d.j$GRn17t8.3,d.j$Treatment,tit="GRn17t8")
p.comp(d.j$GRn7,d.j$GRn7.3,d.j$Treatment,tit="GRn7")
p.comp(d.j$GRn6t5,d.j$GRn6t5.3,d.j$Treatment,tit="GRn6t5")

So these plots match up pretty well with the very high ICC estimates from the mixed models above. It seems that percent methylation is very stable within individuals going from 1st to 3rd captures. Most of these are looking at averages across a bunch of CpGs so maybe it isn’t that surprising. I can certainly repeat any of these analyses at an individual CpG level and I expect there would be more variation there if only just from random noise, but remember that they tend to be all very highly correlated with each other within an assay. In one sense, this result is encouraging because it suggests a lot of confidence in the methylation scores themselves (they are clearly measuring something about individuals), but it may mean that we aren’t likely to see changes on this short time scale. It is also interesting to note that the CpG with a bimodal distribution (GRn7) is completely consistent here with maybe only two individuals switching from the low to the high modal group between captures.

Color Stressor Experiment

I’m going to run through the same sequence of steps described above for the 2018 color-stressor experiment here. I won’t describe as much since the logic is basically the same. The only differences here are that we have a 2x3 treatment (dull vs. control and stress vs. predator vs. control) and we have samples for 1st, 2nd, and 3rd capture.

Read and organize the data

Still excluding zeros. This time including only 2018 color-stress data.

ds <- read.delim("Input_Samples.txt")
ds <- subset(ds, ds$Year == 2018)
ds <- subset(ds, ds$Treatment != "Blood" & ds$Treatment != "Hippocampus" & ds$Treatment != "Hypothalamus")
ds[ds == 0] <- NA
for(i in 1:nrow(ds)){
  ds$Treat1[i] <- strsplit(as.character(ds$Treatment[i]),"_")[[1]][1]
  ds$Treat2[i] <- strsplit(as.character(ds$Treatment[i]),"_")[[1]][2]
}

Generate correlation matrix heatmaps

Same as in the first experiment above. The plots below look very similar overall to the correlation structure we saw in the 2017 experiment.

library(reshape2)
library(ggplot2)

## Write a little function to extract upper triangle from correlation matrix
  get_upper_tri <- function(cormat){
    cormat[lower.tri(cormat)] <- NA
    return(cormat)
  }

CRHmat <- as.matrix(ds[, 7:17])
CRHRmat <- as.matrix(ds[, 30:48])
GRmat <- as.matrix(ds[, 49:62])
FKmat <- as.matrix(ds[, 19:29])
mats<-list(CRHmat, CRHRmat, GRmat, FKmat)
labs<-c("CRH","CRHR1","GR","FKBP5")

for(i in 1:4){
    USE <- mats[[i]]

    corm <- round(cor(USE, use = "pairwise.complete.obs"), 2)
    cormUpper <- get_upper_tri(corm)
    
    ## Melt the correlation matrix
      melted_cormat <- melt(cormUpper, na.rm = TRUE)
      colnames(melted_cormat) <- c("CpG_Num1", "CpG_Num2", "value")
    
    ## Plot the heatmap
      gghm <- ggplot(data = melted_cormat, aes(CpG_Num2, CpG_Num1, fill = value)) +
        geom_tile(color = "white") +
        scale_fill_gradient2(low = "red", high = "blue", mid = "white",
            midpoint = 0, limit = c(-1, 1), space = "Lab", name = "Pearson\nCorrelation") +
        theme_minimal() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1,
              size = 9, hjust = 1)) +
        coord_fixed()
      
      gghm2 <- gghm + 
        geom_text(aes(CpG_Num2, CpG_Num1, label = value), color = "white", size = 1.5) +
        theme(
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          legend.justification = c(1, 0),
          legend.position = c(0.6, 0.7),
          legend.direction = "horizontal")+
        guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
              title.position = "top", title.hjust = 0.5))
    
      assign(paste(labs[i],"gghm",sep="_"),gghm2)
    
}
CRH_gghm
CRHR1_gghm
GR_gghm
FKBP5_gghm

Plot overall methylation percentages for each gene

This is also plotted exactly the same as above. The methylation patterns here are shockingly similar to the 2017 data with an entirely separate dataset. I guess this is also encouraging for how well the method worked if nothing else…

  CRHmeth <- rep(NA, ncol(CRHmat))
  CRHsd <- CRHmeth
  for(i in 1:ncol(CRHmat)){
    CRHmeth[i] <- mean(na.omit(CRHmat[,i]))
    CRHsd[i] <- sd(na.omit(CRHmat[,i]))
    }

  CRHRmeth <- rep(NA, ncol(CRHRmat))
  CRHRsd <- CRHRmeth
  for(i in 1:ncol(CRHRmat)){
    CRHRmeth[i] <- mean(na.omit(CRHRmat[,i]))
    CRHRsd[i] <- sd(na.omit(CRHRmat[,i]))
  }
  
  GRmeth <- rep(NA, ncol(GRmat))
  GRsd <- GRmeth
  for(i in 1:ncol(GRmat)){
    GRmeth[i] <- mean(na.omit(GRmat[,i]))
    GRsd[i] <- sd(na.omit(GRmat[,i]))
  }
  
  FKmeth <- rep(NA, ncol(FKmat))
  FKsd <- FKmeth
  for(i in 1:ncol(FKmat)){
    FKmeth[i] <- mean(na.omit(FKmat[,i]))
    FKsd[i] <- sd(na.omit(FKmat[,i]))
  }
  
  plot(1,1,type="n",yaxs="i",xaxs="i",ylim=c(0,100),xlim=c(0,20),yaxt="n",xaxt="n",
       xlab="CpG Sites",ylab="Methylation",bty="n")
  axis(1,seq(-2,22,2),cex=0.9)
  axis(2,seq(-20,120,20),las=2)
  
  CRHo <- .1
  CRHRo <- -.1
  GRo <- .2
  FKo <- -.2
  
  r<-seq(1,20,1)
  
  for(i in 1:length(CRHmeth)){
    lines(rep(r[i],2)+CRHo,c(CRHmeth[i]+CRHsd[i],CRHmeth[i]-CRHsd[i]))
  }
  
  for(i in 1:length(GRmeth)){
    lines(rep(r[i],2)+GRo,c(GRmeth[i]+GRsd[i],GRmeth[i]-GRsd[i]))
  }
  
  for(i in 1:length(CRHRmeth)){
    lines(rep(r[i],2)+CRHRo,c(CRHRmeth[i]+CRHRsd[i],CRHRmeth[i]-CRHRsd[i]))
  }
  
  for(i in 1:length(FKmeth)){
    lines(rep(r[i],2)+FKo,c(FKmeth[i]+FKsd[i],FKmeth[i]-FKsd[i]))
  }
  
  lines(seq(1,length(CRHmeth),1)+CRHo,CRHmeth,lwd=2,col="orange")
  points(seq(1,length(CRHmeth),1)+CRHo,CRHmeth,pch=21,bg="orange")
  
  lines(seq(1,length(GRmeth),1)+GRo,GRmeth,lwd=2,col="lightblue")
  points(seq(1,length(GRmeth),1)+GRo,GRmeth,pch=21,bg="lightblue")
  
  lines(seq(1,length(CRHRmeth),1)+CRHRo,CRHRmeth,lwd=2,col="coral3")
  points(seq(1,length(CRHRmeth),1)+CRHRo,CRHRmeth,pch=21,bg="coral3")
  
  lines(seq(1,length(FKmeth),1)+FKo,FKmeth,lwd=2,col="chartreuse3")
  points(seq(1,length(FKmeth),1)+FKo,FKmeth,pch=21,bg="chartreuse3")
  
  legend("topright",c("Crh","Crhr1","GR","Fkbp5"),lwd=2,pch=21,bty="n",
         pt.bg=c("orange","coral3","lightblue","chartreuse3"),
         col=c("orange","coral3","lightblue","chartreuse3"))

Scale and calculate assay average methylation

Nothing different to see here. Just doing the same as above to prepare for plotting.

dss <- as.data.frame(scale(ds[,7:62]))
for(i in 1:nrow(dss)){
  dss$CRH[i] <- mean(na.omit(t(dss[i,1:12])))
  dss$CRHR1[i] <- mean(na.omit(t(dss[i,24:42])))
  dss$FK[i] <- mean(na.omit(t(dss[i,13:23])))
  dss$GRn17t8[i] <- mean(na.omit(t(dss[i,44:53])))
  dss$GRn6t5[i] <- mean(na.omit(t(dss[i,55:56])))
}

dss <- cbind(ds[,1:6],dss,ds[,63:64])

Plot values by groups

Here I have to diverge a bit since we have three captures and a 2x3 treatment structure. Because there are so many groups, I’m going to plot just the means with lines and error bars for each group.

  dsx <- dss
  gr <- c("CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")
  for(i in 1:7){
    dsx$GG <- dsx[,c(gr[i])]
    groups<-list(subset(dsx$GG, dsx$Capture == 1 & dsx$Treatment == "Control_Control"),
               subset(dsx$GG, dsx$Capture == 1 & dsx$Treatment == "Control_Stress"),
               subset(dsx$GG, dsx$Capture == 1 & dsx$Treatment == "Control_Predator"),
               subset(dsx$GG, dsx$Capture == 1 & dsx$Treatment == "Dull_Control"),
               subset(dsx$GG, dsx$Capture == 1 & dsx$Treatment == "Dull_Stress"),
               subset(dsx$GG, dsx$Capture == 1 & dsx$Treatment == "Dull_Predator"),
               subset(dsx$GG, dsx$Capture == 2 & dsx$Treatment == "Control_Control"),
               subset(dsx$GG, dsx$Capture == 2 & dsx$Treatment == "Control_Stress"),
               subset(dsx$GG, dsx$Capture == 2 & dsx$Treatment == "Control_Predator"),
               subset(dsx$GG, dsx$Capture == 2 & dsx$Treatment == "Dull_Control"),
               subset(dsx$GG, dsx$Capture == 2 & dsx$Treatment == "Dull_Stress"),
               subset(dsx$GG, dsx$Capture == 2 & dsx$Treatment == "Dull_Predator"),
               subset(dsx$GG, dsx$Capture == 3 & dsx$Treatment == "Control_Control"),
               subset(dsx$GG, dsx$Capture == 3 & dsx$Treatment == "Control_Stress"),
               subset(dsx$GG, dsx$Capture == 3 & dsx$Treatment == "Control_Predator"),
               subset(dsx$GG, dsx$Capture == 3 & dsx$Treatment == "Dull_Control"),
               subset(dsx$GG, dsx$Capture == 3 & dsx$Treatment == "Dull_Stress"),
               subset(dsx$GG, dsx$Capture == 3 & dsx$Treatment == "Dull_Predator"))
    mus <- rep(NA, length(groups))
    sds <- rep(NA, length(groups))
    for(k in 1:length(mus)){
      mus[k] <- mean(na.omit(groups[[k]]))
      sds[k] <- sd(na.omit(groups[[k]]))
    }
    
    plot(1,1,type="n",xaxt="n",yaxt="n",bty="n",yaxs="i",xaxs="i",ylab="Methylation (sd)",
        xlab="Capture number",main=gr[i],ylim=c(-2,2),xlim=c(.5,4.5))
    abline(h=0,lty=2,col="gray40")
    axis(1,at=c(-2,1,2,3,10),labels=c("","1","2","3",""))
    axis(2,seq(-10,10,1),las=2)
    
    offs<-c(-.05,-.03,-.01,.01,.03,.05)
    cols<-c("darkgoldenrod1","darkorange2","coral3","lightblue","chartreuse3","darkorchid4")
    for(j in 1:6){
      lines(c(1,2,3)+offs[j],c(mus[j],mus[j+6],mus[j+12]),lwd=2,col=cols[j])
        lines(rep(1,2)+offs[j],c(mus[j]+sds[j],mus[j]-sds[j]))
        lines(rep(2,2)+offs[j],c(mus[j+6]+sds[j+6],mus[j+6]-sds[j+6]))
        lines(rep(3,2)+offs[j],c(mus[j+12]+sds[j+12],mus[j+12]-sds[j+12]))
      points(c(1,2,3)+offs[j],c(mus[j],mus[j+6],mus[j+12]),pch=21,bg=cols[j])
    }
    legend("bottomright",lwd=2,col=cols,c("Control Control","Control Stress","Control Pred",
                           "Dull Control","Dull Stress","Dull Pred"),pch=21,pt.bg=cols,bty="n")
    
  }

From these plots it doesn’t seem like much is going on. We see the same seasonal shift in GRn6t5, which is interesting, but again it doesn’t differ by treatment group. I colored all the control groups in warm colors and dull groups in cool colors and it also doesn’t seem like there is much evidence for differences in those groups at either time point. Next I’ll fit the models like the ones fit for the color manipulation experiment above.

Fit models

I’m doing this just the same as above except that we have the two different treatments. The full model is like this:

Methylation ~ Capture x Treat1 x Treat2 x BibB1 + (1|Band)

dss$Band <- as.factor(dss$Band)
dss$Capture <- as.factor(dss$Capture)
dss$BibB1 <- scale(dss$BibB1)
dss$Treat1 <- as.factor(dss$Treat1)
dss$Treat2 <- as.factor(dss$Treat2)

library(lme4)
library(lmerTest)
library(sjPlot)

CRH.m <- lmer(CRH ~ Capture*Treat1*Treat2 + (1|Band), data = dss)
CRHR1.m <- lmer(CRHR1 ~ Capture*Treat1*Treat2 + (1|Band), data = dss)
FK.m <- lmer(FK ~ Capture*Treat1*Treat2 + (1|Band), data = dss)

GRn18.m <- lmer(GRn18 ~ Capture*Treat1*Treat2 + (1|Band), data = dss)
GRn17t8.m <- lmer(GRn17t8 ~ Capture*Treat1*Treat2 + (1|Band), data = dss)
GRn7.m <- lmer(GRn7 ~ Capture*Treat1*Treat2 + (1|Band), data = dss)
GRn6t5.m <- lmer(GRn6t5 ~ Capture*Treat1*Treat2 + (1|Band), data = dss)

tab_model(CRH.m,CRHR1.m,FK.m,p.style="asterisk")

	CRH		CRHR 1		FK
Predictors	Estimates	CI	Estimates	CI	Estimates	CI
(Intercept)	0.16	-0.30 – 0.62	-0.01	-0.62 – 0.60	-0.11	-0.63 – 0.41
Capture 2	-0.20	-0.44 – 0.05	-0.12	-0.33 – 0.09	-0.01	-0.14 – 0.12
Capture 3	-0.02	-0.25 – 0.20	-0.03	-0.25 – 0.18	-0.06	-0.19 – 0.07
Dull	-0.16	-0.86 – 0.54	0.19	-0.74 – 1.12	0.32	-0.48 – 1.11
Predator	-0.24	-0.90 – 0.43	0.02	-0.86 – 0.90	0.05	-0.70 – 0.80
Stress	-0.04	-0.70 – 0.63	-0.26	-1.12 – 0.60	-0.20	-0.93 – 0.54
Capture2:Treat1Dull	0.02	-0.39 – 0.42	0.04	-0.28 – 0.37	0.10	-0.09 – 0.29
Capture3:Treat1Dull	0.12	-0.27 – 0.50	-0.09	-0.41 – 0.23	0.08	-0.11 – 0.27
Capture2:Treat2Predator	0.10	-0.28 – 0.48	-0.10	-0.39 – 0.19	0.10	-0.08 – 0.28
Capture3:Treat2Predator	0.00	-0.36 – 0.37	-0.14	-0.45 – 0.17	0.13	-0.06 – 0.32
Capture2:Treat2Stress	-0.02	-0.45 – 0.40	0.03	-0.27 – 0.34	-0.03	-0.21 – 0.16
Capture3:Treat2Stress	-0.20	-0.61 – 0.21	0.06	-0.25 – 0.37	0.01	-0.18 – 0.20
Treat1Dull:Treat2Predator	0.74	-0.24 – 1.72	0.18	-1.12 – 1.47	0.11	-1.00 – 1.21
Treat1Dull:Treat2Stress	-0.33	-1.31 – 0.65	0.39	-0.89 – 1.68	-0.15	-1.25 – 0.94
Capture2:Treat1Dull:Treat2Predator	-0.09	-0.71 – 0.52	-0.01	-0.44 – 0.43	-0.17	-0.43 – 0.10
Capture3:Treat1Dull:Treat2Predator	0.00	-0.56 – 0.57	0.20	-0.26 – 0.65	-0.19	-0.47 – 0.09
Capture2:Treat1Dull:Treat2Stress	0.22	-0.40 – 0.85	0.07	-0.39 – 0.53	-0.07	-0.34 – 0.21
Capture3:Treat1Dull:Treat2Stress	0.17	-0.43 – 0.77	0.40	-0.06 – 0.86	-0.04	-0.31 – 0.23
Random Effects
σ²	0.06		0.06		0.02
τ₀₀	0.59 _Band		1.10 _Band		0.82 _Band
ICC	0.91 _Band		0.95 _Band		0.97 _Band
Observations	127		172		176
Marginal R² / Conditional R²	0.098 / 0.915		0.056 / 0.953		0.063 / 0.976
p<0.05 p<0.01 * p<0.001

tab_model(GRn18.m,GRn17t8.m,GRn7.m,GRn6t5.m,p.style="asterisk")

	G Rn 18		G Rn 17 t 8		G Rn 7		G Rn 6 t 5
Predictors	Estimates	CI	Estimates	CI	Estimates	CI	Estimates	CI
(Intercept)	-0.37	-0.95 – 0.22	-0.09	-0.51 – 0.32	0.27	-0.28 – 0.83	-0.14	-0.64 – 0.36
Capture 2	0.33 ^*	0.05 – 0.60	0.46 ^*	0.10 – 0.83	0.07	-0.07 – 0.20	0.41 ^*	0.03 – 0.80
Capture 3	0.48 ^**	0.20 – 0.75	0.57 ^**	0.21 – 0.94	0.08	-0.06 – 0.22	0.64 ^**	0.26 – 1.03
Dull	0.35	-0.55 – 1.24	-0.10	-0.73 – 0.53	0.26	-0.59 – 1.11	-0.32	-1.08 – 0.45
Predator	-0.08	-0.92 – 0.77	-0.51	-1.11 – 0.09	-0.28	-1.09 – 0.52	-0.72	-1.44 – 0.00
Stress	0.17	-0.73 – 1.06	0.29	-0.30 – 0.87	-0.92 ^*	-1.72 – -0.11	0.08	-0.62 – 0.79
Capture2:Treat1Dull	-0.37	-0.79 – 0.05	0.05	-0.50 – 0.60	-0.11	-0.32 – 0.10	0.06	-0.52 – 0.64
Capture3:Treat1Dull	-0.37	-0.78 – 0.05	-0.01	-0.56 – 0.54	-0.03	-0.24 – 0.18	-0.00	-0.59 – 0.58
Capture2:Treat2Predator	0.00	-0.38 – 0.39	-0.04	-0.55 – 0.47	-0.09	-0.28 – 0.10	0.23	-0.30 – 0.77
Capture3:Treat2Predator	-0.04	-0.45 – 0.37	-0.07	-0.62 – 0.47	-0.01	-0.22 – 0.19	0.23	-0.34 – 0.80
Capture2:Treat2Stress	0.17	-0.28 – 0.62	-0.16	-0.69 – 0.37	-0.05	-0.26 – 0.17	0.01	-0.55 – 0.57
Capture3:Treat2Stress	0.08	-0.37 – 0.53	-0.24	-0.77 – 0.28	0.00	-0.21 – 0.21	0.26	-0.30 – 0.82
Treat1Dull:Treat2Predator	-0.14	-1.46 – 1.18	0.63	-0.25 – 1.51	-0.52	-1.71 – 0.67	1.00	-0.06 – 2.06
Treat1Dull:Treat2Stress	0.13	-1.18 – 1.43	-0.41	-1.28 – 0.45	0.17	-1.02 – 1.35	0.22	-0.83 – 1.26
Capture2:Treat1Dull:Treat2Predator	0.30	-0.34 – 0.93	0.02	-0.74 – 0.78	0.20	-0.09 – 0.49	-0.26	-1.06 – 0.54
Capture3:Treat1Dull:Treat2Predator	0.59	-0.08 – 1.27	-0.33	-1.11 – 0.46	0.06	-0.24 – 0.36	0.07	-0.76 – 0.91
Capture2:Treat1Dull:Treat2Stress	0.05	-0.58 – 0.68	-0.28	-1.06 – 0.49	0.19	-0.11 – 0.50	-0.55	-1.37 – 0.27
Capture3:Treat1Dull:Treat2Stress	-0.25	-0.88 – 0.38	-0.54	-1.32 – 0.23	0.28	-0.02 – 0.58	-0.90 ^*	-1.72 – -0.08
Random Effects
σ²	0.10		0.18		0.03		0.20
τ₀₀	0.95 _Band		0.34 _Band		0.95 _Band		0.56 _Band
ICC	0.90 _Band		0.65 _Band		0.97 _Band		0.74 _Band
Observations	151		176		172		176
Marginal R² / Conditional R²	0.058 / 0.908		0.192 / 0.721		0.121 / 0.977		0.180 / 0.785
p<0.05 p<0.01 * p<0.001

These model tables are harder to interpret without plotting point estimates with confidence intervals or estimating specific differences. I can go through and do that for each group, but overall it doesn’t look like there are any real differences between the treatment groups at capture 2 or 3. Just like above the ICC values for ID are very high, which makes it look like methylation is not changing much within individuals over this time frame.

Individual consistency in methylation

I’m making these basically the same as before except now that there are three captures I’m comparing 1 to 2 and 2 to 3 for each panel. I could also compare 1 to 3…but doesn’t seem totally necessary for now.

library(plyr)
dssj <- dss[,c("Band","Capture","Treatment","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
dss1 <- subset(dssj,dssj$Capture == 1)
dss2 <- subset(dssj,dssj$Capture == 2)
dss2 <- dss2[,c("Band","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
colnames(dss2) <- c("Band",paste(colnames(dss2)[2:8],".2",sep=""))
dss3 <- subset(dssj,dssj$Capture == 3)
dss3 <- dss3[,c("Band","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
colnames(dss3) <- c("Band",paste(colnames(dss3)[2:8],".3",sep=""))

d.j <- join(dss1,dss2,"Band")
d.j <- join(d.j,dss3,"Band")

palette(c("lightblue","orange"))
p.comp<-function(x,y,cols,tit="GENE",X="Pre-Methylation",Y="Post-Methylation"){
  pal<-c()
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(-3,3),ylim=c(-3,3),main=tit)
  abline(lm(y~x))
  axis(1,seq(-5,5))
  axis(2,seq(-5,5),las=2)
  points(x,y,pch=21,bg=cols)
  text(2,-2.5,paste("r = ",round(cor(x,y,use="pairwise.complete.obs"),2)))
}
p.comp(d.j$CRH,d.j$CRH.2,d.j$Treatment,tit="Crh 1 to 2",X="Capture 1",Y="Capture 2")
p.comp(d.j$CRH.2,d.j$CRH.3,d.j$Treatment,tit="Crh 2 to 3",X="Capture 2",Y="Capture 3")
p.comp(d.j$CRHR1,d.j$CRHR1.2,d.j$Treatment,tit="Crhr1 1 to 2",X="Capture 1",Y="Capture 2")
p.comp(d.j$CRHR1.2,d.j$CRHR1.3,d.j$Treatment,tit="Crhr1 2 to 3",X="Capture 2",Y="Capture 3")
p.comp(d.j$FK,d.j$FK.2,d.j$Treatment,tit="Fkbp5 1 to 2",X="Capture 1",Y="Capture 2")
p.comp(d.j$FK.2,d.j$FK.3,d.j$Treatment,tit="Fkbp5 2 to 3",X="Capture 2",Y="Capture 3")
p.comp(d.j$GRn18,d.j$GRn18.2,d.j$Treatment,tit="GRn18 1 to 2",X="Capture 1",Y="Capture 2")
p.comp(d.j$GRn18.2,d.j$GRn18.3,d.j$Treatment,tit="GRn18 2 to 3",X="Capture 2",Y="Capture 3")
p.comp(d.j$GRn17t8,d.j$GRn17t8.2,d.j$Treatment,tit="GRn17t8 1 to 2",X="Capture 1",Y="Capture 2")
p.comp(d.j$GRn17t8.2,d.j$GRn17t8.3,d.j$Treatment,tit="GRn17t8 2 to 3",X="Capture 2",Y="Capture 3")
p.comp(d.j$GRn7,d.j$GRn7.2,d.j$Treatment,tit="GRn7 1 to 2",X="Capture 1",Y="Capture 2")
p.comp(d.j$GRn7.2,d.j$GRn7.3,d.j$Treatment,tit="GRn7 2 to 3",X="Capture 2",Y="Capture 3")
p.comp(d.j$GRn6t5,d.j$GRn6t5.2,d.j$Treatment,tit="GRn6t5 1 to 2",X="Capture 1",Y="Capture 2")
p.comp(d.j$GRn6t5.2,d.j$GRn6t5.3,d.j$Treatment,tit="GRn6t5 2 to 3",X="Capture 2",Y="Capture 3")

Just like above, it seems that none of the methylation measures are moving much across the three captures regardless of treatment group. Again, I could plot more individual CpG sites rather than the assay averages and we might see something more there, but also likely more noise.

Brain Collection Experiment

This is the third set of pyrosequencing data. It includes methylation data from the same four genes for birds that were part of the brain collection study (12 from Wyoming and 12 from Ithaca). Each bird has methylation data from blood, hippocampus, and hypothalamus. Many of the samples didn’t work at some or all locations because of low DNA concentration so the dataset is smaller for some questions.

Read and organize the data

Repeating as above by dropping zeros.

ds <- read.delim("Input_Samples.txt")
ds <- subset(ds, ds$Year == 2018)
ds <- subset(ds, ds$Treatment == "Blood" | ds$Treatment == "Hippocampus" | ds$Treatment == "Hypothalamus")
ds[ds == 0] <- NA

Generate correlation matrix heatmaps

I’m doing this just the same as above, except that I’m going to plot them separately for the three sample types. For now I’m not going to plot them separately for Wyoming vs. Ithaca just to avoid a million plots, but I could make that comparison too. Remember that the sample sizes are much smaller here than in the two studies above so I wouldn’t be surprised to see more variation. First for blood:

library(reshape2)
library(ggplot2)
ds.save <- ds

ds.hyp <- subset(ds, ds$Treatment == "Hypothalamus")
ds.bl <- subset(ds, ds$Treatment == "Blood")
ds.hip <- subset(ds, ds$Treatment == "Hippocampus")

ds <- ds.bl

## Write a little function to extract upper triangle from correlation matrix
  get_upper_tri <- function(cormat){
    cormat[lower.tri(cormat)] <- NA
    return(cormat)
  }

CRHmat <- as.matrix(ds[, 7:17])
CRHRmat <- as.matrix(ds[, 30:48])
GRmat <- as.matrix(ds[, 49:62])
FKmat <- as.matrix(ds[, 19:29])
mats<-list(CRHmat, CRHRmat, GRmat, FKmat)
mats.bl<-mats
labs<-c("CRH","CRHR1","GR","FKBP5")

for(i in 1:4){
    USE <- mats[[i]]

    corm <- round(cor(USE, use = "pairwise.complete.obs"), 2)
    cormUpper <- get_upper_tri(corm)
    
    ## Melt the correlation matrix
      melted_cormat <- melt(cormUpper, na.rm = TRUE)
      colnames(melted_cormat) <- c("CpG_Num1", "CpG_Num2", "value")
    
    ## Plot the heatmap
      gghm <- ggplot(data = melted_cormat, aes(CpG_Num2, CpG_Num1, fill = value)) +
        geom_tile(color = "white") +
        scale_fill_gradient2(low = "red", high = "blue", mid = "white",
            midpoint = 0, limit = c(-1, 1), space = "Lab", name = "Pearson\nCorrelation") +
        theme_minimal() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1,
              size = 9, hjust = 1)) +
        coord_fixed()
      
      gghm2 <- gghm + 
        geom_text(aes(CpG_Num2, CpG_Num1, label = value), color = "white", size = 1.5) +
        theme(
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          legend.justification = c(1, 0),
          legend.position = c(0.6, 0.7),
          legend.direction = "horizontal")+
        guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
              title.position = "top", title.hjust = 0.5))
    
      assign(paste(labs[i],"gghm",sep="_"),gghm2)
    
}
CRH_gghm
CRHR1_gghm
GR_gghm
FKBP5_gghm

Next for hippocampus:

library(reshape2)
library(ggplot2)
ds <- ds.save

ds.hyp <- subset(ds, ds$Treatment == "Hypothalamus")
ds.bl <- subset(ds, ds$Treatment == "Blood")
ds.hip <- subset(ds, ds$Treatment == "Hippocampus")

ds <- ds.hip

## Write a little function to extract upper triangle from correlation matrix
  get_upper_tri <- function(cormat){
    cormat[lower.tri(cormat)] <- NA
    return(cormat)
  }

CRHmat <- as.matrix(ds[, 7:17])
CRHRmat <- as.matrix(ds[, 30:48])
GRmat <- as.matrix(ds[, 49:62])
FKmat <- as.matrix(ds[, 19:29])
mats<-list(CRHmat, CRHRmat, GRmat, FKmat)
mats.hip<-mats
labs<-c("CRH","CRHR1","GR","FKBP5")

for(i in 1:4){
    USE <- mats[[i]]

    corm <- round(cor(USE, use = "pairwise.complete.obs"), 2)
    cormUpper <- get_upper_tri(corm)
    
    ## Melt the correlation matrix
      melted_cormat <- melt(cormUpper, na.rm = TRUE)
      colnames(melted_cormat) <- c("CpG_Num1", "CpG_Num2", "value")
    
    ## Plot the heatmap
      gghm <- ggplot(data = melted_cormat, aes(CpG_Num2, CpG_Num1, fill = value)) +
        geom_tile(color = "white") +
        scale_fill_gradient2(low = "red", high = "blue", mid = "white",
            midpoint = 0, limit = c(-1, 1), space = "Lab", name = "Pearson\nCorrelation") +
        theme_minimal() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1,
              size = 9, hjust = 1)) +
        coord_fixed()
      
      gghm2 <- gghm + 
        geom_text(aes(CpG_Num2, CpG_Num1, label = value), color = "white", size = 1.5) +
        theme(
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          legend.justification = c(1, 0),
          legend.position = c(0.6, 0.7),
          legend.direction = "horizontal")+
        guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
              title.position = "top", title.hjust = 0.5))
    
      assign(paste(labs[i],"gghm",sep="_"),gghm2)
    
}
CRH_gghm
CRHR1_gghm
GR_gghm
FKBP5_gghm

And last for hypothalamus:

library(reshape2)
library(ggplot2)
ds <- ds.save

ds.hyp <- subset(ds, ds$Treatment == "Hypothalamus")
ds.bl <- subset(ds, ds$Treatment == "Blood")
ds.hip <- subset(ds, ds$Treatment == "Hippocampus")

ds <- ds.hyp

## Write a little function to extract upper triangle from correlation matrix
  get_upper_tri <- function(cormat){
    cormat[lower.tri(cormat)] <- NA
    return(cormat)
  }

CRHmat <- as.matrix(ds[, 7:17])
CRHRmat <- as.matrix(ds[, 30:48])
GRmat <- as.matrix(ds[, 49:62])
FKmat <- as.matrix(ds[, 19:29])
mats<-list(CRHmat, CRHRmat, GRmat, FKmat)
mats.hyp<-mats
labs<-c("CRH","CRHR1","GR","FKBP5")

for(i in 1:4){
    USE <- mats[[i]]

    corm <- round(cor(USE, use = "pairwise.complete.obs"), 2)
    cormUpper <- get_upper_tri(corm)
    
    ## Melt the correlation matrix
      melted_cormat <- melt(cormUpper, na.rm = TRUE)
      colnames(melted_cormat) <- c("CpG_Num1", "CpG_Num2", "value")
    
    ## Plot the heatmap
      gghm <- ggplot(data = melted_cormat, aes(CpG_Num2, CpG_Num1, fill = value)) +
        geom_tile(color = "white") +
        scale_fill_gradient2(low = "red", high = "blue", mid = "white",
            midpoint = 0, limit = c(-1, 1), space = "Lab", name = "Pearson\nCorrelation") +
        theme_minimal() + 
        theme(axis.text.x = element_text(angle = 45, vjust = 1,
              size = 9, hjust = 1)) +
        coord_fixed()
      
      gghm2 <- gghm + 
        geom_text(aes(CpG_Num2, CpG_Num1, label = value), color = "white", size = 1.5) +
        theme(
          axis.title.x = element_blank(),
          axis.title.y = element_blank(),
          panel.grid.major = element_blank(),
          panel.border = element_blank(),
          panel.background = element_blank(),
          axis.ticks = element_blank(),
          legend.justification = c(1, 0),
          legend.position = c(0.6, 0.7),
          legend.direction = "horizontal")+
        guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
              title.position = "top", title.hjust = 0.5))
    
      assign(paste(labs[i],"gghm",sep="_"),gghm2)
    
}
CRH_gghm
CRHR1_gghm
GR_gghm
FKBP5_gghm
ds <- ds.save

Even though the sample size isn’t huge, it does seem like the correlation structure in the brain might be a bit different than in the blood and might differ between the two brain regions. That kind of complicates things for comparing the averages, but for now I’m still going to just compare the assay level averages as we calculated them above so that this is directly comparable. To get the averages I first want to standardize within each CpG as above. Before doing that though, I do want to check if the overall methylation levels differ between blood and brain.

Overall methylation levels

I’m going to pool Ithaca and Wyoming here and plot the overall level of methylation for blood, hippocampus, and hypothalamus at each individual CpG site for each of the four genes. Note that CRHR1 has a different y axis scale because all values are pretty low.

  # I saved the matrix lists used for the correlation plots from above for each of the sample types to use here.
    CRH.P <- list(mats.bl[[1]],mats.hip[[1]],mats.hyp[[1]])
    plot(1,1,type="n",yaxt="n",xaxt="n",ylab="Methylation percent",xlab="CpG Number",bty="n",
         xlim=c(0,ncol(CRH.P[[1]])+.8),ylim=c(0,100),main="CRH",xaxs="i",yaxs="i")
    axis(1,seq(-10,50,2))
    axis(2,seq(-100,200,20),las=2)
    r<-seq(1,ncol(CRH.P[[1]]),1)
    offs<-c(-.1,0,.1)
    colx <- c("coral3","chartreuse3","slateblue")
    for(i in 1:3){
      mus<-rep(NA,ncol(CRH.P[[i]]))
      sds<-rep(NA,ncol(CRH.P[[i]]))
      for(w in 1:ncol(CRH.P[[i]])){
        mus[w] <- mean(na.omit(CRH.P[[i]][,w]))
        sds[w] <- mean(na.omit(CRH.P[[i]][,w]))
      }
      for(k in 1:ncol(CRH.P[[1]])){
        lines(rep(r[k],2)+offs[i],c(mean(na.omit(CRH.P[[i]][,k])) + sd(na.omit(CRH.P[[i]][,k])),
                            mean(na.omit(CRH.P[[i]][,k])) - sd(na.omit((CRH.P[[i]][,k])))))
      }
      lines(r+offs[i],mus,lwd=2,col=colx[i])
      points(r+offs[i],mus,pch=21,bg=colx[i])
      legend("topright",c("Blood","Hippocampus","Hypothalamus"),lwd=2,pch=21,col=colx,pt.bg=colx,bty="n")
    }

    CRHR1.P <- list(mats.bl[[2]],mats.hip[[2]],mats.hyp[[2]])
    plot(1,1,type="n",yaxt="n",xaxt="n",ylab="Methylation percent",xlab="CpG Number",bty="n",
         xlim=c(0,ncol(CRHR1.P[[1]])+.8),ylim=c(0,30),main="CRHR1",xaxs="i",yaxs="i")
    axis(1,seq(-10,50,2))
    axis(2,seq(-100,200,10),las=2)
    r<-seq(1,ncol(CRHR1.P[[1]]),1)
    offs<-c(-.1,0,.1)
    colx <- c("coral3","chartreuse3","slateblue")
    for(i in 1:3){
      mus<-rep(NA,ncol(CRHR1.P[[i]]))
      sds<-rep(NA,ncol(CRHR1.P[[i]]))
      for(w in 1:ncol(CRHR1.P[[i]])){
        mus[w] <- mean(na.omit(CRHR1.P[[i]][,w]))
        sds[w] <- mean(na.omit(CRHR1.P[[i]][,w]))
      }
      for(k in 1:ncol(CRHR1.P[[1]])){
        lines(rep(r[k],2)+offs[i],c(mean(na.omit(CRHR1.P[[i]][,k])) + sd(na.omit(CRHR1.P[[i]][,k])),
                            mean(na.omit(CRHR1.P[[i]][,k])) - sd(na.omit((CRHR1.P[[i]][,k])))))
      }
      lines(r+offs[i],mus,lwd=2,col=colx[i])
      points(r+offs[i],mus,pch=21,bg=colx[i])
      legend("topright",c("Blood","Hippocampus","Hypothalamus"),lwd=2,pch=21,col=colx,pt.bg=colx,bty="n")
    }

    GR.P <- list(mats.bl[[3]],mats.hip[[3]],mats.hyp[[3]])
    plot(1,1,type="n",yaxt="n",xaxt="n",ylab="Methylation percent",xlab="CpG Number",bty="n",
         xlim=c(0,ncol(GR.P[[1]])+.8),ylim=c(0,100),main="GR",xaxs="i",yaxs="i")
    axis(1,seq(-10,50,2))
    axis(2,seq(-100,200,20),las=2)
    r<-seq(1,ncol(GR.P[[1]]),1)
    offs<-c(-.1,0,.1)
    colx <- c("coral3","chartreuse3","slateblue")
    for(i in 1:3){
      mus<-rep(NA,ncol(GR.P[[i]]))
      sds<-rep(NA,ncol(GR.P[[i]]))
      for(w in 1:ncol(GR.P[[i]])){
        mus[w] <- mean(na.omit(GR.P[[i]][,w]))
        sds[w] <- mean(na.omit(GR.P[[i]][,w]))
      }
      for(k in 1:ncol(GR.P[[1]])){
        lines(rep(r[k],2)+offs[i],c(mean(na.omit(GR.P[[i]][,k])) + sd(na.omit(GR.P[[i]][,k])),
                            mean(na.omit(GR.P[[i]][,k])) - sd(na.omit((GR.P[[i]][,k])))))
      }
      lines(r+offs[i],mus,lwd=2,col=colx[i])
      points(r+offs[i],mus,pch=21,bg=colx[i])
      legend("topright",c("Blood","Hippocampus","Hypothalamus"),lwd=2,pch=21,col=colx,pt.bg=colx,bty="n")
    }

    FK.P <- list(mats.bl[[4]],mats.hip[[4]],mats.hyp[[4]])
    plot(1,1,type="n",yaxt="n",xaxt="n",ylab="Methylation percent",xlab="CpG Number",bty="n",
         xlim=c(0,ncol(FK.P[[1]])+.8),ylim=c(0,100),main="FK",xaxs="i",yaxs="i")
    axis(1,seq(-10,50,2))
    axis(2,seq(-100,200,20),las=2)
    r<-seq(1,ncol(FK.P[[1]]),1)
    offs<-c(-.1,0,.1)
    colx <- c("coral3","chartreuse3","slateblue")
    for(i in 1:3){
      mus<-rep(NA,ncol(FK.P[[i]]))
      sds<-rep(NA,ncol(FK.P[[i]]))
      for(w in 1:ncol(FK.P[[i]])){
        mus[w] <- mean(na.omit(FK.P[[i]][,w]))
        sds[w] <- mean(na.omit(FK.P[[i]][,w]))
      }
      for(k in 1:ncol(FK.P[[1]])){
        lines(rep(r[k],2)+offs[i],c(mean(na.omit(FK.P[[i]][,k])) + sd(na.omit(FK.P[[i]][,k])),
                            mean(na.omit(FK.P[[i]][,k])) - sd(na.omit((FK.P[[i]][,k])))))
      }
      lines(r+offs[i],mus,lwd=2,col=colx[i])
      points(r+offs[i],mus,pch=21,bg=colx[i])
      legend("topright",c("Blood","Hippocampus","Hypothalamus"),lwd=2,pch=21,col=colx,pt.bg=colx,bty="n")
    }

So each of the different genes seems to have a different pattern of whether overall methylation is similar between the three tissue types. Based on this I think I need to scale and calculate assay averages separately for each tissue because it doesn’t make any sense to lump them together. I do that in the next section.

Scale and calculate assay average methylation

This is similar to what I did in the above experiments, but requires a little bit of extra coding becaues I have to first separate out the three tissue types and scale them separately then merge back into a single data frame with one row per sample with columns for each tissue.

ds.bl <- subset(ds, ds$Treatment == "Blood")
ds.hip <- subset(ds, ds$Treatment == "Hippocampus")
ds.hyp <- subset(ds, ds$Treatment == "Hypothalamus")

dss.bl <- as.data.frame(scale(ds.bl[,7:62]))
for(i in 1:nrow(dss.bl)){
  dss.bl$CRH[i] <- mean(na.omit(t(dss.bl[i,1:12])))
  dss.bl$CRHR1[i] <- mean(na.omit(t(dss.bl[i,24:42])))
  dss.bl$FK[i] <- mean(na.omit(t(dss.bl[i,13:23])))
  dss.bl$GRn17t8[i] <- mean(na.omit(t(dss.bl[i,44:53])))
  dss.bl$GRn6t5[i] <- mean(na.omit(t(dss.bl[i,55:56])))
}
dss.bl <- cbind(ds.bl[,1:6],dss.bl)
dss.bl <- dss.bl[,c("SampleID","Band","Capture","Treatment","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
colnames(dss.bl) <- c("SampleID","Band","Capture","Treatment",paste(colnames(dss.bl)[5:11],".bl",sep=""))

dss.hip <- as.data.frame(scale(ds.hip[,7:62]))
for(i in 1:nrow(dss.hip)){
  dss.hip$CRH[i] <- mean(na.omit(t(dss.hip[i,1:12])))
  dss.hip$CRHR1[i] <- mean(na.omit(t(dss.hip[i,24:42])))
  dss.hip$FK[i] <- mean(na.omit(t(dss.hip[i,13:23])))
  dss.hip$GRn17t8[i] <- mean(na.omit(t(dss.hip[i,44:53])))
  dss.hip$GRn6t5[i] <- mean(na.omit(t(dss.hip[i,55:56])))
}
dss.hip <- cbind(ds.hip[,1:6],dss.hip)
dss.hip <- dss.hip[,c("SampleID","Band","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
colnames(dss.hip) <- c("SampleID.hip","Band",paste(colnames(dss.hip)[3:9],".hip",sep=""))

dss.hyp <- as.data.frame(scale(ds.hyp[,7:62]))
for(i in 1:nrow(dss.hyp)){
  dss.hyp$CRH[i] <- mean(na.omit(t(dss.hyp[i,1:12])))
  dss.hyp$CRHR1[i] <- mean(na.omit(t(dss.hyp[i,24:42])))
  dss.hyp$FK[i] <- mean(na.omit(t(dss.hyp[i,13:23])))
  dss.hyp$GRn17t8[i] <- mean(na.omit(t(dss.hyp[i,44:53])))
  dss.hyp$GRn6t5[i] <- mean(na.omit(t(dss.hyp[i,55:56])))
}
dss.hyp <- cbind(ds.hyp[,1:6],dss.hyp)
dss.hyp <- dss.hyp[,c("SampleID","Band","CRH","CRHR1","FK","GRn18","GRn17t8","GRn7","GRn6t5")]
colnames(dss.hyp) <- c("SampleID.hyp","Band",paste(colnames(dss.hyp)[3:9],".hyp",sep=""))

dss <- join(dss.bl, dss.hyp, "Band")
dss <- join(dss, dss.hip, "Band")

dss$Pop <- as.factor(substring(as.character(dss$SampleID),1,1))

Compare methylation averages across tissue types

Now I’m going to make scatterplots comparing all three tissue types when measured from the same individuals. Here there are three plots per bird to account for each of the possible comparisons. I’ve colored the points by population, but am not explicitly comparing population differences here.

palette(c("green","yellow"))
p.comp<-function(x,y,cols,tit="GENE",X="Tissue 1",Y="Tissue 2"){
  pal<-c()
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(-3,3),ylim=c(-3,3),main=tit)
  abline(lm(y~x))
  axis(1,seq(-10,10,2))
  axis(2,seq(-10,10,2),las=2)
  points(x,y,pch=21,bg=cols)
  text(2,-2.5,paste("r = ",round(cor(x,y,use="pairwise.complete.obs"),2)))
  legend("topleft",c("Ithaca","Wyoming"),pch=21,pt.bg=c("green","yellow"),bty="n")
}

p.comp(dss$CRH.bl, dss$CRH.hip, dss$Pop, "CRH", X = "Blood", Y = "Hippocampus")
p.comp(dss$CRH.bl, dss$CRH.hyp, dss$Pop, "CRH", X = "Blood", Y = "Hypothalamus")
p.comp(dss$CRH.hip, dss$CRH.hyp, dss$Pop, "CRH", X = "Hippocampus", Y = "Hypothalamus")

p.comp(dss$CRHR1.bl, dss$CRHR1.hip, dss$Pop, "CRHR1", X = "Blood", Y = "Hippocampus")
p.comp(dss$CRHR1.bl, dss$CRHR1.hyp, dss$Pop, "CRHR1", X = "Blood", Y = "Hypothalamus")
p.comp(dss$CRHR1.hip, dss$CRHR1.hyp, dss$Pop, "CRHR1", X = "Hippocampus", Y = "Hypothalamus")

p.comp(dss$FK.bl, dss$FK.hip, dss$Pop, "Fkbp5", X = "Blood", Y = "Hippocampus")
p.comp(dss$FK.bl, dss$FK.hyp, dss$Pop, "Fkbp5", X = "Blood", Y = "Hypothalamus")
p.comp(dss$FK.hip, dss$FK.hyp, dss$Pop, "Fkbp5", X = "Hippocampus", Y = "Hypothalamus")

p.comp(dss$GRn18.bl, dss$GRn18.hip, dss$Pop, "GRn18", X = "Blood", Y = "Hippocampus")
p.comp(dss$GRn18.bl, dss$GRn18.hyp, dss$Pop, "GRn18", X = "Blood", Y = "Hypothalamus")
p.comp(dss$GRn18.hip, dss$GRn18.hyp, dss$Pop, "GRn18", X = "Hippocampus", Y = "Hypothalamus")

p.comp(dss$GRn17t8.bl, dss$GRn17t8.hip, dss$Pop, "GRn17t8", X = "Blood", Y = "Hippocampus")
p.comp(dss$GRn17t8.bl, dss$GRn17t8.hyp, dss$Pop, "GRn17t8", X = "Blood", Y = "Hypothalamus")
p.comp(dss$GRn17t8.hip, dss$GRn17t8.hyp, dss$Pop, "GRn17t8", X = "Hippocampus", Y = "Hypothalamus")

p.comp(dss$GRn7.bl, dss$GRn7.hip, dss$Pop, "GRn7", X = "Blood", Y = "Hippocampus")
p.comp(dss$GRn7.bl, dss$GRn7.hyp, dss$Pop, "GRn7", X = "Blood", Y = "Hypothalamus")
p.comp(dss$GRn7.hip, dss$GRn7.hyp, dss$Pop, "GRn7", X = "Hippocampus", Y = "Hypothalamus")

p.comp(dss$GRn6t5.bl, dss$GRn6t5.hip, dss$Pop, "GRn6t5", X = "Blood", Y = "Hippocampus")
p.comp(dss$GRn6t5.bl, dss$GRn6t5.hyp, dss$Pop, "GRn6t5", X = "Blood", Y = "Hypothalamus")
p.comp(dss$GRn6t5.hip, dss$GRn6t5.hyp, dss$Pop, "GRn6t5", X = "Hippocampus", Y = "Hypothalamus")

So….it seems that methylation isn’t all that well correlated across these different tissues. One thing to keep in mind though is that these are averages across the entire assay and looking at the correlation structure above we know that it differs a lot in the different tissues, so perhaps it isn’t surprising that averages wouldn’t match up well. Of the two individual CpGs plotted here (GRn18 and GRn7) one of them has a very strong relationship between all three tissue types (though note that this is one that had the weird bimodal distribution). It may be worth looking at within bird correlations at individual CpGs rather than at these assay averages.

I’m not going to make 56 x 3 separate plots for each individual CpG site, but I can plot just the correlation coefficients for each individual site as you move along the assay. Below I’ve done that for each comparison for each gene.

ds.bl <- subset(ds, ds$Treatment == "Blood")
ds.hip <- subset(ds, ds$Treatment == "Hippocampus")
ds.hyp <- subset(ds, ds$Treatment == "Hypothalamus")

colnames(ds.bl)[7:62]<-paste(colnames(ds.bl)[7:62],".bl",sep="")
colnames(ds.hip)[7:62]<-paste(colnames(ds.hip)[7:62],".hip",sep="")
colnames(ds.hyp)[7:62]<-paste(colnames(ds.hyp)[7:62],".hyp",sep="")

ds.a <- join(ds.bl, ds.hip, "Band")
ds.a <- join(ds.a, ds.hyp, "Band")

cors <- as.data.frame(matrix(nrow=56, ncol=4))
colnames(cors) <- c("CpG", "Blo_Hip", "Blo_Hyp", "Hip_Hyp")
cors$CpG <- colnames(ds)[7:62]

for(i in 1:nrow(cors)){
  cors$Blo_Hip[i] <- cor(ds.a[,i+6], ds.a[,i+67], use = "pairwise.complete.obs")
  cors$Blo_Hyp[i] <- cor(ds.a[,i+6], ds.a[,i+128], use = "pairwise.complete.obs")
  cors$Hip_Hyp[i] <- cor(ds.a[,i+67], ds.a[,i+128], use = "pairwise.complete.obs")
}

colx<-c("coral3","chartreuse3","slateblue")
p.comp<-function(x,y,z,q,cols,tit="GENE",X="CpG Number",Y="Correlation"){
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(0,length(x)+1),ylim=c(-1.1,1.1),main=tit)
  abline(h = 0, lty = 2, col = "gray40")
  axis(1,seq(-10,100,2))
  axis(2,seq(-10,10,1),las=2)
  lines(x,y,lwd=2,col=cols[1])
  lines(x,z,lwd=2,col=cols[2])
  lines(x,q,lwd=2,col=cols[3])
  points(x,y,pch=21,bg=cols[1])
  points(x,z,pch=21,bg=cols[2])
  points(x,q,pch=21,bg=cols[3])
  text(2,-2.5,paste("r = ",round(cor(x,y,use="pairwise.complete.obs"),2)))
  legend("topleft",c("Bld-Hip","Bld-Hyp","Hip-Hyp"),lwd=2,pch=21,pt.bg=cols,bty="n",col=cols)
}

p.comp(seq(1,11,1),cors$Blo_Hip[1:11],cors$Blo_Hyp[1:11],cors$Hip_Hyp[1:11],cols=colx,tit="Crh")

p.comp(seq(1,11,1),cors$Blo_Hip[13:23],cors$Blo_Hyp[13:23],cors$Hip_Hyp[13:23],cols=colx,tit="Fkbp5")

p.comp(seq(1,19,1),cors$Blo_Hip[24:42],cors$Blo_Hyp[24:42],cors$Hip_Hyp[24:42],cols=colx,tit="Crhr1")

p.comp(seq(1,14,1),cors$Blo_Hip[43:56],cors$Blo_Hyp[43:56],cors$Hip_Hyp[43:56],cols=colx,tit="GR")

So here it does look like there are some clear relationships between methylation in the three tissue types, but it is much more complicated than simply a positive correlation across all tissues. Some CpGs are highly correlated at all three tissues but others are not, some are negatively correlated, etc. One comparison might be to ask how many strong correlations (positive or negative) we are seeing here compared to expected just by chance. I’ve plotted that below by comparing the observed correlations to the expectation if methylation was randomly distributed across sample types. For this plot I’ve just lumped all four genes, but a similar thing could be done separately for each gene. I’m plotting the absolute value of correlations so that strong + or - correlations both appear closer to 1 here.

hh <- rep(NA,1e4)
for(i in 1:1e4){
  hh[i] <- cor(rnorm(24,0,1),rnorm(24,0,1))
}

library(rethinking)
dens(abs(hh),col="gray50",bty="n",yaxt="n",xaxt="n",xlab="Absolute correlation",xlim=c(-.05,1),lwd=2)
axis(1,seq(-2,2,.2))
axis(2,seq(-5,5,1))
dens(abs(cors$Blo_Hip),col="coral3",add=TRUE,lwd=2)
dens(abs(cors$Blo_Hyp),col="chartreuse3",add=TRUE,lwd=2)
dens(abs(cors$Hip_Hyp),col="slateblue",add=TRUE,lwd=2)
legend("topright",c("Null","Bld-Hip","Bld-Hyp","Hip-Hyp"),bty="n",lwd=2,
       col=c("gray50","coral3","chartreuse3","slateblue"))

So all three of the tissue comparisons have many more strong correlations than you would expect by chance, though the strong correlations are mixed in with lots of weak correlations too. It seems that the relationships between tissue types are very gene and CpG site specific and not broadly generalizable even within these pretty narrow assays.

Does methylation vary by location

The last thing I want to do for now is just to plot tissue specific differences in methylation for each gene across the two locations. Given the patterns above, the assay level averages may not be the best thing to use here, but for the sake of simplicity I’m just going to start with those. This can of course be expanded to individual CpG sites later on.

colx<-c("green","yellow")
p.comp<-function(x,y,z,cols,tit="GENE",X="",Y="Methylation (sd)"){
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(0,11),ylim=c(-3,3),main=tit)
  abline(h = 0, lty = 2, col = "gray40")
  axis(1,c(-5,1.5,5.5,9.5,15),labels=c("","Blood","Hippo","Hypo",""))
  axis(2,seq(-10,10,1),las=2)
  
  gapl <- -.3
  gapu <- .1
  points(rep(1,12)+runif(12,gapl,gapu),x[1:12],pch=21,bg=colx[1])
  points(rep(2,12)+runif(12,gapl,gapu),x[13:24],pch=21,bg=colx[2])
  lines(rep(1+gapu*2,2),c(mean(na.omit(x[1:12]))+sd(na.omit(x[1:12])),
                          mean(na.omit(x[1:12]))-sd(na.omit(x[1:12]))))
  lines(rep(2+gapu*2,2),c(mean(na.omit(x[13:24]))+sd(na.omit(x[13:24])),
                          mean(na.omit(x[13:24]))-sd(na.omit(x[13:24]))))
  points(1+gapu*2,mean(na.omit(x[1:12])),pch=22,col=colx[1],bg="black",cex=1.5)
  points(2+gapu*2,mean(na.omit(x[13:24])),pch=22,col=colx[2],bg="black",cex=1.5)
  
  points(rep(5,12)+runif(12,gapl,gapu),y[1:12],pch=21,bg=colx[1])
  points(rep(6,12)+runif(12,gapl,gapu),y[13:24],pch=21,bg=colx[2])
  lines(rep(5+gapu*2,2),c(mean(na.omit(y[1:12]))+sd(na.omit(y[1:12])),
                          mean(na.omit(y[1:12]))-sd(na.omit(y[1:12]))))
  lines(rep(6+gapu*2,2),c(mean(na.omit(y[13:24]))+sd(na.omit(y[13:24])),
                          mean(na.omit(y[13:24]))-sd(na.omit(y[13:24]))))
  points(5+gapu*2,mean(na.omit(y[1:12])),pch=22,col=colx[1],bg="black",cex=1.5)
  points(6+gapu*2,mean(na.omit(y[13:24])),pch=22,col=colx[2],bg="black",cex=1.5)
  
  points(rep(9,12)+runif(12,gapl,gapu),z[1:12],pch=21,bg=colx[1])
  points(rep(10,12)+runif(12,gapl,gapu),z[13:24],pch=21,bg=colx[2])
  lines(rep(9+gapu*2,2),c(mean(na.omit(z[1:12]))+sd(na.omit(z[1:12])),
                          mean(na.omit(z[1:12]))-sd(na.omit(z[1:12]))))
  lines(rep(10+gapu*2,2),c(mean(na.omit(z[13:24]))+sd(na.omit(z[13:24])),
                          mean(na.omit(z[13:24]))-sd(na.omit(z[13:24]))))
  points(9+gapu*2,mean(na.omit(z[1:12])),pch=22,col=colx[1],bg="black",cex=1.5)
  points(10+gapu*2,mean(na.omit(z[13:24])),pch=22,col=colx[2],bg="black",cex=1.5)
  
  
  legend("topleft",c("Ithaca","Wyoming"),pch=21,pt.bg=colx,bty="n")
}

p.comp(dss$CRH.bl,dss$CRH.hip,dss$CRH.hyp,cols=colx,tit="Crh")

p.comp(dss$CRHR1.bl,dss$CRHR1.hip,dss$CRHR1.hyp,cols=colx,tit="Crhr1")

p.comp(dss$FK.bl,dss$FK.hip,dss$FK.hyp,cols=colx,tit="Fkbp5")

p.comp(dss$GRn18.bl,dss$GRn18.hip,dss$GRn18.hyp,cols=colx,tit="GRn18")

p.comp(dss$GRn17t8.bl,dss$GRn17t8.hip,dss$GRn17t8.hyp,cols=colx,tit="GRn17t8")

p.comp(dss$GRn7.bl,dss$GRn7.hip,dss$GRn7.hyp,cols=colx,tit="GRn7")

p.comp(dss$GRn6t5.bl,dss$GRn6t5.hip,dss$GRn6t5.hyp,cols=colx,tit="GRn6t5")

There are a few spots where it looks like there might be a trend for a population difference, but none of them are super striking and with this small sample size it would have to be a pretty large difference. Again, there might be some more individual CpGs that do differ and I could reproduce similar plots at a CpG rather than assay level as well.

Flexibility in methylation?

Given the results above, it seems worth asking whether any of these CpG sites change in methylation enough that we might expect to see relationships.

Identify individual CpG sites that change

One approach to moving from these average assay level metrics to more specific analyses of individual CpG sites might be to identify only those sites that seem to be flexible. Given the high stability in averages, I expect that most sites are ‘fixed’ at least on the time scale of our experiments. For this I’m combining the 2017 & 2018 experimental data and looking at the strength of correlation between 1st and 3rd capture samples at each individual CpG site (here I’m dropping the 2nd capture from 2018 for convenience).

library(plyr)
ds <- read.delim("Input_Samples.txt")
ds[ds == 0] <- NA
ds <- subset(ds, ds$Treatment != "Blood" & ds$Treatment != "Hippocampus" & ds$Treatment != "Hypothalamus")
ds.1 <- subset(ds, ds$Capture ==1)
ds.3 <- subset(ds, ds$Capture ==3)
ds.3 <- ds.3[,c(3,7:ncol(ds.3))]
colnames(ds.3)[2:ncol(ds.3)] <- paste(colnames(ds.3)[2:ncol(ds.3)],".3",sep="")
ds.j <- join(ds.1, ds.3, "Band")

cors <- as.data.frame(matrix(nrow = 56, ncol = 2))
colnames(cors) <- c("CpG", "Correlation")
cors$CpG <- colnames(ds)[7:62]
for(i in 1:nrow(cors)){
  cors$Correlation[i] <- cor(ds.j[,i+6], ds.j[,i+62], use = "pairwise.complete.obs")
}

 plot(1,1,type="n",yaxs="i",xaxs="i",ylim=c(0,1),xlim=c(0,20),yaxt="n",xaxt="n",
       xlab="CpG Sites",ylab="Pearson correlation",bty="n")
  axis(1,seq(-2,22,2),cex=0.9)
  axis(2,seq(-20,120,.5),las=2)
  abline(h=.5, lty=2, col="gray40")
  
  CRHo <- .1
  CRHRo <- -.1
  GRo <- .2
  FKo <- -.2
  
  r<-seq(1,20,1)
  
  lines(seq(1,length(cors$Correlation[1:12]),1)+CRHo,cors$Correlation[1:12],lwd=2,col="orange")
  points(seq(1,length(cors$Correlation[1:12]),1)+CRHo,cors$Correlation[1:12],pch=21,bg="orange")
  
  lines(seq(1,length(cors$Correlation[43:56]),1)+GRo,cors$Correlation[43:56],lwd=2,col="lightblue")
  points(seq(1,length(cors$Correlation[43:56]),1)+GRo,cors$Correlation[43:56],pch=21,bg="lightblue")
  
  lines(seq(1,length(cors$Correlation[24:42]),1)+CRHRo,cors$Correlation[24:42],lwd=2,col="coral3")
  points(seq(1,length(cors$Correlation[24:42]),1)+CRHRo,cors$Correlation[24:42],pch=21,bg="coral3")
  
  lines(seq(1,length(cors$Correlation[13:23]),1)+FKo,cors$Correlation[13:23],lwd=2,col="chartreuse3")
  points(seq(1,length(cors$Correlation[13:23]),1)+FKo,cors$Correlation[13:23],pch=21,bg="chartreuse3")
  
  legend("bottomright",c("Crh","Crhr1","GR","Fkbp5"),lwd=2,pch=21,bty="n",
         pt.bg=c("orange","coral3","lightblue","chartreuse3"),
         col=c("orange","coral3","lightblue","chartreuse3"))

Almost all individual CpGs have very high correlations between 1st & 3rd captures. Again, this suggests that the assay is really measuring something stable about individual differences, but also that most of these sites are not flexible on the time scale of our experiments. Only 6 individual CpG sites out of 62 have correlation coefficients below 0.5. There isn’t anything particularly special about that number, but it seems like it might be a reasonable threshold and it provides a reasonable number of individual sites to look at. Four of those sites are in GR along with one each in Crh and Crhr1. Fkbp5 is almost perfectly correlated across time points within individuals.

Individual changes at flexible CpG sites

ds.j1 <- subset(ds.j, ds.j$Treatment == "Dulled" | ds.j$Treatment =="Control")
ds.j2 <- subset(ds.j, ds.j$Treatment != "Dulled" & ds.j$Treatment != "Control")

palette(c("slateblue","orange","chartreuse3","coral3","gray50","lightblue"))
collist <- ds.j1$Treatment
collist <- gsub("Dulled", "slateblue", collist)
collist <- gsub("Control", "orange", collist)

p.comp<-function(x,y,cols,leg=1,tit="CPG",low=0,lim=100,X="Capture",Y="Methylation percent"){
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(0,2.2),ylim=c(low,lim),main=tit)
  axis(1,c(-5,1,2,5),labels=c("","1","3",""))
  axis(2,seq(-20,120,10),las=2)
  for(k in 1:length(x)){
    lines(c(1,2),c(x[k],y[k]),lwd=1,col=collist[k])
  }
  lines(rep(.9,2),c(mean(na.omit(x))+sd(na.omit(x)),
                    mean(na.omit(x))-sd(na.omit(x))),lwd=2)
  lines(rep(2.1,2),c(mean(na.omit(y))+sd(na.omit(y)),
                     mean(na.omit(y))-sd(na.omit(y))),lwd=2)
  points(.9,mean(na.omit(x)),pch=22,cex=1.5,bg="black")
  points(2.1,mean(na.omit(y)),pch=22,cex=1.5,bg="black")
  if(leg==1){legend("topleft",c("Dulled","Control"),pch=21,pt.bg=palette()[1:2],bty="n")}
  if(leg==2){legend("topleft",c("D-C","D-S","C-S","C-P","D-P","C-D"),pch=21,pt.bg=palette()[1:6],bty="n")}
}

collist2 <- ds.j2$Treatment
collist2 <- gsub("Dull_Control", "slateblue", collist2)
collist2 <- gsub("Dull_Stress", "orange", collist2)
collist2 <- gsub("Control_Stress", "chartreuse3", collist2)
collist2 <- gsub("Control_Predator", "coral3", collist2)
collist2 <- gsub("Dull_Predator", "gray50", collist2)
collist2 <- gsub("Control_Control", "lightblue", collist2)

p.comp2<-function(x,y,cols,leg=1,tit="CPG",low=0,lim=100,X="Capture",Y="Methylation percent"){
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(0,2.2),ylim=c(low,lim),main=tit)
  axis(1,c(-5,1,2,5),labels=c("","1","3",""))
  axis(2,seq(-20,120,10),las=2)
  for(k in 1:length(x)){
    lines(c(1,2),c(x[k],y[k]),lwd=1,col=collist2[k])
  }
  lines(rep(.9,2),c(mean(na.omit(x))+sd(na.omit(x)),
                    mean(na.omit(x))-sd(na.omit(x))),lwd=2)
  lines(rep(2.1,2),c(mean(na.omit(y))+sd(na.omit(y)),
                     mean(na.omit(y))-sd(na.omit(y))),lwd=2)
  points(.9,mean(na.omit(x)),pch=22,cex=1.5,bg="black")
  points(2.1,mean(na.omit(y)),pch=22,cex=1.5,bg="black")
  if(leg==1){legend("topleft",c("Dulled","Control"),pch=21,pt.bg=palette()[1:2],bty="n")}
  if(leg==2){legend("topleft",c("D-C","D-S","C-S","C-P","D-P","C-D"),pch=21,pt.bg=palette()[1:6],bty="n")}
}

p.comp(ds.j1$CRHn78, ds.j1$CRHn78.3, cols = ds.j1$Treatment, lim=50, leg = 1, tit = "Crh -78 2017")
p.comp2(ds.j2$CRHn78, ds.j2$CRHn78.3, cols = ds.j2$Treatment, lim=50, leg = 2, tit = "Crh -78 2018")

p.comp(ds.j1$CRHR118, ds.j1$CRHR118.3, cols = ds.j1$Treatment, lim=15, leg = 1, tit = "Crhr 118 2017")
p.comp2(ds.j2$CRHR118, ds.j2$CRHR118.3, cols = ds.j2$Treatment, lim=15, leg = 2, tit = "Crhr 118 2018")

p.comp(ds.j1$GRn14, ds.j1$GRn14.3, cols = ds.j1$Treatment, lim=6, leg = 1, tit = "GR -14 2017")
axis(2,seq(-2,10,1),las=2)
p.comp2(ds.j2$GRn14, ds.j2$GRn14.3, cols = ds.j2$Treatment, lim=6, leg = 2, tit = "GR -14 2018")
axis(2,seq(-2,10,1),las=2)

p.comp(ds.j1$GRn9, ds.j1$GRn9.3, cols = ds.j1$Treatment, lim=65, leg = 1, tit = "GR -9 2017")
p.comp2(ds.j2$GRn9, ds.j2$GRn9.3, cols = ds.j2$Treatment, lim=65, leg = 2, tit = "GR -9 2018")

p.comp(ds.j1$GRn8, ds.j1$GRn8.3, cols = ds.j1$Treatment, low=10, lim=60, leg = 1, tit = "GR -8 2017")
p.comp2(ds.j2$GRn8, ds.j2$GRn8.3, cols = ds.j2$Treatment, low=10, lim=60, leg = 2, tit = "GR -8 2018")

p.comp(ds.j1$GRn6, ds.j1$GRn6.3, cols = ds.j1$Treatment, low=40, lim=80, leg = 1, tit = "GR -6 2017")
p.comp2(ds.j2$GRn6, ds.j2$GRn6.3, cols = ds.j2$Treatment, low=40, lim=80, leg = 2, tit = "GR -6 2018")

I’m not sure that these plots reveal anything super interesting in the end other than that there seems to be some seasonal regulation in methylation at some of these CpG sites.

Cross year changes in methylation?

dss <- as.data.frame(scale(ds[,7:62]))
for(i in 1:nrow(dss)){
  dss$CRH[i] <- mean(na.omit(t(dss[i,1:12])))
  dss$CRHR1[i] <- mean(na.omit(t(dss[i,24:42])))
  dss$FK[i] <- mean(na.omit(t(dss[i,13:23])))
  dss$GRn17t8[i] <- mean(na.omit(t(dss[i,44:53])))
  dss$GRn6t5[i] <- mean(na.omit(t(dss[i,55:56])))
}

dss <- cbind(ds[,1:6],dss)

dss.y <- subset(dss, dss$Capture == 1)
dss.y17 <- subset(dss.y, dss.y$Year == 2017)
dss.y18 <- subset(dss.y, dss.y$Year == 2018)

colnames(dss.y18) <- paste(colnames(dss.y18), ".18", sep="")
dss.y18$MATCH2 <- paste(dss.y18$Band.18, "2017", sep="_")
dss.y17$MATCH2 <- paste(dss.y17$Band, "2017", sep="_")

dss.yy <- join(dss.y17, dss.y18, "MATCH2")
dss.yy <- subset(dss.yy, is.na(dss.yy$SampleID.18)==FALSE)

plot(1,1,type="n", xaxt="n", yaxt="n", ylim=c(-3.3, 3.3), xlim=c(-3.3, 3.3), ylab="2018 Methylation", xlab = "2017 Methylation",
     yaxs="i",xaxs="i", bty="n")
axis(1,seq(-5,5,1))
axis(2,seq(-5,5,1), las=2)
library(scales)

collist <- c(rep("slateblue", 12), rep("goldenrod", 11), rep("chartreuse3",19), rep("coral3", 14))
for(i in 1:56){
  points(dss.yy[,i+6], dss.yy[,i+74], pch = 16, col = alpha(collist[i], 0.5))
}
abline(a=0,b=1)

legend("topleft", c("Crh","Crhr1","Fkbp5","Nr3c1"), pch=16, bty="n", col=c("slateblue","chartreuse3","goldenrod","coral3"))

In general there is a shockingly high correspondance between methylation in 2017 and methylation in 2018 among the 11 birds with first capture samples. Note that there are a few more samples that could be used here if we added 3rd captures and other comparisons but the result seems pretty clear here either way. It does seem like there may be some age related changes in Crhr1 towards increasing methylation…maybe related to age-related changes in stress responsivity that are often reported?

Age related changes in methylation.

Here I’m just plotting lines connecting 2017 and 2018 methylation measures at each CPG site for each of the 11 birds that have first capture samples in each year. It could be split out more but I think this is good enough to get a general impression.

plot(1,1,type="n", xaxt="n", yaxt="n", xlim=c(.5,2.5), ylim=c(-3.3, 3.3), ylab="Methylation", xlab = "Year",
     yaxs="i",xaxs="i", bty="n", main="Crh")
axis(1,seq(-5,5,1))
axis(2,seq(-5,5,1), las=2)

collist <- c(rep("slateblue", 12), rep("goldenrod", 11), rep("chartreuse3",19), rep("coral3", 14))
for(i in 1:11){
  for(k in 1:nrow(dss.yy)){
    lines(c(1,2),c(dss.yy[k,i+6], dss.yy[k,i+74]), col = "slateblue")
  }
}

plot(1,1,type="n", xaxt="n", yaxt="n", xlim=c(.5,2.5), ylim=c(-3.3, 3.3), ylab="Methylation", xlab = "Year",
     yaxs="i",xaxs="i", bty="n", main="Crhr")
axis(1,seq(-5,5,1))
axis(2,seq(-5,5,1), las=2)

collist <- c(rep("slateblue", 12), rep("goldenrod", 11), rep("chartreuse3",19), rep("coral3", 14))
for(i in 1:11){
  for(k in 1:nrow(dss.yy)){
    lines(c(1,2),c(dss.yy[k,i+29], dss.yy[k,i+97]), col = "chartreuse3")
  }
}

plot(1,1,type="n", xaxt="n", yaxt="n", xlim=c(.5,2.5), ylim=c(-3.3, 3.3), ylab="Methylation", xlab = "Year",
     yaxs="i",xaxs="i", bty="n", main="Fkbp5")
axis(1,seq(-5,5,1))
axis(2,seq(-5,5,1), las=2)

collist <- c(rep("slateblue", 12), rep("goldenrod", 11), rep("chartreuse3",19), rep("coral3", 14))
for(i in 1:11){
  for(k in 1:nrow(dss.yy)){
    lines(c(1,2),c(dss.yy[k,i+18], dss.yy[k,i+86]), col = "goldenrod")
  }
}


plot(1,1,type="n", xaxt="n", yaxt="n", xlim=c(.5,2.5), ylim=c(-3.3, 3.3), ylab="Methylation", xlab = "Year",
     yaxs="i",xaxs="i", bty="n", main="Nr3c1")
axis(1,seq(-5,5,1))
axis(2,seq(-5,5,1), las=2)
collist <- c(rep("slateblue", 12), rep("goldenrod", 11), rep("chartreuse3",19), rep("coral3", 14))
for(i in 1:11){
  for(k in 1:nrow(dss.yy)){
    lines(c(1,2),c(dss.yy[k,i+48], dss.yy[k,i+116]), col = "coral3")
  }
}

Nothing is super clear here but it does seem like there may be some age related changes in some genes. Would need to dig into this a bit more to really test it.

Is methylation correlated with GCs

Methylation averages vs. cort

Here I’m just plotting methylation at the same assay average levels vs. corticosterone (base, stress, dex, negfeed). I’m splitting up points and lines by capture and year in case there are any differences.

dc <- read.delim("Input_Cort_Fled.txt")
dc$MATCH <- paste(dc$fband, dc$year, dc$Capture, sep = "_")

dss <- as.data.frame(scale(ds[,7:62]))
for(i in 1:nrow(dss)){
  dss$CRH[i] <- mean(na.omit(t(dss[i,1:12])))
  dss$CRHR1[i] <- mean(na.omit(t(dss[i,24:42])))
  dss$FK[i] <- mean(na.omit(t(dss[i,13:23])))
  dss$GRn17t8[i] <- mean(na.omit(t(dss[i,44:53])))
  dss$GRn6t5[i] <- mean(na.omit(t(dss[i,55:56])))
}

dss <- cbind(ds[,1:6],dss)

dss$MATCH <- paste(ds$Band, ds$Year, ds$Capture, sep = "_")

dss2 <- join(dss, dc, "MATCH")
dss2$negf <- dss2$stress - dss2$dex

collist<-c("slateblue","orange","chartreuse3","coral3","gray50")

d17_1 <- subset(dss2, dss2$Capture == 1 & dss2$Year == 2017)
d17_3 <- subset(dss2, dss2$Capture == 3 & dss2$Year == 2017)
d18_1 <- subset(dss2, dss2$Capture == 1 & dss2$Year == 2018)
d18_2 <- subset(dss2, dss2$Capture == 2 & dss2$Year == 2018)
d18_3 <- subset(dss2, dss2$Capture == 3 & dss2$Year == 2018)

p.comp<-function(x1=c(998,999),x2=c(998,999),x3=c(998,999),x4=c(998,999),x5=c(998,999),
                 y1=rep(999,2),y2=rep(999,2),y3=rep(999,2),y4=rep(999,2),y5=rep(999,2),
                 leg=1,tit="CPG",
                 ticker=10,low=-2,lim=100,X="Methylation percent",Y="Cort"){
  plot(1,1,type="n",bty="n",ylab=Y,xlab=X,xaxt="n",yaxt="n",xaxs="i",yaxs="i",
       xlim=c(-3,3),ylim=c(low,lim),main=tit)
  axis(1,seq(-5,5,1))
  axis(2,seq(-20,120,ticker),las=2)
  
  points(x1, y1, pch=21, bg = collist[1])
  abline(lm(y1 ~ x1), col = collist[1])
  
  points(x2, y2, pch=21, bg = collist[2])
  abline(lm(y2 ~ x2), col = collist[2])
  
  points(x3, y3, pch=21, bg = collist[3])
  abline(lm(y3 ~ x3), col = collist[3])
  
  points(x4, y4, pch=21, bg = collist[4])
  abline(lm(y4 ~ x4), col = collist[4])
  
  points(x5, y5, pch=21, bg = collist[5])
  abline(lm(y5 ~ x5), col = collist[5])
 
  if(leg==1){legend("topleft",c("17_1","17_2","18_1","18_2","18_3"),pch=21,pt.bg=collist[1:5],bty="n")}
  if(leg==2){legend("topleft",c("17_1","18_1","18_3"),pch=21,pt.bg=collist[c(1,3,5)],bty="n")}
}

p.comp(d17_1$CRH, d17_3$CRH, d18_1$CRH, d18_2$CRH, d18_3$CRH, 
       d17_1$base, d17_3$base, d18_1$base, d18_2$base, d18_3$base, 
       tit = "Crh", Y = "Base cort", lim = 25)
p.comp(x1 = d17_1$CRH, x3 = d18_1$CRH, x5 = d18_3$CRH, 
       y1 = d17_1$stress, y3 = d18_1$stress, y5 = d18_3$stress, 
       tit = "Crh", Y = "Stress cort", lim = 90, leg = 2)
p.comp(x1 = d17_1$CRH, x3 = d18_1$CRH, x5 = d18_3$CRH, 
       y1 = d17_1$dex, y3 = d18_1$dex, y5 = d18_3$dex, 
       tit = "Crh", Y = "Dex cort", lim = 70, leg = 2)
p.comp(x1 = d17_1$CRH, x3 = d18_1$CRH, x5 = d18_3$CRH, 
       y1 = d17_1$negf, y3 = d18_1$negf, y5 = d18_3$negf, 
       tit = "Crh", Y = "Neg feed", lim = 70, leg = 2)

p.comp(d17_1$CRHR1, d17_3$CRHR1, d18_1$CRHR1, d18_2$CRHR1, d18_3$CRHR1, 
       d17_1$base, d17_3$base, d18_1$base, d18_2$base, d18_3$base, 
       tit = "Crhr1", Y = "Base cort", lim = 25)
p.comp(x1 = d17_1$CRHR1, x3 = d18_1$CRHR1, x5 = d18_3$CRHR1, 
       y1 = d17_1$stress, y3 = d18_1$stress, y5 = d18_3$stress, 
       tit = "Crhr1", Y = "Stress cort", lim = 90, leg = 2)
p.comp(x1 = d17_1$CRHR1, x3 = d18_1$CRHR1, x5 = d18_3$CRHR1, 
       y1 = d17_1$dex, y3 = d18_1$dex, y5 = d18_3$dex, 
       tit = "Crhr1", Y = "Dex cort", lim = 70, leg = 2)
p.comp(x1 = d17_1$CRHR1, x3 = d18_1$CRHR1, x5 = d18_3$CRHR1, 
       y1 = d17_1$negf, y3 = d18_1$negf, y5 = d18_3$negf, 
       tit = "Crhr1", Y = "Neg feed", lim = 70, leg = 2)

p.comp(d17_1$FK, d17_3$FK, d18_1$FK, d18_2$FK, d18_3$FK, 
       d17_1$base, d17_3$base, d18_1$base, d18_2$base, d18_3$base, 
       tit = "Fkbp5", Y = "Base cort", lim = 25)
p.comp(x1 = d17_1$FK, x3 = d18_1$FK, x5 = d18_3$FK, 
       y1 = d17_1$stress, y3 = d18_1$stress, y5 = d18_3$stress, 
       tit = "Fkbp5", Y = "Stress cort", lim = 90, leg = 2)
p.comp(x1 = d17_1$FK, x3 = d18_1$FK, x5 = d18_3$FK, 
       y1 = d17_1$dex, y3 = d18_1$dex, y5 = d18_3$dex, 
       tit = "Fkbp5", Y = "Dex cort", lim = 70, leg = 2)
p.comp(x1 = d17_1$FK, x3 = d18_1$FK, x5 = d18_3$FK, 
       y1 = d17_1$negf, y3 = d18_1$negf, y5 = d18_3$negf, 
       tit = "Fkbp5", Y = "Neg feed", lim = 70, leg = 2)

p.comp(d17_1$GRn18, d17_3$GRn18, d18_1$GRn18, d18_2$GRn18, d18_3$GRn18, 
       d17_1$base, d17_3$base, d18_1$base, d18_2$base, d18_3$base, 
       tit = "GRn18", Y = "Base cort", lim = 25)
p.comp(x1 = d17_1$GRn18, x3 = d18_1$GRn18, x5 = d18_3$GRn18, 
       y1 = d17_1$stress, y3 = d18_1$stress, y5 = d18_3$stress, 
       tit = "GRn18", Y = "Stress cort", lim = 90, leg = 2)
p.comp(x1 = d17_1$GRn18, x3 = d18_1$GRn18, x5 = d18_3$GRn18, 
       y1 = d17_1$dex, y3 = d18_1$dex, y5 = d18_3$dex, 
       tit = "GRn18", Y = "Dex cort", lim = 70, leg = 2)
p.comp(x1 = d17_1$GRn18, x3 = d18_1$GRn18, x5 = d18_3$GRn18, 
       y1 = d17_1$negf, y3 = d18_1$negf, y5 = d18_3$negf, 
       tit = "GRn18", Y = "Neg feed", lim = 70, leg = 2)

p.comp(d17_1$GRn17t8, d17_3$GRn17t8, d18_1$GRn17t8, d18_2$GRn17t8, d18_3$GRn17t8, 
       d17_1$base, d17_3$base, d18_1$base, d18_2$base, d18_3$base, 
       tit = "GRn17t8", Y = "Base cort", lim = 25)
p.comp(x1 = d17_1$GRn17t8, x3 = d18_1$GRn17t8, x5 = d18_3$GRn17t8, 
       y1 = d17_1$stress, y3 = d18_1$stress, y5 = d18_3$stress, 
       tit = "GRn17t8", Y = "Stress cort", lim = 90, leg = 2)
p.comp(x1 = d17_1$GRn17t8, x3 = d18_1$GRn17t8, x5 = d18_3$GRn17t8, 
       y1 = d17_1$dex, y3 = d18_1$dex, y5 = d18_3$dex, 
       tit = "GRn17t8", Y = "Dex cort", lim = 70, leg = 2)
p.comp(x1 = d17_1$GRn17t8, x3 = d18_1$GRn17t8, x5 = d18_3$GRn17t8, 
       y1 = d17_1$negf, y3 = d18_1$negf, y5 = d18_3$negf, 
       tit = "GRn17t8", Y = "Neg feed", lim = 70, leg = 2)

p.comp(d17_1$GRn7, d17_3$GRn7, d18_1$GRn7, d18_2$GRn7, d18_3$GRn7, 
       d17_1$base, d17_3$base, d18_1$base, d18_2$base, d18_3$base, 
       tit = "GRn7", Y = "Base cort", lim = 25)
p.comp(x1 = d17_1$GRn7, x3 = d18_1$GRn7, x5 = d18_3$GRn7, 
       y1 = d17_1$stress, y3 = d18_1$stress, y5 = d18_3$stress, 
       tit = "GRn7", Y = "Stress cort", lim = 90, leg = 2)
p.comp(x1 = d17_1$GRn7, x3 = d18_1$GRn7, x5 = d18_3$GRn7, 
       y1 = d17_1$dex, y3 = d18_1$dex, y5 = d18_3$dex, 
       tit = "GRn7", Y = "Dex cort", lim = 70, leg = 2)
p.comp(x1 = d17_1$GRn7, x3 = d18_1$GRn7, x5 = d18_3$GRn7, 
       y1 = d17_1$negf, y3 = d18_1$negf, y5 = d18_3$negf, 
       tit = "GRn7", Y = "Neg feed", lim = 70, leg = 2)

p.comp(d17_1$GRn6t5, d17_3$GRn6t5, d18_1$GRn6t5, d18_2$GRn6t5, d18_3$GRn6t5, 
       d17_1$base, d17_3$base, d18_1$base, d18_2$base, d18_3$base, 
       tit = "GRn6t5", Y = "Base cort", lim = 25)
p.comp(x1 = d17_1$GRn6t5, x3 = d18_1$GRn6t5, x5 = d18_3$GRn6t5, 
       y1 = d17_1$stress, y3 = d18_1$stress, y5 = d18_3$stress, 
       tit = "GRn6t5", Y = "Stress cort", lim = 90, leg = 2)
p.comp(x1 = d17_1$GRn6t5, x3 = d18_1$GRn6t5, x5 = d18_3$GRn6t5, 
       y1 = d17_1$dex, y3 = d18_1$dex, y5 = d18_3$dex, 
       tit = "GRn6t5", Y = "Dex cort", lim = 70, leg = 2)
p.comp(x1 = d17_1$GRn6t5, x3 = d18_1$GRn6t5, x5 = d18_3$GRn6t5, 
       y1 = d17_1$negf, y3 = d18_1$negf, y5 = d18_3$negf, 
       tit = "GRn6t5", Y = "Neg feed", lim = 70, leg = 2)

OK, so there are a coupld of places where stress cort or negative feedback seem to be correlated with methylation, but nothing is incredibly strong and there is some variation between the different captures and years. The next thing I want to do is make some plots comparing the strength of these relationships for each individual CpG site. Based on what we saw from the NGS data, the relationship could be quite different at different sites even within an assay.

Cort by methylation at each CpG

Here what I’m going to do is a kind of manhattan plot similar to what I did for the NGS data previously. The y axis will be p value and the x axis will be the CpG site. For now this is pooling over all years and captures. I could plot separately for each of the captures…

p.neg <- rep(NA, 56)
p.base <- rep(NA, 56)
p.str <- rep(NA, 56)
p.dex <- rep(NA< 56)
p.neg2 <- rep(NA, 56)
p.base2 <- rep(NA, 56)
p.str2 <- rep(NA, 56)
p.dex2 <- rep(NA< 56)
dss.save<-dss2

dss2.17 <- subset(dss2, dss2$Year == 2017)
dss2.18 <- subset(dss2, dss2$Year == 2018)

for(i in 1:56){
  p.neg[i] <- -log(summary(lm(dss2.17$negf ~ dss2.17[,i+6]))$coefficients[2, 4], 10)
  p.base[i] <- -log(summary(lm(dss2.17$base ~ dss2.17[,i+6]))$coefficients[2, 4], 10)
  p.str[i] <- -log(summary(lm(dss2.17$str ~ dss2.17[,i+6]))$coefficients[2, 4], 10)
  p.dex[i] <- -log(summary(lm(dss2.17$dex ~ dss2.17[,i+6]))$coefficients[2, 4], 10)
  p.neg2[i] <- -log(summary(lm(dss2.18$negf ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
  p.base2[i] <- -log(summary(lm(dss2.18$base ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
  p.str2[i] <- -log(summary(lm(dss2.18$str ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
  p.dex2[i] <- -log(summary(lm(dss2.18$dex ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
}

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Baseline cort")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.base, pch = 21, bg = c(rep("lightblue",12), rep("pink", 11), rep("lightgreen", 19), rep("palegoldenrod", 14)))
points(seq(1,56,1), p.base2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(0.5,2.3,"Dark = 2018, Light = 2017", pos =4)

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Stress cort")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.str, pch = 21, bg = c(rep("lightblue",12), rep("pink", 11), rep("lightgreen", 19), rep("palegoldenrod", 14)))
points(seq(1,56,1), p.str2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(0.5,2.3,"Dark = 2018, Light = 2017", pos =4)

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Dex cort")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.dex, pch = 21, bg = c(rep("lightblue",12), rep("pink", 11), rep("lightgreen", 19), rep("palegoldenrod", 14)))
points(seq(1,56,1), p.dex2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(0.5,2.3,"Dark = 2018, Light = 2017", pos =4)

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Negative feedback")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.neg, pch = 21, bg = c(rep("lightblue",12), rep("pink", 11), rep("lightgreen", 19), rep("palegoldenrod", 14)))
points(seq(1,56,1), p.neg2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(0.5,2.3,"Dark = 2018, Light = 2017", pos =4)

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.33,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

dss2<-dss.save

There are some strong relationships in 2018, but we also used some of these 2018 samples for the NGS data that was used to select assays in the first place. Below I’m plotting the same relationships for 2018 only but restricting just to samples that were not used in NGS selection. It seems plausible that there could be real differences between years, but if the 2018 NGS vs. non-NGS samples differ it might suggest that we were essentially overfitting based on the training (NGS) data and that the patterns don’t hold up in the larger sample size.

dNGS <- read.delim("Input_NGS.txt")

check <- as.data.frame(dNGS[,1])
check$here <- rep(1,nrow(check))
colnames(check)<-c("SampleID", "here")

p.neg2 <- rep(NA, 56)
p.base2 <- rep(NA, 56)
p.str2 <- rep(NA, 56)
p.dex2 <- rep(NA< 56)
dss.save<-dss2

tmp <- dss2[,c("SampleID","Band")]
tmp$SampleID <- as.character(tmp$SampleID)
check$SampleID <- as.character(check$SampleID)
check <- join(check, tmp, "SampleID")
check <- check[,c("Band","here")]

dss2.18 <- subset(dss2, dss2$Year == 2018)
dss2.18 <- join(dss2.18, check, "Band")
dss2.18 <- subset(dss2.18, is.na(dss2.18$here) == TRUE)

for(i in 1:56){
  p.neg2[i] <- -log(summary(lm(dss2.18$negf ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
  p.base2[i] <- -log(summary(lm(dss2.18$base ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
  p.str2[i] <- -log(summary(lm(dss2.18$str ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
  p.dex2[i] <- -log(summary(lm(dss2.18$dex ~ dss2.18[,i+6]))$coefficients[2, 4], 10)
}

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Baseline cort")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.base2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Stress cort")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.str2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Dex cort")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.dex2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,57),ylim=c(0,3.2),main = "Negative feedback")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,56,1), p.neg2, pch = 21, bg = c(rep("slateblue",12), rep("red4", 11), rep("seagreen4", 19), rep("sienna3", 14)))
text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.33,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

dss2<-dss.save

The results here are a bit mixed. Our NGS dataset actually included ~60% of the total individuals (we only included 3rd sample captures but since 1-2-3 captures are so similar in methylation I’ve excluded all from those individuals). If I fit the relationships for 2018 using only the 40% of birds that were not included in NGS the CRH relationships more or less go away, but it does seem like there may still be relationhips at the same locations for GR. THe sample size here is quite small (30 individuals with 1-3 samples each) and also only includes the predator treatments since those were the ones that we held out from the NGS submission.

Delta cort vs. delta methylation

Here rather than looking at just the raw relationships I want to look at the change in methylation vs. the change in GC traits from 1-2-3 captures. Since individuals seem to have very consistent methylation profiles, it may be that the change is actually important in regulating GC expression.

# First I'm making a data frame that includes 1st and 3rd captures in the same row
  sub1 <- subset(dss2, dss2$Capture == 3)
  colnames(sub1) <- paste(colnames(sub1), ".cap3", sep="")
  sub1$MATCH2 <- paste(sub1$Band.cap3, sub1$Year.cap3, sep="_")
  sub2 <- subset(dss2, dss2$Capture == 2)
  colnames(sub2) <- paste(colnames(sub2), ".cap2", sep="")
  sub2$MATCH2 <- paste(sub2$Band.cap3, sub2$Year.cap2, sep="_")
  
  dss2$MATCH2 <- paste(dss2$Band, dss2$Year, sep="_")
  
  dssx <- subset(dss2, dss2$Capture ==1)
  dssx <- join(dssx, sub2, "MATCH2")
  dssx <- join(dssx, sub1, "MATCH2")
  
  delt <- as.data.frame(matrix(nrow=nrow(dssx), ncol=70))
  colnames(delt) <- colnames(dssx)[c(3:6, 78, 7:67, 74:76, 79)]
  colnames(delt)[6:70] <- paste("d.", colnames(delt)[6:70], sep="")
  
  delt[,c(1:5)] <- dssx[,c(3:6, 78)]
  
  for(i in 1:61){
    delt[ ,i+5] <- dssx[,i+165] - dssx[,i+6]
  }
  for(i in 1:3){
    delt[ ,i+66] <- dssx[,i+232] - dssx[,i+73]
  }
  delt[,70] <- dssx[,238] - dssx[,79]
  
  delt.17 <- subset(delt, delt$Year == 2017)
  
  b.1t3.17 <- rep(NA, 61)
  s.1t3.17 <- rep(NA, 61)
  d.1t3.17 <- rep(NA, 61)
  n.1t3.17 <- rep(NA, 61)
  
  delt.18 <- subset(delt, delt$Year == 2018)
  
  b.1t3.18 <- rep(NA, 61)
  s.1t3.18 <- rep(NA, 61)
  d.1t3.18 <- rep(NA, 61)
  n.1t3.18 <- rep(NA, 61)
  
  for(i in 1:61){
    b.1t3.17[i] <- summary(lm(delt.17$d.base ~ delt.17[,i+5]))$coefficients[2,4]
    
    b.1t3.18[i] <- summary(lm(delt.18$d.base ~ delt.18[,i+5]))$coefficients[2,4]
    s.1t3.18[i] <- summary(lm(delt.18$d.stress ~ delt.18[,i+5]))$coefficients[2,4]
    d.1t3.18[i] <- summary(lm(delt.18$d.dex ~ delt.18[,i+5]))$coefficients[2,4]
    n.1t3.18[i] <- summary(lm(delt.18$d.negf ~ delt.18[,i+5]))$coefficients[2,4]
  }
  
plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,74),ylim=c(0,3.2),main = "Delta cort vs. delta meth")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,61,1), -log(b.1t3.17,10), pch = 21, bg = "lightblue")
points(seq(1,61,1), -log(b.1t3.18,10), pch = 21, bg = "slateblue")
points(seq(1,61,1), -log(s.1t3.18,10), pch =21, bg = "coral3")
points(seq(1,61,1), -log(d.1t3.18,10), pch = 21, bg = "chartreuse3")
points(seq(1,61,1), -log(n.1t3.18,10), pch = 21, bg = "goldenrod")

legend("bottomright",c("base_17","base_18","str_18","dex_18","fb_18"),pch=21,bty="n",
       pt.bg=c("lightblue","slateblue","coral3","chartreuse3","goldenrod"))

abline(v = c(12.5, 23.5, 42.5, 56.5), lty=3, col="red")

text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(63.5, 3.2, "Avgs")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

It doesn’t seem like there is anything in here that is super convincing. There are a handful of relationships with P < 0.05, but that is to be expected with so many comparisons. Also there don’t seem to be any places where the 2017 and 2018 data are in agreement. Note that I’ve only compared first to third captures here for convenience, but 2017 doesn’t have second capture and 2018 only has baseline cort at second capture.

Is methylation correlated with RS

Methylation vs. fledging

First this is just fitting simple models to ask whether methylation at any CPG or average is correlated with success at fledging (yes or no). Before splitting into groups I’m just fitting the entire dataset with all samples pooled.

dssf <- dss2
dssf$Fled <- ifelse(dssf$NumFled == 0, 0, 1)

store.p <- rep(NA, 61)
dssf$Band <- as.factor(dssf$Band)
dssf$Year <- as.factor(dssf$Year)
dssf$Capture <- as.factor(dssf$Capture)
for(i in 1:61){
  dat.temp <- dssf[,c(i+6, 3:5, 81)]
  colnames(dat.temp)[1] <- "Meth"
  m <- glmer(Fled ~ Meth + Capture + Year + (1|Band), data = dat.temp, family = binomial)
  store.p[i] <- summary(m)$coefficients[2,4]
}

m <- glmer(Fled ~ GRn17t8 + Capture + as.factor(Year) + (1|Band), data = dssf, family = binomial)

plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,74),ylim=c(0,3.2),main = "Fledging success")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,61,1),-log(store.p,10),pch=21,bg="lightblue")

abline(v = c(12.5, 23.5, 42.5, 56.5), lty=3, col="red")

text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(63.5, 3.2, "Avgs")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

There is nothing really here in the binomial glms…next I’m running it as the actual number of offspring fledged.

dssf <- dss2

store.p <- rep(NA, 61)
dssf$Band <- as.factor(dssf$Band)
dssf$Year <- as.factor(dssf$Year)
dssf$Capture <- as.factor(dssf$Capture)
for(i in 1:61){
  dat.temp <- dssf[,c(i+6, 3:5, 78)]
  colnames(dat.temp)[1] <- "Meth"
  m <- lmer(NumFled ~ Meth + Capture + Year + (1|Band), data = dat.temp)
  store.p[i] <- summary(m)$coefficients[2,5]
}



plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,74),ylim=c(0,3.2),main = "Fledging number")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,61,1),-log(store.p,10),pch=21,bg="lightblue")

abline(v = c(12.5, 23.5, 42.5, 56.5), lty=3, col="red")

text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(63.5, 3.2, "Avgs")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

Again, it really doesn’t look like there is anything here…

Compare methylation to color

I’m doing the same thing as above but now looking at breast brightness. At present I only have brightness measures in here for 2017, I still need to add them for 2018.

dssf <- dss2

store.p <- rep(NA, 61)
dssf$Band <- as.factor(dssf$Band)
dssf$Year <- as.factor(dssf$Year)
dssf$Capture <- as.factor(dssf$Capture)
dssf$BibB1 <- scale(dssf$BibB1)
for(i in 1:61){
  dat.temp <- na.omit(dssf[,c(2,i+6, 3:5)])
  colnames(dat.temp)[2] <- "Meth"
  dat.temp <- subset(dat.temp, dat.temp$Capture == 1)
  m <- lm(Meth ~ BibB1, data = dat.temp)
  store.p[i] <- summary(m)$coefficients[2,4]
}



plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="-log10(P)",xlab="CpG Site",xlim=c(0,74),ylim=c(0,3.2),main = "Breast brightness")
axis(1, seq(-10,100,110))
axis(2, seq(-1,6,1))

points(seq(1,61,1),-log(store.p,10),pch=21,bg="coral3")

abline(v = c(12.5, 23.5, 42.5, 56.5), lty=3, col="red")

text(6.5, 3.2, "Crh")
text(18, 3.2, "Fkbp5")
text(33, 3.2, "Crhr1")
text(49.5, 3.2, "GR")
text(63.5, 3.2, "Avgs")

abline(h = -log(0.05, 10), lty=2, col="gray40")
abline(h = -log(0.01, 10), lty=2, col="gray40")
abline(h = -log(0.001, 10), lty=2, col="gray40")

text(.3,-log(0.05, 10), "P = 0.05", cex=.7, pos = 1)
text(.3,-log(0.01, 10), "P = 0.01", cex=.7, pos = 1)
text(.3,-log(0.001, 10), "P = 0.001", cex=.7, pos = 1)

This is only for 2017, but it doesn’t seem like there is any strong evidence for a relationship between methylation and breast brightness. Maybe a bit in CRH but it isn’t super convincing.

NGS vs. Pyrosequencing

Compare average methylation NGS to Pyro

This is mostly a sanity check just to see whether the average methylation amounts and individual samples correspond between NGS and pyrosequencing data. The 36 samples that were used for NGS methylation analysis for 2018 were also included in the pyrosequencing data set and are compared here. First comparing average methylation across CpG sites for the two methods.

dNGS <- read.delim("Input_NGS.txt")

check <- as.data.frame(dNGS[,1])
check$here <- rep(1,nrow(check))
colnames(check)<-c("SampleID","here")
dsx <- join(ds,check,"SampleID")
dsx <- subset(dsx, is.na(dsx$here)==FALSE)

pyro <- rep(NA, 56)
ngs <- rep(NA, 56)
for(i in 1:length(pyro)){
  pyro[i] <- mean(na.omit(dsx[, i+6]))
  ngs[i] <- mean(na.omit(dNGS[, i+1]))
}

  plot(1,1,type="n",yaxs="i",xaxs="i",ylim=c(0,100),xlim=c(0,57),yaxt="n",xaxt="n",
       xlab="CpG sites (light = pyro, dark = NGS)",ylab="Methylation",bty="n")
  axis(1,seq(-10,100,10),cex=0.9)
  axis(2,seq(-20,120,20),las=2)
  
  lines(seq(1,12,1),pyro[1:12],col="lightblue")
  lines(seq(13,23,1),pyro[13:23],col="goldenrod")
  lines(seq(24,42,1),pyro[24:42],col="lightgreen")
  lines(seq(43,56,1),pyro[43:56],col="pink")
  
  lines(seq(2,13,1),pyro[1:12],col="gray40")
  
  lines(seq(1,12,1),ngs[1:12],col="slateblue")
  lines(seq(13,23,1),ngs[13:23],col="sienna3")
  lines(seq(24,42,1),ngs[24:42],col="chartreuse3")
  lines(seq(43,56,1),ngs[43:56],col="coral3")
  
  points(seq(1,56,1), pyro, pch=21, bg = c(rep("lightblue",12), rep("goldenrod",11),
                                           rep("lightgreen",19), rep("pink",14)))
  points(seq(2,13,1), pyro[1:12], pch=21, bg = "gray40")
  points(seq(1,56,1), ngs, pch=21, bg = c(rep("slateblue",12), rep("sienna3",11),
                                           rep("chartreuse3",19), rep("coral3",14)))
  
  text(6.5, 95, "Crh")
  text(18.5, 95, "Fkbp5")
  text(33.5, 95, "Crhr1")
  text(50, 95, "GR")

There seems to be a very close correspondance between the average methylation detected from NGS and from pyrosequencing for all four genes. For CRH, I have a feeling that the CPG numbering may be off by one (shifting hte pyrosequencing one CpG to the right would be a near perfect match). I’ve checked the FASTA IDs that Epigen provided and they do seem to match up, but it seems like this could be an error introduced by the conditions of the pyrosequencing or something like that. It doesn’t really matter since we aren’t making any big interpretations based on the exact positions of the CpG sites, but it may be worth asking Epigen about. The dark gray line and points for CRH is what it would like like if the CRH CpG locations were shifted over by one spot.

Next I’m going to compare individual sample repeatability across the two different methylation technologies. For this I’ll just plot the pairwise correlation at each site rather than make a ton of scatterplots since the point is really only to see that the results are consistent.

Compare individual samples NGS to pyro

dNGS2 <- dNGS
colnames(dNGS2) <- c("SampleID", paste(colnames(dNGS)[2:ncol(dNGS)], ".NGS", sep = ""))
dsx2 <- join(dsx, dNGS2, "SampleID")

cons <- rep(NA, 56)

for(i in 1:56){
  cons[i] <- cor(dsx2[, i+6], dsx2[, i+63], use = "pairwise.complete.obs")
}

cons.crh <- rep(NA, 11)
for(i in 1:11){
  cons.crh[i] <- cor(dsx2[, i+6], dsx2[, i+64], use = "pairwise.complete.obs")
}


  plot(1,1,type="n",yaxs="i",xaxs="i",ylim=c(0,1),xlim=c(0,57),yaxt="n",xaxt="n",
       xlab="CpG sites (light = pyro, dark = NGS)",ylab="Methylation",bty="n")
  axis(1,seq(-10,100,10),cex=0.9)
  axis(2,seq(-20,120,.5),las=2)
  abline(h=0.5, lty=2, col="gray50")
  
  lines(seq(1,12,1),cons[1:12],col="slateblue")
  lines(seq(13,23,1),cons[13:23],col="sienna3")
  lines(seq(24,42,1),cons[24:42],col="chartreuse3")
  lines(seq(43,56,1),cons[43:56],col="coral3")
  lines(seq(1,11,1),cons.crh,col="gray50")
  
  

  points(seq(1,11,1), cons.crh, pch=21, bg = "gray50")
  points(seq(1,56,1), cons, pch=21, bg = c(rep("slateblue",12), rep("sienna3",11),
                                           rep("chartreuse3",19), rep("coral3",14)))
  
  text(6.5, .95, "Crh")
  text(18.5, .95, "Fkbp5")
  text(33.5, .95, "Crhr1")
  text(50, .95, "GR")

Most individual CpG sites are reasonably well correlated across the two methods, though a few are not. Many of those bad ones are at spots where the methylation is very low overall so I think the ‘readings’ may be mostly noise at those locations. The gray line in CRH is shifting the points over to where I think they should be (see above). Interestingly the points are reasonably well correlated even with them offset, which I think reflects the fact that methylation is pretty highly correlated within each assay. Next I’m plotting methylation averages over the entire assay for each sample using pyrosequencing vs. NGS.

  for(i in 1:nrow(dsx2)){
    dsx2$CRHp[i] <- mean(na.omit(t(dsx2[i,7:18])))
    dsx2$FKp[i] <- mean(na.omit(t(dsx2[i,19:29])))
    dsx2$CRHRp[i] <- mean(na.omit(t(dsx2[i,30:48])))
    dsx2$GRp[i] <- mean(na.omit(t(dsx2[i,49:62])))
    
    dsx2$CRHngs[i] <- mean(na.omit(t(dsx2[i,64:75])))
    dsx2$FKngs[i] <- mean(na.omit(t(dsx2[i,76:86])))
    dsx2$CRHRngs[i] <- mean(na.omit(t(dsx2[i,87:105])))
    dsx2$GRngs[i] <- mean(na.omit(t(dsx2[i,106:119])))    
  }
  
plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="Pyrosequencing",xlab="NGS",xlim=c(0,100),ylim=c(0,100),
     main="Average meth assay wide")
axis(1,seq(-20,120,20))
axis(2,seq(-20,120,20))

points(dsx2$CRHngs, dsx2$CRHp, pch = 21, bg = "slateblue")
points(dsx2$FKngs, dsx2$FKp, pch = 21, bg = "goldenrod")
points(dsx2$CRHRngs, dsx2$CRHRp, pch = 21, bg = "chartreuse3")
points(dsx2$GRngs, dsx2$GRp, pch = 21, bg = "coral3")

legend("topleft",c("Crh","Crhr1","Nr3c1","Fkbp5"),pch=21,pt.bg=c("slateblue","chartreuse3","coral3","goldenrod"),
       bty="n")

c <- round(cor(dsx2$CRHngs, dsx2$CRHp, use = "pairwise.complete.obs"),2)
text(40,20, paste("r = ",c,sep=""),col="slateblue")

c <- round(cor(dsx2$CRHRngs, dsx2$CRHRp, use = "pairwise.complete.obs"),2)
text(25, 9, paste("r = ",c,sep=""),col="chartreuse3")

c <- round(cor(dsx2$GRngs, dsx2$GRp, use = "pairwise.complete.obs"),2)
text(65,36, paste("r = ",c,sep=""),col="coral3")

c <- round(cor(dsx2$FKngs, dsx2$FKp, use = "pairwise.complete.obs"),2)
text(87,57, paste("r = ",c,sep=""),col="goldenrod")


plot(1,1,type="n",bty="n",xaxt="n",yaxt="n",ylab="Pyrosequencing",xlab="NGS",xlim=c(0,100),ylim=c(0,100),
     main="All CpG sites comparison")
axis(1,seq(-20,120,20))
axis(2,seq(-20,120,20))

library(scales)
for(i in 1:11){
  points(dsx2[,i+6], dsx2[,i+64], pch = 16, col = alpha("slateblue",0.3))
}

for(i in 1:11){
  points(dsx2[,i+18], dsx2[,i+75], pch = 16, col = alpha("goldenrod",0.5))
}
for(i in 1:19){
  points(dsx2[,i+29], dsx2[,i+86], pch = 16, col = alpha("chartreuse3",0.19))
}

for(i in 1:14){
  points(dsx2[,i+48], dsx2[,i+105], pch = 16, col = alpha("coral3",0.35))
}

legend("topleft",c("Crh","Crhr1","Nr3c1","Fkbp5"),pch=16,col=c("slateblue","chartreuse3","coral3","goldenrod"),
       bty="n")

Plotting all four genes with averages for each sample over the entire assay looks pretty good. The span of methylation across all four genes makes it look a bit better than it maybe really is, but even GR with a lower correlation still is at 0.53 and some individual CpGs in there are much higher. It also seems when looking at the individual CpG plot that there might be one or two sites that are throwing off the GR correlation. There did seem to be a mutation there from what Epigen said so it could be that the alignment between the NGS and pyrosequencing samples is not quite right in one or two cases and that is what is leading to the lack of correlation. I’m not sure that is a problem for the pyrosequencing data since those are all processed in the same way.

I guess overall it seems like the two methods are working pretty well, even though there is a fair bit of noise at some sites. I’m not sure we could hope for much better given that these are two entirely separate procedures and represent entirely separate preparations (different extractions, methylation conversions, etc).

Tree Swallow Methylation Projects

Conor Taff

February 25, 2019

Pyrosequencing Data

Color Manipulation Experiment

Read and organize the data

Generate correlation matrix heatmaps

Plot overall methylation percentages for each gene

Scale and calculate assay average methylation

Plot values by groups

Fit models

Individual consistency in methylation

Color Stressor Experiment

Read and organize the data

Generate correlation matrix heatmaps

Plot overall methylation percentages for each gene

Scale and calculate assay average methylation

Plot values by groups

Fit models

Individual consistency in methylation

Brain Collection Experiment

Read and organize the data

Generate correlation matrix heatmaps

Overall methylation levels

Scale and calculate assay average methylation

Compare methylation averages across tissue types

Does methylation vary by location

Flexibility in methylation?

Identify individual CpG sites that change

Individual changes at flexible CpG sites

Cross year changes in methylation?

Age related changes in methylation.

Is methylation correlated with GCs

Methylation averages vs. cort

Cort by methylation at each CpG

Delta cort vs. delta methylation

Is methylation correlated with RS

Methylation vs. fledging

Compare methylation to color

NGS vs. Pyrosequencing

Compare average methylation NGS to Pyro

Compare individual samples NGS to pyro