AMCR Mosquito Buckets

Overview

This code uses data from mosquito bloodmeals from buckets placed near and far from active crow nests in Davis California. Collected mosquitoes were used to identify what host had been fed on. The main questions to be addressed are:

How common are crow feeds?
Are crow feeds more common near active nests?
How do crow feeds vary by date in the season?

Secondary questions include:

What other species are most commonly fed on?
How do seasonal patterns for those other species compare to crows?

The markdown code below produces tables and figures that address these questions. Some description of the methods to produce each item are included along with some interpretation, but the goal of this document is to keep a record of these figures and possible options with the intention of expanding on or condensing these explanations for the working paper. For some of the figures multiple options are plotted and no figure numbers are given because many of these may not end up in the final paper.

First, I load packages to be used, read in the relevant data, and do some data cleanup to help with plotting.

## Load packages ----
  pacman::p_load(lme4, ggplot2, here, MuMIn, rethinking, MASS, scales, plotfunctions, plyr, vegan,
                 lubridate, data.table, ggrepel, dplyr, tidyverse, ggmap, sjPlot, geosphere)

## Read in data ----
  # Data by bloodmeal tested
    d_blood <- read.delim(here::here("1_data", "bloodmeal_data.txt"))
    d_blood$doy_s <- scale(d_blood$doy)   # scaled to improve model fitting, doesn't transform any information
    d_blood$bucket_id <- as.factor(d_blood$bucket_id)
    for(i in 1:nrow(d_blood)){
      if(d_blood$meal_common_name[i] == "American crow"){d_blood$crow_fed[i] <- 1}
      if(d_blood$meal_common_name[i] != "American crow"){d_blood$crow_fed[i] <- 0}
    }
        
  # Data on nest locations, characteristics, and active dates
    d_nest <- read.delim(here::here("1_data", "nest_data.txt"))
    
  # Data on bucket characteristics and locations
    d_buck <- read.delim(here::here("1_data", "bucket_data.txt"))

Using these data, I'm calculating distances from buckets to active nests. Because nests are only active for part of the season but mosquitoes are collected from the same buckets on many days, this needs to be calculated for each bloodmeal. So this code chunk is doing two things. First, just generating a count of the number of total active nests on each day (for plotting a histogram later). Second, for every bloodmeal that was identified, the code loops through and identifies based on the day the distance the the nearest active nest in the population. Distance to that nest is recorded in meters, but later on a threshold can be applied (e.g., 50m) to count meals that are near or far from active nests in a categorical sense. If no nests are active on a day that a bloodmeal is collected, then the value 9999 is entered as a placeholder.

# Make a data frame with the count of active nests on each day of the season for plotting a histogram
  active_nests <- as.data.frame(seq(min(na.omit(d_nest$start_active)), max(d_blood$doy), 1))
  colnames(active_nests) <- "doy"
  for(i in 1:nrow(active_nests)){
    sub <- subset(d_nest, d_nest$start_active <= active_nests$doy[i] & d_nest$end_active >= active_nests$doy[i])
    active_nests$count[i] <- nrow(sub)
  }

# For each bloodmeal, determine the distance to the closest active nest.
  # First join bloodmeals to bucket info to add spatial data to each bloodmeal
      d_buck2 <- d_buck[, c("bucket_id", "latitude", "longitude", "long_dd", "lat_dd")]
      d_blood <- join(d_blood, d_buck2, "bucket_id", match = "first")
      
  # Loop through each bloodmeal and find closest nest using pythagorean theorem and UTM locations for only active nests
      # There is a redundant calculation using decimal degrees that was used to check on some discrepancies...both are the same
      for(i in 1:nrow(d_blood)){
        
        # subset nests to just those that are active on the day a bloodmeal was collected 
          sub <- subset(d_nest, d_nest$start_active <= d_blood$doy[i] & d_nest$end_active >= d_blood$doy[i])
          
        # If no nests were active, assign a placeholder value of 9999 to the nearest nest distance
          if(nrow(sub) == 0){d_blood$closest_act[i] <- 9999}
          if(nrow(sub) == 0){d_blood$closest_act2[i] <- 9999}
          
        # If some nests were active, loop through each of those active nests and calculate distance to the bucket where
          # the bloodmeal was collected using the pythagorean theorem.
          ## EDIT: also calculating distance from decimal degrees using Geosphere package. Think there may be a projection problem with UTM coordinates.
            if(nrow(sub) > 0){
              for(k in 1:nrow(sub)){
                sub$distance2[k] <- sqrt((sub$latitude[k] - d_blood$latitude[i])^2 + (sub$longitude[k] - d_blood$longitude[i])^2)
                sub$distance[k] <- distGeo(c(d_blood$long_dd[i], d_blood$lat_dd[i]), c(sub$long_ddeg[k], sub$lat_ddeg[k]))
              }
              
          # From the list of active nests, find the closest one and save that distance to the bloodmeal id dataframe
            d_blood$closest_act[i] <- min(na.omit(sub$distance))
            d_blood$closest_act2[i] <- min(na.omit(sub$distance2))
          }
      }
      
  # Put in a categorical predictor for whether an active nest was present within 100 meters
      for(i in 1:nrow(d_blood)){
        if(d_blood$closest_act[i] <= 100){d_blood$nest_in_100[i] <- 1}
        if(d_blood$closest_act[i] > 100){d_blood$nest_in_100[i] <- 0}
        if(d_blood$closest_act[i] <= 50){d_blood$nest_in_50[i] <- 1}
        if(d_blood$closest_act[i] > 50){d_blood$nest_in_50[i] <- 0}
        if(d_blood$closest_act[i] <= 10){d_blood$nest_in_10[i] <- 1}
        if(d_blood$closest_act[i] > 10){d_blood$nest_in_10[i] <- 0}
      }
      
  # Change crow fed and nests in 100 to categorical predictors
      d_blood$crow_fed <- as.factor(d_blood$crow_fed)
      d_blood$nest_in_100 <- as.factor(d_blood$nest_in_100)
      d_blood$nest_in_50 <- as.factor(d_blood$nest_in_50)
      d_blood$nest_in_10 <- as.factor(d_blood$nest_in_10)

Proximity & Date

Nest to bucket distances

Calculating summary statistics for the distance from buckets to the closest nest on our full nest list, regardless of whether or on what dates that nest was ever active.

for(i in 1:nrow(d_buck)){
  temp <- d_nest
  for(k in 1:nrow(temp)){
    temp$distance[k] <- distGeo(c(d_buck$long_dd[i], d_buck$lat_dd[i]), c(temp$long_ddeg[k], temp$lat_ddeg[k]))
  }
  d_buck$minimum_nest[i] <- min(na.omit(temp$distance))
}

buck_nest <- ggplot(d_buck, aes(x = minimum_nest)) + theme_minimal() + 
    labs(x = "Minimum distance to nest (m)", y = "Count") + geom_histogram(binwidth = 50, fill = "goldenrod")
buck_nest

Mean = 147.66
SD = 244.06
SE = 31.5080105
Minimum = 1.04
Maximum = 860.38

Histogram of bloodmeal distance

As a quick look here I'm just plotting histograms for actual distances from bloodmeals to active nests for meals that were or were not crows. In these plots, bloodmeals that were collected after the very last active nest are not included since they don't have a distance to any active nest. These are really just for our own benefit as we were thinking about thresholds and kind of replicate the more formal analyses below. Right now the bin width is at 50m. You can set it at 10 and see the same general thing but the bars get really tiny... If we actually want to include something like this I can clean up the plot with better labels and matching colors, etc.

One thing that is kind of cool that you can see here is that when you are within 50m of an active nest, more than 50% of the bloodmeals are from crows! That seems pretty amazing.

  d_hist <- subset(d_blood, d_blood$closest_act < 9999)  # gets rid of bloodmeals after last active nest done

  p_act <- ggplot(d_hist, aes(x = closest_act, color = crow_fed, fill = crow_fed)) +
            geom_histogram(binwidth = 50, position = "identity", alpha = 0.5) + theme_minimal() + 
            labs(x = "Closest active nest (m)")
  p_act

Crow feed models

For this analysis, the unit of observation is each mosquito that had an identified bloodmeal. The current dataset I'm using has 297 observed mosquitoes but this combines different species and maybe doesn't match the exact dataset used in other places? Each of these mosquitoes has a single host species identified. I start by fitting a set of candidate models that includes a null (intercept only model), a model with just date as a continuous predictor, and models that include a two level factor (yes/no) for whether an active crow nest was present within 10, 50, or 100 meters of the bucket. Day of year was fit as a continuous variable but was standardized (subtract mean and divide by standard deviation) before fitting. This isn't a transformation and the results are exactly the same as they would otherwise be, but it means that the models fit faster and the effect size listed in the tables below is for a one standard deviation increase in days of year over the distribution of days in this study (its not actually intuitive to interpret the coefficient directly regardless of standardization). For plotting, day of year is always back-converted from the model to the regular integer scale. I'm not sure at the moment what criteria were used to determine 'active' nests. All models are fit as generalized linear mixed models with a binomial response and with bucket id as a random effect.

Models are fit using the 'glmer' function in the 'lme4' package. AICc values are compared using the 'MuMIn' package in 'R'.

First, I fit the five models and compare their AICc values (sorry this one isn't pretty).

## Do nest proximity and date predict crow feeds ----
    
  # Fit a set of candidate models to compare by AICc
      m_null <- glmer(crow_fed ~ 1 + (1|bucket_id), data = d_blood, family = binomial)
      m_date <- glmer(crow_fed ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
      m_10 <- glmer(crow_fed ~ doy_s + nest_in_10 + (1|bucket_id), data = d_blood, family = binomial)
      m_50 <- glmer(crow_fed ~ doy_s + nest_in_50 + (1|bucket_id), data = d_blood, family = binomial)
      m_100 <- glmer(crow_fed ~ doy_s + nest_in_100 + (1|bucket_id), data = d_blood, family = binomial)
      
  # Compare fit models
      comp <- model.sel(m_null, m_date, m_10, m_50, m_100)
      
  # Print the AIC table    
      comp

## Model selection table 
##         (Int)  doy_s nst_in_10 nst_in_50 nst_in_100 df  logLik  AICc delta
## m_10   -3.833 -1.338         +                       4 -49.461 107.1  0.00
## m_50   -4.052 -1.272                   +             4 -50.758 109.7  2.59
## m_100  -4.275 -1.422                              +  4 -53.877 115.9  8.83
## m_date -3.990 -1.670                                 3 -55.136 116.4  9.29
## m_null -3.197                                        2 -65.573 135.2 28.13
##        weight
## m_10    0.772
## m_50    0.211
## m_100   0.009
## m_date  0.007
## m_null  0.000
## Models ranked by AICc(x) 
## Random terms (all models): 
## '1 | bucket_id'

There is pretty strong support for the models that include a factor either for crow nests within 10 or within 50 meters. The delta AIC between these two models is only 0.94. Essentially, they are almost the same model and there isn't much reason to prefer one over the other. Together, these two models have 91.8% of the model weight. Since these two are mostly interchangeable, I next made a table that displays the fit details of each of these options. As we discussed by email, an alternative to this approach would be to just say from the beginning that the design called for 50m as a 'close' cutoff point and that we set up our statistical model that way. This approach could cut AICc comparison out entirely, or could include just the m_50 model above, dropping the m_10 and m_100. In the end I don't think any of those choices make much real difference so its really a matter of what we think makes the most sense.

Print the fit model details:

tab_model(m_10, m_50)

	crow_fed			crow_fed
Predictors	Odds Ratios	CI	p	Odds Ratios	CI	p
(Intercept)	0.02	0.01 – 0.07	<0.001	0.02	0.00 – 0.06	<0.001
doy_s	0.26	0.11 – 0.64	0.003	0.28	0.11 – 0.70	0.007
nest_in_10: nest_in_101	38.07	4.43 – 327.47	0.001
nest_in_50: nest_in_501				12.08	2.42 – 60.16	0.002
Random Effects
σ²	3.29			3.29
τ₀₀	0.86 _{bucket_id}			1.24 _{bucket_id}
ICC	0.21			0.27
N	33 _{bucket_id}			33 _{bucket_id}
Observations	297			297
Marginal R² / Conditional R²	0.390 / 0.516			0.352 / 0.529

Crow feed plots

Now I make some plots from the model above. For now I'm using the nests <50 model, but this could be switched easily depending on what we decide to go with. There are several options here for what to include and whether to do this as multipanel or single panel etc. I've printed a few different options below. Another option that isn't printed that matches an earlier version of the google doc would be to have separate plots showing the seasonal trend and the near vs. far difference. Personally, I think it works better to combine those two points into the one figure like the ones here, but it's certainly possible to go back to making them that way.

In all of these plots, the fit lines are maximum likelihood estimates from the chosen model (in this case <50). The confidence intervals for the two levels (yes/no for nest <50) are determined by pulling 500,000 samples from the posterior distribution of the fit model using the 'mvrnorm' function from the 'MASS' package and then determining the 90% highest posterior density interval using the 'rethinking' package in 'R'. The circles show raw data. For those, all mosquitoes collected on the same day are grouped by whether the bucket was >/< 50m from a crow nest and then for each of those groups the percent of mosquitoes that was crow fed is calculated. That percentage is the y location of the point and the size of the point is the number of mosquitoes that was included in each of those groups. The gray histogram is the count of active crow nests on each day of the season. For that I tried to use hatch to fledge dates from the last excel file that Andrea had sent, but there may well be a better/different way to calculate those numbers (or they may not be worth including).

# Choose model to plot from
      m_plot <- m_10

# Set the number of samples to use when plotting confidence intervals. More = smoother but takes longer to plot
    num_samps <- 10000
  
  # Make a new data frame with info per date 
      d2 <- as.data.frame(unique(d_blood$doy))
      colnames(d2) <- "doy"
      for(i in 1:nrow(d2)){
        sub <- subset(d_blood, d_blood$doy == d2$doy[i] & d_blood$nest_in_10 == "1")
        d2$count10[i] <- nrow(sub)
        sub2 <- subset(sub, sub$crow_fed == 1)
        d2$crow_count10[i] <- nrow(sub2)
        
        sub <- subset(d_blood, d_blood$doy == d2$doy[i] & d_blood$nest_in_10 == "0")
        d2$count[i] <- nrow(sub)
        sub2 <- subset(sub, sub$crow_fed == 1)
        d2$crow_count[i] <- nrow(sub2)
      }
      d2$pct_crow10 <- d2$crow_count10 / d2$count10
      d2$pct_crow <- d2$crow_count / d2$count
      
  # Make a plot showing seasonal change ----
      plot(1, 1, type = "n", xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", ylim = c(-0.02, 1.02),
           xlim = c(120, 247), bty = "n", ylab = "Proportion of crow fed bloodmeals", xlab = "Day of year")
      
      # Build a histogram of number of active nests per day
        d_act <- as.data.frame(seq(min(na.omit(d_nest$start_active)), max(na.omit(d_nest$end_active))))
        colnames(d_act) <- "doy"
        for(i in 1:nrow(d_act)){
          sub <- subset(d_nest, d_nest$start_active <= d_act$doy[i] & d_nest$end_active >= d_act$doy[i])
          d_act$count[i] <- nrow(sub)
        }
    
      # Add histogram of active nests
        add_bars(d_act$doy, d_act$count / 38, border = "gray60", bty = "n", col = "gray60", 
               beside = TRUE, space = c(0, 0))
      
      # Add axes
        axis(1, seq(100, 260, 20))
        axis(2, seq(-0.2, 1.2, 0.1), las = 2)
  
      # Add legend  
        legend("topright", c("25", "5", "Nest <10m", "Nest >10m"), bty = "n", pch = 21, 
             pt.bg = c("white", "white", "coral3", "slateblue"), pt.cex = log(c(25, 5, 2, 2) + 1),
             y.intersp = 1.2)
      
      ## Add in the model best fit
        # calculate best fit line from model parameters
          r <- seq(min(d_blood$doy_s), max(d_blood$doy_s), 0.05)
          mus <- logistic(fixef(m_plot)[1] + fixef(m_plot)[2] * r + fixef(m_plot)[3]*0) 
          mus2 <- logistic(fixef(m_plot)[1] + fixef(m_plot)[2] * r + fixef(m_plot)[3]*1)
          r2 <- r * sd(d_blood$doy) + mean(d_blood$doy)
          
        # draw lines separately for nests close to and far from crow nests
          lines(r2, mus, lwd = 2, col = "slateblue")
          lines(r2, mus2, lwd = 2, col = "coral3")
        
        # sample from fit model to create confidence intervals  
          post <- mvrnorm(n = num_samps, mu = fixef(m_plot), Sigma = vcov(m_plot))
          cis <- sapply(r, function(z)HPDI(logistic(post[, 1] + post[, 2]*z), 0.9))
          cis10 <- sapply(r, function(z)HPDI(logistic(post[, 1] + post[, 2]*z + post[, 3]), 0.9))
        
        # draw shaded confidence intervals  
          shade(cis, r2, col = alpha("slateblue", 0.5))
          shade(cis10, r2, col = alpha("coral3", 0.5))
        
        # add points for raw data  
          points(d2$doy, d2$pct_crow, pch = 21, bg = "slateblue", cex = log(d2$count + 1))
          points(d2$doy, d2$pct_crow10, pch = 21, bg = "coral3", cex = log(d2$count10 + 1))

The same thing but for 50 meters.

# Choose model to plot from
      m_plot <- m_50

# Set the number of samples to use when plotting confidence intervals. More = smoother but takes longer to plot
    num_samps <- 10000
  
  # Make a new data frame with info per date 
      d2 <- as.data.frame(unique(d_blood$doy))
      colnames(d2) <- "doy"
      for(i in 1:nrow(d2)){
        sub <- subset(d_blood, d_blood$doy == d2$doy[i] & d_blood$nest_in_50 == "1")
        d2$count50[i] <- nrow(sub)
        sub2 <- subset(sub, sub$crow_fed == 1)
        d2$crow_count50[i] <- nrow(sub2)
        
        sub <- subset(d_blood, d_blood$doy == d2$doy[i] & d_blood$nest_in_50 == "0")
        d2$count[i] <- nrow(sub)
        sub2 <- subset(sub, sub$crow_fed == 1)
        d2$crow_count[i] <- nrow(sub2)
      }
      d2$pct_crow50 <- d2$crow_count50 / d2$count50
      d2$pct_crow <- d2$crow_count / d2$count
      
  # Make a plot showing seasonal change ----
      plot(1, 1, type = "n", xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", ylim = c(-0.02, 1.02),
           xlim = c(120, 247), bty = "n", ylab = "Proportion of crow fed bloodmeals", xlab = "Day of year")
      
      # Build a histogram of number of active nests per day
        d_act <- as.data.frame(seq(min(na.omit(d_nest$start_active)), max(na.omit(d_nest$end_active))))
        colnames(d_act) <- "doy"
        for(i in 1:nrow(d_act)){
          sub <- subset(d_nest, d_nest$start_active <= d_act$doy[i] & d_nest$end_active >= d_act$doy[i])
          d_act$count[i] <- nrow(sub)
        }
    
      # Add histogram of active nests
        add_bars(d_act$doy, d_act$count / 38, border = "gray60", bty = "n", col = "gray60", 
               beside = TRUE, space = c(0, 0))
      
      # Add axes
        axis(1, seq(100, 260, 20))
        axis(2, seq(-0.2, 1.2, 0.1), las = 2)
  
      # Add legend  
        legend("topright", c("25", "5", "Nest <50m", "Nest >50m"), bty = "n", pch = 21, 
             pt.bg = c("white", "white", "coral3", "slateblue"), pt.cex = log(c(25, 5, 2, 2) + 1),
             y.intersp = 1.2)
      
      ## Add in the model best fit
        # calculate best fit line from model parameters
          r <- seq(min(d_blood$doy_s), max(d_blood$doy_s), 0.05)
          mus <- logistic(fixef(m_plot)[1] + fixef(m_plot)[2] * r + fixef(m_plot)[3]*0) 
          mus2 <- logistic(fixef(m_plot)[1] + fixef(m_plot)[2] * r + fixef(m_plot)[3]*1)
          r2 <- r * sd(d_blood$doy) + mean(d_blood$doy)
          
        # draw lines separately for nests close to and far from crow nests
          lines(r2, mus, lwd = 2, col = "slateblue")
          lines(r2, mus2, lwd = 2, col = "coral3")
        
        # sample from fit model to create confidence intervals  
          post <- mvrnorm(n = num_samps, mu = fixef(m_plot), Sigma = vcov(m_plot))
          cis <- sapply(r, function(z)HPDI(logistic(post[, 1] + post[, 2]*z), 0.9))
          cis50 <- sapply(r, function(z)HPDI(logistic(post[, 1] + post[, 2]*z + post[, 3]), 0.9))
        
        # draw shaded confidence intervals  
          shade(cis, r2, col = alpha("slateblue", 0.5))
          shade(cis50, r2, col = alpha("coral3", 0.5))
        
        # add points for raw data  
          points(d2$doy, d2$pct_crow, pch = 21, bg = "slateblue", cex = log(d2$count + 1))
          points(d2$doy, d2$pct_crow50, pch = 21, bg = "coral3", cex = log(d2$count50 + 1))

I can easily make lots of changes to these figures, but I'm going to wait until we decide exactly what we want to include. I could drop the histogram entirely. Or could make the histogram a separate panel that appears just above or below the main plot but smaller? Or it could be a gray line with a separate y-axis on the right hand side showing the number of activ nests? Can change colors/points/styles/transparency etc. Just let me know.

Multi-species

This is now using the full set of birds identified from the crow blood meals (I've excluded mammals, do we want to include them?).

Frequency by species

First, make a simple plot showing the percentage of bloodmeals that were detected from each bird species in the study.

# make a frequency plot for most eaten species
          dx <- subset(d_blood, d_blood$class == "bird")
          hh <- as.data.frame(unique(dx$meal_source))
          colnames(hh) <- "bloodmeal_source"
          for(i in 1:nrow(hh)){
            sub <- subset(dx, dx$meal_source == hh$bloodmeal_source[i])
            hh$count[i] <- nrow(sub)
          }
          hh$freq <- hh$count / sum(hh$count)
          hh <- hh[order(hh$count),]
          
          barplot(hh$freq, bty = "n", ylab = "Percent of avian bloodmeals", yaxt = "n",
                  col = alpha(c(rep("#E69F00", 19), "slateblue", rep("#E69F00", 4)), 0.9),
                  ylim = c(0, 0.2501), yaxs = "i")  
          axis(1, c(-10, 40), rep("", 2))
          axis(2, seq(-.05, 0.3, 0.05), c("", "0%", "5%", "10%", "15%", "20%", "25%", ""), las = 2)
          
          text(seq(0.7, 28.3, 1.2), labels = as.character(hh$bloodmeal_source),
               srt = 48, cex = 0.61, par("usr")[3], xpd = TRUE, adj = c(1.1, 1.1))

Same thing but with common names (Note - I need to go through and standardize capitalizatoin on these common names and double check for spelling, etc):

# make a frequency plot for most eaten species
          dx <- subset(d_blood, d_blood$class == "bird")
          hh <- as.data.frame(unique(dx$meal_common_name))
          colnames(hh) <- "bloodmeal_source"
          for(i in 1:nrow(hh)){
            sub <- subset(dx, dx$meal_common_name == hh$bloodmeal_source[i])
            hh$count[i] <- nrow(sub)
          }
          hh$freq <- hh$count / sum(hh$count)
          hh <- hh[order(hh$count),]
          
          barplot(hh$freq, bty = "n", ylab = "Percent of avian bloodmeals", yaxt = "n",
                  col = alpha(c(rep("#E69F00", 19), "slateblue", rep("#E69F00", 4)), 0.9),
                  ylim = c(0, 0.2501), yaxs = "i")  
          axis(1, c(-10, 40), rep("", 2))
          axis(2, seq(-.05, 0.3, 0.05), c("", "0%", "5%", "10%", "15%", "20%", "25%", ""), las = 2)
          
          text(seq(0.7, 28.3, 1.2), labels = as.character(hh$bloodmeal_source),
               srt = 48, cex = 0.61, par("usr")[3], xpd = TRUE, adj = c(1.1, 1.1))

Date by species

Plot lines showing the estimated relationship between date and detection probability for each of the common species. These models are fit exactly like the binomial GLMMs described above and they all take the form:

'detected(yes/no) ~ standardized_date + (1|bucket_ID)'

A separate model is fit for each species. The lines shown are maximum likelihood estimates. I didn't plot confidence intervals because it would get really crowded quickly, but I could add them or we could potentially instead do this as a plot of small multiples (one small panel for each species with fit line and confidence interval).

One option for this plot would be to drop the legend and instead label the lines directly with text in the graph for each species next to its line. I think there is enough spacing between the lines that this might work and might look nicer than the giant legend, especially if we end up adding in more species that are currently included. Let me know what you think.

Here is the version I sent around earlier but switched to common names:

 # Make a date by detection probability plot with the most common 7 species
          for(i in 1:nrow(d_blood)){
            ifelse(d_blood$meal_source[i] == "Corvus brachyrhynchos",
                                    d_blood$amcr[i] <- 1, d_blood$amcr[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Haemorhous mexicanus", 
                                    d_blood$hofi[i] <- 1, d_blood$hofi[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Nycticorax nycticorax",
                                    d_blood$heron[i] <- 1, d_blood$heron[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Aphelocoma californica", 
                                    d_blood$jay[i] <- 1, d_blood$jay[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Pica nuttalli",
                                    d_blood$magpie[i] <- 1, d_blood$magpie[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Streptopelia decaocto", 
                                    d_blood$dove[i] <- 1, d_blood$dove[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Mimus polyglottos", 
                                    d_blood$mocking[i] <- 1, d_blood$mocking[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Zenaida macroura", 
                                    d_blood$mourning[i] <- 1, d_blood$mourning[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Sturnus vulgaris", 
                                    d_blood$starling[i] <- 1, d_blood$starling[i] <- 0)
            ifelse(d_blood$meal_source[i] == "Turdus migratorius", 
                                    d_blood$robin[i] <- 1, d_blood$robin[i] <- 0)
          }
          
          d_blood$doy_s <- scale(d_blood$doy)
          
          m_crow <- glmer(amcr ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_hofi <- glmer(hofi ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_heron <- glmer(heron ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_jay <- glmer(jay ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_magpie <- glmer(magpie ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_dove <- glmer(dove ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_mocking <- glmer(mocking ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_mourning <- glmer(mourning ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_starling <- glmer(starling ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          m_robin <- glmer(robin ~ doy_s + (1|bucket_id), data = d_blood, family = binomial)
          
          
      # Make a plot showing seasonal change
          plot(1, 1, type = "n", xaxs = "i", yaxs = "i", xaxt = "n", yaxt = "n", ylim = c(-0.02, 0.4),
               xlim = c(120, 320), bty = "n", ylab = "Proportion of bloodmeals", xlab = "Day of year")
          
         # Add axes
          axis(1, seq(100, 260, 20))
          axis(2, seq(-0.2, 1.2, 0.1), las = 2)
          
          # Add legend  
         # legend("topright", c("25", "5", "Nest <50m", "Nest >50m"), bty = "n", pch = 21, 
          #       pt.bg = c("white", "white", "slateblue", "coral3"), pt.cex = log(c(25, 5, 2, 2) + 1),
          #       y.intersp = 1.2)
          
          ## Add in the model best fit
          # calculate best fit line from model parameters
          r <- seq(min(d_blood$doy_s), max(d_blood$doy_s), 0.05)
          mu_crow <- logistic(fixef(m_crow)[1] + fixef(m_crow)[2] * r) 
          mu_hofi <- logistic(fixef(m_hofi)[1] + fixef(m_hofi)[2] * r) 
          mu_heron <- logistic(fixef(m_heron)[1] + fixef(m_heron)[2] * r) 
          mu_jay <- logistic(fixef(m_jay)[1] + fixef(m_jay)[2] * r) 
          mu_magpie <- logistic(fixef(m_magpie)[1] + fixef(m_magpie)[2] * r) 
          mu_dove <- logistic(fixef(m_dove)[1] + fixef(m_dove)[2] * r) 
          mu_mocking <- logistic(fixef(m_mocking)[1] + fixef(m_mocking)[2] * r) 
          mu_mourning <- logistic(fixef(m_mourning)[1] + fixef(m_mourning)[2] * r)
          mu_starling <- logistic(fixef(m_starling)[1] + fixef(m_starling)[2] * r)
          mu_robin <- logistic(fixef(m_robin)[1] + fixef(m_robin)[2] * r)
          r2 <- r * sd(d_blood$doy) + mean(d_blood$doy)
          
          # draw lines separately for each species
            lines(r2, mu_crow, lwd = 3, col = "#D55E00")
            lines(r2, mu_hofi, lwd = 3, col = "#CC79A7", lty = 2)
            lines(r2, mu_heron, lwd = 3, col = "black", lty = 2)
            lines(r2, mu_jay, lwd = 3, col = "#009E73")
            lines(r2, mu_magpie, lwd = 3, col = "#56B4E9")
            lines(r2, mu_dove, lwd = 3, col = "#0072B2")
            lines(r2, mu_mourning, lwd = 3, col = "green", lty = 2)
            lines(r2, mu_starling, lwd = 3, col = "gray40")
            lines(r2, mu_robin, lwd = 3, col = "slateblue", lty = 2)
            lines(r2 + 1, mu_mocking+.001, lwd = 3, col = "gray40", lty = 2)
          
          ## Add legend
            legend("topright", c("House Finch", "Black-Crowned Night Heron", "Western Scrub Jay", "Yellow-Billed Magpie",
                                "American Crow", "Collared Dove", "Northern Mockingbird", "Mourning Dove", "European Starling",
                                "American Robin"), 
                          lwd = 3, bty = "n",
                          col = c("#CC79A7", "black", "#009E73", "#56B4E9", "#D55E00", "#0072B2", "gray40", "green", "gray40", "slateblue"),
                          lty = c(2, 2, 1, 1, 1, 1, 2, 2, 1, 2))

This one has all the species we had talked about added in. It is kind of a mess but lets look at this and decide what we actually want to include before I spend time cleaning it up. Once you get down past the mockingbird I'm really not sure that we have enough data to say anything about the trends in the other species because the sample size for crow fed blood meals is so small.

Map

This is just generating the map showing box locations and nest locations. I can easily change type and color of points, type of map background, etc. I used the latest excel file that Andrea had sent as the list of nest locations, but let me know if we want to use a different file. I'm not sure whether it makes sense to try to illustrate anything else on here. I think showing buffers around nests or nests that are near/far from buckets will get really busy especially since the 'active' status of nests changes depending on the day that mosquitoes were collected from a given bucket. The mapping data comes from Stamen and OpenStreetMap. It should be cited as the source. Its all under CC BY 3.0 licensing so I don't think there is any problem or need for permissions to use it in any journal. Here is the website that lists their citation info (http://maps.stamen.com/#watercolor/12/37.7706/-122.3782).

The map was created using the 'R' package 'ggmap' and that should be cited.

# Make a map  

# This draws from the 'stamen' map library using OpenStreetMap
          # set the boundaries of the lower left and top right corner
            loc <- c(-121.8, 38.514, -121.74348, 38.54819)  

          # make data frame with nests and buckets put together
            d_spatial <- as.data.frame(c(as.character(d_nest$family_id), as.character(d_buck$bucket_id)))
            colnames(d_spatial) <- "feature_id"
            d_spatial$longitude <- c(d_nest$long_dd, d_buck$long_dd)
            d_spatial$latitude <- c(d_nest$lat_dd, d_buck$lat_dd)
            d_spatial$nest_or_bucket <- c(rep("Nest", nrow(d_nest)), rep("Bucket", nrow(d_buck)))
          
          # make the initial map. Lots of options here for different visualizations of map
            p2 <- ggmap(get_map(location = loc, source = "stamen",
                    zoom = 14, maptype = "toner", color = "bw")) +
                    labs(y = "Latitude", x = "Longitude") 
        
          # add points for buckets and nests
            p3 <- p2 + geom_point(aes(x = longitude, y = latitude, shape = nest_or_bucket, 
                                      fill = nest_or_bucket), data = d_spatial, col = "black",
                                      alpha = 0.8, size = 2) + 
                      scale_shape_manual(values = c(21, 23), labels = c("Bucket", "Nest")) + 
                      scale_fill_manual(values = c("#F8766D", "#00BFC4"), labels = c("Bucket", "Nest")) +
                      theme(legend.title = element_blank())
            
            p3

Compare Mosquito Diets

Here I'm looking at whether there seem to be any differences in overall diet of Culex pipiens vs. tarsalis. They seem pretty similar and I think the few that do differ are likely to be the result of a single bucket having a lot of one blood meal type and also more of that type of mosquito. This isn't really a formal comparison. I can try to do something with a community ecology package, but I'm actually not totally sure how to account for the uneven sampling and importance of buckets in that kind of analysis. Do we need that? It does seem like both species were about equally likely to feed on crows.

  meal_summary <- as.data.frame(unique(d_blood$meal_common_name))
  colnames(meal_summary) <- "common_name"
  pipiens <- subset(d_blood, d_blood$mosq_sp == "Culex pipiens (complex)")
  tarsalis <- subset(d_blood, d_blood$mosq_sp == "Culex tarsalis")
  xx <- d_blood[, c("meal_common_name", "class")]
  colnames(xx)[1] <- "common_name"
  meal_summary <- join(meal_summary, xx, "common_name", match = "first")
  
  for(i in 1:nrow(meal_summary)){
    sub1 <- subset(pipiens, pipiens$meal_common_name == meal_summary$common_name[i])
    sub2 <- subset(tarsalis, tarsalis$meal_common_name == meal_summary$common_name[i])
    meal_summary$pipiens_count[i] <- nrow(sub1)
    meal_summary$tarsalis_count[i] <- nrow(sub2)
    
    ifelse(as.character(meal_summary$class[i]) == "bird",
           meal_summary$color[i] <- "coral3",
           meal_summary$color[i] <- "chartreuse3")
  }
  
  meal_summary[9, 5] <- "slateblue"   #changing color just for crow
  
  meal_summary$pipiens_pct <- meal_summary$pipiens_count / 100
  meal_summary$tarsalis_pct <- meal_summary$tarsalis_count / 196
  
  plot(1, 1, type = "n", bty = "n", xlab = "Culex pipiens percent of meals", ylab = "Culex tarsalis percent of meals",
       xlim = c(0, 0.23), ylim = c(0, 0.23), xaxt = "n", yaxt = "n")
  axis(1, seq(-0.1, 0.4, 0.1), c("", "0 %", "10 %", "20 %", "30%", ""))
  axis(2, seq(-0.1, 0.4, 0.1), c("", "0 %", "10 %", "20 %", "30%", ""), las = 2)
  points(meal_summary$pipiens_pct + runif(nrow(meal_summary), -0.001, 0.001), 
         meal_summary$tarsalis_pct, pch = 21, bg = alpha(as.character(meal_summary$color), 0.8))
  abline(0, 1, lty = 2, lwd = 2, col = "gray50")
  
  legend("topleft", c("Birds", "Mammals", "American Crow"), bty = "n", pch = 21, pt.bg = c("coral3", "chartreuse3", "slateblue"))
  text(0.23, 0.07, expression(italic("Black-Crowned Night Heron")), cex = 0.6, pos = 2)
  text(-0.01, 0.095, expression(italic("Eurasian Collared Dove")), cex = 0.6, pos = 4)
  text(-0.003, 0.067, expression(italic("Cow")), cex = 0.6, pos = 4)

You can see below that some buckets had pretty different mixes of tarsalis vs. pipiens and of course bucket location was pretty important for what was fed on too. Do you want me to do something else for analyses/plots here??

  bucket_summary <- as.data.frame(unique(d_blood$bucket_id))
  colnames(bucket_summary) <- "bucket_id"
  #xx <- d_blood[, c("meal_common_name", "class")]
  #colnames(xx)[1] <- "common_name"
  #meal_summary <- join(meal_summary, xx, "common_name", match = "first")
  
  for(i in 1:nrow(bucket_summary)){
    sub1 <- subset(pipiens, pipiens$bucket_id == bucket_summary$bucket_id[i])
    sub2 <- subset(tarsalis, tarsalis$bucket_id == bucket_summary$bucket_id[i])
    bucket_summary$pipiens_count[i] <- nrow(sub1)
    bucket_summary$tarsalis_count[i] <- nrow(sub2)
  }
  
  bucket_summary$total_count <- bucket_summary$pipiens_count + bucket_summary$tarsalis_count
  
  
  plot(1, 1, type = "n", bty = "n", xlab = "Culex pipiens count", ylab = "Culex tarsalis count",
       xlim = c(0, 38), ylim = c(0, 38), xaxt = "n", yaxt = "n",
       main = "Mosquito Species by Bucket")
  axis(1, seq(-5, 50, 5))
  axis(2, seq(-5, 50, 5), las = 2)
  points(bucket_summary$pipiens_count, 
         bucket_summary$tarsalis_count, pch = 21, bg = alpha("coral3", 0.7),
         cex = log(bucket_summary$total_count))
  abline(0, 1, lty = 2, lwd = 2, col = "gray50")

R packages to cite

This is a list of the reference info for the R packages specifically mentioned above that should be cited in the paper.

citation()

## 
## To cite R in publications use:
## 
##   R Core Team (2019). R: A language and environment for
##   statistical computing. R Foundation for Statistical Computing,
##   Vienna, Austria. URL https://www.R-project.org/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2019},
##     url = {https://www.R-project.org/},
##   }
## 
## We have invested a lot of time and effort in creating R, please
## cite it when using it for data analysis. See also
## 'citation("pkgname")' for citing R packages.

citation("MuMIn")

## 
## To cite package 'MuMIn' in publications use:
## 
##   Kamil Bartoń (2019). MuMIn: Multi-Model Inference. R package
##   version 1.43.6. https://CRAN.R-project.org/package=MuMIn
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {MuMIn: Multi-Model Inference},
##     author = {Kamil Bartoń},
##     year = {2019},
##     note = {R package version 1.43.6},
##     url = {https://CRAN.R-project.org/package=MuMIn},
##   }
## 
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.

citation("rethinking")

## 
## To cite package 'rethinking' in publications use:
## 
##   Richard McElreath (2019). rethinking: Statistical Rethinking
##   book package. R package version 1.90.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {rethinking: Statistical Rethinking book package},
##     author = {Richard McElreath},
##     year = {2019},
##     note = {R package version 1.90},
##   }
## 
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.

citation("lme4")

## 
## To cite lme4 in publications use:
## 
##   Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015).
##   Fitting Linear Mixed-Effects Models Using lme4. Journal of
##   Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Fitting Linear Mixed-Effects Models Using {lme4}},
##     author = {Douglas Bates and Martin M{\"a}chler and Ben Bolker and Steve Walker},
##     journal = {Journal of Statistical Software},
##     year = {2015},
##     volume = {67},
##     number = {1},
##     pages = {1--48},
##     doi = {10.18637/jss.v067.i01},
##   }

citation("MASS")

## 
## To cite the MASS package in publications use:
## 
##   Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics
##   with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0
## 
## A BibTeX entry for LaTeX users is
## 
##   @Book{,
##     title = {Modern Applied Statistics with S},
##     author = {W. N. Venables and B. D. Ripley},
##     publisher = {Springer},
##     edition = {Fourth},
##     address = {New York},
##     year = {2002},
##     note = {ISBN 0-387-95457-0},
##     url = {http://www.stats.ox.ac.uk/pub/MASS4},
##   }

citation("ggmap")

## 
## To cite ggmap in publications, please use:
## 
##   D. Kahle and H. Wickham. ggmap: Spatial Visualization with
##   ggplot2. The R Journal, 5(1), 144-161. URL
##   http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     author = {David Kahle and Hadley Wickham},
##     title = {ggmap: Spatial Visualization with ggplot2},
##     journal = {The R Journal},
##     year = {2013},
##     volume = {5},
##     number = {1},
##     pages = {144--161},
##     url = {https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf},
##   }

citation("vegan")

## 
## To cite package 'vegan' in publications use:
## 
##   Jari Oksanen, F. Guillaume Blanchet, Michael Friendly, Roeland
##   Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, R. B.
##   O'Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens,
##   Eduard Szoecs and Helene Wagner (2019). vegan: Community Ecology
##   Package. R package version 2.5-6.
##   https://CRAN.R-project.org/package=vegan
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {vegan: Community Ecology Package},
##     author = {Jari Oksanen and F. Guillaume Blanchet and Michael Friendly and Roeland Kindt and Pierre Legendre and Dan McGlinn and Peter R. Minchin and R. B. O'Hara and Gavin L. Simpson and Peter Solymos and M. Henry H. Stevens and Eduard Szoecs and Helene Wagner},
##     year = {2019},
##     note = {R package version 2.5-6},
##     url = {https://CRAN.R-project.org/package=vegan},
##   }
## 
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.

Session info

This is the full session info from the r environment used to put this together. It includes versions of R and packages used and all the different packages that were loaded when I was doing these analyses.

sessionInfo()

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] geosphere_1.5-10   sjPlot_2.7.2       ggmap_3.0.0       
##  [4] forcats_0.4.0      stringr_1.4.0      purrr_0.3.3       
##  [7] readr_1.3.1        tidyr_1.0.0        tibble_2.1.3      
## [10] tidyverse_1.2.1    dplyr_0.8.3        ggrepel_0.8.1     
## [13] data.table_1.12.6  lubridate_1.7.4    vegan_2.5-6       
## [16] lattice_0.20-38    permute_0.9-5      plyr_1.8.4        
## [19] plotfunctions_1.4  scales_1.0.0       MASS_7.3-51.4     
## [22] rethinking_1.90    dagitty_0.2-2      rstan_2.19.2      
## [25] StanHeaders_2.19.0 MuMIn_1.43.6       here_0.1          
## [28] ggplot2_3.2.1      lme4_1.1-21        Matrix_1.2-17     
## 
## loaded via a namespace (and not attached):
##  [1] minqa_1.2.4         colorspace_1.4-1    rjson_0.2.20       
##  [4] sjlabelled_1.1.1    rprojroot_1.3-2     estimability_1.3   
##  [7] parameters_0.2.0    rstudioapi_0.10     mvtnorm_1.0-11     
## [10] xml2_1.2.2          splines_3.6.1       mnormt_1.5-5       
## [13] knitr_1.25          sjmisc_2.8.2        zeallot_0.1.0      
## [16] jsonlite_1.6        nloptr_1.2.1        ggeffects_0.12.0   
## [19] broom_0.5.2         cluster_2.1.0       png_0.1-7          
## [22] compiler_3.6.1      httr_1.4.1          emmeans_1.4.1      
## [25] sjstats_0.17.6      backports_1.1.5     assertthat_0.2.1   
## [28] lazyeval_0.2.2      cli_1.1.0           htmltools_0.4.0    
## [31] prettyunits_1.0.2   tools_3.6.1         coda_0.19-3        
## [34] gtable_0.3.0        glue_1.3.1          V8_2.3             
## [37] Rcpp_1.0.2          cellranger_1.1.0    vctrs_0.2.0        
## [40] nlme_3.1-141        psych_1.8.12        insight_0.5.0      
## [43] xfun_0.10           ps_1.3.0            rvest_0.3.4        
## [46] lifecycle_0.1.0     pacman_0.5.1        hms_0.5.2          
## [49] inline_0.3.15       yaml_2.2.0          curl_4.2           
## [52] gridExtra_2.3       loo_2.1.0           stringi_1.4.3      
## [55] bayestestR_0.3.0    boot_1.3-23         pkgbuild_1.0.5     
## [58] shape_1.4.4         RgoogleMaps_1.4.5.3 rlang_0.4.0        
## [61] pkgconfig_2.0.3     matrixStats_0.55.0  bitops_1.0-6       
## [64] evaluate_0.14       labeling_0.3        processx_3.4.1     
## [67] tidyselect_0.2.5    magrittr_1.5        R6_2.4.0           
## [70] generics_0.0.2      foreign_0.8-72      pillar_1.4.2       
## [73] haven_2.1.1         withr_2.1.2         mgcv_1.8-29        
## [76] sp_1.3-1            performance_0.3.0   modelr_0.1.5       
## [79] crayon_1.3.4        rmarkdown_2.3       jpeg_0.1-8.1       
## [82] grid_3.6.1          readxl_1.3.1        callr_3.3.2        
## [85] digest_0.6.21       xtable_1.8-4        stats4_3.6.1       
## [88] munsell_0.5.0