Exporatory Data Analysis of NEFSC Bottom Trawl Survey Data

Exploration of spatial and temporal patterns in abundance, and bodymass of fishes from the Northeast groundfish survey.

Build code containing data wrangling and conversions can be accessed here.

Data loaded for this markdown is pulled directly from the {targets} pipeline for data consistency with a developing codebase.

####  Load NEFSC Groundfish Data  ####

# Loading from targets:
# Load the area-stratified biomass/abundances that used species cutoffs
withr::with_dir(rprojroot::find_root('_targets.R'), 
                tar_load(nefsc_stratified))


# Do some text formatting
nefsc_stratified <- nefsc_stratified %>% 
  mutate(
    id = as.character(id),
    spec_class = case_when(
      spec_class == "dem" ~ "Demersal",
      spec_class == "gf" ~ "Groundfish",
      spec_class == "pel" ~ "Pelagic",
      spec_class == "el" ~ "Elasmobranch",
      TRUE ~ "Unclassified"
  ))

# Run Summary Functions
ann_means <- ss_annual_summary(nefsc_stratified, include_epu = F)
seasonals <- ss_seasonal_summary(nefsc_stratified, include_epu = F) 

# bind them so you can facet
summs <- bind_rows(ann_means, seasonals) %>% 
  mutate(season = factor(season, levels = c("Spring", "Fall", "Spring + Fall")))


#drop haddock to see if that changes the bump
haddock_ann  <- ss_annual_summary(filter(nefsc_stratified, comname != "haddock"))
haddock_seas <- ss_seasonal_summary(filter(nefsc_stratified, comname != "haddock"))
no_haddockn  <- bind_rows(haddock_ann, haddock_seas) %>% 
  mutate(season = factor(season, levels = c("Spring", "Fall", "Spring + Fall")))

Temporal Trends

Patterns in annual abundance or CPUE of High Profile Species at annual and seasonal scales.

Total Biomass

Biomass of a given species from a station is a function of the adjusted number caught, their lengths, and the corresponding weights at-length using the species specific length-weight relationship coefficients. Sex-specific coefficients and seasonal growth coefficients are used when available.

Total biomass in the following plots represent the total of all individual biomass estimates from all species with available growth coefficients, for each year and within each survey season.

L-W Derived

Species specific length-weight relationships allow us to look at where biomass is across different lengths. When aggregating up to total biomass using these expected relationships we get this timeseries.

# Annual Timeline
bio_1 <- ann_means %>% 
  ggplot(aes(est_year, lw_biomass_kg, 
             color = season, 
             linetype = "All Species")) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    geom_line(group = 1) +
    geom_line(data = haddock_ann, 
              aes(est_year, lw_biomass_kg, 
                  color = season, 
                  linetype = "Haddock Removed")) +
    scale_y_continuous(labels = scales::comma_format()) +
    scale_color_manual(values = "black") +
    labs(x = "", 
         y = "Total Length-Weight Biomass (kg)",
         linetype = "Species Represented",
         color = "Season")


# Seasonal Timeline
bio_2 <- seasonals %>% 
  ggplot(aes(est_year, lw_biomass_kg, color = season)) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    geom_line() +
    scale_y_continuous(labels = scales::comma_format()) +
    scale_color_gmri() +
    labs(x = "", 
         y = "Total Length-Weight Biomass (kg)", 
         color = "Season",
         caption = "Biomass totals from actual catch, not stratified to represent survey areas.")

# Stack them with patchwork
bio_1 / bio_2

Shipboard Weight

As a comparison to estimated length-weight bodymass, we can also get the total biomass from the weights recorded on the ship. Not all fishes are weighed individually, but total biomass for a species is recorded for each station, so at an aggregate tbey should match up.

We would expect with accurate length-weight relationships that this plot should resemble the length-weight time line.

ship_1 <- ann_means %>% 
  ggplot(aes(est_year, fscs_biomass_kg, color = season, linetype = "All Species")) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    geom_line(group = 1) +
    geom_line(data = haddock_ann, 
              aes(est_year, fscs_biomass_kg, color = season, linetype = "Haddock Removed")) +
    scale_color_manual(values = "black") +
    scale_y_continuous(labels = scales::comma_format()) +
    labs(linetype = "Species Represented",
         color = "Season",
         x = "", 
         y = "Survey Biomass (kg)")

ship_2 <- seasonals %>% 
  ggplot(aes(est_year, fscs_biomass_kg, color = season)) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    geom_line() +
    scale_y_continuous(labels = scales::comma_format()) +
    scale_color_gmri() +
    labs(x = "", y = "Survey Biomass (kg)", caption = "Biomass from survdat$biomass.")

ship_1 / ship_2

Area Stratification

Area stratified totals for abundance and biomass are calculated according to the following formulas in accordance with code found here, courtesy of Sean Lucey

Expected biomass or abundance across the entire study area is estimated using several components.
1. The catch per effort within a stratum (effort converted to area)CPUE
2. The relative area of a stratum, to the entire survey area, Stratum Area Ratio

Basically we infer a density of fishes within a stratum based on CPUE. Then, because know how big that stratum is, we infer the amount of fish there would be across the entire thing. When done for each of the strata we can aggregate to get an expected abundance across the entire area.

Stratification Explained

The stratum area ratio is the area of strata: $1,2,3, ...s$ relative to the total area of all strata sampled that year (only stratum with data we use is included).

\[stratum~ratio = area(stratum_s)~ / total~area\]

Alternatively, catch can be similarly weighted acccording to the effort and areas of the regional Ecological Production Units. Both of the following are done for each year to account for annual variation in coverage.

\[epu~ratio = area(epu_s) / total~area\]

Next, using the amount of tows in a given year for a specific strata/epu, we get the amount of individuals across that group and divide it by the effort. This is then scaled by $stratum~ratio$ to scale it by how abundance for that stratum reflects the total area of the survey.

For Strata: $1,2,3, ...s$
For years: $1,2,3, ...i$

\[stratum~weighted~catch = stratum~ratio_s * (\frac{~abundance_{is}~}{~n~stations_{is}}~),\]

The stratum weighted catch (abundance or biomass) is the average catch rate for a given area, weighted by the ratio of that stratum’s area to the overall survey area that year.

The stratum weighted catch is then multiplied by the total survey area for that year, divided by the area of a standard tow by the RV Albatross (0.0384 square km). That value can be adjusted to reflect differences in catchability (q) if available. Otherwise q is explicitly assumed to be 1.

This gives the total abundance or biomass for a given size class across the entire strata based on its catch rate that year and the area that the stratum represents.

\[area~expanded~abundance =(stratum~weighted~catch * \frac{total~area}{.0384~square~km})~/~q\]

Then to get the expected biomass for a given size class using the length weight coefficients, the expanded total abundance for that size class just needs to be multiplied by what the mean bodymass would be for that length fish and its species-specific coefficients.

\[area~expanded~biomass = area~expanded~abundance~*~size~specific~biomass\]

Abundance

When expanding the catches out to the total area the survey represents we can get this time line for annual abundances.

abund_1 <- ann_means %>% 
  ggplot(aes(est_year, total_survey_abund)) +
  
  geom_line(color = "gray50", linetype = 3) +
  geom_line(aes(color = `Research Vessel`), show.legend = F) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_color_gmri() +
  scale_x_continuous(labels = scales::breaks_pretty()) +
  labs(x = "", y = "Total Fish from Survey") 

abund_2 <- ann_means %>% 
  ggplot(aes(est_year, strat_abundance_s/1000000)) +
  geom_line(color = "gray50", linetype = 3) +
  geom_line(aes(color = `Research Vessel`)) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_color_gmri() +
  scale_x_continuous(labels = scales::breaks_pretty()) +
  labs(x = "", y = "Area Extrapolated Abundance \n(millions)",
       caption = "Area stratification weights from nmfs trawl stratum, not ecological production units.") +
  theme(legend.position = "bottom")


abund_1 / abund_2

Size Specific L-W Biomass

Biomass below uses catch at length, and the expected biomass at length, to get size specific catch/tows across all years for each stratum.

These catch rates are then expanded from catch per tow, to expected total across the entire area for each strata using the difference in relative area among them and the expected area covered by a normal tow.

lw_strat_1 <-  ggplot(ann_means) +
    geom_line(aes(est_year, lw_strat_biomass_s/1000000, 
                  color = season)) +
   scale_color_manual(values = "black") +
    scale_y_continuous(labels = scales::comma_format()) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    labs(x = "", 
         y = "Stratified Total Biomass\n l-W Regressions (million kg)",
         color = "") +
  theme(legend.position = "bottom")

lw_strat_2 <- ggplot(seasonals) +
    geom_line(aes(est_year, lw_strat_biomass_s/1000000, color = season)) +
    scale_y_continuous(labels = scales::comma_format()) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    scale_color_gmri() +
    labs(x = "", 
         y = "Stratified Total Biomass\n L-W Regressions (million kg)", 
         color = "")+
  theme(legend.position = "bottom")

lw_strat_1 / lw_strat_2

Total Species Biomass

The totals below use the survdat$biomass column for size-agnostic biomass catch rates. Uses strata specific biomass catch rates which are then expanded out to be representative of total survey areas.

strat_1 <-  ggplot(ann_means) +
    geom_line(aes(est_year, fscs_strat_biomass_s/1000000, color = season)) +
    scale_color_manual(values = "black") +
    scale_y_continuous(labels = scales::comma_format()) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    labs(x = "", 
         y = "Stratified Total Biomass \n survdat$BIOMASS (million kg)",
         color = "Season") +
  theme(legend.position = "bottom")

strat_2 <-  ggplot(seasonals) +
    geom_line(aes(est_year, fscs_strat_biomass_s/1000000, color = season)) +
    scale_y_continuous(labels = scales::comma_format()) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    scale_color_gmri() +
    labs(x = "", 
         y = "Stratified Total Biomass \n survdat$BIOMASS (million kg)",
         color = "Season") +
  theme(legend.position = "bottom")

strat_1 / strat_2

CPUE

Biomass per station in the following figures represents the total biomass divided by the effort in number of stations.

There has been no correction for differences in physical stratum area in these figures, only station effort within them for a given year.

cpue_1 <- ann_means %>% 
  ggplot() +
    geom_line(aes(est_year, lw_biomass_per_station, 
                  color = season, linetype = "All Species")) +
    geom_line(data = haddock_ann, 
              aes(est_year, lw_biomass_per_station, 
                  color = season, linetype = "Haddock Removed")) +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    scale_color_manual(values = "black") +
    labs(x = "", 
         y = "LW Biomass / Station\n(kg)",
         linetype = "",
         color = "Season") +
  theme(legend.position = "bottom")

cpue_2 <- seasonals %>% 
  ggplot(aes(est_year, lw_biomass_per_station, color = season)) +
    geom_line() +
    geom_vline(xintercept = 2008.5, linetype = 2, alpha = 0.5) +
    scale_color_gmri() +
    labs(x = "", 
         y = "LW Biomass / Station\n(kg)", 
         color = "Season") +
  theme(legend.position = "bottom")

cpue_1 / cpue_2

Species Guilds

Relative contribution to overall biomass by species group based on management and/or life history: Demersal, Elasmobranch, Groundfish, Pelagic

# Species Class Bins Data Prep
class_totals <- nefsc_stratified %>% 
  group_by(est_year, spec_class) %>% 
  summarise(`Total Class Biomass` = sum(sum_weight_kg),
            `Total Stratified Biomass` = sum(strat_total_biom_s),
            .groups = "keep") %>% 
  ungroup() %>% 
  left_join(ann_means, by = "est_year") %>% 
  mutate(`Percent of Annual Biomass` = (`Total Class Biomass` / lw_biomass_kg) * 100 ,
         `Percent of Stratified Biomass` = (`Total Stratified Biomass` / lw_strat_biomass_s ) * 100,
         `Species Guild` = spec_class,
         `Species Guild` = ifelse(is.na(`Species Guild`), "Unclassified", `Species Guild`),
         Year = as.character(est_year))

Proportion of Total Catch

Class Totals

# Class Proportions
survey_bio <- class_totals %>% 
  ggplot(aes(x = fct_rev(Year), y = `Percent of Annual Biomass`, fill = `Species Guild`)) + 
    geom_col(color = "white", size = 0.2) +
    geom_vline(xintercept = "2008", linetype = 2, color = "gray25") +
    labs(x = "", y = "Proportion of Total Annual Biomass") +
    coord_flip() +
    scale_fill_gmri() +
    scale_y_continuous(position = "right") +
    guides("fill" = guide_legend(title.position = "top", title.hjust = 0.5)) +
    theme(legend.position = "bottom")

strat_bio <- class_totals %>% 
  ggplot(aes(x = fct_rev(Year), y = `Percent of Stratified Biomass`, fill = `Species Guild`)) + 
    geom_col(color = "white", size = 0.2) +
    geom_vline(xintercept = "2008", linetype = 2, color = "gray25") +
    labs(x = "", y = "Proportion of Stratified Biomass") +
    coord_flip() +
    scale_fill_gmri() +
    scale_y_continuous(position = "right") +
    guides("fill" = guide_legend(title.position = "top", title.hjust = 0.5)) +
    theme(legend.position = "bottom")


survey_bio | strat_bio

Plotting Individual Guilds

guild_plots <- nefsc_stratified %>% 
  split(.$spec_class) %>% 
  imap(function(guild, guild_name){
    
    
    year_df <- guild %>% 
      split(.$est_year) %>% 
      imap_dfr(function(year_group, year){
        #quantile breaks
        probs <- c(0.5, 0.25, 0.5, 0.75, 0.95 )
        
        # length quantiles
        length_quantiles <- Hmisc::wtd.quantile(
            x = year_group$length, 
            weights = year_group$numlen_adj, 
            probs = ) %>% 
          as.data.frame() %>% 
          rownames_to_column() %>% 
          setNames(c("Quantile", "length")) %>% 
          mutate(Quantile = factor(probs),
                 year = year)
        
        # weight quantiles
        weight_quantiles <- Hmisc::wtd.quantile(
            x = year_group$ind_weight_kg, 
            weights = year_group$numlen_adj, 
            probs = probs) %>% 
          as.data.frame() %>% 
          rownames_to_column() %>% 
          setNames(c("Quantile", "weight (kg)")) %>% 
          mutate(Quantile = factor(probs),
                 year = year)
        
        guild_data <- left_join(length_quantiles, weight_quantiles, 
                               by = c("year", "Quantile")) %>% 
          mutate(year = fct_rev(year),
                 `Research Vessel` = ifelse(as.numeric(as.character(year)) > 2008, "HB", "AL"))
        
        return(guild_data)  
          
    
      })
    
    
    
    
    # # Make key for better labels
    # guild_key <- c("dem" = "Demersal",
    #                "el" = "Elasmobranch",
    #                "gf" = "Groundfish",
    #                "pel" = "Pelagic")
    
     # Make key for better labels
    guild_key <- c("Demersal" = "Demersal",
                   "Elasmobranch" = "Elasmobranch",
                   "Groundfish" = "Groundfish",
                   "Pelagic" = "Pelagic")
    
    
    # build plot
    p1 <- ggplot(year_df, aes(y = length, x = factor(year))) +
      geom_rect(ymin = -Inf, 
                ymax = Inf, 
                xmin = "2008", 
                xmax = "2019", 
                fill = "gray90") +
      geom_line(aes(group = year), color = "gray50", alpha = 0.5) +
      geom_point(aes(color = Quantile)) +
      scale_color_gmri() +
      labs(subtitle = guild_key[guild_name], y = "Individual Length (cm)", x = NULL) +
      theme(legend.position = "none",
            axis.text.x = element_blank(),
            axis.ticks.x = element_blank())
    
    # build weight plot
    p2 <- ggplot(year_df, aes(y = `weight (kg)`, x = factor(year))) +
      geom_rect(ymin = -Inf, 
                ymax = Inf, 
                xmin = "2008", 
                xmax = "2019", 
                fill = "gray90") +
      geom_line(aes(group = year), color = "gray50", alpha = 0.5) +
      geom_point(aes(color = Quantile)) +
      scale_color_gmri() +
      labs(y = "Individual Weight (kg)", x = NULL) +
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) 
      
    patch_plot <- p1 / p2
    return(patch_plot)
    
    
  })

Pelagic

guild_plots$Pelagic

Demersal

guild_plots$Demersal

Elasmobranch

guild_plots$Elasmobranch

Groundfish

guild_plots$Groundfish

Mean Individual Biomass

Check these across years and also by season

summs %>% 
  ggplot(aes(est_year, mean_ind_bodymass_lw * 1000, color = season)) +
   geom_rect(ymin = -Inf, 
             ymax = Inf, 
             xmin = 2008, 
             xmax = 2019, 
             color = "transparent",
             fill = "gray90") +
    geom_line(show.legend = FALSE) +
    scale_color_manual(values = c(
      "Spring + Fall" = "black",
      "Spring" = as.character(gmri_cols("gmri blue")),
      "Fall"   = as.character(gmri_cols("orange")))) +
  labs(x = "", y = "Mean Individual Biomass (g)") +
  facet_grid(season~area)

Mean Individual Length

summs %>% 
  ggplot(aes(est_year, mean_ind_length, color = season)) +
    geom_rect(ymin = -Inf, 
             ymax = Inf, 
             xmin = 2008, 
             xmax = 2019, 
             color = "transparent",
             fill = "gray90") +
    geom_line(show.legend = FALSE) +
    scale_color_manual(values = c(
      "Spring + Fall" = "black",
      "Spring" = as.character(gmri_cols("gmri blue")),
      "Fall"   = as.character(gmri_cols("orange")))) +
  labs(x = "", y = "Mean Individual Length (cm)") +
  facet_grid(season~area)

Top 5 Species by Weight

Since 2008 there has been a large increase in the amount of total biomass caught and the adjusted CPUE. As a way to look into the top species contributing to this I have pulled out the top 5 species for that year by total biomass caught as derived from their length weight relationships.

# Pull top 5 species from each year, by biomass
top_5 <- nefsc_stratified %>% 
  filter(est_year >= 2004) %>% 
  group_by(est_year, comname) %>% 
  summarise(n_fish = sum(numlen_adj),
            lw_biomass = sum(sum_weight_kg)) %>% 
  ungroup() %>% 
  split(.$est_year) %>% 
  map_dfr(function(yr_group){
    yr_group  %>% 
      slice_max(order_by = lw_biomass, n = 5) })
  

# Plot each year, re-ordered by biomass
t5_plots <- top_5 %>% 
  split(.$est_year) %>% 
  map(function(x){
    x %>% mutate(comname = fct_drop(comname),
                 comname = fct_reorder(comname, .fun = max, lw_biomass)) %>% 
    ggplot(aes(x = lw_biomass, y = comname)) +
      geom_point() +
      geom_segment(aes(yend = comname, x = 0, xend = lw_biomass)) +
      facet_wrap(~est_year) +
      scale_x_continuous(labels = scales::comma_format()) +
      labs(x = "Total Biomass (kg) - From LW Relationship", y = "")
})



# Assemble 4-panel plots displaying years in groups of 4
t1 <- ((t5_plots$`2004` | t5_plots$`2005`) / (t5_plots$`2006` | t5_plots$`2007`))
t2 <- ((t5_plots$`2008` | t5_plots$`2009`) / (t5_plots$`2010` | t5_plots$`2011`))
t3 <- ((t5_plots$`2012` | t5_plots$`2013`) / (t5_plots$`2014` | t5_plots$`2015`))
t4 <- ((t5_plots$`2016` | t5_plots$`2017`) / (t5_plots$`2018` | t5_plots$`2019`))

2004 - 2007

t1

2008 - 2011

t2

2012 - 2015

t3

2016 - 2019

t4

Small Fish

Bodymass size cutoffs are used to plot the timelines of total fish biomass below the following thresholds for individual biomass: .0625, .125, .25, .5, 1kg. This is going to fail for the shipboard weights because that weight is divided equally across different measurements of fishes regardless of size.

# Function to plot total annual biomass based on a maximum size to include
max_weight_plot <- function(size_cutoff){
  
  
  
  year_summs <- nefsc_stratified %>% 
    filter(ind_weight_kg < size_cutoff) %>% 
    group_by(est_year) %>% 
    summarise(total_biomass_lw = sum(sum_weight_kg),
              total_strat_biom = sum(strat_total_lwbio_s),
              `Maximum Individual Biomass` = str_c("Individual Mass < ", size_cutoff, " kg"),
              .groups = "keep")
    
    
    
    survey_plot <- year_summs %>% 
      ggplot(aes(est_year, total_biomass_lw, color = `Maximum Individual Biomass`)) +
      geom_line() +
      scale_color_gmri() +
      scale_y_continuous(labels = scales::comma_format()) +
      labs(x = NULL, y = "Survey Annual (kg)") +
      theme(legend.position = c(0.3, 0.9))
  
   # Plot the survey biomass
  strat_plot <- year_summs %>%
    ggplot(aes(est_year, total_strat_biom)) +
      geom_line(color = gmri_cols("gmri blue")) +
      scale_y_continuous(labels = scales::comma_format()) +
      labs(x = NULL, y = "Area Stratified Annual (kg)") 
  
  # Plot them stacked
  survey_plot / strat_plot
}

.0625 kg

# .5 kg
max_weight_plot(.0625)

.125 kg

max_weight_plot(.125)

.25 kg

max_weight_plot(.25)

.5

max_weight_plot(.5)

1 kg

max_weight_plot(1)

Vessel Changes

Maximum size plots show some recent changes in biomass of small fishes corresponding with the vessel change from the RV Albatross to the RV Henry Bigelow.

To Take a look at what species contributed the most to this, I will maintain the maximum body size filter of .125 kg, and compare the difference in catch/tow for species across research vessel. The following plots will highlight the top & bottom ten species experiencing positive/negative changes to their catch rates. Temporal differences are ignored here, and may play a bigger role than vessel changes.

# annual biomasses for top 10 species, ordered by total biomass that year
# Minimum individual bodymass of 5k

# Compare total biomass and cpue across vessels
svp <- nefsc_stratified %>% 
    filter(ind_weight_kg < .125) %>% 
    group_by(svvessel, comname) %>% 
    summarise(bio = sum(sum_weight_kg),
              effort = n_distinct(id),
              cpue = bio / effort,
              .groups = "keep") %>% 
  pivot_wider(names_from = svvessel, 
              names_sep = "_",
              values_from = c(bio, effort, cpue)) %>% 
  mutate(cpue_change = (cpue_HB - cpue_AL) / cpue_AL) %>% 
  arrange(desc(cpue_change)) %>% 
  ungroup

Highlighting Positive Shifts in CPUE

# Highlight Positive Changes in CPUE
pos_change_plot <- svp %>% 
  slice_max(n = 10, order_by = cpue_change) %>% 
  mutate(comname = fct_reorder(.f = comname, .x = cpue_change, .fun = max)) %>% 
  
  ggplot(aes(y = comname, x = cpue_change)) +
  geom_segment(aes(x = 0, xend = cpue_change, y = comname, yend = comname)) +
  geom_point() +
  labs(x = "Relative CPUE Change \n AL -> HB", y = "", subtitle = "Top Positive Shifts in CPUE")

# individual cpues
al_p <- svp %>% 
  slice_max(n = 10, order_by = cpue_change) %>% 
  mutate(comname = fct_reorder(comname, cpue_AL, max)) %>% 
  ggplot(aes(y = comname, x = cpue_AL)) +
  geom_segment(aes(x = 0, xend = cpue_AL, y = comname, yend = comname)) +
  geom_point() +
  labs(x = "Biomass (kg) / Station", y = "", subtitle = "Albatross CPUE")

hb_p <- svp %>% 
  slice_max(n = 10, order_by = cpue_change) %>% 
  mutate(comname = fct_reorder(comname, cpue_HB, max)) %>% 
  ggplot(aes(y = comname, x = cpue_HB)) +
  geom_segment(aes(x = 0, xend = cpue_HB, y = comname, yend = comname)) +
  geom_point() +
  labs(x = "Biomass (kg) / Station", y = "", subtitle = "Bigelow CPUE")


pos_change_plot | (al_p / hb_p)

Highlighting Negative Shift in CPUE

# Highlight Negative Shift in CPUE
neg_change_plot <- svp %>% 
  slice_min(n = 10, order_by = cpue_change) %>% 
  mutate(comname = fct_reorder(comname, cpue_change, max)) %>% 
  ggplot(aes(y = comname, x = cpue_change)) +
  geom_segment(aes(x = 0, xend = cpue_change, y = comname, yend = comname)) +
  geom_point() + 
  scale_x_reverse() +
  labs(x = "Relative CPUE Change \n AL -> HB", y = "", subtitle = "Top Negative Shifts in CPUE")

# individual cpues
al_neg <- svp %>% 
  slice_min(n = 10, order_by = cpue_change) %>% 
  mutate(comname = fct_reorder(comname, cpue_AL, min)) %>% 
  ggplot(aes(y = comname, x = cpue_AL)) +
  geom_segment(aes(x = 0, xend = cpue_AL, y = comname, yend = comname)) +
  geom_point() +
  labs(x = "Biomass (kg) / Station", y = "", subtitle = "Albatross CPUE")

hb_neg <- svp %>% 
  slice_min(n = 10, order_by = cpue_change) %>% 
  mutate(comname = fct_reorder(comname, cpue_HB, min)) %>% 
  ggplot(aes(y = comname, x = cpue_HB)) +
  geom_segment(aes(x = 0, xend = cpue_HB, y = comname, yend = comname)) +
  geom_point() +
  labs(x = "Biomass (kg) / Station", y = "", subtitle = "Bigelow CPUE")


neg_change_plot | (al_neg / hb_neg)

Large Fishes

Same idea for the larger fish, but the opposite. A minimum cutoff for the individual bodymass. Below are the upper quantiles for individual body mass weighted by number at length.

# plotting function
min_weight_plot <- function(size_cutoff){
  
  # Get annual summaries
  year_summs <- nefsc_stratified %>% 
    filter(ind_weight_kg > size_cutoff) %>% 
    group_by(est_year) %>% 
    summarise(total_biomass_lw = sum(sum_weight_kg),
              total_strat_biom = sum(strat_total_lwbio_s),
              `Minimum Individual Biomass` = str_c("Individual Mass > ", size_cutoff, " kg"),
              .groups = "keep") 
  
  # Plot the survey biomass
  survey_plot <- year_summs %>%
    ggplot(aes(est_year, total_biomass_lw, color = `Minimum Individual Biomass`)) +
      geom_line() +
      scale_color_gmri() +
      scale_y_continuous(labels = scales::comma_format()) +
      labs(x = NULL, y = "Survey Annual (kg)") +
    theme(legend.position = c(0.3, 0.9))
  
   # Plot the survey biomass
  strat_plot <- year_summs %>%
    ggplot(aes(est_year, total_strat_biom)) +
      geom_line(color = gmri_cols("gmri blue")) +
      scale_y_continuous(labels = scales::comma_format()) +
      labs(x = NULL, y = "Area Stratified Annual (kg)") 
  
  # Plot them stacked
  survey_plot / strat_plot
}

1 kg

min_weight_plot(1)

2 kg

min_weight_plot(2)

5 kg

min_weight_plot(5)

10 kg

min_weight_plot(10)

Recent Upswings

Plots of minimum bodysize cutoff at 5kg and 10kg seem to have a big upswing in the last couple years, the following plot looks to highlight where the most biomass resides in those years with that same minimum bodymass cutoff of 5g.

# annual biomasses for top 10 species, ordered by total biomass that year
# Minimum individual bodymass of 5k
y_plots <- nefsc_stratified %>% 
    filter(ind_weight_kg > 5,
           est_year >= 2016) %>% 
    group_by(est_year, comname) %>% 
    summarise(total_biomass_lw = sum(sum_weight_kg),
              `Minimum Individual Biomass` = str_c("Individual Mass > ", 5, " kg")) %>%   
  split(.$est_year) %>% 
  map(function(x){
    x %>%  
      top_n(n = 20, total_biomass_lw) %>% 
      arrange(desc(total_biomass_lw)) %>% 
      mutate(comname = fct_reorder(comname, total_biomass_lw, mean))  %>% 
      ggplot(aes(total_biomass_lw, comname)) +
      geom_segment(aes(x = 0, xend = total_biomass_lw, comname, yend = comname)) +
      geom_point() +
      facet_wrap(~est_year) +
      labs(x = "Total LW Biomass (kg)", y = NULL)})

y_plots$`2016` | y_plots$`2017`

y_plots$`2018` | y_plots$`2019`

Cownose Rays

Big jumps in 2018 and 2019 seem to stem from catches of cownose rays. Further inspection shows that they boil down to two alarmingly high catches in the fall of each year. The table below details all instances of catch for cownose rays since 1990. Typically there have been 0-2 per year, with a deeply concerning increase in 2018-2019.

# How many cownose rays, and how suspicious is that number?
 cownose <- nefsc_stratified %>% 
   filter(comname == "cownose ray") %>% 
   group_by(comname, est_year, season, id) %>% 
   summarise(total_biomass_lw = sum(sum_weight_kg),
              num_individuals = sum(numlen_adj),
              n_occurrences   = n_distinct(id)) 

cownose %>% kable()

comname	est_year	season	id	total_biomass_lw	num_individuals	n_occurrences
cownose ray	1972	Fall	1972080531650	137.286135	16	1
cownose ray	1972	Fall	1972080551650	36.689235	4	1
cownose ray	1976	Fall	1976091121650	7.425116	1	1
cownose ray	2006	Fall	2006100691650	2.046616	1	1
cownose ray	2009	Fall	2009040721650	28.211572	2	1
cownose ray	2011	Fall	2011050531650	22.596350	1	1
cownose ray	2012	Fall	2012040861650	1.243419	1	1
cownose ray	2018	Fall	2018040781690	80.136882	45	1
cownose ray	2019	Fall	2019041521730	1771.707695	109	1

Prominant Species Timelines

This section is for species of prominent importance either for their significance as a commercial species, or their proliferance in the survey.

Highlighting Single Species

# What do we want to show for these highlight species
species_timeline <- function(common_name){
  annual_totals <- nefsc_stratified %>% 
    filter(comname == common_name) %>% 
    group_by(est_year) %>% 
    summarize(total_strat_biom = sum(strat_total_lwbio_s), 
              avg_ind_mass = weighted.mean(ind_weight_kg, numlen_adj),
              .groups = "drop")
  
  biomass_timeline <- ggplot(annual_totals, aes(est_year, total_strat_biom)) +
    geom_line(group = 1) +
    geom_point() +
    scale_y_continuous(labels = scales::comma_format()) +
    labs(x = "", y = "Stratified Total Biomass (kg)",
         title = str_to_title(common_name),
         subtitle = "Total Biomass Over Time")
  
  bodysize_timeline <- ggplot(annual_totals, aes(est_year, avg_ind_mass)) +
    geom_line(group = 1) +
    geom_point() +
    labs(x = "Year", y = "Individual Bodymass (kg)",
         subtitle = "Individual Bodymass Over Time")
  
  biomass_timeline / bodysize_timeline
}

Atlantic Cod

species_timeline("atlantic cod")

Haddock

species_timeline("haddock")

Spiny Dogfish

species_timeline("spiny dogfish")

Acadian Redfish

species_timeline("acadian redfish")

Winter Skate

species_timeline("winter skate")

Silver Hake

species_timeline("silver hake")

Functional Group Trends

fgroup_summs <- nefsc_stratified %>%
  filter(spec_class != "Unclassified") %>% 
  group_by(est_year, spec_class) %>% 
  summarise(total_strat_biom = sum(strat_total_lwbio_s), 
            avg_ind_mass = weighted.mean(ind_weight_kg, numlen_adj),
            .groups = "drop")

biomass_timeline <- ggplot(fgroup_summs, aes(est_year, total_strat_biom)) +
    geom_line(group = 1) +
    #geom_point() +
    facet_grid(~spec_class) +
    scale_y_continuous(labels = scales::comma_format()) +
    labs(x = "", y = "Stratified Total Biomass (kg)",
         subtitle = "Total Biomass Over Time")
  
  bodysize_timeline <- ggplot(fgroup_summs, aes(est_year, avg_ind_mass)) +
    geom_line(group = 1) +
    #geom_point() +
    facet_grid(~spec_class) +
    labs(x = "Year", y = "Individual Bodymass (kg)",
         subtitle = "Individual Bodymass Over Time")
  
  biomass_timeline / bodysize_timeline

# Plot top 5 biomass species from each group
t5_plots <- nefsc_stratified %>% 
  filter(spec_class != "Unclassified") %>% 
  split(.$spec_class) %>% 
  map(function(spec_class_dat){
    
    # Pull out the top 5  most biomass species
    top_species <- spec_class_dat %>% 
      group_by(comname) %>% 
      summarize(total_strat_biom = sum(strat_total_lwbio_s), 
                avg_ind_mass = weighted.mean(ind_weight_kg, numlen_adj),
                .groups = "drop") %>% 
      arrange(desc(total_strat_biom)) %>% 
      top_n(n = 5, wt = total_strat_biom) %>% 
      pull(comname)
    
    
    
    # Get annual Summaries
    ysummary <- spec_class_dat %>% 
      filter(comname %in% top_species) %>% 
      group_by(comname, est_year, spec_class) %>% 
        summarize(total_strat_biom = sum(strat_total_lwbio_s), 
                  avg_ind_mass = weighted.mean(ind_weight_kg, numlen_adj),
                  .groups = "drop") 
    
    # Order factor by biomass totals
    ysummary <- mutate(ysummary, comname = factor(comname, levels = top_species))
    
    ysummary %>% 
      ggplot(aes(est_year, total_strat_biom, color = comname)) +
        geom_line() +
        scale_y_continuous(labels = scales::comma_format()) +
        facet_wrap(~spec_class) +
        scale_color_gmri() +
        labs(x = "", y = "Total Strat Biomass (kg)")+
        theme(legend.position = "bottom") +
        guides(color = guide_legend(title = "",
                                    nrow = 2, 
                                    title.position = "top", 
                                    title.hjust = 0.5, 
                                    label.position = "top"))
  })


(t5_plots[[1]] | t5_plots[[2]]) / (t5_plots[[3]] | t5_plots[[4]]) + plot_annotation(title = "Top 5 Species By Biomass")

Spatial Patterns

For large regions like Georges Bank and the Gulf of Maine, what kind of patterns are we seeing.

# Load the strata
survey_strata <- read_sf(str_c(res_path, "Shapefiles/BottomTrawlStrata/BTS_Strata.shp"))  %>% 
  clean_names() %>% 
  filter(strata >= 01010 ,
         strata <= 01760,
         strata != 1310,
         strata != 1320,
         strata != 1330,
         strata != 1350,
         strata != 1410,
         strata != 1420,
         strata != 1490) 


# Key to which strata = which regions
strata_key <- list(
  "Georges Bank"          = as.character(13:23),
  "Gulf of Maine"         = as.character(24:40),
  "Southern New England"  = str_pad(as.character(1:12), width = 2, pad = "0", side = "left"),
  "Mid-Atlantic Bight"    = as.character(61:76))


# Assign Areas
survey_strata <- survey_strata %>% 
  mutate(
    strata = str_pad(strata, width = 5, pad = "0", side = "left"),
    strata_num = str_sub(strata, 3, 4),
    area = case_when(
      strata_num %in% strata_key$`Georges Bank` ~ "Georges Bank",
      strata_num %in% strata_key$`Gulf of Maine` ~ "Gulf of Maine",
      strata_num %in% strata_key$`Southern New England` ~ "Southern New England",
      strata_num %in% strata_key$`Mid-Atlantic Bight` ~ "Mid-Atlantic Bight",
    TRUE ~ "Outside Major Study Areas"
  )) %>% 
  select(finstr_id, strata, strata_num, area, a2, str2, set, stratuma, str3, geometry)


# Make trawl data an sf dataset
trawl_sf <- nefsc_stratified %>% st_as_sf(coords = c("decdeg_beglon", "decdeg_beglat"), crs = 4326)

Map of Trawl Regions

For this project we are using different regions composed of collections of individual trawl survey strata.

For comparison sake the areas of the different ecological production units has also been included.

# Plot to check
strata_plot <- ggplot() +
  geom_sf(data = new_england) +
  geom_sf(data = canada) +
  geom_sf(data = survey_strata, aes(fill = area)) +
  coord_sf(xlim = c(-77, -65.5), ylim = c(34, 45.75), expand = FALSE) +
  guides(fill = guide_legend(nrow = 2)) +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  labs(subtitle = "Survey Regions")


# load EPU shapes
epu_sf <- ecodata::epu_sf
epu_plot <- ggplot() +
  geom_sf(data = new_england) +
  geom_sf(data = canada) +
  geom_sf(data = epu_sf, aes(fill = EPU)) +
  coord_sf(xlim = c(-77, -65.5), ylim = c(34, 45.75), expand = FALSE) +
  guides(fill = guide_legend(nrow = 2)) +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())+
  labs(subtitle = "Ecological Production Units")

strata_plot | epu_plot

Regional Summaries

# Just Area, all seasons
area_summs <- nefsc_stratified %>% 
  group_by(survey_area) %>% 
  summarise(
    season = "Spring + Fall",
    lw_biomass_kg = sum(sum_weight_kg, na.rm = T),
    n_stations = n_distinct(id),
    lw_biomass_per_station = lw_biomass_kg / n_stations,
    mean_ind_bodymass = weighted.mean(ind_weight_kg, weights = numlen_adj),
    mean_ind_length = weighted.mean(length_cm, weights = numlen_adj),
    .groups = "keep"
  ) %>% 
  ungroup()

# Area x Season
seas_area <- nefsc_stratified %>% 
  group_by(survey_area, season) %>% 
  summarise(
    lw_biomass_kg = sum(sum_weight_kg, na.rm = T),
    n_stations = n_distinct(id),
    lw_biomass_per_station = lw_biomass_kg / n_stations,
    mean_ind_bodymass = weighted.mean(ind_weight_kg, weights = numlen_adj),
    mean_ind_length = weighted.mean(length_cm, weights = numlen_adj),
    .groups = "keep"
  ) %>% 
  ungroup()


# Combine those two
summs_combined <- bind_rows(area_summs, seas_area) %>% 
  mutate(season = factor(season, levels = c("Spring", "Fall", "Spring + Fall")))

summs_combined %>% 
  mutate_if(is.numeric,round, 2) %>% 
  arrange(survey_area,season) %>% 
  knitr::kable()

survey_area	season	lw_biomass_kg	n_stations	lw_biomass_per_station	mean_ind_bodymass	mean_ind_length
GB	Spring	438226.3	2996	146.27	0.88	40.33
GB	Fall	450348.8	3124	144.16	0.73	37.69
GB	Spring + Fall	888575.1	6120	145.19	0.80	38.94
GoM	Spring	348424.0	3631	95.96	0.63	35.06
GoM	Fall	619849.5	3738	165.82	0.70	36.53
GoM	Spring + Fall	968273.5	7369	131.40	0.67	35.84
MAB	Spring	752939.2	2544	295.97	0.83	44.44
MAB	Fall	104074.9	2476	42.03	0.66	25.95
MAB	Spring + Fall	857014.1	5020	170.72	0.78	38.68
SNE	Spring	460508.7	2869	160.51	0.54	36.58
SNE	Fall	210873.7	2743	76.88	0.45	32.75
SNE	Spring + Fall	671382.4	5612	119.63	0.51	35.08

# Year x Area
area_summs_y <- nefsc_stratified %>% 
  group_by(est_year, survey_area) %>% 
  summarise(
    season                 = "Spring + Fall",
    lw_biomass_kg          = sum(sum_weight_kg, na.rm = T),
    n_stations             = n_distinct(id),
    lw_biomass_per_station = lw_biomass_kg / n_stations,
    mean_ind_bodymass      = weighted.mean(ind_weight_kg, weights = numlen_adj),
    mean_ind_length        = weighted.mean(length_cm, weights = numlen_adj),
    strat_total_abund         = sum(strat_total_abund_s),
    strat_total_lwbio         = sum(strat_total_lwbio_s),
    strat_total_biom          = sum(strat_total_biom_s),
    .groups = "keep"
  ) %>% 
  ungroup()

Total Biomass

Total biomass in the figures below is derived from weight at length relationships and reflects the mean expected biomass of the fish caught using the observed distribution and frequency of lengths.

area_summs_y %>% 
  ggplot(aes(est_year, lw_biomass_kg)) +
    geom_line() +
    facet_wrap(~survey_area, ncol = 2) +
    scale_y_continuous(labels = scales::comma_format()) +
    labs(x = "", y = "Total Biomass (kg)")

CPUE

CPUE displayed below is the mean total biomass, derived from weight at length relationships, per station for each region. These values have not been weighted to reflect the difference in area between regions.

area_summs_y %>% 
  ggplot(aes(est_year, lw_biomass_per_station)) +
    geom_line() +
    facet_wrap(~survey_area, ncol = 2) +
    labs(x = "", y = "Adjusted Biomass per Station (kg)")

Stratified Total Abundance

Stratified Total abundance is calculated by taking the catch per station rates for each length class of each species and extending that density to the total areas of the region. Catchability is assumed to be 1 for this calculation though it is certainly lower. This extension of catch rates is highly sensitive to large swings due to high variances and zero inflation of abundances.

area_summs_y %>% 
  ggplot(aes(est_year, strat_total_abund/1000000)) +
    geom_line() +
    scale_y_continuous(labels = scales::comma_format()) +
    facet_wrap(~survey_area, ncol = 2) +
    labs(x = "", y = "Stratified Total Abundance (millions)")

Stratified Total Biomass

The expected biomass estimates for the survdat$biomass data and the expected biomass from L-W regressions should be fairly similar or it would refute the accuracy of length-weight relationshihps.

There also should be no substantial jump in observations immediately following the transition between research vessels as the catch rates have been adjusted to account for that.

This plot should thus show very similar trends for the change in stratified biomass from the biomass weighed during the survey operation (left) and biomass derived from length weight relationships (right).

fscs <- area_summs_y %>% 
  ggplot() +
    geom_line(aes(est_year, strat_total_biom / 1000000, color = "Shipboard Weights")) +
    scale_y_continuous(labels = scales::comma_format()) +
    facet_wrap(~survey_area, ncol = 1) +
    scale_color_gmri(reverse = T) +
    labs(x = "", y = "Stratified Total Biomass FSCS (million kg)") +
    theme(legend.position = "bottom", legend.title = element_blank())

lw <- area_summs_y %>% 
  ggplot() +
    geom_line(aes(est_year, strat_total_lwbio / 1000000, color = "LW Regression Weights")) +
    scale_y_continuous(labels = scales::comma_format()) +
    scale_color_gmri(reverse = F) +
    facet_wrap(~survey_area, ncol = 1) +
    labs(x = "", y = "Stratified Total Biomass LW (million kg)") +
    theme(legend.position = "bottom", legend.title = element_blank())

fscs | lw

A work by Adam A. Kemberling

Akemberling@gmri.org

NEFSC Exploratory Data Analysis

Adam A. Kemberling

2021-08-02

Exporatory Data Analysis of NEFSC Bottom Trawl Survey Data

Temporal Trends

Total Biomass

L-W Derived

Shipboard Weight

Area Stratification

Stratification Explained

Abundance

Size Specific L-W Biomass

Total Species Biomass

CPUE

Species Guilds

Proportion of Total Catch

Pelagic

Demersal

Elasmobranch

Groundfish

Mean Individual Biomass

Mean Individual Length

Top 5 Species by Weight

2004 - 2007

2008 - 2011

2012 - 2015

2016 - 2019

Small Fish

.0625 kg

.125 kg

.25 kg

.5

1 kg

Vessel Changes

Large Fishes

1 kg

2 kg

5 kg

10 kg

Recent Upswings

Cownose Rays

Prominant Species Timelines

Atlantic Cod

Haddock

Spiny Dogfish

Acadian Redfish

Winter Skate

Silver Hake

Functional Group Trends

Spatial Patterns

Map of Trawl Regions

Regional Summaries

Total Biomass

CPUE

Stratified Total Abundance

Stratified Total Biomass