About

The goal of this markdown is to validate consistency of outcomes for various aggregation scales. The approach is to take 4 years from the survey data, 2006-2010 to cover the change in survey design. Then we will track comparisons between methods for estimating biomass at a station level or in aggregate.

Starting Data

####__  Data  ####


# Data From 03/03/2021:
load(paste0(nmfs_path, "2021_survdat/NEFSC_BTS_all_seasons_03032021.RData"))

# Clean names up a touch
trawldat <- survey$survdat %>% clean_names() %>% 
  rename(est_year = year) 

# And filter years to speed this process up
trawldat <- filter(trawldat, est_year >= 2006 & est_year <= 2010) %>% 
  mutate(comname = tolower(comname))

Clean up Function

The size spectra clean up function does any data prep including:

  • Removal of incomplete records
  • Spatial filters
  • Pairing species to growth coefficients
  • Calculating stratum effort and area weighting coefficients
  • Estimating stratified catch rates
# # Cleanup functions
# Moved to gmRi package


# Clean/Prep the Survdat 
survdat_21 <- gmRi::gmri_survdat_prep(survdat = trawldat)

# Add LW coefficients
# option to cutoff species using a threshold cutoff: 
survdat_lw <- gmRi::add_lw_info(survdat_clean = survdat_21, cutoff = FALSE)


# Get stratified Biomass
weights_21 <- gmRi::add_area_stratification(survdat_weights = survdat_lw, include_epu = F)

NOTE:

The cleanup function retains the station-level information of the raw data, so its possible to compare individual stations across the raw and prepped data:

Direct Comparisons

NOTE: In these initial comparisons all species with length-weight relationships are included however bad they are. Later in the document the comparison is made again to showcase the improvement of the relationship between two biomass sources.

Looking at the weights dataset we prepared, the first obvious comparison to make is how to weights from length weight relationships compare to the bulk biomass measurements recorded on the ship.

Total Biomass vs. L-W Biomass

Due to sub-sampling of large catches and inherent variability around the mean weight at length for a species we expect the biomass estimates to resemble but not exactly match the biomass recorded for a species total weight at a station. The following figures aim to track how these aggregate totals compare over the years of the survey.

####  Annual Summary, summer and Fall  ####


# 1. prepare data to plot different biomass sources side-by-side:

# BIOMASS
# Biomass_per_lclass distributes biomass equally across lengths,
# Is also used for area stratified biomass, eliminates need to use distinct
ann_summs <- weights_21 %>% 
  group_by(est_year) %>% 
  summarise(`Bulk Biomass` = sum(biom_per_lclass),
            `Area Stratified Total BIOMASS` = sum(strat_total_biom_s),
            `L-W Biomass`               = sum(sum_weight_kg),
            `Area Stratified Total L-W` = sum(strat_total_lwbio_s),
            season = "Spring + Fall",
            .groups = "drop") 




####  Annual Summary, separate seasons  ####

seas_summs <-  weights_21  %>% 
  group_by(est_year, season) %>% 
  summarise(`Bulk Biomass` = sum(biom_per_lclass),
            `Area Stratified Total BIOMASS` = sum(strat_total_biom_s),
            `L-W Biomass` = sum(sum_weight_kg),
            `Area Stratified Total L-W` = sum(strat_total_lwbio_s),
            .groups = "drop")


# reformat for plotting
summaries_long <- bind_rows(ann_summs, seas_summs) %>% 
  pivot_longer(names_to = "Biomass Source", 
               values_to = "Total Survey Biomass (kg)", 
               cols = c(`L-W Biomass`, `Bulk Biomass`)) %>% 
  pivot_longer(names_to = "Stratified Biomass Source", 
               values_to = "Stratified Total Biomass (kg)", 
               cols = c(`Area Stratified Total L-W`,`Area Stratified Total BIOMASS`)) %>% 
  mutate(season = factor(season, levels = c("Spring + Fall", "Spring", "Fall")))

Un-adjusted Biomass

# Total Biomass from survey
ggplot(summaries_long, aes(est_year, `Total Survey Biomass (kg)`, color = `Biomass Source`)) +
  geom_line() +
  facet_grid(`Biomass Source`~season, scales = "free") +
  scale_y_continuous(labels = comma_format())+
  scale_color_gmri() +
  labs(x = "",
       caption = "All available species coefficients used (wigley & fishbase)",
       title = "Difference Prior to LW Relationship Validations") +
  theme(legend.position = "bottom") +
  guides(color = guide_legend(title.position = "top", title.hjust = 0.5))

Area-Stratified Biomass

# What that biomass is extended out to total areas
ggplot(summaries_long, 
       aes(est_year, `Stratified Total Biomass (kg)`, 
           color = `Stratified Biomass Source`)) +
  geom_line() +
  facet_grid(`Stratified Biomass Source`~season, scales = "free") +
  scale_y_continuous(labels = comma_format())+
  scale_color_gmri() +
  labs(x = "", 
       caption = "All available species coefficients used (wigley & fishbase)",
       title = "Difference Prior to LW Relationship Validations") +
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  guides(color = guide_legend(title.position = "top", title.hjust = 0.5)) 

Aggregation Comparisons

Another comparison of interest is whether it makes a difference to calculate catch rates for individual lengths of a species, expanding those rates out to get a stratified biomass. Or to aggregate before estimating catch rates.

The following code will look into that comparison.

Size Specific Catch Rates

The way the size spectra build code is written catch rates for each length caught are estimated and weighted for every stratum in which they are caught. Weighting is adjusted by the amount of effort within that stratum in that year, and the relative size of the stratum to the overall sampling coverage that year.

# test_station <- weights_20$id[1]
test_station <- weights_21$id[1]
test_species <- "silver hake"

print(paste0("The test station is: ", test_station))
[1] "The test station is: 2006042241300"
print(paste0("The test species is: ", test_species))
[1] "The test species is: silver hake"
weights_21 %>% 
    filter(id == test_station, 
           comname == test_species) %>% 
      select("Species"                                      = comname, 
             "Length"                                       = length_cm, 
             "Number at Length"                             = numlen_adj, 
             "Weight (kg)"                                  = sum_weight_kg, 
             "Stratum"                                      = stratum,
             "Tows in Stratum (by year)"                    = strat_ntows, 
             "Abundance per Tow"                            = abund_tow_s, 
             "Weighted Abundance per Tow (by stratum area)" = strat_mean_abund_s,
             "Projected Abundance (stratum)"                = strat_total_abund_s) %>% 
  mutate(`Weight (g)` = `Weight (kg)` * 1000) %>% 
  mutate_if(is.numeric, round, 3) %>% 
  head(10) %>% 
  kable() %>%  
  kable_styling(font_size = 10, )
Species Length Number at Length Weight (kg) Stratum Tows in Stratum (by year) Abundance per Tow Weighted Abundance per Tow (by stratum area) Projected Abundance (stratum) Weight (g)
silver hake 25 1 0.095 1300 3 0.333 0.003 16579 95.362
silver hake 29 1 0.152 1300 3 0.333 0.003 16579 152.228
silver hake 31 1 0.188 1300 3 0.333 0.003 16579 187.830

The question is whether the sum of these “Projected Abundances” is the same when these catch rates are done in aggregate using the following function:

source(here("R/support/nefsc_ss_build_nodrop.R"))
agg_species_metrics
function (.data = data, ..., include_epu = FALSE) 
{
    num_tows <- .data %>% group_by(...) %>% summarise(n_stations = n_distinct(id), 
        n_species = n_distinct(comname), .groups = "keep")
    data_stratified <- .data %>% left_join(num_tows)
    species_data_stratified <- data_stratified %>% group_by(..., 
        comname) %>% summarise(n_tows = max(n_stations), n_species = max(n_species), 
        lw_biomass_kg = sum(sum_weight_kg), fscs_biomass_kg = sum(biom_per_lclass), 
        abundance = sum(numlen_adj), abund_tow = abundance/n_tows, 
        lwbio_tow = lw_biomass_kg/n_tows, biom_tow = fscs_biomass_kg/n_tows, 
        mean_ind_length = weighted.mean(length_cm, numlen_adj), 
        mean_ind_bodymass_lw = weighted.mean(ind_weight_kg, numlen_adj), 
        mean_ind_length_lw = weighted.mean(length_cm, numlen_adj), 
        strat_abundance_s = sum(strat_total_abund_s), lw_strat_biomass_s = sum(strat_total_lwbio_s, 
            na.rm = T), fscs_strat_biomass_s = sum(strat_total_biom_s), 
        .groups = "keep") %>% ungroup()
    if (include_epu == T) {
        epu_strat <- data_stratified %>% group_by(..., comname) %>% 
            summarise(strat_abundance_epu = sum(strat_total_abund_epu), 
                lw_strat_biomass_epu = sum(strat_total_lwbio_epu, 
                  na.rm = T), fscs_strat_biomass_epu = sum(strat_total_biom_epu), 
                .groups = "keep") %>% ungroup()
        species_data_stratified <- bind_cols(species_data_stratified, 
            epu_strat)
    }
    return(species_data_stratified)
}

Aggregate After

If we keep all the individual size classes separate, and get the stratified abundances at length and their associated weights at length we come to this table when we aggregate by year, season, and species.

cod_21 <- weights_21 %>% filter(comname == "atlantic cod")

# adding up stratified weights at the end
cod_21 %>% 
  group_by(est_year, season, comname) %>% 
  summarise(stratified_abundance = sum(strat_total_abund_s),
            stratified_lwbio     = sum(strat_total_lwbio_s),
            stratified_biom      = sum(strat_total_biom_s),
            .groups = "drop") %>% 
  select(Year                     = est_year, 
         Season                   = season, 
         Species                  = comname, 
         `Stratified Abundance`   = stratified_abundance,
         `Stratified LWbio`       = stratified_lwbio,
         `Stratified Biom`        = stratified_biom) %>% 
  kable() %>% 
  kable_styling(font_size = 10)
Year Season Species Stratified Abundance Stratified LWbio Stratified Biom
2006 Fall atlantic cod 7075583 9570447 9542305
2006 Spring atlantic cod 5908324 12619290 11478785
2007 Fall atlantic cod 4087122 6083501 6146140
2007 Spring atlantic cod 22382584 31917333 30563927
2008 Fall atlantic cod 9244146 10838608 11155130
2008 Spring atlantic cod 15921891 20502500 20615227
2009 Fall atlantic cod 12602502 10827485 13935593
2009 Spring atlantic cod 9745758 13100125 14945137
2010 Fall atlantic cod 4599235 5463940 6000686
2010 Spring atlantic cod 11886289 13176354 14302068

Aggregate First

Alternatively, you can aggregate the total to get stratified abundances for year and season first and the numbers check out that way as well.

# Aggregating then calculating the catch rates etc.
cod_21 %>% 
  agg_species_metrics(est_year, season) %>% 
  select(Year                     = est_year, 
         Season                   = season, 
         Species                  = comname, 
         `Stratified Abundance`   = strat_abundance_s,
         `Stratified LWbio`       = lw_strat_biomass_s,
         `Stratified Biom`        = fscs_strat_biomass_s) %>% 
  kable() %>% 
  kable_styling(font_size = 10)
Year Season Species Stratified Abundance Stratified LWbio Stratified Biom
2006 Fall atlantic cod 7075583 9570447 9542305
2006 Spring atlantic cod 5908324 12619290 11478785
2007 Fall atlantic cod 4087122 6083501 6146140
2007 Spring atlantic cod 22382584 31917333 30563927
2008 Fall atlantic cod 9244146 10838608 11155130
2008 Spring atlantic cod 15921891 20502500 20615227
2009 Fall atlantic cod 12602502 10827485 13935593
2009 Spring atlantic cod 9745758 13100125 14945137
2010 Fall atlantic cod 4599235 5463940 6000686
2010 Spring atlantic cod 11886289 13176354 14302068

Checking Length Weight Relationships

It looks like the root cause is just the difference between what the mean prediction from species-specific coefficients is when compared to the biomass column in survdat.

If we take the totals from the length-weight derived biomasses and the BIOMASS columns that match we can rank them by how often they over/underestimate biomass.

NOTE: To be sure there is no problem with the survey change only data prior to 2008 will be used.

# Load all years and species using most recent data
weights_all <- gmRi::gmri_survdat_prep(survdat_source = "most recent") %>% 
  gmRi::add_lw_info(cutoff = FALSE) %>% 
  gmRi::add_area_stratification() %>% 
  filter(est_year <= 2008)

Summary and Plotting Functions

# Get the total weights for each station, then do it each year for every species
mismatch_ranks <- function(x){
  species_summaries <- x %>% 
    group_by(est_year, comname, id) %>% 
    summarise(number_caught = sum(numlen_adj),
              length_weight_biomass = sum(sum_weight_kg),
              ship_measured_biomass = sum(biom_per_lclass), # also in kg
              lw_underestimate = ship_measured_biomass - length_weight_biomass, 
              .groups = "drop") 
  
  
  # Format rankings to plot
  n_species <- length(unique(species_summaries$comname))
  ranks <- species_summaries %>% 
    mutate(comname = fct_reorder(.f = comname, .x = lw_underestimate, mean),
           species_rank = as.numeric(comname),
           extremes = ifelse( species_rank >= n_species - 20, "Extreme Under-Estimation", NA),
           extremes = ifelse(species_rank <= 20, "Extreme Over-Estimation", extremes)) 
  
  return(ranks)
  
}


# Plot flags top and bottom 15 species that don't match well with survdat$biomass
mismatch_plot <- function(x){
  # All ranks in one panel
  all_ranks <- x %>% 
    ggplot(aes(lw_underestimate, species_rank)) +
      geom_boxplot(aes(group = species_rank), color = "gray40") +
      geom_boxplot(data = filter(x, is.na(extremes) == FALSE),
                   aes(group = species_rank, color = extremes)) +
      geom_vline(xintercept = 0, size = 0.5, color = "black", linetype = 2) +
      scale_x_continuous(labels = comma_format()) +
      scale_color_gmri() +
      guides("color" = guide_legend(title = "")) +
      labs(x = "Underestimation of Biomass\nShipboard Weight - LW Biomass (kg)",
           y = "Underestimation Rank") +
    theme(legend.position = c(0.2, 1))
  
  # Focus on top 5
  top_5 <- x %>% 
    filter(extremes == "Extreme Under-Estimation") %>% 
    ggplot(aes(lw_underestimate, comname)) +
      geom_boxplot(aes(group = species_rank), color = gmri_cols("orange")) +
      scale_color_gmri() +
      scale_x_continuous(labels = comma_format()) +
      labs(x = "",
           y = "", 
           subtitle = "L-W biomass < 'survdat$Biomass'") +
    theme(title = element_text(hjust = 0.5))
  
  # Focus on bottom 5
  bot_5 <- x %>% 
    filter(extremes == "Extreme Over-Estimation") %>% 
    ggplot(aes(lw_underestimate, comname)) +
      geom_boxplot(aes(group = species_rank), color = gmri_cols("gmri blue")) +
      scale_color_gmri() +
      scale_x_continuous(labels = comma_format()) +
      labs(x = "Underestimation of Biomass\nShipboard Weight - LW Biomass (kg)",
           y = "", 
           subtitle = "L-W mass > 'survdat$Biomass'") +
    theme(title = element_text(hjust = 0.5))
    
  
  
  # Patchwork plot
  plot_out <- all_ranks | (top_5/bot_5)
  return(plot_out)
}

Wigley 06 - Species

These species got a priority pass when matching with L-W coefficients because they are from a more recent and region specific source.

# Just Wigley
wigley_ranks <- weights_all %>% 
  filter(source == "wigley") %>% 
  mismatch_ranks() 

# Wigley mismatch plot
wigley_ranks %>% 
  mismatch_plot()

Fishbase Species

These species were matched up with coefficients from fishbase and are likely less accurate than the ones with svspp codes.

fishbase_ranks <- weights_all %>% 
  filter(source == "fishbase") %>% 
  mismatch_ranks()

fishbase_ranks %>% 
  mismatch_plot()

LW Fit Diagnostics

The following tabs are an attempt to dig into what is happening with species on the extremes of the previous plots. The single species plots should ideally display horizontal trends with no bias towards over or under estimation of biomass.

Single Species Checks

Take the species that stood out above and plot the mismatch versus the station abundances. This should highlight whether its an issue of sub-sampling. Might be beneficial to split it for seasons for dogfish and some other species.

#species that are caught in large numbers
subsampling_suspects <- c("spiny dogfish", "haddock", "blueback herring", 
                          "american shad", "southern stingray", "atlantic cod", 
                          "pollock", "alewife", "acadian redfish", "butterfish")


# Ranks Across all Sources
all_source_ranks <- mismatch_ranks(weights_all) 



# Plot how they look for the specific species
all_source_ranks %>% 
  filter(comname %in% subsampling_suspects) %>% 
  ggplot(aes(number_caught, lw_underestimate)) +
      geom_hline(yintercept = 0, linetype = 2, color = "gray50", size = 0.5) +
      geom_point(color = gmri_cols("gmri blue")) +
      geom_smooth(formula = y ~ x, 
                  method = "lm", 
                  color = gmri_cols("orange")) +
      facet_wrap(~comname, scales = "free", ncol = 2) +
      labs(x = "Single Station Abundance", y = "survdat$BIOMASS - LW Biomass")

Cutoff Threshold

If we assume that the total biomass weighed at a station is correct then we can use some cutoff value for an unacceptable deviation from the known weight.

# Percent Cutoff
perc_cutoff <- 15

For example values 15% heavier or lighter.

# Identify Coefficients that seem off
perc_differences <- weights_all %>% 
  group_by(comname, source) %>% 
  summarise(
    `Biomass`    = sum(biom_per_lclass),
    `LW Biomass` = sum(sum_weight_kg),
    weight_perc_diff  = ((`LW Biomass` - Biomass) / Biomass) * 100,
    broken_coef  = ifelse(weight_perc_diff > perc_cutoff | weight_perc_diff < -perc_cutoff, "broken", "less broken"),
    .groups = "drop")


# Pull the ones that are flagged
broken_coefficients <- perc_differences %>% 
  filter(broken_coef == "broken")


# Broken Wigley Coefficientz
broken_coefficients %>% 
  filter(source == "wigley") %>% 
  mutate(comname = fct_reorder(comname, weight_perc_diff, max)) %>% 
  ggplot(aes(weight_perc_diff, comname)) +
   geom_point() +
   labs(x = "Percent Difference in Biomass Sources", y = "",
        subtitle = "Extreme Biomass Mismatch - Wigley Coefficients")

# Broken fishbase coefficients
broken_coefficients %>% 
  filter(source == "fishbase",
         comname != "hogfish") %>% 
  top_n(n = 20, weight_perc_diff) %>% 
  mutate(comname = fct_reorder(comname, weight_perc_diff, max)) %>% 
  ggplot(aes(weight_perc_diff, comname)) +
   geom_point() +
   labs(x = "Percent Difference in Biomass", 
        y = "", 
        caption = "hogfish excluded, super broken also",
        subtitle = "Extreme Biomass Mismatch - Fishbase Coefficients") +
  scale_x_continuous(labels = scales::comma_format())

Removing Bad Fits with Cutoff

Certain coefficients for some species are not well aligned with the biomass recorded in the survey. These contribute to the big differences in stratified abundances/biomasses. If we set a reasonable cutoff threshold of 15%, these are what the timelines now look like and how many species get dropped from each source.

# Cutoff Threshold
cut_percent <- 15

fishbase_cutoffs <- perc_differences %>% 
  filter(source == "fishbase",
         weight_perc_diff >= cut_percent) %>% 
  pull(comname)

wigley_cutoffs <- perc_differences %>% 
  filter(source == "wigley",
         weight_perc_diff >= cut_percent) %>% 
  pull(comname)

# concatenate them
bad_coefficients <- c( fishbase_cutoffs, wigley_cutoffs)


# drop things using cutoff
fish_removed <- weights_all %>% 
  filter(comname %not in% bad_coefficients)


# Re-total weights without those species
new_totals <- fish_removed %>% 
 group_by(est_year, season) %>% 
 summarise(`Biomass` = sum(biom_per_lclass),
           `LW Biomass` = sum(sum_weight_kg),
           .groups = "drop")


# Plot timelines again
ggplot(new_totals, aes(est_year)) +
  geom_line(aes(y = Biomass, color = "BIOMASS")) +
  geom_line(aes(y = `LW Biomass`, color = "LW Biomass")) +
  facet_wrap(~season, ncol = 1) +
  scale_y_continuous(labels = comma_format())+
  scale_color_gmri() +
  labs(x = "", 
       color = "",
       subtitle = paste0("Species Removed at ", cut_percent, "% Difference Cutoff"),
       caption = paste(length(fishbase_cutoffs),"Fishbase Species Removed.",    length(wigley_cutoffs), "Wigley Species Removed")) +
  theme(legend.position = "top")

fishbase_okay <- perc_differences %>% 
  filter(source == "fishbase",
         weight_perc_diff < cut_percent) %>% 
  pull(comname)

wigley_okay <- perc_differences %>% 
  filter(source == "wigley",
         weight_perc_diff < cut_percent) %>% 
  pull(comname)


wigley_bad <- perc_differences %>% 
  filter(source == "wigley",
         weight_perc_diff > cut_percent) %>% 
  pull(comname)

The species that pass the cutoff of no more than 15% difference in biomass are the following 68 species from the Wigley Paper:

wigley_okay
 [1] "acadian redfish"          "american plaice"         
 [3] "american shad"            "atlantic angel shark"    
 [5] "atlantic cod"             "atlantic croaker"        
 [7] "atlantic halibut"         "atlantic herring"        
 [9] "atlantic mackerel"        "atlantic sharpnose shark"
[11] "atlantic spadefish"       "atlantic sturgeon"       
[13] "atlantic thread herring"  "atlantic wolffish"       
[15] "barndoor skate"           "black sea bass"          
[17] "blackbelly rosefish"      "blueback herring"        
[19] "bluefish"                 "buckler dory"            
[21] "bullnose ray"             "butterfish"              
[23] "chain dogfish"            "clearnose skate"         
[25] "cownose ray"              "cunner"                  
[27] "cusk"                     "fawn cusk-eel"           
[29] "fourspot flounder"        "goosefish"               
[31] "greater amberjack"        "haddock"                 
[33] "little skate"             "longhorn sculpin"        
[35] "northern kingfish"        "northern searobin"       
[37] "ocean pout"               "offshore hake"           
[39] "pollock"                  "red hake"                
[41] "rosette skate"            "roughtail stingray"      
[43] "round herring"            "sand tiger"              
[45] "sandbar shark"            "scup"                    
[47] "sea raven"                "silver hake"             
[49] "smooth butterfly ray"     "smooth dogfish"          
[51] "smooth skate"             "southern kingfish"       
[53] "spanish mackerel"         "spanish sardine"         
[55] "spiny butterfly ray"      "spiny dogfish"           
[57] "spot"                     "spotted hake"            
[59] "striped bass"             "summer flounder"         
[61] "thorny skate"             "weakfish"                
[63] "white hake"               "windowpane flounder"     
[65] "winter flounder"          "winter skate"            
[67] "witch flounder"           "yellowtail flounder"     

Species from the Wigley paper that did not meet the standards of the 15% cutoff threshold were:

wigley_bad
[1] "alewife"            "atlantic torpedo"   "bluntnose stingray"
[4] "southern stingray"  "tautog"             "vermillion snapper"

The fishbase species that pass the 15% cutoff are the following 0 species:

fishbase_okay
character(0)

Re-Assess Total BIOMASS & LW Biomass

Once those species have been removed we can reassess how far off the biomass values are, and how much we should be able to trust the validity of the weights at length.

# Pull out species 
weights_cutoff <- weights_all %>% filter(comname %in% wigley_okay)



# Annual Summary - both seasons combined
ann_summs <- weights_cutoff %>% 
  group_by(est_year) %>% 
  summarise(`Bulk Biomass` = sum(biom_per_lclass),
            `Area Stratified Total BIOMASS` = sum(strat_total_biom_s),
            season = "Spring + Fall",
            `L-W Biomass` = sum(sum_weight_kg),
            `Area Stratified Total L-W` = sum(strat_total_lwbio_s),
            .groups = "drop") 


# Get summary for each season for every year
seas_summs <-  weights_cutoff  %>% 
  group_by(est_year, season) %>% 
  summarise(`Bulk Biomass` = sum(biom_per_lclass),
            `Area Stratified Total BIOMASS` = sum(strat_total_biom_s),
            `L-W Biomass` = sum(sum_weight_kg),
            `Area Stratified Total L-W` = sum(strat_total_lwbio_s),
            .groups = "drop")


# reformat for plotting
summaries_long <- bind_rows(ann_summs, seas_summs) %>% 
  pivot_longer(names_to = "Biomass Source", 
               values_to = "Total Survey Biomass (kg)", 
               cols = c(`L-W Biomass`, `Bulk Biomass`)) %>% 
  pivot_longer(names_to = "Stratified Biomass Source", 
               values_to = "Stratified Total Biomass (kg)", 
               cols = c(`Area Stratified Total L-W`,`Area Stratified Total BIOMASS`)) %>% 
  mutate(season = factor(season, levels = c("Spring + Fall", "Spring", "Fall")))

Un-Stratified

This plot charts the trends in the biomass recorded during the survey, survdat$BIOMASS and its length-specific counterpart.

# Total Biomass from survey
ggplot(summaries_long, 
       aes(est_year, `Total Survey Biomass (kg)`, color = `Biomass Source`)) +
  geom_line() +
  facet_grid(season~., scales = "free") +
  scale_y_continuous(labels = comma_format())+
  scale_color_gmri() +
  labs(x = "",
       title = "Success:",
       subtitle = "Removing bad L-W fits aligns biomass sources.") +
  theme(legend.position = "bottom")

Area-Stratified

Here are the trends for the area-stratified totals.

# What that biomass is extended out to total areas of the survey
ggplot(summaries_long, 
       aes(est_year, `Stratified Total Biomass (kg)`, 
           color = `Stratified Biomass Source`)) +
  geom_line() +
  facet_grid(season~., scales = "free") +
  scale_y_continuous(labels = comma_format())+
  scale_color_gmri() +
  labs(x = "", 
       caption = "Species cutoff in place") +
  theme(legend.position = "top",
        axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  guides(color = guide_legend(title.position = "top", title.hjust = 0.5))

Percent of Biomass Remaining

By filtering out all these species it is possible that we are missing a meaningful proportion of the biomass caught in the survey. Here is what the total biomass was before and after, and what fraction of the biomass belongs to each of the functional groups.

total_bio <- sum(weights_all$biom_per_lclass)
cutoff_bio <- sum(weights_cutoff$biom_per_lclass)
biom_perc <- round((cutoff_bio / total_bio) * 100, 2)

Across all years the amount of biomass included in the 68 species is 98.98% of the biomass caught in the survey.

Looking across different years the numbers look like this:

# Yearly totals and seasonal totals for all species
ann_summ_all <- weights_all %>% 
  group_by(est_year) %>% 
  summarise(`Bulk Biomass` = sum(biom_per_lclass),
            `Area Stratified Total BIOMASS` = sum(strat_total_biom_s),
            `L-W Biomass` = sum(sum_weight_kg),
            `Area Stratified Total L-W` = sum(strat_total_lwbio_s),
            season = "Spring + Fall",
            .groups = "drop") 

seas_summs_all <-  weights_all  %>% 
  group_by(est_year, season) %>% 
  summarise(`Bulk Biomass` = sum(biom_per_lclass),
            `Area Stratified Total BIOMASS` = sum(strat_total_biom_s),
            `L-W Biomass` = sum(sum_weight_kg),
            `Area Stratified Total L-W` = sum(strat_total_lwbio_s),
            .groups = "drop")


# reformat for plotting
summaries_long_all <- bind_rows(ann_summ_all, seas_summs_all) %>% 
  pivot_longer(names_to = "Biomass Source", 
               values_to = "Total Survey Biomass (kg)", 
               cols = c(`L-W Biomass`, `Bulk Biomass`)) %>% 
  pivot_longer(names_to = "Stratified Biomass Source", 
               values_to = "Stratified Total Biomass (kg)", 
               cols = c(`Area Stratified Total L-W`,`Area Stratified Total BIOMASS`)) %>% 
  mutate(season = factor(season, levels = c("Spring + Fall", "Spring", "Fall")))

Un-Stratified Biomass

This plot charts the trends in the biomass recorded during the survey. Can only track survdat$BIOMASS here because the L-W version doesn’t hold up for all species.

# Comapare percentage of biomass that remains
all_spec_totals <- bind_rows(ann_summ_all, seas_summs_all) %>% 
  select(est_year, season, `Bulk Biomass`) %>% mutate(`Data Source` = "All Species")
cutoff_totals <- bind_rows(ann_summs, seas_summs)  %>% 
  select(est_year, season, `Bulk Biomass`) %>% mutate(`Data Source` = "Validated Species")

# Combine for plot
biom_check <- bind_rows(all_spec_totals, cutoff_totals) %>% 
  pivot_wider(names_from = "Data Source", values_from = "Bulk Biomass") %>% 
  mutate(
    `Percent After Cutoff` = (`Validated Species` / `All Species`),
    season = factor(season, levels = c("Spring + Fall", "Spring", "Fall")))



# Total Biomass from survey - All Species
ggplot(biom_check) +
  geom_line(aes(est_year, `Percent After Cutoff`)) +
  facet_grid(season~., scales = "free") +
  scale_y_continuous(labels = percent_format())+
  labs(x = "",
       title = "Impact of Removing Species",
       subtitle = "Percent of Biomass Remaining After Validation") +
  theme(legend.position = "bottom")