Comparing the Species Composition of Size Spectra

Supplemental materials for size spectrum manuscript.

Author

Affiliation

Gulf of Maine Research Institute

Published

December 7, 2022

Purpose

The goal of this markdown is to highlight differences in how the presence/absence of different functional groups impacts the different size spectra.

How does a lack of groundfish impact volatility (see-saw)?

What body sizes are notably deficient (if any)?

How does bin width, minimum weight, and maximum weight impact how the spectra is calculated.

What size bins are consistently present across all years and areas?

Data:

This is the target point for the stratified abundance data. It has been filtered to exclude biomass below 1g.

This is the target point for the size spectrum results. Any of the plots should match these results as a consistency check:

Preparing Data For Figures:

This is the code for the aggregation function, it totals the abundance within each size bin based on some grouping variable. Usually its just year and area, but we also want to preserve functional groups here:

Running the Aggregation

Comparing Functional Group Compositions:

Two things here. 1) Need to verify that the aggregations for the annual summary and the functional groups match up for direct comparison. 2) I’m expecting to see in more recent years how dominant the elasmobranchs have become in the community.

This plotting function will plot this size spectrum aggregates for both the functional group and overall annual aggregates. The slope and intercept will be plotted on the annual aggregate panels.

1970-2005
1996-2019

Specific Years Where Max Size Changes

This is what a single year looks like in the most recent regime, do we see any absences in lower size groups at a yearly step?

When/Where are Size Bins Missing

This section is for documenting where and how often size bins do not have abundances present for analysis. These are the sistuations where I am testing the impacts surrounding how I treat situations where:

If 10^4 is routinely absent in some areas, should it be kept? What if it is the result of fishing, shouldn’t the 0 biomass be meaningful?

Behavior for an Empty Bin?

Whenever a bin is empty, there is a decision to make about how to handle that gap. An NA value will ignore that bin exists, and ignore that there is an absence in potential biomass, biasing the slope to be more positive. Substituting small values near 0 to preserve the bin has the opposite impact, biasing a more negative slope depending on how small the substitution.

To pick a single year as an example, this is what Georges Bank’s ISD exponent fit changes with a change in maximum size parameter from 10^4 to 10^5.

What % of Biomass Falls within Min/Max Limits

Both the binning and ISD methodology requires some decision making around the minimum and maximum body sizes being investigated. This is either done explicitly with the min/max parameters of the ISD bounded power law, or implicitly with the smallest and largest size bins included in the slope estimates.

Under the current analysis these upper and lower limits are:

Current Analysis Configurations:

lower limit: 1gram
upper limit: 8192grams

Using the current analysis controls:

96.89% of the stratified biomass falls below the maximum size limit of 8192g.
99.998% of the stratified biomass falls above the minimum size limit of 1g.
And 96.88% of the stratified total biomass falls within that min-max size range.

--- title: "Comparing the Species Composition of Size Spectra" author: name: "Adam A. Kemberling" title: "Quantitative Research Associate" url: https://github.com/adamkemberling affiliation: Gulf of Maine Research Institute description: | Supplemental materials for size spectrum manuscript. date: "`r Sys.Date()`" format: html: self-contained: true code-fold: true code-tools: true df-print: kable toc: true toc-depth: 3 editor: source execute: echo: false warning: false message: false fig.align: center comment: "" --- ```{r} #| label: load packages and functions #### Packages #### library(targets) library(here) library(gmRi) library(patchwork) library(gt) library(knitr) library(tidyverse) library(scales) # Support functions source(here("R/support/sizeSpectra_support.R")) # Set theme theme_set(theme_gmri(legend.position = "bottom")) ``` ## Purpose The goal of this markdown is to highlight differences in how the presence/absence of different functional groups impacts the different size spectra. How does a lack of groundfish impact volatility (see-saw)? What body sizes are notably deficient (if any)? How does bin width, minimum weight, and maximum weight impact how the spectra is calculated. **What size bins are consistently present across all years and areas?** ### Data: This is the target point for the stratified abundance data. It has been filtered to exclude biomass below 1g. ```{r} #| label: load survdat_data # 1. Biological data used as input withr::with_dir(rprojroot::find_root('_targets.R'), tar_load(catch_log2_labelled)) catch_size_bins <- catch_log2_labelled # Number of Unique Species n_species <- length(unique(catch_size_bins$comname)) ``` This is the target point for the size spectrum results. Any of the plots should match these results as a consistency check: ```{r} #| label: load the size spectrum indices withr::with_dir(rprojroot::find_root('_targets.R'), tar_load(size_spectrum_indices)) # Grab SS Groups we care about region_indices <- size_spectrum_indices %>% filter(`group ID` == "single years * region") %>% mutate(yr = as.numeric(as.character(Year)), survey_area = factor(survey_area, levels = c("GoM", "GB", "SNE", "MAB")), sig_fit = ifelse(log2_sig_strat < 0.05, "Significant", "Non-Significant")) ``` ### Preparing Data For Figures: This is the code for the aggregation function, it totals the abundance within each size bin based on some grouping variable. Usually its just year and area, but we also want to preserve functional groups here: ### Running the Aggregation ```{r} #| label: aggregate-bin-totals # Run it for the hare groups hare_group_aggregates <- aggregate_log2_bins( log2_assigned = catch_size_bins, min_log2_bin = 0, max_log2_bin = 13, bin_increment = 1, Year, survey_area, hare_group) # Run just the years so we can assure that the totals are the same: year_group_aggregates <- aggregate_log2_bins( log2_assigned = catch_size_bins, min_log2_bin = 0, max_log2_bin = 13, bin_increment = 1, Year, survey_area) # Do they Add up? # original gom_90 <- filter(year_group_aggregates, Year == 1990, survey_area == "GoM") # with functional groups gom_90_check <- hare_group_aggregates %>% filter( Year == 1990, survey_area == "GoM") %>% group_by(Year, survey_area, log2_bins, left_lim, right_lim, bin_label, bin_width, bin_midpoint) %>% #glimpse() summarise( observed_abundance = sum(observed_abundance), observed_weight_g = sum(observed_weight_g), stratified_abundance = sum(stratified_abundance), stratified_weight_g = sum(stratified_weight_g), normalized_abund = sum(normalized_abund), norm_strat_abund = sum(norm_strat_abund), denorm_abund = sum(denorm_abund), denorm_strat_abund = sum(denorm_strat_abund), .groups = "drop" ) # Look the same to me!, wait maybe not... # glimpse(select(gom_90, Year, survey_area, left_lim, observed_abundance, stratified_abundance)) # glimpse(select(gom_90_check, Year, survey_area, left_lim, observed_abundance, stratified_abundance)) ``` ### Comparing Functional Group Compositions: Two things here. 1) Need to verify that the aggregations for the annual summary and the functional groups match up for direct comparison. 2) I'm expecting to see in more recent years how dominant the elasmobranchs have become in the community. This plotting function will plot this size spectrum aggregates for both the functional group and overall annual aggregates. The slope and intercept will be plotted on the annual aggregate panels. ```{r} #| label: Spectr-Comparison-Plot # Ideally: # 4 row figure, row for each region # Show what the side bins looked like, # with their compositions from each functional group #' @title Compare Proportions of Size Bins to Spectra Slopes #' #' @param functional_group_aggregates Aggregated Numbers of Stratified Abundance, result of agg_spectra_bins, using Year and spec_class #' @param annual_aggregates Same as functional_group_aggregates, but only on Year #' @param filter_years A year range to filter by #' @param fill_group factor group to do composition fill with #' @param x_icrement the spacing between log increments #' #' @return #' @export #' #' @examples compare_spectra_components <- function( functional_group_aggregates, annual_aggregates, filter_years = 2012, fill_group, x_increment = 1){ # Filter the year functional_group_aggregates <-filter(functional_group_aggregates, Year %in% filter_years) # Stacked Bar Plot for the Functional Groups fgroup_figure <- functional_group_aggregates %>% group_by(survey_area, {{fill_group}}, left_lim, bin_label) %>% summarise(norm_strat_abund = sum(norm_strat_abund), .groups = "drop") %>% ggplot(aes(left_lim, norm_strat_abund)) + geom_col(aes(fill = {{fill_group}}), position = "fill", color = "gray30") + scale_fill_gmri() + scale_x_continuous(breaks = seq(0, 15, by = x_increment), labels = math_format(2^.x)) + # Use this with position Fill scale_y_continuous(labels = percent_format()) + # Does not work with position stack #scale_y_log10(labels = trans_format("log10", math_format(10^.x))) + facet_wrap(~survey_area, ncol = 1, scales = "free") + labs(x = "Individual Bodymass (g)", y = "Fraction of Abundance (Area-Stratified)", fill = "Functional Group:", subtitle = str_c("Functional Group Composition: ", filter_years[[1]], "-", filter_years[[length(filter_years)]])) + guides(fill = guide_legend(nrow = 2)) # Get the total abundance in each bin bin_data_summary <- annual_aggregates %>% filter(Year %in% filter_years#, #left_lim <= 4.5 ) %>% complete(survey_area, bin_midpoint, fill = list("norm_strat_abund" = 0)) %>% group_by(survey_area, bin_midpoint) %>% summarise(norm_strat_abund = sum(norm_strat_abund), .groups = "drop") # Make Plot # Solid Bar Graph for Annual Totals annual_figure <- ggplot( bin_data_summary, # aes(left_lim, norm_strat_abund)) + aes(bin_midpoint, norm_strat_abund)) + geom_col(fill = gmri_cols("gmri blue"), alpha = 0.7) + geom_point(color = gmri_cols("gmri blue")) + geom_smooth(formula = y ~ x, method = "lm", color = gmri_cols("orange")) + stat_poly_eq(formula = y ~ x, aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")), label.y = 1, label.x = 0.9, parse = TRUE) + #scale_x_continuous(breaks = seq(0, 15, by = x_increment), labels = math_format(2^.x)) + scale_x_log10(labels = trans_format("log10", math_format(10^.x))) + scale_y_log10(labels = trans_format("log10", math_format(10^.x))) + facet_wrap(~survey_area, ncol = 1, scales = "free") + labs(x = "Individual Bodymass (g)", y = "Stratified Abundance (Normalized)", fill = "Functional Group:", subtitle = str_c("Aggregate Community Size Spectra:")) # Collect the plots together plot_out <- (fgroup_figure | annual_figure) + plot_layout(guides = "collect") & theme(legend.position = "bottom") return(plot_out) } ``` ::: panel-tabset ### 1970-2005 ```{r} #| label: early-regime-hare #| fig.height: 8 #| fig.width: 8 # What does 1970-2005 look like t <- compare_spectra_components( functional_group_aggregates = hare_group_aggregates, annual_aggregates = year_group_aggregates, filter_years = c(1970:2005), fill_group = hare_group, x_increment = 1) t ``` ### 1996-2019 ```{r} #| label: late-regime-hare2 #| fig.height: 8 #| fig-width: 10 # What does 1996-2019 look like (t <- compare_spectra_components( functional_group_aggregates = hare_group_aggregates, annual_aggregates = year_group_aggregates, filter_years = c(2006:2019), fill_group = hare_group, x_increment = 1)) ``` ::: ## Specific Years Where Max Size Changes This is what a single year looks like in the most recent regime, do we see any absences in lower size groups at a yearly step? ```{r} #| label: log2-binning-swap # Load the data before max size cutoff withr::with_dir(rprojroot::find_root('_targets.R'), tar_load(catch_complete)) # Run the pipeline preparations without any min/max filters: #### Log2 Binning #### # 1. Set 0.5 increments to check presence absence with more detail all_sizes_l2 <- catch_complete %>% prep_sizeSpectra_data(lw_trawl_data = .) %>% assign_log2_bins(wmin_grams = ., log2_increment = 1) # Aggregate for all sizes on yr, area, functional group all_sizes_hare <- aggregate_log2_bins( log2_assigned = all_sizes_l2, min_log2_bin = 0, max_log2_bin = 13, bin_increment = 1, Year, survey_area, hare_group) # Aggregate bodymass and abundance in the bins for all data, each year and region all_sizes_yr_area <- aggregate_log2_bins( log2_assigned = all_sizes_l2, min_log2_bin = 0, max_log2_bin = 13, bin_increment = 1, Year, survey_area) ``` ```{r} #| label: specific-year-impact #| fig.height: 8 #| fig.width: 10 # What does 2015 look like (t <- compare_spectra_components( functional_group_aggregates = all_sizes_hare, annual_aggregates = all_sizes_yr_area, filter_years = c(2015), fill_group = hare_group, x_increment = 1)) ``` # When/Where are Size Bins Missing This section is for documenting where and how often size bins do not have abundances present for analysis. These are the sistuations where I am testing the impacts surrounding how I treat situations where: > If 10^4 is routinely absent in some areas, should it be kept? What if it is the result of fishing, shouldn't the 0 biomass be meaningful? ```{r} #| label: size-bin-presence-absence #| fig.height: 8 # What would all size aggregates be if they werent filtered: # i.e. all possible points on a spectrum all_sizes_check <- aggregate_log2_bins( log2_assigned = all_sizes_l2, min_log2_bin = 0, max_log2_bin = 14, bin_increment = 1, Year, survey_area) # Flag for each group when it is present, there was no catch, or if there aree still NA's agg_flags <- all_sizes_check %>% mutate(observed_abundance = ifelse(observed_abundance == 1, "Absent", "Present")) # Plot them agg_flags %>% ggplot(aes(Year, left_lim, fill = observed_abundance)) + geom_tile(color = "gray15") + scale_fill_gmri(reverse = T) + scale_y_continuous( breaks = seq(0,15, by = 1),#labels = half_format labels = math_format(2^.x)) + facet_wrap(~survey_area, ncol = 1) + labs(y = "Body-Mass Size Bin Left Limit (g)") ``` # Behavior for an Empty Bin? Whenever a bin is empty, there is a decision to make about how to handle that gap. An `NA` value will ignore that bin exists, and ignore that there is an absence in potential biomass, biasing the slope to be more positive. Substituting small values near 0 to preserve the bin has the opposite impact, biasing a more negative slope depending on how small the substitution. ```{r} # Grab data for 1998 test_1998 <- all_sizes_l2 %>% filter(Year == 1998) # Aggregate bodymass and abundance in the bins for all data, each year and region agg_log2_1998 <- aggregate_log2_bins( log2_assigned = test_1998, min_log2_bin = 0, max_log2_bin = 13, bin_increment = 1, Year, survey_area) %>% mutate(observed_abundance = ifelse(observed_abundance == 1, "Absent", "Present")) ``` ::: panel-tabset ### Replace Gaps with N = 1 ```{r} #| fig.height: 6 # Plot the binned spectra agg_log2_1998 %>% ggplot(aes(left_lim, norm_strat_abund)) + geom_col(fill = gmri_cols("gmri blue"), alpha = 0.5) + geom_point(color = gmri_cols("gmri blue")) + geom_smooth( formula = y ~ x, method = "lm", color = gmri_cols("orange"), se = F) + geom_hline(yintercept = 10^0, linewidth = 1, color = "black") + stat_poly_eq( formula = y ~ x, aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")), label.y = 1, label.x = 0.9, parse = TRUE) + scale_x_continuous(breaks = seq(0, 10, by = 1), labels = math_format(10^.x)) + scale_y_log10(labels = trans_format("log10", math_format(10^.x))) + facet_wrap(~survey_area, ncol = 1) + labs(x = "Individual Bodymass (g)", y = "Stratified Abundance (Normalized)", fill = "Functional Group:", title = str_c("Substitute NA with Abundance = 1")) ``` ### Ignore Gaps and Lose Bins ```{r} #| fig.height: 6 # Plot the binned spectra agg_log2_1998 %>% mutate(norm_strat_abund = ifelse(norm_strat_abund < 1, NA, norm_strat_abund)) %>% ggplot(aes(left_lim, norm_strat_abund)) + geom_col(fill = gmri_cols("gmri blue"), alpha = 0.5) + geom_point(color = gmri_cols("gmri blue")) + geom_smooth( formula = y ~ x, method = "lm", color = gmri_cols("orange"), se = F) + geom_hline(yintercept = 10^0, linewidth = 1, color = "black") + stat_poly_eq( formula = y ~ x, aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")), label.y = 1, label.x = 0.9, parse = TRUE) + scale_x_continuous(breaks = seq(0, 10, by = 1), labels = math_format(10^.x)) + scale_y_log10(labels = trans_format("log10", math_format(10^.x))) + facet_wrap(~survey_area, ncol = 1) + labs(x = "Individual Bodymass (g)", y = "Stratified Abundance (Normalized)", fill = "Functional Group:", title = str_c("Ignore Empty Bin")) ``` ### ISD Analogs To pick a single year as an example, this is what Georges Bank's ISD exponent fit changes with a change in maximum size parameter from 10^4 to 10^5. ```{r} #| fig.height: 8 # Grab data for 1998 #ex 98 gb_98 <- all_sizes_l2 %>% filter(Year == 1998, survey_area == "GB") # prep the catch data for isd plotting gb_98_isd <- isd_plot_prep(biomass_data = gb_98, stratified_abundance = T, min_weight_g = 1) # What is the fit we have in the pipeline: gb_res_98 <- region_indices %>% filter(Year == 1984, survey_area == "GB") # Plot the current setup maxcut1 <- ggplot_isd(isd_data_prepped = gb_98_isd, mle_results = gb_res_98, abundance_vals = "stratified", plot_rects = F, show_pl_fit = TRUE, xlim_global = NULL, group_name = NULL)[[2]] + labs(title = "Georges Bank 1998, Max ISD Size 10^4g") #-------- # plotting a different xmax # What does it look like if we fit it with different xmax gb_res_98_2 <- group_isd_calc(dataBinForLike = gb_98, group_var = "GB 1998", abundance_vals = "stratified", isd_xmin = 1, isd_xmax = 10^5, vecDiff = 0.5) maxcut2 <- ggplot_isd(isd_data_prepped = gb_98_isd, mle_results = gb_res_98_2, abundance_vals = "stratified", plot_rects = F, show_pl_fit = TRUE, xlim_global = NULL, group_name = NULL)[[2]] + labs(title = "Georges Bank 1998, Max ISD Size 10^5g") # Compare the two maxcut1 / maxcut2 ``` ::: # What % of Biomass Falls within Min/Max Limits Both the binning and ISD methodology requires some decision making around the minimum and maximum body sizes being investigated. This is either done explicitly with the min/max parameters of the ISD bounded power law, or implicitly with the smallest and largest size bins included in the slope estimates. Under the current analysis these upper and lower limits are: ```{r} # Load the anlaysis options withr::with_dir(rprojroot::find_root('_targets.R'), tar_load(analysis_options)) # Get the min and max sizes allowed max_size <- analysis_options$max_input_weight_g min_size <- analysis_options$min_input_weight_g ``` **Current Analysis Configurations:** * lower limit: `r min_size`gram * upper limit: `r max_size`grams ```{r} # What fraction of biomass is within min/max total_strat_bio <- sum(catch_complete$sum_weight_kg) # 1. percent below upper below_max <- catch_complete %>% filter(ind_weight_kg < (max_size/1000)) %>% pull(sum_weight_kg) %>% sum() # make a percent percent_below <- round((below_max/total_strat_bio) * 100, 2) # 2. percent above lower above_min <- catch_complete %>% filter(ind_weight_kg > (min_size/1000)) %>% pull(sum_weight_kg) %>% sum() # make a percent percent_above <- round((above_min/total_strat_bio) * 100, 3) # 3. Percent within thw two within_bounds <- below_max <- catch_complete %>% filter( ind_weight_kg > (min_size/1000) & ind_weight_kg < (max_size/1000)) %>% pull(sum_weight_kg) %>% sum() percent_within <- round((within_bounds/total_strat_bio) * 100, 2) ``` **Using the current analysis controls: ** * `r percent_below`% of the stratified biomass falls below the maximum size limit of `r max_size`g. * `r percent_above`% of the stratified biomass falls above the minimum size limit of `r min_size`g. * And `r percent_within`% of the stratified total biomass falls within that min-max size range.