The goal of this markdown is to highlight differences in how the presence/absence of different functional groups impacts the different size spectra.
How does a lack of groundfish impact volatility (see-saw)?
What body sizes are notably deficient (if any)?
How does bin width, minimum weight, and maximum weight impact how the spectra is calculated.
What size bins are consistently present across all years and areas?
Data:
This is the target point for the stratified abundance data. It has been filtered to exclude biomass below 1g.
This is the target point for the size spectrum results. Any of the plots should match these results as a consistency check:
Preparing Data For Figures:
This is the code for the aggregation function, it totals the abundance within each size bin based on some grouping variable. Usually its just year and area, but we also want to preserve functional groups here:
Running the Aggregation
Comparing Functional Group Compositions:
Two things here. 1) Need to verify that the aggregations for the annual summary and the functional groups match up for direct comparison. 2) I’m expecting to see in more recent years how dominant the elasmobranchs have become in the community.
This plotting function will plot this size spectrum aggregates for both the functional group and overall annual aggregates. The slope and intercept will be plotted on the annual aggregate panels.
This is what a single year looks like in the most recent regime, do we see any absences in lower size groups at a yearly step?
When/Where are Size Bins Missing
This section is for documenting where and how often size bins do not have abundances present for analysis. These are the sistuations where I am testing the impacts surrounding how I treat situations where:
If 10^4 is routinely absent in some areas, should it be kept? What if it is the result of fishing, shouldn’t the 0 biomass be meaningful?
Behavior for an Empty Bin?
Whenever a bin is empty, there is a decision to make about how to handle that gap. An NA value will ignore that bin exists, and ignore that there is an absence in potential biomass, biasing the slope to be more positive. Substituting small values near 0 to preserve the bin has the opposite impact, biasing a more negative slope depending on how small the substitution.
To pick a single year as an example, this is what Georges Bank’s ISD exponent fit changes with a change in maximum size parameter from 10^4 to 10^5.
What % of Biomass Falls within Min/Max Limits
Both the binning and ISD methodology requires some decision making around the minimum and maximum body sizes being investigated. This is either done explicitly with the min/max parameters of the ISD bounded power law, or implicitly with the smallest and largest size bins included in the slope estimates.
Under the current analysis these upper and lower limits are:
Current Analysis Configurations:
lower limit: 1gram
upper limit: 8192grams
Using the current analysis controls:
96.89% of the stratified biomass falls below the maximum size limit of 8192g.
99.998% of the stratified biomass falls above the minimum size limit of 1g.
And 96.88% of the stratified total biomass falls within that min-max size range.
Source Code
---title: "Comparing the Species Composition of Size Spectra"author: name: "Adam A. Kemberling" title: "Quantitative Research Associate" url: https://github.com/adamkemberling affiliation: Gulf of Maine Research Institutedescription: | Supplemental materials for size spectrum manuscript.date: "`r Sys.Date()`"format: html: self-contained: true code-fold: true code-tools: true df-print: kable toc: true toc-depth: 3editor: sourceexecute: echo: false warning: false message: false fig.align: center comment: ""---```{r}#| label: load packages and functions#### Packages ####library(targets)library(here)library(gmRi)library(patchwork)library(gt)library(knitr)library(tidyverse)library(scales)# Support functionssource(here("R/support/sizeSpectra_support.R"))# Set themetheme_set(theme_gmri(legend.position ="bottom"))```## PurposeThe goal of this markdown is to highlight differences in how the presence/absence of different functional groups impacts the different size spectra.How does a lack of groundfish impact volatility (see-saw)?What body sizes are notably deficient (if any)?How does bin width, minimum weight, and maximum weight impact how the spectra is calculated.**What size bins are consistently present across all years and areas?**### Data:This is the target point for the stratified abundance data. It has been filtered to exclude biomass below 1g. ```{r}#| label: load survdat_data# 1. Biological data used as inputwithr::with_dir(rprojroot::find_root('_targets.R'), tar_load(catch_log2_labelled)) catch_size_bins <- catch_log2_labelled# Number of Unique Speciesn_species <-length(unique(catch_size_bins$comname))```This is the target point for the size spectrum results. Any of the plots should match these results as a consistency check:```{r}#| label: load the size spectrum indiceswithr::with_dir(rprojroot::find_root('_targets.R'), tar_load(size_spectrum_indices)) # Grab SS Groups we care aboutregion_indices <- size_spectrum_indices %>%filter(`group ID`=="single years * region") %>%mutate(yr =as.numeric(as.character(Year)),survey_area =factor(survey_area, levels =c("GoM", "GB", "SNE", "MAB")),sig_fit =ifelse(log2_sig_strat <0.05, "Significant", "Non-Significant"))```### Preparing Data For Figures:This is the code for the aggregation function, it totals the abundance within each size bin based on some grouping variable. Usually its just year and area, but we also want to preserve functional groups here:### Running the Aggregation```{r}#| label: aggregate-bin-totals# Run it for the hare groupshare_group_aggregates <-aggregate_log2_bins(log2_assigned = catch_size_bins, min_log2_bin =0,max_log2_bin =13, bin_increment =1, Year, survey_area, hare_group)# Run just the years so we can assure that the totals are the same:year_group_aggregates <-aggregate_log2_bins(log2_assigned = catch_size_bins, min_log2_bin =0,max_log2_bin =13, bin_increment =1, Year, survey_area)# Do they Add up?# originalgom_90 <-filter(year_group_aggregates, Year ==1990, survey_area =="GoM")# with functional groupsgom_90_check <- hare_group_aggregates %>%filter( Year ==1990, survey_area =="GoM") %>%group_by(Year, survey_area, log2_bins, left_lim, right_lim, bin_label, bin_width, bin_midpoint) %>%#glimpse()summarise(observed_abundance =sum(observed_abundance),observed_weight_g =sum(observed_weight_g),stratified_abundance =sum(stratified_abundance),stratified_weight_g =sum(stratified_weight_g),normalized_abund =sum(normalized_abund),norm_strat_abund =sum(norm_strat_abund),denorm_abund =sum(denorm_abund),denorm_strat_abund =sum(denorm_strat_abund),.groups ="drop" )# Look the same to me!, wait maybe not...# glimpse(select(gom_90, Year, survey_area, left_lim, observed_abundance, stratified_abundance))# glimpse(select(gom_90_check, Year, survey_area, left_lim, observed_abundance, stratified_abundance))```### Comparing Functional Group Compositions:Two things here. 1) Need to verify that the aggregations for the annual summary and the functional groups match up for direct comparison. 2) I'm expecting to see in more recent years how dominant the elasmobranchs have become in the community.This plotting function will plot this size spectrum aggregates for both the functional group and overall annual aggregates. The slope and intercept will be plotted on the annual aggregate panels.```{r}#| label: Spectr-Comparison-Plot# Ideally:# 4 row figure, row for each region# Show what the side bins looked like, # with their compositions from each functional group#' @title Compare Proportions of Size Bins to Spectra Slopes#'#' @param functional_group_aggregates Aggregated Numbers of Stratified Abundance, result of agg_spectra_bins, using Year and spec_class#' @param annual_aggregates Same as functional_group_aggregates, but only on Year#' @param filter_years A year range to filter by#' @param fill_group factor group to do composition fill with#' @param x_icrement the spacing between log increments#'#' @return#' @export#'#' @examplescompare_spectra_components <-function( functional_group_aggregates, annual_aggregates, filter_years =2012, fill_group,x_increment =1){# Filter the year functional_group_aggregates <-filter(functional_group_aggregates, Year %in% filter_years)# Stacked Bar Plot for the Functional Groups fgroup_figure <- functional_group_aggregates %>%group_by(survey_area, {{fill_group}}, left_lim, bin_label) %>%summarise(norm_strat_abund =sum(norm_strat_abund),.groups ="drop") %>%ggplot(aes(left_lim, norm_strat_abund)) +geom_col(aes(fill = {{fill_group}}), position ="fill", color ="gray30") +scale_fill_gmri() +scale_x_continuous(breaks =seq(0, 15, by = x_increment), labels =math_format(2^.x)) +# Use this with position Fillscale_y_continuous(labels =percent_format()) +# Does not work with position stack#scale_y_log10(labels = trans_format("log10", math_format(10^.x))) +facet_wrap(~survey_area, ncol =1, scales ="free") +labs(x ="Individual Bodymass (g)", y ="Fraction of Abundance (Area-Stratified)",fill ="Functional Group:",subtitle =str_c("Functional Group Composition: ", filter_years[[1]], "-", filter_years[[length(filter_years)]])) +guides(fill =guide_legend(nrow =2))# Get the total abundance in each bin bin_data_summary <- annual_aggregates %>%filter(Year %in% filter_years#,#left_lim <= 4.5 ) %>%complete(survey_area, bin_midpoint, fill =list("norm_strat_abund"=0)) %>%group_by(survey_area, bin_midpoint) %>%summarise(norm_strat_abund =sum(norm_strat_abund), .groups ="drop")# Make Plot# Solid Bar Graph for Annual Totals annual_figure <-ggplot( bin_data_summary, # aes(left_lim, norm_strat_abund)) +aes(bin_midpoint, norm_strat_abund)) +geom_col(fill =gmri_cols("gmri blue"), alpha =0.7) +geom_point(color =gmri_cols("gmri blue")) +geom_smooth(formula = y ~ x,method ="lm",color =gmri_cols("orange")) +stat_poly_eq(formula = y ~ x,aes(label =paste(after_stat(eq.label), after_stat(rr.label), sep ="~~~")), label.y =1, label.x =0.9,parse =TRUE) +#scale_x_continuous(breaks = seq(0, 15, by = x_increment), labels = math_format(2^.x)) +scale_x_log10(labels =trans_format("log10", math_format(10^.x))) +scale_y_log10(labels =trans_format("log10", math_format(10^.x))) +facet_wrap(~survey_area, ncol =1, scales ="free") +labs(x ="Individual Bodymass (g)", y ="Stratified Abundance (Normalized)",fill ="Functional Group:",subtitle =str_c("Aggregate Community Size Spectra:"))# Collect the plots together plot_out <- (fgroup_figure | annual_figure) +plot_layout(guides ="collect") &theme(legend.position ="bottom")return(plot_out)}```::: panel-tabset### 1970-2005 ```{r}#| label: early-regime-hare#| fig.height: 8#| fig.width: 8# What does 1970-2005 look liket <-compare_spectra_components(functional_group_aggregates = hare_group_aggregates, annual_aggregates = year_group_aggregates, filter_years =c(1970:2005), fill_group = hare_group, x_increment =1)t```### 1996-2019 ```{r}#| label: late-regime-hare2#| fig.height: 8#| fig-width: 10# What does 1996-2019 look like(t <-compare_spectra_components(functional_group_aggregates = hare_group_aggregates, annual_aggregates = year_group_aggregates, filter_years =c(2006:2019), fill_group = hare_group, x_increment =1))```:::## Specific Years Where Max Size ChangesThis is what a single year looks like in the most recent regime, do we see any absences in lower size groups at a yearly step?```{r}#| label: log2-binning-swap# Load the data before max size cutoffwithr::with_dir(rprojroot::find_root('_targets.R'), tar_load(catch_complete))# Run the pipeline preparations without any min/max filters:#### Log2 Binning ##### 1. Set 0.5 increments to check presence absence with more detailall_sizes_l2 <- catch_complete %>%prep_sizeSpectra_data(lw_trawl_data = .) %>%assign_log2_bins(wmin_grams = ., log2_increment =1)# Aggregate for all sizes on yr, area, functional groupall_sizes_hare <-aggregate_log2_bins(log2_assigned = all_sizes_l2, min_log2_bin =0,max_log2_bin =13, bin_increment =1, Year, survey_area, hare_group)# Aggregate bodymass and abundance in the bins for all data, each year and regionall_sizes_yr_area <-aggregate_log2_bins(log2_assigned = all_sizes_l2, min_log2_bin =0,max_log2_bin =13, bin_increment =1, Year, survey_area)``````{r}#| label: specific-year-impact#| fig.height: 8#| fig.width: 10# What does 2015 look like(t <-compare_spectra_components(functional_group_aggregates = all_sizes_hare, annual_aggregates = all_sizes_yr_area,filter_years =c(2015), fill_group = hare_group, x_increment =1))```# When/Where are Size Bins MissingThis section is for documenting where and how often size bins do not have abundances present for analysis. These are the sistuations where I am testing the impacts surrounding how I treat situations where:> If 10^4 is routinely absent in some areas, should it be kept? What if it is the result of fishing, shouldn't the 0 biomass be meaningful?```{r}#| label: size-bin-presence-absence#| fig.height: 8# What would all size aggregates be if they werent filtered:# i.e. all possible points on a spectrumall_sizes_check <-aggregate_log2_bins(log2_assigned = all_sizes_l2, min_log2_bin =0,max_log2_bin =14, bin_increment =1, Year, survey_area)# Flag for each group when it is present, there was no catch, or if there aree still NA's agg_flags <- all_sizes_check %>%mutate(observed_abundance =ifelse(observed_abundance ==1, "Absent", "Present")) # Plot themagg_flags %>%ggplot(aes(Year, left_lim, fill = observed_abundance)) +geom_tile(color ="gray15") +scale_fill_gmri(reverse = T) +scale_y_continuous(breaks =seq(0,15, by =1),#labels = half_formatlabels =math_format(2^.x)) +facet_wrap(~survey_area, ncol =1) +labs(y ="Body-Mass Size Bin Left Limit (g)")```# Behavior for an Empty Bin?Whenever a bin is empty, there is a decision to make about how to handle that gap. An `NA` value will ignore that bin exists, and ignore that there is an absence in potential biomass, biasing the slope to be more positive. Substituting small values near 0 to preserve the bin has the opposite impact, biasing a more negative slope depending on how small the substitution.```{r}# Grab data for 1998test_1998 <- all_sizes_l2 %>%filter(Year ==1998)# Aggregate bodymass and abundance in the bins for all data, each year and regionagg_log2_1998 <-aggregate_log2_bins(log2_assigned = test_1998, min_log2_bin =0,max_log2_bin =13, bin_increment =1, Year, survey_area) %>%mutate(observed_abundance =ifelse(observed_abundance ==1, "Absent", "Present")) ```::: panel-tabset### Replace Gaps with N = 1```{r}#| fig.height: 6# Plot the binned spectraagg_log2_1998 %>%ggplot(aes(left_lim, norm_strat_abund)) +geom_col(fill =gmri_cols("gmri blue"), alpha =0.5) +geom_point(color =gmri_cols("gmri blue")) +geom_smooth(formula = y ~ x,method ="lm",color =gmri_cols("orange"), se = F) +geom_hline(yintercept =10^0, linewidth =1, color ="black") +stat_poly_eq(formula = y ~ x,aes(label =paste(after_stat(eq.label), after_stat(rr.label), sep ="~~~")), label.y =1, label.x =0.9,parse =TRUE) +scale_x_continuous(breaks =seq(0, 10, by =1), labels =math_format(10^.x)) +scale_y_log10(labels =trans_format("log10", math_format(10^.x))) +facet_wrap(~survey_area, ncol =1) +labs(x ="Individual Bodymass (g)", y ="Stratified Abundance (Normalized)",fill ="Functional Group:",title =str_c("Substitute NA with Abundance = 1"))```### Ignore Gaps and Lose Bins```{r}#| fig.height: 6# Plot the binned spectraagg_log2_1998 %>%mutate(norm_strat_abund =ifelse(norm_strat_abund <1, NA, norm_strat_abund)) %>%ggplot(aes(left_lim, norm_strat_abund)) +geom_col(fill =gmri_cols("gmri blue"), alpha =0.5) +geom_point(color =gmri_cols("gmri blue")) +geom_smooth(formula = y ~ x,method ="lm",color =gmri_cols("orange"), se = F) +geom_hline(yintercept =10^0, linewidth =1, color ="black") +stat_poly_eq(formula = y ~ x,aes(label =paste(after_stat(eq.label), after_stat(rr.label), sep ="~~~")), label.y =1, label.x =0.9,parse =TRUE) +scale_x_continuous(breaks =seq(0, 10, by =1), labels =math_format(10^.x)) +scale_y_log10(labels =trans_format("log10", math_format(10^.x))) +facet_wrap(~survey_area, ncol =1) +labs(x ="Individual Bodymass (g)", y ="Stratified Abundance (Normalized)",fill ="Functional Group:",title =str_c("Ignore Empty Bin"))```### ISD AnalogsTo pick a single year as an example, this is what Georges Bank's ISD exponent fit changes with a change in maximum size parameter from 10^4 to 10^5.```{r}#| fig.height: 8# Grab data for 1998#ex 98gb_98 <- all_sizes_l2 %>%filter(Year ==1998, survey_area =="GB")# prep the catch data for isd plottinggb_98_isd <-isd_plot_prep(biomass_data = gb_98, stratified_abundance = T, min_weight_g =1)# What is the fit we have in the pipeline:gb_res_98 <- region_indices %>%filter(Year ==1984, survey_area =="GB")# Plot the current setupmaxcut1 <-ggplot_isd(isd_data_prepped = gb_98_isd, mle_results = gb_res_98, abundance_vals ="stratified",plot_rects = F,show_pl_fit =TRUE,xlim_global =NULL,group_name =NULL)[[2]] +labs(title ="Georges Bank 1998, Max ISD Size 10^4g")#--------# plotting a different xmax# What does it look like if we fit it with different xmaxgb_res_98_2 <-group_isd_calc(dataBinForLike = gb_98, group_var ="GB 1998", abundance_vals ="stratified",isd_xmin =1,isd_xmax =10^5,vecDiff =0.5)maxcut2 <-ggplot_isd(isd_data_prepped = gb_98_isd, mle_results = gb_res_98_2, abundance_vals ="stratified",plot_rects = F,show_pl_fit =TRUE,xlim_global =NULL,group_name =NULL)[[2]] +labs(title ="Georges Bank 1998, Max ISD Size 10^5g")# Compare the twomaxcut1 / maxcut2```:::# What % of Biomass Falls within Min/Max LimitsBoth the binning and ISD methodology requires some decision making around the minimum and maximum body sizes being investigated. This is either done explicitly with the min/max parameters of the ISD bounded power law, or implicitly with the smallest and largest size bins included in the slope estimates.Under the current analysis these upper and lower limits are:```{r}# Load the anlaysis optionswithr::with_dir(rprojroot::find_root('_targets.R'), tar_load(analysis_options))# Get the min and max sizes allowed max_size <- analysis_options$max_input_weight_g min_size <- analysis_options$min_input_weight_g```**Current Analysis Configurations:** * lower limit: `r min_size`gram * upper limit: `r max_size`grams ```{r}# What fraction of biomass is within min/maxtotal_strat_bio <-sum(catch_complete$sum_weight_kg)# 1. percent below upperbelow_max <- catch_complete %>%filter(ind_weight_kg < (max_size/1000)) %>%pull(sum_weight_kg) %>%sum()# make a percentpercent_below <-round((below_max/total_strat_bio) *100, 2)# 2. percent above lowerabove_min <- catch_complete %>%filter(ind_weight_kg > (min_size/1000)) %>%pull(sum_weight_kg) %>%sum()# make a percentpercent_above <-round((above_min/total_strat_bio) *100, 3)# 3. Percent within thw twowithin_bounds <- below_max <- catch_complete %>%filter( ind_weight_kg > (min_size/1000) & ind_weight_kg < (max_size/1000)) %>%pull(sum_weight_kg) %>%sum()percent_within <-round((within_bounds/total_strat_bio) *100, 2)```**Using the current analysis controls: *** `r percent_below`% of the stratified biomass falls below the maximum size limit of `r max_size`g. * `r percent_above`% of the stratified biomass falls above the minimum size limit of `r min_size`g. * And `r percent_within`% of the stratified total biomass falls within that min-max size range.