Detailing the Maine Coastal Current Information with FVCOM

Published

March 4, 2025

Code
{
  library(raster)
  library(sf) 
  library(fvcom) 
  library(ncdf4) 
  library(tidyverse)
  library(gmRi)
  library(patchwork)
  library(rnaturalearth)
  library(showtext)
  # Cyclic color palettes in scico
  # From: https://www.fabiocrameri.ch/colourmaps/
  library(scico)
  library(legendry)
}

# namespace conflicts
conflicted::conflict_prefer("select", "dplyr")
conflicted::conflict_prefer("filter", "dplyr")


# Set the theme
theme_set(theme_bw() + map_theme())

# Project paths
lob_ecol_path <- cs_path("mills", "Projects/Lobster ECOL")
fvcom_path    <- cs_path("res", "FVCOM/Lobster-ECOL")
poly_paths    <- cs_path("mills", "Projects/Lobster ECOL/Spatial_Defs")

# Shapefiles
new_england <- ne_states("united states of america", returnclass = "sf") %>% 
  filter(postal %in% c("VT", "ME", "RI", "MA", "CT", "NH", "NY", "MD", "VA", "NJ", "DE", "NC", "PA", "WV"))
canada <- ne_states("canada", returnclass = "sf")

# Support functions for FVCOM
source(here::here("R/FVCOM_Support.R"))
Code
# Use GMRI style
use_gmri_style_rmd()

Daily Maine Coastal Current Information from FVCOM

The Gulf of Maine Coastal Current (GMCC) can be divided into two principal branches, the Eastern Maine Coastal Current (EMCC) and the Western Maine Coastal Current (WMCC), separated by a junction near Penobscot Bay.

This junction is important because it is the site where a variable portion of the current flow, rather than continuing southwest, veers offshore.

This shift in flow has important implications for nutrient and plankton distributions and for larval transport, which is why this section of the current is used as an indicator of general circulation patterns within the Gulf of Maine.

Code
# Loading FVCOM

# Get some file names
fvcom_surfbot_files <- setNames(
  list.files(fvcom_path, full.names = T, pattern = ".nc"),
  str_remove(list.files(fvcom_path, full.names = F, pattern = ".nc"), ".nc"))


# Open one
gom3_early <- nc_open(fvcom_surfbot_files[[1]])
# So the data is daily...
# ncvar_get(gom3_early, varid = "surface_v", start = c(1,1), count = c(1,-1))


# Get the mesh itself as a simple feature collection
gom3_mesh <- get_mesh_geometry(gom3_early, what = 'lonlat') 

Maine Coastal Current Region

The primary area of concern for capturing the continuous flow of the Maine Coastal Current is the area off the coast of Penobscot Bay.

The following polygon was created as a boundary for subsetting the relevant data for calculating MCC indices.

We may want to extend our area east (Li et al. 2022): > The EMCC generally bifurcates southeast of Mount Desert Island (D. Brooks & Townsend, 1989; D. Li et al., 2021; Luerssen et al., 2005; Pettigrew et al., 1998, 2005).

Code
# Make a polygon for the area of interest:

#Original Area
mcc_turnoff_coords <- tribble(
  ~"lon", ~"lat",
  -69.34, 43.21,   # Bottom Left
  -69.78, 43.62,   # Off Popham
  -67.45, 44.34,   # Off Jonesport
  -67.4, 43.8,     # Bottom right
  -69.34, 43.21,   # Bottom left again
)

# # Expanded Area?
# mcc_turnoff_coords <- tribble(
#   ~"lon", ~"lat",
#   -69.75, 43,    # Bottom Left
#   -70, 43.5,       # Top Left
#   -67.25, 44.45,   # Off Jonesport
#   -66.9, 43.95,     # Bottom right
#   -69.75, 43,    # Bottom left again
# )


# Make it a polygon
mcc_turnoff_poly <- st_polygon(list(cbind(mcc_turnoff_coords$lon, mcc_turnoff_coords$lat))) %>% 
  st_sfc(crs = 4326) %>% 
  st_as_sf() %>% 
  mutate(area = "Maine Coastal Current Region") 
st_geometry(mcc_turnoff_poly) <- "geometry"

Using that boundary, we can clip the FVCOM mesh and obtain the indices to use when pulling data from the many NetCDF files.

Code
# Project polygon
poly_projected <- st_transform(mcc_turnoff_poly, st_crs(gom3_mesh)) 

# Turn off s2
sf::sf_use_s2(FALSE)

# Trim it
mcc_studyarea_mesh <- mesh_trim(
  mesh = gom3_mesh, 
  domain = st_as_sf(poly_projected) )

# Map that
ggplot() +
  geom_sf(data = new_england) +
  geom_sf(data = canada) +
  geom_sf(data = mcc_studyarea_mesh, color = "black", alpha = 0.6) +
  geom_sf(data = mcc_turnoff_poly, fill = "orange", alpha = 0.4) +
  theme_classic() +
  map_theme() +
  coord_sf(
    xlim = c(-70, -67.3), 
    ylim = c(43.2, 44.7), 
    expand = F) +
  labs(title = "MCC Turnoff Study Area - FVCOM Mesh Intersection")

Grab the Surface Current Information

The clipped mesh contains node and element index numbers. We will use these to pull variables for each of these locations, which will then feed a principal component analysis whose goal is to decompose the correlation structure across all of these locations.

Pull Daily Surface Current Values

The data obtained directly from the Chen lab contains surface current vector information and does not contain siglev coordinates.

The code to pull these variables from these files is slightly different for this reason (mainly one fewer dimension to supply "start" and "count" arguments for within ncdf4::ncvar_get()).
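For reference, here is a minimal sketch of that difference. The surface-only files in this project expose 2-D [element, time] variables such as surface_u; the variable name u and the [element, siglay, time] dimension order shown for full water-column files are assumptions for illustration, not something checked against those files.

Code
# Hypothetical helper contrasting the extra sigma-layer dimension in full files
library(ncdf4)

get_element_u <- function(nc, elem_idx, full_3d = FALSE) {
  if (full_3d) {
    # Assumed full-file layout: [element, siglay, time] -> take the surface sigma layer
    ncvar_get(nc, varid = "u", start = c(elem_idx, 1, 1), count = c(1, 1, -1))
  } else {
    # Surface-only files used here: [element, time]
    ncvar_get(nc, varid = "surface_u", start = c(elem_idx, 1), count = c(1, -1))
  }
}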

Daily current information gives us the option of estimating a more refined/targeted seasonal index and would give us higher resolution throughout the year.

Literature suggests connectivity is highest in the winter (concurrent with offshore EMCC veering) and again in spring/summer (time of strongest EMCC flow).

During late fall and early winter, the inshore flow reverses.

> Both the circulation and particle tracking models suggested that the connectivity generally peaks twice annually, highest in winter and then secondarily in late spring or early summer. The former is concurrent with the most southwest offshore veering of the EMCC, while the latter is concurrent with the strongest EMCC. Moreover, the counter-WMCC can reduce the connectivity and result in year-to-year variations. (Li et al. 2022)

Code
#' @title Extract Netcdf Data with Index List
#' 
#' @description Subsets data from the results of ncvar_get(). Accepts single or
#' multiple indexes for one or more dimensions using a list. List order should
#' match the relevant dimensions for the variable.
#' 
#' If no indexing is supplied for a relevant dimension, all elements are returned.
#'
#' @param nc_array Array returned by ncdf4::ncvar_get()
#' @param index_list List of indices to keep, one entry per dimension (in dimension order); use NULL to keep all elements of a dimension
#'
#' @returns The subset of the array, with length-one dimensions dropped
#' @export
#'
#' @examples
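#' # Minimal toy illustration (hypothetical 4 x 6 array, not a real NetCDF variable):
#' arr <- array(1:24, dim = c(4, 6))
#' subset_nc_var(arr, list(2))        # keep row 2, all columns
#' subset_nc_var(arr, list(NULL, 3))  # keep all rows, column 3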
subset_nc_var <- function(nc_array, index_list) {
  
  # Get the number of dimensions
  dims <- length(dim(nc_array))
  
  # Create a list of indices, defaulting to ":" (keeping all elements)
  full_index_list <- rep(list(quote(expr = )), dims)  
  
  # Update with user-specified indices
  for (dim_idx in seq_along(index_list)) {
    if (!is.null(index_list[[dim_idx]])) {
      full_index_list[[dim_idx]] <- index_list[[dim_idx]]
    }
  }
  
  # Use do.call to apply indexing dynamically
  extracted_data <- do.call(`[`, c(list(nc_array), full_index_list, list(drop = TRUE)))
  
  return(extracted_data)
}
Code
# Function to grab each of them for a netcdf connection:
# Loop through daily surfbot files and pull values for the proper nodes
get_elem_timeseries <- function(fpath, elem_list){
  
  # Open (lazy-load) the netcdf connection
  fvcom_x <- nc_open(fpath)

  # Time dimension info
  time_dim <- ncvar_get(fvcom_x, "Times")
  
  # Get U
  u_array <- ncvar_get(
    nc = fvcom_x, 
    varid = "surface_u", 
    start = c(1, 1),
    count = c(-1, -1))
  
  # Get V
  v_array <- ncvar_get(
    nc = fvcom_x, 
    varid = "surface_v", 
    start = c(1, 1),
    count = c(-1, -1))
  
  # Get Lon/Lat
  lon_array <- ncvar_get(fvcom_x, varid = "lonc", start = c(1), count = c(-1))
  lat_array <- ncvar_get(fvcom_x, varid = "latc", start = c(1), count = c(-1))
  
  # Close connection to the netcdf:
  nc_close(fvcom_x)
  
  # Start index (just grab all the way through)
  daily_df <- map_dfr(elem_list, function(elem_x){
    elem_lon <- lon_array[[elem_x]]
    elem_lat <- lat_array[[elem_x]]
    data.frame(
      "time" = time_dim,
      "lonc" = elem_lon,
      "latc" = elem_lat,
      "u" = subset_nc_var(u_array, list(elem_x)) ,
      "v" = subset_nc_var(v_array, list(elem_x))
      )
    
  }, .id = "elem")

 
  
  # Return the table
  message(str_c("Completed: ", fpath))
  return(daily_df)
}
Code
# Get the elem numbers we care about for ncvar_get()
mcc_elems <- unique(mcc_studyarea_mesh$elem) %>%
  setNames(unique(mcc_studyarea_mesh$elem))



# # Test one
#t1 <-  get_elem_timeseries(fpath = fvcom_surfbot_files[[1]], elem_list = mcc_elems[1])

# Run them all
daily_mcc_surface_currents <- map_dfr(
  .x = fvcom_surfbot_files,
  .f = ~get_elem_timeseries(fpath = .x, elem_list = mcc_elems))


# Save them
write_csv(
  daily_mcc_surface_currents,
  here::here("data/daily_mcc_surface_currents_vectors.csv"))

Visual Inspection of Current Vectors

If we load in the table(s) of the element values processed above, we can start to look at what the current vectors look like on a map.

Current vectors can be visualized easily using ggplot with metR::geom_vector().
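As a minimal sketch of that approach (assuming the daily table daily_mcc_sc loaded in the next chunk; the geom_segment() version used for the maps below is equivalent):

Code
# Average the daily vectors per element centroid and draw one arrow each
library(metR)
library(tidyverse)

daily_mcc_sc %>%
  group_by(lonc, latc) %>%
  summarise(u = mean(u, na.rm = TRUE), v = mean(v, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(lonc, latc)) +
  geom_vector(aes(dx = u, dy = v)) +
  scale_mag() +
  labs(x = "Longitude", y = "Latitude", title = "Mean Surface Current Vectors")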

Confirm Daily Records

The following figure shows the daily Eastward and Northward velocities at centroid #48733.

Code
# Load daily:
daily_mcc_sc <- read_csv(here::here("data/daily_mcc_surface_currents_vectors.csv"))  %>% 
  mutate(
    time = as.Date(time),
    label = time,
    year = str_sub(label, 1,4),
    month = as.numeric(str_sub(label, 6, 7)))


# Plot the velocities for an element as a check
daily_mcc_sc %>% 
  filter(elem == 48733) %>% 
  filter(between(time, as.Date("1978-01-01"), as.Date("2019-12-31"))) %>% 
  ggplot() +
  geom_line(aes(time, u, color = "Eastward Velocity (u)"), alpha = 0.3) +
  geom_line(aes(time, v, color = "Northward Velocity (v)"), alpha = 0.3) +
  labs(
    title = "Confirming Daily Records",
    subtitle = "centroid index: 48733"
  )

Mapping Long-term Averages

Before proceeding too far, here is the average flow direction across the domain of interest:

Code
# Quick verification that directions make sense
average_directions <-  daily_mcc_sc %>% 
  mutate(period = lubridate::month(time, label = TRUE)) %>% 
  bind_rows(mutate(daily_mcc_sc, period = "Year Round")) %>% 
  mutate(period = factor(period, levels = c(month.abb, "Year Round"))) %>% 
  group_by(period, lonc, latc) %>% 
  summarise(
    across(.cols = c(u,v), .fns = ~mean(.x, na.rm = T))) %>% 
  mutate(
    # Radian to degree conversion
    angle = atan2(v, u) * 180 / pi,   # Convert from radians to degrees
    angle = if_else(angle<0, angle+360, angle),
    speed = sqrt(u^2 + v^2),          # Compute speed (magnitude)
    dx = u / speed * 0.05,            # Scale x-component for visualization
    dy = v / speed * 0.05            # Scale y-component for visualization
  )


# Map them
ggplot(average_directions) +
  geom_sf(data = new_england) +
  geom_segment(
    aes(lonc, latc, xend = lonc + dx, yend = latc + dy, color = angle),
    arrow = arrow(length = unit(0.03, "cm")),
    linewidth = 0.25) +
  coord_sf(
    xlim = c(-70, -67.3), 
    ylim = c(43.2, 44.7), 
    expand = F) +
  facet_wrap(~period, ncol = 3) +
  scale_color_scico(
    palette = 'romaO',
    breaks = c(45, 90, 135, 180, 225, 270, 315, 360),
    labels = c("NE", "N", "NW", "W", "SW", "S", "SE", "E"),
    limits = c(0,360)) +
  guides(color = guide_colring(
    reverse = T, 
    start = pi/2)) +
  theme(legend.position = "right") +
  labs(title = "Average Surface Current Direction", 
       x = "Longitude", y = "Latitude")

MCC Index from Principal Component Analysis

latc/lonc coordinate pairs for the centers of each triangle provide a unique ID for each element.

When the data are pivoted wide, we can create a matrix with one row per timestep (monthly, daily, or whatever step we choose) and one column for each variable-element combination (eastward velocity, northward velocity, etc.). From this matrix we can perform a Principal Component Analysis (PCA), returning two or more principal components that each explain a share of the variance in the matrix.
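As a toy illustration of that wide layout (hypothetical values for two elements over four days; the real matrix is built the same way in the PCA chunk below):

Code
library(tidyverse)

# Hypothetical long-format daily values for two mesh elements
toy_long <- tibble(
  time = rep(as.Date("2010-01-01") + 0:3, each = 2),
  elem = rep(c("101", "102"), times = 4),
  u    = rnorm(8),
  v    = rnorm(8))

# One row per timestep, one column per variable-element pair: u_101, u_102, v_101, v_102
toy_wide <- toy_long %>%
  pivot_wider(names_from = elem, values_from = c(u, v)) %>%
  column_to_rownames("time")

# PCA on the toy matrix (centered and scaled, as in the analysis below)
prcomp(toy_wide, center = TRUE, scale. = TRUE)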

Question: What variables to include/exclude for PCA

We can perform the PCA using only the current variables, essentially just the water flow characteristics, or we could include temperature and salinity and focus more on the water mass characteristics.

Code used for past indices looked at the vertically averaged Eastward current vector in isolation.

The primary variables of interest for this step are:

  • u: the daily eastward water velocity, in m s-1
  • v: the daily northward water velocity, in m s-1

Temperature and salinity may also be used to delineate the different water masses.

Method/Approach Questions:

Take a step back and understand the flow before trying to relate the PCA back to it.

Our aim is to generate a timeseries that is either a direct measure of, or a good proxy for, MCC flow continuity. We care about this because flow characteristics determine whether lobster larvae are transported along the coast or advected offshore.

From chatting with Damien, there may not be one best way to extract the meaningful information from the different FVCOM variables available.

It may make sense to tailor the data we use and the index we generate to the lobster question:

  • This could mean limiting the time span to the time of year when lobster larvae are dispersing (see the sketch below).
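A minimal sketch of that kind of seasonal subset (the May-September window is an assumption for illustration, not a documented dispersal period):

Code
library(tidyverse)

# Restrict the daily currents to an assumed larval-dispersal window (May-September)
# before building the element-wide PCA matrix
larval_season_mat <- daily_mcc_sc %>%
  filter(lubridate::month(time) %in% 5:9) %>%
  select(time, elem, u, v) %>%
  pivot_wider(names_from = elem, values_from = c(u, v)) %>%
  column_to_rownames("time")

# PCA on the seasonal subset only
larval_season_pca <- prcomp(larval_season_mat, center = TRUE, scale. = TRUE)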

Simple Test: PCA Approach

The following approach uses the northward (v) and eastward (u) water velocities of the surface layer for the principal component analysis.

Code
# For Area weighted PCA, get mesh element areas


# Get the surface area of all the elements
elem_areas_df <- map_dfr(
  split(mcc_studyarea_mesh, mcc_studyarea_mesh$elem),
   function(x){
     data.frame("elem" = x$elem, "area" = st_area(x))
})


# A. Surface current variables (u,v)
pca_mat <- daily_mcc_sc %>% 
  # Join surface areas and weight u + v
  # left_join(elem_areas_df) %>% 
  # mutate(across(c(u,v), ~.x*sqrt(area))) %>%  # Weight the current values by the areas
  select(time, elem, u, v) %>% 
  pivot_wider(
    names_from = elem, 
    values_from = c(u, v)) %>% 
  column_to_rownames("time")

# # Inspect
# pca_mat[1:6,1:6]


# Do PCA
# If all columns have comparable variances and you want to retain absolute differences in current strength, center but do not scale.
mcc_pca <- prcomp(pca_mat, scale. = TRUE, center = TRUE)

PC Loadings

The following maps show how the first two principal component rotations/loadings (equivalent when the matrix is scaled) relate to locations across the study area.

Code
# Summary - Proportion of variance
mcc_pca_summ <- summary(mcc_pca)
mcc_pca_summ$importance[1:2, 1:2]
                            PC1      PC2
Standard deviation     21.45602 18.97389
Proportion of Variance  0.39482  0.30876
Code
# Pull + Reshape the Rotations / Loadings
mcc_loadings <- data.frame(
  "PC1" = mcc_pca$rotation[,"PC1"],
  "PC2" = mcc_pca$rotation[,"PC2"]) %>% 
  rownames_to_column("loc") %>% 
  separate(col = "loc", into = c("var", "elem"), sep = "_") %>% 
  pivot_longer(
    cols = starts_with("PC"), 
    names_to = "PC", 
    values_to = "PC_rotation") %>% 
  mutate(longname = if_else(var == "u", "Eastward Velocity (u)", "Northward Velocity (v)"))



# Map The Loadings
mcc_studyarea_mesh %>% 
  mutate(elem = as.character(elem)) %>% 
  left_join(mcc_loadings) %>% 
  st_as_sf() %>% 
  ggplot() +
  geom_sf(data = new_england) +
  geom_sf(aes(fill = PC_rotation)) +
  facet_grid(PC~longname) +
  coord_sf(
    xlim = c(-70, -67.3), 
    ylim = c(43.2, 44.7), 
    expand = F) +
  scale_fill_distiller(palette = "RdBu", limits = c(-0.06, 0.06)) +
  map_theme(
    legend.title.position = "top",
    legend.title = element_text(hjust = 0.5),
    legend.position = "bottom")

PC Timeseries

Code
# Pull Principal Components overall values
mcc_pcomponents <- data.frame(
  "date" = rownames(pca_mat),
  "PC1" = mcc_pca$x[,"PC1"],
  "PC2" = mcc_pca$x[,"PC2"])  %>% 
  mutate(
    date = as.Date(date), 
    mon = lubridate::month(date), 
    yr = lubridate::year(date))

# Make a monthly summary, with a 12-month rolling average
mcc_pc_monthly <- mcc_pcomponents %>% 
  pivot_longer(
    starts_with("PC"), 
    names_to = "Principal Component", 
    values_to = "vals") %>% 
  group_by(yr, mon, `Principal Component`) %>% 
  summarise(vals = mean(vals),
            .groups = "drop") %>% 
  mutate(date = as.Date(
    str_c(
      yr,
      str_pad(mon, side = "left", width = 2, pad = "0"),
      "15",
      sep = "-"))) %>% 
  group_by(`Principal Component`) %>% 
  arrange(date) %>% 
  mutate(
    vals_rollmean = zoo::rollmean(vals, k = 12, fill = NA, align = "center"))



# Plot daily with the monthly smooth
mcc_pc_monthly %>% 
  ggplot() +
  geom_line(
    aes(date, vals, color = `Principal Component`),
    linewidth = 0.4, alpha = 0.4) +
  geom_line(
    aes(date, vals_rollmean, color = `Principal Component`),
    linetype = 1) +
  rcartocolor::scale_color_carto_d() + 
  labs(
    title = "MCC Surface Current PCA Timeseries",
    y = "Principal Component Value",
    x = "Date",
    color = "12-month average")

Validation Exercise: Pettigrew 2005 Years

I need to figure out how to relate the principal components to the open/closed gate language of Pettigrew et al. 2005.

This study used 1998-2001 as the study years, and characterized strong offshore veering in 1998 and strong continuous flow in 2000.

These years may be useful for characterizing how the principal components relate to current behavior. Here is what the current directions looked like during those years:

Code
# Quick verification that directions make sense
pettigrew_directions <-  daily_mcc_sc %>% 
  filter(between(time, as.Date("1998-01-01"), as.Date("2001-12-31"))) %>% 
  mutate(year = lubridate::year(time)) %>% 
  filter(year != 1999) %>% 
  mutate(
    gate_position = case_match(
      year, 
      1998~"Gate Closed", 
      2000~"Gate Open", 
      2001~"Mixed")) %>% 
  group_by(gate_position, year, lonc, latc) %>% 
  summarise(
    across(.cols = c(u,v), .fns = ~mean(.x, na.rm = T)),
    .groups = "drop") %>% 
  mutate(
    # Radian to degree conversion
    angle = atan2(v, u) * 180 / pi,   # Convert from radians to degrees
    angle = if_else(angle<0, angle+360, angle),
    speed = sqrt(u^2 + v^2),          # Compute speed (magnitude)
    dx = u / speed * 0.05,            # Scale x-component for visualization
    dy = v / speed * 0.05)            # Scale y-component for visualization



# Map them
ggplot(pettigrew_directions) +
  geom_sf(data = new_england) +
  geom_segment(
    aes(
      lonc, 
      latc, 
      xend = lonc + dx, 
      yend = latc + dy,
      color = angle),
    arrow = arrow(length = unit(0.05, "cm")),
    linewidth = 0.5) +
  theme_minimal() +
  theme(
    legend.position = "right") +
  coord_sf(
    xlim = c(-70, -67.3), 
    ylim = c(43.2, 44.7), 
    expand = F) +
  facet_wrap(~gate_position*year, ncol = 2) +
  scale_color_scico(
    palette = 'romaO',
    breaks = c(45, 90, 135, 180, 225, 270, 315, 360),
    labels = c("NE", "N", "NW", "W", "SW", "S", "SE", "E"),
    limits = c(0,360)) +
  guides(color = guide_colring(
    reverse = T, 
    start = pi/2)) +
  labs(
    title = "MCC Pettigrew 2025: Gate Open/Closed/Mixed Example Years",
    subtitle = "Average Surface Current Direction", 
    x = "Longitude", 
    y = "Latitude",
    color = "Current Direction")

Validation Exercise: PC Timeseries Peaks

Choose the 12-month periods where the principal components are at their peaks, then map the average current directions over those same periods.

Code
# Get the dates when the 12-month averages are at their extremes (highs and lows)
mcc_peaks <- mcc_pc_monthly %>% 
  select(-vals) %>% 
  filter(date > as.Date("1990-01-01")) %>% 
  pivot_wider(names_from = `Principal Component`, values_from = vals_rollmean) %>% 
  mutate(
    PC_state = case_when(
       PC1 == max(PC1, na.rm = T) ~ "PC1 High",
       PC2 == max(PC2, na.rm = T) ~ "PC2 High",
       PC1 == min(PC1, na.rm = T) ~ "PC1 low",
       PC2 == min(PC2, na.rm = T) ~ "PC2 low",
       TRUE ~ NA)) %>% 
  filter(is.na(PC_state) == FALSE)


#  Plot the timeseries with those breaks
mcc_pc_monthly %>% 
  ggplot(aes(date, vals, color = `Principal Component`)) +
    geom_line(
      aes(date, vals, color = `Principal Component`),
      linewidth = 0.4, alpha = 0.4) +
    geom_line(
      aes(date, vals_rollmean, color = `Principal Component`),
      linetype = 1) +
    geom_vline(
      data = mcc_peaks,
      aes(xintercept = date), color = "gray30", linetype = 2) +
    rcartocolor::scale_color_carto_d() + 
    labs(
      title = "MCC Surface Current PCA Timeseries",
      subtitle = "Peak States in 12-month average (dashed line)",
      y = "Principal Component Value",
      x = "Date")

Code
# Get the average currents for the 12 months around those breaks
peak_directions <- mcc_peaks %>% 
  split(.$date) %>% 
  map_dfr(function(peak_date){
    
    # Get the start-end date brackets for the year
    start_month <- peak_date$date - lubridate::dmonths(6)
    end_month <- peak_date$date + lubridate::dmonths(6)
    yr <- peak_date$yr
    mon <- str_pad(peak_date$mon, width = 2, side = "left", pad = "0")
    state <- peak_date$PC_state
    
    # Filter to that period, get average conditions
    daily_mcc_sc %>% 
      filter(between(time, start_month, end_month)) %>% 
      group_by(lonc, latc) %>% 
      summarise( 
        across(
          .cols = c(u,v), 
          .fns = ~mean(.x, na.rm = T)),
        .groups = "drop")  %>% 
    mutate(
      # Radian to degree conversion
      angle = atan2(v, u) * 180 / pi,   
      angle = if_else(angle<0, angle+360, angle),
      speed = sqrt(u^2 + v^2),          
      dx = u / speed * 0.05,            
      dy = v / speed * 0.05) %>% 
        mutate(
          start_month = start_month,  
          end_month = end_month,  
          PC_state = state,
          ymon = str_c(yr, mon, sep = "-"),
          .before = "lonc")
      
    # End the function
    })




# Map the current directions at those times
peak_direction_maps <- ggplot(peak_directions) +
  geom_sf(data = new_england) +
  geom_segment(
    aes(
      lonc, 
      latc, 
      xend = lonc + dx, 
      yend = latc + dy,
      color = angle),
    arrow = arrow(length = unit(0.05, "cm")),
    linewidth = 0.5) +
  theme_minimal() +
  theme(
    legend.position = "right") +
  coord_sf(
    xlim = c(-70, -67.3), 
    ylim = c(43.2, 44.7), 
    expand = F) +
  facet_wrap(~PC_state*ymon, ncol = 2) +
  scale_color_scico(
    palette = 'romaO',
    breaks = c(45, 90, 135, 180, 225, 270, 315, 360),
    labels = c("NE", "N", "NW", "W", "SW", "S", "SE", "E"),
    limits = c(0,360)) +
  guides(color = guide_colring(
    reverse = T, 
    start = pi/2)) +
  labs(
    title = "Peak/Trough Periods in PC Timeseries",
    subtitle = "Average Surface Current Direction", 
    x = "Longitude", 
    y = "Latitude",
    color = "Current Direction")


# Stack them
peak_direction_maps

PCA on Angular Direction and Speed Directly?

We could try coding the PCA to use angle and speed directly rather than the Northward and Eastward velocity components.

The cosine and sine components of the current’s angular direction preserve the cyclical nature of the currents.

Code
# Get the Cyclic X and Y components using trig
angular_pca_df <- daily_mcc_sc %>%
  mutate(
    angle_rad = atan2(v, u),    # Compute angle in radians
    cos_theta = cos(angle_rad), # Cyclic X component
    sin_theta = sin(angle_rad), # Cyclic Y component
    speed = sqrt(u^2 + v^2)     # Optional: current speed
  )


# A. Pivot wider to get those three things by the mesh element
angular_pca_mat <- angular_pca_df %>% 
  select(time, elem, cos_theta, sin_theta, speed) %>% 
  pivot_wider(
    names_from = elem, 
    values_from = c(cos_theta, sin_theta, speed), names_sep = "X") %>% 
  column_to_rownames("time")

# Now run PCA using cos_theta and sin_theta instead of u, v
angular_pca_result <- prcomp(angular_pca_mat, center = TRUE, scale. = TRUE)

Angular PC Loadings

The following maps show how the first two principal component rotations/loadings (equivalent when the matrix is scaled) relate to locations across the study area.

Code
# Summary - Proportion of variance
angular_pca_summ <- summary(angular_pca_result)
angular_pca_summ$importance[1:2, 1:2]
                            PC1      PC2
Standard deviation     21.23591 19.03156
Proportion of Variance  0.25784  0.20709
Code
# Pull + Reshape the Rotations / Loadings
angular_loadings <- data.frame(
  "PC1" = angular_pca_result$rotation[,"PC1"],
  "PC2" = angular_pca_result$rotation[,"PC2"]) %>% 
  rownames_to_column("loc") %>% 
  separate(col = "loc", into = c("var", "elem"), sep = "X") %>% 
  pivot_longer(
    cols = starts_with("PC"), 
    names_to = "PC", 
    values_to = "PC_rotation") %>% 
  mutate(longname = case_match(
    var,
    "cos_theta" ~"Cyclic X component",
    "sin_theta" ~ "Cyclic Y component",
    "speed" ~ "Current Speed"))



# Map The Loadings
mcc_studyarea_mesh %>% 
  mutate(elem = as.character(elem)) %>% 
  left_join(angular_loadings) %>% 
  st_as_sf() %>% 
  ggplot() +
  geom_sf(data = new_england) +
  geom_sf(aes(fill = PC_rotation)) +
  facet_grid(PC~longname) +
  coord_sf(
    xlim = c(-70, -67.3), 
    ylim = c(43.2, 44.7), 
    expand = F) +
  scale_fill_distiller(palette = "RdBu", limits = c(-0.06, 0.06)) +
  map_theme(
    legend.title.position = "top",
    legend.title = element_text(hjust = 0.5),
    legend.position = "bottom")

Angular PC Timeseries

Code
# Pull Principal Components overall values
angular_pcomponents <- data.frame(
  "date" = rownames(angular_pca_mat),
  "PC1" = angular_pca_result$x[,"PC1"],
  "PC2" = angular_pca_result$x[,"PC2"])  %>% 
  mutate(
    date = as.Date(date), 
    mon = lubridate::month(date), 
    yr = lubridate::year(date))

# Make a monthly summary:
angular_pc_monthly <- angular_pcomponents %>% 
  pivot_longer(
    starts_with("PC"), 
    names_to = "Principal Component", 
    values_to = "vals") %>% 
  group_by(yr, mon, `Principal Component`) %>% 
  summarise(
    vals = mean(vals),
    .groups = "drop") %>% 
  mutate(date = as.Date(
    str_c(
      yr,
      str_pad(mon, side = "left", width = 2, pad = "0"),
      "15",
      sep = "-"))) %>% 
  group_by(`Principal Component`) %>% 
  arrange(date) %>% 
  mutate(vals_rollmean = zoo::rollmean(vals, k = 12, fill = NA, align = "center"))

Validation Exercise: Check PC Timeseries Peaks

Code
# Get the dates when the 12-month averages are at their extremes (highs and lows)
angular_peaks <- angular_pc_monthly %>% 
  select(-vals) %>% 
  filter(date > as.Date("1990-01-01")) %>% 
  pivot_wider(names_from = `Principal Component`, values_from = vals_rollmean) %>% 
  mutate(
    PC_state = case_when(
       PC1 == max(PC1, na.rm = T) ~ "PC1 High",
       PC2 == max(PC2, na.rm = T) ~ "PC2 High",
       PC1 == min(PC1, na.rm = T) ~ "PC1 low",
       PC2 == min(PC2, na.rm = T) ~ "PC2 low",
       TRUE ~ NA)) %>% 
  filter(is.na(PC_state) == FALSE)

# Timeseries
angular_monthly_timeseries <- angular_pc_monthly %>% 
  ggplot(aes(date, vals, color = `Principal Component`)) +
    geom_line(
    aes(date, vals, color = `Principal Component`),
    linewidth = 0.4, alpha = 0.4) +
  geom_line(
    aes(date, vals_rollmean, color = `Principal Component`),
    linetype = 1) +
    geom_vline(
      data = data.frame(angular_peaks),
      aes(xintercept = date), color = "gray30", linetype = 2) +
    rcartocolor::scale_color_carto_d() + 
    labs(
      title = "MCC Angular PCA Timeseries",
      subtitle = "Peak States in 12-month average (dashed line)",
      y = "Principal Component Value",
      x = "Date")
angular_monthly_timeseries

Code
# 12-month average of lead-in
# Get the average currents for the 12 months around those breaks
angular_peak_directions <- angular_peaks %>% 
  split(.$date) %>% 
  map_dfr(function(peak_date){
    
    # Get the start-end date brackets for the year
    start_month <- peak_date$date - lubridate::dmonths(6)
    end_month <- peak_date$date + lubridate::dmonths(6)
    yr <- peak_date$yr
    state <- peak_date$PC_state
    mon <- str_pad(peak_date$mon, width = 2, side = "left", pad = "0")
    
    # Filter to that period, get average conditions
    daily_mcc_sc %>% 
      filter(between(time, start_month, end_month)) %>% 
      group_by(lonc, latc) %>% 
      summarise( 
        across(
          .cols = c(u,v), 
          .fns = ~mean(.x, na.rm = T)),
        .groups = "drop")  %>% 
    mutate(
      # Radian to degree conversion
      angle = atan2(v, u) * 180 / pi,   
      angle = if_else(angle<0, angle+360, angle),
      speed = sqrt(u^2 + v^2),          
      dx = u / speed * 0.05,            
      dy = v / speed * 0.05) %>% 
    mutate(
      start_month = start_month,  
      end_month = end_month,  
      PC_state = state,
      ymon = str_c(yr, mon, sep = "-"),
      .before = "lonc")
      
    # End the function
    })





# Map them
peak_direction_maps <- ggplot(angular_peak_directions) +
  geom_sf(data = new_england) +
  geom_segment(
    aes(
      lonc, 
      latc, 
      xend = lonc + dx, 
      yend = latc + dy,
      color = angle),
    arrow = arrow(length = unit(0.05, "cm")),
    linewidth = 0.5) +
  theme_minimal() +
  theme(
    legend.position = "right") +
  coord_sf(
    xlim = c(-70, -67.3), 
    ylim = c(43.2, 44.7), 
    expand = F) +
  facet_wrap(~PC_state*ymon, ncol = 2) +
  scale_color_scico(
    palette = 'romaO',
    breaks = c(45, 90, 135, 180, 225, 270, 315, 360),
    labels = c("NE", "N", "NW", "W", "SW", "S", "SE", "E"),
    limits = c(0,360)) +
  guides(color = guide_colring(
    reverse = T, 
    start = pi/2)) +
  labs(
    title = "Peak/Trough Periods in AngularPC Timeseries",
    subtitle = "Average Surface Current Direction", 
    x = "Longitude", 
    y = "Latitude",
    color = "Current Direction")


# Stack them
peak_direction_maps


Alternative Approaches: Area-Flux

An alternative way to get at MCC flow continuity is to measure water volume flux directly across a transect in this area, and potentially use that instead of the PCA approach.

We could frame it either as the total southwestward flux, or, if the location where the current veers south is clear and consistent, as the southward flux in that area.
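As a rough sketch of the first framing (a placeholder 225° southwest unit vector stands in for the true along-coast orientation, and with only surface u/v this is a surface-transport proxy rather than a true volume flux):

Code
library(tidyverse)

# Placeholder unit vector pointing southwest (225 degrees)
sw <- c(cos(225 * pi / 180), sin(225 * pi / 180))

# Daily mean southwestward surface flow over the study-area elements
daily_sw_flow <- daily_mcc_sc %>%
  mutate(sw_component = u * sw[1] + v * sw[2]) %>%   # project (u, v) onto the SW direction
  group_by(time) %>%
  summarise(mean_sw_flow = mean(sw_component, na.rm = TRUE), .groups = "drop")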

It would be super interesting to correlate the surface volume flux with the river output from the Penobscot.
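A possible starting point for that comparison, sketched with the dataRetrieval package (the USGS site number below is an assumption, intended to be Penobscot River at West Enfield, and should be verified before use):

Code
library(tidyverse)
library(dataRetrieval)

# Daily discharge for a Penobscot River gauge (site number assumed; verify)
penobscot_q <- readNWISdv(
  siteNumbers = "01034500",
  parameterCd = "00060",        # discharge, cubic feet per second
  startDate   = "1978-01-01",
  endDate     = "2019-12-31") %>%
  renameNWISColumns() %>%
  select(time = Date, discharge_cfs = Flow)

# Correlate with the daily southwestward-flow proxy from the sketch above
daily_sw_flow %>%
  inner_join(penobscot_q, by = "time") %>%
  summarise(pearson_r = cor(mean_sw_flow, discharge_cfs, use = "complete.obs"))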