About

This package contains functions for accessing (pulling) and cleaning federal fisheries-dependent and fisheries-independent data. Many of these data sets are confidential and require a formal data request submission to the appropriate governing body. Metadata and access information for most of these data sets are available through InPort, described below:

InPort is the authoritative metadata repository and data inventory platform for NOAA Fisheries and the National Ocean Service. The system supports documentation of datasets and provides tools to facilitate data discovery, public access, and responsible stewardship of scientific data within these line offices.

As such, species.shifts requires access to this data in order to use the cleaning and plotting functions. It is strongly recommended that this data be stored locally in a centralized repository. Please refer to the package README for more information on recommended workflows.

Data sets

The functions in this package correspond to six federal fisheries data sets, each with its own pull_() function.

NOAA Fisheries Vessel Trip Reports

NEFSC Observer at Sea

NEFSC Spring-Fall Bottom Trawl Survey

GARFO Permits Data

GARFO Dealer Reported Landings

NOAA Fisheries Marine Recreational Information Program


Vessel trip reports

vtr <- pull_vtr(proj_path = my_path)

| year | trip_type | lat | lon | species_name | port_name | state_abb | state_full | kept | discarded |
|---|---|---|---|---|---|---|---|---|---|
| 1996 | COMMERCIAL | 41.67694 | -69.59028 | FLOUNDER, WINTER / BLACKBACK | CHATHAM | MA | Massachusetts | 350 | 0 |
| 1996 | COMMERCIAL | 42.54417 | -70.53750 | COD | GLOUCESTER | MA | Massachusetts | 850 | 0 |
| 1996 | COMMERCIAL | 41.37472 | -71.57028 | FLOUNDER, WINTER / BLACKBACK | POINT JUDITH | RI | Rhode Island | 72 | 0 |

*Trip ID information removed from demo to maintain confidentiality.

How it works: The vessel trip report data was received as a ZIP folder with several CSV files, a separate file for each year of interest. This function reads these files and combines them into a single data frame. For consistency across files, vtrserno is converted to a character vector, while year, sub_trip_id, calc_lat_deg, calc_lat_min, calc_lat_sec, calc_lon_deg, calc_lon_min, calc_lon_sec, and calc_inshr_area are all coerced to numeric.
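The read-and-combine step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: the helper name read_vtr_folder() is hypothetical, and the sketch assumes the ZIP has already been extracted to a folder and that every yearly file has the same columns. Coercing types before stacking matters because a column read as numeric in one file and as character in another can prevent the files from being combined by stricter tools such as dplyr::bind_rows().

```r
# Hypothetical helper: read every CSV in a folder and stack them into one data frame.
read_vtr_folder <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE,
                      ignore.case = TRUE)

  # Columns the text says are coerced to numeric
  num_cols <- c("year", "sub_trip_id",
                "calc_lat_deg", "calc_lat_min", "calc_lat_sec",
                "calc_lon_deg", "calc_lon_min", "calc_lon_sec",
                "calc_inshr_area")

  yearly <- lapply(files, function(f) {
    df <- read.csv(f, stringsAsFactors = FALSE)
    # Serial numbers are identifiers, so keep them as text
    if ("vtrserno" %in% names(df)) {
      df$vtrserno <- as.character(df$vtrserno)
    }
    # Only coerce the numeric columns that are present in this file
    present <- intersect(num_cols, names(df))
    df[present] <- lapply(df[present], as.numeric)
    df
  })

  # Assumes all files share the same column names
  do.call(rbind, yearly)
}

# Usage (the path is a placeholder):
# vtr_raw <- read_vtr_folder("path/to/unzipped/vtr")
```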

calc_lat_deg, calc_lat_min, and calc_lat_sec are converted and combined to create a decimal degree measurement for latitude, and the same is done for longitude. From there, any states outside GARFO jurisdiction are removed (Texas, Alabama, etc.) and trips are filtered to the area between -60 and -80 degrees longitude (60 to 80 degrees west) and 20 degrees north.
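As a rough sketch of the coordinate step, degrees, minutes, and seconds combine into decimal degrees as deg + min/60 + sec/3600. The function names below are hypothetical, and two details are assumptions rather than facts about the package: that western longitudes carry a negative sign on the degrees value, and that the latitude condition means "at or north of 20 degrees".

```r
# Hypothetical helper: convert degrees/minutes/seconds to decimal degrees.
# Assumption: the sign (if any) is carried by the degrees value.
dms_to_decimal <- function(deg, min, sec) {
  sgn <- ifelse(deg < 0, -1, 1)
  sgn * (abs(deg) + min / 60 + sec / 3600)
}

# Hypothetical helper: flag trips inside the study area.
# Assumption: longitude between -80 and -60, latitude at or north of 20.
in_study_area <- function(lat, lon) {
  lon >= -80 & lon <= -60 & lat >= 20
}

# The first demo row above (41.67694, -69.59028) corresponds to
# 41 deg 40 min 37 sec and 69 deg 35 min 25 sec west:
round(dms_to_decimal(41, 40, 37), 5)   # 41.67694
round(dms_to_decimal(-69, 35, 25), 5)  # -69.59028
```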

The returned data includes all columns shown above, plus sub_trip_id as the unique trip identifier.


Fisheries Observer

observer <- pull_observer(proj_path = my_path)

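| year | negear | comname | targspec1 | targspec2 | targspec3 | lat | lon | hailwt | live_wt | kept | decade |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1989 | 50 | MONKFISH (GOOSEFISH) | 5260 | 0 | 0 | 40.91 | -70.83083 | 25 | 25 | Kept | 1980 |
| 1989 | 50 | BUTTERFISH | 5260 | 0 | 0 | 40.91 | -70.83083 | 25 | 25 | Kept | 1980 |
| 1989 | 50 | FLOUNDER, YELLOWTAIL | 5260 | 0 | 0 | 40.91 | -70.83083 | 100 | 100 | Discarded | 1980 |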

*Trip ID information removed from demo to maintain confidentiality.

How it works: The fisheries observer information was received from the Northeast Fisheries Science Center as a Microsoft Excel workbook. The data spans 1989 to 2024 and contains trip, haul, and catch data from that time frame. The haul data is split across two sheets within the workbook. Catch data is stored in individual sheets, one for each year of data. All trip data is contained within one sheet.

The function uses excel_sheets() from the readxl package to list every sheet in the workbook and maps read_excel() over that list to read in each sheet. The two haul data sheets are combined into a single data frame. Catch data from 1989-1995 are missing a year column, so one is created from the link column in order to combine these years with subsequent years of data. Haul data and catch data are joined by the link3 column.
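The workbook-reading and joining steps might look roughly like the sketch below. readxl::excel_sheets() and readxl::read_excel() are real readxl functions; everything else is illustrative. The helper name read_workbook() is hypothetical, lapply() stands in for a purrr-style map, the toy haul and catch values are invented for the demonstration, and the left join (all.x = TRUE) is an assumption about how unmatched catch rows are handled.

```r
# Hypothetical helper: read every sheet of a workbook into a named list.
# Requires the readxl package when called.
read_workbook <- function(path) {
  sheets <- readxl::excel_sheets(path)  # character vector of sheet names
  out <- lapply(sheets, function(s) readxl::read_excel(path, sheet = s))
  names(out) <- sheets
  out
}

# Toy stand-ins for sheets that read_workbook() would return
haul_a <- data.frame(link3 = c("A1", "A2"), negear = c(50, 50))
haul_b <- data.frame(link3 = "B1", negear = 100)
catch  <- data.frame(
  link3   = c("A1", "A1", "B1"),
  comname = c("BUTTERFISH", "MONKFISH (GOOSEFISH)", "COD"),
  hailwt  = c(25, 25, 100)
)

# Combine the two haul sheets, then attach haul fields to each catch record
haul <- rbind(haul_a, haul_b)
catch_haul <- merge(catch, haul, by = "link3", all.x = TRUE)
```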

Because coordinates are recorded at different points depending on the type of gear used on the trip, an intermediate data set is created to capture the appropriate recorded coordinates for each trip. This data set is then joined back to the combined catch-haul data set to ensure accurate coordinates based on trip type.


NEFSC Bottom Trawl

nefsc <- pull_nefsc(proj_path = my_path)

| id | svspp | comname | year | est_month | est_day | season | lat | lon | est_towdate | total_biomass_kg |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.97003e+12 | 23 | winter skate | 1970 | 3 | 0 | Spring | 41.58333 | -69.46667 | 3/12/70 8:00 | 3.128 |
| 1.97003e+12 | 28 | thorny skate | 1970 | 3 | 0 | Spring | 41.58333 | -69.46667 | 3/12/70 8:45 | 0.900 |
| 1.97003e+12 | 73 | atlantic cod | 1970 | 3 | 0 | Spring | 41.58333 | -69.46667 | 3/12/70 8:45 | 8.910 |

How it works: The trawl data is received from the Northeast Fisheries Science Center as a .Rdata file pulled from an SQL database. survdat is extracted from the larger list of data and is the basis of the data used here. survdat is combined with a previously built species list to add the common names of survey species.

From there, a unique tow ID is built based on the cruise, station, and strata surveyed. A date column is added based on the estimated month and day of the survey.
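One simple way to build such an identifier is to paste the three fields together, as in the sketch below. The column names (cruise, station, stratum), the toy values, and the underscore separator are all assumptions made for illustration; the package's actual ID format may differ.

```r
# Toy tow records; column names and values are hypothetical.
tows <- data.frame(
  cruise  = c(197003, 197003, 197003),
  station = c(1, 1, 2),
  stratum = c(1130, 1130, 1140)
)

# Rows from the same cruise, station, and stratum share one tow ID
tows$id <- paste(tows$cruise, tows$station, tows$stratum, sep = "_")
```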

Observations where there is a mismatch in abundance and biomass are revised so that when biomass is 0 but abundance is greater than 0, the biomass is recorded as 0.0001 kg, and when abundance is 0 but biomass is greater than 0, the abundance is recorded as 1.
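The revision rule above can be sketched in base R as follows. The function name fix_mismatch() and the column names abundance and biomass_kg are hypothetical; rows with missing values are left untouched, which is an assumption.

```r
# Hypothetical helper: reconcile abundance/biomass mismatches.
fix_mismatch <- function(df) {
  ok <- !is.na(df$abundance) & !is.na(df$biomass_kg)

  # Identify both kinds of mismatch before changing anything
  zero_biomass   <- ok & df$biomass_kg == 0 & df$abundance > 0
  zero_abundance <- ok & df$abundance == 0 & df$biomass_kg > 0

  df$biomass_kg[zero_biomass]  <- 0.0001  # fish counted but no weight recorded
  df$abundance[zero_abundance] <- 1       # weight recorded but no count
  df
}

# Toy example: rows 1 and 2 are mismatched, rows 3 and 4 are left as-is
survey <- data.frame(
  abundance  = c(3, 0, 5, 0),
  biomass_kg = c(0, 2.5, 1.2, 0)
)
fixed <- fix_mismatch(survey)
```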

Strata and species not regularly sampled are then filtered out of the data, and the data is subset to begin in 1970, when the trawl began to run consistently.


GARFO Federal Permits

permits <- pull_permits(proj_path = my_path)

| ap_year | ap_num | vp_num | pport | ppst | permit | target | category | lat | long | state_full | council |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1996 | 227829 | 221741 | GLOUCESTER | MA | lobster_1 | lobster | commercial | 42.61536 | -70.66246 | Massachusetts | New England |
| 1996 | 227830 | 127133 | GLOUCESTER | MA | lobster_1 | lobster | commercial | 42.61536 | -70.66246 | Massachusetts | New England |
| 1996 | 227831 | 231461 | RYE | NH | lobster_1 | lobster | commercial | 43.01202 | -70.77194 | New Hampshire | New England |

How it works: Federal permits data is received as a series of individual Excel spreadsheets, one for each year. This function is designed to read in multiple Excel files from a single source folder and combine them into one data frame.

Permits are grouped by species: each species is represented by a single column, and each permit type is represented by a number, letter, or some combination of both. If an individual holds multiple permit types/endorsements, a single entry is made and those values are separated by commas within a single cell. As such, this function splits each column into the respective number of permit types and renames the resulting columns with the target species name and permit type (ex: black_sea_bass_1 and black_sea_bass_2). All permit type columns are then pivoted so that each permit application number has a corresponding permit type and a value of 1 for that permit.
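The split-and-pivot logic can be illustrated with a small base R sketch. It goes straight from comma-separated cells to a long table rather than creating intermediate wide columns, so it shows the end result of the step, not the package's actual code. The function name permits_to_long() and the toy input are hypothetical.

```r
# Hypothetical helper: turn comma-separated permit cells into one row per
# application number and permit type.
permits_to_long <- function(df, id_col, species_cols) {
  pieces <- lapply(species_cols, function(sp) {
    cells <- as.character(df[[sp]])
    types <- strsplit(cells, ",", fixed = TRUE)

    # Repeat each ID once per permit type found in its cell
    ids  <- rep(df[[id_col]], lengths(types))
    type <- trimws(unlist(types))

    # Drop empty and missing entries (no permit held for this species)
    keep <- !is.na(type) & nzchar(type)
    if (!any(keep)) return(NULL)

    out <- data.frame(
      id     = ids[keep],
      permit = paste(sp, type[keep], sep = "_"),
      value  = 1,
      stringsAsFactors = FALSE
    )
    names(out)[1] <- id_col
    out
  })
  do.call(rbind, pieces)
}

# Toy input: one column per species, comma-separated permit types
raw <- data.frame(
  ap_num         = c(227829, 227830),
  lobster        = c("1", NA),
  black_sea_bass = c("1,2", "2"),
  stringsAsFactors = FALSE
)
long <- permits_to_long(raw, "ap_num", c("lobster", "black_sea_bass"))
```

In this toy case, application 227829 ends up with three rows (lobster_1, black_sea_bass_1, black_sea_bass_2) and application 227830 with one (black_sea_bass_2).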

From there, the names of target species are cleaned and added as a separate column for later species-level grouping. An additional column is added based on whether a permit type is categorized as commercial, for-hire/charter, or recreational. Because we are also interested in species-level trends, permits that cover multiple species (multispecies and squid/mackerel/butterfish) have been parsed out so that each specific permit type corresponds with its target species.

Lastly, permits are geocoded using tidygeocoder to their reported principal port and grouped to their respective management council region. Roughly 1% of entries are lost due to misspelled principal ports.


GARFO Dealer-reported landings

landings <- pull_landings(proj_path = my_path)

| year | portnm | state | species_name | land | live | value | confidential | comname | lat | long | state_full | council |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1996 | ADDISON | ME | clam, quahog, ocean | 26418 | 217965 | 98152 | NA | ocean clam, quahog | 44.61882 | -67.74445 | Maine | New England |
| 1996 | ADDISON | ME | lobster, american | 355359 | 355359 | 1178910 | NA | american lobster | 44.61882 | -67.74445 | Maine | New England |
| 1996 | ADDISON | ME | scallop, sea | 9704 | 80837 | 66560 | NA | sea scallop | 44.61882 | -67.74445 | Maine | New England |

How it works: Confidential landings data was received along with the vessel trip reports as an Excel spreadsheet. The first 9 lines contained metadata as provided by GARFO. As such, this function uses readxl to read in the file and requires a skip argument to skip the first few rows that contain metadata. SwimmeR is used to clean the species names and tidygeocoder to geocode the ports associated with landings.


Marine Recreational Information Program

Directed trips

mrip_directed_trips <- pull_mrip_directed_trips()

| year | wave | st | sub_reg | mode_fx | area_x | species | Trips | SE | PSE | state | mode | area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018 | 1 | 10 | 5 | 3 | 1 | striped bass | 23379.88 | 8886.435 | 38.0 | DE | Shore | State Territorial Seas (Ocean<=3 mi excluding Inland) |
| 2018 | 1 | 10 | 5 | 3 | 5 | striped bass | 169377.67 | 40817.095 | 24.1 | DE | Shore | Inland |
| 2018 | 1 | 10 | 5 | 7 | 1 | striped bass | 44944.57 | 38431.208 | 85.5 | DE | Private & Rental | State Territorial Seas (Ocean<=3 mi excluding Inland) |

How it works: The function to pull in and clean the MRIP directed trips is arguably the most complex of the data cleaning functions. The data is hosted online and is downloadable as a ZIP file for individual years. This function pulls in the zipped files, unzips them and saves them locally in intermediate files, which are required for the trip estimation code. Once the data has been pulled in and saved to the intermediary files, the trip estimates are calculated using Gary Nelson’s MRIP directed trip estimate R template. This code estimates the number of directed trips by species, wave, domain, trip type and year.
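For reading the output, it may help to note that the PSE column in the example rows is the percent standard error: the standard error expressed as a percentage of the estimate. The short check below reproduces the PSE values from the Trips and SE columns of the three sample rows. The function name pse() is hypothetical and is not part of the package or of the estimation template.

```r
# Hypothetical helper: percent standard error of an estimate.
pse <- function(estimate, se) {
  100 * se / estimate
}

# Trips and SE values copied from the three sample rows above
trips <- c(23379.88, 169377.67, 44944.57)
se    <- c(8886.435, 40817.095, 38431.208)

sample_pse <- round(pse(trips, se), 1)
sample_pse
# [1] 38.0 24.1 85.5
```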

For this analysis, we used all waves, the primary target trip type, and all states along the east coast, and calculated estimates for Atlantic croaker, Atlantic mackerel, black sea bass, blueline tilefish, bluefish, gray triggerfish, king mackerel, Spanish mackerel, striped bass, summer flounder, scup, spiny dogfish, goosefish (monkfish) and tilefish (golden tilefish).


Catch estimates

mrip_catch <- pull_mrip_catch()

| estimate_status | year | coast | region | state | common_name | sci_name | harvest_a_b1_numbers | pse_harvest_a_b1_numbers | harvest_a_b1_weight_lb | pse_harvest_a_b1_weight | released_alive_b2_numbers | released_alive_b2_lower_95_percent_confidence_limit | released_alive_b2_upper_95_percent_confidence_limit | total_catch_a_b1_b2_numbers | total_catch_a_b1_b2_lower_95_percent_confidence_limit | total_catch_a_b1_b2_upper_95_percent_confidence_limit | total_catch_pse | comname |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Final | 2024 | Atlantic Coast | Mid Atlantic | Delaware | rudderfish, banded | Seriola zonata | 0 | 0.0 | 0 | 0.0 | 23 | 0 | 56 | 23 | 0 | 56 | 73.9 | banded rudderfish |
| Final | 2024 | Atlantic Coast | Mid Atlantic | Delaware | shark, dogfish, spiny | Squalus acanthias | 0 | 0.0 | 0 | 0.0 | 35430 | 6611 | 64249 | 35430 | 6611 | 64249 | 41.5 | spiny dogfish |
| Final | 2024 | Atlantic Coast | Mid Atlantic | Delaware | bass, black sea | Centropristis striata | 105763 | 28.8 | 130995 | 28.1 | 1008441 | 290955 | 1725926 | 1114203 | 395720 | 1832686 | 32.9 | black sea bass |

How it works: Catch estimates are publicly available on the ACCSP Data Warehouse and can be downloaded as a CSV file. This function reads in the CSV and cleans the species names to be uniform with other data sets.
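To give a sense of what the name cleaning involves, the sketch below reorders simple "group, modifier" names such as "bass, black sea" into "black sea bass". It is a base R stand-in written for illustration, not the package's cleaning code. The function name clean_species_name() is hypothetical, and names with more than one comma (for example "shark, dogfish, spiny") are deliberately left unchanged because they need species-specific handling.

```r
# Hypothetical helper: lower-case names and flip "group, modifier" order.
clean_species_name <- function(x) {
  x <- tolower(trimws(x))

  # Count commas; only names with exactly one comma are reordered
  n_commas <- nchar(gsub("[^,]", "", x))
  one <- !is.na(x) & n_commas == 1

  # "bass, black sea" -> "black sea bass"
  x[one] <- sub("^([^,]+),\\s*(.+)$", "\\2 \\1", x[one])
  x
}

cleaned <- clean_species_name(
  c("rudderfish, banded", "bass, black sea", "COD", "shark, dogfish, spiny")
)
```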


Plotting

Once the data has been pulled and saved to the local environment, each data set has a respective plot_() or map_() function. More about those functions here.