Accessing and cleaning federal fisheries data • species.shifts

About

This package contains functions for accessing (pulling) and cleaning federal fisheries-dependent and fisheries-independent data. Many of these data sets are confidential and require a formal data request submitted to the appropriate governing body. Metadata and access information for most of these data sets is available through InPort, described below:

InPort is the authoritative metadata repository and data inventory platform for NOAA Fisheries and the National Ocean Service. The system supports documentation of datasets and provides tools to facilitate data discovery, public access, and responsible stewardship of scientific data within these line offices.

As such, the cleaning and plotting functions in species.shifts require access to these data. It is strongly recommended that the data be stored locally in a centralized repository. Please refer to the package README for more information on recommended workflows.

Data sets

The functions in this package correspond to six federal fisheries data sets, each with its own respective pull_() function.

NOAA Fisheries Vessel Trip Reports

NEFSC Observer at Sea

NEFSC Spring-Fall Bottom Trawl Survey

GARFO Permits Data

GARFO Dealer Reported Landings

NOAA Fisheries Marine Recreational Information Program

Vessel trip reports

vtr <- pull_vtr(proj_path = my_path)

year	trip_type	lat	lon	species_name	port_name	state_abb	state_full	kept
1996	COMMERCIAL	41.67694	-69.59028	FLOUNDER, WINTER / BLACKBACK	CHATHAM	MA	Massachusetts	350
1996	COMMERCIAL	42.54417	-70.53750	COD	GLOUCESTER	MA	Massachusetts	850
1996	COMMERCIAL	41.37472	-71.57028	FLOUNDER, WINTER / BLACKBACK	POINT JUDITH	RI	Rhode Island	72

*Trip ID information removed from demo to maintain confidentiality.

How it works: The vessel trip report data was received as a ZIP folder with several CSV files, a separate file for each year of interest. This function reads these files and combines them into a single data frame. For consistency across files, vtrserno is converted to a character vector, while year, sub_trip_id, calc_lat_deg, calc_lat_min, calc_lat_sec, calc_lon_deg, calc_lon_min, calc_lon_sec, and calc_inshr_area are all coerced to numeric.
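The read-and-combine step can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: read_vtr_file() is a hypothetical helper, and the two toy CSV files written to a temporary folder stand in for the confidential yearly files.

```r
# Hypothetical helper: read one yearly file and coerce column types so that
# every file can be row-bound without type conflicts.
read_vtr_file <- function(path) {
  dat <- read.csv(path, stringsAsFactors = FALSE)
  dat$vtrserno <- as.character(dat$vtrserno)
  num_cols <- intersect(
    c("year", "sub_trip_id", "calc_lat_deg", "calc_lat_min", "calc_lat_sec",
      "calc_lon_deg", "calc_lon_min", "calc_lon_sec", "calc_inshr_area"),
    names(dat)
  )
  dat[num_cols] <- lapply(dat[num_cols], as.numeric)
  dat
}

# Toy files (made-up values) in a fresh temporary folder.
demo_dir <- tempfile("vtr_demo_")
dir.create(demo_dir)
write.csv(data.frame(vtrserno = 1001, year = "1996", sub_trip_id = 1),
          file.path(demo_dir, "vtr_1996.csv"), row.names = FALSE)
write.csv(data.frame(vtrserno = "A1002", year = "1997", sub_trip_id = 2),
          file.path(demo_dir, "vtr_1997.csv"), row.names = FALSE)

# Read every CSV in the folder and stack the results into one data frame.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
vtr_demo <- do.call(rbind, lapply(files, read_vtr_file))
```

Coercing before binding matters because a column such as vtrserno can be read as an integer in one year and as text in another.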

calc_lat_deg, calc_lat_min, and calc_lat_sec are converted and combined to create a decimal degree measurement for latitude, and the same is done for longitude. From there, any states outside GARFO jurisdiction are removed (Texas, Alabama, etc.) and trips are filtered to the area between 60 and 80 degrees west (-60 to -80 in decimal degrees) and 20 degrees north.
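The degrees-minutes-seconds conversion follows the standard formula: degrees + minutes / 60 + seconds / 3600. A minimal sketch is below. dms_to_decimal() is a hypothetical name, and the sketch assumes that any negative sign (for western longitudes) is carried on the degrees field, which may not match how the source files store it.

```r
# Hypothetical helper: convert degrees, minutes, seconds to decimal degrees.
# Assumes the sign, if any, is stored on the degrees value.
dms_to_decimal <- function(deg, min, sec) {
  sign <- ifelse(deg < 0, -1, 1)
  sign * (abs(deg) + min / 60 + sec / 3600)
}

# Illustrative values only.
lat <- dms_to_decimal(41, 40, 37)   # approximately 41.67694
lon <- dms_to_decimal(-69, 35, 25)  # approximately -69.59028
```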

The returned data includes the columns shown above, plus sub_trip_id as the unique trip identifier.

Fisheries Observer

observer <- pull_observer(proj_path = my_path)

year	negear	comname	targspec1	lat	lon	hailwt	live_wt	kept	decade
1989	50	MONKFISH (GOOSEFISH)	5260	40.91	-70.83083	25	25	Kept	1980
1989	50	BUTTERFISH	5260	40.91	-70.83083	25	25	Kept	1980
1989	50	FLOUNDER, YELLOWTAIL	5260	40.91	-70.83083	100	100	Discarded	1980

*Trip ID information removed from demo to maintain confidentiality.

How it works: The fisheries observer information was received from the Northeast Fisheries Science Center as a Microsoft Excel workbook. The data spans 1989 to 2024 and contains trip, haul, and catch data from that time frame. The haul data is split across two sheets within the workbook. Catch data is stored in individual sheets, one for each year. All trip data is contained within one sheet.

The function uses excel_sheets() from the readxl package to list the sheets in the workbook, then maps read_excel() over them to read in each sheet. The two haul data sheets are combined to create a single data frame. Catch data from 1989-1995 are missing a year column, so one is created from the link column in order to combine these years with subsequent years of data. Haul data and catch data are joined by the link3 column.
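The combine-and-join step can be illustrated with toy data frames standing in for the workbook sheets. The readxl calls are left out so the example runs without the confidential workbook. All values are made up, and the use of a left join (all.x = TRUE) is an assumption, since the join type used by the package is not stated.

```r
# Toy stand-ins for the two haul sheets (made-up values).
haul_sheets <- list(
  haul_1 = data.frame(link3 = c("a1", "a2"), negear = c(50, 50)),
  haul_2 = data.frame(link3 = "b1", negear = 100)
)

# Toy stand-in for the combined catch sheets (made-up values).
catch <- data.frame(
  link3 = c("a1", "a1", "b1"),
  comname = c("BUTTERFISH", "MONKFISH (GOOSEFISH)", "FLOUNDER, YELLOWTAIL"),
  hailwt = c(25, 25, 100)
)

# Stack the haul sheets into one data frame.
haul <- do.call(rbind, haul_sheets)
rownames(haul) <- NULL

# Attach haul information to every catch record by the shared link3 key.
# Assumption: keep all catch rows (left join).
catch_haul <- merge(catch, haul, by = "link3", all.x = TRUE)
```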

Because coordinates are recorded at different points depending on the type of gear used on the trip, an intermediate data set is created to capture the appropriate recorded coordinates for each trip. This data set is then joined back to the combined catch-haul data set to ensure accurate coordinates based on trip type.

NEFSC Bottom Trawl

nefsc <- pull_nefsc(proj_path = my_path)

id	svspp	comname	year	est_month	season	lat	lon	est_towdate	total_biomass_kg
1.97003e+12	23	winter skate	1970	3	Spring	41.58333	-69.46667	3/12/70 8:00	3.128
1.97003e+12	28	thorny skate	1970	3	Spring	41.58333	-69.46667	3/12/70 8:45	0.900
1.97003e+12	73	atlantic cod	1970	3	Spring	41.58333	-69.46667	3/12/70 8:45	8.910

How it works: The trawl data is received from the Northeast Fisheries Science Center as a .Rdata file pulled from an SQL database. survdat is extracted from the larger list of data and is the basis of the data used here. survdat is combined with a previously built species list to add the common names of survey species.

From there, a unique tow ID is built based on the cruise, station, and strata surveyed. A date column is added based on the estimated month and day of the survey.

Observations where abundance and biomass are mismatched are revised: when biomass is 0 but abundance is greater than 0, the biomass is recorded as 0.0001 kg, and when abundance is 0 but biomass is greater than 0, the abundance is recorded as 1.
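The mismatch rule can be written as a short base R sketch. fix_mismatch() and the column names abundance and biomass_kg are hypothetical and chosen for illustration; the columns in survdat may be named differently.

```r
# Hypothetical helper applying the two correction rules described above.
fix_mismatch <- function(dat) {
  # which() drops NA comparisons so that missing values are left untouched.
  zero_biomass <- which(dat$biomass_kg == 0 & dat$abundance > 0)
  zero_abundance <- which(dat$abundance == 0 & dat$biomass_kg > 0)
  dat$biomass_kg[zero_biomass] <- 0.0001
  dat$abundance[zero_abundance] <- 1
  dat
}

# Toy observations (made-up values).
tows <- data.frame(
  abundance = c(3, 0, 2, 0),
  biomass_kg = c(0, 1.5, 4.2, 0)
)
tows_fixed <- fix_mismatch(tows)
```

Rows where both values are 0, or both are positive, pass through unchanged.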

Strata and species not regularly sampled are then filtered out, and the data are subset to begin in 1970, when the trawl survey began to run consistently.

GARFO Federal Permits

permits <- pull_permits(proj_path = my_path)

ap_year	ap_num	vp_num	pport	ppst	permit	target	category	lat	long	state_full	council
1996	227829	221741	GLOUCESTER	MA	lobster_1	lobster	commercial	42.61536	-70.66246	Massachusetts	New England
1996	227830	127133	GLOUCESTER	MA	lobster_1	lobster	commercial	42.61536	-70.66246	Massachusetts	New England
1996	227831	231461	RYE	NH	lobster_1	lobster	commercial	43.01202	-70.77194	New Hampshire	New England

How it works: Federal permits data is received as a series of individual Excel spreadsheets, one for each year. This function is designed to read in multiple Excel files from a single source folder and combine them into one data frame.

Permits are grouped by species: each species is represented by a single column, and each permit type is represented by a number, a letter, or some combination of both. If an individual holds multiple permit types or endorsements, a single entry is made and those values are separated by commas within a single cell. As such, this function splits each column into the respective number of permit types and renames each with the target species name and permit type (e.g., black_sea_bass_1 and black_sea_bass_2). All permit type columns are then pivoted, so that each permit application number has a corresponding permit type and a value of 1 for that permit.
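The split-and-reshape idea can be sketched in base R. This is an illustration under assumptions, not the package's code: the toy input is made up, and the sketch goes straight from comma-separated cells to long format rather than creating intermediate wide columns and pivoting them.

```r
# Toy wide-format permits: one column per species, comma-separated permit types.
permits_wide <- data.frame(
  ap_num = c(101, 102, 103),
  black_sea_bass = c("1", "1,2", NA),
  lobster = c("1", NA, "A"),
  stringsAsFactors = FALSE
)

species_cols <- setdiff(names(permits_wide), "ap_num")

# For each species column, split the cells on commas and build one row
# per application number and permit type.
long_parts <- lapply(species_cols, function(sp) {
  codes <- strsplit(permits_wide[[sp]], ",", fixed = TRUE)
  n_codes <- lengths(codes)
  flat <- trimws(unlist(codes))
  out <- data.frame(
    ap_num = rep(permits_wide$ap_num, times = n_codes),
    permit = paste(sp, flat, sep = "_"),
    target = sp,
    value = 1,
    stringsAsFactors = FALSE
  )
  # Drop rows that came from empty (NA) cells.
  out[!is.na(flat), ]
})

permits_long <- do.call(rbind, long_parts)
```

In this toy example, application 102 yields two rows, black_sea_bass_1 and black_sea_bass_2, each with a value of 1.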

From there, the names of target species are cleaned and added as a separate column for later species-level grouping. An additional column is added based on whether a permit type is categorized as commercial, for-hire/charter, or recreational. Because we are also interested in assessing species-level trends, permits that cover multiple species (multispecies and squid/mackerel/butterfish) have been parsed out so that each specific permit type corresponds with its target species.

Lastly, permits are geocoded using tidygeocoder to their reported principal port and grouped to their respective management council region. Roughly 1% of entries are lost due to misspelled principal ports.

GARFO Dealer-reported landings

landings <- pull_landings(proj_path = my_path)

year	portnm	state	species_name	land	live	value	confidential	comname	lat	long	state_full	council
1996	ADDISON	ME	clam, quahog, ocean	26418	217965	98152	NA	ocean clam, quahog	44.61882	-67.74445	Maine	New England
1996	ADDISON	ME	lobster, american	355359	355359	1178910	NA	american lobster	44.61882	-67.74445	Maine	New England
1996	ADDISON	ME	scallop, sea	9704	80837	66560	NA	sea scallop	44.61882	-67.74445	Maine	New England

How it works: Confidential landings data was received along with the vessel trip reports as an Excel spreadsheet. The first 9 lines contained metadata as provided by GARFO. As such, this function uses readxl to read in the file and requires a skip argument to skip the first few rows that contain metadata. SwimmeR is used to clean the species names and tidygeocoder to geocode the ports associated with landings.

Marine Recreational Information Program

Directed trips

mrip_directed_trips <- pull_mrip_directed_trips()

year	wave	st	sub_reg	mode_fx	area_x	species	Trips	SE	PSE	state	mode	area
2018	1	10	5	3	1	striped bass	23379.88	8886.435	38.0	DE	Shore	State Territorial Seas (Ocean<=3 mi excluding Inland)
2018	1	10	5	3	5	striped bass	169377.67	40817.095	24.1	DE	Shore	Inland
2018	1	10	5	7	1	striped bass	44944.57	38431.208	85.5	DE	Private & Rental	State Territorial Seas (Ocean<=3 mi excluding Inland)

How it works: The function to pull in and clean the MRIP directed trips is arguably the most complex of the data cleaning functions. The data is hosted online and is downloadable as a ZIP file for individual years. This function pulls in the zipped files, unzips them and saves them locally in intermediate files, which are required for the trip estimation code. Once the data has been pulled in and saved to the intermediary files, the trip estimates are calculated using Gary Nelson’s MRIP directed trip estimate R template. This code estimates the number of directed trips by species, wave, domain, trip type and year.

For this analysis, we used all waves, primary target trip type, and all states along the east coast, and calculated estimates for Atlantic croaker, Atlantic mackerel, black sea bass, blueline tilefish, bluefish, gray triggerfish, king mackerel, Spanish mackerel, striped bass, summer flounder, scup, spiny dogfish, goosefish (monkfish), and tilefish (golden tilefish).

Catch estimates

mrip_catch <- pull_mrip_catch()

estimate_status	year	coast	region	state	common_name	sci_name	harvest_a_b1_numbers	pse_harvest_a_b1_numbers	harvest_a_b1_weight_lb	pse_harvest_a_b1_weight	released_alive_b2_numbers	released_alive_b2_lower_95_percent_confidence_limit	released_alive_b2_upper_95_percent_confidence_limit	total_catch_a_b1_b2_numbers	total_catch_a_b1_b2_lower_95_percent_confidence_limit	total_catch_a_b1_b2_upper_95_percent_confidence_limit	total_catch_pse	comname
Final	2024	Atlantic Coast	Mid Atlantic	Delaware	rudderfish, banded	Seriola zonata	0	0.0	0	0.0	23	0	56	23	0	56	73.9	banded rudderfish
Final	2024	Atlantic Coast	Mid Atlantic	Delaware	shark, dogfish, spiny	Squalus acanthias	0	0.0	0	0.0	35430	6611	64249	35430	6611	64249	41.5	spiny dogfish
Final	2024	Atlantic Coast	Mid Atlantic	Delaware	bass, black sea	Centropristis striata	105763	28.8	130995	28.1	1008441	290955	1725926	1114203	395720	1832686	32.9	black sea bass

How it works: Catch estimates are publicly available on the ACCSP Data Warehouse and are downloaded as a CSV file. This function reads in the CSV and cleans the species names to be uniform with the other data sets.
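As a rough illustration of the name cleaning, the sketch below reorders simple "group, descriptor" names into natural order. clean_name() is a hypothetical helper, not the package's function. It only handles names with exactly one comma and leaves names with several commas unchanged, because the rule the package applies to those is not documented here.

```r
# Hypothetical helper: lower-case a species name and swap "x, y" to "y x".
clean_name <- function(x) {
  parts <- strsplit(tolower(trimws(x)), ",\\s*")
  vapply(parts, function(p) {
    if (length(p) == 1 && is.na(p)) return(NA_character_)
    if (length(p) == 2) paste(p[2], p[1]) else paste(p, collapse = ", ")
  }, character(1))
}

cleaned <- clean_name(c("bass, black sea", "rudderfish, banded", "COD"))
# cleaned is c("black sea bass", "banded rudderfish", "cod")
```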

Plotting

Once the data has been pulled and saved to the local environment, each data set has a respective plot_() or map_() function. More about those functions here.