| year | trip_type | lat | lon | species_name | port_name | state_abb | state_full | kept | discarded |
|---|---|---|---|---|---|---|---|---|---|
| 1996 | COMMERCIAL | 41.67694 | -69.59028 | FLOUNDER, WINTER / BLACKBACK | CHATHAM | MA | Massachusetts | 350 | 0 |
| 1996 | COMMERCIAL | 42.54417 | -70.53750 | COD | GLOUCESTER | MA | Massachusetts | 850 | 0 |
| 1996 | COMMERCIAL | 41.37472 | -71.57028 | FLOUNDER, WINTER / BLACKBACK | POINT JUDITH | RI | Rhode Island | 72 | 0 |
About
This package contains functions for accessing (pulling) and cleaning federal fisheries-dependent and fisheries-independent data. Many of these data sets are confidential and require a formal data request to the appropriate governing body. Metadata and access information for most of these data sets is available through InPort, see below:
InPort is the authoritative metadata repository and data inventory platform for NOAA Fisheries and the National Ocean Service. The system supports documentation of datasets and provides tools to facilitate data discovery, public access, and responsible stewardship of scientific data within these line offices.
As such, the cleaning and plotting functions in species.shifts require access to this data. It is strongly recommended that this data be stored locally in a centralized repository. Please refer to the package README for more information on recommended workflows.
Data sets
The functions in this package correspond to six federal fisheries data sets, each with its own respective pull_() function.
NOAA Fisheries Vessel Trip Reports
NEFSC Spring-Fall Bottom Trawl Survey
GARFO Dealer Reported Landings
NOAA Fisheries Marine Recreational Information Program
Vessel trip reports
vtr <- pull_vtr(proj_path = my_path)
*Trip ID information removed from demo to maintain confidentiality.
How it works: The vessel trip report data was received as a ZIP folder containing several CSV files, a separate file for each year of interest. This function reads these files and combines them into a single data frame. For consistency across files, vtrserno is converted to a character vector, while year, sub_trip_id, calc_lat_deg, calc_lat_min, calc_lat_sec, calc_lon_deg, calc_lon_min, calc_lon_sec, and calc_inshr_area are all coerced to numeric.
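The read-and-combine step can be sketched in base R as follows. This is only an illustration, not the code inside pull_vtr(): the helper name read_yearly_csvs(), the file names, and the demo values are all made up, and the real files contain many more columns.

```r
# Sketch only: read_yearly_csvs() is a hypothetical helper, not part of the package.
read_yearly_csvs <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  numeric_cols <- c("year", "sub_trip_id",
                    "calc_lat_deg", "calc_lat_min", "calc_lat_sec",
                    "calc_lon_deg", "calc_lon_min", "calc_lon_sec",
                    "calc_inshr_area")
  yearly <- lapply(files, function(f) {
    # Read every column as character so types never disagree between files
    df <- read.csv(f, colClasses = "character", stringsAsFactors = FALSE)
    # Coerce the known numeric columns that are present in this file
    for (col in intersect(numeric_cols, names(df))) {
      df[[col]] <- as.numeric(df[[col]])
    }
    df
  })
  # Stack the yearly data frames into one
  do.call(rbind, yearly)
}

# Demo with two tiny made-up files in a temporary folder
demo_dir <- tempfile("vtr_demo")
dir.create(demo_dir)
write.csv(data.frame(vtrserno = 1001, year = 1996, calc_lat_deg = 41),
          file.path(demo_dir, "vtr_1996.csv"), row.names = FALSE)
write.csv(data.frame(vtrserno = "A1002", year = 1997, calc_lat_deg = 42),
          file.path(demo_dir, "vtr_1997.csv"), row.names = FALSE)
vtr_demo <- read_yearly_csvs(demo_dir)
```

Reading everything as character first means a serial number that looks numeric in one year and alphanumeric in another still lands in one consistent column.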
calc_lat_deg, calc_lat_min, and calc_lat_sec are converted and combined to create a decimal degree measurement for latitude, and the same is done for longitude. From there, any states outside GARFO jurisdiction are removed (Texas, Alabama, etc.) and trips are filtered to the area between 60 and 80 degrees west (longitudes of -80 to -60) and 20 degrees north.
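As a rough sketch of the coordinate conversion and spatial filter, assuming the degree columns carry the sign (negative for west) and reading "20 degrees north" as a southern limit on latitude. Both are assumptions, and dms_to_decimal() and the demo rows are hypothetical rather than taken from the package.

```r
# Sketch only: dms_to_decimal() is a hypothetical helper.
# Assumes the sign of the coordinate is stored on the degrees value.
dms_to_decimal <- function(degrees, minutes, seconds) {
  sgn <- ifelse(degrees < 0, -1, 1)
  sgn * (abs(degrees) + minutes / 60 + seconds / 3600)
}

# Made-up demo rows; the third falls outside the filter area
trips <- data.frame(
  calc_lat_deg = c(41, 42, 15), calc_lat_min = c(40, 32, 0), calc_lat_sec = c(37, 39, 0),
  calc_lon_deg = c(-69, -70, -90), calc_lon_min = c(35, 32, 0), calc_lon_sec = c(25, 15, 0)
)
trips$lat <- dms_to_decimal(trips$calc_lat_deg, trips$calc_lat_min, trips$calc_lat_sec)
trips$lon <- dms_to_decimal(trips$calc_lon_deg, trips$calc_lon_min, trips$calc_lon_sec)

# Keep longitudes between -80 and -60 and latitudes at or above 20 (assumed bounds)
in_area <- trips$lon >= -80 & trips$lon <= -60 & trips$lat >= 20
trips_in_area <- trips[in_area, ]
```

One caveat of storing the sign on the degrees value is that a coordinate between 0 and -1 degrees cannot be represented; that does not arise in this study area.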
The returned data includes every column shown above, plus sub_trip_id as the unique trip identifier.
Fisheries Observer
observer <- pull_observer(proj_path = my_path)
| year | negear | comname | targspec1 | targspec2 | targspec3 | lat | lon | hailwt | live_wt | kept | decade |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1989 | 50 | MONKFISH (GOOSEFISH) | 5260 | 0 | 0 | 40.91 | -70.83083 | 25 | 25 | Kept | 1980 |
| 1989 | 50 | BUTTERFISH | 5260 | 0 | 0 | 40.91 | -70.83083 | 25 | 25 | Kept | 1980 |
| 1989 | 50 | FLOUNDER, YELLOWTAIL | 5260 | 0 | 0 | 40.91 | -70.83083 | 100 | 100 | Discarded | 1980 |
*Trip ID information removed from demo to maintain confidentiality.
How it works: The fisheries observer information was received from the Northeast Fisheries Science Center as a Microsoft Excel workbook. The data spans 1989 to 2024 and contains trip, haul and catch data from that time frame. All trip data is contained within one sheet, the haul data is split across two sheets, and the catch data is stored in a separate sheet for each year.
From the readxl package, excel_sheets() is used to list the sheets in the workbook, and read_excel() is mapped over them to read in each sheet. The two haul data sheets are combined to create a single data frame. Catch data from 1989-1995 are missing a year column, so one is created from the link column in order to combine these years with subsequent years of data. Haul data and catch data are joined by the link3 column.
Because coordinates are recorded at different points depending on the type of gear used on the trip, an intermediate data set is created to capture the properly recorded coordinates for each trip. This data set is then joined back to the combined catch-haul data set to ensure accurate coordinates based on trip type.
NEFSC Bottom Trawl
nefsc <- pull_nefsc(proj_path = my_path)
| id | svspp | comname | year | est_month | est_day | season | lat | lon | est_towdate | total_biomass_kg |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.97003e+12 | 23 | winter skate | 1970 | 3 | 0 | Spring | 41.58333 | -69.46667 | 3/12/70 8:00 | 3.128 |
| 1.97003e+12 | 28 | thorny skate | 1970 | 3 | 0 | Spring | 41.58333 | -69.46667 | 3/12/70 8:45 | 0.900 |
| 1.97003e+12 | 73 | atlantic cod | 1970 | 3 | 0 | Spring | 41.58333 | -69.46667 | 3/12/70 8:45 | 8.910 |
How it works: The trawl data is received from the Northeast Fisheries Science Center as a .Rdata file pulled from an SQL database. survdat is extracted from the larger list of data and is the basis of the data used here. survdat is combined with a previously built species list to add the common names of survey species.
From there, a unique tow ID is built based on the cruise, station, and strata surveyed. A date column is added based on the estimated month and day of the survey.
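A minimal sketch of building a tow ID and a date column is shown here. The column names cruise, station and stratum, the separator, and the demo values are assumptions for illustration; the package may construct its ID differently.

```r
# Sketch only: column names and values are hypothetical.
tows <- data.frame(
  cruise = c(197003, 197003), station = c(1, 2), stratum = c(1010, 1020),
  year = c(1970, 1970), est_month = c(3, 3), est_day = c(12, 13)
)

# Concatenate cruise, stratum and station into one identifier per tow
tows$id <- paste(tows$cruise, tows$stratum, tows$station, sep = "_")

# Build a Date from the year and the estimated month and day
tows$est_date <- as.Date(sprintf("%04d-%02d-%02d",
                                 as.integer(tows$year),
                                 as.integer(tows$est_month),
                                 as.integer(tows$est_day)))
```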
Observations with a mismatch between abundance and biomass are revised: when biomass is 0 but abundance is greater than 0, the biomass is recorded as 0.0001 kg, and when abundance is 0 but biomass is greater than 0, the abundance is recorded as 1.
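The mismatch rule can be written in a few lines of base R. The helper name fix_mismatch() and the column names abundance and biomass are assumptions for this sketch; the rule itself follows the description above.

```r
# Sketch only: fix_mismatch() and its column names are hypothetical.
fix_mismatch <- function(df) {
  # which() drops NA comparisons so the assignments below cannot fail on missing values
  zero_biomass <- which(df$biomass == 0 & df$abundance > 0)
  zero_abundance <- which(df$abundance == 0 & df$biomass > 0)
  df$biomass[zero_biomass] <- 0.0001  # kg
  df$abundance[zero_abundance] <- 1
  df
}

# Made-up demo rows: the first two are mismatched, the last two are left alone
catch <- data.frame(abundance = c(5, 0, 3, 0), biomass = c(0, 2.5, 1.2, 0))
catch_fixed <- fix_mismatch(catch)
```

Both sets of rows are identified before either column is modified, so a row corrected by the first rule is never re-evaluated by the second.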
Strata and species not regularly sampled are then filtered out, and the data is subset to begin in 1970, when the trawl began to run consistently.
GARFO Federal Permits
permits <- pull_permits(proj_path = my_path)
| ap_year | ap_num | vp_num | pport | ppst | permit | target | category | lat | long | state_full | council |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1996 | 227829 | 221741 | GLOUCESTER | MA | lobster_1 | lobster | commercial | 42.61536 | -70.66246 | Massachusetts | New England |
| 1996 | 227830 | 127133 | GLOUCESTER | MA | lobster_1 | lobster | commercial | 42.61536 | -70.66246 | Massachusetts | New England |
| 1996 | 227831 | 231461 | RYE | NH | lobster_1 | lobster | commercial | 43.01202 | -70.77194 | New Hampshire | New England |
How it works: Federal permits data is received as a series of individual Excel spreadsheets, one for each year. This function is designed to read in multiple Excel files from a single source folder and combine them into one data frame.
Permits are grouped by species: each species is represented by a single column, and each permit type by a number, letter, or some combination of both. If an individual holds multiple permit types/endorsements, a single entry is made and those values are separated by commas within a single cell. As such, this function splits each species column into one column per permit type and renames each with the target species name and permit type (ex: black_sea_bass_1 and black_sea_bass_2). All permit type columns are then pivoted, so that each permit application number has a corresponding permit type and a value of 1 for that permit.
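The split-and-pivot idea can be sketched in base R as below. This goes straight from comma-separated cells to a long table rather than creating intermediate wide columns, so it is a simplification of the steps described above. permits_to_long(), the demo columns, and the demo values are hypothetical.

```r
# Sketch only: permits_to_long() is a hypothetical helper, not the package's code.
permits_to_long <- function(df, id_col, species_cols) {
  pieces <- lapply(species_cols, function(sp) {
    # Split each cell such as "1, 2" into its individual permit types
    split_types <- strsplit(as.character(df[[sp]]), ",", fixed = TRUE)
    long <- data.frame(
      ap_num = rep(df[[id_col]], lengths(split_types)),
      type = trimws(unlist(split_types)),
      stringsAsFactors = FALSE
    )
    # Drop applications with no permit for this species
    long <- long[!is.na(long$type) & nzchar(long$type), ]
    if (nrow(long) == 0) return(NULL)
    data.frame(
      ap_num = long$ap_num,
      permit = paste(sp, long$type, sep = "_"),  # e.g. black_sea_bass_1
      value = rep(1, nrow(long)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, pieces)
}

# Made-up demo: one application holds two black sea bass permit types
permits_wide <- data.frame(
  ap_num = c(227829, 227830),
  black_sea_bass = c("1, 2", "1"),
  lobster = c("1", NA),
  stringsAsFactors = FALSE
)
permits_long <- permits_to_long(permits_wide, "ap_num", c("black_sea_bass", "lobster"))
```

The result has one row per application number and permit type, which is the shape needed for later species-level grouping.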
From there, the names of target species are cleaned and added as a separate column, for later species-level grouping. An additional column is added based on whether a permit type is categorized as commercial, for-hire/charter or recreational. Because we are also interested in species-level trends, permits that cover multiple species (multispecies and squid/mackerel/butterfish) have been parsed out so that each specific permit type corresponds with its target species.
Lastly, permits are geocoded using tidygeocoder to their reported principal port and grouped to their respective management council region. Roughly 1% of entries are lost due to misspelled principal ports.
GARFO Dealer-reported landings
landings <- pull_landings(proj_path = my_path)
| year | portnm | state | species_name | land | live | value | confidential | comname | lat | long | state_full | council |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1996 | ADDISON | ME | clam, quahog, ocean | 26418 | 217965 | 98152 | NA | ocean clam, quahog | 44.61882 | -67.74445 | Maine | New England |
| 1996 | ADDISON | ME | lobster, american | 355359 | 355359 | 1178910 | NA | american lobster | 44.61882 | -67.74445 | Maine | New England |
| 1996 | ADDISON | ME | scallop, sea | 9704 | 80837 | 66560 | NA | sea scallop | 44.61882 | -67.74445 | Maine | New England |
How it works: Confidential landings data was received along with the vessel trip reports as an Excel spreadsheet. The first 9 lines contained metadata as provided by GARFO. As such, this function uses readxl to read in the file and requires a skip argument to skip the first few rows that contain metadata. SwimmeR is used to clean the species names and tidygeocoder to geocode the ports associated with landings.
Marine Recreational Information Program
Directed trips
mrip_directed_trips <- pull_mrip_directed_trips()
| year | wave | st | sub_reg | mode_fx | area_x | species | Trips | SE | PSE | state | mode | area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018 | 1 | 10 | 5 | 3 | 1 | striped bass | 23379.88 | 8886.435 | 38.0 | DE | Shore | State Territorial Seas (Ocean<=3 mi excluding Inland) |
| 2018 | 1 | 10 | 5 | 3 | 5 | striped bass | 169377.67 | 40817.095 | 24.1 | DE | Shore | Inland |
| 2018 | 1 | 10 | 5 | 7 | 1 | striped bass | 44944.57 | 38431.208 | 85.5 | DE | Private & Rental | State Territorial Seas (Ocean<=3 mi excluding Inland) |
How it works: The function to pull in and clean the MRIP directed trips is arguably the most complex of the data cleaning functions. The data is hosted online and is downloadable as a ZIP file for individual years. This function pulls in the zipped files, unzips them and saves them locally in intermediate files, which are required for the trip estimation code. Once the data has been pulled in and saved to the intermediary files, the trip estimates are calculated using Gary Nelson’s MRIP directed trip estimate R template. This code estimates the number of directed trips by species, wave, domain, trip type and year.
For this analysis, we used all waves, primary target trip type, and all states along the east coast, and calculated estimates for Atlantic croaker, Atlantic mackerel, black sea bass, blueline tilefish, bluefish, gray triggerfish, king mackerel, Spanish mackerel, striped bass, summer flounder, scup, spiny dogfish, goosefish (monkfish) and tilefish (golden tilefish).
Catch estimates
mrip_catch <- pull_mrip_catch()
| estimate_status | year | coast | region | state | common_name | sci_name | harvest_a_b1_numbers | pse_harvest_a_b1_numbers | harvest_a_b1_weight_lb | pse_harvest_a_b1_weight | released_alive_b2_numbers | released_alive_b2_lower_95_percent_confidence_limit | released_alive_b2_upper_95_percent_confidence_limit | total_catch_a_b1_b2_numbers | total_catch_a_b1_b2_lower_95_percent_confidence_limit | total_catch_a_b1_b2_upper_95_percent_confidence_limit | total_catch_pse | comname |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Final | 2024 | Atlantic Coast | Mid Atlantic | Delaware | rudderfish, banded | Seriola zonata | 0 | 0.0 | 0 | 0.0 | 23 | 0 | 56 | 23 | 0 | 56 | 73.9 | banded rudderfish |
| Final | 2024 | Atlantic Coast | Mid Atlantic | Delaware | shark, dogfish, spiny | Squalus acanthias | 0 | 0.0 | 0 | 0.0 | 35430 | 6611 | 64249 | 35430 | 6611 | 64249 | 41.5 | spiny dogfish |
| Final | 2024 | Atlantic Coast | Mid Atlantic | Delaware | bass, black sea | Centropristis striata | 105763 | 28.8 | 130995 | 28.1 | 1008441 | 290955 | 1725926 | 1114203 | 395720 | 1832686 | 32.9 | black sea bass |
How it works: Catch estimates are publicly available on the ACCSP Data Warehouse and are downloaded as a CSV file. This function reads in the CSV and cleans the species names to be uniform with other data sets.
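One simple way to make names such as "bass, black sea" uniform is to move the descriptor after the comma to the front. The sketch below only handles names with a single comma and leaves everything else unchanged. It is an illustration with a hypothetical helper, reorder_species_name(), not the cleaning used by pull_mrip_catch(); multi-part names such as "shark, dogfish, spiny" would need a lookup table or similar.

```r
# Sketch only: reorder_species_name() is a hypothetical helper.
reorder_species_name <- function(x) {
  x <- tolower(trimws(x))
  parts <- strsplit(x, ",\\s*")
  vapply(seq_along(parts), function(i) {
    p <- parts[[i]]
    # "bass, black sea" becomes "black sea bass"; other shapes are returned as-is
    if (length(p) == 2) paste(p[2], p[1]) else x[i]
  }, character(1))
}

cleaned <- reorder_species_name(
  c("bass, black sea", "rudderfish, banded", "COD", "shark, dogfish, spiny")
)
```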
Plotting
Once the data has been pulled and saved to the local environment, each data set has a respective plot_() or map_() function. More about those functions here.