--- title: "Retrieve Taxonomic Data From WoRMS" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Retrieve Taxonomic Data From WoRMS} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## WoRMS The World Register of Marine Species (WoRMS) is a comprehensive database providing authoritative lists of marine organism names, managed by taxonomic experts. It combines data from the Aphia database and other sources like AlgaeBase and FishBase, offering species names, higher classifications, and additional data. WoRMS is continuously updated and maintained by taxonomists. In this tutorial, we source the R package [`worrms`](https://CRAN.R-project.org/package=worrms) to access WoRMS data for our function. Please note that the authors of `SHARK4R` are not affiliated with WoRMS. ## Getting Started #### Installation You can install the latest version of `SHARK4R` from CRAN using: ```{r, eval=FALSE} install.packages("SHARK4R") ``` Load the `SHARK4R`, `dplyr` and `ggplot2` libraries: ```{r, eval=FALSE} library(SHARK4R) library(dplyr) library(ggplot2) ``` ```{r, include=FALSE} suppressPackageStartupMessages({ library(SHARK4R) library(dplyr) library(ggplot2) }) ``` ## Retrieve Data Using SHARK4R ### Retrieve Phytoplankton Data From SHARK Phytoplankton data, including scientific names and AphiaIDs, are downloaded from SHARK. To see more download options, please visit the [Retrieve Data From SHARK](https://sharksmhi.github.io/SHARK4R/articles/retrieve_shark_data.html) tutorial. ```{r} # Retrieve all phytoplankton data from April 2015 shark_data <- get_shark_data(fromYear = 2015, toYear = 2015, months = 4, dataTypes = "Phytoplankton", verbose = FALSE) ``` ### Match Taxa Names Taxon names can be matched with the WoRMS API to retrieve Aphia IDs and corresponding taxonomic information. The `match_worms_taxa()` function incorporates retry logic to handle temporary failures, ensuring that all names are processed successfully. ```{r} # Find taxa without Aphia ID no_aphia_id <- shark_data %>% filter(is.na(aphia_id)) # Randomly select taxa with missing aphia_id taxa_names <- sample(unique(no_aphia_id$scientific_name), size = 10, replace = TRUE) # Match taxa names with WoRMS worms_records <- match_worms_taxa(unique(taxa_names), fuzzy = TRUE, best_match_only = TRUE, marine_only = TRUE, verbose = FALSE) # Print result as tibble tibble(worms_records) ``` ### Get WoRMS records from AphiaID Taxonomic records can also be retrieved using Aphia IDs, employing the same retry and error-handling logic as the `match_worms_taxa()` function. ```{r} # Randomly select ten Aphia IDs aphia_ids <- sample(unique(shark_data$aphia_id), size = 10) # Remove NAs aphia_ids <- aphia_ids[!is.na(aphia_ids)] # Retrieve records worms_records <- get_worms_records(aphia_ids, verbose = FALSE) # Print result as tibble tibble(worms_records) ``` ### Get WoRMS Taxonomy SHARK sources taxonomic information from [Dyntaxa](https://artfakta.se/), which is reflected in columns starting with `taxon_xxxxx`. Equivalent columns based on WoRMS can be retrieved using the `add_worms_taxonomy()` function. ```{r} # Retrieve taxonomic table worms_taxonomy <- add_worms_taxonomy(aphia_ids, verbose = FALSE) # Print result as tibble tibble(worms_taxonomy) # Enrich data with data from WoRMS shark_data_with_worms <- shark_data %>% left_join(worms_taxonomy, by = "aphia_id") ``` ### Retrieve WoRMS Taxonomic Hierarchies To explore the full hierarchical taxonomy records of your Aphia IDs, you can use the `get_worms_taxonomy_tree()` function. This function retrieves records for the entire taxonomic tree from WoRMS, including parent-child relationships, and can optionally fetch all descendants (e.g. species) under a genus or known synonyms. ```{r} # Retrieve taxonomic tree worms_tree <- get_worms_taxonomy_tree( aphia_ids[1], # use first id only in this example add_descendants = FALSE, # only retrieve hierarchy for given AphiaIDs add_synonyms = FALSE, # do not retrieve synonyms verbose = FALSE # suppress progress messages ) # Print as tibble for easier viewing tibble(worms_tree) ``` ### Assign Phytoplankton Groups Phytoplankton data are often categorized into major groups such as Dinoflagellates, Diatoms, Cyanobacteria, and Others. This grouping can be achieved by referencing information from WoRMS and assigning taxa to these groups based on their taxonomic classification, as demonstrated in the example below. ```{r} # Subset data from one national monitoring station nat_stations <- shark_data %>% filter(station_name %in% c("BY5 BORNHOLMSDJ")) # Randomly select one sample from the nat_stations sample <- sample(unique(nat_stations$shark_sample_id_md5), 1) # Subset the random sample shark_data_subset <- shark_data %>% filter(shark_sample_id_md5 == sample) # Assign groups by providing both scientific name and Aphia ID plankton_groups <- assign_phytoplankton_group( scientific_names = shark_data_subset$scientific_name, aphia_ids = shark_data_subset$aphia_id, verbose = FALSE) # Print result tibble(distinct(plankton_groups)) # Add plankton groups to data and summarize abundance results plankton_group_sum <- shark_data_subset %>% mutate(plankton_group = plankton_groups$plankton_group) %>% filter(parameter == "Abundance") %>% group_by(plankton_group) %>% summarise(sum_plankton_groups = sum(value, na.rm = TRUE)) # Plot a pie chart ggplot(plankton_group_sum, aes(x = "", y = sum_plankton_groups, fill = plankton_group)) + geom_col(width = 1) + coord_polar(theta = "y") + labs( title = "Phytoplankton Groups", subtitle = paste(unique(shark_data_subset$station_name), unique(shark_data_subset$sample_date)), fill = "Plankton Group" ) + theme_void() + theme(plot.background = element_rect(fill = "white", color = NA)) ``` #### Assign Custom Phytoplankton Groups You can add custom plankton groups by using the `custom_groups` parameter, allowing flexibility to categorize plankton based on specific taxonomic criteria. Please note that the order of the list matters: taxa are assigned to the last matching group. For example: Mesodinium rubrum will be excluded from the Ciliates group because it appears after Ciliates in the list in the example below. ```{r} # Define custom plankton groups using a named list custom_groups <- list( "Cryptophytes" = list(class = "Cryptophyceae"), "Green Algae" = list(class = c("Trebouxiophyceae", "Chlorophyceae", "Pyramimonadophyceae"), phylum = "Chlorophyta"), "Ciliates" = list(phylum = "Ciliophora"), "Mesodinium rubrum" = list(scientific_name = "Mesodinium rubrum"), "Dinophysis" = list(genus = "Dinophysis") ) # Assign groups by providing scientific name only, and adding custom groups plankton_groups <- assign_phytoplankton_group( scientific_names = shark_data_subset$scientific_name, custom_groups = custom_groups, verbose = FALSE) # Add new plankton groups to data and summarize abundance results plankton_custom_group_sum <- shark_data_subset %>% mutate(plankton_group = plankton_groups$plankton_group) %>% filter(parameter == "Abundance") %>% group_by(plankton_group) %>% summarise(sum_plankton_groups = sum(value, na.rm = TRUE)) # Plot a new pie chart, including the custom groups ggplot(plankton_custom_group_sum, aes(x = "", y = sum_plankton_groups, fill = plankton_group)) + geom_col(width = 1) + coord_polar(theta = "y") + labs( title = "Phytoplankton Custom Groups", subtitle = paste(unique(shark_data_subset$station_name), unique(shark_data_subset$sample_date)), fill = "Plankton Group" ) + theme_void() + theme(plot.background = element_rect(fill = "white", color = NA)) ``` --- ## Citation ```{r, echo=FALSE} # Print citation citation("SHARK4R") ```