IMIS


An automated workflow to assess completeness and curate GenBank for environmental DNA metabarcoding: the marine fish assemblage as case study
Claver, C.; Canals, O.; de Amézaga, L.G.; Mendibil, I.; Rodriguez-Ezpeleta, N. (2023). An automated workflow to assess completeness and curate GenBank for environmental DNA metabarcoding: the marine fish assemblage as case study. Environmental DNA 5(4): 634-647. https://dx.doi.org/10.1002/edn3.433
In: Environmental DNA. John Wiley & Sons: Hoboken. e-ISSN 2637-4943
Peer reviewed article  

Keyword
    Marine/Coastal

Authors
  • Claver, C.
  • Canals, O.
  • de Amézaga, L.G.
  • Mendibil, I.
  • Rodriguez-Ezpeleta, N.

Abstract
    To successfully implement environmental DNA-based (eDNA) diversity monitoring, the completeness and accuracy of reference databases used for taxonomic assignment of eDNA sequences are among the challenges to be tackled. Here, we have developed a workflow that evaluates the current status of GenBank for marine fishes. For a given combination of species and barcodes, a gap analysis is performed and potentially erroneous sequences are identified. Our gap analysis based on the four most used genes (cytochrome c oxidase subunit 1, 12S rRNA, 16S rRNA, and cytochrome b) for fish eDNA metabarcoding found that COI, the universal choice for metazoans, is the gene covering the highest number of Northeast Atlantic marine fishes (70%), while 12S rRNA, the preferred region for fish-targeting studies, only covers about 50% of the species. The presence of too close and too distant barcode sequences as expected by their taxonomic classification confirms the existence of erroneous sequences in GenBank that our workflow can detect and eliminate. Comparing taxonomic assignments of real marine eDNA samples with raw and clean reference databases for the most used 12S rRNA barcodes (teleo and MiFish), we confirmed that both barcodes perform differently and demonstrated that the application of the database cleaning workflow can result in drastic changes in community composition. Besides providing a tool for reference database curation, this study confirms the need to increase 12S rRNA reference sequences for European marine fishes and evidences the dangers of taxonomic assignments by directly querying GenBank.
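
The gap analysis mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' workflow: the function name `gap_analysis`, the input layout (one set of species with at least one reference sequence per barcode), and the toy species and coverage sets below are assumptions made for illustration, and they do not reflect GenBank content or the study's results.

```python
from typing import Dict, Iterable, List, Set, Tuple


def gap_analysis(
    target_species: Iterable[str],
    references: Dict[str, Set[str]],
) -> Dict[str, Tuple[float, List[str]]]:
    """For each barcode, return (coverage fraction, sorted missing species).

    `references` maps a barcode name to the set of species that have at
    least one reference sequence for that barcode (hypothetical layout).
    """
    targets = set(target_species)
    report = {}
    for barcode, covered in references.items():
        # Only species on the target checklist count towards coverage.
        present = targets & set(covered)
        missing = sorted(targets - present)
        coverage = len(present) / len(targets) if targets else 0.0
        report[barcode] = (coverage, missing)
    return report


# Toy checklist and toy reference sets, invented for this example only.
checklist = [
    "Gadus morhua",
    "Solea solea",
    "Sardina pilchardus",
    "Engraulis encrasicolus",
]
toy_references = {
    "COI": {"Gadus morhua", "Solea solea", "Sardina pilchardus", "Thunnus albacares"},
    "12S": {"Gadus morhua", "Sardina pilchardus"},
}

report = gap_analysis(checklist, toy_references)
for barcode, (coverage, missing) in report.items():
    print(f"{barcode}: {coverage:.0%} covered, missing: {missing}")
```

Species outside the checklist (here the toy entry for Thunnus albacares) are ignored, so the coverage fraction always refers to the target assemblage rather than to everything present in the reference set.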

All data in the Integrated Marine Information System (IMIS) is subject to the VLIZ privacy policy.