Mobilising marine biodiversity data: a new malacological dataset of Italian records (Mollusca)

Occurrence
Latest version published by Museo di Zoologia (MZUR) - Sapienza University of Rome on Dec 6, 2024

Download the latest version of this resource data as a Darwin Core Archive (DwC-A) or the resource metadata as EML or RTF:

Data as a DwC-A file: 44,096 records in English (1 MB). Update frequency: unknown
Metadata as an EML file: in English (24 KB)
Metadata as an RTF file: in English (14 KB)

Description

The location and palaeoceanographic history of the Mediterranean Sea make it a biodiversity hotspot, prompting extensive studies in this region. However, although the marine biodiversity of this area appears to be widely studied, a large amount of distributional data for Mediterranean taxa is still unpublished or scattered across various sources and formats, which severely limits its potential reuse. This is a particularly thorny issue for highly biodiverse and neglected taxa, such as invertebrates. Mobilising these frozen data through a process of standardisation and georeferencing could support biodiversity research and conservation. The aim of this work is to provide a standardised pipeline to integrate these dispersed data, focusing on the Italian waters of the Mediterranean Sea and using molluscs as target taxa. Data were gathered from two main sources: published literature and Natural History Collections. The harmonisation process involved three key steps: 1) terminology and structure standardisation, 2) taxonomy updating and 3) georeferencing. Our efforts yielded over 44,000 standardised records of mollusc species from Italian marine waters. These records comprise primary biodiversity data from newly digitised specimens owned by 11 different institutions and private collectors, as well as secondary biodiversity data extracted from 311 published studies.

Data Records

The data in this occurrence resource has been published as a Darwin Core Archive (DwC-A), which is a standardized format for sharing biodiversity data as a set of one or more data tables. The core data table contains 44,096 records.
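As an illustration of how the core table of a Darwin Core Archive can be read, the following Python sketch opens the archive, looks up the core file and its column layout in meta.xml, and returns the rows as dictionaries. It is not part of the published resource. The function names, the file name occurrence.txt and the demo record are assumptions made for the example, and the sketch assumes unquoted, delimiter-separated text (it ignores the fieldsEnclosedBy attribute).

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the meta.xml descriptor of a Darwin Core Archive.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"


def read_core_rows(archive_bytes):
    """Return the core table of a Darwin Core Archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        meta = ET.fromstring(zf.read("meta.xml"))
        core = meta.find(DWC_TEXT_NS + "core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
        # meta.xml writes a tab as the two characters backslash + t.
        delimiter = core.get("fieldsTerminatedBy", ",").replace("\\t", "\t")
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> short term name (last segment of the term URI).
        columns = {}
        id_element = core.find(DWC_TEXT_NS + "id")
        if id_element is not None:
            columns[int(id_element.get("index"))] = "id"
        for field in core.findall(DWC_TEXT_NS + "field"):
            if field.get("index") is not None:
                term = field.get("term").rsplit("/", 1)[-1]
                columns[int(field.get("index"))] = term
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))
    lines = text.splitlines()[skip:]
    reader = csv.reader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [
        {name: row[i] for i, name in columns.items() if i < len(row)}
        for row in reader
    ]


def build_demo_archive():
    """Build a tiny in-memory archive with one made-up record (illustrative only)."""
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">'
        '<core encoding="UTF-8" fieldsTerminatedBy="\\t" ignoreHeaderLines="1" '
        'rowType="http://rs.tdwg.org/dwc/terms/Occurrence">'
        "<files><location>occurrence.txt</location></files>"
        '<id index="0"/>'
        '<field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>'
        "</core></archive>"
    )
    data = "id\tscientificName\nocc-1\tPatella caerulea\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("meta.xml", meta)
        zf.writestr("occurrence.txt", data)
    return buffer.getvalue()


rows = read_core_rows(build_demo_archive())
```

For a downloaded archive, the bytes of the .zip file would be passed to read_core_rows in place of the demo archive.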

This IPT archives the data and thus serves as the data repository. The data and resource metadata are available for download in the downloads section. The versions table lists other versions of the resource that have been made publicly available and allows tracking changes made to the resource over time.

Versions

The table below shows only published versions of the resource that are publicly accessible.

How to cite

Researchers should cite this work as follows:

Giannini A (2024). Mobilising marine biodiversity data: a new malacological dataset of Italian records (Mollusca). Version 1.0. Museo di Zoologia (MZUR) - Sapienza University of Rome. Occurrence dataset. https://cloud.gbif.org/eca/resource?r=mzur_sap_zoo_01&v=1.0

Rights

Researchers should respect the following rights statement:

The publisher and rights holder of this work is Museo di Zoologia (MZUR) - Sapienza University of Rome. This work is licensed under a Creative Commons Attribution Non Commercial (CC-BY-NC 4.0) License.

GBIF Registration

This resource has been registered with GBIF, and assigned the following GBIF UUID: e0370da7-b32b-414c-8a61-1d86125075f3. Museo di Zoologia (MZUR) - Sapienza University of Rome publishes this resource, and is itself registered in GBIF as a data publisher endorsed by the Participant Node Managers Committee.

Keywords

Occurrence; Marine; Mollusca; Italy

Contacts

Arianna Giannini
  • Metadata Provider
  • Originator
  • Point Of Contact
  • PhD Student
Sapienza University of Rome
00185 Roma
RM
IT
Caterina Giovinazzo
  • Curator
Polo Museale Sapienza
00185 Roma
RM
IT

Geographic Coverage

The collected occurrences fall within the Italian Exclusive Economic Zone in the Mediterranean Sea.

Bounding Coordinates South West [35.35, 7.605], North East [45.784, 18.795]
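
A quick sanity check against these bounding coordinates could be sketched as follows in Python. The sketch assumes each pair is given as [latitude, longitude], which is consistent with the position of Italy; the function name is hypothetical. The box is only a coarse envelope, so a point inside it is not necessarily inside the Exclusive Economic Zone.

```python
# Bounding coordinates from the metadata, read as (latitude, longitude).
# The ordering is an assumption, not stated explicitly in the metadata.
SOUTH_WEST = (35.35, 7.605)
NORTH_EAST = (45.784, 18.795)


def in_bounding_box(lat, lon, sw=SOUTH_WEST, ne=NORTH_EAST):
    """Return True if a WGS84 decimal-degree point lies inside the box."""
    return sw[0] <= lat <= ne[0] and sw[1] <= lon <= ne[1]


# A point off the central Tyrrhenian coast lies inside the box; Paris does not.
inside = in_bounding_box(41.9, 12.5)
outside = in_bounding_box(48.85, 2.35)
```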

Taxonomic Coverage

The dataset includes 44,096 occurrences of 1,513 Italian marine mollusc species.

Class Bivalvia, Monoplacophora, Gastropoda, Scaphopoda, Polyplacophora, Cephalopoda
Order Nudibranchia, Cardiida, Ellobiida, Pectinida, Gadilida, Neopilinida, Chitonida, Carditida, Aplysiida, Lepidopleurida, Sepiida, Dentaliida, Oegopsida, Callochitonida, Myida, Solemyida, Lepetellida, Systellommatophora, Cocculinida, Caenogastropoda incertae sedis, Arcida, Mytilida, Trochida, Nuculanida, Siphonariida, Myopsida, Ostreida, Neogastropoda, Cycloneritida, Umbraculida, Limida, Cephalaspidea, Galeommatida, Venerida, Littorinimorpha, Pleurobranchida, Seguenziida, Runcinida, Adapedonta, Pteropoda, Gastrochaenida, Nuculida, Lucinida, Octopoda

Project Data


Title THE NATIONAL CHECKLIST OF ITALIAN FAUNA - DEVELOPMENT OF A MODERN DATABASE

The personnel involved in the project:

Marco Oliverio

Sampling Methods

Data were gathered from two main sources: literature and Natural History Collections (NHCs). To collect literature data, a comprehensive search was performed on the public databases Scopus and Web of Science. In addition, we searched journals specialising in Mediterranean marine fauna, namely Iberus and all volumes of both journals of the Italian Society of Malacology (Società Italiana di Malacologia, SIM): Bollettino Malacologico and Alleryana. Before the publication of the Checklist of the Italian Fauna, no unified standard existed for Italian molluscan taxonomy and nomenclature, and verifying the accuracy of identifications reported in the literature would have been difficult without directly checking the actual specimens. The literature search was therefore restricted to publications issued after the first edition of the Checklist of the Italian Fauna. Species distribution information published in various formats (e.g. data tables in supplementary materials or within the paper, species lists, statements reporting the occurrence of a species) was considered as potential raw data. Papers whose data were already published in public databases were excluded. To avoid collecting the same record several times, only papers with new data were considered (i.e. new data derived from dedicated sampling, or previously collected data published for the first time). Data from NHCs were collected by direct request to private collectors and institutions. From both sources, records were included in the dataset if at least the occurrence locality and a taxonomic identification at the genus level (or more specific) were stated.
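
The inclusion rule stated above (a locality plus an identification at genus level or finer) can be expressed as a simple predicate. The Python sketch below is illustrative only: it assumes records are dictionaries keyed by the Darwin Core terms locality and taxonRank, and the set of accepted ranks and the function name are assumptions, not taken from the dataset.

```python
# Ranks treated as "genus level or more specific" (an assumption for this sketch).
GENUS_OR_FINER = {"genus", "subgenus", "species", "subspecies", "variety", "form"}


def meets_inclusion_criteria(record):
    """Return True if the record states a locality and a genus-or-finer rank."""
    locality = (record.get("locality") or "").strip()
    rank = (record.get("taxonRank") or "").strip().lower()
    return bool(locality) and rank in GENUS_OR_FINER


# Made-up records used only to exercise the predicate.
candidates = [
    {"locality": "Gulf of Naples", "taxonRank": "species"},
    {"locality": "Gulf of Naples", "taxonRank": "family"},
    {"locality": "", "taxonRank": "genus"},
]
included = [r for r in candidates if meets_inclusion_criteria(r)]
```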

Study Extent The present work aims to collect the distributional data of marine mollusc species reported in Italy and make them usable as point occurrences, by integrating, through harmonisation and georeferencing, both primary biodiversity data (i.e. newly digitised specimens from public and private Natural History Collections) and secondary biodiversity data (i.e. non-databased spatial information on species reported in publicly accessible papers).

Method step description:

  1. Data were first merged and formatted according to a Darwin Core scheme, using the Biodiversity Data Cleaning toolkit package in R.
  2. With the same package, a first filter was applied to remove duplicates and records lacking essential information (i.e. identification or locality/coordinates). The data were then manually filtered to identify and remove records that were out of scope (i.e. occurrences outside the Italian Marine Exclusive Economic Zone, fossils, non-marine species), too vague (i.e. broad locality, specimens identified only above the genus level), or dubious (dubious locality, ambiguous and/or unclear identification).
  3. Taxonomy was aligned with that of the World Register of Marine Species (WoRMS Editorial Board 2024) using the LifeWatch taxon-match web service. The WoRMS Life Science Identifiers were also extracted for each valid scientific name, to keep track as far as possible of taxonomic revisions, which are quite common in marine molluscs, especially as a result of molecular evidence.
  4. The remaining dubious taxonomy that was not automatically validated was checked manually and then submitted to experts, which resulted in the removal of further records with dubious identification.
  5. Open Nomenclature qualifiers were used to record uncertain and provisional taxonomic identifications.
  6. Records were then classified into 7 groups based on the type of geographic information they carried, in order to georeference each group by the most appropriate method. Georeferencing followed the point-radius method, using the GEOLocate web-based collaborative client and QGIS. Each final processed record has associated coordinates expressed in WGS84 decimal degrees and an uncertainty measure in metres. The GEBCO_2022 global terrain model was used to georeference depth data correctly.
  7. During georeferencing, further records falling outside the study boundaries were identified and removed. We then excluded records with an uncertainty radius >5000 m.
  8. As raw temporal data from NHCs arrived in various formats, this information was handled with the R package lubridate and converted to ISO 8601 format. The temporal information collected spans various degrees of resolution, from exact dates to ranges between years. We decided to also maintain records without temporal information. No hourly data were collected.
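
The last two steps can be illustrated with a short Python sketch. It is not the workflow used for the dataset, which relied on R and lubridate; it swaps in the Python standard library to show the same two ideas: dropping records whose uncertainty radius exceeds 5000 m, and converting dates of varying resolution to ISO 8601. The function names, the accepted input formats (day-first dates, month/year, single years, year ranges) and the treatment of a missing uncertainty value are assumptions made for the example.

```python
import re
from datetime import datetime

MAX_UNCERTAINTY_M = 5000  # threshold stated in the method description

# Day-first layouts assumed for this sketch; real collection data may differ.
DAY_FIRST_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y")


def within_uncertainty_limit(record, limit=MAX_UNCERTAINTY_M):
    """Keep records whose uncertainty radius is at most `limit` metres."""
    value = record.get("coordinateUncertaintyInMeters")
    if value in (None, ""):
        return False  # assumption: a missing radius does not pass the filter
    return float(value) <= limit


def to_iso8601(raw):
    """Convert a raw date string to ISO 8601, preserving its resolution."""
    text = (raw or "").strip()
    if not text:
        return ""  # records without temporal information are kept as-is
    if re.fullmatch(r"\d{4}", text):
        return text  # year only
    match = re.fullmatch(r"(\d{4})\s*[-/]\s*(\d{4})", text)
    if match:
        return f"{match.group(1)}/{match.group(2)}"  # range between years
    match = re.fullmatch(r"(\d{1,2})/(\d{4})", text)
    if match and 1 <= int(match.group(1)) <= 12:
        return f"{match.group(2)}-{int(match.group(1)):02d}"  # month and year
    for fmt in DAY_FIRST_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {raw!r}")


# Made-up values used only to exercise the two functions.
kept = within_uncertainty_limit({"coordinateUncertaintyInMeters": "1200"})
dates = [to_iso8601(d) for d in ("12/03/1998", "03/1998", "1998", "1998-2003", "")]
```

Raising an error on unrecognised formats, rather than guessing, keeps ambiguous dates visible for manual review.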

Additional Metadata