Metadata standards

From D4Science Wiki

This page collects relevant information and discussions for identifying the sets of metadata that could be supported by iMarine. A specific subset, the Business metadata, should be identified to support the implementation of the EA-CoP Data Access and Sharing Policies.

The iMarine web infrastructure can fully or partially support different types of metadata, according to requirements. These are standards that are widely used within the EA-CoP and are indispensable for making optimal use of the iMarine infrastructure and its many data management and processing capabilities. They are also standards for which an Open Source community is committed to supporting further development with software components.


Dublin Core (DC)

The Dublin Core® Metadata Initiative, or "DCMI", is an open organization supporting innovation in metadata design and best practices across the metadata ecology.
See more at http://dublincore.org

Textual documents, such as articles, journals and papers, are described using Dublin Core.

Dublin Core and Business metadata

The EA-CoP Data Access and Sharing Policies document contains a proposal for associating all shared datasets in iMarine with a descriptive and standard set of metadata, the business metadata.

  • Which should be the most appropriate format?
  • Which standards can iMarine support?


The Dublin Core vocabulary is a potential candidate for supporting the Business metadata in iMarine.

An initial proposal for the possible use of DC elements follows:

  • Owner* = dcterms:rightsHolder
  • Context* = dcmitype:Collection and/or dcmitype:Dataset Authorship (Context, or Subject area, gives the scope in which the content is positioned. As a data collection, it indicates which aggregation of resources the content belongs to.)
  • Author* = dc:creator
  • Title* = dc:title
  • Publisher = dc:publisher
  • Creation date = dcterms:created
  • Last update date* = dc:date
  • Expiry date = dcterms:valid
  • Contact
  • Copyright licenses = dc:rights and/or dcterms:accessRights and/or dcterms:RightsStatement and/or dcterms:license, etc. (rights management, Creative Commons license type or other licenses)
  • Content description = dc:description (e.g. data aggregation level)
  • Spatial Scale = dcterms:spatial (spatial characteristics of the resource, i.e. the geographic level at which the content applies)
  • Coverage = dcterms:coverage and/or dc:subject (geographical coverage, topic coverage, etc.)
  • Language* = dc:language
  • Custom bibliographic citation = dcterms:bibliographicCitation
  • Media type = dcterms:MediaType
  • Identifier = dc:identifier (e.g. URL of the resource)

Elements marked with "*" are mandatory.
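To make the proposal concrete, here is a minimal sketch that serializes a Business metadata record to Dublin Core XML and rejects records that lack a mandatory element. Several things are assumptions and are not part of the proposal: the snake_case field names, the plain `metadata` wrapper element, and the use of `dc:type` with a DCMI Type value ("Collection" or "Dataset") to express Context. Only a subset of the mapped elements is covered.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)

# Hypothetical field names -> (namespace, property), following the proposal above.
# Assumption: Context is written as dc:type holding a DCMI Type value.
MAPPING = {
    "owner": (DCTERMS, "rightsHolder"),
    "context": (DC, "type"),
    "author": (DC, "creator"),
    "title": (DC, "title"),
    "publisher": (DC, "publisher"),
    "creation_date": (DCTERMS, "created"),
    "last_update_date": (DC, "date"),
    "expiry_date": (DCTERMS, "valid"),
    "description": (DC, "description"),
    "spatial_scale": (DCTERMS, "spatial"),
    "language": (DC, "language"),
    "citation": (DCTERMS, "bibliographicCitation"),
    "identifier": (DC, "identifier"),
}
# Elements marked with "*" in the proposal.
MANDATORY = {"owner", "context", "author", "title", "last_update_date", "language"}


def to_dublin_core(record: dict) -> str:
    """Serialize a business metadata record; raise if a mandatory field is empty."""
    missing = MANDATORY - {k for k, v in record.items() if v}
    if missing:
        raise ValueError(f"missing mandatory business metadata: {sorted(missing)}")
    root = ET.Element("metadata")  # hypothetical wrapper element
    for field, value in record.items():
        if field not in MAPPING:
            raise KeyError(f"no Dublin Core mapping for {field!r}")
        ns, prop = MAPPING[field]
        ET.SubElement(root, f"{{{ns}}}{prop}").text = value
    return ET.tostring(root, encoding="unicode")


record = {
    "owner": "Example Agency",
    "context": "Dataset",
    "author": "J. Doe",
    "title": "Hypothetical catch statistics",
    "last_update_date": "2014-06-30",
    "language": "en",
}
print(to_dublin_core(record))
```

Because the mandatory check runs before any output is produced, an incomplete record fails early. This is the behaviour the Data Access and Sharing Policies would need in order to guarantee that every shared dataset carries its Business metadata.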

Darwin Core (DwC)

Darwin Core (http://rs.tdwg.org/dwc/) metadata are supported by the iMarine web infrastructure for handling information about taxa, their occurrence in nature as documented by observations, specimens and samples, and related information. As per the DwC definition, ".. It is meant to provide a stable standard reference for sharing information on biological diversity. As a glossary of terms, the Darwin Core is meant to provide stable semantic definitions with the goal of being maximally reusable in a variety of contexts...".

Biological and ecological data are made available in Darwin Core, a de facto standard for representing data and metadata. Taxonomic information is made available in Darwin Core Archive (DwC-A) format, a de facto standard for delivering archives of data expressed in DwC.
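A DwC-A bundles delimited data files with a `meta.xml` descriptor that maps each column to a Darwin Core term URI. The sketch below writes a minimal archive containing a single Occurrence core file. The chosen column set, the file name `occurrence.txt` and the helper names are illustrative assumptions. A real archive would normally also carry an EML metadata document, which is omitted here.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
TEXT = "http://rs.tdwg.org/dwc/text/"

# Illustrative choice of Darwin Core terms; column 0 doubles as the record id.
TERMS = ["occurrenceID", "scientificName", "basisOfRecord", "eventDate",
         "decimalLatitude", "decimalLongitude"]


def meta_xml() -> str:
    """Build the meta.xml descriptor mapping each column index to a DwC term."""
    archive = ET.Element("archive", {"xmlns": TEXT})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # literal \t, as written in meta.xml
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": '"',       # matches csv's default quoting below
        "ignoreHeaderLines": "1",
        "rowType": DWC + "Occurrence",
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = "occurrence.txt"
    ET.SubElement(core, "id", {"index": "0"})
    for i, term in enumerate(TERMS):
        ET.SubElement(core, "field", {"index": str(i), "term": DWC + term})
    return ET.tostring(archive, encoding="unicode")


def write_dwca(rows, target) -> None:
    """Write a zip archive (path or binary file object) with meta.xml + core file."""
    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(TERMS)  # header line, skipped via ignoreHeaderLines
    for row in rows:
        writer.writerow([row.get(t, "") for t in TERMS])
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", meta_xml())
        zf.writestr("occurrence.txt", data.getvalue())


buf = io.BytesIO()
write_dwca([{
    "occurrenceID": "occ-1", "scientificName": "Thunnus albacares",
    "basisOfRecord": "HumanObservation", "eventDate": "2013-05-04",
    "decimalLatitude": "-5.2", "decimalLongitude": "55.1",
}], buf)
```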

Geographic information (ISO/TC211, OGC)

Geo-referenced data are described using ISO 19115/19119 and made available through OGC protocols such as WMS (Web Map Service), WCS (Web Coverage Service) and WFS (Web Feature Service). This potentially targets all data with a geographic dimension, including biological and ecological data, which can also be enriched with ISO 19115/19119 geographic metadata when they include a geographic coverage.
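Each of these OGC services exposes its service metadata through a GetCapabilities request. As a sketch, the helper below builds such request URLs for a hypothetical endpoint. It assumes the server supports WMS 1.3.0, WFS 2.0.0 and WCS 2.0.1. WMS takes an optional `VERSION` parameter, while the OWS Common based WFS 2.0 and WCS 2.0 negotiate versions through `ACCEPTVERSIONS`.

```python
from urllib.parse import urlencode

# Assumed versions; (parameter name, version) differs between WMS and OWS Common services.
VERSION_PARAMS = {
    "WMS": ("VERSION", "1.3.0"),
    "WFS": ("ACCEPTVERSIONS", "2.0.0"),
    "WCS": ("ACCEPTVERSIONS", "2.0.1"),
}


def get_capabilities_url(endpoint: str, service: str) -> str:
    """Return a key-value-pair GetCapabilities URL for an OGC web service."""
    service = service.upper()
    if service not in VERSION_PARAMS:
        raise ValueError(f"unsupported OGC service: {service}")
    key, version = VERSION_PARAMS[service]
    params = {"SERVICE": service, "REQUEST": "GetCapabilities", key: version}
    sep = "&" if "?" in endpoint else "?"  # keep any existing query string
    return endpoint + sep + urlencode(params)


for svc in ("wms", "wfs", "wcs"):
    print(get_capabilities_url("https://example.org/geoserver/ows", svc))
```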

ISO standards

Three main ISO/TC 211 abstract standards (approved or draft) for geographic metadata can be listed:

  • ISO 19115:2003 / 19115-1:2014 - Dataset Metadata
  • ISO 19119:2005 - Service metadata
  • ISO 19110:2005 - Feature Cataloging

Abstract specifications

ISO 19115:2003 - Dataset Metadata

The ISO 19115:2003 standard is the metadata standard approved by OGC and is composed of two parts:

  • Part 1 (ISO 19115-1): base ISO metadata standard for the description of geographic information and services. ISO recently released a revision of this standard, ISO 19115-1:2014, which needs to be taken into consideration.
  • Part 2 (ISO 19115-2): extension for imagery and gridded data & instrument-based data collection. These extensions also include improved descriptions of lineage and processing information.

ISO 19115:2003 is structured as a set of metadata packages:

  • Entity Set Information: main metadata package that contains information such as identifiers (file, parent), characterSet, language, hierarchical level, main contact (party in charge of the metadata), metadata standard name/version
  • Identification Information: package that describes the resource documented by the metadata. Part of this package is specialized for data or service identification (ISO 19119). Key elements are: title, date, abstract, purpose, thesaurus and associated keywords, data use and access limitations and constraints, extent (geographic, temporal, vertical)
  • Constraints Information: package required for managing rights to information including restrictions applied to the resource access and/or use. Can apply to both Entity Set (metadata) and Identification (resource)
  • Data Quality Information: package required to give information on the quality of the resource.
  • Maintenance Information: package that describes the maintenance and update procedure applied to the resource. Can apply to both Entity Set (metadata) and Identification (resource)
  • Spatial Representation Information: package that aims to describe the spatial representation of the geographic information, either for vector or grid-based representation
  • Reference System Information: package that describes the spatial and temporal reference systems used in the described resource
  • Content Information: package that describes the content of the resource, either the feature catalogue (for vector data) or the coverage (for grid data). Related to the ISO 19110 standard specification.
  • Portrayal catalogue information: information about the portrayal catalogue(s) used to display the resource
  • Distribution Information: package required to access the resource. This package especially aims to distribute the resources by pointing to the OGC Web-Service resource.
  • Metadata extension Information: package to handle extended metadata elements
  • Application schema information: package defining the application schema used
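In the ISO 19139 XML encoding, each of these packages appears as a child property of the root `gmd:MD_Metadata` element, which itself plays the role of the Entity Set Information package. The sketch below reports which packages a record contains. The package-to-element table reflects our reading of the ISO 19139 schema, and the sample record is a stripped-down, non-valid illustration. Note that `metadataConstraints` and `metadataMaintenance` apply to the metadata itself. Constraints and maintenance applying to the resource sit inside `identificationInfo`.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"

# ISO 19115 package -> child element of gmd:MD_Metadata in ISO 19139.
PACKAGES = {
    "Identification Information": "identificationInfo",
    "Constraints Information": "metadataConstraints",      # metadata-level only
    "Data Quality Information": "dataQualityInfo",
    "Maintenance Information": "metadataMaintenance",      # metadata-level only
    "Spatial Representation Information": "spatialRepresentationInfo",
    "Reference System Information": "referenceSystemInfo",
    "Content Information": "contentInfo",
    "Portrayal Catalogue Information": "portrayalCatalogueInfo",
    "Distribution Information": "distributionInfo",
    "Metadata Extension Information": "metadataExtensionInfo",
    "Application Schema Information": "applicationSchemaInfo",
}


def packages_present(xml_text: str) -> list[str]:
    """List the ISO 19115 packages found directly under gmd:MD_Metadata."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{GMD}}}MD_Metadata":
        raise ValueError("not an ISO 19139 gmd:MD_Metadata record")
    return [name for name, tag in PACKAGES.items()
            if root.find(f"{{{GMD}}}{tag}") is not None]


SAMPLE = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}">
  <gmd:identificationInfo/>
  <gmd:distributionInfo/>
</gmd:MD_Metadata>"""

print(packages_present(SAMPLE))  # ['Identification Information', 'Distribution Information']
```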

ISO 19119:2005 - Service metadata

The ISO 19119 standard describes web services, such as the OGC Web Services (OWS), and is usable in association with the ISO 19115 standard.

ISO 19110:2005 - Feature Cataloging

The ISO 19110 standard addresses the description of feature types and coverages, and is usable in association with the ISO 19115 standard.

Implementations

The ISO 19139 standard defines the XML schema implementation of the above abstract metadata standards.
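To show what the ISO 19139 encoding looks like in practice, here is a sketch that builds a skeleton record with a file identifier, a date stamp, a title and an abstract. Free text is wrapped in `gco:CharacterString`. The helper names are hypothetical. The output is deliberately incomplete and would not validate against the ISO 19139 schema, because it omits mandatory elements such as `gmd:contact`, the citation date and the dataset language.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def char_string(parent, tag, text):
    """Add <gmd:tag><gco:CharacterString>text</gco:CharacterString></gmd:tag>."""
    el = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(el, f"{{{GCO}}}CharacterString").text = text
    return el


def minimal_record(file_id, title, abstract, date_stamp):
    """Skeleton gmd:MD_Metadata record (not schema-valid, see above)."""
    md = ET.Element(f"{{{GMD}}}MD_Metadata")
    char_string(md, "fileIdentifier", file_id)
    stamp = ET.SubElement(md, f"{{{GMD}}}dateStamp")
    ET.SubElement(stamp, f"{{{GCO}}}Date").text = date_stamp
    ident = ET.SubElement(md, f"{{{GMD}}}identificationInfo")
    data_id = ET.SubElement(ident, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(data_id, f"{{{GMD}}}citation")
    ci = ET.SubElement(citation, f"{{{GMD}}}CI_Citation")
    char_string(ci, "title", title)
    char_string(data_id, "abstract", abstract)
    return ET.tostring(md, encoding="unicode")


print(minimal_record("rec-1", "Hypothetical layer", "Test abstract", "2014-06-30"))
```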

OGC standards

The OGC Web Service GetCapabilities response is widely used as a reference for service metadata and the description of layers, because it can provide a lot of information when sufficiently enriched. One key element worth mentioning is the service/dataset relationship implemented through the MetadataURL links (analogous to the ISO 19119 service "operatesOn" relationship).

In addition, the OGC AuthorityURL and Identifier elements of the GetCapabilities response are assets, as they provide a place for handling code/identifier mappings from different authoritative information systems. For example, GetCapabilities can hold a mapping between a metadataURL (from a metadata catalogue) and a coded entity (from a Linked Open Data source), which can then be used to enrich Linked Open Data with geographic references.
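The sketch below parses a WMS 1.3.0 capabilities fragment and, for each named layer, extracts the MetadataURL links (the service/dataset relationship) and the Identifier codes resolved against their AuthorityURL. The capabilities document, authority name, code and URLs are all hypothetical. For simplicity, AuthorityURL declarations are collected document-wide rather than following the WMS layer inheritance rules exactly.

```python
import xml.etree.ElementTree as ET

WMS = "http://www.opengis.net/wms"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"wms": WMS}
HREF = f"{{{XLINK}}}href"


def layer_links(capabilities_xml: str) -> dict:
    """Map each named layer to its MetadataURL links and (code, authority URL) pairs."""
    root = ET.fromstring(capabilities_xml)
    # Simplification: gather every AuthorityURL in the document, keyed by name.
    authorities = {a.get("name"): a.find("wms:OnlineResource", NS).get(HREF)
                   for a in root.iter(f"{{{WMS}}}AuthorityURL")}
    links = {}
    for layer in root.iter(f"{{{WMS}}}Layer"):
        name = layer.findtext("wms:Name", namespaces=NS)
        if name is None:
            continue  # unnamed layers only group other layers
        links[name] = {
            "metadata_urls": [m.find("wms:OnlineResource", NS).get(HREF)
                              for m in layer.findall("wms:MetadataURL", NS)],
            "identifiers": [(i.text, authorities.get(i.get("authority")))
                            for i in layer.findall("wms:Identifier", NS)],
        }
    return links


SAMPLE_CAPABILITIES = """<WMS_Capabilities xmlns="http://www.opengis.net/wms"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="1.3.0">
  <Capability>
    <Layer>
      <Title>Root</Title>
      <AuthorityURL name="SPECIES_CODES">
        <OnlineResource xlink:href="https://example.org/species-codes"/>
      </AuthorityURL>
      <Layer queryable="1">
        <Name>species_distribution_yft</Name>
        <Title>Hypothetical species distribution</Title>
        <Identifier authority="SPECIES_CODES">YFT</Identifier>
        <MetadataURL type="ISO19115:2003">
          <Format>text/xml</Format>
          <OnlineResource xlink:type="simple" xlink:href="https://example.org/csw?id=1234"/>
        </MetadataURL>
      </Layer>
    </Layer>
  </Capability>
</WMS_Capabilities>"""

print(layer_links(SAMPLE_CAPABILITIES))
```

The resulting (code, authority) pairs together with the metadata URL are the kind of mapping that could link a catalogue record to a Linked Open Data entity.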

Mapping with other metadata standards

SDMX

Statistical data are made available in SDMX, which includes specific metadata for agencies, code lists and datasets.

  • What can SDMX hold in terms of metadata?
  • Which fields of SDMX can be mapped to Business Metadata and how?

FLUX

Resources