Difference between revisions of "Metadata standards"

From D4Science Wiki
Latest revision as of 12:06, 10 July 2014

This page collects information and discussions for identifying the sets of metadata that iMarine could support. A specific subset, the business metadata, should be identified to support the implementation of the EA-CoP Data Access and Sharing Policies.

The iMarine web infrastructure can fully or partially support different types of metadata according to requirements, i.e. whether the standard is widely used within the EA-CoP (Community of Practice) and indispensable for optimized use of the iMarine infrastructure and its many data management and processing capacities, and whether an open source community is committed to supporting the further development of the standard with software components.


Dublin Core (DC)

The Dublin Core® Metadata Initiative, or "DCMI", is an open organization supporting innovation in metadata design and best practices across the metadata ecology.
See more at http://dublincore.org

Textual documents, such as articles, journals, papers, etc. are described using Dublin Core.

Dublin Core and Business metadata

The EA-CoP Data Access and Sharing Policies document contains a proposal for associating all shared datasets in iMarine with a descriptive and standard set of metadata, the business metadata.

  • Which should be the most appropriate format?
  • Which standards can iMarine support?


The Dublin Core vocabulary is a potential candidate for supporting the Business metadata in iMarine.

Here follows an initial proposal for possible utilization of DC elements:

  • Owner* = dcterms:RightsHolder
  • Context* = dcterms:Collection and/or dcterms:Dataset Authorship (Context or subject area gives the scope in which a content is positioned. For a data collection, it indicates the aggregation of resources to which the content belongs.)
  • Author* = dc:Creator
  • Title* = dc:Title
  • Publisher = dc:Publisher
  • Creation date = dcterms:Created
  • Last update date* = dc:Date
  • Expiry date = dcterms:Valid
  • Contact
  • Copyright licenses = dc:Rights and/or dcterms:AccessRights and/or dcterms:RightsStatement and/or dcterms:License etc. (Rights management, Creative Commons License type or other licenses)
  • Content description = dc:Description (e.g. Data aggregation level)
  • Spatial Scale = dcterms:Spatial (Spatial characteristics of the resource. Geographic level at which the content is applied)
  • Coverage = dcterms:Coverage and/or dc:Subject (Geographical coverage, Topic coverage, etc.)
  • Language* = dc:Language
  • Custom bibliographic citation = dcterms:bibliographicCitation
  • Media type = dcterms:MediaType
  • Identifier = dc:Identifier (e.g. URL of the resource)

"*" Mandatory metadata

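The proposed mapping can be sketched in code as a simple lookup table with a check for the mandatory fields. This is only an illustration: the snake_case field keys and the `to_dublin_core` helper are hypothetical, only a subset of the proposed fields is shown, and the term names are copied as written in the proposal above.

```python
# Hypothetical field keys mapped to the Dublin Core terms named in the proposal.
BUSINESS_TO_DC = {
    "owner": "dcterms:RightsHolder",
    "context": "dcterms:Collection",
    "author": "dc:Creator",
    "title": "dc:Title",
    "publisher": "dc:Publisher",
    "last_update_date": "dc:Date",
    "language": "dc:Language",
    "identifier": "dc:Identifier",
}

# Fields marked with "*" in the proposal.
MANDATORY = {"owner", "context", "author", "title", "last_update_date", "language"}


def to_dublin_core(record):
    """Translate a business metadata record into a dict keyed by DC terms."""
    missing = sorted(MANDATORY - set(record))
    if missing:
        raise ValueError("missing mandatory business metadata: " + ", ".join(missing))
    # Fields without a mapping in this sketch are silently skipped.
    return {BUSINESS_TO_DC[key]: value for key, value in record.items() if key in BUSINESS_TO_DC}
```

A record that lacks any of the six mandatory fields is rejected before translation, which mirrors the distinction the proposal draws between mandatory and optional elements.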
Darwin Core (DwC)

Darwin Core metadata (http://rs.tdwg.org/dwc/) are supported by the iMarine web infrastructure for handling information related to taxa, their occurrence in nature as documented by observations, specimens, and samples, and related information. As per the DwC definition: "... It is meant to provide a stable standard reference for sharing information on biological diversity. As a glossary of terms, the Darwin Core is meant to provide stable semantic definitions with the goal of being maximally reusable in a variety of contexts...".

Biological and ecological data are made available in Darwin Core; DwC is a de facto standard for representing data and metadata. Taxonomic information is made available in Darwin Core Archive (DwCA) format; DwCA is a de facto standard for delivering archives of data expressed in DwC.
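As an illustration of data expressed in DwC, the sketch below writes occurrence records to CSV using a handful of Darwin Core term names as column headers. The selection of terms, the `occurrences_to_csv` helper, and the sample record are assumptions made for this example; a real Darwin Core Archive would also carry a descriptor file, which is not shown.

```python
import csv
import io

# A small, illustrative selection of Darwin Core terms used as column names.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def occurrences_to_csv(rows):
    """Serialize occurrence dicts to CSV text with DwC terms as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample record, for illustration only.
EXAMPLE_ROWS = [
    {
        "occurrenceID": "example:occ:1",
        "basisOfRecord": "HumanObservation",
        "scientificName": "Thunnus albacares",
        "eventDate": "2014-07-10",
        "decimalLatitude": "-4.5",
        "decimalLongitude": "55.5",
    }
]
```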

Geographic information (ISO/TC211, OGC)

Geo-referenced data are described using ISO 19115/19119 and made available through the OGC protocols: WMS (Web Map Service), WCS (Web Coverage Service), WFS (Web Feature Service), etc. This potentially targets all data having a geographic dimension, including biological and ecological data, which can also be enriched with ISO 19115/19119 geographic metadata if they include a geographic coverage.
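The OGC protocols share a common way of asking a service to describe itself, the GetCapabilities request. The sketch below builds such a request URL with key-value parameters; the `get_capabilities_url` helper and the endpoint in the usage note are hypothetical, and the default version is only one commonly used value.

```python
from urllib.parse import urlencode


def get_capabilities_url(endpoint, service="WMS", version="1.3.0"):
    """Build a key-value GetCapabilities request URL for an OGC service."""
    query = urlencode({"service": service, "version": version, "request": "GetCapabilities"})
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if "?" in endpoint else "?"
    return endpoint + separator + query
```

For example, `get_capabilities_url("https://example.org/ows", service="WFS", version="2.0.0")` would address a WFS at a made-up endpoint.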

ISO standards

Three main abstract ISO/TC 211 standards (approved or draft) for geographic metadata can be listed:

  • ISO 19115:2003 / 19115-1:2014 - Dataset Metadata
  • ISO 19119:2005 - Service metadata
  • ISO 19110:2005 - Feature Cataloging

Abstract specifications

ISO 19115:2003 - Dataset Metadata

The ISO 19115:2003 standard is the metadata standard approved by OGC, and is composed of two parts:

  • Part 1 (ISO 19115-1): base ISO metadata standard for the description of geographic information and services. ISO recently released a revision of this standard, ISO 19115-1:2014, which needs to be taken into consideration.
  • Part 2 (ISO 19115-2): extension for imagery and gridded data & instrument-based data collection. These extensions also include improved descriptions of lineage and processing information.

ISO 19115:2003 is structured as a set of metadata packages:

  • Entity Set Information: main metadata package that contains information such as identifiers (file, parent), characterSet, language, hierarchical level, main contact (party in charge of the metadata), metadata standard name/version
  • Identification Information: package that describes the resource described in the metadata. Part of this package is specialized for the data or service identification (ISO 19119). Key elements are: title, date, abstract, purpose, thesaurus & associated keywords, data use & access limitations and constraints, extent (geographic, temporal, vertical)
  • Constraints Information: package required for managing rights to information including restrictions applied to the resource access and/or use. Can apply to both Entity Set (metadata) and Identification (resource)
  • Data Quality Information: package required to give information on the quality of the resource.
  • Maintenance Information: package that describes the maintenance and update procedure applied to the resource. Can apply to both Entity Set (metadata) and Identification (resource)
  • Spatial Representation Information: package that aims to describe the spatial representation of the geographic information, either for vector or grid-based representation
  • Reference System Information: package that describes the spatial and temporal reference systems used in the described resource
  • Content Information: package that aims to describe the content of the resource, either the feature catalogue description (for vector data) or the coverage description (for grid data). Related to the ISO 19110 standard specification.
  • Portrayal catalogue information: information about the portrayal catalogue(s) used to display the resource
  • Distribution Information: package required to access the resource. This package especially aims to distribute the resources by pointing to the OGC Web-Service resource.
  • Metadata extension Information: package to handle extended metadata elements
  • Application schema information: package defining the application schema used

ISO 19119:2005 - Service metadata

The ISO 19119 standard describes web services, such as the OGC Web Services (OWS), and is usable in association with the ISO 19115 standard.

ISO 19110:2005 - Feature Cataloging

The ISO 19110 standard addresses the description of feature types and coverages, and is usable in association with the ISO 19115 standard.

Implementations

The ISO 19139 standard defines the XML schema implementation of the above abstract metadata standards.
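To give an idea of what an ISO 19139 encoding looks like, the sketch below assembles a very small `gmd:MD_Metadata` document with a file identifier, a title, and an abstract. It is a minimal illustration under stated assumptions: the helper names are hypothetical, and the output omits elements a complete record requires (contact, date stamp, and others), so it is not expected to validate against the schema.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def character_string(parent, tag, text):
    """Append <gmd:tag><gco:CharacterString>text</gco:CharacterString></gmd:tag>."""
    element = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(element, f"{{{GCO}}}CharacterString").text = text
    return element


def build_metadata(file_identifier, title, abstract):
    """Return a partial ISO 19139 record as an XML string (illustration only)."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    # Entity Set Information: identifier of the metadata record itself.
    character_string(root, "fileIdentifier", file_identifier)
    # Identification Information: title (inside the citation) and abstract.
    ident_info = ET.SubElement(root, f"{{{GMD}}}identificationInfo")
    data_ident = ET.SubElement(ident_info, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(data_ident, f"{{{GMD}}}citation")
    ci_citation = ET.SubElement(citation, f"{{{GMD}}}CI_Citation")
    character_string(ci_citation, "title", title)
    character_string(data_ident, "abstract", abstract)
    return ET.tostring(root, encoding="unicode")
```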

OGC standards

The OGC Web-Service GetCapabilities response is widely used as a reference for service metadata and the description of layers, as it can provide a lot of information if sufficiently enriched. One key element worth mentioning is the service/dataset relationship implemented through the MetadataURL links (analogous to the ISO 19119 service "operatesOn" relationship).

In addition, the OGC standard AuthorityURL and Identifier GetCapabilities elements are assets, as they provide a place for handling code/identifier mappings from different authoritative information systems. An example of a mapping that can be handled in the GetCapabilities response is the mapping between a MetadataURL (from a metadata catalogue) and a coded entity (from a Linked Open Data source), which can be used to enrich Linked Open Data with geographic references.
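The relationship between a layer, its identifiers, and its MetadataURL links can be read directly from a capabilities document. The sketch below does this for a WMS 1.3.0 response; the sample XML, the authority name, the code, and the URLs are all invented for illustration, and `layer_links` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

WMS = "http://www.opengis.net/wms"
XLINK = "http://www.w3.org/1999/xlink"
NS = {"wms": WMS}

# Invented capabilities fragment; names, codes and URLs are placeholders.
SAMPLE = """<WMS_Capabilities xmlns="http://www.opengis.net/wms"
    xmlns:xlink="http://www.w3.org/1999/xlink" version="1.3.0">
  <Capability>
    <Layer>
      <Name>example_layer</Name>
      <AuthorityURL name="example-authority">
        <OnlineResource xlink:href="https://example.org/authority"/>
      </AuthorityURL>
      <Identifier authority="example-authority">CODE-001</Identifier>
      <MetadataURL type="ISO19115:2003">
        <Format>text/xml</Format>
        <OnlineResource xlink:href="https://example.org/catalogue/abc-123"/>
      </MetadataURL>
    </Layer>
  </Capability>
</WMS_Capabilities>"""


def layer_links(capabilities_xml):
    """List, per layer, its identifiers by authority and its metadata URLs."""
    root = ET.fromstring(capabilities_xml)
    links = []
    for layer in root.iter(f"{{{WMS}}}Layer"):
        identifiers = {
            item.get("authority"): item.text
            for item in layer.findall("wms:Identifier", NS)
        }
        metadata_urls = [
            resource.get(f"{{{XLINK}}}href")
            for resource in layer.findall("wms:MetadataURL/wms:OnlineResource", NS)
        ]
        links.append(
            {
                "layer": layer.findtext("wms:Name", namespaces=NS),
                "identifiers": identifiers,
                "metadata_urls": metadata_urls,
            }
        )
    return links
```

Pairing each identifier with the metadata URLs of the same layer yields the kind of code-to-catalogue mapping described above.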

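Mapping with other metadata standards

  • Since the Dublin Core® vocabulary (http://dublincore.org) is currently seen as the reference for supporting the business metadata in iMarine, it would be important to address how the different internationally recognized domain metadata standards are mapped to the Dublin Core metadata vocabulary.
  • For more information on the mapping between Dublin Core and ISO 19115, see:
      • Joinup EU initiative: https://joinup.ec.europa.eu/catalogue/asset_release/mapping-between-dublin-core-and-iso-19115-geographic-information-metadata
      • CEN/ISSS MMI-DC workshop report: ftp://cenftp1.cenorm.be/PUBLIC/CWAs/e-Europe/MMI-DC/cwa14857-00-2003-Nov.pdf
  • INSPIRE and Dublin Core / ISO 15836: http://inspire.jrc.ec.europa.eu/reports/ImplementingRules/metadata/MD_IR_and_DC_state%20of%20progress.pdf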

SDMX

Statistical data are made available in SDMX, which includes specific metadata for agencies, code lists, and datasets.

  • What can SDMX hold in terms of metadata?
  • Which fields of SDMX can be mapped to Business Metadata and how?

FLUX

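Resources

  • EA-CoP Data Access and Sharing Policies
  • Ecosystem Approach Community of Practice: iMarine Guidelines and Best Practices
  • Content citation
  • W3C license of a dataset (from Describing Linked Datasets with the VoID Vocabulary): http://www.w3.org/TR/void/#license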