Semantic cluster

From D4Science Wiki
Revision as of 18:05, 14 March 2013 by Julien.barde (Talk | contribs) (Introduction and Background (The Problems))

The main purpose of the Cluster work plan (template here) is to give the iMarine Board a management tool that serves both as a framework for planning activities and as a guide for carrying out that work. Its scope is thus the interface between the Board and the activities of the project's Work Packages. Once drafted, a work plan needs approval from the iMarine Board, following the Board procedures.

Executive Summary

The iMarine Semantic Cluster maintains and promotes a Work Plan (this document) aimed at:

   * organizing collections of requirements gathered from the iMarine Business Cases
   * providing recommendations for the implementation of the iMarine infrastructure. 

The requirements are inputs for the cluster, from iMarine Business Cases that are grouped as follows:

   * Support to regional (Africa) LME pelagic EAF community [1]
   * FAO deep seas fisheries programme
   * UN EAF (Ecosystem Approach to Fisheries)

The recommendations are outputs from the cluster, primarily intended for the iMarine Board, the iMarine project partners (Work Packages) and the Communities of Practice (CoP) identified within the Ecosystem Approach. They are aimed at releasing infrastructure services such as:

   * setting up ontologies from controlled vocabularies of the domain: species taxonomy, fishing vessel and gear codes (FAO, DG-MARE code lists...)
   * creation of Linked Open Data through enrichment of metadata with URIs of ontologies (TLO, Ecoscope, FLOD, WORMS): bibliographic references, OGC metadata (data sources and related services including processes), EML metadata, .pdf / .doc files
   * workflow for massive RDF generation, storage and publication (triple store, SPARQL endpoint, OpenSearch)
   * seamless access to metadata catalogues through search engines based on ontologies

Such Infrastructure Services can be used by the iMarine eScience services (VREs & Apps): species manager, geoexplorer, iMarine search engine.

Introduction and Background (The Problems)

Currently, some datasets are freely available (GBIF, OBIS, INSPIRE...) but difficult to retrieve because the related metadata are heterogeneous. Creator names and other tags used to annotate these resources with related entities of the domain (species, fishing gears, fisheries...) rarely use the same terms. Data discovery is thus complicated: to retrieve the datasets, users have to try synonyms for the same concepts in multiple languages. Ontologies can help match terms and improve data discovery.
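As an illustration of the term matching described above, the following minimal sketch resolves synonyms in several languages to a single concept identifier. The concept URI, the `LABELS` table and the function names are hypothetical placeholders chosen for the example; they are not taken from an actual iMarine ontology.

```python
import unicodedata

# Hypothetical concept URI; a real deployment would take it from an ontology.
YELLOWFIN = "http://example.org/concept/thunnus_albacares"

# Illustrative labels only: scientific name plus common names in
# English, French and Spanish, all pointing to the same concept.
LABELS = {
    "Thunnus albacares": YELLOWFIN,
    "yellowfin tuna": YELLOWFIN,
    "thon à nageoires jaunes": YELLOWFIN,
    "rabil": YELLOWFIN,
}


def normalize(term):
    """Lower-case a term, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", term)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


# Index built once so that lookups ignore case, accents and spacing.
INDEX = {normalize(label): uri for label, uri in LABELS.items()}


def resolve(term):
    """Return the concept URI for a term, or None when no label matches."""
    return INDEX.get(normalize(term))
```

With such an index, a query typed as "Yellowfin Tuna" or "THON A NAGEOIRES JAUNES" would lead to the same concept, so datasets annotated in different languages could be retrieved together.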

The Semantic Web and ontologies enable data producers to create richer metadata. Conventional metadata use XML schemas with literals as tag values (such as keywords or persons). This is the case for Dublin Core, OGC and EML metadata. These XML metadata with literals can be transformed into RDF metadata that use URIs of ontologies. This can be achieved programmatically with text mining applications.
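The transformation from XML literals to RDF with URIs can be sketched as follows. This is only an illustration under stated assumptions: a plain lookup table stands in for the text mining step, the record layout, the `example.org` URIs and the function names are hypothetical, and only `dc:subject` values are mapped. The output is written as N-Triples lines.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical lookup from keyword literals to ontology URIs
# (a stand-in for a real text mining or reconciliation step).
KEYWORD_TO_URI = {
    "thunnus albacares": "http://example.org/ontology/species/thunnus_albacares",
    "purse seine": "http://example.org/ontology/gear/purse_seine",
}

# Hypothetical Dublin Core record used as input.
SAMPLE_XML = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>http://example.org/dataset/42</dc:identifier>
  <dc:title>Tuna catches</dc:title>
  <dc:subject>Thunnus albacares</dc:subject>
  <dc:subject>Purse seine</dc:subject>
  <dc:subject>Indian Ocean</dc:subject>
</record>"""


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def dc_to_ntriples(xml_text, keyword_to_uri):
    """Turn a Dublin Core XML record into a list of N-Triples lines."""
    root = ET.fromstring(xml_text)
    subject = root.findtext(f"{{{DC}}}identifier").strip()
    lines = []
    for element in root:
        # Skip non-Dublin-Core elements and the identifier itself.
        if not element.tag.startswith(f"{{{DC}}}"):
            continue
        local_name = element.tag.split("}", 1)[1]
        if local_name == "identifier":
            continue
        value = (element.text or "").strip()
        uri = keyword_to_uri.get(value.casefold()) if local_name == "subject" else None
        # Known keywords become URIs; everything else stays a literal.
        obj = f"<{uri}>" if uri else f'"{escape_literal(value)}"'
        lines.append(f"<{subject}> <{DC}{local_name}> {obj} .")
    return lines


if __name__ == "__main__":
    print("\n".join(dc_to_ntriples(SAMPLE_XML, KEYWORD_TO_URI)))
```

Keywords that the lookup does not recognise ("Indian Ocean" in the sample) are kept as literals, so no information is lost when the mapping is incomplete.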


Above all, however, the main issue is the lack of ontologies for the domain of Ecosystem Approach to Marine Resources. Many initiatives have been dealing with related sub-domains:

  • species:
    • WORMS [2] is not a real ontology but is translated into RDF [3]
  • ecological concepts:
    • NASA Semantic Web for Earth and Environmental Terminology (SWEET ontologies [4])
    • ontologies for ecoinformatics [5]
  • fisheries sciences: NeOn with FAO [6]

On top of these ontologies, there is a need to build a new top-level ontology which reuses parts of existing ones (including those for information resources: Dublin Core, FOAF, Dclite4g [7], Genesi-dec [8]...).

Goals and Objectives (The Outputs)

Outputs of the cluster are roadmaps, trade-off analyses and guidelines for the development, deployment and maintenance of infrastructure services involving semantic resources and technology, such as:

  • publication of the results of the Species manager VRE (Virtual Research Environment), i.e. code mapping / reconciliation, with RDF (based on Top Level Ontology Schema)
  • publication of iMarine geonetwork metadata (about data sources and related services: WMS / WFS / WCS / WPS...) through RDF (based on GENESI-DEC Schema)
  • RDF generation from various types of information resources (Web pages, OGC metadata / CSW URL, .pdf / .doc files, bibliographic references...)

Such Infrastructure Services are needed by the iMarine eScience services (VREs & Apps) and other web service endpoints.

A validation process aims at matching the cluster outputs with 'consuming' eScience services such as:

  • a VRE (Virtual Research Environment) to provide GUIs to facilitate RDF generation through iMarine Tagger
  • a VRE to provide a search engine for iMarine enabling seamless access to different metadata catalogues (iMarine native metadata element set, OGC, publications, pictures...)
  • Smartfish Web portal
  • Fact sheet generator (e.g. Tuna Atlas Use Case)
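For the search-engine use case listed above, a consuming service could query a SPARQL endpoint for every record annotated with a given concept. The sketch below only builds the query text. It assumes, without confirmation from this work plan, that records in the triple store are linked to concept URIs through `dc:subject`; the function name and the example URI are hypothetical.

```python
def build_subject_query(concept_uri, limit=20):
    """Build a SPARQL query listing records annotated with a concept URI.

    Assumption: records carry dc:subject triples whose objects are
    concept URIs. The actual iMarine schema may use another predicate.
    """
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in '<>"{}|^`\\' or ord(ch) <= 0x20 for ch in concept_uri):
        raise ValueError(f"not a valid IRI: {concept_uri!r}")
    return (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        "SELECT ?record ?title WHERE {\n"
        f"  ?record dc:subject <{concept_uri}> .\n"
        "  OPTIONAL { ?record dc:title ?title }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Hypothetical concept URI used only for illustration.
    print(build_subject_query("http://example.org/concept/thunnus_albacares"))
```

Because the query selects on a concept URI rather than on a keyword string, the same request would return records from any catalogue whose metadata have been enriched with that URI.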

Resources and Constraints (The Inputs)

The Business Cases requirements are inputs for the cluster. They come from 3 Business Cases that are grouped as follows:

   * Smartfish
   * Tuna Atlas

Other inputs:

  • RDF sources for domain entities: FAO FLOD (species, vessels, areas and related properties), IRD Ecoscope (species, vessels, ecosystems and related properties), WORMS (taxon ranks and related properties), Species manager VRE (species and codes).
  • RDF sources for information resources metadata: FAO FLOD (publications, ??), IRD Ecoscope (pictures, databases, publications, people...), iMarine geonetwork

Strategy and Actions (from Inputs to Outputs)

Another Wiki page is dedicated to Semantic cluster achievements [9] related to iMarine Board Work Plan [10].

Cluster Participants and Roles

  • IRD:
    • provides an ontology about domain entities and related information resources metadata,
    • provides expertise about the domain (Ecosystem Approach to Marine Resources) through its underlying research laboratory
  • FAO
  • FORTH

Appendix A - Resources

Appendix B - Budget

Appendix C - Schedule

Appendix D - Documents

Appendix E - Other

iMarine Technical Guidelines

  • Publishing guidelines for Data and Services Providers [11]