Difference between revisions of "Semantic cluster"
Julien.barde (Talk | contribs) (→Introduction and Background (The Problems)) |
Julien.barde (Talk | contribs) (→Appendix D - Documents) |
Revision as of 12:48, 15 March 2013
The main purpose of the Cluster work plan (template here) is to provide the iMarine Board with a management tool usable as a framework for planning activities, and that can serve as a guide for carrying out that work. The scope is thus the interface between the Board and the project's Work Packages activities. After drafting, a work plan needs approval from the iMarine Board, following the Board procedures.
Executive Summary
The iMarine Semantic Cluster maintains and promotes a Work Plan (this document) aimed at:
- organizing collections of requirements gathered from the iMarine Business Cases
- providing recommendations for the implementation of the iMarine infrastructure.
The requirements are inputs for the cluster, from iMarine Business Cases that are grouped as follows:
- Support to regional (Africa) LME pelagic EAF community [1]
- the FAO deep seas fisheries programme
- and the UN EAF Ecosystem Approach to fisheries
The recommendations are outputs from the cluster, primarily intended for the iMarine Board, the iMarine project partners (Work Packages) and the Communities of Practice (CoP) identified within the Ecosystem Approach. They are aimed at releasing infrastructure services such as:
- setting up ontologies from controlled vocabularies of the domain: species taxonomy, fishing vessel and gear codes (FAO, DG-MARE code lists...)
- creation of Linked Open Data through enrichment of metadata with URIs of ontologies (TLO, Ecoscope, FLOD, WORMS): bibliographic references, OGC metadata (data sources and related services including processes), EML metadata, .pdf / .doc files
- workflow for massive RDF generation, storage and publication (triple store, SPARQL endpoint, OpenSearch)
- seamless access to metadata catalogues through search engines based on ontologies
Such Infrastructure Services can be used by the iMarine eScience services (VREs & Apps): species manager, geoexplorer, iMarine search engine.
Introduction and Background (The Problems)
Currently, some datasets are freely available (GBIF, OBIS, INSPIRE...) but difficult to retrieve because the related metadata are heterogeneous. Creator names and the other tags used to annotate these resources with related entities of the domain (species, fishing gears, fisheries...) rarely use the same terms. Data discovery is thus complicated, because users have to try synonyms for the same concept in multiple languages to retrieve the datasets. Ontologies can help by matching terms and thereby improving data discovery.
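The term-matching idea can be sketched as follows. This is a minimal illustration, assuming a small label index in which each concept URI carries labels in several languages; the `example.org` URIs, the label lists and the function names are placeholders, not part of FLOD, Ecoscope or any other iMarine resource.

```python
# Hypothetical label index: URIs and labels are illustrative placeholders only.
CONCEPT_LABELS = {
    "http://example.org/species/yellowfin_tuna": {
        "en": ["yellowfin tuna"],
        "fr": ["albacore"],
        "es": ["rabil"],
        "la": ["Thunnus albacares"],
    },
    "http://example.org/gear/purse_seine": {
        "en": ["purse seine"],
        "fr": ["senne tournante"],
    },
}


def build_index(concepts):
    """Map every label, in every language, to the URI of its concept."""
    index = {}
    for uri, labels_by_language in concepts.items():
        for labels in labels_by_language.values():
            for label in labels:
                index[label.casefold()] = uri
    return index


def resolve(term, index):
    """Return the concept URI for a user term, or None if the term is unknown."""
    return index.get(term.strip().casefold())


INDEX = build_index(CONCEPT_LABELS)
```

With such an index, a search for "Rabil", "albacore" or "Thunnus albacares" resolves to the same concept URI, so datasets annotated with that URI can be retrieved whatever the language of the query.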
Semantic Web technologies and ontologies enable data producers to create richer metadata. Usual metadata use XML schemas with literals as values for tags (such as keywords or persons). This is the case for Dublin Core metadata, OGC metadata and EML metadata. These XML metadata with literals can be transformed into RDF metadata with URIs of ontologies. This can be achieved programmatically with text mining applications.
The main issue, however, is the lack of ontologies for the domain of Ecosystem Approach to Marine Resources. Many initiatives have been dealing with related sub-domains:
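As an illustration of this transformation, the sketch below reads a small Dublin Core record and emits N-Triples lines in which keywords known to a lookup table become URIs while unknown keywords stay literals. The record, the `example.org` URIs and the lookup table are assumptions made for the example; a real workflow would rely on a text mining tool and an actual ontology rather than a hand-written dictionary.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Illustrative record; identifier and keywords are made up for this sketch.
RECORD_XML = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>http://example.org/dataset/42</dc:identifier>
  <dc:subject>yellowfin tuna</dc:subject>
  <dc:subject>purse seine</dc:subject>
  <dc:subject>unmapped keyword</dc:subject>
</record>"""

# Hypothetical keyword-to-URI table standing in for an ontology lookup.
KEYWORD_TO_URI = {
    "yellowfin tuna": "http://example.org/species/yellowfin_tuna",
    "purse seine": "http://example.org/gear/purse_seine",
}


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_ntriples(xml_text, keyword_to_uri):
    """Turn each dc:subject into a triple, using a URI when the keyword is known."""
    root = ET.fromstring(xml_text)
    subject = root.findtext(f"{{{DC}}}identifier")
    triples = []
    for element in root.findall(f"{{{DC}}}subject"):
        keyword = (element.text or "").strip()
        uri = keyword_to_uri.get(keyword.casefold())
        obj = f"<{uri}>" if uri else f'"{escape_literal(keyword)}"'
        triples.append(f"<{subject}> <{DC}subject> {obj} .")
    return triples


triples = to_ntriples(RECORD_XML, KEYWORD_TO_URI)
```

Keeping unmapped keywords as literals means no information is lost while the ontology coverage is still incomplete.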
- species:
- fisheries sciences: Neon with FAO [6]
On top of these ontologies, there is a need to build a new top-level ontology which reuses parts of existing ones (including those for information resources: Dublin Core, FOAF, Dclite4g [7], Genesi-dec [8]...).
Such ontologies can be used to set up knowledge bases by instantiating the underlying classes and properties. Indeed, concepts are not only URIs used to annotate information resources; they also carry a set of properties indicating the relationships between entities of the domain: which species preys on which other species, which fishing gears target a given species, where given vessels are fishing... Knowledge bases can thus be used to set up Web portals summarizing knowledge about entities: fact sheets about species, fishing gears, ecosystems, fisheries...
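A knowledge base of this kind can be pictured as a set of subject, predicate, object statements. The following minimal sketch, in which every identifier, property name and relationship is illustrative rather than taken from TLO, FLOD or Ecoscope, shows how the properties attached to one entity can be collected into the raw material of a fact sheet.

```python
# Toy knowledge base: all identifiers and relationships are illustrative.
TRIPLES = [
    ("ex:purse_seine", "ex:targets", "ex:yellowfin_tuna"),
    ("ex:longline", "ex:targets", "ex:yellowfin_tuna"),
    ("ex:purse_seine", "ex:targets", "ex:skipjack_tuna"),
    ("ex:yellowfin_tuna", "ex:preysOn", "ex:flying_fish"),
    ("ex:yellowfin_tuna", "ex:label", "Yellowfin tuna"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def fact_sheet(triples, entity):
    """Collect what the knowledge base states about one entity."""
    return {
        "label": [o for _, _, o in match(triples, s=entity, p="ex:label")],
        "preys_on": [o for _, _, o in match(triples, s=entity, p="ex:preysOn")],
        "targeted_by": [s for s, _, _ in match(triples, p="ex:targets", o=entity)],
    }


sheet = fact_sheet(TRIPLES, "ex:yellowfin_tuna")
```

In a deployed setting the same pattern matching would typically be expressed as queries against a triple store rather than as list filtering in memory.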
Automated fact sheet generation is a key issue in iMarine if we consider that a lot of systems have set up fact sheets:
- Worms Yellowfin Tuna fact sheet [9]
- FIRMS Yellowfin Tuna fact sheet [10]
- Fishbase Yellowfin Tuna fact sheet [11]
- Encyclopedia Of Life Yellowfin Tuna fact sheet [12]
- GBIF Yellowfin Tuna fact sheet [13]
Being able to generate such fact sheets directly from RDF requires the content of the underlying information systems to be made available in RDF. To achieve this goal, iMarine VREs and apps can help. Indeed, applications like "species manager" can combine information from different sources (OBIS, WORMS, GBIF, Fishbase...) and export the resulting mapping in RDF (compliant with TLO).
Other domains face similar issues, and research projects like agInfra suggest methods and tools that have to be taken into account in the framework of iMarine.
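The export of a code mapping can be sketched as below. This is only an illustration under stated assumptions: the two sources and their URIs are hypothetical, records are reconciled on the scientific name alone, and `skos:exactMatch` is used as a stand-in linking property because the TLO vocabulary is not described here.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# Hypothetical per-source records keyed by scientific name (placeholder URIs).
SOURCES = {
    "source_a": {
        "Thunnus albacares": "http://example.org/source_a/taxon/1",
    },
    "source_b": {
        "Thunnus albacares": "http://example.org/source_b/species/77",
        "Katsuwonus pelamis": "http://example.org/source_b/species/78",
    },
}


def reconcile(sources):
    """Group the identifiers of all sources by scientific name."""
    merged = {}
    for records in sources.values():
        for name, uri in records.items():
            merged.setdefault(name, []).append(uri)
    return merged


def mapping_to_ntriples(merged):
    """Link the first identifier of each species to the others as N-Triples."""
    lines = []
    for name in sorted(merged):
        pivot, *others = merged[name]
        for other in others:
            lines.append(f"<{pivot}> <{SKOS_EXACT_MATCH}> <{other}> .")
    return lines


mapping = mapping_to_ntriples(reconcile(SOURCES))
```

A species known to a single source produces no link, which keeps the exported mapping limited to identifiers that were actually reconciled.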
Goals and Objectives (The Outputs)
Outputs of the cluster are Roadmaps, Tradeoff analysis and Guidelines for the development, deployment and maintenance of infrastructure services involving semantic resources and technology, such as:
- publication of species manager VRE results (code mapping / reconciliation) with RDF (based on Top Level Ontology Schema)
- publication of iMarine geonetwork metadata (about data sources and related services: WMS / WFS / WCS / WPS...) through RDF (based on GENESI-DEC Schema)
- RDF generation from various types of information resources (Web pages, OGC metadata / CSW URL, .pdf / .doc files, bibliographic references...)
Such Infrastructure Services are needed by the iMarine eScience services (VREs & Apps) and other web service endpoints.
A validation process aims at matching the cluster outputs with 'consuming' eScience services such as:
- a VRE to provide GUIs to facilitate RDF generation through iMarine Tagger
- a VRE to provide a search engine for iMarine enabling seamless access to different metadata catalogues (iMarine native metadata element set, OGC, publications, pictures...)
- Smartfish Web portal
- Fact sheet generator (e.g. Tuna Atlas Use Case)
Resources and Constraints (The Inputs)
The Business Case requirements are inputs for the cluster. They come from 3 Business Cases that are grouped as follows:
- Smartfish
- Tuna Atlas
Other inputs:
- RDF sources for domain entities: FAO FLOD (species, vessels, areas and related properties), IRD Ecoscope (species, vessels, ecosystems and related properties), WORMS (taxon ranks and related properties), Species manager VRE (species and codes).
- RDF sources for information resources metadata: FAO FLOD (publications, ??), IRD Ecoscope (pictures, databases, publications, people...), iMarine geonetwork
Strategy and Actions (from Inputs to Outputs)
Another Wiki page is dedicated to Semantic cluster achievements [14] related to iMarine Board Work Plan [15].
Building on the strengths and skills of the iMarine partners contributing to the Semantic Cluster, the following action plans have been conducted or are underway:
- Leveraging the FLOD and Ecoscope knowledge bases,
- Implementing SPARQL endpoints,
- Implementing OpenSearch,
- Implementing new schema for RDF metadata (GENESI-DEC)
- use FORTH search engine (xSearch) on top of FLOD and Ecoscope knowledge bases (including OpenSearch for results and SPARQL endpoints for clustering),
- use FORTH entity / text mining application with FLOD and Ecoscope to highlight Web Pages,
- use FORTH entity / text mining to annotate new kinds of information resources (bibliographic references, OGC metadata...)
For each of them, it is envisioned (by January 2013) to review and benchmark their added value according to the following iMarine standard review:
- Who are the Users
- Who are the co-funding partners
- What are the iMarine infrastructure resources involved
- What are the outcomes that do match the iMarine Description of Work
- How do they fit in the EA-CoP business cases
- How do they contribute to the sustainability of an EA-CoP
- How far are they re-usable with clear benefits to EA-CoP representatives, and proven compatibility with EA-CoP resources
- How far are they consistent with EC regulations/strategies such as open data strategy for Europe [16].
Cluster Participants and Roles
- IRD:
- provides an ontology about domain entities and related information resources metadata,
- provides expertise about the domain (Ecosystem Approach to Marine Resources) with underlying research laboratory
- FAO:
- provides an ontology which deals with entities of the domain (vessel, gear, linneantaxonomy, port, flagstate, area: sea, eez, statisticaldivision, rfb..),
- provides Linked Open Data (publications) which are annotated with FLOD ontologies URIs
- FORTH:
- provides expertise in setting up ontologies and work on TLO [17]
- provides tools to annotate information resources and discover them through a search engine exploiting ontologies (for clustering results...)
Appendix A - Resources
- Wiki page about Top Level Ontology / TLO [18]
- Ongoing version of TLO [19]
- Previous version of TLO [20]
- FORTH xSearch [21]
- FORTH tagger [22]
- Ecoscope fact sheet example [23]
Appendix B - Budget
Appendix C - Schedule
The Semantic Cluster aligns its work plan with the milestones of its primary 'customer', namely the planned iMarine Board meetings scheduled throughout the lifetime of the iMarine project:
- Semester 1 (Nov 2011 - Apr. 2012);
- Mobilization phase: identification of opportunities for collaboration and technologies
- Semantic Cluster support:
- Semester 2 (May 2012 - Oct. 2012);
- Stabilization phase: validation of opportunities and definition of the technology scope
- Semantic Cluster support:
- Semester 3 (Nov 2012 - Apr. 2013);
- Experimentation phase: with technologies, and with expansion of the EA-CoP user base
- Semantic Cluster support:
- Semester 4 (May 2013 - Oct. 2013);
- Validation phase: collaboration structures and EA-CoP requirements consolidation
- Semantic Cluster support:
- Semester 5 (Nov 2013 - Apr. 2014);
- Exploitation phase: operations through EA-CoP collaboration frameworks
- Semantic Cluster support:
Appendix D - Documents
TCOM Documents
- OGC/ISO Publishing guidelines for Data and Services Providers. Use Cases and links with the Statistical Cluster (and VREs) and Semantic Cluster (Tuna Atlas fact sheets and indicators) TCOM-4 Oostende, Belgium 23-25 January 2013 at: http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d275308/Geospatial_and_semantic.pdf
- T10.4-Semantic Data Analysis FORTH 4th TCOM.pdf TCOM-4 Oostende, Belgium 23-25 January 2013 [24]
- T10.4-FLOD initiative TCOM-4 Oostende, Belgium 23-25 January 2013 [25]
Appendix E - Other
iMarine Technical Guidelines
- Publishing guidelines for Data and Services Providers [26]