Difference between revisions of "Top Level Ontology"

From D4Science Wiki
(Meeting Tconf of the Semantic Cluster, 13.02.2013)
(Releases)
 
(205 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== Person responsible for editing/maintaining this page==
+
== Introduction ==
  
* Carlo Allocca (carlo@ics.forth.gr)
+
=== In a nutshell ===
 +
One of the main characteristics of biodiversity data is its cross-disciplinary nature and the extremely broad range of data types, structures, and semantic concepts that it encompasses. Moreover, biodiversity data, especially in the marine domain, is widely distributed, with few well-established repositories or standard protocols for its archiving, access, and retrieval.  Queries like ''“Given the scientific name of a species, find its predators with the related taxon-rank classification and with the different codes that the organizations use to refer to them”'' cannot be formulated (and consequently cannot be answered) using any individual source. To formulate such queries we need an expressive conceptual model, while to answer them we also have to assemble pieces of information stored in different sources.
 +
To fill this gap, we have designed and implemented a top level ontology, called '''Marine Top Level Ontology''' (for short '''MarineTLO''').
  
== TLO-Development activity ==
+
=== Motivating Scenarios ===
 +
The availability of a top level ontology for the marine domain would be useful in various scenarios. Below we will describe them.
  
 +
'''For Publishing Linked Data:''' There is a trend towards publishing Linked Data, which raises the question of which structure is beneficial to use for such publishing. The semantic structure presented here can be used by the involved organizations to anticipate future needs for information integration, thus reducing the effort required for (post) integration.
  
=== General Description ===
+
'''For Generating Fact Sheets:''' FactSheetGenerator is an application provided by IRD that aims to provide factual knowledge about the marine domain by mashing up relevant knowledge distributed across several data sources. Currently, FactSheetGenerator uses only ECOSCOPE, so related knowledge stored in other sources (e.g., about commercial codes or taxonomic information) cannot be exploited. MarineTLO could be used to advance this application, i.e., to provide more complete semantic descriptions.
  
This activity concerns the development of a Top Level Ontology (TLO for short) that will integrate the concepts currently existing in marine-domain knowledge bases (in particular the FLOD and ECOSCOPE knowledge bases). The TLO-Development activity is divided into six sub-activities (or tasks), related to each other as shown in the diagram in Fig 1.
+
'''For Semantic Post-Processing of the Results of Keyword Search Queries:''' Another big challenge nowadays is how to integrate structured data with unstructured data (documents and text). The availability of harmonized structured knowledge about the marine domain can be exploited for semantic post-processing of search results (over dedicated or general-purpose search systems). [http://wiki.i-marine.eu/index.php/XSearch XSearch] is a meta-search engine that offers semantic post-processing of search results and can analyze the returned results by also exploiting the availability of semantic repositories (e.g. SPARQL endpoints). XSearch could exploit MarineTLO to provide more complete information about the identified entities.
  
 +
'''For Enabling Complex Query Services over Integrated Data:''' MarineTLO can be used as the schema for setting up integrated repositories that offer more complex query services, which cannot be supported by the individual underlying sources. In general, there are two main approaches for building and querying such repositories: the materialized integration approach (or warehouse approach), and the virtual integration (or mediator) approach (more information about these approaches can be found in '''[Tzitzikas-MTSR'13]'''). The key point is that in both cases we need a schema and MarineTLO can serve this requirement.
  
 +
== MarineTLO as a product ==
  
[[File:Pic1.png|center]]
+
We used a set of underlying sources for integrating their concepts in MarineTLO. Below we briefly describe these sources, and then describe the ontology MarineTLO and its corresponding releases.
  
=== Methodology ===
+
=== The main underlying sources ===
  
It is based on an Iterative and Incremental development approach. As such, one iteration will involve all the above tasks that are described here http://wiki.i-marine.eu/index.php/Top_Level_Ontology. All the iterations will be accurately described and TLO Modules/ Versions will be delivered in each iteration ready to be used.
+
'''Fisheries Linked Open Data:''' FLOD, created and maintained by the Food and Agriculture Organization (FAO), is dedicated to creating a dense network of relationships among the entities of the fishery domains, and to programmatically serving them to semantic and traditional application environments. The FLOD content is exposed either via a public SPARQL endpoint[http://www.fao.org/figis/flod/endpoint] (suitable for semantic applications) or via a Java API to be embedded in consumers’ application code. Currently, the FLOD network includes entities and relationships from the domains of Marine Species, Water Areas, Land Areas, and Exclusive Economic Zones, and serves software applications in the domains of statistics and GIS.
  
=== Activities scheduled with deadlines ===
+
'''ECOSCOPE Knowledge Base:''' IRD  offers a public SPARQL endpoint[http://ecoscopebc.mpl.ird.fr/joseki/ecoscope] for its knowledge base containing geographical data, pictures and information about marine ecosystems (specifically data about fishes, sharks, related persons, countries and organizations, harbors, vessels, etc.).
  
At least two iterations are needed to complete the TLO-Development activity, with deadlines in December 2012 and January 2013, respectively. Each iteration will be monitored by opening related tickets. As the next meeting takes place in January, we (Claudio, Julien and Carlo) will discuss there whether a third or further iterations are needed.
+
'''WoRMS:''' The World Register of Marine Species[http://www.marinespecies.org] currently contains more than 200 thousand species, around 380 thousand species names including synonyms, and 470 thousand taxa (infraspecies to kingdoms).
  
 +
'''FishBase:''' FishBase[http://www.fishbase.org] is a global database of fish species. It is a relational database containing information about taxonomy, geographical distribution, biometrics, population, genetic data, and more. Currently, it contains more than 32 thousand species and more than 300 thousand common names in various languages.
  
== Related Cluster ==
+
'''DBpedia:''' DBpedia[http://dbpedia.org] is a project that converts content from Wikipedia into structured knowledge so that Semantic Web techniques can be applied to it. At the time of writing this article, the English version of the DBpedia knowledge base describes more than 4.5 million things, including persons, places, works, species, etc. In our case, we use a subset of DBpedia’s knowledge base containing only fishes (i.e. instances classified under the class <nowiki>http://dbpedia.org/ontology/Fish</nowiki>).
http://wiki.i-marine.eu/index.php/Semantic_cluster_achievements
+
  
== Related Wiki Pages ==
+
=== The Marine Top Level Ontology ===
http://wiki.i-marine.eu/index.php/XSearch
+
  
== Meeting In Progress ==
+
MarineTLO is not supposed to be a single ontology covering the entirety of what exists. It aims at being a global core model that (a) covers with suitable abstractions the domains under consideration to enable the most fundamental queries, (b) can be extended to any level of detail on demand, and (c) can adequately map and integrate data originating from distinct sources. This approach has two main benefits:
 +
* reduced effort for improving and evolving it, since the focus is on one model rather than many
 +
* reduced effort for constructing mappings, since this approach avoids pair-wise mappings between individual metadata formats and/or ontologies.
  
===Meeting In Rome, 03.11.2012 ===
+
For the development and evolution of MarineTLO we have adopted an iterative and incremental methodology comprising the following steps: (i) ontological analysis of the underlying sources, (ii) design, (iii) implementation and (iv) evaluation. The activities of each iteration have been monitored by opening corresponding tickets. The section [[#Related_tickets]] lists the tickets that have been opened. For the implementation of MarineTLO we used OWL 2 (Web Ontology Language), while for evaluation we used the notion of competence queries.
http://wiki.i-marine.eu/index.php/03.11.2012.MeetingInRome
+
  
=== Meeting on "FLOD Ontological Analysis: First Iteration", 7.12.2012 ===
+
=== Releases ===
For the details of the first meeting, please follow the link
+
In total we have released 4 versions of MarineTLO. Each release covers a set of concepts drawn from a set of underlying sources. Below we report the contents, some basic information, and several links (e.g., documentation) for each version.
http://wiki.i-marine.eu/index.php/7.12.2012-TLO_FLOD_Ontological_Analysis. This meeting is related to the ticket https://issue.imarine.research-infrastructures.eu/ticket/224#comment:15
+
  
=== Meeting In Ostende, 29.01.2013 ===
+
{| border="1" class="wikitable"
 +
|+ MarineTLO Versions
 +
! Version
 +
! Classes and Properties
 +
! Underlying Sources
 +
! Concepts covered
 +
! OWL File
 +
! Documentation
 +
! Mappings
 +
! Competence Queries
 +
! Release Date
 +
|-
 +
! Version 1
 +
| 17 classes and 8 properties
 +
|| FLOD, ECOSCOPE, WoRMS
 +
|| Species, Scientific Names, Predators
 +
|| http://goo.gl/ukxmAv
 +
||
 +
|| http://goo.gl/kEfp8g
 +
|| http://goo.gl/3sMdwR
 +
|| March 2013
 +
|-
 +
! Version 2
 +
| 57 classes and 22 properties
 +
|| FLOD, ECOSCOPE, WoRMS, DBpedia
 +
|| Species, Scientific Names, Predators, Authorships
 +
|| http://goo.gl/JTh8p9
 +
|| http://goo.gl/Y9bGFy
 +
|| http://goo.gl/I6STNv
 +
|| http://goo.gl/3ZNqb4
 +
|| July 2013
 +
|-
 +
! Version 3
 +
| 57 classes and 25 properties
 +
|| FLOD, ECOSCOPE, WoRMS, DBpedia, FishBase
 +
|| Species, Scientific Names, Common Names, Predators, Authorships, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ
 +
|| http://goo.gl/J15OCE
 +
|| http://goo.gl/FygSd6
 +
|| http://goo.gl/vxESUF
 +
|| http://goo.gl/yTaz7O
 +
|| October 2013
 +
|-
 +
! Version 4
 +
| 127 classes and 81 properties
 +
|| FLOD, ECOSCOPE, WoRMS, DBpedia, FishBase
 +
|| Species, Scientific Names, Common Names, Predators, Authorships, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ, Bibliography, Statistical Indicators
 +
|| http://goo.gl/Yh1Uot
 +
|| http://goo.gl/KIFY6e
 +
|| http://goo.gl/WU3xkG
 +
|| http://goo.gl/KIFY6e
 +
|| July 2014
 +
|}
  
* Agenda and Presentation, please follow the link http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d276128/TLO_iMarine_FORTH_4rd_TCOMOstende.pdf
+
== References ==
  
* For the details of the Semantic Cluster, Minutes of the TLO-Activity at the 4th TCOM imarine in Ostende on 29 Jan 2013: Julien’s Scenario and TLO-Instantiating Plan/Discussion, please follow the link http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d276309/Semantic%20Cluster%20and%20TLO%20ActivitiesVer1.docx
+
=== Related Links/Publications ===
 +
* MarineTLO public website http://www.ics.forth.gr/isl/MarineTLO/
 +
* MarineTLO-based Warehouse wiki page http://wiki.i-marine.eu/index.php/MarineTLO-based_warehouse
 +
* <b>[Tzitzikas-MTSR'13]</b> Y. Tzitzikas, C. Allocca, C. Bekiari, Y. Marketakis, P. Fafalios, M. Doerr, N. Minadakis, T. Patkos and L. Candela. <i>Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology</i>, Proceedings of the 7th Metadata and Semantics Research Conference, MTSR'13, Thessaloniki, Greece, November 2013.
 +
* <b>[Tzitzikas-ERCIM'14]</b> Y. Tzitzikas, C. Allocca, C. Bekiari, Y. Marketakis, P. Fafalios and N. Minadakis. <i>Ontology-based Integration of Heterogeneous and Distributed Information of the Marine Domain</i>, ERCIM News 2014 (96), Special theme: Linked Open Data, January 2014
 +
* <b>[Tzitzikas-LWDM'14]</b> Y. Tzitzikas, N. Minadakis, Y. Marketakis, P. Fafalios, C. Allocca and M. Mountantonakis. <i>Quantifying the Connectivity of a Semantic Warehouse</i>, 4th International Workshop on Linked Web Data Management, LWDM'14, Athens, Greece, March 2014.
 +
* <b>[Mountantonakis-PROFILES'14]</b> M. Mountantonakis, C. Allocca, P. Fafalios, N. Minadakis, Y. Marketakis, C. Lantzaki and Y. Tzitzikas. <i>Extending VoID for Expressing the Connectivity Metrics of a Semantic Warehouse</i>, 1st International Workshop on Dataset Profiling & Federated Search for Linked Data (PROFILES'14), in conjunction with the 11th Extended Semantic Web Conference (ESWC'14), Anissaras Hersonissou, Crete, Greece, May 2014.
 +
* <b>[Tzitzikas-ESWC'14]</b> Y. Tzitzikas, N. Minadakis, Y. Marketakis, P. Fafalios, C. Allocca, M. Mountantonakis and I. Zidianaki. <i>MatWare: Constructing and Exploiting Domain Specific Warehouses by Aggregating Semantic Data</i>, 11th Extended Semantic Web Conference (ESWC'14), Anissaras Hersonissou, Crete, Greece, May 2014.
  
=== Meeting Tconf of the Semantic Cluster, 13.02.2013  ===
+
=== Related Meetings & Minutes ===
 +
* 3rd TCOM, Rome, Italy, 03.11.2012
 +
** Minutes: http://wiki.i-marine.eu/index.php/03.11.2012.MeetingInRome
 +
* FLOD Ontological analysis virtual meeting
 +
** Minutes: http://wiki.i-marine.eu/index.php/7.12.2012-TLO_FLOD_Ontological_Analysis
 +
* 4th TCOM, Ostende, Belgium, 29.01.2013
 +
** Slides: http://goo.gl/JuCrbV
 +
** Minutes: http://wiki.i-marine.eu/index.php/4th_TCom_Meeting:_25th_January_2013_Discussions_and_Notes
 +
* Semantic Cluster virtual meeting, 13.02.2013
 +
** Documents: http://goo.gl/8wbfTK, http://goo.gl/uPS4Ck
 +
** Minutes: http://wiki.i-marine.eu/index.php/13.02.2013_Semantic_Cluster
 +
* 5th TCOM, Pisa, Italy, 21.03.2013
 +
** Slides: http://goo.gl/mHWYzg
 +
** Minutes: http://wiki.i-marine.eu/index.php/5th_TCom_Meeting:_22nd_March_2013_Discussions_and_Notes
 +
* 5th TCOM, Rome, Italy, 26.03.2013
 +
** Slides: http://goo.gl/Kqm25t
 +
** Minutes: http://wiki.i-marine.eu/index.php/5th_TCom_Meeting:_28th_March_2013_Discussions_and_Notes
 +
* 6th TCOM, Skiathos, Greece, 19.06.2013
 +
** Slides: http://goo.gl/SNiw3c
 +
** Minutes: http://wiki.i-marine.eu/index.php/6th_TCom_Meeting:_19th_June_2013_Discussions_and_Notes
 +
* 7th TCOM, Rome, Italy, 15.10.2013
 +
** Slides: http://goo.gl/jL1JZ7
 +
** Minutes: http://wiki.i-marine.eu/index.php/7th_TCom_Meeting:_15th_October_2013_Discussions_and_Notes
 +
* 8th TCOM, Athens, Greece, 04.02.2014
 +
** Slides: http://goo.gl/4MAsqP
 +
** Minutes: http://wiki.i-marine.eu/index.php/8th_TCom_Meeting:_4th_February_2014_Discussions_and_Notes
 +
* 9th TCOM, Heraklion, Greece, 09.07.2014
 +
** Slides: http://goo.gl/1Q3Pph
  
what we (as FORTH) propose as description/actions/scheduling about TLO activities. Please, follow the link
+
=== Related tickets ===
http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d276348/Semantic%20Cluster%20and%20TLO%20ActivitiesVer2.docx
+
* #224- Towards a top-level ontology for FAO & IRD, https://issue.imarine.research-infrastructures.eu/ticket/224
 
+
* #888- FLOD Ontological Analysis, https://issue.imarine.research-infrastructures.eu/ticket/888
== Motivation - Goal - Requirements ==
+
* #889- ECOSCOPE Ontological Analysis, https://issue.imarine.research-infrastructures.eu/ticket/889
Describe a scenario that will justify the need for having such a TLO on top of marine-domain knowledge bases.
+
* #890- TLO Design: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/890
 
+
* #891- TLO Implementation: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/891
THE MOTIVATIONS ARE BASED ON THE FACT THAT:
+
* #892- TLO Evaluation: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/892
Semantic technologies, applications and services for biodiversity mostly rely on the rise of an interconnected and shared tree-of-life-like dataset scaling on the web. The various communities contributing to this joint effort (including the marine one) aim to share domain data and their meaning, in order to provide a solid basis for the interoperability of biodiversity systems.
+
* #900- TLO Usage: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/900
 
+
* #1220- TLO Population: SPD, https://issue.imarine.research-infrastructures.eu/ticket/1220
THE GOAL IS:
+
* #1451- TLO Examples for species and related SPARQL queries, https://issue.imarine.research-infrastructures.eu/ticket/1451
Our goal in modelling and formalising a Top Level Ontology (TLO) is to '''integrate''' and '''semantically extend''' the underlying models of existing marine data sources. Specifically, the TLO is used on top of a number of real and heterogeneous marine data sources, including FLOD and ECOSCOPE, as a knowledge mediator to represent, manipulate and reason upon and across them.
+
* #1603- The new release of TLO, https://issue.imarine.research-infrastructures.eu/ticket/1603
 
+
* #1764- TLO Documentation, https://issue.imarine.research-infrastructures.eu/ticket/1764
THE REQUIREMENTS ARE:
+
* #1848- Documentation of the process used for creating TLO-based warehouses, https://issue.imarine.research-infrastructures.eu/ticket/1848
This Top Level Ontology has to focus on EAF (Ecosystem Approach to Fisheries / Marine Resources) and should be generic enough to provide consistent abstractions or specifications of the concepts included in all data models or ontologies of iMarine data sources such as '''ECOSCOPE''', TDWG-WORMS, '''FLOD''', AGROVOC, DwC, IBIS [Gangemi 2002], [Doerr 2003], and to provide the necessary properties to make this distributed knowledge base a coherent source of facts relating observational data to the respective spatiotemporal context and categorical (systematic) domain knowledge.
+
* #1932- Building the new MarineTLO Warehouse, https://issue.imarine.research-infrastructures.eu/ticket/1932
 
+
* #2046- MarineTLO Version 3.0.0, https://issue.imarine.research-infrastructures.eu/ticket/2046
== FLOD Ontological Analysis ==
+
* #2050- Actions for exploiting the MarineTLO-based warehouse 2, https://issue.imarine.research-infrastructures.eu/ticket/2050
The ontological analysis of FLOD (+ references to documents/ wiki pages/ tickets). This activity is associated to the ticket https://issue.imarine.research-infrastructures.eu/ticket/888
+
* #2255- Investigation on Automation and quality improvement of the TLO Marine Warehouse, https://issue.imarine.research-infrastructures.eu/ticket/2255
 
+
* #2261- Constructing the MarineTLO warehouse v3, https://issue.imarine.research-infrastructures.eu/ticket/2261
This sub-activity, with the first iteration, has the primary goal to provide a common understanding of the FLOD ontology network. It has been considered necessary for the development of the TLO.
+
* #2319- MarineTLO Version 4.0.0, https://issue.imarine.research-infrastructures.eu/ticket/2319
 
+
* #2433- Constructing the MarineTLO-based warehouse v3+, https://issue.imarine.research-infrastructures.eu/ticket/2433
The '''first draft (04-Dec-2012)''' describing the '''FLOD Ontological Analysis at the first iteration''' can be found here  http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d260861/FLOD%20analysis%20First%20Iteration.pdf
+
* #2807- Constructing the MarineTLO-based warehouse v4, https://issue.imarine.research-infrastructures.eu/ticket/2807
 
+
* #2982- Supporting the exploitation of MarineTLO-based warehouse v4, https://issue.imarine.research-infrastructures.eu/ticket/2982
The '''first (07-Dec-2012)''' '''Tconf''' on the '''FLOD Ontological Analysis at the first iteration''' can be found here http://wiki.i-marine.eu/index.php/7.12.2012-TLO_FLOD_Ontological_Analysis
+
 
+
The '''second draft (31-Dec-2012)''' describing the '''FLOD Ontological Analysis at the first iteration''' can be found here  http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d273954/FLOD31122012.pdf
+
 
+
== Ecoscope Ontological Analysis ==
+
The ontological analysis of Ecoscope (+ references to documents/ wiki pages/ tickets). This activity is associated to the ticket https://issue.imarine.research-infrastructures.eu/ticket/889
+
 
+
This sub-activity, with the first iteration, has the primary goal to provide a common understanding of the ECOSCOPE ontology network. It has been considered necessary for the development of the TLO.
+
 
+
The '''first draft (13-Dec-2012)''' describing the '''ECOSCOPE Ontological Analysis''' at the first iteration can be found here http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d261574/MainECOSCOPE.pdf
+
 
+
 
+
The '''first (11-Jan-2013) Tconf''' on the ECOSCOPE Ontological Analysis at the first iteration can be found here http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d274281/ECOSCOPEmeetingWithJulien10012013.docx
+
 
+
== TLO Design ==
+
The activities carried out towards the design of TLO (+ references to documents / wiki pages/ tickets). This activity is associated to the ticket https://issue.imarine.research-infrastructures.eu/ticket/890
+
 
+
The '''first draft (21-Dec-2012)''' describing the TLO design at the first iteration can be found here http://bscw.research-infrastructures.eu/bscw/bscw.cgi/261904
+
 
+
The '''second draft (24-Dec-2012)''' describing the TLO design at the first iteration can be found here http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d261983/TLO_draft(0.2).doc
+
 
+
== TLO Implementation ==
+
The activities carried out for the implementation of TLO (+ references to documents/ wiki pages/ tickets). This activity is associated to the ticket https://issue.imarine.research-infrastructures.eu/ticket/891
+
 
+
A SPARQL endpoint has been set at the following address http://139.91.183.78:8890/sparql. Based on this, it is possible to run the set of queries available here http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d261933/TLO-Set%20of%20Queries%20firt%20iteration.pdf
+
 
+
== TLO Results ==
+
 
+
=== TLO Usage ===
+
Activities related to the usage of TLO (+ references to documents/ wiki pages/ tickets). This activity is associated to the ticket https://issue.imarine.research-infrastructures.eu/ticket/900
+
 
+
Here, we describe possible scenarios in which TLO can be evaluated. Currently, we identify the following:
+
 
+
* TLO as meta-model for FLOD and ECOSCOPE
+
* TLO as knowledge model for semantic search in X-search
+
** Suppose that a user is looking for publications about tuna, specifically experiments that were applied to several species of tuna. The user submits the query "tuna" and gets a sorted list of results and various categories of entities such as Regional Fisheries Body, Species, FAO Country, etc. The user realizes that the category Species may contain interesting entities and notices an entity with the label "skipjack tuna", a medium-sized fish in the tuna family found in tropical and warm-temperate waters. To learn more about that species, specifically the other species for which the skipjack tuna is predator or prey, the user clicks the icon next to the entity's name and retrieves this information in real time. In the back end, a SPARQL query is sent to the TLO's endpoint asking for that information. Note that the 'Species' have been derived from FLOD, while the properties 'is predator of' and 'is prey of' have been derived from ECOSCOPE's knowledge base. This would be impossible without exploiting the TLO.
+
 
+
 
+
* TLO as testing model for Project Cultural Geosemantics of Gerald Hiebel
+
**The Project Cultural Geosemantics of Gerald Hiebel is supported by a Marie Curie Intra-European Fellowship and investigates a methodology to integrate CIDOC CRM with OGC GeoSPARQL in order to represent spatial data within the ontology. The project includes an in-depth analysis of the proposed methodology on archaeological data. To support the validity of the methodology in other domains, it will be tested on a conceptual level to represent spatial concepts within the TLO ontology proposed for iMarine. Selected iMarine data sets will be used to test the hypothesis.
+
 
+
=== TLO Evaluation ===
+
Activities related to the evaluation of TLO (+ references to documents/ wiki pages/ tickets). This activity is associated to the ticket https://issue.imarine.research-infrastructures.eu/ticket/892
+
 
+
 
+
Currently, we are working on the TLO-Populating activity. To support this activity, I have started to design the OWL output structure of the information we need according to the current version of the TLO. Below you can find two links:
+
 
+
1) To populate just the class Species, please follow the link http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d275986/OutputStructureToPopulateTLO-Species.rtf 
+
 
+
2) To populate the class Species with predator and prey relationships, please follow the link http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d275990/OutputStructureToPopulateTLO-SpeciesWithPredatorAndPrey.rtf
+
 
+
3) I am willing to discuss other types of output, depending on the data you can provide.
+
 
+
 
+
 
+
Here, we report the results of the evaluation of the TLO for the specific usage scenario.
+
 
+
== TLO Products ==
+
 
+
=== The last TLO Version, 13/11/2012 ===
+
http://bscw.research-infrastructures.eu/bscw/bscw.cgi/d260671/TLO-Ontology_20121113.owl
+
 
+
=== Previous TLO Versions ===
+
 
+
http://bscw.research-infrastructures.eu/bscw/bscw.cgi/260815
+
 
+
== TLO Related Tickets ==
+
 
+
=== First Iteration ===
+
* FLOD Ontological Analysis: https://issue.imarine.research-infrastructures.eu/ticket/888 '''CLOSED'''
+
 
+
* Ecoscope Ontological Analysis: https://issue.imarine.research-infrastructures.eu/ticket/889 '''CLOSED'''
+
 
+
* TLO Design: https://issue.imarine.research-infrastructures.eu/ticket/890 '''CLOSED'''
+
 
+
* TLO Implementation: https://issue.imarine.research-infrastructures.eu/ticket/891 '''CLOSED'''
+
 
+
* TLO Results: TLO Usage https://issue.imarine.research-infrastructures.eu/ticket/900 '''CLOSED'''
+
 
+
* TLO Evaluation, https://issue.imarine.research-infrastructures.eu/ticket/892 '''OPEN'''
+
 
+
* TLO Instantiation, https://issue.imarine.research-infrastructures.eu/ticket/1220 '''OPEN'''
+

Latest revision as of 11:28, 30 October 2014

Introduction

In a nutshell

One of the main characteristics of biodiversity data is its cross-disciplinary nature and the extremely broad range of data types, structures, and semantic concepts that it encompasses. Moreover, biodiversity data, especially in the marine domain, is widely distributed, with few well-established repositories or standard protocols for its archiving, access, and retrieval. Queries like “Given the scientific name of a species, find its predators with the related taxon-rank classification and with the different codes that the organizations use to refer to them” cannot be formulated (and consequently cannot be answered) using any individual source. To formulate such queries we need an expressive conceptual model, while to answer them we also have to assemble pieces of information stored in different sources. To fill this gap, we have designed and implemented a top level ontology, called Marine Top Level Ontology (for short MarineTLO).

Motivating Scenarios

The availability of a top level ontology for the marine domain would be useful in various scenarios. Below we will describe them.

For Publishing Linked Data: There is a trend towards publishing Linked Data, which raises the question of which structure is beneficial to use for such publishing. The semantic structure presented here can be used by the involved organizations to anticipate future needs for information integration, thus reducing the effort required for (post) integration.

For Generating Fact Sheets: FactSheetGenerator is an application provided by IRD that aims to provide factual knowledge about the marine domain by mashing up relevant knowledge distributed across several data sources. Currently, FactSheetGenerator uses only ECOSCOPE, so related knowledge stored in other sources (e.g., about commercial codes or taxonomic information) cannot be exploited. MarineTLO could be used to advance this application, i.e., to provide more complete semantic descriptions.

For Semantic Post-Processing of the Results of Keyword Search Queries: Another big challenge nowadays is how to integrate structured data with unstructured data (documents and text). The availability of harmonized structured knowledge about the marine domain can be exploited for semantic post-processing of search results (over dedicated or general-purpose search systems). XSearch is a meta-search engine that offers semantic post-processing of search results and can analyze the returned results by also exploiting the availability of semantic repositories (e.g. SPARQL endpoints). XSearch could exploit MarineTLO to provide more complete information about the identified entities.

For Enabling Complex Query Services over Integrated Data: MarineTLO can be used as the schema for setting up integrated repositories that offer more complex query services, which cannot be supported by the individual underlying sources. In general, there are two main approaches for building and querying such repositories: the materialized integration approach (or warehouse approach), and the virtual integration (or mediator) approach (more information about these approaches can be found in [Tzitzikas-MTSR'13]). The key point is that in both cases we need a schema and MarineTLO can serve this requirement.
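
The difference between the two approaches can be illustrated with a minimal Python sketch. It assumes two toy in-memory sources holding (subject, property, object) triples; the identifiers such as species:A, hasCode and isPredatorOf, as well as the helper names, are hypothetical and are not taken from MarineTLO or from the actual sources. In the warehouse variant the triples are copied into one store before any query is asked, whereas in the mediator variant each query is forwarded to the sources at the time it is asked.

```python
# Toy stand-ins for two independent sources. All identifiers are
# hypothetical placeholders, not real MarineTLO terms or real data.
SOURCE_A = [
    ("species:A", "hasCode", "code:A1"),
    ("species:B", "hasCode", "code:B1"),
]
SOURCE_B = [
    ("species:B", "isPredatorOf", "species:A"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def build_warehouse(sources):
    """Materialized integration: copy every source into one store up front."""
    store = set()
    for source in sources:
        store.update(source)
    return store


def mediator_query(sources, s=None, p=None, o=None):
    """Virtual integration: forward the pattern to each source at query time."""
    results = set()
    for source in sources:
        results.update(match(source, s, p, o))
    return results


sources = [SOURCE_A, SOURCE_B]

# Warehouse: integrate once, then query the integrated store.
warehouse = build_warehouse(sources)
from_warehouse = set(match(warehouse, s="species:B"))

# Mediator: nothing is copied; every query touches the live sources.
from_mediator = mediator_query(sources, s="species:B")

print(from_warehouse == from_mediator)
```

In both variants the query pattern is written against one shared vocabulary, which is the role that the schema plays in the paragraph above. Real systems additionally have to translate each source's own terms into that vocabulary, which this sketch leaves out.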

MarineTLO as a product

We used a set of underlying sources for integrating their concepts in MarineTLO. Below we briefly describe these sources, and then describe the ontology MarineTLO and its corresponding releases.

The main underlying sources

Fisheries Linked Open Data: FLOD, created and maintained by the Food and Agriculture Organization (FAO), is dedicated to creating a dense network of relationships among the entities of the fishery domains, and to programmatically serving them to semantic and traditional application environments. The FLOD content is exposed either via a public SPARQL endpoint[1] (suitable for semantic applications) or via a Java API to be embedded in consumers’ application code. Currently, the FLOD network includes entities and relationships from the domains of Marine Species, Water Areas, Land Areas, and Exclusive Economic Zones, and serves software applications in the domains of statistics and GIS.

ECOSCOPE Knowledge Base: IRD offers a public SPARQL endpoint[2] for its knowledge base containing geographical data, pictures and information about marine ecosystems (specifically data about fishes, sharks, related persons, countries and organizations, harbors, vessels, etc.).

WoRMS: The World Register of Marine Species[3] currently contains more than 200 thousand species, around 380 thousand species names including synonyms, and 470 thousand taxa (infraspecies to kingdoms).

FishBase: FishBase[4] is a global database of fish species. It is a relational database containing information about taxonomy, geographical distribution, biometrics, population, genetic data, and more. Currently, it contains more than 32 thousand species and more than 300 thousand common names in various languages.

DBpedia: DBpedia[5] is a project that converts content from Wikipedia into structured knowledge so that Semantic Web techniques can be applied to it. At the time of writing, the English version of the DBpedia knowledge base describes more than 4.5 million things, including persons, places, works, species, etc. In our case, we use a subset of DBpedia’s knowledge base containing only fishes (i.e. instances classified under the class http://dbpedia.org/ontology/Fish).
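As an illustration of how such a subset could be selected, the sketch below builds a SPARQL query for instances of the class named above and encodes it as a GET request. The function names are hypothetical, and the endpoint address https://dbpedia.org/sparql is an assumption that is not stated on this page. The code only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# Class URI given in the text above.
FISH_CLASS = "http://dbpedia.org/ontology/Fish"


def build_fish_query(limit=10):
    """Return a SPARQL query selecting instances of the Fish class."""
    return (
        "SELECT ?fish WHERE { "
        f"?fish a <{FISH_CLASS}> . "
        "} "
        f"LIMIT {limit}"
    )


def build_request_url(endpoint, query):
    """Encode the query as a GET request: in the SPARQL protocol the
    query text travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = build_fish_query(limit=5)
# The endpoint address is an assumption, not taken from this page.
url = build_request_url("https://dbpedia.org/sparql", query)
print(query)
print(url)
```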

The Marine Top Level Ontology

MarineTLO is not supposed to be a single ontology covering the entirety of what exists. It aims at being a global core model that (a) covers with suitable abstractions the domains under consideration to enable the most fundamental queries, (b) can be extended to any level of detail on demand, and (c) can adequately map and integrate data originating from distinct sources. This approach has two main benefits:

  • reduced effort for improving and evolving it, since the focus is on one model rather than many
  • reduced effort for constructing mappings, since this approach avoids pair-wise mappings between individual metadata formats and/or ontologies.
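The second benefit can be made concrete with a small count. The sketch assumes one mapping per unordered pair of sources in the pair-wise case (if a separate mapping is needed per direction, that figure doubles) and one mapping per source when everything is mapped to a single core model. The function names are illustrative.

```python
def pairwise_mappings(n_sources):
    """Mappings needed when every pair of sources is mapped directly
    (one mapping per unordered pair)."""
    return n_sources * (n_sources - 1) // 2


def core_model_mappings(n_sources):
    """Mappings needed when each source is mapped once to a shared core model."""
    return n_sources


for n in (3, 5, 10):
    print(n, pairwise_mappings(n), core_model_mappings(n))
```

For the five sources described above, this gives 10 pair-wise mappings against 5 mappings to the core model, and the gap grows quadratically as sources are added.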

For the development and evolution of MarineTLO we have adopted an iterative and incremental methodology comprising the following steps: (i) ontological analysis of the underlying sources, (ii) design, (iii) implementation and (iv) evaluation. The activities of each iteration have been monitored through corresponding tickets, which are listed in the Related tickets section. For the implementation of MarineTLO we have used OWL 2 (Web Ontology Language), while for evaluation we have used the notion of competence queries.
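To give an idea of what evaluating with a competence query can look like, the sketch below checks whether a toy data set can answer the question "given the scientific name of a species, find its predators". This is a simplified reading of the notion, not the actual evaluation procedure. The property names (hasScientificName, isPreyOf), the identifiers and the species names are invented for illustration and are not MarineTLO terms.

```python
# Toy triples (subject, predicate, object); all names are hypothetical.
TRIPLES = {
    ("sp:1", "hasScientificName", "Exampleus preyus"),
    ("sp:2", "hasScientificName", "Exampleus predatorus"),
    ("sp:1", "isPreyOf", "sp:2"),
}


def predators_of(scientific_name, triples):
    """Return the scientific names of the predators of the named species."""
    # Step 1: resolve the scientific name to species identifiers.
    species = {s for s, p, o in triples
               if p == "hasScientificName" and o == scientific_name}
    # Step 2: follow the prey-to-predator relation.
    predator_ids = {o for s, p, o in triples
                    if p == "isPreyOf" and s in species}
    # Step 3: return the predators' scientific names.
    return sorted(o for s, p, o in triples
                  if p == "hasScientificName" and s in predator_ids)


def passes_competence_query(triples):
    """The model passes if it can express the query and return the expected answer."""
    return predators_of("Exampleus preyus", triples) == ["Exampleus predatorus"]


print(passes_competence_query(TRIPLES))
```

A model that lacked, say, a relation between prey and predator could not express step 2, so the query would fail, which signals a gap to address in the next iteration.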

Releases

In total we have released four versions of MarineTLO. Each release covers a different set of concepts and underlying sources. Below we report the contents, some basic information, and several links (e.g., documentation) for each version.

MarineTLO Versions
Version | Classes and Properties | Underlying Sources | Concepts covered | Links (OWL File, Documentation, Mappings, Competence Queries) | Release Date
Version 1 | 17 classes and 8 properties | FLOD, ECOSCOPE, WoRMS | Species, Scientific Names, Predators | http://goo.gl/ukxmAv http://goo.gl/kEfp8g http://goo.gl/3sMdwR | March 2013
Version 2 | 57 classes and 22 properties | FLOD, ECOSCOPE, WoRMS, DBpedia | Species, Scientific Names, Predators, Authorships | http://goo.gl/JTh8p9 http://goo.gl/Y9bGFy http://goo.gl/I6STNv http://goo.gl/3ZNqb4 | July 2013
Version 3 | 57 classes and 25 properties | FLOD, ECOSCOPE, WoRMS, DBpedia, FishBase | Species, Scientific Names, Common Names, Predators, Authorships, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ | http://goo.gl/J15OCE http://goo.gl/FygSd6 http://goo.gl/vxESUF http://goo.gl/yTaz7O | October 2013
Version 4 | 127 classes and 81 properties | FLOD, ECOSCOPE, WoRMS, DBpedia, FishBase | Species, Scientific Names, Common Names, Predators, Authorships, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ, Bibliography, Statistical Indicators | http://goo.gl/Yh1Uot http://goo.gl/KIFY6e http://goo.gl/WU3xkG http://goo.gl/KIFY6e | July 2014

References

Related Links/Publications

  • MarineTLO public website http://www.ics.forth.gr/isl/MarineTLO/
  • MarineTLO-based Warehouse wiki page http://wiki.i-marine.eu/index.php/MarineTLO-based_warehouse
  • [Tzitzikas-MTSR'13] Y. Tzitzikas, C. Allocca, C. Bekiari, Y. Marketakis, P. Fafalios, M. Doerr, N. Minadakis, T. Patkos and L. Candela. Integrating Heterogeneous and Distributed Information about Marine Species through a Top Level Ontology, Proceedings of the 7th Metadata and Semantic Research Conference, MTSR'13, Thessaloniki, Greece, November 2013.
  • [Tzitzikas-ERCIM'14] Y. Tzitzikas, C. Allocca, C. Bekiari, Y. Marketakis, P. Fafalios and N. Minadakis. Ontology-based Integration of Heterogeneous and Distributed Information of the Marine Domain, ERCIM News 2014 (96), Special theme: Linked Open Data, January 2014.
  • [Tzitzikas-LWDM'14] Y. Tzitzikas, N. Minadakis, Y. Marketakis, P. Fafalios, C. Allocca and M. Mountantonakis. Quantifying the Connectivity of a Semantic Warehouse, 4th International Workshop on Linked Web Data Management, LWDM'14, Athens, Greece, March 2014.
  • [Moutantonakis-PROFILES'14] M. Mountantonakis, C. Allocca, P. Fafalios, N. Minadakis, Y. Marketakis, C. Lantzaki and Y. Tzitzikas. Extending VoID for Expressing the Connectivity Metrics of a Semantic Warehouse, 1st International Workshop on Dataset Profiling & Federated Search for Linked Data (PROFILES'14), in conjunction with the 11th Extended Semantic Web Conference (ESWC'14), Anissaras Hersonissou, Crete, Greece, May 2014.
  • [Tzitzikas-ESCW'14] Y. Tzitzikas, N. Minadakis, Y. Marketakis, P. Fafalios, C. Allocca, M. Mountantonakis and I. Zidianaki. MatWare: Constructing and Exploiting Domain Specific Warehouses by Aggregating Semantic Data, 11th Extended Semantic Web Conference (ESWC'14), Anissaras Hersonissou, Crete, Greece, May 2014.

Related Meetings & Minutes

Related tickets