Difference between revisions of "Top Level Ontology"

Revision as of 10:23, 30 October 2014

Introduction

In a nutshell

One of the main characteristics of biodiversity data is its cross-disciplinary nature and the extremely broad range of data types, structures, and semantic concepts that it encompasses. Moreover, biodiversity data, especially in the marine domain, is widely distributed, with few well-established repositories or standard protocols for its archiving, access, and retrieval. Queries like "Given the scientific name of a species, find its predators with the related taxon-rank classification and with the different codes that organizations use to refer to them" cannot be formulated, and consequently cannot be answered, by any individual source. To formulate such queries we need an expressive conceptual model, while to answer them we also have to assemble pieces of information stored in different sources. To fill this gap, we have designed and implemented a top level ontology, called the Marine Top Level Ontology (MarineTLO for short).

Motivating Scenarios

A top level ontology for the marine domain would be useful in various scenarios, which are described below.

For Publishing Linked Data: There is a trend towards publishing Linked Data; consequently, a rising issue concerns which structure is beneficial to use for such publishing. The semantic structure presented here can be used by the involved organizations to anticipate future needs for information integration, and thus to reduce the effort required for (post) integration.

For Generating Fact Sheets: FactSheetGenerator is an application provided by IRD that aims at providing factual knowledge about the marine domain by mashing up relevant knowledge distributed across several data sources. Currently, FactSheetGenerator uses only ECOSCOPE, and related knowledge stored in other sources (e.g., about commercial codes or taxonomic information) cannot be exploited. MarineTLO could be exploited to advance this application, i.e., to provide more complete semantic descriptions.

For Semantic Post-Processing of the Results of Keyword Search Queries: Another big challenge nowadays is how to integrate structured data with unstructured data (documents and text). The availability of harmonized structured knowledge about the marine domain can be exploited for semantic post-processing of search results (over dedicated or general purpose search systems). XSearch is a meta-search engine that offers semantic post-processing of search results and is able to analyze the returned results by also exploiting available semantic repositories (e.g. SPARQL endpoints). XSearch could exploit MarineTLO to provide more complete information about the identified entities.

For Enabling Complex Query Services over Integrated Data: MarineTLO can be used as the schema for setting up integrated repositories that offer more complex query services, which cannot be supported by the individual underlying sources. In general, there are two main approaches for building and querying such repositories: the materialized integration approach (or warehouse approach), and the virtual integration (or mediator) approach (more information about these approaches can be found in [Tzitzikas-MTSR'13]). The key point is that in both cases we need a schema and MarineTLO can serve this requirement.
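The difference between the two approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two in-memory sources, their triples, the predicate names isPredatorOf and hasCode, and the helper functions are hypothetical and are not taken from MarineTLO or from the actual sources. The mediator is also deliberately simplistic, since it just reads all sources at query time, whereas a real mediator would typically translate the query into source-specific sub-queries.

```python
# Illustrative only: sources, triples, and predicate names are hypothetical.
from itertools import chain

# Each source is a list of (subject, predicate, object) triples that are
# assumed to be already mapped to one shared schema.
SOURCE_A = [
    ("species:A", "isPredatorOf", "species:B"),
    ("species:C", "isPredatorOf", "species:B"),
]
SOURCE_B = [
    ("species:A", "hasCode", "AAA"),
    ("species:C", "hasCode", "CCC"),
]
SOURCES = [SOURCE_A, SOURCE_B]


def predators_with_codes(triples, prey):
    """Return sorted (predator, code) pairs for the given prey."""
    triples = list(triples)
    predators = {s for s, p, o in triples if p == "isPredatorOf" and o == prey}
    return sorted((s, o) for s, p, o in triples if p == "hasCode" and s in predators)


def build_warehouse(sources):
    """Materialized integration: copy every triple into one store up front."""
    return list(chain.from_iterable(sources))


def mediator_query(sources, prey):
    """Virtual integration: read the sources only when the query is asked."""
    return predators_with_codes(chain.from_iterable(sources), prey)


warehouse = build_warehouse(SOURCES)
materialized_answer = predators_with_codes(warehouse, "species:B")
virtual_answer = mediator_query(SOURCES, "species:B")
```

In this sketch both approaches return the same answer, while neither source can answer the query on its own. In both cases the query is only expressible because the triples share one schema, which is the role MarineTLO is meant to play.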

MarineTLO as a product

MarineTLO integrates concepts from a set of underlying sources. Below we briefly describe these sources, and then describe the MarineTLO ontology and its releases.

The main underlying sources

Fisheries Linked Open Data: FLOD, created and maintained by the Food and Agriculture Organization (FAO), aims to create a dense network of relationships among the entities of the fishery domain, and to programmatically serve them to semantic and traditional application environments. The FLOD content is exposed either via a public SPARQL endpoint[1] (suitable for semantic applications) or via a Java API to be embedded in consumers’ application code. Currently, the FLOD network includes entities and relationships from the domains of Marine Species, Water Areas, Land Areas, and Exclusive Economic Zones, and serves software applications in the domain of statistics and GIS.

ECOSCOPE Knowledge Base: IRD offers a public SPARQL endpoint[2] for its knowledge base containing geographical data, pictures and information about marine ecosystems (specifically data about fishes, sharks, related persons, countries and organizations, harbors, vessels, etc.).

WoRMS: The World Register of Marine Species[3] currently contains more than 200 thousand species, around 380 thousand species names including synonyms, and 470 thousand taxa (infraspecies to kingdoms).

FishBase: FishBase[4] is a global database of fish species. It is a relational database containing information about taxonomy, geographical distribution, biometrics, population, genetic data, and more. Currently, it contains more than 32 thousand species and more than 300 thousand common names in various languages.

DBpedia: DBpedia[5] is a project that converts content from Wikipedia into structured knowledge so that Semantic Web techniques can be employed against it. At the time of writing this article, the English version of the DBpedia knowledge base describes more than 4.5 million things, including persons, places, works, species, etc. In our case, we use a subset of DBpedia’s knowledge base containing only fishes (i.e. instances classified under the class http://dbpedia.org/ontology/Fish).

The Marine Top Level Ontology

MarineTLO is not supposed to be a single ontology covering the entirety of what exists. It aims at being a global core model that (a) covers with suitable abstractions the domains under consideration to enable the most fundamental queries, (b) can be extended to any level of detail on demand, and (c) can adequately map and integrate data originating from distinct sources. This approach has two main benefits:

reduced effort for improving and evolving it, since the focus is on one model rather than many;
reduced effort for constructing mappings, since this approach avoids pair-wise mappings between individual metadata formats and/or ontologies.
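The second benefit can be made concrete with a little arithmetic. The sketch below assumes that one mapping is built per pair of sources in the pair-wise case and one mapping per source in the core-model case; the function names are hypothetical and serve only this illustration.

```python
# Illustrative arithmetic only; the function names are hypothetical.
def pairwise_mappings(n):
    """Mappings needed when every pair of n sources is mapped directly."""
    return n * (n - 1) // 2


def core_model_mappings(n):
    """Mappings needed when each of n sources is mapped once to a core model."""
    return n


for n in (3, 5, 10):
    print(n, pairwise_mappings(n), core_model_mappings(n))
```

Under these assumptions, five sources need 10 pair-wise mappings but only 5 mappings to a core model, and the gap widens as more sources are added.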

For the development and evolution of MarineTLO we have adopted an iterative and incremental methodology comprising the following steps: (i) ontological analysis of the underlying sources, (ii) design, (iii) implementation, and (iv) evaluation. The activities of each iteration have been monitored by opening corresponding tickets. The section "Related tickets" lists the tickets that have been opened. For the implementation of MarineTLO we have used OWL 2 (Web Ontology Language), while for evaluation we used the notion of competence queries.

Releases

In total we released 4 versions of MarineTLO. Each release covers a set of concepts drawn from a set of underlying sources. Below we report the contents of each version, some basic information, and links to its documentation.

MarineTLO Versions
Version	Classes and Properties	Underlying Sources	Concepts covered	OWL File	Documentation	Mappings	Competence Queries	Release Date
Version 1	17 classes and 8 properties	FLOD, ECOSCOPE, WoRMS	Species, Scientific Names, Predators	http://goo.gl/ukxmAv		http://goo.gl/kEfp8g	http://goo.gl/3sMdwR	March 2013
Version 2	57 classes and 22 properties	FLOD, ECOSCOPE, WoRMS, DBpedia	Species, Scientific Names, Predators, Authorships	http://goo.gl/JTh8p9	http://goo.gl/Y9bGFy	http://goo.gl/I6STNv	http://goo.gl/3ZNqb4	July 2013
Version 3	57 classes and 25 properties	FLOD, ECOSCOPE, WoRMS, DBpedia, FishBase	Species, Scientific Names, Common Names, Predators, Authorships, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ	http://goo.gl/J15OCE	http://goo.gl/FygSd6	http://goo.gl/vxESUF	http://goo.gl/yTaz7O	October 2013
Version 4	127 classes and 81 properties	FLOD, ECOSCOPE, WoRMS, DBpedia, FishBase	Species, Scientific Names, Common Names, Predators, Authorships, Ecosystems, Countries, Water Areas, Vessels, Gears, EEZ, Bibliography, Statistical Indicators	http://goo.gl/Yh1Uot	http://goo.gl/KIFY6e	http://goo.gl/WU3xkG	http://goo.gl/KIFY6e	July 2014

References

Related Meetings & Minutes

3rd TCOM, Rome, Italy, 03.11.2012
- Minutes: http://wiki.i-marine.eu/index.php/03.11.2012.MeetingInRome
FLOD Ontological analysis virtual meeting
- Minutes: http://wiki.i-marine.eu/index.php/7.12.2012-TLO_FLOD_Ontological_Analysis
4th TCOM, Ostende, Belgium, 29.01.2013
- Slides: http://goo.gl/JuCrbV
- Minutes: http://wiki.i-marine.eu/index.php/4th_TCom_Meeting:_25th_January_2013_Discussions_and_Notes
Semantic Cluster virtual meeting, 13.02.2013
- Documents: http://goo.gl/8wbfTK, http://goo.gl/uPS4Ck
- Minutes: http://wiki.i-marine.eu/index.php/13.02.2013_Semantic_Cluster
5th TCOM, Pisa, Italy, 21.03.2013
- Slides: http://goo.gl/mHWYzg
- Minutes: http://wiki.i-marine.eu/index.php/5th_TCom_Meeting:_22nd_March_2013_Discussions_and_Notes
5th TCOM, Rome, Italy, 26.03.2013
- Slides: http://goo.gl/Kqm25t
- Minutes: http://wiki.i-marine.eu/index.php/5th_TCom_Meeting:_28th_March_2013_Discussions_and_Notes
6th TCOM, Skiathos, Greece, 19.06.2013
- Slides: http://goo.gl/SNiw3c
- Minutes: http://wiki.i-marine.eu/index.php/6th_TCom_Meeting:_19th_June_2013_Discussions_and_Notes
7th TCOM, Rome, Italy, 15.10.2013
- Slides: http://goo.gl/jL1JZ7
- Minutes: http://wiki.i-marine.eu/index.php/7th_TCom_Meeting:_15th_October_2013_Discussions_and_Notes
8th TCOM, Athens, Greece, 04.02.2014
- Slides: http://goo.gl/4MAsqP
- Minutes: http://wiki.i-marine.eu/index.php/8th_TCom_Meeting:_4th_February_2014_Discussions_and_Notes
9th TCOM, Heraklion, Greece, 09.07.2014
- Slides: http://goo.gl/1Q3Pph

Related tickets

#224- Towards a top-level ontology for FAO & IRD, https://issue.imarine.research-infrastructures.eu/ticket/224
#888- FLOD Ontological Analysis, https://issue.imarine.research-infrastructures.eu/ticket/888
#889- ECOSCOPE Ontological Analysis, https://issue.imarine.research-infrastructures.eu/ticket/889
#890- TLO Design: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/890
#891- TLO Implementation: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/891
#892- TLO Evaluation: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/892
#900- TLO Usage: First Iteration, https://issue.imarine.research-infrastructures.eu/ticket/900
#1220- TLO Population: SPD, https://issue.imarine.research-infrastructures.eu/ticket/1220
#1451- TLO Examples for species and related SPARQL queries, https://issue.imarine.research-infrastructures.eu/ticket/1451
#1603- The new release of TLO, https://issue.imarine.research-infrastructures.eu/ticket/1603
#1764- TLO Documentation, https://issue.imarine.research-infrastructures.eu/ticket/1764
#1848- Documentation of the process used for creating TLO-based warehouses, https://issue.imarine.research-infrastructures.eu/ticket/1848
#1932- Building the new MarineTLO Warehouse, https://issue.imarine.research-infrastructures.eu/ticket/1932
#2046- MarineTLO Version 3.0.0, https://issue.imarine.research-infrastructures.eu/ticket/2046
#2050- Actions for exploiting the MarineTLO-based warehouse 2, https://issue.imarine.research-infrastructures.eu/ticket/2050
#2255- Investigation on Automation and quality improvement of the TLO Marine Warehouse, https://issue.imarine.research-infrastructures.eu/ticket/2255
#2261- Constructing the MarineTLO warehouse v3, https://issue.imarine.research-infrastructures.eu/ticket/2261
#2319- MarineTLO Version 4.0.0, https://issue.imarine.research-infrastructures.eu/ticket/2319
#2433- Constructing the MarineTLO-based warehouse v3+, https://issue.imarine.research-infrastructures.eu/ticket/2433
#2807- Constructing the MarineTLO-based warehouse v4, https://issue.imarine.research-infrastructures.eu/ticket/2807
#2982- Supporting the exploitation of MarineTLO-based warehouse v4, https://issue.imarine.research-infrastructures.eu/ticket/2982

@@ Line 14: / Line 14: @@
 '''For Semantic Post-Processing of the Results of Keyword Search Queries:''' Another big challenge nowadays is how to integrate structured data with unstructured data (documents and text). The availability of harmonized structured knowledge about the marine domain can be exploited for a semantic post-processing of the search results (over dedicated or general purpose search systems). [http://wiki.i-marine.eu/index.php/XSearch XSearch] is a meta-search engine that offers semantic post-processing of search results and is able to analyze the returned results by exploiting also the availability of semantic repositories (e.g. SPARQL endpoints). Xsearch could exploit MarineTLO for providing more complete information about the identified entities.
-'''For Enabling Complex Query Services over Integrated Data:''' MarineTLO can be used as the schema for setting up integrated repositories that offer more complex query services, which cannot be supported by the individual underlying sources. In general, there are two main approaches for building and querying such repositories: the materialized integration approach (or warehouse approach), and the virtual integration (or mediator) approach (more information about these sources can be found in '''REF_XXX'''). The key point is that in both cases we need a schema and MarineTLO can serve this requirement.
+'''For Enabling Complex Query Services over Integrated Data:''' MarineTLO can be used as the schema for setting up integrated repositories that offer more complex query services, which cannot be supported by the individual underlying sources. In general, there are two main approaches for building and querying such repositories: the materialized integration approach (or warehouse approach), and the virtual integration (or mediator) approach (more information about these approaches can be found in '''[Tzitzikas-MTSR'13]'''). The key point is that in both cases we need a schema and MarineTLO can serve this requirement.
 == MarineTLO as a product ==
