Blue Hackathon iMarine Data Challenges

From D4Science Wiki
Revision as of 12:36, 28 June 2013 by Andrea.manzi (Talk | contribs) (Ecoscope)


Data Challenges

Challenge #1

Enrich HTML web content with RDF annotation, and enable annotation-based document discovery

Background

(Why this is relevant to blue-er world)

Objectives
  1. We ask the hackathon participants to find a technical solution to enrich the factsheets of the FIGIS portal with annotations in RDFa format. Can we write there? The annotations will consist at least of the URIs of the entities referenced in the factsheet, and of a set of relevant relations provided with the datasets.
    1. GOAL: an RDFa client will be able to extract (is that the correct term?) the annotations
  2. We ask the hackathon participants to use the annotations produced in item 1 as input to an online search of factsheets (publications, GIS maps, images, statistical time series), to create an enhanced discovery facility that complements the information content of the web page.
    1. GOAL: a set of factsheets is retrieved via online search services.
    2. GOAL: Access to the factsheets
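As a minimal sketch of the first objective, the following Java snippet wraps an entity mention in an RDFa-annotated span that carries the entity URI. The class and method names, the example URI and the choice of rdfs:label as the property are assumptions made for illustration; they are not part of the FIGIS portal.

```java
/** Hypothetical helper: wraps an entity mention in an RDFa-annotated span. */
final class RdfaAnnotator {

    // Full IRI of rdfs:label, used so that no prefix declaration is needed.
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    /** Escapes the characters that are unsafe in HTML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /**
     * Builds a span whose "about" attribute names the entity URI and whose
     * "property" attribute marks the visible text as the label of that entity.
     */
    static String annotate(String entityUri, String label) {
        return "<span about=\"" + escape(entityUri) + "\" property=\""
                + RDFS_LABEL + "\">" + escape(label) + "</span>";
    }

    public static void main(String[] args) {
        // Placeholder URI, not a real FIGIS or FLOD identifier.
        System.out.println(annotate("http://example.org/species/1", "Thunnus albacares"));
    }
}
```

An RDFa client reading the resulting markup should then obtain one triple per annotated mention, with the entity URI as subject.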
Challenges

TBD

Datasets

TBD

APIs

TBD


Challenge #2

Generate an RDF dataset from GIS layers in GeoNetwork, and map the geographic entities to existing LOD datasets

Background

(Why this is relevant to blue-er world)

Objectives
  1. We ask the hackathon participants to find a technical solution to produce a LOD dataset from a collection of GIS layers accessed via GeoNetwork web services. The entities of the dataset will have to be mapped to existing LOD datasets in the GIS domain.
    1. GOAL: given an online service that lists a collection of GIS layers, a new LOD dataset is produced.
    2. GOAL: enrich the geographic entities in that dataset with more data gathered through the mapping to existing LOD GIS datasets (e.g. GeoNames, Geopolitical Ontology, DBpedia, etc.).
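The two goals above could be sketched as follows: one geographic entity is serialised as N-Triples, with an optional owl:sameAs link to an entity in an external LOD dataset. The class name, the method names and all example URIs are hypothetical, and using owl:sameAs as the mapping relation is an assumption; a weaker relation may be more appropriate in practice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: serialises one geographic entity as N-Triples. */
final class EntityToTriples {

    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
    static final String OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs";

    /** Quotes a string as an N-Triples literal, escaping \, " and newlines. */
    static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\")
                       .replace("\"", "\\\"")
                       .replace("\n", "\\n") + "\"";
    }

    /**
     * Emits a label triple and, if a matching external entity is known,
     * a triple linking the local entity to it.
     */
    static List<String> toTriples(String entityUri, String title, String externalUri) {
        List<String> triples = new ArrayList<>();
        triples.add("<" + entityUri + "> <" + RDFS_LABEL + "> " + literal(title) + " .");
        if (externalUri != null) {
            triples.add("<" + entityUri + "> <" + OWL_SAME_AS + "> <" + externalUri + "> .");
        }
        return triples;
    }

    public static void main(String[] args) {
        // Both URIs are placeholders chosen for this example.
        for (String t : toTriples("http://example.org/area/1", "Example area",
                                  "http://example.org/external/42")) {
            System.out.println(t);
        }
    }
}
```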
Challenges

TBD

Datasets

TBD

APIs

TBD


Challenge #3

Generate RDF dataset from DarwinCore sources and map to existing biodiversity LOD

Background

(Why this is relevant to blue-er world)

Objectives
  1. We ask the hackathon participants to produce a LOD dataset from a source of DarwinCore data (XML or a service), and map its entities to existing LOD datasets in the biodiversity domain.
    1. GOAL: access information complementary to the taxonomic data through the mappings (e.g. species conservation status, capture statistics, distribution map, etc.)
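A first step towards this objective is reading the DarwinCore source. The sketch below extracts every dwc:scientificName value from an XML document using only the JDK's DOM parser. The class and method names are hypothetical, and the sample record in main is invented for illustration; real sources may wrap records differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical sketch: pulls scientific names out of DarwinCore XML. */
final class DwcNames {

    // Namespace of the Darwin Core terms.
    static final String DWC = "http://rs.tdwg.org/dwc/terms/";

    /** Returns the text of every dwc:scientificName element, in document order. */
    static List<String> scientificNames(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-based lookup
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(DWC, "scientificName");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse DarwinCore XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<records xmlns:dwc=\"" + DWC + "\">"
                + "<record><dwc:scientificName>Thunnus albacares</dwc:scientificName></record>"
                + "</records>";
        System.out.println(scientificNames(xml));
    }
}
```

The extracted names can then serve as lookup keys when searching for matching entities in existing biodiversity LOD datasets.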
Challenges

TBD

Datasets

TBD

APIs

TBD


Challenge #4

Generate dynamic fact-sheets mashing up data from distributed LOD datasets

Background

(Why this is relevant to blue-er world)

Objectives
  1. We ask the hackathon participants to find a technical solution, based on LOD data mashup, to compose domain-based sections of a factsheet, taking data from distributed LOD datasets. The domain of a section can be: economics, taxonomy, fishing technique, statistics, publications, etc.
    1. GOAL: a web service that responds with a collection of data clustered by domain section, and displays the result in HTML format
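The clustering and rendering part of this goal can be sketched as below: facts gathered from several sources are grouped by domain section and rendered as simple HTML. All names (FactsheetRenderer, Fact and its fields) are hypothetical, and the retrieval of the facts from LOD datasets is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: groups facts by domain section and renders HTML. */
final class FactsheetRenderer {

    /** One fact: the domain section it belongs to and the text to display. */
    static final class Fact {
        final String section;
        final String text;
        Fact(String section, String text) { this.section = section; this.text = text; }
    }

    /** Groups fact texts by section, keeping the order of first appearance. */
    static Map<String, List<String>> groupBySection(List<Fact> facts) {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        for (Fact f : facts) {
            sections.computeIfAbsent(f.section, k -> new ArrayList<>()).add(f.text);
        }
        return sections;
    }

    /** Escapes characters that are unsafe in HTML text. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders each section as a heading followed by a bullet list. */
    static String toHtml(Map<String, List<String>> sections) {
        StringBuilder html = new StringBuilder();
        for (Map.Entry<String, List<String>> e : sections.entrySet()) {
            html.append("<h2>").append(escape(e.getKey())).append("</h2><ul>");
            for (String text : e.getValue()) {
                html.append("<li>").append(escape(text)).append("</li>");
            }
            html.append("</ul>");
        }
        return html.toString();
    }
}
```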
Challenges

TBD

Datasets

TBD

APIs

TBD


Challenge #5

Search Results presentation exploitation.

Search results regarding marine data could be enriched to give the user a richer experience. Derived information could be injected into the results by identifying special keywords (related to the query) and combining them with results retrieved via OpenSearch and other external(?) data sources. Exploration of the results could also be improved from simple browsing to information discovery, by providing aggregated information, filtering, suggestions, etc.

Background

(Why this is relevant to blue-er world)

Objectives
  1. We ask the hackathon participants to enrich the search results retrieved from iMarine Collections by identifying special keywords (related to the topic) and combining them with results retrieved via OpenSearch and other external(?) data sources.
  2. We ask the hackathon participants to explore the database by running a number of predefined queries and keeping statistics on them, in order to enhance the existing browsing methods.
Challenges

TBD

Datasets
Ecoscope

[Ecoscope]

APIs

gCube_Search_client

Challenge #6

Processing and Visualization of data sets

Exploit the geolocation of real-world data to calculate and visualize geographical information and trends (e.g. migration of species). Support interactive map search over multiple sources, with combined and enriched results. Search results will be presented on a map, with options such as clustering and filtering. Users could also interact with the results: for example, clicking on a result or location would show related results and other helpful information.

Background

(Why this is relevant to blue-er world)

Objectives
  1. Exploit the species occurrence data to calculate and visualize geographical trends (e.g. migration of species).
  2. Interactive map search. Search over data from multiple sources, combine them and enrich the results. Search results can be presented on a map.
    1. clustering, filtering
    2. trend identification
    3. interaction with results, e.g. clicking on a result or location would show related results and other helpful information
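For the clustering option, one simple approach is to bin occurrence points into a regular latitude/longitude grid and count the points per cell. The sketch below assumes this grid-based technique; it is not a method prescribed by the challenge, and the class name and sample coordinates are invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: counts occurrence points per grid cell. */
final class GridClusterer {

    /** Returns "row:col" for the cell of the given size that contains the point. */
    static String cellKey(double lat, double lon, double cellDegrees) {
        long row = (long) Math.floor(lat / cellDegrees);
        long col = (long) Math.floor(lon / cellDegrees);
        return row + ":" + col;
    }

    /** Counts points per cell; each point is a {latitude, longitude} pair. */
    static Map<String, Integer> cluster(double[][] points, double cellDegrees) {
        Map<String, Integer> counts = new TreeMap<>();
        for (double[] p : points) {
            counts.merge(cellKey(p[0], p[1], cellDegrees), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented coordinates, not real occurrence records.
        double[][] points = { {41.2, 12.5}, {43.9, 11.1}, {-3.0, 150.2} };
        System.out.println(cluster(points, 10.0));
    }
}
```

Running the same clustering on occurrences from successive time periods and comparing the per-cell counts is one way to approach trend identification.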
Challenges

TBD

Datasets

iMarine_GeoNetwork

APIs

iMarine_GeoNetwork

Some Notes from FORTH about possible challenges

What is this fish? Shoot and learn

The user takes a picture of a fish with a mobile phone. The application compares it against images of species (e.g. by exploiting ECOSCOPE’s images) and returns the name of the fish and related information. Requirement: image similarity.

Gradual Query Expansion for Species (semantic pre-processing of keyword queries)

Challenge: tackle the problem of empty (or small) answers in search systems by designing and developing a component that allows gradual query expansion, exploiting the availability of linked data.

Input: a species name and a number of related sources of information.

Output: a series of queries <q1, q2, … , qk>, where each query is a set of words and the words in q_i are a subset of the words in q_{i+1}. For instance, q1 could contain the names of the species in different natural languages, q2 could add the scientific names, q3 could add sub/sup-species, q4 could add competitors, predators, etc.

It could be deployed as a web app where the user enters a query, the app computes the expanded queries and can directly forward control to a search engine (the expanded query is passed through the URL).
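The nesting property of the output (each query is a subset of the next) can be sketched as follows. The class name and the term groups in main are illustrative assumptions; in a real component the groups would be filled from the linked data sources.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: builds a series of nested, gradually larger queries. */
final class GradualQueryExpansion {

    /**
     * Builds queries q1..qk where q1 holds the first term group and each
     * later query adds the next group, so every query is a subset of its
     * successor.
     */
    static List<Set<String>> expand(List<List<String>> termGroups) {
        List<Set<String>> queries = new ArrayList<>();
        Set<String> accumulated = new LinkedHashSet<>();
        for (List<String> group : termGroups) {
            accumulated.addAll(group);
            queries.add(new LinkedHashSet<>(accumulated)); // snapshot of q_i
        }
        return queries;
    }

    public static void main(String[] args) {
        // Illustrative groups: common name, scientific name, genus.
        List<List<String>> groups = List.of(
                List.of("yellowfin tuna"),
                List.of("Thunnus albacares"),
                List.of("Thunnus"));
        for (Set<String> q : expand(groups)) {
            System.out.println(q);
        }
    }
}
```

A search front end could submit q1 first and move on to the next query only while the answer remains empty or too small.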

Linked Data for Species

Link the species described in the TLO-based warehouse (SPARQL endpoint) with related information in other sources of structured (e.g. DBpedia, ..) or unstructured (e.g. Wikipedia, …) information, aiming at ….<<we need a specific objective here>>

Linked Data Browser

Build a browser (textual or graphical) for the TLO-based repository (SPARQL endpoint). Devices with small screens should be considered (one could develop a dedicated Android client for this). Other SPARQL endpoints (or structured information accessed through HTTP) could also be considered. Challenge: tackle information overload.

Datasets and APIs

Datasets

TLO based SPARQL endpoint

Data Graph

Description

The description of the MarineTLO can be found here:

http://wiki.i-marine.eu/index.php/Top_Level_Ontology

Exploitation Example

(How it can be used within a challenge)

FAO FLOD

Description

(Short description of available data)

Exploitation Example

(How it can be used within a challenge)


iMarine GeoNetwork

Description

(Short description of available data)

Exploitation Example

(How it can be used within a challenge)

iMarine Biodiversity Data Service

Endpoint of the web service giving access to biodiversity data coming from several providers (OBIS, GBIF, CoL, ...)

Description

(Short description of available data)

Exploitation Example


APIs

SPARQL Client

Any SPARQL client available on the Web

Description

(documentation)

Exploitation Example

(How it can be used within a challenge) + (javadoc?)
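Since any SPARQL client can be used, a query can also be sent with plain HTTP: the SPARQL protocol accepts a GET request with the query text in the "query" parameter. The sketch below only builds the request URI. The endpoint address is a placeholder, not the address of the TLO-based endpoint, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: builds the GET URI for a SPARQL protocol query. */
final class SparqlRequest {

    /**
     * Appends the form-encoded query text as the "query" parameter.
     * Assumes the endpoint URL has no query string of its own.
     */
    static URI queryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) {
        // Placeholder endpoint; replace it with the address of a real SPARQL endpoint.
        System.out.println(queryUri("http://example.org/sparql",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 5"));
    }
}
```

The resulting URI can be fetched with any HTTP client; the result format is usually negotiated through the Accept header.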


GeoNetwork Client

Wiki

Javadoc

Description

(documentation)

Exploitation Example

(How it can be used within a challenge) + (javadoc?)

SPD Client

Wiki

Description

The SPD Client can be used to access a Biodiversity data broker implemented in iMarine, the SPD service. More details about the architecture of the service are available at

https://gcube.wiki.gcube-system.org/gcube/index.php/Biodiversity_Access

Exploitation Example

The client can be used, for example, to query the OBIS data source and return the taxonomic information related to sharks:


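// Imports of the gCube SPD client classes (ScopeProvider, Manager, Stream,
// ResultElement, TaxonomyItem) and of java.util.concurrent.TimeUnit are omitted.

// Set the gCube scope in which the SPD service is looked up.
ScopeProvider.instance.set("/d4science.research-infrastructures.eu/gCubeApps");
Manager manager = manager().withTimeout(3, TimeUnit.MINUTES).build();

// Search for 'shark' in OBIS and return the matching taxa as a stream.
Stream<ResultElement> taxa = manager.search("SEARCH BY CN 'shark' RESOLVE WITH OBIS EXPAND IN OBIS  RETURN Taxon");

while (taxa.hasNext()) {
	TaxonomyItem taxon = (TaxonomyItem) taxa.next();
	System.out.println(taxon.getAuthor() + " " + taxon.getRank() + " " + taxon.getScientificName());
	// Walk up the classification and print each parent taxon.
	while ((taxon = taxon.getParent()) != null)
		System.out.println(taxon.getScientificName() + " -- " + taxon.getRank());
}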

gCube Search client

Wiki

Javadoc


Description

(documentation)

Exploitation Example

(How it can be used within a challenge) + (javadoc?)

Artifacts

The software distributed by iMarine (gCube) is available through Maven repositories. The following settings.xml configuration file should be set up:

<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">

	

	<profiles>
		<profile>
			<id>gcube</id>
			<repositories>
				<repository>
					<id>gcube-releases</id>
					<name>gCube Releases</name>
					<url>http://maven.research-infrastructures.eu/nexus/content/repositories/gcube-releases</url>
					<releases>
						<enabled>true</enabled>
					</releases>
					<snapshots>
						<enabled>false</enabled>
					</snapshots>
				</repository>
				<repository>
					<id>gcube-externals</id>
					<name>gCube Externals</name>
					<url>http://maven.research-infrastructures.eu/nexus/content/repositories/gcube-externals</url>
					<snapshots>
						<enabled>false</enabled>
					</snapshots>
					<releases>
						<enabled>true</enabled>
					</releases>
				</repository>
			</repositories>

			<pluginRepositories>
				<pluginRepository>
					<id>gcube-releases</id>
					<name>gCube Releases</name>
					<url>http://maven.research-infrastructures.eu/nexus/content/repositories/gcube-releases</url>
					<releases>
						<enabled>true</enabled>
					</releases>
					<snapshots>
						<enabled>false</enabled>
					</snapshots>
				</pluginRepository>
				<pluginRepository>
					<id>gcube-externals</id>
					<name>gCube Externals</name>
					<url>http://maven.research-infrastructures.eu/nexus/content/repositories/gcube-externals</url>
					<snapshots>
						<enabled>false</enabled>
					</snapshots>
					<releases>
						<enabled>true</enabled>
					</releases>
				</pluginRepository>
			</pluginRepositories>
			
		</profile>
	</profiles>

	<activeProfiles>
		<activeProfile>gcube</activeProfile>
	</activeProfiles>
</settings>

Alternatively, the same repositories can be declared in your pom file. The Maven coordinates of the components to use for the challenges are documented in the related wikis.

External Links