SoBigData Interoperability Guidelines

From D4Science Wiki
Revision as of 17:50, 18 March 2016 by Leonardo.candela (Dataset Integration)

SoBigData is ...

Dataset Integration

Datasets are collections of data that are treated as a unit for management purposes. An initial list of datasets to be considered for integration into the SoBigData e-Infrastructure is described in D8.1. (An e-Infrastructure is an operational combination of digital technologies (hardware and software), resources (data and services), communications (protocols, access rights and networks), and the people and organizational structures needed to support research efforts and collaboration in the large.) The first level of interoperability to be achieved consists in enabling users (primarily SoBigData practitioners) to seamlessly discover information (i.e. metadata) on the available datasets. This will be achieved by explicitly registering dataset information in the dataset catalogue service hosted by the SoBigData Infrastructure. Once registered, datasets can be discovered by searching (Google-like free-text search and faceted search by tags) and/or by browsing. For each dataset, the catalogue provides the user with a rich set of metadata including:

  • descriptive information, such as title, author/provider, keywords, and a description giving details on the dataset,
  • coverage-oriented information characterising the extent (e.g. spatial extent, temporal extent) of the dataset,
  • classification-oriented information, such as tags,
  • access-oriented information, such as the web protocols for accessing the actual data in each of the formats in which it is made available,
  • usage-oriented information, such as the licence.
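
The five groups of metadata listed above can be pictured as a single record type. The following is a minimal Python sketch, not the catalogue's actual data model: the class name, the field names and the two search helpers are all assumptions made for illustration, and the sample entries are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative only."""
    # Descriptive information
    title: str
    provider: str
    description: str = ""
    keywords: List[str] = field(default_factory=list)
    # Coverage-oriented information
    spatial_extent: Optional[str] = None
    temporal_extent: Optional[Tuple[str, str]] = None  # (start, end) as ISO dates
    # Classification-oriented information
    tags: Set[str] = field(default_factory=set)
    # Access-oriented information: format name -> access URL
    access_urls: Dict[str, str] = field(default_factory=dict)
    # Usage-oriented information
    licence: Optional[str] = None


def search_by_tags(records, wanted):
    """Faceted search: keep the records that carry every requested tag."""
    wanted = set(wanted)
    return [r for r in records if wanted <= r.tags]


def search_text(records, query):
    """Naive free-text search over title, description and keywords."""
    q = query.lower()
    return [
        r for r in records
        if q in r.title.lower()
        or q in r.description.lower()
        or any(q in k.lower() for k in r.keywords)
    ]


# Invented sample entries, used only to exercise the helpers.
catalogue = [
    DatasetRecord(title="Example mobility traces", provider="Example Lab",
                  tags={"mobility", "gps"}, licence="CC-BY-4.0"),
    DatasetRecord(title="Example social graph", provider="Example Lab",
                  tags={"social-network"}),
]
```

With these entries, `search_by_tags(catalogue, ["mobility"])` returns only the first record, while `search_text(catalogue, "example")` matches both.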

To collect such metadata, SoBigData

  • defines a dataset application profile by building on existing formats like DataCite and DCAT,
  • requires data providers to register their datasets into the catalogue.
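
An application profile typically states which metadata elements a provider must supply before a dataset can be registered. The sketch below shows one way such a check could look in Python. The split into mandatory and recommended fields is purely an assumption for illustration; it is not taken from the SoBigData profile, DataCite or DCAT.

```python
# Hypothetical profile: which fields are mandatory or recommended is an
# assumption, not the actual SoBigData application profile.
MANDATORY_FIELDS = ("title", "provider", "description", "licence")
RECOMMENDED_FIELDS = ("keywords", "tags", "spatial_extent", "temporal_extent")


def check_against_profile(record: dict) -> dict:
    """Report the profile fields that are absent or empty in a record."""
    def is_missing(name):
        return not record.get(name)

    return {
        "missing_mandatory": [f for f in MANDATORY_FIELDS if is_missing(f)],
        "missing_recommended": [f for f in RECOMMENDED_FIELDS if is_missing(f)],
    }


def can_register(record: dict) -> bool:
    """A record is accepted only when no mandatory field is missing."""
    return not check_against_profile(record)["missing_mandatory"]


# Invented draft record, missing its description and licence.
draft = {"title": "Example dataset", "provider": "Example Lab"}
report = check_against_profile(draft)
```

Here `report["missing_mandatory"]` lists `description` and `licence`, so `can_register(draft)` is false until the provider fills them in.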

If datasets are already published in other repositories, e.g. geospatial data repositories compliant with the CSW (Catalogue Service for the Web) standard, the integration can be semi-automatic: the existing metadata are harvested and then repurposed to comply with the SoBigData application profile.
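
The "repurposing" step can be sketched as a mapping from a harvested record to profile fields. The Python example below assumes the source repository exposes Dublin Core records in the style of OGC CSW (Catalogue Service for the Web). The sample record is invented, and the target field names and the mapping are assumptions rather than the actual SoBigData application profile.

```python
import xml.etree.ElementTree as ET

# Namespaces used by CSW 2.0.2 Dublin Core records.
NS = {
    "csw": "http://www.opengis.net/cat/csw/2.0.2",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
}

# Invented record standing in for one item returned by a harvest.
SAMPLE = """<csw:Record xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dct="http://purl.org/dc/terms/">
  <dc:identifier>example-001</dc:identifier>
  <dc:title>Example land cover map</dc:title>
  <dc:creator>Example Agency</dc:creator>
  <dc:subject>land cover</dc:subject>
  <dc:subject>raster</dc:subject>
  <dct:abstract>Illustrative record, not a real dataset.</dct:abstract>
  <dc:rights>CC-BY-4.0</dc:rights>
</csw:Record>"""


def repurpose(record_xml: str) -> dict:
    """Map one harvested record onto hypothetical profile fields."""
    rec = ET.fromstring(record_xml)

    def text(path):
        # Missing elements yield an empty string instead of None.
        return (rec.findtext(path, default="", namespaces=NS) or "").strip()

    subjects = [e.text.strip() for e in rec.findall("dc:subject", NS) if e.text]
    return {
        "title": text("dc:title"),
        "provider": text("dc:creator"),
        "description": text("dct:abstract"),
        "keywords": subjects,
        "licence": text("dc:rights") or None,
        # Keep the original identifier so the entry can be traced back.
        "source_identifier": text("dc:identifier"),
    }


entry = repurpose(SAMPLE)
```

The integration stays semi-automatic because fields that the source record does not carry (here, for instance, tags or access URLs) would still need to be completed by the data provider.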

Application Integration

Method Integration

Service Integration