SoBigData Interoperability Guidelines

From D4Science Wiki

Latest revision as of 16:38, 26 May 2016

SoBigData is an ongoing EU project that aims to create the Social Mining and Big Data Ecosystem: a research infrastructure (RI) providing an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”.

To achieve this goal, the project will develop the SoBigData e-Infrastructure by aggregating and harmonising community resources. Here, an e-Infrastructure is an operational combination of digital technologies (hardware and software), resources (data and services), communications (protocols, access rights and networks), and the people and organizational structures needed to support research efforts and collaboration in the large.

This page describes how the existing resources are going to be integrated so that they become interoperable.

Dataset Integration

Datasets are collections of data that are considered as a unit for management purposes. An initial list of datasets to be considered / integrated in the SoBigData e-Infrastructure is described in D8.1. The first level of interoperability to be achieved consists of enabling users (primarily SoBigData practitioners) to seamlessly discover information (aka metadata) on the available datasets. This will be achieved by explicitly registering/publishing dataset information through the dataset catalogue service hosted by the SoBigData Infrastructure. Once registered, datasets can be discovered by searching (Google-like and faceted search by tags) and/or browsing. For each dataset the catalogue provides the user with a rich set of metadata including:

  • descriptive information like title, author/provider, keywords, and description aiming at providing details on the dataset,
  • coverage oriented information to characterise the extent (e.g. spatial extent, temporal extent) of the dataset,
  • classification oriented information like tags,
  • access oriented information like web protocols for accessing the actual data in any of the formats it is made available,
  • usage oriented information like licence.
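
The five groups of metadata listed above can be pictured as one catalogue record. The following is a minimal sketch only: the class name, field names and sample values are hypothetical and are not taken from the SoBigData catalogue or its application profile.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List, Optional, Tuple


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry; field names are illustrative only."""
    # Descriptive information
    title: str
    provider: str
    description: str
    keywords: List[str] = field(default_factory=list)
    # Coverage-oriented information
    spatial_extent: Optional[str] = None
    temporal_extent: Optional[Tuple[str, str]] = None  # (start, end) ISO dates
    # Classification-oriented information
    tags: List[str] = field(default_factory=list)
    # Access-oriented information: format name -> URL of the actual data
    access_points: Dict[str, str] = field(default_factory=dict)
    # Usage-oriented information
    licence: Optional[str] = None

    def matches_tag(self, tag: str) -> bool:
        """Case-insensitive tag match, as a faceted search might apply."""
        return tag.lower() in (t.lower() for t in self.tags)


# Placeholder values, invented purely for illustration.
record = DatasetRecord(
    title="Example mobility traces",
    provider="Example Lab",
    description="Placeholder dataset used only to illustrate the record.",
    keywords=["mobility", "gps"],
    spatial_extent="Tuscany, Italy",
    temporal_extent=("2015-01-01", "2015-12-31"),
    tags=["Mobility", "Trajectories"],
    access_points={"csv": "https://example.org/data/traces.csv"},
    licence="CC-BY-4.0",
)
```

Calling `asdict(record)` turns the record into a plain dictionary, which is a convenient shape for serialising it before registration.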

To collect such metadata, SoBigData:

  • defines a dataset application profile by building on existing formats like DataCite and DCAT,
  • requires data providers to register their datasets into the catalogue.

If datasets are already published in other repositories, e.g. geospatial data repositories compliant with the CSW (Catalogue Service for the Web) standard, the integration can be semi-automatic: the metadata are harvested and then repurposed to comply with the SoBigData application profile.
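
The "harvest and repurpose" step can be sketched as a field mapping followed by a completeness check. This is an illustration under stated assumptions: the Dublin Core style source keys, the target field names and the list of required fields are all hypothetical, since the actual SoBigData application profile is not reproduced on this page.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from harvested field names to profile field names.
FIELD_MAP = {
    "dc:title": "title",
    "dc:creator": "provider",
    "dc:description": "description",
    "dc:subject": "tags",
    "dc:rights": "licence",
}

# Hypothetical set of fields the profile would require.
REQUIRED = ("title", "provider", "licence")


def repurpose(harvested: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    """Map a harvested record onto the profile and list missing required fields."""
    profiled = {FIELD_MAP[k]: v for k, v in harvested.items() if k in FIELD_MAP}
    missing = [name for name in REQUIRED if not profiled.get(name)]
    return profiled, missing


# Invented sample record; "dc:format" has no mapping and is dropped.
harvested = {
    "dc:title": "Coastal land use",
    "dc:creator": "Example Agency",
    "dc:subject": ["land use"],
    "dc:format": "GeoTIFF",
}
profiled, missing = repurpose(harvested)
```

In this sketch `missing` lists the fields that a data provider would still have to supply by hand, which is why the integration is only semi-automatic.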

Method Integration

A method is an implementation of a social mining algorithm / procedure. An initial list of methods to be considered / integrated in the SoBigData e-Infrastructure is described in D10.2. The integration of this type of social mining asset relies on:

  • the SoBigData Infrastructure information system / registry for discovery purposes;
  • the gCube-based data analytics engine equipping the SoBigData Infrastructure for enactment / operation purposes.

A very lightweight integration is achieved by explicitly registering/publishing method information through the catalogue service hosted by the SoBigData Infrastructure. Each method is expected to have a web-based “landing page”, i.e. a web page that gives users the information (documentation, download, examples) they need to make use of the method. The link to this landing page is a key piece of information to be maintained in the catalogue, in addition to the other information needed for discovery purposes.

For an effective integration, method owners are requested to:

  • make their method compliant with the guidelines of the hosting platform according to the methodology described in a dedicated Wiki page (https://wiki.gcube-system.org/How-to_Implement_Algorithms_for_the_Statistical_Manager). This activity might require some modification / adaptation of the method implementation, e.g. for input parameters specification. The cost of this adaptation depends on the complexity of the method;
  • publish the algorithm through the platform. If the method is implemented as an R script, the platform provides a facility supporting this publishing phase (https://wiki.gcube-system.org/Statistical_Algorithms_Importer).
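
The adaptation for "input parameters specification" can be pictured as wrapping an existing function so that its inputs are declared explicitly and validated before the function runs. The sketch below is not the hosting platform's interface, which is documented on the Wiki page mentioned above; `Parameter`, `Method` and `top_words` are hypothetical names used only to show the idea.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Parameter:
    """Declared input of a method (hypothetical structure)."""
    name: str
    kind: type          # Python type the raw value is converted to
    description: str
    default: Any = None


@dataclass
class Method:
    """An existing function plus an explicit declaration of its inputs."""
    name: str
    parameters: List[Parameter]
    run: Callable[..., Any]

    def invoke(self, **raw: Any) -> Any:
        """Validate and convert the raw inputs, then call the wrapped function."""
        values: Dict[str, Any] = {}
        for p in self.parameters:
            value = raw.get(p.name, p.default)
            if value is None:
                raise ValueError(f"missing input parameter: {p.name}")
            values[p.name] = p.kind(value)  # e.g. the string "2" becomes 2
        return self.run(**values)


def top_words(text: str, k: int) -> List[str]:
    """Toy 'mining' function: the k most frequent words in the text."""
    counts: Dict[str, int] = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    # sorted() is stable, so ties keep their order of first occurrence.
    return sorted(counts, key=lambda w: -counts[w])[:k]


method = Method(
    name="top_words",
    parameters=[
        Parameter("text", str, "Text to analyse"),
        Parameter("k", int, "Number of words to return", default=2),
    ],
    run=top_words,
)
result = method.invoke(text="a b a c b a", k="2")
```

Declaring parameters separately from the function body is what lets a platform generate an input form or a web interface for the method without inspecting its code.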

Once integrated, the method becomes a SoBigData social mining asset that:

  • will benefit from a distributed and scalable computing platform;
  • can be exploited in the context of many virtual research environments and is suitable for being repurposed / applied to datasets;
  • will be automatically made available via a web-based GUI as well as via web-based protocols (SOAP and REST);
  • is monitored and assessed by SoBigData tools, e.g. detailed statistics on usage are transparently collected.

Application Integration

An application is a stand-alone system offering one or more social mining methods. In some cases it also offers some social mining datasets. An initial list of applications to be considered / integrated in the SoBigData e-Infrastructure is described in D10.2. The integration of this type of social mining asset relies on:

  • the SoBigData Infrastructure information system / registry for discovery purposes;
  • the gCube-based data analytics engine equipping the SoBigData Infrastructure for enactment / operation purposes.

A very lightweight integration is achieved by explicitly registering/publishing application information through the catalogue service hosted by the SoBigData Infrastructure. Each application is expected to have a web-based “landing page”, i.e. a web page that gives users the information (documentation, download, examples) they need to make use of the application. The link to this landing page is a key piece of information to be maintained in the catalogue, in addition to the other information needed for discovery purposes.

For an effective integration, application owners are requested to:

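  • reconsider their application architecture so as to extract the methods and the datasets;
  • make their methods compliant with the guidelines of the hosting platform according to the methodology described in a dedicated Wiki page (https://wiki.gcube-system.org/How-to_Implement_Algorithms_for_the_Statistical_Manager). This activity might require some modification / adaptation of the method implementation, e.g. for input parameters specification. The cost of this adaptation depends on the complexity of the method. A step-by-step procedure for algorithm integration (including a couple of simple Java classes), resulting from a concrete integration exercise, is available on the Wiki page "SoBigData: step-by-step procedure for algorithm integration";
  • publish the algorithm through the platform. If the method is implemented as an R script, the platform provides a facility supporting this publishing phase (https://wiki.gcube-system.org/Statistical_Algorithms_Importer);
  • transform the datasets into publishable assets and publish them as previously described.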

Once integrated, the application becomes a set of SoBigData social mining assets that:

  • will benefit from a distributed and scalable computing platform;
  • can be exploited in the context of many virtual research environments and are suitable for being repurposed / applied to datasets;
  • will be automatically made available via a web-based GUI as well as via web-based protocols (SOAP and REST);
  • are monitored and assessed by SoBigData tools, e.g. detailed statistics on usage are transparently collected.

Service Integration

A service is a web-based facility offering one or more social mining methods. An initial list of services to be considered / integrated in the SoBigData e-Infrastructure is described in D10.2. The integration of this type of social mining asset relies on:

  • the SoBigData Infrastructure information system / registry for discovery purposes;
  • the gCube-based hosting platform equipping the SoBigData Infrastructure for operational purposes.

In particular, if the service is a Java web service compliant with JAX-RS or JAX-WS, service owners are requested to:

  • produce some simple configuration files according to the guidelines given in the SmartGears Wiki page;
  • deploy their service in a SmartGears hosting node.

Once integrated according to this pattern, the service becomes a SoBigData social mining asset that:

  • will benefit from a distributed and scalable hosting infrastructure, e.g. the SoBigData Infrastructure manager can create one or more instances of such a service;
  • can be exploited in the context of many virtual research environments and is suitable for being repurposed / applied to datasets;
  • is monitored and assessed by SoBigData tools, e.g. detailed statistics on usage are transparently collected.

If the service is not a Java-based web service, the level of integration is more superficial and supports only discovery. The service owner is requested to register the service instance in the SoBigData information system / registry with rich metadata, including a web-based access point to be used to interact with the service instance.
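
As an illustration of this discovery-only registration, the sketch below assembles a registry entry and checks that the access point is a usable web address. The entry layout, the function name `build_service_entry` and the sample values are assumptions made for this example; they do not describe the actual format used by the SoBigData information system / registry.

```python
from urllib.parse import urlparse


def build_service_entry(name, description, access_point, tags=()):
    """Build a hypothetical registry entry; reject unusable access points."""
    parsed = urlparse(access_point)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a web-based access point: {access_point!r}")
    return {
        "name": name,
        "description": description,
        "access_point": access_point,
        # De-duplicated and sorted so that entries are easy to compare.
        "tags": sorted(set(tags)),
    }


# Invented sample values, for illustration only.
entry = build_service_entry(
    name="sentiment-api",
    description="Placeholder sentiment analysis service.",
    access_point="https://example.org/api",
    tags=["text", "sentiment", "text"],
)
```

Validating the access point at registration time keeps entries that users cannot reach out of the registry.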