The ScalableDataMining Virtual Research Environment

From D4Science Wiki
Revision as of 19:26, 19 March 2014 by Leonardo.candela

Description

Scalable Data Mining is a Virtual Research Environment (VRE) designed to apply data mining techniques to biological data. The algorithms are executed in a distributed fashion on the e-Infrastructure nodes or on local multi-core machines. Scalability here means both distributed data processing and services provided dynamically to users: the system scales with the number of users and with the size of the data to process. Statistical data processing can be applied to Niche Modelling or Ecological Modelling experiments. Other applications can use general-purpose techniques such as Bayesian models. Time series of observations can be managed as well, in order to classify trends, detect anomalous patterns and perform simulations. The idea behind distributing the computation of data mining techniques is to overcome common limitations of statistical algorithms:

  • the time taken by the training and projection procedures,
  • the linear or non-linear growth in processing time as the amount of data increases,
  • the multiple runs needed to reduce overfitting or local minima problems,
  • the multiple model topologies that must be evaluated to find the optimal model configuration.
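
The last two limitations (repeated runs and multiple model configurations) are made of independent tasks, which is why they parallelise well on a multi-core machine. The following is only a minimal sketch of that idea using the Python standard library; it is not the VRE's implementation. The objective `loss`, the use of a search step size as a stand-in for a "model configuration", and the function names are all hypothetical choices made for illustration.

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def loss(x):
    """Hypothetical bumpy objective with several local minima."""
    return x * x + 10.0 * math.sin(3.0 * x)


def single_run(task):
    """One independent run: greedy local search from a seed-dependent start."""
    step, seed = task
    rng = random.Random(seed)  # private generator, so each run is reproducible
    x = rng.uniform(-5.0, 5.0)
    best = loss(x)
    for _ in range(2000):
        candidate = x + rng.uniform(-step, step)
        value = loss(candidate)
        if value < best:  # accept only improvements
            x, best = candidate, value
    return step, seed, best


def evaluate_configurations(steps, seeds, executor_factory=ProcessPoolExecutor):
    """Run every (configuration, seed) pair in parallel.

    Returns the best loss found for each configuration. Pass
    ThreadPoolExecutor as executor_factory where processes are unavailable.
    """
    tasks = [(step, seed) for step in steps for seed in seeds]
    best_per_step = {}
    with executor_factory() as pool:
        for step, _seed, value in pool.map(single_run, tasks):
            if step not in best_per_step or value < best_per_step[step]:
                best_per_step[step] = value
    return best_per_step
```

Calling `evaluate_configurations([0.1, 1.0], range(8))` would run 16 independent searches and report the best result per step size. Because no run depends on another, the same task list could in principle be spread over several machines rather than several cores.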

All the limitations listed above strongly reduce the amount of time a scientist can dedicate to evaluating results and to combining and comparing the outcomes of different experiments. On the other hand, the Scalable Data Mining VRE gains advantages from using a distributed e-Infrastructure endowed with many data sources. Some of these are:

  • efficiency and time saving in computations;
  • availability of a set of data sources containing environmental or species features;
  • reliability of the quality of the features;
  • certification of compliance between e-Infrastructure data sources and algorithm inputs/outputs;
  • import of users' own files;
  • sharing of results and users' files.

URI: https://portal.i-marine.d4science.org/group/scalabledatamining

The ScalableDataMining Virtual Research Environment Homepage

Functionality

The main facilities this VRE offers are:

  • Statistical Manager:
  • Workspace: to enable every user to store and organise the information objects he/she is interested in working with. In addition, users can collaborate with other users by sharing objects and messages;
  • VRE Management: to enable authorised users (i.e. VRE Managers) to manage other users using or willing to access the VRE. VRE Managers can (i) authorise users to access the VRE, (ii) assign roles to or withdraw roles from users, (iii) remove users, and (iv) send a communication to the current users.

Data

Software

  • (DONE) gCube 2.11.0 (October '12);
  • (DONE) gCube 2.12.0 (February '13);
  • (DONE) gCube 2.13.0 (March '13);
  • (DONE) gCube 2.14.0 (May '13);
  • (DONE) gCube 2.15.0 (June '13);
  • (DONE) gCube 2.16.0 (July '13);
  • (DONE) gCube 2.17.0 (December '13);
  • (DONE) gCube 2.17.1 (December '13);
  • (DONE) gCube 3.0.0 (March '14);