Difference between revisions of "R algorithm integration with Statistical Manager"

From D4Science Wiki
(Hypothesis and Thesis)
Line 1: Line 1:
 
 
== Hypothesis and Thesis ==
 
== Hypothesis and Thesis ==
  
Line 16: Line 15:
 
== Activity Workflow ==
 
== Activity Workflow ==
  
* The activity was done by familiarizing with the Statistical Manager, relying both on the documentation and a tutorial made available to facilitate the integration of algorithms.
+
* The activity was done by familiarizing with the Statistical Manager, relying both on the [http://gcube.wiki.gcube-system.org/gcube/index.php/How-to_Implement_Algorithms_for_the_Statistical_Manager documentation] and a [http://i-marine.eu/Content/eTraining.aspx?id=e1777006-a08c-49ad-b2e6-c13e094f27d4 tutorial video] made available to facilitate the integration of algorithms.
 
* A basic R script was created to test the Statistical Manager. This script allows to convert a SDMX-ML dataset to CSV.
 
* A basic R script was created to test the Statistical Manager. This script allows to convert a SDMX-ML dataset to CSV.
 
* In order to integrate the R script, a separate Java Maven project was created (with the aim to add further algorithm later).  
 
* In order to integrate the R script, a separate Java Maven project was created (with the aim to add further algorithm later).  
Line 24: Line 23:
 
** algorithms inputs (difference between a File input and remote resource - URL - input)
 
** algorithms inputs (difference between a File input and remote resource - URL - input)
 
** the need for data managers to indicate the eventual R package dependencies to install prior to the algorithm deployment
 
** the need for data managers to indicate the eventual R package dependencies to install prior to the algorithm deployment
** how to add the algorithm within a given category of algorithms (for display purpose)
+
** how to add the algorithm within a given category of algorithms (for display purpose in the Statistical Manager user interface)
 
* The algorithm was successfully deployed and is currently operational in the [https://dev3.d4science.org/group/devvre/sm development portal]
 
* The algorithm was successfully deployed and is currently operational in the [https://dev3.d4science.org/group/devvre/sm development portal]
  

Revision as of 10:03, 19 June 2014

Hypothesis and Thesis

This experiment aims to test and assess how easily data managers and developers can plug algorithms (especially R algorithms) into the infrastructure through the Statistical Manager tool, and so respond quickly to data analysis needs while benefiting from iMarine computing resources.

The product of this experiment is a basic service that converts an SDMX dataset, provided through an SDMX service URL, to the CSV format.
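
To make the conversion concrete, here is a minimal sketch of flattening an SDMX-ML dataset into CSV. It is not the R script used in this experiment (that script is not reproduced on this page); it is an illustration in Python, assuming an SDMX-ML 2.1 generic data message and assuming that an observation dimension without an `id` is the time period. The function name, the sample data and the `TIME_PERIOD` / `OBS_VALUE` column names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Return the tag name without its XML namespace."""
    return tag.rsplit("}", 1)[-1]


def sdmx_generic_to_csv(xml_text):
    """Flatten an SDMX-ML 2.1 generic data message into CSV text.

    Each observation becomes one row made of the series key values,
    the observation dimension and the observation value.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for series in root.iter():
        if local(series.tag) != "Series":
            continue
        # Collect the series-level dimensions (e.g. FREQ, AREA).
        key = {}
        for child in series:
            if local(child.tag) == "SeriesKey":
                key = {v.get("id"): v.get("value") for v in child}
        for obs in series:
            if local(obs.tag) != "Obs":
                continue
            row = dict(key)
            for part in obs:
                name = local(part.tag)
                if name == "ObsDimension":
                    # Assumption: a missing id means the time dimension.
                    row[part.get("id", "TIME_PERIOD")] = part.get("value")
                elif name == "ObsValue":
                    row["OBS_VALUE"] = part.get("value")
            rows.append(row)

    # Build the header from the columns in first-seen order.
    header = []
    for row in rows:
        for column in row:
            if column not in header:
                header.append(column)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical sample message, for illustration only.
SAMPLE = """<message:GenericData
    xmlns:message="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:generic="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic">
  <message:DataSet>
    <generic:Series>
      <generic:SeriesKey>
        <generic:Value id="FREQ" value="A"/>
        <generic:Value id="AREA" value="27"/>
      </generic:SeriesKey>
      <generic:Obs>
        <generic:ObsDimension value="2012"/>
        <generic:ObsValue value="10.5"/>
      </generic:Obs>
      <generic:Obs>
        <generic:ObsDimension value="2013"/>
        <generic:ObsValue value="11.0"/>
      </generic:Obs>
    </generic:Series>
  </message:DataSet>
</message:GenericData>"""

if __name__ == "__main__":
    print(sdmx_generic_to_csv(SAMPLE), end="")
```

Other SDMX-ML flavours (for example compact or structure-specific messages) lay out series and observations differently and would need their own handling.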

The broader scope of this experiment is:

  • to assess how a data manager / developer can plug in an algorithm on their own,
  • to identify potential improvements that would make R script integration quick and easy.

Outcome

TBD

Activity Workflow

  • The activity started with getting familiar with the Statistical Manager, relying on both the documentation and a tutorial video made available to facilitate the integration of algorithms.
  • A basic R script was created to test the Statistical Manager. This script converts an SDMX-ML dataset to CSV.
  • In order to integrate the R script, a separate Java Maven project was created (with the aim of adding further algorithms later).
  • A few exchanges with the Statistical Manager developers were required for the project settings, and highlighted that some information is scattered across the documentation.
  • The R script was integrated into the project, tested and sent to the Statistical Manager team for deployment.
  • Additional exchanges with the team took place to clarify:
    • algorithm inputs (the difference between a File input and a remote resource (URL) input)
    • the need for data managers to indicate any R package dependencies to install prior to the algorithm deployment
    • how to add the algorithm to a given category of algorithms (for display purposes in the Statistical Manager user interface)
  • The algorithm was successfully deployed and is currently operational in the development portal.
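
How the Statistical Manager itself distinguishes a File input from a remote resource (URL) input is not detailed on this page. As a general illustration only, and not the Statistical Manager API, the sketch below shows one way a script could accept either kind of input. The function name `read_input` and the set of recognised URL schemes are assumptions.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def read_input(source):
    """Return the raw bytes behind `source`.

    `source` is treated as a remote resource when it carries a known
    URL scheme, and as a local file path otherwise.
    """
    scheme = urlparse(source).scheme
    if scheme in ("http", "https", "file"):
        # Remote resource: fetched when the algorithm runs.
        with urllib.request.urlopen(source) as response:
            return response.read()
    # File input: the data is already available locally.
    return Path(source).read_bytes()
```

With a File input the data has to be available to the algorithm before it runs, whereas with a URL input the algorithm fetches it at run time, so network access and the availability of the remote service become part of the execution.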

Conclusion

TBD

Recommendations & future developments

TBD

Experimentation

TBD

Related links