Difference between revisions of "R algorithm integration with Statistical Manager"

Revision as of 14:58, 19 June 2014

Hypothesis and Thesis

This experiment, performed by FAO, aims to test and assess how data managers / developers can easily plug algorithms (especially R algorithms) into the infrastructure through the Statistical Manager tool, and respond quickly to data analysis needs while benefiting from iMarine computing resources.

The product of this experiment is a basic service that converts an SDMX dataset, provided through an SDMX service URL, to the CSV format.
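The conversion described above can be sketched as follows. This is only an illustration in Java, not the R script produced by the experiment, which is not reproduced on this page. It assumes a structure-specific (compact) style SDMX-ML message in which dimensions and values appear as XML attributes on `Series` and `Obs` elements. The class name `SdmxToCsvSketch`, the method `toCsv` and the sample attribute names are hypothetical, and fetching the message from the SDMX service URL is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: flattens Series/Obs attributes of a compact-style
// SDMX-ML message into CSV rows (one row per observation).
final class SdmxToCsvSketch {

    // Copies the XML attributes of an element into the row,
    // skipping namespace declarations (xmlns, xmlns:*).
    private static void putAttributes(Element element, Map<String, String> row) {
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            if (!attribute.getNodeName().startsWith("xmlns")) {
                row.put(attribute.getNodeName(), attribute.getNodeValue());
            }
        }
    }

    static String toCsv(String sdmxXml) {
        Document document;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            document = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(sdmxXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }

        List<Map<String, String>> rows = new ArrayList<>();
        // Column names are sorted alphabetically so the output is deterministic.
        Set<String> header = new TreeSet<>();

        NodeList seriesNodes = document.getElementsByTagNameNS("*", "Series");
        for (int s = 0; s < seriesNodes.getLength(); s++) {
            Element series = (Element) seriesNodes.item(s);
            NodeList obsNodes = series.getElementsByTagNameNS("*", "Obs");
            for (int o = 0; o < obsNodes.getLength(); o++) {
                Map<String, String> row = new LinkedHashMap<>();
                putAttributes(series, row);                      // series-level dimensions
                putAttributes((Element) obsNodes.item(o), row);  // observation-level values
                header.addAll(row.keySet());
                rows.add(row);
            }
        }

        // Note: values are written as-is; no CSV quoting or escaping is applied.
        StringBuilder csv = new StringBuilder(String.join(",", header)).append("\n");
        for (Map<String, String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String column : header) {
                cells.add(row.getOrDefault(column, ""));
            }
            csv.append(String.join(",", cells)).append("\n");
        }
        return csv.toString();
    }
}
```

Each observation becomes one CSV row that repeats the attributes of its parent series, which is the usual flat shape expected by downstream analysis tools. A generic-format SDMX-ML message, where keys are nested child elements rather than attributes, would need a different traversal.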

The broader scope of this experiment is:

to assess how a data manager / developer can plug in an algorithm on their own,
to identify potential improvements that would make the R script integration quick and easy

Outcome

The results of this experiment show that integrating R scripts as algorithms is a quick, straightforward and sustainable procedure, worth considering by institutions that wish to plug in data analysis algorithms. The benefits are the following:

The e-infrastructure, by means of the Statistical Manager, provides a fast, straightforward, well-documented and sustainable procedure of algorithm integration, highly recommended for institutions
In terms of software tools & programming languages, some basic knowledge is required:
- basic knowledge of Java programming is required.
- knowledge of an IDE (e.g. Eclipse) and SVN is recommended
- knowledge of Maven is optional; it is required only if data managers intend to build a separate Java project to deliver the algorithms (as done in this exercise).
Through this procedure, the e-infrastructure offers institutions, especially research institutions, a powerful tool to expose scripts (often scattered among offices) as web services, and to benefit from the e-infrastructure computing resources

Activity Workflow

The activity started with becoming familiar with the Statistical Manager, relying on both the documentation and a tutorial video made available to facilitate the integration of algorithms.
A basic R script was created to test the Statistical Manager. This script converts an SDMX-ML dataset to CSV.
To integrate the R script, a separate Java Maven project was created (with the aim of adding further algorithms later).
A few exchanges with the Statistical Manager developers were required for the project settings, and highlighted that the documentation is somewhat scattered
The R script was integrated into the project, tested and sent to the Statistical Manager team for deployment
Additional exchanges with the team took place to clarify:
- algorithm inputs (difference between a File input and a remote resource, i.e. URL, input)
- the need for data managers to indicate any R package dependencies to install prior to the algorithm deployment
- how to add the algorithm to a given category of algorithms (for display purposes in the Statistical Manager user interface)
The algorithm was successfully deployed and is currently operational in the development portal, and usable in the rich user interface of the Statistical Manager.

Conclusion

TBD

Recommendations & future developments

TBD

Experimentation

TBD

@@ Line 1: / Line 1: @@
 == Hypothesis and Thesis ==
-This experiments aims to test and assess how data managers / developers can plug easily algorithms (especially R algorithms) in the infrastructure, through the Statistical Manager tool, and respond quickly to data analysis needs while benefiting of iMarine computing resources.
+This experiments performed by FAO aim to test and assess how data managers / developers can plug easily algorithms (especially R algorithms) in the infrastructure, through the Statistical Manager tool, and respond quickly to data analysis needs while benefiting of iMarine computing resources.
 The product of this experiment is a basic service that allows to convert a SDMX dataset, provided through a SDMX service URL, to the CSV format.
@@ Line 11: / Line 11: @@
 == Outcome ==
-TBD
+The results of this experiment that the procedure of integrating R scripts as algorithms is a quick, straightforward and sustainable to be considered by institutions that wish to plug data analysis algorithms. The benefits are the following:
+* The e-infrastructure, by means of the Statistical Manager, provides a fast, straightforward, well-documented and sustainable procedure of algorithm integration, highlightly recommended for institutions
+* In term of software tools & programming language, some basic knowledge is required:
+** basic knowledge in Java programming is required.
+** knowledge of an IDE (e.g. Eclipse) and SVN is recommended
+** Additional knowledge of Maven is optional, but required only if data managers intend to build a separate Java project to deliver the algorithms (as done in this exercice).
+* Through this procedure, the e-infrastructure offers a powerful tool to institutions, especially research institutions, to expose scripts (often scattered among offices) to be exposed as web-services, and make benefits of the e-infrastructure computing resources
 == Activity Workflow ==
@@ Line 24: / Line 31: @@
 ** the need for data managers to indicate the eventual R package dependencies to install prior to the algorithm deployment
 ** how to add the algorithm within a given category of algorithms (for display purpose in the Statistical Manager user interface)
-* The algorithm was successfully deployed and is currently operational in the [https://dev3.d4science.org/group/devvre/sm development portal]
+* The algorithm was successfully deployed and is currently operational in the [https://dev3.d4science.org/group/devvre/sm development portal], and usable in the rich user interface of the Statistical Manager.
 == Conclusion ==
@@ Line 31: / Line 38: @@
 == Recommendations & future developments ==
 TBD
