R algorithm integration with Statistical Manager
From D4Science Wiki
Revision as of 14:58, 19 June 2014
Hypothesis and Thesis
This experiment, performed by FAO, aims to test and assess how data managers and developers can easily plug algorithms (especially R algorithms) into the infrastructure through the Statistical Manager tool, and respond quickly to data analysis needs while benefiting from iMarine computing resources.
The product of this experiment is a basic service that converts an SDMX dataset, provided through an SDMX service URL, to the CSV format.
The broader scope of this experiment is:
- to assess how data managers / developers can plug in an algorithm on their own,
- to identify potential improvements that would make R script integration quick and easy.
Outcome
The results of this experiment show that the procedure for integrating R scripts as algorithms is quick, straightforward and sustainable, and worth considering by institutions that wish to plug in data analysis algorithms. The benefits are the following:
- The e-infrastructure, by means of the Statistical Manager, provides a fast, straightforward, well-documented and sustainable procedure for algorithm integration, highly recommended for institutions.
- In terms of software tools and programming languages, some basic knowledge is required:
  - basic knowledge of Java programming is required;
  - knowledge of an IDE (e.g. Eclipse) and SVN is recommended;
  - knowledge of Maven is optional; it is needed only if data managers intend to build a separate Java project to deliver the algorithms (as done in this exercise).
- Through this procedure, the e-infrastructure offers institutions, especially research institutions, a powerful tool to expose scripts (often scattered among offices) as web services and to benefit from the e-infrastructure computing resources.
Activity Workflow
- The activity started with getting familiar with the Statistical Manager, relying both on the documentation and on a tutorial video made available to facilitate the integration of algorithms.
- A basic R script was created to test the Statistical Manager. This script converts an SDMX-ML dataset to CSV.
- In order to integrate the R script, a separate Java Maven project was created (with the aim of adding further algorithms later).
- A few exchanges with the Statistical Manager developers were required for the project settings, and highlighted that some information is scattered across the documentation.
- The R script was integrated in the project, tested and sent to the Statistical Manager team for deployment.
- Additional exchanges with the team took place to clarify:
  - algorithm inputs (the difference between a File input and a remote resource, i.e. URL, input);
  - the need for data managers to indicate any R package dependencies to install prior to the algorithm deployment;
  - how to add the algorithm to a given category of algorithms (for display purposes in the Statistical Manager user interface).
- The algorithm was successfully deployed and is currently operational in the development portal (https://dev3.d4science.org/group/devvre/sm), and usable in the rich user interface of the Statistical Manager.
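To give an idea of what the SDMX-ML to CSV conversion involves, here is a minimal Java sketch. It is not the R script used in this experiment, and it does not use the Statistical Manager API. The class name, the column names OBS_DIMENSION and OBS_VALUE, and the inline sample are all hypothetical. The sample assumes a heavily simplified message in the style of the SDMX-ML 2.1 generic data format, where every series carries the same key dimensions in the same order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SdmxToCsvSketch {

    // Hypothetical sample: one series with two observations (assumed layout, not real data).
    static final String SAMPLE =
        "<message:GenericData"
      + " xmlns:message=\"http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message\""
      + " xmlns:generic=\"http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic\">"
      + "<message:DataSet><generic:Series>"
      + "<generic:SeriesKey>"
      + "<generic:Value id=\"AREA\" value=\"27\"/>"
      + "<generic:Value id=\"SPECIES\" value=\"COD\"/>"
      + "</generic:SeriesKey>"
      + "<generic:Obs><generic:ObsDimension value=\"2010\"/><generic:ObsValue value=\"12.5\"/></generic:Obs>"
      + "<generic:Obs><generic:ObsDimension value=\"2011\"/><generic:ObsValue value=\"14.0\"/></generic:Obs>"
      + "</generic:Series></message:DataSet></message:GenericData>";

    // Flattens every observation into one CSV row: series key values, then the observation.
    static String toCsv(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for lookups by local name
        Document doc = factory.newDocumentBuilder().parse(source);

        StringBuilder csv = new StringBuilder();
        NodeList seriesList = doc.getElementsByTagNameNS("*", "Series");
        for (int s = 0; s < seriesList.getLength(); s++) {
            Element series = (Element) seriesList.item(s);
            List<String> ids = new ArrayList<>();
            List<String> key = new ArrayList<>();
            NodeList seriesKeys = series.getElementsByTagNameNS("*", "SeriesKey");
            if (seriesKeys.getLength() > 0) {
                NodeList values = ((Element) seriesKeys.item(0)).getElementsByTagNameNS("*", "Value");
                for (int v = 0; v < values.getLength(); v++) {
                    Element value = (Element) values.item(v);
                    ids.add(value.getAttribute("id"));
                    key.add(value.getAttribute("value"));
                }
            }
            if (s == 0) {
                // Assumption: the first series defines the header for the whole dataset.
                List<String> header = new ArrayList<>(ids);
                header.add("OBS_DIMENSION");
                header.add("OBS_VALUE");
                appendRow(csv, header);
            }
            NodeList observations = series.getElementsByTagNameNS("*", "Obs");
            for (int o = 0; o < observations.getLength(); o++) {
                Element obs = (Element) observations.item(o);
                List<String> row = new ArrayList<>(key);
                row.add(valueOfFirst(obs, "ObsDimension"));
                row.add(valueOfFirst(obs, "ObsValue"));
                appendRow(csv, row);
            }
        }
        return csv.toString();
    }

    // Returns the "value" attribute of the first matching descendant, or "" if there is none.
    static String valueOfFirst(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? "" : ((Element) nodes.item(0)).getAttribute("value");
    }

    static void appendRow(StringBuilder csv, List<String> fields) {
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                csv.append(',');
            }
            csv.append(quote(fields.get(i)));
        }
        csv.append('\n');
    }

    // Quotes a field only when it contains a comma, a double quote or a line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) throws Exception {
        System.out.print(toCsv(new InputSource(new StringReader(SAMPLE))));
    }
}
```

The method takes an InputSource, which can wrap an in-memory reader as above or be built from a system identifier (a URI string) with new InputSource(String). This loosely mirrors the distinction between a File input and a remote resource (URL) input discussed with the Statistical Manager team. A real SDMX service would need additional handling (attributes, multiple datasets, other SDMX-ML versions) that this sketch leaves out.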
Conclusion
TBD
Recommendations & future developments
TBD
Experimentation
TBD