Integrate SPREAD algorithms in Statistical Manager

From D4Science Wiki
Revision as of 08:52, 24 June 2014

Hypothesis and Thesis

Ongoing experiment - under editing

This experiment is performed by FAO to further test and assess how easily data managers and developers can plug algorithms (especially R algorithms) into the infrastructure through the Statistical Manager tool. It also assesses how quickly they can respond to data analysis needs while benefiting from iMarine computing resources.

The products of this experiment are two Spatial Data Reallocation (SPREAD) algorithms:

  • one generic, with more parameters
  • one simplified, to better fit the SPREAD needs of the FAO Fisheries & Aquaculture department

The scope of these algorithm integration experiments is:

  • developer/algorithm integrator oriented
    • to assess how a data manager / developer can plug in an algorithm on their own,
    • to identify potential improvements for the ease, speed and sustainability of the R algorithm integration procedure
  • end-user oriented
    • to assess the user-friendliness of the Statistical Manager data analysis tool

Outcome

The results of this experiment confirm those of the first experiment (SDMX Data converter). They show that integrating R scripts as data analysis algorithms is quick, straightforward and sustainable.

In addition to the outcome of the first experiment, the present experiment highlighted the flexibility of the Statistical Manager. It also highlighted the Statistical Manager's capacity to simplify algorithm inputs so that end-users find the algorithms easy to run.


Activity Workflow

  • The activity consisted of adding the two algorithms (both R script & wrapping Java class) to the statistical-manager-figis-algorithms project that hosts FAO experiments, along with performing tests.
  • The updated archives and R scripts were shared with the Statistical Manager team
  • The Statistical Manager team deployed the algorithms in the iMarine development portal
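The page does not show the wrapping Java class or the Statistical Manager interfaces it implements. The Python sketch below is purely illustrative and is not the Statistical Manager API. It shows the general R-script-plus-wrapper pattern: a hypothetical wrapper turns the user's inputs into an `Rscript` call. The `key=value` argument convention and the names `build_rscript_command` and `run_r_algorithm` are assumptions.

```python
import shutil
import subprocess


def build_rscript_command(script_path, params):
    """Hypothetical convention: pass each user input to the R script as key=value."""
    args = [f"{key}={value}" for key, value in params.items()]
    return ["Rscript", script_path, *args]


def run_r_algorithm(script_path, params):
    """Run the R script and return its standard output (hypothetical wrapper)."""
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH")
    command = build_rscript_command(script_path, params)
    # check=True raises CalledProcessError if the R script exits with an error
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return completed.stdout


print(build_rscript_command("spread.R", {"refArea": "FAO_MAJOR_AREA", "statField": "obsValue"}))
```

On the R side, a script following this assumed convention could read the values with `commandArgs(trailingOnly = TRUE)`. Separating command construction from execution keeps the parameter mapping testable without R installed.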

Conclusion

TBD

Recommendations & future developments

TBD

Experimentation

  • The two algorithms were plugged into the Statistical Manager very quickly
  • The simplified algorithm lets FI - FIPS users quickly become familiar with and use the SPREAD algorithm, as it makes it easier to specify the intersections to use for the spatial reallocation. For now, the dataset has to be input as SDMX.
  • In this experiment, we perform the spatial reallocation:
    • from FAO major area to EEZ - High seas


View 1: View of the Statistical Manager after filtering on "reallocation", where the 2 newly added algorithms appear. The simplified algorithm is used, where:

  • the SDMX getdata url is input,
  • the reference area corresponds to its name as referenced in the SDMX (FAO_MAJOR_AREA),
  • the statField corresponds to its name in the SDMX (obsValue),
  • we select FAO_AREAS_x_EEZ_HIGHSEAS as intersection (reallocate from FAO AREAS to EEZ - high seas),
  • we leave "include Computations" unchecked, as we want the final dataset aggregated by EEZ - high seas


SPREAD TEST 1.jpg
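The page does not describe the rule SPREAD uses to split a value across intersections. Purely as an illustration, the sketch below assumes an area-weighted rule: each FAO major area value is spread over its intersections in proportion to their surface. This rule is an assumption, and the function `reallocate`, the tuple layout of the intersections and the area codes are hypothetical. The `ref_area` and `stat_field` parameters mirror the inputs listed above (FAO_MAJOR_AREA, obsValue). The `include_computations` flag mirrors the "include Computations" checkbox, as we understand it: when unchecked, the result is aggregated by target area.

```python
from collections import defaultdict


def reallocate(records, intersections, ref_area="FAO_MAJOR_AREA",
               stat_field="obsValue", include_computations=False):
    """Area-weighted reallocation sketch (assumed rule, not necessarily SPREAD's).

    records: SDMX-like observations, e.g. {"FAO_MAJOR_AREA": "27", "obsValue": 100.0}
    intersections: (source_area, target_area, surface) tuples standing in for
        an intersection layer such as FAO_AREAS_x_EEZ_HIGHSEAS (hypothetical layout)
    """
    # Sum the statistic per source area (several observations may share an area)
    source_totals = defaultdict(float)
    for record in records:
        source_totals[record[ref_area]] += float(record[stat_field])

    # Total intersected surface per source area, used to turn surfaces into shares
    surface = defaultdict(float)
    for source, _target, area in intersections:
        surface[source] += area

    # Spread each source value over its intersections in proportion to surface
    rows = []
    for source, target, area in intersections:
        if source not in source_totals or surface[source] == 0:
            continue
        share = area / surface[source]
        rows.append({"source": source, "target": target, "share": share,
                     "value": source_totals[source] * share})

    if include_computations:
        # Keep the per-intersection detail
        return rows

    # Otherwise aggregate the reallocated values by target area
    totals = defaultdict(float)
    for row in rows:
        totals[row["target"]] += row["value"]
    return dict(totals)


records = [
    {"FAO_MAJOR_AREA": "27", "obsValue": 100.0},
    {"FAO_MAJOR_AREA": "34", "obsValue": 50.0},
]
intersections = [
    ("27", "EEZ_A", 3.0), ("27", "HIGH_SEAS", 1.0),
    ("34", "EEZ_A", 1.0), ("34", "HIGH_SEAS", 1.0),
]
print(reallocate(records, intersections))  # → {'EEZ_A': 100.0, 'HIGH_SEAS': 50.0}
```

Under this rule the reallocated totals always sum to the original total (150.0 here). A per-intersection view is useful when checking the shares, and the aggregated view matches the "final dataset aggregated by EEZ - high seas" described above.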


View 2: Result of the computation, where the reallocated dataset is available for download as CSV.

SPREAD TEST 2.jpg
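The columns of the downloadable CSV are not documented on this page. As a minimal sketch, assuming the aggregated result is a mapping from target area to reallocated value, it could be serialised with Python's standard `csv` module. The column names `EEZ_HIGHSEAS` and `obsValue` and the function `to_csv` are hypothetical.

```python
import csv
import io


def to_csv(totals, target_field="EEZ_HIGHSEAS", stat_field="obsValue"):
    """Serialise {target_area: value} as CSV text (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([target_field, stat_field])
    # Sort by target area so the output is stable between runs
    for target in sorted(totals):
        writer.writerow([target, totals[target]])
    return buffer.getvalue()


print(to_csv({"HIGH_SEAS": 50.0, "EEZ_A": 100.0}), end="")
```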

Related links