Integrate SPREAD algorithms in Statistical Manager

Hypothesis and Thesis

Ongoing experiment - under editing

This experiment is performed by FAO to further test and assess how easily data managers / developers can plug algorithms (especially R algorithms) into the infrastructure through the Statistical Manager tool, and respond quickly to data analysis needs while benefiting from iMarine computing resources.

The products of this experiment are two Spatial Data Reallocation (SPREAD) algorithms:

one generic, with more parameters
one simplified, to better fit the SPREAD needs of the FAO Fisheries & Aquaculture department

The scope of these algorithm integration experiments is:

developer/algorithm integrator oriented
- to assess how a data manager / developer can plug in an algorithm on their own,
- to identify potential improvements to the ease, speed and sustainability of the R algorithm integration procedure
end-user oriented
- to assess the user-friendliness of the Statistical Manager data analysis tool

Outcome

The results of this experiment confirm the results of the first experiment (SDMX Data converter) and show that the procedure for integrating R scripts as data analysis algorithms is quick, straightforward and sustainable.

In addition to the outcome of the first experiment, the present experiment highlighted the flexibility of the Statistical Manager and its capacity to simplify algorithm inputs, which keeps algorithm execution user-friendly for the end-user.

Activity Workflow

The activity consisted of adding the two algorithms (both the R script and the wrapping Java class) to the statistical-manager-figis-algorithms project that hosts FAO experiments, and of performing tests.
The updated archives and R scripts were shared with the Statistical Manager team.
The Statistical Manager team deployed the algorithms in the iMarine development portal.

Conclusion

TBD

Recommendations & future developments

TBD

Experimentation

The two algorithms were plugged into the Statistical Manager very quickly.
The simplified algorithm allows FI - FIPS users to get familiar with the SPREAD algorithm and use it quickly, as it makes it easier to specify the intersections to use for the spatial reallocation. At present, the dataset has to be input as SDMX.
In this experiment, we perform the spatial reallocation:
- of a global catch dataset for the species Atlantic herring, from 1990 to 2010. The SDMX request is http://data.fao.org/sdmx/repository/data/CAPTURE/..HER/FAO/?startPeriod=1990&endPeriod=2010 and its catches are reported by FAO major area (FAO_MAJOR_AREA)
- from FAO major area to EEZ - High seas
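
To make the structure of the SDMX request above easier to read, the following minimal Python sketch splits the URL into its parts. The labels "flow", "key" and "provider" follow the usual SDMX REST path layout and are an assumption here, as is the reading of the empty positions in the key as wildcards; the function name is hypothetical and is not part of the Statistical Manager.

```python
from urllib.parse import urlsplit, parse_qs

# SDMX request used in this experiment (copied from the text above)
SDMX_URL = (
    "http://data.fao.org/sdmx/repository/data/CAPTURE/..HER/FAO/"
    "?startPeriod=1990&endPeriod=2010"
)


def describe_sdmx_request(url):
    """Split an SDMX data URL into labelled parts (hypothetical helper)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # Keep the non-empty path segments and read the three that follow "data".
    # Assumption: they are ordered as flow / key / provider.
    segments = [s for s in parts.path.split("/") if s]
    data_index = segments.index("data")
    flow, key, provider = segments[data_index + 1:data_index + 4]

    return {
        "flow": flow,
        "key": key,
        # Assumption: dot-separated positions, an empty one meaning "any value"
        "key_positions": key.split("."),
        "provider": provider,
        "start_period": int(query["startPeriod"][0]),
        "end_period": int(query["endPeriod"][0]),
    }


if __name__ == "__main__":
    for name, value in describe_sdmx_request(SDMX_URL).items():
        print(name, "=", value)
```

Under these assumptions, only the last position of the key is constrained (HER, for Atlantic herring), and the period runs from 1990 to 2010, as stated above.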

View 1: The Statistical Manager after filtering on "reallocation", where the two newly added algorithms appear. The simplified algorithm is used, where:

the SDMX getdata url is input,
the reference area corresponds to its name as referenced in the SDMX (FAO_MAJOR_AREA),
the statField corresponds to its name in the SDMX (obsValue),
we select FAO_AREAS_x_EEZ_HIGHSEAS as the intersection (reallocate from FAO AREAS to EEZ - highseas),
we leave "include Computations" unchecked, as we want the final dataset aggregated by EEZ - highseas

View 2: Result of the computation, where the reallocated dataset is available for download as CSV
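
The kind of computation performed in this experiment can be illustrated with a minimal Python sketch. It assumes that a value reported for a source area is split among the intersecting target areas in proportion to the intersection surface, and that "include Computations" keeps the intersection-level rows instead of aggregating them. Neither assumption is confirmed by this page. The function name, area codes and figures below are invented for illustration and do not come from the FAO dataset or from the SPREAD R scripts.

```python
from collections import defaultdict


def reallocate(values, intersections, include_computations=False):
    """Area-weighted reallocation sketch (hypothetical, not the SPREAD R code).

    values: {source_area: value}
    intersections: list of (source_area, target_area, surface)
    """
    # Total intersected surface per source area, used as the denominator
    total_surface = defaultdict(float)
    for source, _target, surface in intersections:
        total_surface[source] += surface

    # One row per intersection: the share of the source value it receives
    rows = []
    for source, target, surface in intersections:
        if source not in values:
            continue
        share = surface / total_surface[source]
        rows.append((source, target, values[source] * share))

    if include_computations:
        # Assumed meaning of the option: keep the intersection-level detail
        return rows

    # Default: aggregate the reallocated values by target area
    aggregated = defaultdict(float)
    for _source, target, value in rows:
        aggregated[target] += value
    return dict(aggregated)


# Invented example data: two source areas, three target areas
catches = {"AREA_1": 100.0, "AREA_2": 40.0}
overlaps = [
    ("AREA_1", "EEZ_A", 30.0),
    ("AREA_1", "HIGH_SEAS", 10.0),
    ("AREA_2", "EEZ_B", 5.0),
    ("AREA_2", "HIGH_SEAS", 15.0),
]

if __name__ == "__main__":
    print(reallocate(catches, overlaps))
    print(reallocate(catches, overlaps, include_computations=True))
```

With the option left off, as in View 1, the sketch returns one value per target area, which corresponds to the dataset aggregated by EEZ - highseas.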
