Difference between revisions of "ICES SGVMS"

Revision as of 11:35, 16 June 2014

Hypothesis and Thesis

The premise of this activity was a review of the ICES procedure for interpolating vessel routes. A feasibility study was produced on the basis of the 2012 report by the Study Group on VMS data (SGVMS) on vessel data analysis.

Our review is available at the following address: http://goo.gl/risQre

The scope of the SGVMS is to supply ICES expert groups with information and highlights. Interested groups work in the following fields of research: bird ecology, marine mammal ecology, spatial planning and socio-economics. The products of the SGVMS analyses include (i) spatially detailed maps of fishing effort by métier, (ii) trends in effort over time and (iii) identification of regions unimpacted by certain gears. Starting from this point, the scope of this experiment is to show what the i-Marine e-Infrastructure can add to the SGVMS procedures. Here, an e-Infrastructure is an operational combination of digital technologies (hardware and software), resources (data and services), communications (protocols, access rights and networks), and the people and organizational structures needed to support research efforts and collaboration in the large.

  • Which enhancements can importing the SGVMS tools into the i-Marine e-Infrastructure bring?
  • What is the performance of the resulting process?

In this experiment we give an answer to the above questions.

Outcome

The results of this experiment highlight that there are advantages in integrating the SGVMS tools into the e-Infrastructure. We demonstrate this with a practical example on vessel point interpolation.

We can summarize these advantages in the following points:

  1. The e-Infrastructure enables multi-tenancy and simultaneous invocation of a standalone procedure with hardcoded inputs and outputs
  2. A graphical user interface is automatically generated on top of the procedure
  3. The e-Infrastructure allows for executing R scripts on powerful machines
  4. The script can potentially be fed with datasets already uploaded to the e-Infrastructure
  5. The integration allows non-R programmers to use an R script
  6. The system enables automatic provenance management: the history of the experiments, the inputs used and the outputs produced are automatically recorded
  7. The system allows for easy sharing of inputs, outputs and parameters
  8. Information is stored on high-availability, distributed storage systems
  9. External users can run the procedure, provided the process is published under the WPS standard. Connections are also possible through Java thin clients
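
As an illustration of point 9, the sketch below builds the kind of request an external client could send to a process published under WPS. It assumes the WPS 1.0.0 key-value (KVP) encoding of the Execute operation; the endpoint, the process identifier and the input names are hypothetical placeholders, not the actual i-Marine ones.

```python
from urllib.parse import urlencode


def wps_execute_url(endpoint, identifier, inputs):
    """Build a WPS 1.0.0 Execute request URL in key-value (KVP) encoding."""
    # DataInputs is a semicolon-separated list of name=value pairs.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": identifier,
        "DataInputs": data_inputs,
    })
    return f"{endpoint}?{query}"


# Hypothetical endpoint, process identifier and input names (assumptions).
url = wps_execute_url(
    "https://example.org/wps",
    "VESSEL_INTERPOLATION",
    {"method": "hermite", "resolutionMinutes": 10},
)
print(url)
```

The actual identifier and input names of a published process would be read from the server's GetCapabilities and DescribeProcess responses rather than hardcoded.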

Activity Workflow

A logbook of the activity, from the requirements to the implementation, can be found here: https://issue.imarine.research-infrastructures.eu/ticket/2861

Conclusion

The weaknesses of the sequential solution by SGVMS can be summarized in the following points:

  1. The SGVMS proposes several approaches to vessel track interpolation. Nevertheless, they state that these methods should be compared to each other and tested against a high-resolution dataset. This would be useful to assess which of the methods most closely reflects reality. They expect that different methods might appear most suitable depending on gear or fleet;
  2. The users of their platform should be able to (i) understand the contents of the data being analyzed, (ii) work with a command-line interface environment, (iii) use adequate resources to ensure standardized but meaningful outputs;
  3. The SGVMS also reports and encourages other approaches to VMS analysis, e.g. Bayesian models to investigate fishing patterns and models to understand the effect of the resolution of VMS analysis on benthic impact assessments. However, this requires cross-domain knowledge;
  4. No mention is made of intersecting ecological models with VMS data;
  5. The procedures cannot manage simultaneous calls by different users that produce different outputs.


i-Marine is endowed with a framework to import vessel processing scripts written in the R language. Furthermore, it accommodates the above requirements by means of input/output standardization and e-Infrastructure facilities. These can be summarized as follows:

  1. separation between final users and developers of the process;
  2. multi-tenancy and multi-user facilities;
  3. resource sharing;
  4. reusability of input/output datasets;
  5. applicability of models from other domains (Bayesian models);
  6. intersection with models developed in other domains (e.g. Aquamaps).

Future Development

Future development on top of the integration presented here can involve the following points:

  1. Porting other SGVMS tools onto the e-Infrastructure;
  2. Using i-Marine to extend the SGVMS tools, e.g. for producing FishFrame compliant documents;
  3. Using i-Marine to interpolate large vessel tracks in practice;
  4. Using time series analysis tools on vessel tracks;
  5. Extracting backward and forward fishing indicators;
  6. Distributing VMS related aggregated data products through the infrastructure;
  7. Intersecting VMS/FishFrame data with niche models (Aquamaps) to retrieve the list of species possibly involved in catches;
  8. Applying clustering to vessel tracks to detect similar behaviors by vessels;
  9. Using Bayesian models to automatically classify fishing activity.

Experimentation

As a first step, we analyzed the VMSTools suite, published by SGVMS on the Google Code repository. This suite includes several scripts and procedures to analyze VMS data. We extracted the VMS interpolation procedures from VMSTools. These accept a set of VMS trajectories as input and produce interpolated versions, using either Hermite cubic splines or straight-line interpolation. Inputs have to follow the TACSAT2 format. One example of input file can be found here. A sample of interpolated output is visible here.
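
The two interpolation families mentioned above can be sketched as follows. This is a minimal illustration in Python, not the VMSTools code: positions are treated as planar (x, y) pairs, the tangent vectors of the Hermite spline are supplied by the caller (in a VMS setting they would typically be derived from the reported speed and heading), and all function names are hypothetical.

```python
def straight_line_track(p0, p1, n):
    """Return n evenly spaced points strictly between p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    return [
        (x0 + (x1 - x0) * k / (n + 1), y0 + (y1 - y0) * k / (n + 1))
        for k in range(1, n + 1)
    ]


def hermite_track(p0, p1, m0, m1, n):
    """Return n points on the cubic Hermite spline from p0 to p1.

    m0 and m1 are the tangent vectors at p0 and p1.
    """
    points = []
    for k in range(1, n + 1):
        t = k / (n + 1)
        # Standard cubic Hermite basis functions on [0, 1].
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        points.append(tuple(
            h00 * a + h10 * ta + h01 * b + h11 * tb
            for a, ta, b, tb in zip(p0, m0, p1, m1)
        ))
    return points


# Two consecutive positions of a vessel, with three points to insert.
line = straight_line_track((0.0, 0.0), (4.0, 8.0), 3)
# Tangents pointing east at the start and north at the end bend the track.
curve = hermite_track((0.0, 0.0), (4.0, 8.0), (6.0, 0.0), (0.0, 6.0), 3)
print(line)
print(curve)
```

When both tangents equal the displacement between the two positions, the spline reduces to the straight line, which makes the two methods easy to compare on the same pair of points.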

We produced a single R script including the necessary VMSTools procedures. The script can be found here. It builds on top of the interpolation and reconstruction functions in VMSTools.

Performance

As a first step, we evaluated the performance of the R script when running on one of the i-Marine machines. The machine had an Intel i7-3770 CPU @ 3.4 GHz and 16 GB of RAM, and ran 64-bit Ubuntu 12.04. A summary of the performance is reported in a public document, available here. Memory usage is reported in the following figure, for a run of the script on 97,016 vessel trajectory points interpolated at a 10-minute resolution. The required memory was more than 20 GB.

Figure: Memory usage by VMSTools when interpolating 97,016 vessel points at a 10-minute resolution.

We also evaluated how the computation time required by the script varies with the number of points. The next figure shows the exponential growth of the computation time.

Figure: Variation in the computation time with respect to the number of vessel points to process.
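
One way to check the growth pattern on a set of timing measurements is to compare how well an exponential model and a power-law model fit them. The sketch below is only an illustration: the measurements are synthetic values generated in the code, not the timings of this experiment, and the function names are hypothetical.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation of a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def growth_fit(points, seconds):
    """Compare an exponential model with a power-law model.

    Exponential growth is linear in (n, log t);
    power-law growth is linear in (log n, log t).
    """
    log_t = [math.log(t) for t in seconds]
    log_n = [math.log(n) for n in points]
    return {
        "exponential": r_squared(points, log_t),
        "power_law": r_squared(log_n, log_t),
    }


# SYNTHETIC data for illustration only: t = 0.5 * exp(0.001 * n).
points = [1000, 2000, 3000, 4000, 5000]
seconds = [0.5 * math.exp(0.001 * n) for n in points]
fits = growth_fit(points, seconds)
print(fits)
```

The model with the fit closer to 1 describes the measurements better; with few measurements the two can be hard to tell apart.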

Integration Result

We integrated the script with the i-Marine e-Infrastructure facilities. This required building an algorithm for the Statistical Manager platform. The Statistical Manager comes with a development framework to rapidly integrate R scripts.

The added value of this framework is that it automatically handles requests from multiple users even if the input/output of the R script is static. This is achieved by performing on-the-fly code injection.
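
The idea can be sketched as follows, assuming the R script declares its inputs and outputs as hardcoded string assignments at the start of a line. This is not the Statistical Manager implementation; the function, the variable names and the paths are hypothetical.

```python
import re


def inject_io(script_text, bindings):
    """Rebind hardcoded string assignments in an R script.

    Every line of the form  name <- "value"  (or  name = "value")  whose
    name is a key of `bindings` gets its string literal replaced.
    """
    for name, value in bindings.items():
        pattern = re.compile(
            r'^([ \t]*' + re.escape(name) + r'[ \t]*(?:<-|=)[ \t]*)"[^"\n]*"',
            re.MULTILINE,
        )
        # A function replacement keeps backslashes in `value` literal.
        script_text, count = pattern.subn(
            lambda m, v=value: m.group(1) + '"' + v + '"', script_text
        )
        if count == 0:
            raise KeyError(f"no hardcoded assignment found for {name!r}")
    return script_text


# Hypothetical static script and per-request file locations.
static_script = (
    'inputFile <- "tacsat.csv"\n'
    'outputFile <- "interpolated.csv"\n'
    'tacsat <- read.csv(inputFile)\n'
)
user_script = inject_io(static_script, {
    "inputFile": "/tmp/request-42/tacsat.csv",
    "outputFile": "/tmp/request-42/interpolated.csv",
})
print(user_script)
```

Giving each request its own copy of the script with its own file locations is what lets several users run the same static procedure at the same time without overwriting each other's outputs.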

As a result, the procedure was deployed on the i-Marine e-Infrastructure, which automatically endowed it with a user interface and with facilities for sharing and provenance maintenance. The next figure shows the interface to the procedure, automatically generated by the e-Infrastructure on the basis of the input/output specifications.

Figure: Interface for the vessel interpolation procedure. The interface was automatically produced by the Statistical Manager.

The Statistical Manager dataspace environment reports the inputs used and the outputs produced.

Figure: A summary of the TACSAT2 files produced by the computations. Note that inputs are separated from outputs by the Imported/Computed indication.

The "Check the Computations" environment gives access to the history of the experiments, along with the parameters used and the outputs produced (provenance).

Figure: Report of the input parameters used in a previously run process.
Figure: Report of the output produced by a previously run process.


Related links

A tutorial on the Statistical Manager

Implementing algorithms with the Statistical Manager Framework

Using the Statistical Manager by external thin clients

The Statistical Manager interface on the i-Marine Portal (Biodiversity Lab VRE)