Difference between revisions of "ICES SGVMS"
Revision as of 11:45, 16 June 2014
Hypothesis and Thesis
The premise of this activity was a review of the ICES procedure for interpolating vessel routes. A feasibility study was produced on the basis of the 2012 report by the Study Group on VMS data (SGVMS) on vessel data analysis.
Our review is available at the following address: http://goo.gl/risQre
The scope of the SGVMS is to supply ICES expert groups with information and highlights. Interested groups cover the following fields of research: bird ecology, marine mammal ecology, spatial planning and socio-economics. The products of the SGVMS analyses involve (i) spatially detailed maps of fishing effort by métier, (ii) trends in effort over time and (iii) identification of regions unimpacted by certain gears.
Starting from this point, the scope of this experiment is to show what the i-Marine e-Infrastructure can add to the SGVMS procedures. Here, an e-Infrastructure is an operational combination of digital technologies (hardware and software), resources (data and services), communications (protocols, access rights and networks), and the people and organizational structures needed to support research efforts and collaboration in the large.
In particular, we want to answer the following questions:
- Which enhancements can importing SGVMS tools into the i-Marine e-Infrastructure bring to the original process?
- What is the performance of the resulting process?
In this experiment we give an answer to the above questions.
Outcome
The results of this experiment highlight that there are advantages in integrating SGVMS tools in the e-Infrastructure. We demonstrate this with a practical example on vessel point interpolation.
We summarize the benefits as follows:
- The e-Infrastructure allows for executing processes that are highly demanding in terms of hardware resources
- The e-Infrastructure enables multi-tenancy and synchronous querying of a standalone procedure containing hardcoded inputs and outputs
- A graphical user interface is automatically generated on top of the procedure
- The e-Infrastructure allows for executing R scripts on powerful machines
- The scripts can potentially be fed with datasets already available in the e-Infrastructure
- The integration allows non-R programmers to use an R script
- The system enables automatic provenance management: the history of the experiments, the inputs used and the outputs produced are automatically recorded and stored
- The system allows inputs, outputs and parameters to be shared easily
- Information is stored on high-availability, distributed storage systems
- The procedure can be used by external users, provided the process is allowed to be published under the WPS (OGC Web Processing Service) standard. Connection is even possible by means of Java thin clients
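As an illustration of the last point, the following is a minimal Python sketch of how an external client might compose a WPS 1.0.0 Execute request using key-value-pair encoding. The endpoint, the process identifier and the input names (InputFile, Resolution) are hypothetical placeholders, not the actual values exposed by i-Marine.

```python
from urllib.parse import urlencode


def build_wps_execute_url(endpoint, process_id, inputs):
    """Compose a WPS 1.0.0 Execute request URL (key-value-pair encoding).

    endpoint, process_id and the input names are supplied by the caller;
    the values used in the example below are hypothetical.
    """
    # WPS 1.0.0 separates the individual inputs with ';' inside one parameter.
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    query = urlencode({
        "service": "WPS",
        "version": "1.0.0",
        "request": "Execute",
        "identifier": process_id,
        "datainputs": data_inputs,
    })
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    # Hypothetical endpoint, process identifier and parameters.
    print(build_wps_execute_url(
        "https://example.org/wps",
        "HYPOTHETICAL_VMS_INTERPOLATION",
        {"InputFile": "tacsat.csv", "Resolution": 10},
    ))
```

The real identifier and input names would normally be obtained from the GetCapabilities and DescribeProcess responses of the service.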
Activity Workflow
A logbook of the activity, from the requirements to the implementation, can be found at https://issue.imarine.research-infrastructures.eu/ticket/2861.
Conclusion
The weak points of the sequential solution by SGVMS can be summarized in the following points:
- The SGVMS proposes several approaches to vessel track interpolation. Nevertheless, they state that these methods should be compared to each other and should be tested against a high-resolution dataset. This would be useful to assess which of the methods most closely reflects reality. They expect that different methods might appear most suitable depending on gear or fleet;
- The users of their platform should be able to (i) understand the contents of the data being analyzed, (ii) work with a command-line interface environment, (iii) use adequate resources to ensure standardized but meaningful outputs;
- The SGVMS also reports and encourages other approaches to VMS analysis, e.g. Bayesian models to investigate fishing patterns and models to understand the effect of the resolution of VMS analysis on benthic impact assessments. On the other hand, this requires cross-domain knowledge;
- No mention is made of intersecting ecological models with VMS data;
- The procedures cannot manage synchronous calls by different users producing different outputs.
i-Marine is endowed with a framework to import vessel processing scripts written in the R language. Furthermore, it accommodates the above requirements by means of input/output standardization and e-Infrastructure facilities.
These can be summarized as follows:
- separation between final users and developers of the process;
- multi-tenancy and multi-user facilities;
- resource sharing;
- input/output dataset reusability;
- applicability of models from other domains (Bayesian models);
- intersection with models developed in other domains (e.g. Aquamaps).
Future Development
Future development on top of the integration presented here can involve the following points:
- Porting of other SGVMS tools onto the e-Infrastructure;
- Using i-Marine to extend the SGVMS tools, e.g. for producing FishFrame-compliant documents;
- Using i-Marine to interpolate large vessel tracks in practice;
- Using time series analysis tools on vessel tracks;
- Extracting backward and forward fishing indicators;
- Distributing VMS-related aggregated data products through the infrastructure;
- Intersecting VMS/FishFrame data with niche models (Aquamaps) to retrieve the list of species possibly involved in catches;
- Applying clustering to vessel tracks to detect similar behaviors by vessels;
- Using Bayesian models to automatically classify fishing activity.
Experimentation
As a first step, we analyzed the VMSTools suite, published by SGVMS on the Google Code repository. This suite involves several scripts and procedures to analyze VMS data. We extracted the VMS interpolation procedures from VMSTools. These accept a set of VMS trajectories as input and produce interpolated versions, using either Hermite cubic splines or straight-line interpolation. Inputs have to follow the TACSAT2 format. One example of input file can be found here. A sample of interpolated output is visible here.
We produced a single R script including the necessary VMSTools procedures. The script can be found here. It builds on top of the interpolation and reconstruction functions in VMSTools.
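To make the two interpolation options concrete, here is a minimal Python sketch of straight-line and cubic Hermite interpolation between two consecutive position reports. It is not the VMSTools R code: it assumes planar coordinates, and it assumes the tangents are built from speed and compass heading. How VMSTools actually derives and scales the tangents is not reproduced here, and all function names are illustrative.

```python
import math


def linear_interpolate(p0, p1, n):
    """Return n + 1 evenly spaced points on the straight segment p0 -> p1."""
    return [(p0[0] + (p1[0] - p0[0]) * i / n,
             p0[1] + (p1[1] - p0[1]) * i / n) for i in range(n + 1)]


def tangent(speed, heading_deg, interval):
    """Displacement a vessel would cover over `interval` at constant speed.

    Assumption: compass heading, 0 degrees = north (+y), 90 degrees = east (+x).
    """
    theta = math.radians(heading_deg)
    return (speed * math.sin(theta) * interval,
            speed * math.cos(theta) * interval)


def hermite_interpolate(p0, p1, m0, m1, n):
    """Return n + 1 points on the cubic Hermite curve from p0 to p1.

    m0 and m1 are the tangents at p0 and p1, scaled to the whole interval.
    """
    points = []
    for i in range(n + 1):
        t = i / n
        # Standard cubic Hermite basis functions on [0, 1].
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        points.append(tuple(h00 * a + h10 * ma + h01 * b + h11 * mb
                            for a, ma, b, mb in zip(p0, m0, p1, m1)))
    return points
```

Both functions return the two original positions as first and last point. The Hermite curve additionally bends to follow the heading at each end, which is why it can represent turns that a straight segment cuts across.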
Performance
As a first step, we evaluated the performance of the R script when running on one of the i-Marine machines. The machine had an Intel i7-3770 CPU @ 3.4 GHz and 16 GB RAM, and ran 64-bit Ubuntu 12.04. A summary of the performance is reported in a public document, available here. Memory usage is reported in the following figure, for a run of the script on 97,016 vessel trajectory points at a required resolution of 10 minutes. The required memory was more than 20 GB.
We also evaluated the computation time required by the script as the number of points varies. The next figure shows the exponential nature of the computation time.
Integration Result
We integrated the script with the i-Marine e-Infrastructure facilities. This required building an algorithm for the Statistical Manager platform. The Statistical Manager comes with a development framework to rapidly integrate R scripts.
The added value of this framework is that it automatically accounts for requests from multiple users even if the input/output of the R script is static. This is achieved by performing on-the-fly code injection.
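The idea behind this kind of code injection can be sketched as follows. This Python example is only an illustration under stated assumptions and not the Statistical Manager implementation: it assumes the R script refers to its inputs and outputs through hardcoded, double-quoted file names, and it rewrites those names to paths unique to each request before the script would be executed. The function name, the working directory and the file names are hypothetical.

```python
import uuid
from pathlib import PurePosixPath


def inject_session_paths(script_text, static_files, workdir="/tmp/sessions"):
    """Rewrite hardcoded file names so that each request uses its own files.

    Returns the rewritten script text and the mapping from each original
    file name to its per-request path. Only double-quoted occurrences of
    the names are replaced (an assumption about how the script is written).
    """
    session = uuid.uuid4().hex  # unique identifier for this request
    mapping = {name: str(PurePosixPath(workdir) / session / name)
               for name in static_files}
    rewritten = script_text
    for name, path in mapping.items():
        rewritten = rewritten.replace(f'"{name}"', f'"{path}"')
    return rewritten, mapping


if __name__ == "__main__":
    # Hypothetical R script with static input and output names.
    r_script = ('data <- read.csv("tacsat.csv")\n'
                'write.csv(data, "interpolated.csv")\n')
    rewritten, mapping = inject_session_paths(
        r_script, ["tacsat.csv", "interpolated.csv"])
    print(rewritten)
```

Because every call generates a fresh session identifier, two users running the same script at the same time read and write different files, and the script itself does not need to be modified by its author.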
As a result of this activity, the procedure was deployed on the i-Marine e-Infrastructure, which automatically endowed it with a user interface and with facilities for sharing and provenance maintenance. The next figure shows the interface to the procedure, automatically generated by the e-Infrastructure on the basis of the input/output specifications.
The Statistical Manager dataspace environment reports the inputs used and the outputs produced.
The "Check the Computations" environment gives access to the history of the experiments, along with the parameters used and the outputs produced (provenance).
Related links
A tutorial on the Statistical Manager
Implementing algorithms with the Statistical Manager Framework
Using the Statistical Manager by external thin clients
The Statistical Manager interface on the i-Marine Portal (Biodiversity Lab VRE)