AppliFish2 Synchronization

From D4Science Wiki
Revision as of 10:30, 29 September 2013 by Anton.ellenbroek


Synchronization has two parts: the VRE (Virtual Research Environment), which holds or serves the content, and the mobile apps, which provide or need (sub)sets of the VRE content.

VRE (Virtual Research Environment)

Push Features

Pushing from iMarine into the apps requires updating the SQLite database that contains the Species Fact Sheets with:

  • Entirely new fact sheets;
  • Fact sheet updates: new statistics, links, local names, maps, etc.

Pull Features - Generate New Content - App Side

There are many field data collection apps for environmental and biological data.

  • A good example is the very widely used app for ornithologists: Observado
  • Another 'random' app is GISCloud
  • And another example of a survey tool is SurveyMe

AppliFish could be equipped with similar data collection features.

Pull Features - Generate New Content - Backend

  • For AppliFish, data were extracted from the FAO FishFinder Aquatic Species fact sheets. The web service extracts selected pieces of information through specific XPaths and returns the results in CSV or JSON format (F. Fiorellato).

An initial API is available here: http://figisapps.fao.org/vrmf/samples/species/FS/

The endpoints (URLs) of the available services are:

Full data extraction

   Syntax: http://figisapps.fao.org/vrmf/samples/services/species/FS/extract/all.<format>
       Example (extract ALL data in CSV format): http://figisapps.fao.org/vrmf/samples/services/species/FS/extract/all.csv
       Example (extract ALL data in JSON format): http://figisapps.fao.org/vrmf/samples/services/species/FS/extract/all.json 


Subset (by 3-alpha-code) data extraction

   Syntax: http://figisapps.fao.org/vrmf/samples/services/species/FS/extract/3a/<comma separated list of 3-alpha codes>.<format>
       Example (extract data for Bluefin Tuna, Swordfish and Albacore in CSV format): http://figisapps.fao.org/vrmf/samples/services/species/FS/extract/3a/BFT,SWO,ALB.csv
       Example (extract all data for Bluefin Tuna, Swordfish and Albacore in JSON format): http://figisapps.fao.org/vrmf/samples/services/species/FS/extract/3a/BFT,SWO,ALB.json 


List all associations between 3-alpha-code and factsheet IDs

   Syntax: http://figisapps.fao.org/vrmf/samples/services/species/FS/list.<format>
       Example (extract all associations in CSV format): http://figisapps.fao.org/vrmf/samples/services/species/FS/list.csv
       Example (extract all associations in JSON format): http://figisapps.fao.org/vrmf/samples/services/species/FS/list.json 


Rescan the species factsheets (to be invoked when species factsheets are added, deleted or updated in order to reflect updates in the dataset)

   Syntax: http://figisapps.fao.org/vrmf/samples/services/species/reinitialize 
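
The URL patterns above can be assembled programmatically. The following is a minimal Python sketch, not part of the service itself: the function names are made up for illustration, and `fetch` returns the raw bytes because the layout of the CSV/JSON payload is not described on this page.

```python
from urllib.request import urlopen

# Base path taken from the endpoint list above.
BASE = "http://figisapps.fao.org/vrmf/samples/services/species"


def _check_format(fmt):
    # The page only documents the csv and json formats.
    if fmt not in ("csv", "json"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return fmt


def extract_all_url(fmt="json"):
    """URL for the full data extraction."""
    return f"{BASE}/FS/extract/all.{_check_format(fmt)}"


def extract_subset_url(codes, fmt="json"):
    """URL for the extraction of a comma-separated list of 3-alpha codes."""
    if not codes:
        raise ValueError("at least one 3-alpha code is required")
    return f"{BASE}/FS/extract/3a/{','.join(codes)}.{_check_format(fmt)}"


def list_url(fmt="json"):
    """URL listing the associations between 3-alpha codes and factsheet IDs."""
    return f"{BASE}/FS/list.{_check_format(fmt)}"


def fetch(url, timeout=30):
    """Download the raw response body (hypothetical helper, needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, `extract_subset_url(["BFT", "SWO", "ALB"], "csv")` reproduces the CSV example URL for Bluefin Tuna, Swordfish and Albacore given above.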


The actual XPaths are:

   FAO3AlphaCode: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:FAO3AlphaCode
   FAOName.en: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:FAOName/fi:En
   FAOName.fr: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:FAOName/fi:Fr
   FAOName.sp: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:FAOName/fi:Sp
   Image (width 300) [!]: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:Image[1] Please note that in several cases there are multiple images and drawings.
   ScientificName: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fint:ScientificName
   Family: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:SciName/fi:Family
   PersonalAuthor: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:SciName/ags:PersonalAuthor Please note that author and year need to be concatenated as <author, year> and put in brackets if the attribute “ChangedGenus” (on fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:SciName) is Y.
   Year: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesIdent/fi:SciName/fi:Year
   DiagnosticFeat: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesProfile/fi:DiagnosticFeat (all sub elements to be concatenated)
   AreaText: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:GeoDist fi:AqSpeciesText+ fi:AreaText
   HabitatBio: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:HabitatBio (all text of its sub-elements, skipping the content of the following sub-elements):
       Bathymetry: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:HabitatBio/fi:DepthBehav/fi:Bathymetry
       Reproduction: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:HabitatBio/fi:Reproduction
       Feeding: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:HabitatBio/fi:Feeding 
   InterestFisheries (FisheriesText only): fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:InterestFisheries/fi:FisheriesText (all sub elements to be concatenated)
   Local names: fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:LocalName; each name (and its country) is found per entry at fi:FIGISDoc/fi:AqSpecies/fi:AqSpeciesFeature/fi:LocalName/fi:LocalNameEntry[XXXX]. We suggest the format <localName> (Country).
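
The two formatting rules mentioned in the list (author and year, local names) could be applied after extraction along the following lines. This is only an illustrative Python sketch: the function names are hypothetical, the inputs are assumed to be already extracted strings, and "brackets" is interpreted here as round brackets.

```python
def format_authority(author, year, changed_genus):
    """Concatenate author and year as '<author>, <year>'.

    The result is put in round brackets when the ChangedGenus
    attribute of fi:SciName equals 'Y' (assumed interpretation).
    """
    authority = f"{author.strip()}, {str(year).strip()}"
    if changed_genus == "Y":
        return f"({authority})"
    return authority


def format_local_name(local_name, country):
    """Render one LocalNameEntry in the suggested '<localName> (Country)' format."""
    return f"{local_name.strip()} ({country.strip()})"
```

With illustrative inputs, `format_authority("Linnaeus", 1758, "Y")` gives `(Linnaeus, 1758)`, and `format_local_name("Zorro", "Cuba")` gives `Zorro (Cuba)`.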

App SQLite data and structure

The results are copied into a simple SQLite database. Depending on the quality of the source data, they are first copied into a temporary table, e.g. if the alfa3 code was not provided, if texts were too long, if names did not match between data providers (some 25% of names were not identical between different sources), or if the structure of the result did not allow for a direct copy.

Some examples of transformations required before copying into the results table are:

  • For the statistics: merge all statistics into one column, with the yearly values following the unit (tonnes / numbers):
"t | 2000, 654 | 2001, 616 | 2002, 427 | 2003, 468 | 2004, 321 | 2005, 422 | 2006, 415 | 2007, 465 | 2008, 328 | 2009, 311 | 2010, 207"
  • For the local names: merge information from FAO, FB/SLB and WoRMS into one column. Since the result from WoRMS was normalized, a de-normalization (group_concat) was needed to achieve a result similar to:
"Thresher, English | Renard, French | Zorro, Spanish  | Common thresher, England | Whiptail shark, South africa | Lluynog mor, Wales | Renard, France | Rabosa, Spain | Arequim, Portugal | Peixe rato, Madeira | Tubarâo raposo, Azores | Pesce volpe, Italy | Pas sabljas, Adriatic | Raefhajen, Sweden | Onagazame, Japan | Zorro, Cuba | Lisitska morskayia, Russia | Zorro cauda longa, Mozambique"  
  • For name mismatches: a manual effort was required, as there is (was) no service available to readily match all names.
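
The first two transformations can be sketched in Python with the standard sqlite3 module. This is an assumption-laden illustration, not the procedure actually used: the normalized table `local_names` and its columns (`alfa3_code`, `name`, `place`) are hypothetical, as are the function names.

```python
import sqlite3


def merge_stats(unit, yearly_values):
    """Merge (year, value) pairs into one string, led by the unit.

    Produces the same layout as the statistics example above,
    e.g. 't | 2000, 654 | 2001, 616'.
    """
    parts = [unit]
    parts += [f"{year}, {value}" for year, value in sorted(yearly_values)]
    return " | ".join(parts)


def merge_local_names(conn, alfa3_code):
    """De-normalize local names for one species with group_concat.

    Assumes a hypothetical normalized table
    local_names(alfa3_code, name, place). SQLite does not guarantee
    the order of concatenated items unless it is requested explicitly.
    """
    row = conn.execute(
        "SELECT group_concat(name || ', ' || place, ' | ') "
        "FROM local_names WHERE alfa3_code = ?",
        (alfa3_code,),
    ).fetchone()
    return row[0]  # None when the species has no local names
```

A call such as `merge_stats("t", [(2000, 654), (2001, 616)])` yields a string in the same layout as the statistics example above.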

All results are copied into the main App table:

   CREATE TABLE "fishmapp" (
       "id" integer PRIMARY KEY NOT NULL,
       "alfa3_code" text,
       "Scientific_name" text,
       "English_name" text,
       "French_name" text,
       "Spanish_name" text,
       "Author" text,
       "Family" text,
       "Ordo" text,
       "Local_names" text,
       "AquaMapsName" text,
       "Stats" text,
       "fact_distribution" text,
       "fact_production" text,
       "fact_link" text,
       "species_image" text,
       "iucn_status" TEXT,
       "species_group" TEXT,
       "sizes" TEXT,
       "dt_upd" DATETIME
   )
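
A push of new or updated fact sheets into this table could look roughly like the following Python sketch. It is an assumption, not the app's actual synchronization code: the function `upsert_factsheet` is hypothetical, matching on `alfa3_code` is assumed to identify a species, and `IF NOT EXISTS` is added to the statement only so the example can be rerun.

```python
import sqlite3
from datetime import datetime, timezone

# Table definition from this page; IF NOT EXISTS added for repeatable runs.
CREATE_SQL = """CREATE TABLE IF NOT EXISTS "fishmapp" (
    "id" integer PRIMARY KEY NOT NULL, "alfa3_code" text,
    "Scientific_name" text, "English_name" text, "French_name" text,
    "Spanish_name" text, "Author" text, "Family" text, "Ordo" text,
    "Local_names" text, "AquaMapsName" text, "Stats" text,
    "fact_distribution" text, "fact_production" text, "fact_link" text,
    "species_image" text, "iucn_status" TEXT, "species_group" TEXT,
    "sizes" TEXT, "dt_upd" DATETIME)"""

# Columns a push may set; guards the dynamically built SQL below.
UPDATABLE = {
    "Scientific_name", "English_name", "French_name", "Spanish_name",
    "Author", "Family", "Ordo", "Local_names", "AquaMapsName", "Stats",
    "fact_distribution", "fact_production", "fact_link", "species_image",
    "iucn_status", "species_group", "sizes",
}


def upsert_factsheet(conn, alfa3_code, fields):
    """Insert a new fact sheet or update the existing row for alfa3_code.

    Returns True when a new row was inserted, False when one was updated.
    """
    unknown = set(fields) - UPDATABLE
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    values = dict(fields)
    # Record the time of the change in dt_upd (UTC, ISO 8601 text).
    values["dt_upd"] = datetime.now(timezone.utc).isoformat(timespec="seconds")
    columns = list(values)
    params = [values[c] for c in columns]
    row = conn.execute(
        "SELECT id FROM fishmapp WHERE alfa3_code = ?", (alfa3_code,)
    ).fetchone()
    if row is None:
        col_sql = ", ".join(["alfa3_code"] + columns)
        marks = ", ".join("?" * (len(columns) + 1))
        conn.execute(
            f"INSERT INTO fishmapp ({col_sql}) VALUES ({marks})",
            [alfa3_code] + params,
        )
    else:
        set_sql = ", ".join(f"{c} = ?" for c in columns)
        conn.execute(
            f"UPDATE fishmapp SET {set_sql} WHERE id = ?", params + [row[0]]
        )
    conn.commit()
    return row is None
```

Because the table has no unique constraint on `alfa3_code`, the sketch looks the row up first instead of relying on SQLite's `ON CONFLICT` clause.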