Difference between revisions of "MaxEnt"

From D4Science Wiki
 
(15 intermediate revisions by one other user not shown)
Line 1: Line 1:
 
==Description==
 
==Description==
This page explains how to use the MaxEnt Algorithm on the Statistical Manager with the i-Marine portal..
+
This page explains how to use the MaxEnt Algorithm on the [https://services.d4science.org/group/biodiversitylab/processing-tools Statistical Manager] via the D4Science portal.
The algorithm is hosted by the D4Science e-Infrastructure that supports i-Marine. It is a Maximum-Entropy model for species habitat modeling, based on the implementation by Shapire et al. v 3.3.3k at [http://www.cs.princeton.edu/~schapire/maxent/ Princeton University].  
+
The algorithm is hosted by the D4Science e-Infrastructure, which the D4Science e-infrastructure relies on. It is a Maximum-Entropy model for species habitat modeling, based on the implementation by Shapire et al. v 3.3.3k at [http://www.cs.princeton.edu/~schapire/maxent/ Princeton University].  
  
In this adaptation the software accepts a table following the [http://i-marine.eu/Content/eTraining.aspx?id=43714ba2-4cb5-4e97-b77f-b6288c9358c2 Species Product Discovery service model] of i-Marine and a set of environmental layers in various formats (NetCDF, WFS, WCS, ASC, GeoTiff) via direct links or GeoExplorer UUIDs.  
+
In this adaptation, the software accepts a table following the [http://i-marine.eu/Content/eTraining.aspx?id=43714ba2-4cb5-4e97-b77f-b6288c9358c2 Species Product Discovery service model] of D4Science and a set of environmental layers in various formats (NetCDF, WFS, WCS, ASC, GeoTiff) via direct links or GeoExplorer UUIDs.  
  
 
The user can also set the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if this is higher than the native one.
 
The user can also set the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if this is higher than the native one.
Line 18: Line 18:
  
 
Starting from this output, other processes of the Statistical Manager can be later applied to the raw values, for example to produce a GIS map (e.g. the "Statistical Manager Points to Map" process).
 
Starting from this output, other processes of the Statistical Manager can be later applied to the raw values, for example to produce a GIS map (e.g. the "Statistical Manager Points to Map" process).
Eventually, results can be shared with other participants to the e-Infrastructure using the [http://i-marine.eu/Content/eTraining.aspx?id=07793722-b76a-4e92-b29a-3a05d3947ded&li=0 i-Marine workspace].
+
Eventually, results can be shared with other participants to the e-Infrastructure using the [http://i-marine.eu/Content/eTraining.aspx?id=07793722-b76a-4e92-b29a-3a05d3947ded&li=0 D4Science workspace].
 +
 
 +
==Link to the process==
 +
 
 +
[https://services.d4science.org/group/biodiversitylab/processing-tools The MaxEnt process can be found on the Statistical Manager, on the D4Science web portal].
 +
 
 +
You should access the "Execute an Experiment" section and open the "Bayesian Methods" category, or use the search box writing "Max Ent".
  
 
==Demo Video==
 
==Demo Video==
Here is a demonstration of the usage of the MaxEnt algorithm on the Statistical Manager: http://goo.gl/TYYnTO
+
[http://goo.gl/TYYnTO A demonstration video is available to demonstrate how to access and use the MaxEnt algorithm on the Statistical Manager]
  
 
==Inputs==
 
==Inputs==
  
[[Image:maxent.png|frame|center|An example of the MaxEnt process configuration]]
+
[[Image:maxent.png|thumb|center|upright=6.5|An example of the MaxEnt process configuration]]
 +
 
 +
The inputs tp the MaxEnt procedure are the following:
 +
 
 +
 
 +
{| class="wikitable" style="margin: 1em auto 1em auto;"
 +
|-
 +
! Parameter Name
 +
! Description
 +
! Example
 +
|-
 +
| SpeciesName
 +
| The name of the species to model and the occurrence records refer to. If the name is not important this can be a generic string.
 +
| Latimeria chalumnae
 +
|-
 +
| MaxIterations
 +
| The number of learning iterations of the MaxEnt algorithm
 +
| 1000
 +
|-
 +
| DefaultPrevalence
 +
| A priori probability of presence at ordinary occurrence points. Ref. [http://onlinelibrary.wiley.com/doi/10.1111/j.1466-8238.2010.00581.x/abstract Santika 2010]
 +
| 0.5
 +
|-
 +
| OccurrencesTable
 +
| A geospatial table containing occurrence records, following the template of the Species Products Discovery datasets. See section below for more details.
 +
| LatimeriaPointsTable
 +
|-
 +
| LongitudeColumn
 +
| The table column containing longitude values
 +
| decimallongitude
 +
|-
 +
| LatitudeColumn
 +
| The table column containing latitude values
 +
| decimallatitude
 +
|-
 +
| Z
 +
| Value of Z. Default is 0, that means environmental layers processing will be at surface level or at the first avaliable Z value in the layer
 +
| 0
 +
|-
 +
| TimeIndex
 +
| Time Index. The default is the first time indexed in the input environmental datasets
 +
| 0
 +
|-
 +
| XResolution
 +
| Model projection resolution on the X axis in decimal degrees
 +
| 1
 +
|-
 +
| YResolution
 +
| Model projection resolution on the Y axis in decimal degrees
 +
| 1
 +
|-
 +
|-
 +
| Layers
 +
| The list of environmental layers to use for enriching the points. Each entry is a layer Title or UUID or HTTP link. See section below for further details.
 +
| https://dl.dropboxusercontent.com/u/12809149/wind1.tif
 +
|-
 +
|+ SPD Input Template.
 +
|}
  
 
==SPD Input Format==
 
==SPD Input Format==
  
The algorithm needs a table to be uploaded on the Statistical Manager. To use the upload facilities, refer to the [http://gcube.wiki.gcube-system.org/gcube/index.php/Statistical_Manager_Tutorial Statistical Manager Tutorial] page.
+
The algorithm needs a table to be uploaded on the Statistical Manager. To use the upload facilities, refer to the [http://wiki.gcube-system.org/gcube/index.php/Statistical_Manager_Tutorial Statistical Manager Tutorial] page.
The uploaded table should follow the Species Product Discovery (SPD) template and can be generate by the [https://i-marine.d4science.org/group/biodiversitylab/species-data-discovery SPD service]:
+
The uploaded table should follow the Species Product Discovery (SPD) template and can be generate by the [https://services.d4science.org/group/biodiversitylab/species-data-discovery SPD service].
 +
One example is [http://goo.gl/luE4qy here].
  
  
Line 118: Line 182:
 
==Feeding the algorithm with Input Maps==
 
==Feeding the algorithm with Input Maps==
  
In the layers box users can insert links to maps that will be used to associate environmental values to species occurrence records.
+
In the layers box users can insert links to maps that will be used to associate environmental values to species occurrence records. The model will project values only in locations where all the layers have defined values.
The + button allows to insert a new layer.
+
The "+" button allows to insert a new input environmental layer.
  
 
===Input Examples===
 
===Input Examples===
Line 126: Line 190:
 
[[Image:layersinput.png|frame|center|Example of environmental layers used as inputs]]
 
[[Image:layersinput.png|frame|center|Example of environmental layers used as inputs]]
  
'''Input from i-Marine GeoExplorer'''
+
'''Input from D4Science GeoExplorer'''
  
using the [https://i-marine.d4science.org/group/biodiversitylab/geo-visualisation i-Marine GeoExplorer application]:
+
using the [https://services.d4science.org/group/biodiversitylab/geo-visualisation D4Science GeoExplorer application]:
  
 
* search for an environmental layer (e.g. temperature)
 
* search for an environmental layer (e.g. temperature)
* click on one of the layers found by the search
+
* click on one of the found layers
* in the "Summary Layer Info" panel on the right scroll down and select the Metadata UUID string (e.g. cd048cb5-dbb6-414b-a3b9-1f3ac512fbff)
+
* in the "Summary Layer Info" panel on the right side of the panel, scroll down and select the Metadata UUID string (e.g. cd048cb5-dbb6-414b-a3b9-1f3ac512fbff)
 
* paste the UUID in the layers box in MaxEnt
 
* paste the UUID in the layers box in MaxEnt
  
 
'''Input from a WFS link'''
 
'''Input from a WFS link'''
  
MaxEnt can import WFS links residing either on a GeoServer or on a MapServer. The server must be able to produce maps details in json format.
+
MaxEnt can import WFS links residing either on a GeoServer or on a MapServer. The server should be configured to produce maps details in json format.
In this case you can insert the direct WFS link in the layers box, without specifying the bounding box.
+
In this case, you can insert the direct WFS link in the layers box, without specifying the bounding box.
  
E.g.: http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/ows?service=wfs&version=1.0.0&request=GetFeature&srsName=urn:x-ogc:def:crs:EPSG:4326&TYPENAME=aquamaps:worldborders
+
Example:
 +
 
 +
http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/ows?service=wfs&version=1.0.0&request=GetFeature&srsName=urn:x-ogc:def:crs:EPSG:4326&TYPENAME=aquamaps:worldborders
  
 
Please, use EPSG:4326 as projection.
 
Please, use EPSG:4326 as projection.
Line 146: Line 212:
 
'''Input from a WCS link'''
 
'''Input from a WCS link'''
  
You can input a direct WCS link.
+
You can insert a direct WCS link.
  
E.g.:  
+
Example:  
  
 
http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/wcs/wcs?service=wcs&version=1.0.0&request=GetCoverage&coverage=aquamaps:WorldClimBio2&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326
 
http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/wcs/wcs?service=wcs&version=1.0.0&request=GetCoverage&coverage=aquamaps:WorldClimBio2&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326
Line 156: Line 222:
 
'''Input from a NetCDF-GRID file'''
 
'''Input from a NetCDF-GRID file'''
  
You can input the OpenDAP link to a NetCDF file, only if this contains one single dimension layer.
+
You can insert the OpenDAP link to a NetCDF file, only if this contains one single dimension layer.
  
E.g.:  
+
Example:  
  
 
http://thredds.research-infrastructures.eu/thredds/dodsC/public/netcdf/WOA2005TemperatureAnnual_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc
 
http://thredds.research-infrastructures.eu/thredds/dodsC/public/netcdf/WOA2005TemperatureAnnual_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc
Line 164: Line 230:
 
'''ASC ESRI-GRID files'''
 
'''ASC ESRI-GRID files'''
  
You can input a direct http link to an ESRI-GRID file, even using local common publishing tools (e.g. dropbox).
+
You can insert a direct http link to an ESRI-GRID file, even using local-machines common publishing tools (e.g. dropbox).
  
E.g.:
+
Example:  
  
 
http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/ph.asc
 
http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/ph.asc
Line 174: Line 240:
 
'''GeoTiffs'''
 
'''GeoTiffs'''
  
Http links to GeoTiff files are allowed, even using local common publishing tools (e.g. dropbox).
+
Http links to GeoTiff files are allowed, even using local-machines common publishing tools (e.g. dropbox).
E.g.:
+
 
 +
Example:  
  
 
https://dl.dropboxusercontent.com/u/12809149/wind1.tif
 
https://dl.dropboxusercontent.com/u/12809149/wind1.tif
  
===Contacts===
+
==Contacts==
 
For questions and bug alerts use [https://support.d4science.research-infrastructures.eu/ this] form.
 
For questions and bug alerts use [https://support.d4science.research-infrastructures.eu/ this] form.

Latest revision as of 17:25, 3 September 2015

Description

This page explains how to use the MaxEnt algorithm on the Statistical Manager via the D4Science portal. The algorithm is hosted by the D4Science e-Infrastructure. It is a Maximum-Entropy model for species habitat modeling, based on the implementation by Schapire et al. (v 3.3.3k) at Princeton University.

In this adaptation, the software accepts a table following the Species Product Discovery service model of D4Science and a set of environmental layers in various formats (NetCDF, WFS (Web Feature Service), WCS (Web Coverage Service), ASC, GeoTiff) via direct links or GeoExplorer UUIDs.

The user can also set the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if this is higher than the native one.
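To get a feel for how the bounding box and the resolution interact, the following minimal sketch computes how many grid cells a projection would contain. The function name `grid_shape` and the rounding rule are assumptions made for illustration; the page does not document how the service builds its grid internally.

```python
import math


def grid_shape(min_x, min_y, max_x, max_y, x_res, y_res):
    """Return (columns, rows) of a regular grid covering the bounding box.

    Coordinates and resolutions are in decimal degrees. Rounding up with
    ceil() is an assumption, not the documented behaviour of the service.
    """
    if x_res <= 0 or y_res <= 0:
        raise ValueError("resolutions must be positive")
    if max_x <= min_x or max_y <= min_y:
        raise ValueError("bounding box is empty")
    cols = math.ceil((max_x - min_x) / x_res)
    rows = math.ceil((max_y - min_y) / y_res)
    return cols, rows


# A global bounding box at 1 degree and at 0.5 degrees.
print(grid_shape(-180, -90, 180, 90, 1, 1))
print(grid_shape(-180, -90, 180, 90, 0.5, 0.5))
```

Halving the resolution value quadruples the number of cells, which is worth keeping in mind when choosing XResolution and YResolution for large areas.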

The output is made up of the following components:

  • a thumbnail map of the projected model,
  • the ROC curve,
  • the Omission/Commission chart,
  • a table containing the raw assigned values,
  • a threshold to transform the table into a 0-1 probability distribution,
  • a report of the importance of the used layers in the model,
  • ASCII representations of the input layers to check their alignment.
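
As an illustration of how the threshold relates to the raw values, the sketch below converts a list of raw values into 0/1 values. The function name `apply_threshold`, the sample numbers, and the choice to treat values equal to the threshold as 1 are all assumptions; the page does not specify these details.

```python
def apply_threshold(raw_values, threshold):
    """Map each raw value to 1 if it reaches the threshold, else 0.

    Treating a value equal to the threshold as 1 is an assumption.
    """
    return [1 if value >= threshold else 0 for value in raw_values]


# Hypothetical raw values and threshold, for illustration only.
raw = [0.12, 0.47, 0.81, 0.50]
print(apply_threshold(raw, 0.5))
```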

Starting from this output, other processes of the Statistical Manager can later be applied to the raw values, for example to produce a GIS map (e.g. the "Statistical Manager Points to Map" process). Finally, results can be shared with other participants in the e-Infrastructure using the D4Science workspace.

Link to the process

The MaxEnt process can be found on the Statistical Manager, on the D4Science web portal.

Open the "Execute an Experiment" section and select the "Bayesian Methods" category, or type "Max Ent" in the search box.

Demo Video

A demonstration video shows how to access and use the MaxEnt algorithm on the Statistical Manager.

Inputs

An example of the MaxEnt process configuration

The inputs to the MaxEnt procedure are the following:


{| class="wikitable" style="margin: 1em auto 1em auto;"
|+ MaxEnt input parameters.
|-
! Parameter Name
! Description
! Example
|-
| SpeciesName
| The name of the species to model, to which the occurrence records refer. If the name is not important, this can be a generic string.
| Latimeria chalumnae
|-
| MaxIterations
| The number of learning iterations of the MaxEnt algorithm
| 1000
|-
| DefaultPrevalence
| A priori probability of presence at ordinary occurrence points. Ref. [http://onlinelibrary.wiley.com/doi/10.1111/j.1466-8238.2010.00581.x/abstract Santika 2010]
| 0.5
|-
| OccurrencesTable
| A geospatial table containing occurrence records, following the template of the Species Products Discovery datasets. See the SPD Input Format section for more details.
| LatimeriaPointsTable
|-
| LongitudeColumn
| The table column containing longitude values
| decimallongitude
|-
| LatitudeColumn
| The table column containing latitude values
| decimallatitude
|-
| Z
| Value of Z. The default is 0, which means that the environmental layers are processed at surface level or at the first available Z value in the layer
| 0
|-
| TimeIndex
| Time index. The default is the first time indexed in the input environmental datasets
| 0
|-
| XResolution
| Model projection resolution on the X axis in decimal degrees
| 1
|-
| YResolution
| Model projection resolution on the Y axis in decimal degrees
| 1
|-
| Layers
| The list of environmental layers to use for enriching the points. Each entry is a layer title, a UUID, or an HTTP link. See the Feeding the algorithm with Input Maps section for further details.
| https://dl.dropboxusercontent.com/u/12809149/wind1.tif
|}
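
Before filling in the form, it can help to write the parameter values down and sanity-check them. The sketch below stores the example values from the table in a plain dictionary and runs a few checks. The function `check_params` and its rules are assumptions made for illustration; they are not the validation performed by the Statistical Manager.

```python
# Example values taken from the parameter table above.
params = {
    "SpeciesName": "Latimeria chalumnae",
    "MaxIterations": 1000,
    "DefaultPrevalence": 0.5,
    "OccurrencesTable": "LatimeriaPointsTable",
    "LongitudeColumn": "decimallongitude",
    "LatitudeColumn": "decimallatitude",
    "Z": 0,
    "TimeIndex": 0,
    "XResolution": 1,
    "YResolution": 1,
    "Layers": ["https://dl.dropboxusercontent.com/u/12809149/wind1.tif"],
}


def check_params(p):
    """Return a list of problems found; an empty list means none.

    These rules are illustrative assumptions, not the service's own checks.
    """
    problems = []
    if not (isinstance(p["MaxIterations"], int) and p["MaxIterations"] > 0):
        problems.append("MaxIterations must be a positive integer")
    if not 0 <= p["DefaultPrevalence"] <= 1:
        problems.append("DefaultPrevalence is a probability and must lie in [0, 1]")
    for key in ("XResolution", "YResolution"):
        if p[key] <= 0:
            problems.append(f"{key} must be a positive number of decimal degrees")
    if not p["Layers"]:
        problems.append("Layers must list at least one environmental layer")
    return problems


print(check_params(params))
```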

SPD Input Format

The algorithm needs a table to be uploaded to the Statistical Manager. To use the upload facilities, refer to the Statistical Manager Tutorial page. The uploaded table should follow the Species Product Discovery (SPD) template and can be generated by the SPD service. One example is here.


{| class="wikitable" style="margin: 1em auto 1em auto;"
|+ SPD Input Template.
|-
! Field name
! Format
|-
| institutioncode
| string
|-
| collectioncode
| string
|-
| catalognumber
| string
|-
| dataset
| string
|-
| dataprovider
| string
|-
| datasource
| string
|-
| scientificnameauthorship
| string
|-
| identifiedby
| string
|-
| credits
| string
|-
| recordedby
| string
|-
| eventdate
| timestamp without time zone
|-
| modified
| timestamp without time zone
|-
| scientificname
| string
|-
| kingdom
| string
|-
| family
| string
|-
| locality
| string
|-
| country
| string
|-
| citation
| string
|-
| decimallatitude
| double precision
|-
| decimallongitude
| double precision
|-
| coordinateuncertaintyinmeters
| string
|-
| maxdepth
| double precision
|-
| mindepth
| double precision
|-
| basisofrecord
| string
|}


All fields may be left empty except decimallatitude and decimallongitude. This makes it possible to apply MaxEnt to domains other than species distribution modelling. Note that the template closely follows the Darwin Core format.
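
Because only the two coordinate fields are mandatory, a table for a non-species use case can be generated from a bare list of points. The following sketch writes such a table as CSV text. The CSV serialisation, the helper name `spd_rows_to_csv`, and the sample coordinates are assumptions; check the Statistical Manager Tutorial for the upload formats that are actually accepted.

```python
import csv
import io

# Field names of the SPD template, in the order given in the table above.
SPD_FIELDS = [
    "institutioncode", "collectioncode", "catalognumber", "dataset",
    "dataprovider", "datasource", "scientificnameauthorship", "identifiedby",
    "credits", "recordedby", "eventdate", "modified", "scientificname",
    "kingdom", "family", "locality", "country", "citation",
    "decimallatitude", "decimallongitude", "coordinateuncertaintyinmeters",
    "maxdepth", "mindepth", "basisofrecord",
]


def spd_rows_to_csv(points):
    """Serialise (latitude, longitude) pairs as CSV text in SPD column order.

    Every field other than the two coordinates is left empty.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SPD_FIELDS, restval="")
    writer.writeheader()
    for latitude, longitude in points:
        writer.writerow({"decimallatitude": latitude,
                         "decimallongitude": longitude})
    return buffer.getvalue()


# Hypothetical coordinates, for illustration only.
print(spd_rows_to_csv([(-11.5, 43.3), (-12.1, 44.0)]))
```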

Feeding the algorithm with Input Maps

In the layers box, users can insert links to maps that will be used to associate environmental values with species occurrence records. The model will project values only in locations where all the layers have defined values. The "+" button adds a new input environmental layer.
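
The rule that values are projected only where all layers are defined amounts to intersecting the valid cells of every layer. The sketch below shows the idea on tiny one-dimensional layers. Using `None` as the marker for an undefined cell, the function name `projectable_cells`, and the sample values are assumptions for illustration; real layers use format-specific no-data values.

```python
def projectable_cells(layers):
    """Return one boolean per cell: True where every layer has a value.

    `layers` is a list of equally long lists; None marks an undefined cell
    (an assumption made for this sketch).
    """
    return [all(value is not None for value in cell) for cell in zip(*layers)]


# Hypothetical layer values, for illustration only.
temperature = [20.1, None, 18.4, 17.9]
salinity = [35.0, 34.8, None, 34.5]
print(projectable_cells([temperature, salinity]))
```

A layer with large undefined areas therefore shrinks the projected area for the whole model, so it is worth checking the ASCII representations of the input layers in the output.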

Input Examples

Example of environmental layers used as inputs

Input from D4Science GeoExplorer

Using the D4Science GeoExplorer application:

  • search for an environmental layer (e.g. temperature)
  • click on one of the layers found
  • in the "Summary Layer Info" panel on the right, scroll down and select the Metadata UUID string (e.g. cd048cb5-dbb6-414b-a3b9-1f3ac512fbff)
  • paste the UUID in the layers box in MaxEnt
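
A copied UUID can easily lose a character or pick up stray whitespace. As a minimal sketch, the standard `uuid` module can confirm that a pasted string is well formed before it goes into the layers box. The helper name `is_metadata_uuid` is hypothetical, and the check only validates the format; it cannot tell whether the layer exists in GeoExplorer.

```python
import uuid


def is_metadata_uuid(text):
    """Return True if `text` is a well-formed, hyphenated UUID string."""
    candidate = text.strip().lower()
    try:
        # Round-tripping rejects variants such as braces or missing hyphens.
        return str(uuid.UUID(candidate)) == candidate
    except ValueError:
        return False


print(is_metadata_uuid("cd048cb5-dbb6-414b-a3b9-1f3ac512fbff"))
print(is_metadata_uuid("cd048cb5-dbb6-414b-a3b9"))
```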

Input from a WFS (Web Feature Service) link

MaxEnt can import WFS links residing either on a GeoServer or on a MapServer. The server should be configured to produce map details in JSON format. In this case, you can insert the direct WFS link in the layers box, without specifying the bounding box.

Example:

http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/ows?service=wfs&version=1.0.0&request=GetFeature&srsName=urn:x-ogc:def:crs:EPSG:4326&TYPENAME=aquamaps:worldborders

Please use EPSG:4326 as the projection.
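
The projection of a WFS link can be checked before submitting it by inspecting its query string. The sketch below does this with the standard `urllib.parse` module. The function name `wfs_uses_epsg4326` is hypothetical, and the test on the end of the `srsName` value is a heuristic assumption that fits the example link above; other servers may spell the reference system differently.

```python
from urllib.parse import parse_qs, urlsplit


def wfs_uses_epsg4326(url):
    """Return True if the link is a WFS request whose srsName ends in EPSG:4326.

    Query parameter names are compared case-insensitively.
    """
    query = parse_qs(urlsplit(url).query)
    params = {name.lower(): values[0] for name, values in query.items()}
    return (params.get("service", "").lower() == "wfs"
            and params.get("srsname", "").upper().endswith("EPSG:4326"))


link = ("http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/ows"
        "?service=wfs&version=1.0.0&request=GetFeature"
        "&srsName=urn:x-ogc:def:crs:EPSG:4326&TYPENAME=aquamaps:worldborders")
print(wfs_uses_epsg4326(link))
```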

Input from a WCS (Web Coverage Service) link

You can insert a direct WCS link.

Example:

http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/wcs/wcs?service=wcs&version=1.0.0&request=GetCoverage&coverage=aquamaps:WorldClimBio2&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326

Please use EPSG:4326 as the projection.
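
When several coverages from the same server are needed, the links can be assembled programmatically instead of edited by hand. The following sketch rebuilds the example link above from its parts with `urllib.parse.urlencode`. The helper name `wcs_link` is hypothetical, and the parameter set simply mirrors the example; other WCS versions or servers may require different parameters.

```python
from urllib.parse import urlencode

BASE = ("http://geoserver-dev.d4science-ii.research-infrastructures.eu"
        "/geoserver/wcs/wcs")


def wcs_link(coverage, crs="EPSG:4326"):
    """Build a WCS 1.0.0 GetCoverage link in the shape of the example above."""
    params = {
        "service": "wcs",
        "version": "1.0.0",
        "request": "GetCoverage",
        "coverage": coverage,
        "CRS": crs,
        "RESPONSE_CRS": crs,
    }
    # safe=":" keeps the colons in "EPSG:4326" and "aquamaps:..." readable.
    return BASE + "?" + urlencode(params, safe=":")


print(wcs_link("aquamaps:WorldClimBio2"))
```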

Input from a NetCDF-GRID file

You can insert the OpenDAP link to a NetCDF file, provided that the file contains only one single-dimension layer.

Example:

http://thredds.research-infrastructures.eu/thredds/dodsC/public/netcdf/WOA2005TemperatureAnnual_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc

ASC ESRI-GRID files

You can insert a direct HTTP link to an ESRI-GRID file, including a file published from a local machine with a common file-sharing tool (e.g. Dropbox).

Examples:

http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/ph.asc

https://dl.dropboxusercontent.com/u/12809149/layer1.asc

GeoTiffs

HTTP links to GeoTiff files are allowed, including files published from a local machine with a common file-sharing tool (e.g. Dropbox).

Example:

https://dl.dropboxusercontent.com/u/12809149/wind1.tif
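
The kinds of entry described in this section can be told apart with a few simple rules, which is handy when preparing a long list of layers. The sketch below is only a rough guess at each entry's kind. The function name `layer_kind` and all of its heuristics (the `service` query parameter, the `/dodsC/` path segment used by the OpenDAP example, and file extensions) are assumptions; the service applies its own logic, which this page does not describe.

```python
import uuid
from urllib.parse import parse_qs, urlsplit


def layer_kind(entry):
    """Guess what kind of layer entry a string is, using simple heuristics."""
    entry = entry.strip()
    try:
        uuid.UUID(entry)
        return "GeoExplorer UUID"
    except ValueError:
        pass
    parts = urlsplit(entry)
    if parts.scheme not in ("http", "https"):
        return "layer title"
    query = {name.lower(): values[0].lower()
             for name, values in parse_qs(parts.query).items()}
    if query.get("service") in ("wfs", "wcs"):
        return query["service"].upper()
    path = parts.path.lower()
    if "/dodsc/" in path:
        return "NetCDF (OpenDAP)"
    if path.endswith(".asc"):
        return "ASC"
    if path.endswith((".tif", ".tiff")):
        return "GeoTiff"
    return "unknown link"


for example in (
    "cd048cb5-dbb6-414b-a3b9-1f3ac512fbff",
    "http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/ph.asc",
    "https://dl.dropboxusercontent.com/u/12809149/wind1.tif",
):
    print(example, "->", layer_kind(example))
```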

Contacts

For questions and bug reports, use the support form at https://support.d4science.research-infrastructures.eu/.