Difference between revisions of "MaxEnt"
(15 intermediate revisions by one other user not shown) | |||
Line 1: | Line 1: | ||
==Description== | ==Description== | ||
− | This page explains how to use the MaxEnt Algorithm on the Statistical Manager | + | This page explains how to use the MaxEnt Algorithm on the [https://services.d4science.org/group/biodiversitylab/processing-tools Statistical Manager] via the D4Science portal. |
− | The algorithm is hosted by the D4Science e-Infrastructure | + | The algorithm is hosted by the D4Science e-Infrastructure. It is a Maximum-Entropy model for species habitat modeling, based on the implementation by Schapire et al. v 3.3.3k at [http://www.cs.princeton.edu/~schapire/maxent/ Princeton University]. |
− | In this adaptation the software accepts a table following the [http://i-marine.eu/Content/eTraining.aspx?id=43714ba2-4cb5-4e97-b77f-b6288c9358c2 Species Product Discovery service model] of | + | In this adaptation, the software accepts a table following the [http://i-marine.eu/Content/eTraining.aspx?id=43714ba2-4cb5-4e97-b77f-b6288c9358c2 Species Product Discovery service model] of D4Science and a set of environmental layers in various formats (NetCDF, WFS, WCS, ASC, GeoTiff) via direct links or GeoExplorer UUIDs. |
The user can also set the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if this is higher than the native one. | The user can also set the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if this is higher than the native one. | ||
Line 18: | Line 18: | ||
Starting from this output, other processes of the Statistical Manager can be later applied to the raw values, for example to produce a GIS map (e.g. the "Statistical Manager Points to Map" process). | Starting from this output, other processes of the Statistical Manager can be later applied to the raw values, for example to produce a GIS map (e.g. the "Statistical Manager Points to Map" process). | ||
− | Eventually, results can be shared with other participants to the e-Infrastructure using the [http://i-marine.eu/Content/eTraining.aspx?id=07793722-b76a-4e92-b29a-3a05d3947ded&li=0 | + | Eventually, results can be shared with other participants to the e-Infrastructure using the [http://i-marine.eu/Content/eTraining.aspx?id=07793722-b76a-4e92-b29a-3a05d3947ded&li=0 D4Science workspace]. |
+ | |||
+ | ==Link to the process== | ||
+ | |||
+ | [https://services.d4science.org/group/biodiversitylab/processing-tools The MaxEnt process can be found on the Statistical Manager, on the D4Science web portal]. | ||
+ | |||
+ | You should access the "Execute an Experiment" section and open the "Bayesian Methods" category, or use the search box writing "Max Ent". | ||
==Demo Video== | ==Demo Video== | ||
− | + | [http://goo.gl/TYYnTO A demonstration video is available to demonstrate how to access and use the MaxEnt algorithm on the Statistical Manager] | |
==Inputs== | ==Inputs== | ||
− | [[Image:maxent.png| | + | [[Image:maxent.png|thumb|center|upright=6.5|An example of the MaxEnt process configuration]] |
+ | |||
+ | The inputs to the MaxEnt procedure are the following: | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="margin: 1em auto 1em auto;" | ||
+ | |- | ||
+ | ! Parameter Name | ||
+ | ! Description | ||
+ | ! Example | ||
+ | |- | ||
+ | | SpeciesName | ||
+ | | The name of the species to model and the occurrence records refer to. If the name is not important this can be a generic string. | ||
+ | | Latimeria chalumnae | ||
+ | |- | ||
+ | | MaxIterations | ||
+ | | The number of learning iterations of the MaxEnt algorithm | ||
+ | | 1000 | ||
+ | |- | ||
+ | | DefaultPrevalence | ||
+ | | A priori probability of presence at ordinary occurrence points. Ref. [http://onlinelibrary.wiley.com/doi/10.1111/j.1466-8238.2010.00581.x/abstract Santika 2010] | ||
+ | | 0.5 | ||
+ | |- | ||
+ | | OccurrencesTable | ||
+ | | A geospatial table containing occurrence records, following the template of the Species Products Discovery datasets. See section below for more details. | ||
+ | | LatimeriaPointsTable | ||
+ | |- | ||
+ | | LongitudeColumn | ||
+ | | The table column containing longitude values | ||
+ | | decimallongitude | ||
+ | |- | ||
+ | | LatitudeColumn | ||
+ | | The table column containing latitude values | ||
+ | | decimallatitude | ||
+ | |- | ||
+ | | Z | ||
+ | Value of Z. The default is 0, which means that the environmental layers are processed at surface level or at the first available Z value in the layer | ||
+ | | 0 | ||
+ | |- | ||
+ | | TimeIndex | ||
+ | | Time Index. The default is the first time indexed in the input environmental datasets | ||
+ | | 0 | ||
+ | |- | ||
+ | | XResolution | ||
+ | | Model projection resolution on the X axis in decimal degrees | ||
+ | | 1 | ||
+ | |- | ||
+ | | YResolution | ||
+ | | Model projection resolution on the Y axis in decimal degrees | ||
+ | | 1 | ||
+ | |- | ||
+ | |- | ||
+ | | Layers | ||
+ | | The list of environmental layers to use for enriching the points. Each entry is a layer Title or UUID or HTTP link. See section below for further details. | ||
+ | | https://dl.dropboxusercontent.com/u/12809149/wind1.tif | ||
+ | |- | ||
+ | |+ SPD Input Template. | ||
+ | |} | ||
==SPD Input Format== | ==SPD Input Format== | ||
− | The algorithm needs a table to be uploaded on the Statistical Manager. To use the upload facilities, refer to the [http:// | + | The algorithm needs a table to be uploaded on the Statistical Manager. To use the upload facilities, refer to the [http://wiki.gcube-system.org/gcube/index.php/Statistical_Manager_Tutorial Statistical Manager Tutorial] page. |
− | The uploaded table should follow the Species Product Discovery (SPD) template and can be generate by the [https:// | + | The uploaded table should follow the Species Product Discovery (SPD) template and can be generated by the [https://services.d4science.org/group/biodiversitylab/species-data-discovery SPD service]. |
+ | One example is [http://goo.gl/luE4qy here]. | ||
Line 118: | Line 182: | ||
==Feeding the algorithm with Input Maps== | ==Feeding the algorithm with Input Maps== | ||
− | In the layers box users can insert links to maps that will be used to associate environmental values to species occurrence records. | + | In the layers box users can insert links to maps that will be used to associate environmental values to species occurrence records. The model will project values only in locations where all the layers have defined values. |
− | The + button allows to insert a new layer. | + | The "+" button allows to insert a new input environmental layer. |
===Input Examples=== | ===Input Examples=== | ||
Line 126: | Line 190: | ||
[[Image:layersinput.png|frame|center|Example of environmental layers used as inputs]] | [[Image:layersinput.png|frame|center|Example of environmental layers used as inputs]] | ||
− | '''Input from | + | '''Input from D4Science GeoExplorer''' |
− | using the [https:// | + | using the [https://services.d4science.org/group/biodiversitylab/geo-visualisation D4Science GeoExplorer application]: |
* search for an environmental layer (e.g. temperature) | * search for an environmental layer (e.g. temperature) | ||
− | * click on one of the | + | * click on one of the found layers |
− | * in the "Summary Layer Info" panel on the right scroll down and select the Metadata UUID string (e.g. cd048cb5-dbb6-414b-a3b9-1f3ac512fbff) | + | * in the "Summary Layer Info" panel on the right side of the panel, scroll down and select the Metadata UUID string (e.g. cd048cb5-dbb6-414b-a3b9-1f3ac512fbff) |
* paste the UUID in the layers box in MaxEnt | * paste the UUID in the layers box in MaxEnt | ||
'''Input from a WFS link''' | '''Input from a WFS link''' | ||
− | MaxEnt can import WFS links residing either on a GeoServer or on a MapServer. The server | + | MaxEnt can import WFS links residing either on a GeoServer or on a MapServer. The server should be configured to produce maps details in json format. |
− | In this case you can insert the direct WFS link in the layers box, without specifying the bounding box. | + | In this case, you can insert the direct WFS link in the layers box, without specifying the bounding box. |
− | + | Example: | |
+ | |||
+ | http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/ows?service=wfs&version=1.0.0&request=GetFeature&srsName=urn:x-ogc:def:crs:EPSG:4326&TYPENAME=aquamaps:worldborders | ||
Please, use EPSG:4326 as projection. | Please, use EPSG:4326 as projection. | ||
Line 146: | Line 212: | ||
'''Input from a WCS link''' | '''Input from a WCS link''' | ||
− | You can | + | You can insert a direct WCS link. |
− | + | Example: | |
http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/wcs/wcs?service=wcs&version=1.0.0&request=GetCoverage&coverage=aquamaps:WorldClimBio2&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326 | http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/wcs/wcs?service=wcs&version=1.0.0&request=GetCoverage&coverage=aquamaps:WorldClimBio2&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326 | ||
Line 156: | Line 222: | ||
'''Input from a NetCDF-GRID file''' | '''Input from a NetCDF-GRID file''' | ||
− | You can | + | You can insert the OpenDAP link to a NetCDF file, only if this contains one single dimension layer. |
− | + | Example: | |
http://thredds.research-infrastructures.eu/thredds/dodsC/public/netcdf/WOA2005TemperatureAnnual_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc | http://thredds.research-infrastructures.eu/thredds/dodsC/public/netcdf/WOA2005TemperatureAnnual_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc | ||
Line 164: | Line 230: | ||
'''ASC ESRI-GRID files''' | '''ASC ESRI-GRID files''' | ||
− | You can | + | You can insert a direct http link to an ESRI-GRID file, even using local-machines common publishing tools (e.g. dropbox). |
− | + | Example: | |
http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/ph.asc | http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/ph.asc | ||
Line 174: | Line 240: | ||
'''GeoTiffs''' | '''GeoTiffs''' | ||
− | Http links to GeoTiff files are allowed, even using local common publishing tools (e.g. dropbox). | + | Http links to GeoTiff files are allowed, even using local-machines common publishing tools (e.g. dropbox). |
− | + | ||
+ | Example: | ||
https://dl.dropboxusercontent.com/u/12809149/wind1.tif | https://dl.dropboxusercontent.com/u/12809149/wind1.tif | ||
− | + | ==Contacts== | |
For questions and bug alerts use [https://support.d4science.research-infrastructures.eu/ this] form. | For questions and bug alerts use [https://support.d4science.research-infrastructures.eu/ this] form. |
Latest revision as of 17:25, 3 September 2015
Description
This page explains how to use the MaxEnt algorithm on the Statistical Manager via the D4Science portal. The algorithm is hosted by the D4Science e-Infrastructure. It is a Maximum-Entropy model for species habitat modeling, based on the implementation by Schapire et al. (v 3.3.3k) at Princeton University.
In this adaptation, the software accepts a table following the Species Product Discovery service model of D4Science and a set of environmental layers in various formats (NetCDF, WFS, WCS, ASC, GeoTiff) via direct links or GeoExplorer UUIDs.
The user can also set the bounding box and the spatial resolution (in decimal degrees) of the training and the projection. The application will adapt the layers to that resolution if this is higher than the native one.
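As a rough illustration of how the bounding box and the resolution interact, the sketch below computes how many grid cells a projection would cover. The function name `grid_shape` and the rounding rule are assumptions made for illustration; the page does not document how the service builds its grid internally.

```python
import math

def grid_shape(x_min, y_min, x_max, y_max, x_res, y_res):
    """Return (columns, rows) of a regular grid covering the bounding box.

    Coordinates and resolutions are in decimal degrees. Rounding up is an
    assumption: a partial cell at the border still counts as one cell.
    """
    if x_res <= 0 or y_res <= 0:
        raise ValueError("resolutions must be positive")
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("bounding box is empty")
    cols = math.ceil((x_max - x_min) / x_res)
    rows = math.ceil((y_max - y_min) / y_res)
    return cols, rows

# Whole globe at 1 degree and at 0.5 degrees.
print(grid_shape(-180, -90, 180, 90, 1.0, 1.0))  # (360, 180)
print(grid_shape(-180, -90, 180, 90, 0.5, 0.5))  # (720, 360)
```

Halving the cell size quadruples the number of cells, which is worth keeping in mind when choosing a resolution finer than the native one of the layers.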
The output is made up of the following components:
- a thumbnail map of the projected model,
- the ROC curve,
- the Omission/Commission chart,
- a table containing the raw assigned values,
- a threshold to transform the table into a 0-1 probability distribution,
- a report of the importance of the used layers in the model,
- ASCII representations of the input layers to check their alignment.
Starting from this output, other processes of the Statistical Manager can later be applied to the raw values, for example to produce a GIS map (e.g. the "Statistical Manager Points to Map" process). Finally, results can be shared with other participants in the e-Infrastructure using the D4Science workspace.
Link to the process
The MaxEnt process can be found on the Statistical Manager, on the D4Science web portal.
Access the "Execute an Experiment" section and open the "Bayesian Methods" category, or type "Max Ent" in the search box.
Demo Video
A demonstration video shows how to access and use the MaxEnt algorithm on the Statistical Manager: http://goo.gl/TYYnTO
Inputs
The inputs to the MaxEnt procedure are the following:
Parameter Name | Description | Example |
---|---|---|
SpeciesName | The name of the species to model, to which the occurrence records refer. If the name is not important, this can be a generic string. | Latimeria chalumnae |
MaxIterations | The number of learning iterations of the MaxEnt algorithm | 1000 |
DefaultPrevalence | A priori probability of presence at ordinary occurrence points. Ref. Santika 2010 | 0.5 |
OccurrencesTable | A geospatial table containing occurrence records, following the template of the Species Products Discovery datasets. See section below for more details. | LatimeriaPointsTable |
LongitudeColumn | The table column containing longitude values | decimallongitude |
LatitudeColumn | The table column containing latitude values | decimallatitude |
Z | Value of Z. The default is 0, which means that the environmental layers are processed at surface level or at the first available Z value in the layer | 0 |
TimeIndex | Time Index. The default is the first time indexed in the input environmental datasets | 0 |
XResolution | Model projection resolution on the X axis in decimal degrees | 1 |
YResolution | Model projection resolution on the Y axis in decimal degrees | 1 |
Layers | The list of environmental layers to use for enriching the points. Each entry is a layer Title or UUID or HTTP link. See section below for further details. | https://dl.dropboxusercontent.com/u/12809149/wind1.tif |
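The parameter set in the table can be written down as a plain mapping and sanity-checked before it is typed into the Statistical Manager form. This is only a sketch: the values are the examples from the table, while `check_params` and its rules (positive iterations and resolutions, prevalence strictly between 0 and 1, at least one layer) are assumptions and not checks documented for the service.

```python
# Example values copied from the parameter table above.
params = {
    "SpeciesName": "Latimeria chalumnae",
    "MaxIterations": 1000,
    "DefaultPrevalence": 0.5,
    "OccurrencesTable": "LatimeriaPointsTable",
    "LongitudeColumn": "decimallongitude",
    "LatitudeColumn": "decimallatitude",
    "Z": 0,
    "TimeIndex": 0,
    "XResolution": 1,
    "YResolution": 1,
    "Layers": ["https://dl.dropboxusercontent.com/u/12809149/wind1.tif"],
}

def check_params(p):
    """Return a list of problems found in a parameter set (empty if none).

    The rules are hypothetical common-sense checks, not the service's own.
    """
    problems = []
    if p["MaxIterations"] < 1:
        problems.append("MaxIterations must be at least 1")
    if not 0 < p["DefaultPrevalence"] < 1:
        problems.append("DefaultPrevalence is a probability, expected in (0, 1)")
    if p["XResolution"] <= 0 or p["YResolution"] <= 0:
        problems.append("resolutions must be positive decimal degrees")
    if not p["Layers"]:
        problems.append("at least one environmental layer is needed")
    return problems

print(check_params(params))  # []
```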
SPD Input Format
The algorithm needs a table to be uploaded to the Statistical Manager. To use the upload facilities, refer to the Statistical Manager Tutorial page. The uploaded table should follow the Species Product Discovery (SPD) template and can be generated by the SPD service. One example is here.
Field name | Format |
---|---|
institutioncode | string |
collectioncode | string |
catalognumber | string |
dataset | string |
dataprovider | string |
datasource | string |
scientificnameauthorship | string |
identifiedby | string |
credits | string |
recordedby | string |
eventdate | timestamp without time zone |
modified | timestamp without time zone |
scientificname | string |
kingdom | string |
family | string |
locality | string |
country | string |
citation | string |
decimallatitude | double precision |
decimallongitude | double precision |
coordinateuncertaintyinmeters | string |
maxdepth | double precision |
mindepth | double precision |
basisofrecord | string |
All fields may be empty except decimallatitude and decimallongitude. This makes it possible to apply MaxEnt to domains other than species distribution modelling. Note that the template closely follows the Darwin Core format.
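Since only the two coordinate fields are mandatory, a table can be pre-checked locally before upload. The sketch below assumes the table is available as CSV text with the SPD field names as header; the function name `usable_rows`, the coordinate range check, and the sample rows (whose coordinates are made up for illustration) are not part of the service.

```python
import csv
import io

def usable_rows(csv_text):
    """Return (longitude, latitude) pairs for rows with valid coordinates.

    Rows with missing, non-numeric, or out-of-range coordinates are skipped;
    every other SPD field is allowed to be empty.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row["decimallatitude"])
            lon = float(row["decimallongitude"])
        except (KeyError, TypeError, ValueError):
            continue
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            kept.append((lon, lat))
    return kept

# Illustrative sample: the second row lacks a longitude and is dropped,
# the third has an empty scientificname but is still usable.
sample = (
    "scientificname,decimallatitude,decimallongitude\n"
    "Latimeria chalumnae,-11.9,43.4\n"
    ",-12.1,\n"
    ",-25.0,32.9\n"
)
print(usable_rows(sample))  # [(43.4, -11.9), (32.9, -25.0)]
```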
Feeding the algorithm with Input Maps
In the layers box, users can insert links to maps that will be used to associate environmental values with species occurrence records. The model will project values only in locations where all the layers have defined values. The "+" button adds a new input environmental layer.
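The rule that values are projected only where every layer is defined amounts to intersecting the valid cells of all layers. A minimal sketch follows, assuming each layer has already been resampled to the same grid and that missing data is marked with `None`; both the representation and the name `defined_everywhere` are illustrative choices, not the service's internals.

```python
def defined_everywhere(layers):
    """Return a boolean grid: True where all layers have a value.

    layers: list of equally sized 2-D grids (lists of rows);
    None marks a cell with no data.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [all(layer[r][c] is not None for layer in layers) for c in range(cols)]
        for r in range(rows)
    ]

temperature = [[12.0, None], [15.5, 16.1]]
salinity = [[35.1, 34.8], [None, 35.0]]
print(defined_everywhere([temperature, salinity]))
# [[True, False], [False, True]]
```

In practice this means that a single layer with sparse coverage shrinks the projected area for the whole model.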
Input Examples
Input from D4Science GeoExplorer
Using the D4Science GeoExplorer application:
- search for an environmental layer (e.g. temperature)
- click on one of the layers found
- in the "Summary Layer Info" panel on the right, scroll down and select the Metadata UUID string (e.g. cd048cb5-dbb6-414b-a3b9-1f3ac512fbff)
- paste the UUID in the layers box in MaxEnt
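Each entry of the layers box is a layer title, a UUID, or an HTTP link. The sketch below shows one way to tell the three apart before pasting a list of entries; `classify_layer_entry` is a hypothetical helper and does not reflect how the service itself resolves entries.

```python
import uuid
from urllib.parse import urlparse

def classify_layer_entry(entry):
    """Classify a layers-box entry as 'link', 'uuid', or 'title'."""
    text = entry.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "link"
    try:
        uuid.UUID(text)  # raises ValueError for anything that is not a UUID
        return "uuid"
    except ValueError:
        return "title"

for entry in (
    "cd048cb5-dbb6-414b-a3b9-1f3ac512fbff",
    "https://dl.dropboxusercontent.com/u/12809149/wind1.tif",
    "temperature",
):
    print(entry, "->", classify_layer_entry(entry))
```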
Input from a WFS link
MaxEnt can import WFS links residing either on a GeoServer or on a MapServer. The server should be configured to produce map details in JSON format. In this case, you can insert the direct WFS link in the layers box, without specifying the bounding box.
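Example:
http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/ows?service=wfs&version=1.0.0&request=GetFeature&srsName=urn:x-ogc:def:crs:EPSG:4326&TYPENAME=aquamaps:worldborders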
Please use EPSG:4326 as the projection.
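A WFS link of this kind can be assembled from its query parameters rather than typed by hand. The sketch below reuses the parameter names and values of the GeoServer example link on this page; the helper `build_wfs_link` is hypothetical, and other servers may require different parameters.

```python
from urllib.parse import urlencode

def build_wfs_link(endpoint, type_name):
    """Build a WFS GetFeature link that requests EPSG:4326 coordinates."""
    query = {
        "service": "wfs",
        "version": "1.0.0",
        "request": "GetFeature",
        "srsName": "urn:x-ogc:def:crs:EPSG:4326",
        "TYPENAME": type_name,
    }
    # Keep ':' unescaped so the link reads like the example on this page.
    return endpoint + "?" + urlencode(query, safe=":")

link = build_wfs_link(
    "http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/ows",
    "aquamaps:worldborders",
)
print(link)
```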
Input from a WCS link
You can insert a direct WCS link.
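Example:
http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/wcs/wcs?service=wcs&version=1.0.0&request=GetCoverage&coverage=aquamaps:WorldClimBio2&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326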
Please use EPSG:4326 as the projection.
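Before pasting a WFS or WCS link, it can be checked that every projection parameter it carries refers to EPSG:4326. This is a sketch under the assumption that the projection appears in one of the `CRS`, `RESPONSE_CRS`, or `srsName` query parameters, as in the example links on this page; `uses_epsg_4326` is a hypothetical helper.

```python
from urllib.parse import urlparse, parse_qs

CRS_KEYS = ("crs", "response_crs", "srsname")

def uses_epsg_4326(link):
    """Return True if the link declares a projection and all are EPSG:4326."""
    query = parse_qs(urlparse(link).query)
    values = [
        value
        for key, vals in query.items()
        if key.lower() in CRS_KEYS
        for value in vals
    ]
    return bool(values) and all(v.upper().endswith("EPSG:4326") for v in values)

wcs_link = (
    "http://geoserver-dev.d4science-ii.research-infrastructures.eu/geoserver/wcs/wcs"
    "?service=wcs&version=1.0.0&request=GetCoverage"
    "&coverage=aquamaps:WorldClimBio2&CRS=EPSG:4326&RESPONSE_CRS=EPSG:4326"
)
print(uses_epsg_4326(wcs_link))  # True
```

A link without any projection parameter returns False here, which flags it for a manual check rather than asserting that it is wrong.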
Input from a NetCDF-GRID file
You can insert the OpenDAP link to a NetCDF file, only if this contains one single dimension layer.
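Example:
http://thredds.research-infrastructures.eu/thredds/dodsC/public/netcdf/WOA2005TemperatureAnnual_CLIMATOLOGY_METEOROLOGY_ATMOSPHERE_.nc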
ASC ESRI-GRID files
You can insert a direct HTTP link to an ESRI-GRID file, including files published from a local machine through common file-sharing tools (e.g. Dropbox).
Example:
http://thredds.research-infrastructures.eu/thredds/fileServer/public/netcdf/ph.asc
https://dl.dropboxusercontent.com/u/12809149/layer1.asc
GeoTiffs
HTTP links to GeoTiff files are allowed, including files published from a local machine through common file-sharing tools (e.g. Dropbox).
Example:
https://dl.dropboxusercontent.com/u/12809149/wind1.tif
Contacts
For questions and bug reports, use the support form at https://support.d4science.research-infrastructures.eu/.