Ecosystem Approach Community of Practice: OBIS


OBIS Current Situation

The Ocean Biogeographic Information System (OBIS, www.iobis.org) is a global information system for digital marine biodiversity data. It comprises data management facilities and infrastructure that provide open access to data for technical, educational, scientific and resource management purposes. By providing access to marine biogeographic data using a standard terminology, OBIS fills a critical marine data gap. An extensive description of OBIS can be found online.

OBIS is built on open-source software. The central data management tool is PostgreSQL, with PostGIS as its GIS extension. The web site and the web-based search interface are based on Apache, PHP, GeoServer and OpenLayers.

OBIS manages the upload of data from a variety of sources (list) and formats (csv, ...), and offers many database functions and procedures for sorting, filtering, merging, etc. It also provides advanced analytical support in the form of PostgreSQL queries and PL/pgSQL functions. The data are used to calculate several derived products. These are mainly maps of diversity indices such as species richness, and heat maps of sampling density and of the number of species recorded. Most of these calculations are done in psql; in some cases, R has been used.
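OBIS computes these products in psql, so the following only illustrates the logic and is not the OBIS code. It is a minimal Python sketch that assumes occurrence records arrive as hypothetical (species, latitude, longitude) tuples and are binned into 5-degree cells (the cell size is also an assumption). Sampling density is the number of records per cell. Species richness is the number of distinct species per cell.

```python
import math
from collections import defaultdict

def cell_id(lat, lon, size=5.0):
    """Return the (row, col) index of the size-degree cell containing a point."""
    # Clamp so that lat=90 or lon=180 fall into the last row/column
    # instead of creating an extra one.
    row = min(int(math.floor((lat + 90.0) / size)), int(180 / size) - 1)
    col = min(int(math.floor((lon + 180.0) / size)), int(360 / size) - 1)
    return row, col

def grid_products(records, size=5.0):
    """Compute per-cell record counts (sampling density) and species richness."""
    density = defaultdict(int)   # cell -> number of records
    species = defaultdict(set)   # cell -> distinct species names
    for name, lat, lon in records:
        c = cell_id(lat, lon, size)
        density[c] += 1
        species[c].add(name)
    richness = {c: len(names) for c, names in species.items()}
    return dict(density), richness
```

The same grouping maps directly onto a SQL `GROUP BY` over a cell expression, with `count(*)` for density and `count(DISTINCT species)` for richness.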

The resulting OBIS RDBMS feeds data to the OBIS website and to upstream aggregators such as GBIF and EOL. It can also:

  • generate geospatially explicit (georeferenced) data;
  • display data in an integrated MapViewer;
  • export data over R-ODBC to a stand-alone R environment;
  • calculate 'ranges' over a number of physical oceanography parameters and bathymetry, which are shared with the Encyclopedia of Life (EOL, http://www.eol.org);
  • etc.
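The page does not say which statistics make up the 'ranges' shared with EOL. The sketch below assumes each occurrence has already been matched to a value of one parameter (for example depth or temperature). Under that assumption, a per-species minimum, median and maximum could be computed as follows. The function name and output layout are hypothetical.

```python
import statistics
from collections import defaultdict

def parameter_ranges(observations):
    """Summarize, per species, the range of one environmental parameter.

    observations: iterable of (species_name, value) pairs, where value is
    e.g. the depth at the occurrence, or None if no value could be matched.
    """
    values = defaultdict(list)
    for name, value in observations:
        if value is not None:  # skip records without a matched value
            values[name].append(value)
    return {
        name: {"min": min(v), "median": statistics.median(v),
               "max": max(v), "n": len(v)}
        for name, v in values.items()
    }
```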

OBIS IT

The OBIS infrastructure is distributed, with several servers hosted by partners in the OBIS network. An 'OBIS instance' consists of two servers, a database server and a web and application server, both running Ubuntu. More details on the set-up can be found in a recent paper, "Fujioka, E., Vanden Berghe, E., Donnelly, B., Castillo, J., Cleary, J., Holmes, C., McKnight, S., et al. (Accepted). Advancing global marine biogeography research with opensource GIS software and cloud computing. Transactions in GIS." Please contact Edward if you're interested in a copy of the pre-print. As soon as the paper is available online, this page will link to it directly.

Currently there are three such pairs: the development, staging and production installations. The staging and production pairs are both hosted by the Flanders Marine Institute. The development web server is at Duke University; the data assembly ('data development') machine is managed from Rutgers and runs on the Amazon Cloud. One issue to investigate is whether the data development machine can be operated on the D4Science infrastructure. The Amazon solution is very satisfactory (stable, fast, flexible...) but not without cost.

Several parts of the OBIS data streams are clear candidates for work:
  • quality control needs to be automated;
  • data ingestion and integration have to be improved, including the detection of duplicates;
  • upstream provision of OBIS data, mainly to GBIF and EOL, should be improved;
  • marine data now available in GBIF but not in OBIS should be incorporated into OBIS.

All these activities will lead to a bigger and better OBIS database, better able to serve the Community of Practice.
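This page does not describe how OBIS detects duplicates. One simple approach, sketched here under stated assumptions, compares records on a normalized key. The key is built from the Darwin Core terms scientificName, decimalLatitude, decimalLongitude and eventDate, with coordinates rounded to three decimals. The `id` field, the function names and the rounding precision are hypothetical choices, and a production check would need fuzzier matching.

```python
from collections import defaultdict

def dedup_key(record, precision=3):
    """Build a comparison key from name, rounded position and event date.

    Rounding tolerates small coordinate differences introduced when the
    same record passes through different formats or aggregators.
    """
    name = " ".join(record["scientificName"].lower().split())
    lat = round(record["decimalLatitude"], precision)
    lon = round(record["decimalLongitude"], precision)
    return (name, lat, lon, record["eventDate"])

def find_duplicates(records, precision=3):
    """Return groups of record ids that share the same comparison key."""
    groups = defaultdict(list)
    for rec in records:
        groups[dedup_key(rec, precision)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```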

Another line of work will be environmental modelling. For this, we need access to physical oceanography data and other data types (bathymetry, distance from ice...). We also need the algorithms to be operational: AquaMaps and the algorithms implemented in OpenModeller. The first results of this line of work are still relatively far away. As an intermediate result, we can automate and streamline the data delivery to EOL.
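AquaMaps builds species distributions from environmental envelopes. The sketch below is a simplified illustration in that spirit and is not the AquaMaps implementation. Each parameter gets a trapezoidal envelope defined by an absolute minimum, a preferred minimum, a preferred maximum and an absolute maximum. The per-parameter suitabilities of a cell are then multiplied. Parameter names and numbers are made up for the example.

```python
from math import prod

def trapezoid(value, vmin, pmin, pmax, vmax):
    """Suitability in [0, 1] for one parameter under a trapezoidal envelope.

    1 inside the preferred range [pmin, pmax], falling linearly to 0 at the
    absolute limits vmin and vmax, and 0 outside them.
    """
    if value < vmin or value > vmax:
        return 0.0
    if value < pmin:          # here pmin > vmin, so no division by zero
        return (value - vmin) / (pmin - vmin)
    if value > pmax:          # here vmax > pmax, so no division by zero
        return (vmax - value) / (vmax - pmax)
    return 1.0

def suitability(cell_env, envelopes):
    """Combine per-parameter suitabilities for one cell by multiplication."""
    return prod(trapezoid(cell_env[p], *envelopes[p]) for p in envelopes)
```

Multiplying the factors means that a single parameter outside its absolute limits makes the cell unsuitable, whatever the other parameters are.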

OBIS VRE

Many unfinished sections

  • OBIS Postgres database management
  • OBIS Pgadmin
  • OBIS R-ODBC
  • OBIS Users Management
  • OBIS Data Visualization (Table / Chart / Map)
  • OBIS Data Export
  • …


IRD Data Access

IPT

IRD has discussed the services it expects to contribute to OBIS. Last year, IRD set up the GBIF Integrated Publishing Toolkit (IPT, [1]) on a server at IRD to provide access to part of IRD's data:

  • metadata in the EML format;
  • data in the Darwin Core format.

http://vmirdgbif-proto.mpl.ird.fr:8080/ipt/

In D4Science, a workflow has to be defined that allows these data to be reused to populate the OBIS database. The data can be obtained from GBIF or by connecting the OBIS system to IRD's IPT instance. The obvious problem is avoiding duplication of data that reach the OBIS system through different data flows.


To avoid 'crop circles' (the same data circulating back and forth between aggregators), the metadata should carry a flag stating whether a dataset may flow upstream. Data that have already been submitted to GBIF should also be recognizable. Then OBIS in D4Science does not pass them on to GBIF again, and does not need to harvest them from GBIF if it already harvests them directly from IRD.
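This is a minimal sketch of how such flags could drive the two routing decisions. The three boolean fields and both function names are hypothetical. The page only states that two things are needed: a flag for upstream flow, and a way to recognize data already in GBIF.

```python
from dataclasses import dataclass

@dataclass
class DatasetFlags:
    # Hypothetical metadata fields, not part of any existing schema.
    may_flow_upstream: bool   # may OBIS pass this dataset on to GBIF?
    already_in_gbif: bool     # has the provider already published it to GBIF?
    harvested_directly: bool  # does OBIS harvest it straight from the provider's IPT?

def push_to_gbif(flags):
    """Forward a dataset to GBIF only if allowed and not already there."""
    return flags.may_flow_upstream and not flags.already_in_gbif

def harvest_from_gbif(flags):
    """Harvest from GBIF only if the dataset is there and not harvested directly."""
    return flags.already_in_gbif and not flags.harvested_directly
```

For an IRD dataset that is published to GBIF and also harvested from IRD's IPT, both functions return False, which closes the loop in both directions.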

IRD data accuracy

The IPT of IRD can expose much more data than IRD currently shares with GBIF. IRD does not yet share these data because of location accuracy issues. OBIS / D4Science has addressed this issue in the past (e.g. for trawl nets), and IRD seeks a solution through the iMarine collaboration.

IRD collects data from landings of purse seiners, and these do not always reflect exactly where the fish were caught (there may have been various fishing operations over months and huge areas). Instead of a set of accurate points, IRD has many possible points, resulting in a polygon that can be small or large. IRD has developed a range of different algorithms to transform polygons into points that could be used in OBIS.
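IRD's own polygon-to-point algorithms are not described on this page. The sketch below shows one possible approach under stated assumptions. It takes the planar (shoelace) centroid of the polygon as the point. It takes the great-circle distance from the centroid to the farthest vertex as an uncertainty radius, which could populate the Darwin Core term coordinateUncertaintyInMeters. It assumes vertices are (longitude, latitude) pairs of a simple polygon that is modest in size and does not cross the antimeridian. For a strongly concave polygon the centroid may even fall outside it. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

def polygon_centroid(vertices):
    """Planar (shoelace) centroid of a simple polygon given as (lon, lat) pairs."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; the sign cancels in the division below
    if a == 0:
        raise ValueError("degenerate polygon")
    return cx / (6 * a), cy / (6 * a)

def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def polygon_to_point(vertices):
    """Return (lon, lat, uncertainty_m): centroid and distance to farthest vertex."""
    lon, lat = polygon_centroid(vertices)
    radius = max(haversine_m(lon, lat, vx, vy) for vx, vy in vertices)
    return lon, lat, radius
```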

By the end of this year, IRD can share more data located with polygons instead of points.

For OBIS, harvesting directly is important, as it makes it possible to be more specific for marine data. For instance, transects can be ingested as start and end points rather than as a single point. OBIS can provide details on the differences between Darwin Core and the OBIS Schema. Other extension requirements concern polygons and sets of points.
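The sketch below shows what keeping a transect as start and end points could look like. Darwin Core's footprintWKT term can hold a geometry such as a WKT LINESTRING. A single representative point (here simply the arithmetic midpoint) can still fill decimalLatitude/decimalLongitude for consumers that expect one. Whether the OBIS Schema would use footprintWKT is an assumption, and the class is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Transect:
    """A sampling transect, e.g. a trawl, kept as start and end points."""
    start_lon: float
    start_lat: float
    end_lon: float
    end_lat: float

    def footprint_wkt(self):
        # WKT lists coordinates as "x y", i.e. longitude before latitude.
        return (f"LINESTRING ({self.start_lon} {self.start_lat}, "
                f"{self.end_lon} {self.end_lat})")

    def midpoint(self):
        # Arithmetic midpoint in degrees; adequate only for short transects
        # that do not cross the antimeridian.
        return ((self.start_lon + self.end_lon) / 2,
                (self.start_lat + self.end_lat) / 2)
```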


IRD biological parameters

For some of its datasets, IRD can also share biological parameters such as weight and length, as well as the sampling/survey method used to collect the data.

Lengths and weights would be an interesting extension, but we would have to extend the OBIS Schema to deal with them. I'm a bit reluctant to do this directly. I think we should first investigate rewriting the OBIS Schema as a formal extension to the new extensible Darwin Core. We could then handle length and weight as an extension of that extension, or as a direct extension of GBIF's Darwin Core (assuming we can have several extensions in parallel).
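For the 'direct extension' option, a Darwin Core Archive uses a star schema. There is one core file (e.g. occurrences) and any number of extension files whose rows point back to a core record by its identifier. This layout is what allows several extensions in parallel. The sketch below writes a core and a measurement extension using terms from GBIF's Measurement or Facts extension (measurementType, measurementValue, measurementUnit). The `coreid` column name, the field selection and the function name are assumptions. A real archive also needs a meta.xml descriptor, which this sketch omits.

```python
import csv
import io

def write_star_schema(occurrences, measurements):
    """Write an occurrence core and a measurement extension as two CSV texts.

    Each extension row points back to its core record through 'coreid',
    so further extensions could reference the same core in parallel.
    """
    core = io.StringIO()
    writer = csv.DictWriter(core, fieldnames=["occurrenceID", "scientificName",
                                              "decimalLatitude", "decimalLongitude"])
    writer.writeheader()
    writer.writerows(occurrences)

    ext = io.StringIO()
    writer = csv.DictWriter(ext, fieldnames=["coreid", "measurementType",
                                             "measurementValue", "measurementUnit"])
    writer.writeheader()
    writer.writerows(measurements)
    return core.getvalue(), ext.getvalue()
```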


IRD data coverage

Spatial

(Maps: IRD data coverage; OBIS data coverage)

EurOBIS is not in charge of the Indian Ocean data.

Temporal

The IRD datasets start in (list sets). They are updated continuously / monthly / annually.


OBIS Website

OBIS VRE Profile

Product

Describe the proposed solution in maximum 3 sentences:

With ICIS capture time-series can be

Priority to CoP

List proposed solution priority:

  • Identified community: Users now:
  • Potential for co-funding:
  • Structural allocation of resources:
  • Referred in DoW:
  • Business Cases:
  • How does the proposed action generally support sustainability aspects:
  • How consistent is it with EC regulations/strategies (e.g. INSPIRE, ...):
  • Re-usability – benefits – compatibility

Parentage

Relation to CoP software

Relation to D4S technologies

Does the proposed solution solve other problems associated with EA-CoP business cases?

If the proposed solution can be used in another software scenario (not users!), please describe it.

Public

How big is the expected user community after delivery?

Productivity

Are the proposed measures effective?

Does it reduce a known workload?

Price

Is the proposed solution cheap?

Expected effort in PM:

Presentation

How is the component delivered to users? (Design / online help / training material / support). The OBIS VRE will be a VRE that builds a data-loading and validation interface around an existing PostgreSQL database.

The VRE will not replace all existing data structures and services. pgAdmin access is still expected, even with very restricted grants and rights on the database instance.

The VRE will offer:

  • Data discovery through DiGIR, IPT, and HIT
  • Data Loading to the DB
  • Data visualization on a map
  • Interactive data management with a tabular and map interface.
  • ...
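The page does not list the checks the data-loading interface will apply. The sketch below assumes records are dictionaries with numeric coordinates stored under Darwin Core term names. Under that assumption, a validation step could flag three kinds of problem before loading: missing required fields, out-of-range coordinates, and the common 0,0 placeholder position. The rule set and function names are hypothetical.

```python
REQUIRED = ("scientificName", "decimalLatitude", "decimalLongitude")

def validate(record):
    """Return a list of problems found in one record (empty if it passes)."""
    problems = [f"missing {f}" for f in REQUIRED if record.get(f) in (None, "")]
    lat, lon = record.get("decimalLatitude"), record.get("decimalLongitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    if lat == 0 and lon == 0:
        problems.append("suspicious 0,0 position")  # common placeholder value
    return problems

def split_batch(records):
    """Partition a batch into records to load and (record, problems) to review."""
    ok, rejected = [], []
    for rec in records:
        issues = validate(rec)
        if issues:
            rejected.append((rec, issues))
        else:
            ok.append(rec)
    return ok, rejected
```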


Privacy

Are they safe?

The PostgreSQL database stores data ...

Access is only possible through ....

Does the proposed solution need to manage confidential information at the data, dataset or organizational level? None of the data are confidential.

Describe security and privacy issues:


Policy

Are there any policies available that describe data access and sharing?

Yes.

Are these really needed?

Yes. OBIS combines data from many different sources, and it is important to keep track of ownership and to provide proper attribution in all products. Without a well-defined framework for data access and sharing, it is also very difficult to identify replicates. For example, it is hard to tell whether a dataset has already been uploaded to GBIF, or whether it has already been reviewed.

Copyright / attribution / metadata / legal

The attribution records are most important.

Perils

Do they introduce moral hazard? (A hazard here is the risk that users will behave more recklessly if they are insulated from the effects of the software, or if they do not understand what it produces, where the data come from, what they represent, etc.)

The OBIS VRE carries no risks to users and/or developers.