Ecosystem Approach Community of Practice: Codelistmanager

From D4Science Wiki
Revision as of 16:36, 15 March 2013 by Anton.ellenbroek (Talk | contribs) (CodelistManager Profile)


CodelistManager Profile

The CodelistManager VRE (Virtual Research Environment) aims to provide the management of codes, their descriptions, and their temporal and spatial coverage.

The base requirements have already been collected and were implemented in the ICIS VRE in the D4Science-II project.

The code lists are currently used in ICIS Curation and are integrated in the gCube system. They form the starting point for the curation of TS-Objects: they are used to select the dimension of a column and to curate the values in the TS-Object.

However, many new opportunities arose, and within the limited effort available in D4Science-II not all requirements were implemented. In addition, the opportunities go much beyond the limited requirements in ICIS, and a whole new set of functionalities can be perceived that are best bundled in a separate VRE.

Problem

Describe the CoP (Community of Practice) issue to be addressed by the component (VRE / service / resource / etc.)

The reliability of any set of data relies on the precision and accuracy of the reference data that describe the observational values. A tool that manages the codes and their scope in its own context is missing.

Product

Describe the proposed solution in maximum 3 sentences:

With CodelistManager, reference data can be loaded and described as dynamic sets. These sets contain descriptions of the codes, but also the validity of the codes over time, space and owner. The CodelistManager offers a facility to generate codelists for use in curation by other VREs, enriched with the scope of their validity and accessibility, and also to publish them for consumption by external applications in a variety of formats, such as RDF and SDMX.
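The idea of a code that carries its own validity over time, space and owner can be illustrated with a minimal sketch. It is not the CodelistManager data model: the class `CodeEntry`, its field names, the function `generate_codelist` and the sample values are all hypothetical, chosen only to show how a scoped codelist could be derived from a dynamic set.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CodeEntry:
    """One code plus the scope in which it is valid (hypothetical fields)."""
    code: str
    description: str
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    area: str                 # spatial coverage, here a plain label
    owner: str                # organisation that maintains the code

    def is_valid(self, on: date, area: str, owner: str) -> bool:
        """True if the code applies on that day, in that area, for that owner."""
        in_time = self.valid_from <= on and (
            self.valid_to is None or on <= self.valid_to
        )
        return in_time and self.area == area and self.owner == owner


def generate_codelist(entries, on, area, owner):
    """Return the codes valid in the requested scope, sorted for stable output."""
    return sorted(e.code for e in entries if e.is_valid(on, area, owner))


# Made-up sample data, for illustration only.
entries = [
    CodeEntry("A1", "Retired code", date(2000, 1, 1), date(2009, 12, 31), "north", "org-x"),
    CodeEntry("A2", "Current code", date(2010, 1, 1), None, "north", "org-x"),
    CodeEntry("B1", "Other area", date(2000, 1, 1), None, "south", "org-x"),
]

print(generate_codelist(entries, date(2012, 6, 1), "north", "org-x"))  # ['A2']
```

A real implementation would need richer spatial and ownership models than string labels; the sketch only shows that validity can be evaluated per code rather than per list.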


Priority to CoP

List proposed solution priority following the iMarine Board priority setting criteria:

  • Identified community: Users now: Nearly all conceived VREs are in need of high quality and dynamic codelists. Also beyond the gCube environment, access to reliable codelists (i.e. with quality indicators on completeness, validity, and precision) will be valuable, especially if these can be
  • Potential for co-funding: Good. However, community requirements outside the project are not well understood. How dynamic codelists must be, how to generate partial codelists, how codelists are discovered, which distribution formats are needed, and what statistical and computational precision is required all have to be understood. Examples: A codelist of weight classes; what does a reported capture of .5 mean? Can it be .5087, can it be .64? A codelist for periods; what is 2009? The calendar year? The fiscal year?
  • Structural allocation of resources: To be discussed in a. SB, b. iMarine Board
  • Referred in DoW: T3.3, WP9, T9.3, and in the Methodology.
  • Business Cases: Supports BC 1, 2,.
  • How does the proposed action generally support sustainability aspects: Codelists are the basis of any harmonization effort. Without properly understood codelists, no data can be realistically used across any system. Without codelists, there can be no sustainability.
  • How consistent it is with EC regulations/strategies (eg INSPIRE, ... ):

Very much, as most strategies aim to bring data under commonly shared and maintained reference schemes.

  • Re-usability – benefits – compatibility

Very High. The benefit is that with gCube components the codelists can be used in workflow supported data-upgrade services, transforming unstructured datasets to structured quality datasets.
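The weight-class question raised under co-funding (can a reported capture of .5 be .5087, can it be .64?) only has an answer once a precision convention is attached to the code. The sketch below assumes one possible convention, that a reported figure was rounded half-up to the number of decimals shown. The source does not state this convention, and the function names are hypothetical.

```python
from decimal import Decimal


def rounding_interval(reported: str):
    """Return (low, high) such that low <= v < high rounds half-up to `reported`.

    Assumption: the figure was rounded to the number of decimals shown.
    """
    value = Decimal(reported)
    # One unit in the last shown digit, e.g. 0.1 for ".5".
    step = Decimal((0, (1,), value.as_tuple().exponent))
    half = step / 2
    return value - half, value + half


def could_be(reported: str, candidate: str) -> bool:
    """True if `candidate` is consistent with `reported` under the assumption."""
    low, high = rounding_interval(reported)
    return low <= Decimal(candidate) < high


print(rounding_interval(".5"))   # (Decimal('0.45'), Decimal('0.55'))
print(could_be(".5", ".5087"))   # True
print(could_be(".5", ".64"))     # False
```

Under a different convention (for example truncation, or class boundaries defined by the codelist itself) the answers change, which is why the convention would have to be recorded with the codelist.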

Parentage

Relation to CoP Software

Codelist management is critical to all applications in iMarine where data have to cross a system or domain boundary.

Relation to D4S technologies For WP6

Does the proposed solution solve other problems associated with EA-CoP Business Cases?

For further iMarine Board evaluation.

If the proposed solution can be used in another SW scenario (not users!) please describe.

For WP6

Public

How big is the expected user community after delivery?

All VREs and all tabular data managing services will benefit from the data exposed through the CodelistManager. Outside the iMarine ecosystem, reliable codelists exposed as e.g. RDF will be marketable.

Productivity

Are the proposed measures effective?

Very much. The service will boost the quality of not only the codelists themselves, but also of all datasets that use the service.

Does it reduce a known workload?

Yes. A few experts can generate, maintain and modify codelists, whereas in the current situation all efforts are repeated, without quality indicators.

Price

Is the proposed solution cheap?

No, it should only be pursued if the requirements are understood by WP6 and the iMarine Board.

Expected effort in PM: 8PM at least.

Presentation

How is the component delivered to users? (Design / on-line help / training material / support).

CodelistManager is conceived to be delivered through a VRE that starts from an available codelist; this can be a CSV file, a codelist from a registry, or another dynamic system. If the system is a dynamic, online repository, a synchronization feature must be considered.

It also requires a user interface to define a new codelist, import an existing one, manage versioning and synchronization, define subsets, and manage the validity over owner, space and time.

For the quality indicators, access to an additional layer is required to define e.g. the URI where the codes came from, the quality of the source codelist, the quality of the items in the list etc.

The codelists have to be made available in other VREs for validation, but also to external users as SDMX codelists, RDF, and / or JSON. Access to these codelists is subject to policies that relate to the validity of the codelist, and the access rights of the external user or application.
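The path from an imported CSV codelist to a policy-filtered JSON export can be sketched as follows. This is an illustration under stated assumptions, not the CodelistManager interface: the column names (`code`, `description`, `access`), the access levels and both function names are hypothetical, and the SDMX and RDF outputs mentioned above are not covered.

```python
import csv
import io
import json

# Made-up sample input; a real codelist would come from a file or registry.
CSV_TEXT = """code,description,access
A1,Example code one,public
A2,Example code two,restricted
"""


def load_codelist(csv_text):
    """Parse a CSV codelist into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def export_json(rows, allowed_access):
    """Serialise only the rows whose access level the caller may see."""
    visible = [row for row in rows if row["access"] in allowed_access]
    return json.dumps(visible, indent=2)


rows = load_codelist(CSV_TEXT)
# An external application that is only allowed public codes.
print(export_json(rows, {"public"}))
```

Filtering at export time keeps a single stored codelist while letting each consumer receive only the part its access rights permit.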

The codelists will have different access policies, translations and units, depending on the user preferences.

Privacy

Are they safe?

There are no privacy issues of legal or physical persons involved.

Does the proposed solution need to manage confidential information at data / dataset / organizational level?

Yes, most codelists are not entirely open.

A codelist must be explicitly published before it can be used by others. If a VRE User has created a codelist, the VRE Manager is responsible for publishing it, e.g. as a D4S codelist, in an SDMX repository, or otherwise.

Every codelist produced can only be shared by the VRE User that generated it in the VRE. Only the VRE Manager can publish the generated codelists to render them discoverable and visible to other VREs.
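The two rules above (sharing is reserved to the creating VRE User, publishing to the VRE Manager) can be expressed as simple checks. The sketch below is hypothetical throughout: the `Codelist` class, the role string `"vre-manager"` and the function names are assumptions for illustration and do not reflect gCube's actual authorization mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Codelist:
    """A codelist with its sharing and publication state (hypothetical)."""
    name: str
    creator: str
    shared_with: set = field(default_factory=set)
    published: bool = False


def share(codelist, actor, recipient):
    """Only the VRE User who generated the codelist may share it in the VRE."""
    if actor != codelist.creator:
        raise PermissionError("only the creator can share this codelist")
    codelist.shared_with.add(recipient)


def publish(codelist, actor_role):
    """Only a VRE Manager may publish, making the codelist visible to other VREs."""
    if actor_role != "vre-manager":
        raise PermissionError("only the VRE Manager can publish")
    codelist.published = True


demo = Codelist(name="demo", creator="alice")
share(demo, actor="alice", recipient="bob")
publish(demo, actor_role="vre-manager")
print(demo.shared_with, demo.published)  # {'bob'} True
```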

Describe security and privacy issues:

Not possible, here WP6 can contribute.

Policy

Are there any policies available that describe data access and sharing?

No. A beginning has been made in T3.2 by FAO; the draft policies are described here:

CLM policy

Are these really needed?

No. Beyond the already available iMarine deployment and operation policies, no code list specific policies are now needed. The expectation is that in order to adhere to iMarine data access and sharing policies a registration to iMarine VRE policies will suffice. With these policies the VRE can manage e.g.:

  • Authentication;
  • Users and role management;
  • Publishing and sharing rights.

Where the code list management tools are used beyond the boundaries of iMarine technologies, other policies may apply, but these are not relevant to this project.

Copyright / attribution / metadata / legal

The CodelistManager can serve as a test case for the management of attribute metadata in the public domain. E.g. an FAO capture dataset may be curated with Eurostat and UNESCO codelists. No policy exists on the correct copyright citation or legal implications of re-publishing data using external reference data.

Perils

Do they introduce moral hazard? (A hazard here is the risk that users will behave more recklessly if they are insulated from the effects of the software, or if they do not understand what it produces, where data come from, what they represent, etc.)

The use of codelists may lead users to believe that the data they describe follow the same quality rules as the lists themselves. Bad data remain bad data, even in a high quality system.