Talk:EA-CoP Data Access and Sharing Policies

From D4Science Wiki

The following is a summary of the major feedback collected during meetings, mail exchanges and live discussions.

Feedback on Policy documents presented at iMB3 (March 2013)

Business Metadata

From: Nicolas Bailly - 26 March 2013

BioFresh metadata: http://data.freshwaterbiodiversity.eu/aboutmdb.html (and explore the menu about Metadata/Datasets).

Feedback on Annexes


From mail exchange between Ward Appeltans and Anton Ellenbroek [AE] - 29 October 2012


I have some remarks/questions on the Annex 2 draft iMarine "terms of use".

1. Disclaimer: it is important to add that iMarine is not liable for the content or quality of the data, or for any misuse or misinterpretation of the data. {=> Added AG 9 March 2013}


2. Copyright: this conflicts with paragraphs 1 and 2. Are all dataset records licensed under the CC license?

  • [AE] No, we should adhere to the provider's policy. If that is not possible, the owner should not publish the data and should only use them in their private environment, which can be a VRE (Virtual Research Environment). We should revise this paragraph. {=> Modified AG 9 March 2013}


3. Online community:

  • iMarine Channel: "register for video repository, personal web space". Are we doing this?
    • [AE] Not sure what you mean by "doing" here. The project supports similar facilities, but not through the infrastructure.
    • {AG} So far this is what iMarine indicates on the "Join" page http://www.i-marine.eu/Pages/StaticProfile.aspx


  • iMarine Gateway: can you confirm whether this is only open to members of institutions that have an agreement with iMarine?
    • [AE] Like a bank safe … it depends on how you use it. Every user has to register in order to access a VRE (Virtual Research Environment). It is then up to the VRE owner to grant access.


4. VREs: "Only metadata formats considered as iMarine standards can be fully supported". Section 3.5 (Metadata, page 7) deals with metadata, but doesn't say anything about 'our' metadata standards.

  • [AE] This is written with a future situation in mind where we will have selected a set of formats.
  • {AG} You can also consult the Metadata standards page for accessing all relevant information and discussions on Business Metadata.


Annex 3, iMarine privacy policy

1. VRE (Virtual Research Environment) security issues

  • "as long as long" double long. {=> Fixed AG 9 March 2013}
  • changes to the policy: you refer to "this page" => add a URL. {AG} Actually "this page" is the page the user is reading, i.e. the Privacy Policy document itself. Anyway, I added the URL of the current iMarine privacy policy page. (9 March 2013)
  • And who can decide on changes to the Policy? [AE] The policy is under the remit of the Board, and that is where changes can be proposed.

Share alike databases - Machine processable terms of use

From: Alex Hardisty - 11 October 2012 12:22

...I suggest we have to distinguish between the following two:

1) "data-set" = A data product; a collection of data with associated metadata describing its identity, provenance, purpose and relationship with other artefacts within an infrastructure.

2) "raw data" = Unvalidated data drawn directly from an instrument (or other logical data source); generally not annotated or partitioned until processed within an infrastructure.

Raw data can be real-time, near real-time (nRT), or legacy.

Data-sets are never real-time or nRT. By definition, data-sets are raw data that has been processed (perhaps in real-time) but which is now held somewhere for subsequent processing. Data-sets can be current or legacy.

Both data-sets and raw data can be observational data, or derived data.

Data access and sharing policies may be different for raw data and data-sets. {AG 9 March 2013} This thread can be discussed further at Board Meeting 3. My opinion is that even if the distinction may be needed at the "Community best practices" level, the sharing procedures (within/among VREs or with the public) should be the same regardless of the type of data.
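Alex's distinctions can be read as a small data model with one hard constraint: a data-set is never real-time or near real-time. The Python sketch below encodes that constraint at construction time. The names `RawData`, `DataSet` and `archive` are hypothetical. The rule inside `archive` is also an assumption added for illustration: RT or nRT raw data becomes a "current" data-set, and legacy raw data stays legacy.

```python
from dataclasses import dataclass
from enum import Enum


class Timeliness(Enum):
    REAL_TIME = "real-time"
    NEAR_REAL_TIME = "near real-time"
    CURRENT = "current"
    LEGACY = "legacy"


class Origin(Enum):
    OBSERVATIONAL = "observational"
    DERIVED = "derived"


# Per the definitions above: raw data can be RT, nRT or legacy;
# data-sets can only be current or legacy.
RAW_TIMELINESS = {Timeliness.REAL_TIME, Timeliness.NEAR_REAL_TIME, Timeliness.LEGACY}
DATASET_TIMELINESS = {Timeliness.CURRENT, Timeliness.LEGACY}


@dataclass(frozen=True)
class RawData:
    source: str  # instrument or other logical data source
    timeliness: Timeliness
    origin: Origin

    def __post_init__(self) -> None:
        if self.timeliness not in RAW_TIMELINESS:
            raise ValueError(f"raw data cannot be {self.timeliness.value}")


@dataclass(frozen=True)
class DataSet:
    identity: str
    provenance: str
    purpose: str
    timeliness: Timeliness
    origin: Origin

    def __post_init__(self) -> None:
        if self.timeliness not in DATASET_TIMELINESS:
            raise ValueError(f"a data-set cannot be {self.timeliness.value}")


def archive(raw: RawData, identity: str, purpose: str) -> DataSet:
    """Hypothetical step: hold processed raw data for subsequent processing.

    Assumption: RT/nRT input becomes a 'current' data-set; legacy stays legacy.
    """
    if raw.timeliness is Timeliness.LEGACY:
        timeliness = Timeliness.LEGACY
    else:
        timeliness = Timeliness.CURRENT
    return DataSet(identity, raw.source, purpose, timeliness, raw.origin)
```

Observational and derived data pass through unchanged, matching the statement that both kinds can be either raw data or data-sets.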


From: Los, Wouter - 10/10/2012 17:25

Dear all,

Most important for me was that we should avoid having separate disciplinary solutions. A generic approach (from the viewpoint of data infrastructures) should look for solutions that accommodate different kinds of incoming data. Thus, the categories of real-time, near real-time or historical/legacy (Neil: I like this) are only meant for optimizing services from data infrastructures. The use case of iMarine is interesting, as it probably includes all three categories.

Best wishes, Wouter


From: Neil Holdsworth - October 10, 2012 4:19 PM

Dear All,

I have seen a similar debate in other fora about making a categorisation, and I agree that it is useful to distinguish but problematic to define. In discussions with other data managers, it seems that the only definition that would stand up to scrutiny is to use the level of quality control on a dataset as the measure of whether it falls into real-time, near real-time or historical/legacy. This only works if you have a good handle on the quality control, usually through a flagging system.

Neil
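Neil's criterion can be made concrete as a rule that maps a quality-control level to a category. The sketch below assumes a hypothetical three-step flag scheme: no QC, automated checks, and full expert review. It also takes the least-controlled record as the level of the whole dataset. Neither the scheme nor the mapping comes from this discussion.

```python
from enum import IntEnum


class QCLevel(IntEnum):
    # Hypothetical flag scheme; real flagging systems are richer.
    NONE = 0       # no quality control applied yet
    AUTOMATED = 1  # automated checks applied and flags set
    EXPERT = 2     # full expert review completed


def dataset_qc(record_flags: list[QCLevel]) -> QCLevel:
    """A dataset is only as controlled as its least-controlled record."""
    if not record_flags:
        raise ValueError("cannot classify a dataset without QC flags")
    return min(record_flags)


def category(level: QCLevel) -> str:
    """Map a QC level onto the real-time / nRT / historical categories."""
    if level >= QCLevel.EXPERT:
        return "historical/legacy"
    if level == QCLevel.AUTOMATED:
        return "near real-time"
    return "real-time"
```

With this rule, `category(dataset_qc([QCLevel.EXPERT, QCLevel.AUTOMATED]))` gives "near real-time": a single record that has only passed automated checks keeps the whole dataset out of the legacy category. The empty-list error reflects Neil's caveat that the rule only works when every record carries a reliable flag.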


From: Uhlir, Paul - 10 October 2012 14:01


Dear Donatella and Wouter,

The iMarine project may well be a very good case study to use in the whole process, so thanks for suggesting that.

I am also supportive of the "legacy, observational and real-time data" categories that Wouter brought up. I have a slight definitional problem, however. I presume that by "legacy" you mean historical/archived (and only digital?). By "observational," do you mean "near-real time"? If not, what is the distinguishing characteristic? In any event, there are definitely different economic values, uses, and licensing considerations for these (and other) data categories that should be explored.

In the GEO summary white paper on data licensing I sent earlier, we only addressed the easy case of the most open category of data, the GEOSS Data-CORE. How more restricted categories of data can be navigated with the greatest amount of openness possible, however, is still not fully clear.

Cheers, Paul


From: Los, Wouter - October 08, 2012 6:01 PM


...My impression is that the process is getting direction and speed, which is good news.

Your suggestion below deserves support in my view. The subject is very relevant, especially regarding trust for data generators and data owners when they have to rely on data infrastructures. In this respect we should avoid purely disciplinary approaches, since these would put us back into compartments rather than generic data infrastructure approaches. In my opinion it would make sense to distinguish between legacy data, observational data and real-time data, each of which deserves attention for appropriate solutions.

Best wishes, Wouter


From: Donatella Castelli - Monday, October 08, 2012 6:05 PM

Dear all,

I think that this effort on data access and sharing policy is very much in line with the objectives of the RDA WG currently being set up, which I introduced at the iMarine Board meeting last week. I am cc-ing Enrique Alonso Garcia and Paul Uhlir, the co-chairs of the RDA Legal Interoperability WG proposal (attached is the latest version of the slides that they presented in Washington).

Paul, Enrique, in the context of the iMarine (http://www.i-marine.eu/) and agINFRA (http://aginfra.eu/) EU projects we are experiencing concrete needs for clearly defined data sharing policies. Both projects have activities dedicated to addressing this issue. We had a very interesting discussion on this point last week at the iMarine Board. Do you think that we could join forces with RDA? Our activity will be carried out anyway, since it is part of the project contracts signed with the EU. If it makes sense, how could we start collaborating?

The CReATIVE-B project - legal requirements for biodiversity-related infrastructures

From Nikos Manouselis - 08 October 2012

...we had the agINFRA plenary meeting...

One of the issues that we discussed was related to the way all the aggregated data and metadata sources should be licensed in order to offer them to our users.

During our discussion with the agINFRA Advisory Board (including experts from BioVEL and LifeWatch) we found out that a review of such legal requirements for biodiversity-related infrastructures will take place within the CReATIVE-B project (http://creative-b.eu/) and will cover several global infrastructures such as LifeWatch and GBIF.

We plan to set up a Working Group that will include people from all relevant initiatives; maybe this would be an opportunity to liaise with iMarine as well, so that it is also represented in this discussion. ...

Data stored or accessed through iMarine

From: Pasquale Pagano - 05 October 2012

Looking at the Document iMB2/2012/3, I noticed the following sentence in section 4. Data Access.

"By accessing the data stored in iMarine, users agree to the related Data Access Policy."

I think that we should avoid saying 'stored'. We could write instead:

"By accessing the data made available by iMarine, users agree to the related Data Access Policy." {=> Added AG 9 March 2013}

In fact, the same data access policy should apply to data either stored in iMarine or accessed through iMarine.

Licensing frameworks

From: Hervé Caumont - Terradue - 05 October 2012

... following today's review of the MoU topic, here are some pointers on the licensing frameworks that we discussed:

Open Database License (ODbL) is published by Open Data Commons / Open Knowledge Foundation. The OpenStreetMap (OSM) project completed the move from a Creative Commons license to ODbL in September 2012: http://en.wikipedia.org/wiki/Open_Database_License

Creative Commons Rights Expression Language (CC REL) http://wiki.creativecommons.org/CcREL http://code.creativecommons.org/doc/commoner/metadata.html
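To illustrate machine-processable terms of use, the sketch below emits a minimal CC REL description of one dataset in Turtle. It uses the `cc:license`, `cc:attributionName` and `cc:attributionURL` properties from the CC REL vocabulary, plus `dct:title`. The function name and the example.org URIs are hypothetical, and the sketch does not validate URIs. The license URI may point to any license, ODbL included.

```python
from typing import Optional

CC_NS = "http://creativecommons.org/ns#"
DCT_NS = "http://purl.org/dc/terms/"


def _literal(text: str) -> str:
    """Quote text as a Turtle string literal (escapes backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def cc_rel_turtle(work_uri: str, title: str, license_uri: str,
                  attribution_name: str,
                  attribution_url: Optional[str] = None) -> str:
    """Describe one dataset's terms of use as CC REL statements in Turtle."""
    statements = [
        f"dct:title {_literal(title)}",
        f"cc:license <{license_uri}>",
        f"cc:attributionName {_literal(attribution_name)}",
    ]
    if attribution_url is not None:
        statements.append(f"cc:attributionURL <{attribution_url}>")
    body = " ;\n    ".join(statements)
    return (
        f"@prefix cc: <{CC_NS}> .\n"
        f"@prefix dct: <{DCT_NS}> .\n\n"
        f"<{work_uri}>\n    {body} .\n"
    )


if __name__ == "__main__":
    print(cc_rel_turtle(
        "http://example.org/datasets/landings-2010",  # hypothetical dataset URI
        "Landings 2010",                               # hypothetical title
        "http://creativecommons.org/licenses/by/4.0/",
        "Example Data Provider",                       # hypothetical provider
    ))
```

Because the output is plain RDF, a harvester can read the license of each dataset without parsing human-readable terms.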

Data Provenance

From: Ellenbroek, Anton (FIPS) - 26 September 2012

This is OK. But what is our policy if we re-use a subset of information?

E.g. we have a global dataset (with global coverage) of all agricultural products from 2000 to 2010, and we filter it to make the Sabaudia dataset for finocchio in 2010.

Who adjusts the metadata of this dataset, and how?

And now I merge this with socio-economic data on farmers' income in Italy from ISTAT (2004-2011).

What publish data do I give my dataset?

And then I cross-analyze the data, and discover the finocchio production per ha in Sabaudia. Who will be the owner of that?

My policy would be something like:

"All datasets published in iMarine as publicly accessible information in repositories, on-line content, or downloadable datasets will be owned by the VRE owner. Its provenance will contain a reference to the iMarine project, enriched with the list of original references to the contributing datasets. iMarine shall not marshal, manage, or maintain any metadata that can be modified by users, or provide services that exploit these metadata to govern access and / or sharing policies"

So we concatenate all metadata references we find, without any management on top.
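Anton's proposal has three parts. The VRE owner owns the result, provenance is the iMarine reference plus the references of every contributing dataset, and nothing manages the metadata beyond that. The sketch below applies the proposal to his finocchio example. The `Dataset` type, the `derive` function and the reference strings are hypothetical. The one liberty taken is dropping exact duplicate references while keeping first-seen order, so that deriving from an already derived dataset does not repeat "iMarine".

```python
from dataclasses import dataclass

PROJECT_REF = "iMarine"


@dataclass(frozen=True)
class Dataset:
    ref: str                          # citation/reference for this dataset
    owner: str
    provenance: tuple[str, ...] = ()  # references of everything it was built from


def derive(ref: str, vre_owner: str, *sources: Dataset) -> Dataset:
    """Publish a derived dataset: the VRE owner owns it, provenance is concatenated."""
    refs = [PROJECT_REF]
    for src in sources:
        # Inherit the source's own provenance first, then the source itself.
        for r in (*src.provenance, src.ref):
            if r not in refs:  # drop exact duplicates, keep first-seen order
                refs.append(r)
    return Dataset(ref, vre_owner, tuple(refs))


if __name__ == "__main__":
    global_ag = Dataset("Global agriculture products 2000-2010", "Provider A")
    istat = Dataset("ISTAT farmers' income, Italy 2004-2011", "ISTAT")
    subset = derive("Finocchio, Sabaudia 2010", "VRE owner", global_ag)
    merged = derive("Finocchio production per ha, Sabaudia", "VRE owner", subset, istat)
    print(merged.owner, merged.provenance)
```

The merged dataset ends up owned by the VRE owner, with the provenance ("iMarine", the global dataset, the Sabaudia subset, the ISTAT data). Nothing in the record governs access, in line with the last sentence of the proposed policy.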


From: Gentile, Aureliano (FIPS) - Wednesday, September 26, 2012

...while updating the iMarine policy document I took inspiration from this message for better structuring the list of possible candidates for business metadata (applied at data set level):

  • Ownership and context

Owner, Context (including spatial scale, geographic coverage, topic coverage, and references in case of derivative works)

  • Authorship

Author, Title, Publisher, Publish date, Last update date, Expiry date

  • Copyright licenses

Rights management, Creative Commons License type (or other licenses …)

  • Content description

Data aggregation level, Spatial scale, Geographical coverage, Topic coverage, ...
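The four groups above map naturally onto a nested record. In the sketch below the field names are hypothetical, and the license is a free-text string such as a Creative Commons license type. The list names spatial scale and coverage under both "Ownership and context" and "Content description", so the sketch keeps them once, under content description. The two date checks are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class OwnershipAndContext:
    owner: str
    # References to source datasets when this is a derivative work
    derived_from: list[str] = field(default_factory=list)


@dataclass
class Authorship:
    author: str
    title: str
    publisher: str
    publish_date: date
    last_update_date: Optional[date] = None
    expiry_date: Optional[date] = None

    def __post_init__(self) -> None:
        # Illustrative sanity check on the date fields
        if self.last_update_date is not None and self.last_update_date < self.publish_date:
            raise ValueError("last update date precedes publish date")


@dataclass
class CopyrightLicense:
    license: str  # e.g. a Creative Commons license type, or another license
    rights_management: Optional[str] = None


@dataclass
class ContentDescription:
    data_aggregation_level: Optional[str] = None
    spatial_scale: Optional[str] = None
    geographic_coverage: Optional[str] = None
    topic_coverage: Optional[str] = None


@dataclass
class BusinessMetadata:
    """Candidate business metadata, applied at data set level."""
    context: OwnershipAndContext
    authorship: Authorship
    rights: CopyrightLicense
    content: ContentDescription = field(default_factory=ContentDescription)

    def is_expired(self, today: date) -> bool:
        expiry = self.authorship.expiry_date
        return expiry is not None and today > expiry
```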


From: Taconet, Marc (FIPS) - 21 September 2012

Astonishing! Look at the first paragraph of Tina's message ... It is the exact set of terms we discussed today regarding the Policy doc in iMarine. ...


From: Farmer, Tina (FIPS) - Fri Sep 21

I would like to let you know that since I sent you these texts, I’ve been trying to obtain more information regarding their provenance, authorship, ‘ownership’, context, the full list of topics referred to, etc. ...

FishFrame

From: Julien Barde - 04 March 2012

... actually FishFrame is a data format focused on sharing fisheries data, whether or not they are collected onboard; for example, it works fine with sampling data from landings as well. FishFrame is best suited to managing (and then sharing) fisheries data, as the model is clearly made for this kind of data. In particular, it covers the Data Collection Framework data that the European Commission wants to obtain for stock assessment. So I would say that FishFrame is the most relevant data format for sharing fisheries data accurately (describing the observation context: fishing gears, effort...) between partners involved in fisheries data collection. However, it is not an official European standard for now.

But if the goal is to share data with statisticians who do not need much description of fishing activities (just catches), then SDMX is probably better, as the data can be shared in a more generic manner. In the same way, Darwin Core would be better for sharing fisheries data with (marine) ecology scientists who just care about species, date, location, size... but Darwin Core does not make it easy to share information on fisheries sampling protocols, as we can with FishFrame. The main question is which community is going to use the data, and the goals of the data analysis. So far, our approach is to adapt the data export to the requirements of the community of users. SDMX, for example, enables the use of our data with FishStatJ, which is very interesting.

One of the questions we had with Pascal and Erik was whether a data conversion tool between FishFrame and SDMX would be worthwhile. But now, with Pascal and Erik's work, we can share our data in both SDMX and FishFrame. Another good point of FishFrame is that an R library (the COST library) is provided to analyze fisheries data in FishFrame format. ....
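Julien's approach of adapting the export to each user community can be sketched as one internal record with several projections. The `SamplingRecord` fields and the community names are hypothetical. `to_darwin_core` uses four Darwin Core terms (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`) and deliberately drops gear and effort. `catch_series` produces only a flat table of catches by species and year, not an actual SDMX message. The "fisheries" export is merely a stand-in that keeps every field and is not the FishFrame format.

```python
from collections import defaultdict
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class SamplingRecord:
    species: str         # scientific name
    day: date
    lat: float
    lon: float
    gear: str            # observation context that fisheries partners need
    effort_hours: float
    catch_kg: float


def to_darwin_core(rec: SamplingRecord) -> dict:
    """Ecology view: species, date, location; gear and effort are dropped."""
    return {
        "scientificName": rec.species,
        "eventDate": rec.day.isoformat(),
        "decimalLatitude": rec.lat,
        "decimalLongitude": rec.lon,
    }


def catch_series(records: list[SamplingRecord]) -> dict[tuple[str, int], float]:
    """Statistics view: total catch per (species, year), nothing else."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec.species, rec.day.year)] += rec.catch_kg
    return dict(totals)


EXPORTERS = {
    "ecology": lambda recs: [to_darwin_core(r) for r in recs],
    "statistics": catch_series,
    # Stand-in for a FishFrame-style exchange: full records, full context.
    "fisheries": lambda recs: [asdict(r) for r in recs],
}


def export_for(community: str, records: list[SamplingRecord]):
    """Pick the projection that matches the target community."""
    try:
        exporter = EXPORTERS[community]
    except KeyError:
        raise ValueError(f"no export defined for community {community!r}") from None
    return exporter(records)
```

What each projection loses is visible in the code: the ecology and statistics views cannot carry the sampling protocol, which is the trade-off described in the message above.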

FLOD positioning - Authoritative codelists and their mapping

From: Calderini, Francesco (CIOK) - 18 January 2012 13:35

Def: Clearly accurate or knowledgeable. Example of AUTHORITATIVE: 'The book is an authoritative guide to the city's restaurants'

Authoritative is always with respect to a context (the city's restaurants); there is no 'authoritative' attribution valid everywhere. Hence, when you talk about codelist management, there is no 'authoritative codelist' (that is the assumption), but rather a set of codelists with different degrees of quality, each authoritative in the domain (statistical dataset) in which it is used.
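Francesco's point that a codelist is authoritative only within a context suggests recording authority per context rather than per list. The sketch below does that and replays the endangered-species question from Anton's list of difficulties. The `CodelistRegistry` class, the context strings, the species name and its statuses are all hypothetical. "IUCN" and "XYZ" serve only as codelist labels, and no real statuses are implied.

```python
from typing import Optional


class CodelistRegistry:
    """Authority is declared per context, never globally."""

    def __init__(self) -> None:
        self._authority: dict[str, str] = {}            # context -> codelist id
        self._entries: dict[tuple[str, str], str] = {}  # (codelist id, code) -> value

    def add(self, codelist: str, code: str, value: str) -> None:
        self._entries[(codelist, code)] = value

    def declare_authority(self, context: str, codelist: str) -> None:
        self._authority[context] = codelist

    def resolve(self, context: str, code: str) -> Optional[str]:
        """Look a code up in whichever list is authoritative for this context."""
        if context not in self._authority:
            raise KeyError(f"no authoritative codelist declared for {context!r}")
        return self._entries.get((self._authority[context], code))


if __name__ == "__main__":
    reg = CodelistRegistry()
    # Hypothetical species and statuses, for illustration only.
    reg.add("IUCN", "Species A", "endangered")
    reg.add("XYZ", "Species A", "not listed")
    reg.declare_authority("conservation status, global", "IUCN")
    reg.declare_authority("conservation status, org XYZ", "XYZ")
    print(reg.resolve("conservation status, org XYZ", "Species A"))
```

In this model, a query on endangered species made by org XYZ is answered from XYZ's list, while the same code resolves differently in the global context. The open question then becomes which context a given query should carry.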


From: Taconet, Marc (FIPS) - 18 January 2012

Colleagues,

Some meat for brainstorming, to feed the Data policy aspects of codelist management together with the positioning of FLOD.

Anton would like us to better define "Authoritative codelists and their mapping". Below are the issues he raised:

<< Some examples of difficulties: AFSIS may be declared authoritative for a purpose (capture recording), but it contains items (small tunas) which are useless to taxonomists, and it is incomplete for e.g. decapods. Two recognized species lists may have different names for the same species. Or it may be a subspecies, or belong in a different family. Who is authoritative? A gear may be defined with a name, but the mesh size or beam width may vary over time. So what is the authoritative definition? A species may be called 'local' in France if it never breeds, but 'migrational' in the UK. So what is the authoritative status? A species may be endangered according to IUCN, but not in the classification of org XYZ. What should be the result of a query on endangered species by org XYZ?

Same for mapping: what is the exchange rate from $ to €? What is the French translation for vessel? Navire or vaisseau?

For lists like ports, gears, fuel, I believe we can achieve authoritative lists, but we should ask an expert if this is feasible for gears.

What is the FLOD vision for all this?

Useful References

  • ICES Guidelines and Policy
  • Final report of a Board on Research Data and Information (BRDI) project, For Attribution: Developing Data Attribution and Citation Practices and Standards, which explored how to assign credit for data (attribution) and how to reference data (citation) in ways that allow others to identify, discover, and retrieve them. Among the questions explored were: which approaches are generic across disciplines and which practices are field-specific; what major scientific, technical, institutional, financial, legal, and socio-cultural issues need to be considered; and what options exist for successfully developing and implementing data citation practices and standards. A free PDF is available (after registering) at: http://www.nap.edu/catalog.php?record_id=13564 . Print copies may be ordered through the National Academies Press (http://www.nap.edu), and additional information about the background of this project may be found on the Board's website at http://sites.nationalacademies.org/PGA/brdi/PGA_063656 . Questions or comments about the project may be directed to Paul Uhlir at puhlir@nas.edu.