CodelistManager

From D4Science Wiki
Latest revision as of 17:06, 5 December 2012

Context

Statistical cluster

CodelistManagerDesign

CotrixBuild

Cotrix configuration and deployment scenarios

Vox

Domain Model

This is work in progress and has very much draft status.


Core

CoreDomainModel.jpg

Collection: An aggregation of code lists and hierarchies. In SDMX a hierarchy is called a HierarchicalCodelist.


TODO: Leave, Join and Merge need to be modeled better.
TODO: MasterConcepts can have cardinalities with themselves as well.

Note: A simplified version of this model is implemented in cotrix-tabular: https://github.com/cotrix/cotrixrep/tree/master/cotrix/cotrix-tabular/src/main/java/org/cotrix/domain

The domain model in cotrix-tabular can be considered the implementation model.

Documentation

Documentation.jpg

Workflow of an artefact

Workflow.jpg WorkflowClassDiagram.jpg

The possible values of WorkflowStatus are the possible artefact states.

Note that the workflow knows about the artefact, not the other way around. This way, the workflow can be a pluggable module.
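A minimal sketch of this dependency direction, in Python for brevity (cotrix-tabular itself is Java). The Artefact type carries no workflow state; a separate Workflow object keeps the WorkflowStatus of each artefact. The state names are taken from the FAO areas example further down; the linear ordering of states, the `Workflow`/`advance` names and the shorter workflow in the usage lines are assumptions, since the actual transitions are only given in Workflow.jpg.

```python
from dataclasses import dataclass
from enum import Enum


class WorkflowStatus(Enum):
    # Names from the FAO areas example below; the authoritative set is in Workflow.jpg.
    IMPORTED = "Imported"
    INTERPRETED = "Interpreted"
    IMMUTABLE = "Immutable"
    FINAL = "Final"
    APPROVED = "Approved"
    PUBLISHED = "Published"


@dataclass(frozen=True)
class Artefact:
    """Knows nothing about any workflow: no status field, no callbacks."""
    artefact_id: str
    name: str


class Workflow:
    """Keeps the WorkflowStatus of each artefact outside the artefact itself."""

    def __init__(self, steps: list[WorkflowStatus]) -> None:
        self._steps = list(steps)  # assumed: a linear sequence of states
        self._status: dict[str, WorkflowStatus] = {}

    def register(self, artefact: Artefact) -> None:
        self._status[artefact.artefact_id] = self._steps[0]

    def status_of(self, artefact: Artefact) -> WorkflowStatus:
        return self._status[artefact.artefact_id]

    def advance(self, artefact: Artefact) -> WorkflowStatus:
        current = self.status_of(artefact)
        position = self._steps.index(current)
        if position == len(self._steps) - 1:
            raise ValueError(f"{artefact.name} is already {current.value}")
        self._status[artefact.artefact_id] = self._steps[position + 1]
        return self._status[artefact.artefact_id]


# Hypothetical workflow without the Immutable step, for an artefact edited in the system.
csv_workflow = Workflow([WorkflowStatus.IMPORTED, WorkflowStatus.INTERPRETED,
                         WorkflowStatus.FINAL, WorkflowStatus.APPROVED,
                         WorkflowStatus.PUBLISHED])
asfis = Artefact("asfis", "ASFIS List of Species")
csv_workflow.register(asfis)
csv_workflow.advance(asfis)  # Imported -> Interpreted
```

Because the status lives only in Workflow, swapping in another sequence of steps does not require any change to Artefact.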


Type (Artefact or Code) | Description
LocallyCreated | The default type. Created or imported; the further lifecycle and management take place in the system.
ImportedImmutable | Imported from outside and cannot be changed. Has a CodeLife. The lifecycle and management are outside of the system, but are followed and monitored in the system.
LifeLinked | Only linked, not stored, and at most cached. In principle works only when the outside link is available. Has no CodeLife; the lifecycle and management are outside of the system.
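The table can be sketched as a Python enum, under stated assumptions: the member, attribute and helper names are hypothetical; that a LocallyCreated item has a CodeLife is inferred from the core rule that creating a Code also creates a CodeLife; and that an ImportedImmutable item is stored in the system is inferred from it being imported rather than only linked.

```python
from enum import Enum


class ArtefactType(Enum):
    """The three types from the table; values are
    (stored_in_system, lifecycle_in_system, has_code_life)."""

    LOCALLY_CREATED = (True, True, True)      # CodeLife inferred from the core rules
    IMPORTED_IMMUTABLE = (True, False, True)  # imported, cannot be changed
    LIFE_LINKED = (False, False, False)       # only linked, at most cached

    def __init__(self, stored_in_system: bool, lifecycle_in_system: bool,
                 has_code_life: bool) -> None:
        self.stored_in_system = stored_in_system
        self.lifecycle_in_system = lifecycle_in_system
        self.has_code_life = has_code_life


def can_edit(kind: ArtefactType) -> bool:
    # Only content whose lifecycle is managed in the system may be changed here.
    return kind.lifecycle_in_system


def is_usable(kind: ArtefactType, outside_link_available: bool) -> bool:
    # LifeLinked content works in principle only while the outside link is up.
    if kind is ArtefactType.LIFE_LINKED:
        return outside_link_available
    return True
```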

TODO: document all states.

Statechart of a Code

StatechartDiagramCode.jpg

The possible values of CodeStatus are the possible code states.

Codestatus.jpg

  • A code can become final when:
    • it is published in a code list
    • it is made final
  • A code becomes non-final when it was final and has been changed
  • A code is non-final when it is created
  • A code can only change from final to non-final when it has not yet been published in a code list
  • Changing the validityPeriod, wellKnownText or value of a final code results in a copy of that code. The new code is non-final.
  • Creating a new Code means also creating a new CodeLife
  • Making a copy of a Code results in adding a link from that new Code to its CodeLife
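The rules above can be read as a small state machine. The sketch below is one possible reading, not the cotrix implementation: it takes the rule "a code becomes non-final when it was final and has been changed" together with the copy rule, so changing a final code leaves the original final and returns a non-final copy linked to the same CodeLife. The snake_case attributes `well_known_text` and `validity_period` stand for wellKnownText and validityPeriod; a `CodeLife` holding a plain list of codes is an assumption.

```python
from __future__ import annotations


class CodeLife:
    """Links a code to all copies made of it (hypothetical minimal version)."""

    def __init__(self) -> None:
        self.codes: list[Code] = []


class Code:
    CHANGEABLE = ("value", "well_known_text", "validity_period")

    def __init__(self, value: str, well_known_text: str | None = None,
                 validity_period: tuple[str, str] | None = None,
                 life: CodeLife | None = None) -> None:
        self.value = value
        self.well_known_text = well_known_text
        self.validity_period = validity_period
        self.final = False       # a code is non-final when it is created
        self.published = False
        # A new Code gets a new CodeLife; a copy is given its original's CodeLife.
        self.life = life if life is not None else CodeLife()
        self.life.codes.append(self)  # link the code (or copy) to its CodeLife

    def make_final(self) -> None:
        self.final = True

    def publish(self) -> None:
        # Publishing the code in a code list also makes it final.
        self.final = True
        self.published = True

    def make_non_final(self) -> None:
        if self.published:
            raise ValueError("a published code cannot become non-final")
        self.final = False

    def change(self, **changes: object) -> Code:
        """Change value, well_known_text and/or validity_period.

        A non-final code is changed in place and returned. A final code is left
        untouched; a non-final copy sharing its CodeLife is returned instead.
        """
        unknown = set(changes) - set(self.CHANGEABLE)
        if unknown:
            raise ValueError(f"cannot change {sorted(unknown)}")
        if not self.final:
            for name, new_value in changes.items():
                setattr(self, name, new_value)
            return self
        fields = {name: getattr(self, name) for name in self.CHANGEABLE}
        fields.update(changes)
        return Code(life=self.life, **fields)
```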


PartialCodelist

PartialCodelist.jpg


Union

Union.jpg

A Union is similar to a Collection and a PartialCodelist, but not the same.

A Collection is published as one Artefact. Within the published artefact you may still be able to see the original artefacts it was constructed from.

The difference is that a Union acts like one Codelist. A Union can only be created from other code lists. In its published form it is not always possible to relate it back to the original artefacts it was constructed from.
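A hedged sketch of the distinction: a Collection keeps references to its members, whereas a Union is built only from code lists and comes out as a plain Codelist with no link back to its sources. The requirement of two or more code lists comes from UseCase Union; merging codes that share an identifier, and restricting Collection members to code lists (hierarchies are left out), are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codelist:
    name: str
    codes: tuple[str, ...]


@dataclass(frozen=True)
class Collection:
    """Published as one artefact; the member artefacts remain visible inside it."""
    name: str
    members: tuple[Codelist, ...]


def union(name: str, *codelists: Codelist) -> Codelist:
    """Build a Union that behaves like a single Codelist.

    The result holds only codes, no reference to its sources, so the original
    artefacts cannot always be recovered from the published form.
    """
    if len(codelists) < 2:
        raise ValueError("a union needs two or more code lists")
    if not all(isinstance(source, Codelist) for source in codelists):
        raise TypeError("a union can only be created from code lists")
    # Assumption: codes with the same identifier are merged, keeping first-seen order.
    merged = dict.fromkeys(code for source in codelists for code in source.codes)
    return Codelist(name, tuple(merged))
```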

Use Cases

CoreUseCases.jpg


UseCase import csv

  • User selects CSV file to import from
  • System interprets the CSV file
  • User chooses to accept the interpretation or decides to manually intervene
  • User can manually intervene by assigning the column type to each column of the original CSV (column types are code, description or annotation)
  • User can manually intervene by changing the cardinalities between the code columns (1-n, 1-1, n-1)
  • User gives a version name
  • User makes the artefact(s) final and sends the artefact(s) for approval
  • Approver approves or denies the finalised artefact(s)
  • Approver sends the artefact(s) for publication
  • Publisher publishes the approved artefact(s)

Note: In the future we can think of importing from SDMX, JDBC or any other source.
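One part of the interpretation step, proposing a cardinality between two code columns, could work as in the following sketch. It is a hypothetical heuristic, not documented system behaviour: it counts distinct codes on each side of the pair and reports 1-1, 1-n, n-1, or n-n when neither side is unique (a case the list above does not name). The column names and codes in the sample are placeholders.

```python
import csv
import io
from collections import defaultdict


def cardinality(rows: list[dict[str, str]], left: str, right: str) -> str:
    """Classify the relation between two code columns.

    "1-n": some left code relates to several right codes and every right code
    to exactly one left code; "n-1" is the reverse; "n-n": neither side is unique.
    """
    right_per_left: dict[str, set[str]] = defaultdict(set)
    left_per_right: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        left_code, right_code = row[left], row[right]
        if left_code and right_code:  # ignore rows with a missing code
            right_per_left[left_code].add(right_code)
            left_per_right[right_code].add(left_code)
    many_right = any(len(codes) > 1 for codes in right_per_left.values())
    many_left = any(len(codes) > 1 for codes in left_per_right.values())
    if many_right and many_left:
        return "n-n"
    if many_right:
        return "1-n"
    if many_left:
        return "n-1"
    return "1-1"


sample = io.StringIO("family,species\nFAM_A,SP_1\nFAM_A,SP_2\nFAM_B,SP_3\n")
rows = list(csv.DictReader(sample))
print(cardinality(rows, "family", "species"))  # 1-n
print(cardinality(rows, "species", "family"))  # n-1
```

The user's manual intervention would then simply overwrite the proposed value.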

UseCase import csv Example

A good example for the CSV import is the ASFIS species list (http://www.fao.org/fishery/collection/asfis/en). The ASFIS species list is a zip file containing the file ASFIS_sp_Feb_2011.txt, which is a CSV file. The implicit hierarchies in this file are documented at http://km.fao.org/FIGISwiki/index.php/ASFIS_SDMX_Codelist.

After having imported the ASFIS file, the following code lists are interpreted:

  • ASFIS Species Alpha 3 code list
  • ASFIS Species Taxonomic code list
  • ASFIS Species Family Taxonomic code list
  • ASFIS Species Order Taxonomic code list

and hierarchies:

  • Relation ASFIS Species Taxonomic code - Alpha 3 code
  • Relation ASFIS Family - Species
  • Relation ASFIS Order - Family

and collections:

  • ASFIS List of Species

Interpreted means that the system is capable of understanding all the implicit relations in a tabular format file such as ASFIS_sp_Feb_2011.txt, and shows them in the UI as distinct code lists, hierarchies and collections. The ASFIS_sp_Feb_2011.txt file therefore results in 4 code lists, 3 hierarchies and 1 collection.

The collection ASFIS List of Species contains the same information as the original ASFIS_sp_Feb_2011.txt file.
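To illustrate what "interpreted" means, the sketch below turns code columns into one code list per column and one hierarchy per adjacent pair of columns, as in the Order - Family and Family - Species relations. The column names (`order`, `family`, `species`) and the rows are placeholders rather than the real ASFIS headers and data, and the sketch covers only that chain, not the taxonomic code and Alpha 3 code lists.

```python
from collections import defaultdict


def relation(rows: list[dict[str, str]], parent: str, child: str) -> dict[str, list[str]]:
    """One hierarchy: parent code -> child codes, in first-seen order."""
    children: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        parent_code, child_code = row[parent], row[child]
        if parent_code and child_code and child_code not in children[parent_code]:
            children[parent_code].append(child_code)
    return dict(children)


def interpret(rows: list[dict[str, str]], levels: list[str]):
    """levels lists the code columns from the top of the hierarchy down."""
    code_lists = {column: sorted({row[column] for row in rows if row[column]})
                  for column in levels}
    hierarchies = {f"{upper} - {lower}": relation(rows, upper, lower)
                   for upper, lower in zip(levels, levels[1:])}
    return code_lists, hierarchies


# Placeholder rows; the real ASFIS column names and values are not reproduced here.
rows = [
    {"order": "ORD_1", "family": "FAM_A", "species": "SP_1"},
    {"order": "ORD_1", "family": "FAM_A", "species": "SP_2"},
    {"order": "ORD_1", "family": "FAM_B", "species": "SP_3"},
]
code_lists, hierarchies = interpret(rows, ["order", "family", "species"])
```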

UseCase create new version of an Artefact

  • Start from scratch, import, or copy an existing Artefact in order to work on a new version of an Artefact.
  • Delete codes/hierarchies
  • Add codes/hierarchies
  • Edit codes/hierarchies
  • View deleted codes/hierarchies
  • View added codes/hierarchies
  • View edited codes/hierarchies
  • Make Artefact final

UseCase publish

A Collection, Codelist or Hierarchy can be published through SDMX or CSV:

  • Codelists are published as SDMX code lists according to the SDMX REST specifications.
  • Hierarchies are published as SDMX hierarchical code lists according to the SDMX REST specifications.
  • Collections are published as zip, txt, a zip containing a txt file, or a zip containing a CSV file. Such a collection would represent, for instance, the original ASFIS txt file.
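For the CSV variant of collection publishing, a zip containing one CSV file can be produced with the Python standard library alone. This is a sketch: the function name, the file name and the row layout are assumptions, and SDMX publication is not shown.

```python
import csv
import io
import zipfile


def publish_collection_as_zip(rows: list[dict[str, str]], columns: list[str],
                              csv_name: str) -> bytes:
    """Return a zip archive (as bytes) containing one CSV file with all rows."""
    text = io.StringIO()
    writer = csv.DictWriter(text, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    archive_bytes = io.BytesIO()
    with zipfile.ZipFile(archive_bytes, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(csv_name, text.getvalue())  # str is stored UTF-8 encoded
    return archive_bytes.getvalue()
```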


UseCase Union

  • Select two or more code lists
  • Publish them as one code list or layer

UseCase DiffReport

  1. User selects an artefact (Codelist, HierarchicalCodelist or Collection).
  2. User selects a certain version of that artefact.
  3. User selects another version of that same artefact.
  4. User clicks Generate DiffReport and views the DiffReport.

The report shows:

  • Codes added.
  • Codes deleted.
  • Number of codes in the first and second selected version.
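The report above needs only set differences between the two versions. A minimal sketch, assuming each version is given as a list of code identifiers (the type and function names are hypothetical); codes that were edited but kept their identifier are not reported, matching the three items listed.

```python
from dataclasses import dataclass


@dataclass
class DiffReport:
    codes_added: list[str]
    codes_deleted: list[str]
    codes_in_first: int
    codes_in_second: int


def diff_report(first_version: list[str], second_version: list[str]) -> DiffReport:
    """Compare two versions of the same artefact by code identifier."""
    first, second = set(first_version), set(second_version)
    return DiffReport(
        codes_added=sorted(second - first),     # present only in the second version
        codes_deleted=sorted(first - second),   # present only in the first version
        codes_in_first=len(first),
        codes_in_second=len(second),
    )
```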


UseCase publish layer as code list

ImportLayer.jpg

  • Import layer (shapefile)
  • Process the generic edit and approve functions
  • Publish as CSV and SDMX

Low priorities:

  • Publish as WFS and WMS (format: shape)
  • Publish in PostGIS
  • Publish in Oracle Locator


  • The geometry is expressed as Well-known text (WKT): http://en.wikipedia.org/wiki/Well-known_text
  • Language-dependent attributes from the shapefile are expressed as descriptions
  • Non-language-dependent attributes from the shapefile are expressed as annotations
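A sketch of these three mapping rules for a single feature, assuming the geometry has already been read from the shapefile as one exterior ring of coordinates and that the language-dependent attributes are identified by an explicit mapping (the text does not say how they are recognised). Attribute names such as `NAME_EN` and `SURFACE` are hypothetical; holes and multi-part geometries are not handled.

```python
def polygon_wkt(ring: list[tuple[float, float]]) -> str:
    """Write one exterior ring as a WKT POLYGON, closing the ring if needed."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    coordinates = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coordinates}))"


def feature_to_code(code: str, ring: list[tuple[float, float]],
                    attributes: dict[str, str],
                    language_of: dict[str, str]) -> dict:
    """Map one shapefile feature to a code record.

    language_of maps a language-dependent attribute name to its language,
    e.g. {"NAME_EN": "en"}; every other attribute becomes an annotation.
    """
    descriptions = {language_of[name]: value
                    for name, value in attributes.items() if name in language_of}
    annotations = {name: value
                   for name, value in attributes.items() if name not in language_of}
    return {"code": code, "wkt": polygon_wkt(ring),
            "descriptions": descriptions, "annotations": annotations}
```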


  • The geo-code list end product should carry source layer provenance information: from a tabular data column curated with such a geo-code list, we must be able to know which GIS layer the codes come from. The layer provenance information should be enough to point back to the layer, and should include at least (1) the GeoServer base URL and (2) the layer name.

Such layer provenance information is required in the SPREAD scenario, for intersection Data Discovery.
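A possible shape for the provenance record, carrying exactly the two required items. The `get_feature_url` helper is an assumption: it points back to the layer with a standard OGC WFS GetFeature request, presuming the layer is served through GeoServer's `/wfs` endpoint. The base URL, layer name and code in the usage lines are placeholders.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass(frozen=True)
class LayerProvenance:
    """The minimum the text asks for: GeoServer base URL and layer name."""
    geoserver_base_url: str
    layer_name: str

    def get_feature_url(self) -> str:
        """Point back to the layer through a standard OGC WFS GetFeature request."""
        query = urlencode({
            "service": "WFS",
            "version": "1.1.0",
            "request": "GetFeature",
            "typeName": self.layer_name,
        })
        return f"{self.geoserver_base_url.rstrip('/')}/wfs?{query}"


@dataclass(frozen=True)
class GeoCodelist:
    name: str
    codes: tuple[str, ...]
    provenance: LayerProvenance  # travels with every column curated with this list


areas = GeoCodelist("FAO Major Water Area code list", ("A1",),
                    LayerProvenance("https://example.org/geoserver/", "fao:areas"))
```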

UseCase publish layer as code list Example

The practical case behind this use case is the FAO major areas:

http://km.fao.org/FIGISwiki/index.php/FMA_SDMX_Codelist

After having imported the FAO areas layer, the following code lists are interpreted:

  • FAO Production Area code list (from major area to sub-unit)
  • FAO Major Water Area code list
  • FAO Major Water Area Subarea code list
  • FAO Major Water Area Division code list
  • FAO Major Water Area Subdivision code list
  • FAO Major Water Area Subunit code list

and hierarchies:

  • Relation Area code - Subarea code
  • Relation Subarea code - Division code
  • Relation Division code - Subdivision code
  • Relation Subdivision code - Subunit code

This practical case will follow the Workflow of an artefact: Imported, Interpreted, Immutable, Final, Approved and Published. The editing work will be done in ArcGIS, and the reference will be the shapefile edited in ArcGIS.

Eventually this use case can replace the shp2Oracle and re-index functionality currently used by Fabio Carocci.

Core Rules

  • A code can become final when:
    • it is published in a code list
    • it is made final
  • A code becomes non-final when it was final and has been changed
  • A code is non-final when it is created
  • A code can only change from final to non-final when it has not yet been published in a code list
  • Changing the validityPeriod, wellKnownText or value of a final code results in a copy of that code. The new code is non-final.
  • Creating a new Code means also creating a new CodeLife
  • Making a copy of a Code results in adding a link from that new Code to its CodeLife

Note: This table is not yet integrated in the model.

Nice to haves

  • Integration with SharePoint
  • Support for CMIS
  • Export to OWL
  • Export to SKOS
  • Export to RDF
  • Merging
  • Mapping

Links

TaxoTools