Difference between revisions of "CodelistManager"
Latest revision as of 17:06, 5 December 2012
Context
Statistical cluster
CodelistManagerDesign
CotrixBuild
Cotrix configuration and deployment scenarios
Vox
Domain Model
This is a work in progress and very much a draft.
Core
Collection: An aggregation of code lists and hierarchies. In SDMX a hierarchy is called a HierarchicalCodelist.
TODO: Leave, Join and Merge need to be modeled better.
TODO: A MasterConcept can also have cardinalities with itself.
Note: A simplified version of this model is implemented in cotrix-tabular: https://github.com/cotrix/cotrixrep/tree/master/cotrix/cotrix-tabular/src/main/java/org/cotrix/domain
The domain model in cotrix-tabular can be considered the implementation model.
Documentation
Workflow of an artefact
The possible values of WorkflowStatus are the possible artefact states.
Note that the workflow is aware of the artefact, not the other way around. This way, the workflow can be a pluggable module.
Type (Artefact or Code) | Description |
LocallyCreated | The default type. The artefact or code is created or imported, and its further lifecycle and management take place in the system. |
ImportedImmutable | Imported from outside and cannot be changed. Has a CodeLife. The lifecycle and management take place outside the system but are followed and monitored in the system. |
LifeLinked | Only linked, not stored, and at most cached. In principle it works only while the outside link is available. Has no CodeLife; the lifecycle and management take place outside the system. |
TODO: document all states.
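The one-way dependency between workflow and artefact can be sketched as follows. This is only an illustration in Python, not the cotrix implementation: the class names Workflow and Artefact, the advance method and the strictly linear ordering of states are assumptions, and the state names are borrowed from the FAO areas example further down this page (Immutable is left out because the table above treats it as a type).

```python
from dataclasses import dataclass
from enum import Enum


class WorkflowStatus(Enum):
    # Assumed set of states; the model still lists "document all states" as a TODO.
    IMPORTED = "Imported"
    INTERPRETED = "Interpreted"
    FINAL = "Final"
    APPROVED = "Approved"
    PUBLISHED = "Published"


class ArtefactType(Enum):
    LOCALLY_CREATED = "LocallyCreated"
    IMPORTED_IMMUTABLE = "ImportedImmutable"
    LIFE_LINKED = "LifeLinked"


@dataclass
class Artefact:
    # The artefact holds no reference to any workflow.
    name: str
    type: ArtefactType = ArtefactType.LOCALLY_CREATED


class Workflow:
    """Hypothetical pluggable module: it knows the artefact, not vice versa."""

    ORDER = list(WorkflowStatus)  # assumed linear order of states

    def __init__(self, artefact: Artefact):
        self.artefact = artefact
        self.status = WorkflowStatus.IMPORTED

    def advance(self) -> WorkflowStatus:
        """Move the artefact to the next state in the assumed linear order."""
        position = self.ORDER.index(self.status)
        if position + 1 >= len(self.ORDER):
            raise ValueError("artefact is already published")
        self.status = self.ORDER[position + 1]
        return self.status
```

Because Artefact never mentions Workflow, the workflow module could be replaced or removed without touching the domain classes.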
Statechart of a Code
The possible values of CodeStatus are the possible code states.
- A code can become final when:
  - it is published in a code list
  - it is made final
- A code becomes non-final when it was final and has been changed
- A code is non-final when it is created
- A code can only change from final to non-final when it has not yet been published in a code list
- Changing the validityPeriod, wellKnownText or value of a final code results in a copy of that code. The new code is non-final.
- Creating a new Code also means creating a new CodeLife
- Making a copy of a Code results in adding a link from the new Code to its CodeLife
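The rules above can be read as a small state machine. The following Python sketch is one possible reading, not the cotrix code: the method names (make_final, publish, reopen, change) are hypothetical, and it assumes that a copy is linked to the same CodeLife as the code it was copied from.

```python
class CodeLife:
    """Groups a Code with the copies derived from it."""

    def __init__(self):
        self.codes = []


class Code:
    # Fields whose change on a final code produces a copy.
    COPY_ON_CHANGE = ("value", "validity_period", "well_known_text")

    def __init__(self, value, validity_period=None, well_known_text=None, life=None):
        self.value = value
        self.validity_period = validity_period
        self.well_known_text = well_known_text
        self.final = False       # a code is non-final when it is created
        self.published = False
        # A brand-new code gets a new CodeLife; a copy is linked to an existing one.
        self.life = life if life is not None else CodeLife()
        self.life.codes.append(self)

    def make_final(self):
        self.final = True

    def publish(self):
        # Being published in a code list also makes the code final.
        self.published = True
        self.final = True

    def reopen(self):
        # Final to non-final is only allowed before publication.
        if self.published:
            raise ValueError("a published code cannot become non-final")
        self.final = False

    def change(self, **changes):
        """Apply changes; a final code is left untouched and a non-final copy is returned."""
        unknown = set(changes) - set(self.COPY_ON_CHANGE)
        if unknown:
            raise AttributeError(f"unsupported fields: {sorted(unknown)}")
        target = self
        if self.final:
            target = Code(self.value, self.validity_period,
                          self.well_known_text, life=self.life)
        for name, new_value in changes.items():
            setattr(target, name, new_value)
        return target
```

Changing a non-final code edits it in place, while changing a final code leaves the original intact and returns the copy.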
PartialCodelist
Union
A Union is similar to a Collection and a PartialCodelist, but it is not the same thing.
A Collection is published as one artefact. Within the published artefact you may still be able to see the original artefacts from which it was constructed.
A Union, by contrast, acts like a single Codelist. A Union can only be created from other code lists. In its published form it is not always possible to trace back to the original artefacts from which it was constructed.
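The difference can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a code list is a mapping from code value to description and that the first description wins when two lists share a code value; neither choice comes from the model.

```python
from typing import Dict

Codelist = Dict[str, str]  # code value -> description (simplified stand-in)


def make_collection(codelists: Dict[str, Codelist]) -> Dict[str, Codelist]:
    """A collection keeps every member under its own name."""
    return {name: dict(codes) for name, codes in codelists.items()}


def make_union(codelists: Dict[str, Codelist]) -> Codelist:
    """A union flattens its members into one code list; their origin is lost."""
    union: Codelist = {}
    for codes in codelists.values():
        for value, description in codes.items():
            # Assumption: on a clash the first description encountered is kept.
            union.setdefault(value, description)
    return union
```

After make_union there is no way to tell which source list a code came from, which matches the remark that a published Union cannot always be related back to its original artefacts.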
Use Cases
UseCase import csv
- User selects CSV file to import from
- System interprets the CSV file
- User chooses to accept the interpretation or decides to manually intervene
- User can manually intervene by assigning a column type to each column of the original CSV (column types are code, description or annotation)
- User can manually intervene by changing the cardinalities between the code columns (1-n, 1-1, n-1)
- User gives version name
- User makes the artefact(s) final and sends the artefact(s) for approval
- Approver approves or denies the finalised artefact(s)
- Approver sends the artefact(s) for publication
- Publisher publishes the approved artefact(s)
Note: In the future we can think of importing from SDMX, JDBC or any other source.
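How the system might interpret the cardinality between two code columns can be sketched in Python as below. The function names and the sample column names are hypothetical, and the "n-n" result is an extra case that is not in the list of cardinalities above; a real interpreter would also need to handle quoting, encodings and empty cells.

```python
import csv
import io
from collections import defaultdict

COLUMN_TYPES = {"code", "description", "annotation"}


def read_rows(text):
    """Parse CSV text with a header line into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def check_column_types(assignment):
    """Validate a user-supplied mapping of column name -> column type."""
    invalid = {c: t for c, t in assignment.items() if t not in COLUMN_TYPES}
    if invalid:
        raise ValueError(f"unknown column types: {invalid}")
    return assignment


def cardinality(rows, left, right):
    """Guess the cardinality between two code columns from the data."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for row in rows:
        rights_per_left[row[left]].add(row[right])
        lefts_per_right[row[right]].add(row[left])
    one_left_many_rights = any(len(v) > 1 for v in rights_per_left.values())
    one_right_many_lefts = any(len(v) > 1 for v in lefts_per_right.values())
    if one_left_many_rights and one_right_many_lefts:
        return "n-n"  # not in the list above; reported so the user can intervene
    if one_left_many_rights:
        return "1-n"
    if one_right_many_lefts:
        return "n-1"
    return "1-1"
```

The user could then accept the guessed cardinalities or override them, as described in the steps above.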
UseCase import csv Example
A good example of a CSV import is the ASFIS species list (http://www.fao.org/fishery/collection/asfis/en). The ASFIS species list is a zip file containing the file ASFIS_sp_Feb_2011.txt, which is a CSV file. The implicit hierarchies in this file are documented at http://km.fao.org/FIGISwiki/index.php/ASFIS_SDMX_Codelist
After having imported the ASFIS file, the following code lists are interpreted:
- ASFIS Species Alpha 3 Codelist
- ASFIS Species Taxonomic Codelist
- ASFIS Species Family Taxonomic code list
- ASFIS Species Order Taxonomic code list
and hierarchies:
- Relation ASFIS Species Taxonomic code - Alpha 3 code
- Relation ASFIS Family - Species
- Relation ASFIS Order - Family
and collections:
- ASFIS List of Species
Interpreted means that the system is capable of understanding all the implicit relations in a tabular file such as ASFIS_sp_Feb_2011.txt and shows them in the UI as distinct code lists, hierarchies and collections. The ASFIS_sp_Feb_2011.txt file therefore results in 4 code lists, 3 hierarchies and 1 collection.
The collection ASFIS List of Species contains the same information as the original ASFIS_sp_Feb_2011.txt file.
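A rough Python sketch of what such an interpretation produces is given below. The column names (alpha3, taxonomic, family, order) are hypothetical stand-ins and not the real headers of the ASFIS file; the sketch only shows how one table can yield 4 code lists, 3 hierarchies and 1 collection.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical column names, one per interpreted code list.
CODE_COLUMNS = ["alpha3", "taxonomic", "family", "order"]
# (parent column, child column) pairs, one per interpreted hierarchy.
RELATIONS = [("taxonomic", "alpha3"), ("family", "taxonomic"), ("order", "family")]


def interpret(rows: List[Row]):
    """Split one table into code lists, hierarchies and a collection."""
    # One code list per code column: the distinct values of that column.
    codelists = {column: sorted({row[column] for row in rows})
                 for column in CODE_COLUMNS}
    # One hierarchy per relation: the distinct (parent, child) pairs.
    hierarchies = {(parent, child): sorted({(row[parent], row[child]) for row in rows})
                   for parent, child in RELATIONS}
    # The collection carries the same information as the original file.
    collection = [dict(row) for row in rows]
    return codelists, hierarchies, collection
```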
UseCase create new version of an Artefact
- Start from scratch, import, or copy an existing Artefact in order to work on a new version of an Artefact.
- Delete codes/hierarchies
- Add codes/hierarchies
- Edit codes/hierarchies
- View deleted codes/hierarchies
- View added codes/hierarchies
- View edited codes/hierarchies
- Make Artefact final
UseCase publish
A Collection, Codelist or Hierarchy can be published through SDMX or CSV:
- Codelists are published as SDMX code lists according to the SDMX REST specifications.
- Hierarchies are published as SDMX hierarchical code lists according to the SDMX REST specifications.
- Collections are published as zip, txt, zip containing a txt file or zip containing a csv file. Such a collection would represent for instance the original ASFIS txt file.
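Publishing a collection as a zip containing a CSV file can be sketched with the Python standard library. The function name, the default member name collection.csv and the row layout are assumptions for illustration; the SDMX publication path is not covered here.

```python
import csv
import io
import zipfile
from typing import Dict, List


def collection_to_zip(rows: List[Dict[str, str]], fieldnames: List[str],
                      member_name: str = "collection.csv") -> bytes:
    """Serialise the rows as CSV and wrap the result in an in-memory zip."""
    text = io.StringIO(newline="")
    writer = csv.DictWriter(text, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.writestr(member_name, text.getvalue())
    return archive.getvalue()
```

The returned bytes can be written to disk or served for download; publishing as plain txt would simply skip the zip step.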
UseCase Union
- Select two or more code lists
- Publish them as one code list or layer
UseCase DiffReport
- User selects an artefact (Codelist, HierarchicalCodelist or Collection).
- User selects a certain version from that artefact.
- User selects another version from that same artefact.
- User clicks on generate DiffReport and views the DiffReport
The report shows:
- Codes added.
- Codes deleted.
- Number of codes in the first and second selected version.
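The content of such a report can be sketched in Python as below, assuming for illustration that a version is identified simply by its list of code values; the DiffReport field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class DiffReport:
    added: List[str]      # codes present only in the second version
    deleted: List[str]    # codes present only in the first version
    first_count: int      # number of codes in the first selected version
    second_count: int     # number of codes in the second selected version


def diff_report(first: Iterable[str], second: Iterable[str]) -> DiffReport:
    """Compare the code values of two versions of the same artefact."""
    first_codes, second_codes = set(first), set(second)
    return DiffReport(
        added=sorted(second_codes - first_codes),
        deleted=sorted(first_codes - second_codes),
        first_count=len(first_codes),
        second_count=len(second_codes),
    )
```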
UseCase publish layer as code list
- Import layer (shapefile)
- ... apply the generic edit and approve functions
- Publish as CSV and SDMX
Low priority:
- Publish as WFS (Web Feature Service) and WMS (Web Mapping Service), in shape format
- Publish in PostGis
- Publish in Oracle Locator
- The geometry is expressed as Well-known text (WKT), see http://en.wikipedia.org/wiki/Well-known_text
- Language-dependent attributes from the shapefile are expressed as descriptions
- Non-language-dependent attributes from the shapefile are expressed as annotations
- The geo-code list end product should carry source layer provenance information, i.e. given a tabular data column curated with such a geo-code list, we must be able to retrieve the provenance of the GIS layer. The layer provenance information should be enough to point back to the layer. It should include at least (1) the Geoserver base URL and (2) the layer name
Such layer provenance information is required in the SPREAD scenario, for intersection Data Discovery.
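The mapping from a shapefile feature to a code, together with the layer provenance, can be sketched as follows. All class and function names here are hypothetical, reading the shapefile itself is left out, and the sketch assumes the caller already knows which attribute columns are language dependent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class LayerProvenance:
    geoserver_base_url: str  # (1) the Geoserver base URL
    layer_name: str          # (2) the layer name


@dataclass
class GeoCode:
    value: str
    well_known_text: str                                         # geometry as WKT
    descriptions: Dict[str, str] = field(default_factory=dict)   # language -> text
    annotations: Dict[str, str] = field(default_factory=dict)    # attribute -> text


@dataclass
class GeoCodelist:
    name: str
    provenance: LayerProvenance
    codes: List[GeoCode] = field(default_factory=list)


def feature_to_code(value: str, wkt: str, attributes: Dict[str, str],
                    language_columns: Dict[str, str]) -> GeoCode:
    """Turn one feature into a code.

    language_columns maps an attribute column to its language; those columns
    become descriptions, every other attribute becomes an annotation.
    """
    code = GeoCode(value=value, well_known_text=wkt)
    for column, text in attributes.items():
        if column in language_columns:
            code.descriptions[language_columns[column]] = text
        else:
            code.annotations[column] = text
    return code
```

Keeping the provenance on the code list rather than on each code is a design choice of this sketch; it is enough to point back to the layer from any column curated with the list.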
UseCase publish layer as code list Example
The practical case behind this usecase is the FAO major areas:
http://km.fao.org/FIGISwiki/index.php/FMA_SDMX_Codelist
After having imported the FAO areas layer, the following code lists are interpreted:
- FAO Production Area code list (from major area to sub-unit)
- FAO Major Water Area code list
- FAO Major Water Area Subarea code list
- FAO Major Water Area Division code list
- FAO Major Water Area Subdivision code list
- FAO Major Water Area Subunit code list
and hierarchies:
- Relation Area code - Subarea code
- Relation Subarea code - Division code
- Relation Division code - Subdivision code
- Relation Subdivision code - Subunit code
This practical case will follow the Workflow of an artefact: Imported, Interpreted, Immutable, Final, Approved and Published. The editing work will be done in ArcGis. The reference will be the shapefile edited in ArcGis.
Eventually this usecase can replace the shp2Oracle and re-index functionality, currently used by Fabio Carocci.
Core Rules
- A code can become final when:
  - it is published in a code list
  - it is made final
- A code becomes non-final when it was final and has been changed
- A code is non-final when it is created
- A code can only change from final to non-final when it has not yet been published in a code list
- Changing the validityPeriod, wellKnownText or value of a final code results in a copy of that code. The new code is non-final.
- Creating a new Code also means creating a new CodeLife
- Making a copy of a Code results in adding a link from the new Code to its CodeLife
Note: This table is not yet integrated in the model.
Nice to haves
- Integration with SharePoint
- Support for CMIS
- Export to OWL
- Export to SKOS
- Export to RDF
- Merging
- Mapping
Links
[1]TaxoTools