Difference between revisions of "09.01.2014 Data Ingestion and Publication"

Revision as of 18:42, 9 January 2014

Data Publication

Description of first implementation:

OAI-PMH publishing is based on data returned by collection browsing (testing approach), via an ASL component.

Planned approach implementation:

OAI-PMH "resources" (i.e. datasets) are mapped to search queries that are served by the OAI-PMH protocol provider at ASL level.
Search is used so that all available indexed sources can be exposed in a mixed manner, without being restricted to a single source; it is a way of providing data rather than presenting them. (As mentioned, the browsing capability of search is currently being used.)
- Query construction: queries will also be constructed automatically by exploiting the browsing capability. Sets will also be mapped to queries.
An end-point providing information of all data sets exposed by a scope is provided as an ASL HTTP component.
Metadata are mapped into DC format appropriate for OAI-PMH publishing:
- A custom schema is currently being used; the intention is to transform each different schema to the DC schema via XSLT, in order to achieve OAI-PMH compliance.
- The plan under consideration is for transformations of hosted schemas into DC-Lite to be supplied as XSLTs when the service is configured.
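
The schema-to-DC mapping above can be illustrated with a small sketch. The notes call for XSLT; the Python below is only a stand-in that shows the shape of the mapping using the standard library. The field names in FIELD_TO_DC and the sample record are hypothetical, while the two namespace URIs are the standard oai_dc and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from custom schema fields to DC element names.
# In the planned service this role would be played by an XSLT per schema.
FIELD_TO_DC = {
    "name": "title",
    "author": "creator",
    "summary": "description",
}


def to_oai_dc(record: dict) -> ET.Element:
    """Build an oai_dc:dc element from a flat record, skipping unmapped fields."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, dc_name in FIELD_TO_DC.items():
        value = record.get(field)
        if value:
            child = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
            child.text = value
    return root


# Hypothetical record in a custom schema; "internal_id" has no DC mapping.
sample = {"name": "Tuna catches", "author": "FAO", "internal_id": "x-17"}
dc_xml = ET.tostring(to_oai_dc(sample), encoding="unicode")
```

Fields without a mapping are silently dropped here, which is one reason a schema with few mapped fields may produce a thin DC record.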

Notes:

Incremental OAI-PMH publishing cannot be supported.

Suggestions (by Leonardo C.):

Provide a UI (portlet) for configuring the service (OAI-PMH resource name, search query definition, transformation to DC-Lite).
Use an approach similar to the simplified field mapping of index resources to create the DC-Lite schema required by the OAI-PMH protocol. Additionally, use the "common presentation fields" to derive the DC-Lite record, so that it is homogeneous across all schemas.
We should follow the general approach of having as many standards as possible supported so that we can maximize adoption, yet OAI-ORE is not a priority.

Decisions / Concerns / Highlights:

A configuration UI must be provided alongside the service components.
The "search query" approach will be followed as it allows more complex views of the harvested / hosted metadata to be served by the system.
The only blocking issue is that the "presentation" fields may be too few to yield a useful DC-Lite record.
OAI-ORE is not a major priority, so it will be considered after OAI-PMH is completed.
Different sets to be published must be identified.
The implementation phase will run at least until March, or finish sooner if possible, depending on other concurrent activities.
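
The "search query" approach decided above amounts to a lookup from an OAI-PMH set to a stored query. A minimal sketch follows, assuming a simple in-memory registry; the class, the set names, the query syntax and the match-all query are all hypothetical and do not reflect the actual ASL or search API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OaiSet:
    spec: str   # value a harvester passes as the "set" argument
    name: str   # human-readable set name
    query: str  # search query backing the set (syntax is hypothetical)


# Hypothetical query used when a harvester does not restrict to a set.
ALL_RECORDS_QUERY = "*"

# Hypothetical registry; in the notes this would come from the configuration UI.
SETS = {
    s.spec: s
    for s in [
        OaiSet("fisheries", "Fisheries records", 'collection = "fisheries"'),
        OaiSet("maps", "Geospatial layers", 'type = "layer"'),
    ]
}


def resolve_query(set_spec: Optional[str]) -> str:
    """Return the search query that serves a harvesting request."""
    if set_spec is None:
        return ALL_RECORDS_QUERY
    try:
        return SETS[set_spec].query
    except KeyError:
        raise LookupError(f"unknown set: {set_spec}") from None
```

Because a set is just a named query, one set can span several indexed sources, which is the "mixed manner" exposure the notes describe.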

Data Ingestion

Description:
- The intention is to be able to index various data sources. A number of plugins will be implemented that retrieve data from the sources and provide it in an XML representation. The data are then forwarded to gDTS for transformation and fed to the index.

Intermediate data are not stored anywhere. The process will be triggered during indexing as a program with a unique URI, and a manager will take care of the whole process, from data retrieval to feeding.

Example:
- RDBMS data are represented as XML, further transformed into rowsets, and fed to the index.
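
The first step of this example, representing relational data as XML, might look like the sketch below. It uses an in-memory SQLite database as a stand-in source; the table, the element names (table, row, field) and the function are hypothetical, and the later rowset transformation by gDTS is not shown.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Dump every row of a table into a generic XML tree.

    The table name is interpolated into the SQL, so pass trusted names only.
    """
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    root = ET.Element("table", {"name": table})
    for row in cursor:
        row_el = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            field = ET.SubElement(row_el, "field", {"name": column})
            field.text = "" if value is None else str(value)
    return root


# Stand-in data source; a real plugin would connect to the external RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE species (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO species VALUES (?, ?)",
    [(1, "Thunnus albacares"), (2, "Gadus morhua")],
)
xml_doc = table_to_xml(conn, "species")
xml_text = ET.tostring(xml_doc, encoding="unicode")
```

Nothing is written to disk, in line with the note that intermediate data are not stored.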

There is no incremental update; an index swap is used at the moment. This will be investigated in the future.
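
An index swap, as opposed to an incremental update, can be sketched as building a complete new index on the side and then replacing the live one in a single step. The class below is a hypothetical toy inverted index, not the actual index service.

```python
from typing import Dict, Iterable, List, Tuple


class SwappableIndex:
    """Toy inverted index that is rebuilt in full and then swapped in."""

    def __init__(self) -> None:
        self._live: Dict[str, List[str]] = {}

    def search(self, term: str) -> List[str]:
        """Return the ids of documents containing the term."""
        return self._live.get(term.lower(), [])

    def rebuild(self, documents: Iterable[Tuple[str, str]]) -> None:
        """Build a fresh index from (doc_id, text) pairs, then swap it in."""
        fresh: Dict[str, List[str]] = {}
        for doc_id, text in documents:
            for term in set(text.lower().split()):
                fresh.setdefault(term, []).append(doc_id)
        # The swap: readers see the old index until this single assignment.
        self._live = fresh
```

The cost of this design is that every run re-ingests all data, and entries absent from the new feed disappear after the swap.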

Credentials for all sources are stored on the IS.

Plugins will act as data providers external to the sources (inside or outside the infrastructure), but must be able to access them.

Begin with RDBMS data and expand to GIS data, time series, etc. GeoNetwork can be accessed directly; time series can be exploited (converted).
