09.01.2014 Data Ingestion and Publication

Data Publication

Description of first implementation:

OAI-PMH publishing is based on data returned by collection browsing (testing approach), via an ASL component.

Planned implementation approach:

OAI-PMH "resources" (i.e. datasets) are mapped to search queries that are served by the OAI-PMH protocol provider at ASL level.
Search is used so that all available indexed sources can be exposed in a mixed manner, without being restricted to a single source; it is a way of providing data rather than presenting them. (As mentioned, the browsing capability of search is currently being used.)
- Query construction: queries will also be constructed automatically by exploiting the browsing capability. Sets are likewise mapped to queries.
An endpoint providing information on all data sets exposed by a scope is provided as an ASL HTTP component.
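A minimal sketch of the set-to-query mapping and of the data set listing described above. The set names, the query strings and the function names are assumptions made for illustration; they are not taken from the ASL component. XML namespaces and the OAI-PMH response envelope are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: OAI-PMH setSpec -> (setName, search query).
# Both the set names and the query syntax are invented for this sketch.
SET_QUERIES = {
    "fisheries": ("Fisheries records", 'subject = "fisheries"'),
    "all": ("All indexed sources", "*"),
}


def resolve_query(set_spec=None):
    """Return the search query backing an OAI-PMH set.

    When no set is requested, fall back to the catch-all query.
    """
    key = set_spec if set_spec is not None else "all"
    if key not in SET_QUERIES:
        raise KeyError(f"unknown set: {key}")
    return SET_QUERIES[key][1]


def list_sets_xml():
    """Build the <ListSets> element listing every configured set."""
    root = ET.Element("ListSets")
    for spec, (name, _query) in sorted(SET_QUERIES.items()):
        set_el = ET.SubElement(root, "set")
        ET.SubElement(set_el, "setSpec").text = spec
        ET.SubElement(set_el, "setName").text = name
    return ET.tostring(root, encoding="unicode")
```

A harvester request carrying set=fisheries would be answered by running resolve_query("fisheries") against the search system, while a ListSets request would be answered with list_sets_xml().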
Metadata are mapped into DC format appropriate for OAI-PMH publishing:
- A custom schema is currently being used; the intention is to transform each different schema to the DC schema via XSLT, in order to achieve OAI-PMH compliance.
- Transformations of hosted schemas into DC-Lite are expected to be provided as XSLTs when the service is configured.

Notes:

Suggestions (by Leonardo C.):

Provide a UI (portlet) for configuring the service (OAI-PMH resource name, search query definition, transformation to DC-Lite).
Use an approach similar to the simplified field mapping of index resources to create the DC-Lite schema required by the OAI-PMH protocol. Additionally, use the "common presentation fields" to derive the DC-Lite record, so that it is homogeneous across all schemas.
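The field-mapping suggestion could look roughly like the sketch below, which derives an oai_dc record from a flat dictionary of presentation fields. The presentation field names on the left-hand side of the mapping are hypothetical; only the two namespace URIs and the DC element names come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping: presentation field name -> Dublin Core element.
FIELD_TO_DC = {
    "title": "title",
    "author": "creator",
    "abstract": "description",
    "published": "date",
}


def to_oai_dc(record):
    """Serialize the mapped fields of `record` as an oai_dc:dc element.

    Fields that are missing or empty are skipped, so a sparse set of
    presentation fields yields a correspondingly sparse DC record.
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, dc_name in FIELD_TO_DC.items():
        value = record.get(field)
        if value:
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Because the same mapping is applied to every schema, the resulting records are homogeneous; the trade-off is that any information outside the common presentation fields is lost.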
We should follow the general approach of having as many standards as possible supported so that we can maximize adoption, yet OAI-ORE is not a priority.

Decisions / Concerns / Highlights:

A configuration UI must be provided alongside the service components.
The "search query" approach will be followed as it allows more complex views of the harvested / hosted metadata to be served by the system.
The only potential blocking issue is that the "presentation" fields may be too few to yield a valuable DC-Lite record.
OAI-ORE is not a major priority, so it will be considered after OAI-PMH is completed.
Different sets to be published must be identified.
The implementation phase will last at least until March; it may finish sooner, depending on other concurrent activities.

Description:
- The intention is to be able to index various data sources. A number of plugins will be implemented that retrieve data from the sources and provide them in an XML representation. The data are then forwarded to gDTS for transformation and fed to the index.

Intermediate data are not stored anywhere. The process will be triggered during indexing as a program with a unique URI, and a manager will take care of the whole process, from data retrieval to feeding.

Example:
- RDBMS data are represented as XML, further transformed into rowsets, and fed to the index.
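The example can be sketched as a chain of generators, so that no intermediate data are stored between retrieval and feeding. Everything here is an assumption for illustration: SQLite stands in for the RDBMS, to_rowset stands in for the gDTS transformation, a plain list stands in for the index, and none of the function names come from the actual components.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rdbms_plugin(conn, table):
    """Hypothetical source plugin: yield each row of `table` as an XML string.

    The table name is interpolated directly, so it must come from trusted
    configuration; column names must also be valid XML element names.
    """
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    for row in cur:
        rec = ET.Element("record")
        for col, val in zip(columns, row):
            ET.SubElement(rec, col).text = "" if val is None else str(val)
        yield ET.tostring(rec, encoding="unicode")


def to_rowset(xml_record):
    """Stand-in for the gDTS transformation: XML record -> flat field dict."""
    return {child.tag: child.text or "" for child in ET.fromstring(xml_record)}


def feed_index(rowsets, index):
    """Stand-in for the index feed: append each rowset to `index`."""
    for rowset in rowsets:
        index.append(rowset)
    return len(index)


def run(conn, table, index):
    """Manager: chain retrieval, transformation and feeding lazily."""
    return feed_index((to_rowset(x) for x in rdbms_plugin(conn, table)), index)
```

Each record flows through the three stages one at a time, which matches the note that the manager handles the whole process without persisting intermediate results.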

Plugins will act as data providers, external to the sources (inside or outside the infrastructure), but must be able to access them.

Begin with RDBMS data and expand to GIS data, time series, etc. GeoNetwork can be accessed directly; time series are to be exploited (converted).