30.05.2013 WP10-WP6 - Semantic Search

From D4Science Wiki

Agenda

Time: Thu, May 30, 2013, 11:00 - 14:00 CEST

  • Status of the adoption of the alternative implementation of the xSearch components
  • Status of TLO exploitation in semantic search
  • Plans for the VRE that will demonstrate progress on the above elements

Participants

  • Alex Antoniadis (NKUA)
  • John Gerbesiotis (NKUA)
  • Pasquale Pagano (CNR)
  • Leonardo Candela (CNR)
  • Yannis Tzitzikas (FORTH)
  • Yannis Marketakis (FORTH)
  • Carlo Allocca (FORTH)
  • Pavlos Fafalios (FORTH)
  • Pascal Cauquil (IRD)

Discussion Summary

Adoption of alternative implementation of xSearch components

  • Objectives
    • Improve scalability
    • Extend functionality
    • Analyze more hits
  • Goals
    • Extend xSearch
    • Multiple instances deployment
  • Poses no risk, as the current implementation is operational
  • Is expected to be included in the next release


Deployment of a Search VRE in the gCubeApps VO

  • The forthcoming VRE is important for the demo
  • The related task had been postponed until the underlying supporting infrastructure becomes available (e.g. access to the different data providers)
  • After gCube 2.15.0 is released, building of the new VRE will start
  • Include Ecoscope (made OpenSearch-compatible) and Bing OpenSearch (already supported)
  • Describing sources (see Notes below)
  • Deployment plan as described:
    • start the new VRE
    • index all collections
    • deploy the new portlet and the new xSearch as pre-production
    • define and tune over the production infrastructure
    • make it available to end users before the September review


TLO exploitation

  • Repository construction for evaluation using:
    • OWLIM
    • Virtuoso
  • Investigating which triple store is best suited to the requirements of the mappings
    • comparatively test the process using both OWLIM and Virtuoso, in parallel with inspecting the contents of the repository and testing the competency queries
  • The current version of xSearch, which uses a copy of FLOD, uses Virtuoso
  • OWLIM was tried in order to check whether the extra reasoning facilities it offers can be exploited in the mapping process
  • Ideally, there should be a repository and a process to construct an integrated warehouse that supports different access schemas
  • Carlo should do the feeding, mapping and evaluation
  • Discuss with the community the frequency of changes, in order to schedule updates
    • provide an update mechanism available to anyone
  • FAO had reported that it would construct this repository
  • FAO has provided a document describing the workflow, but it is not complete
  • TD said that collaboration and communication on this must improve
  • FAO has not reported anything in T10.4, although it agreed to provide details in the previous telco
  • xSearch can use any SPARQL endpoint
  • xSearch is only one of the possible ways to exploit the TLO. The TLO can also be exploited in various other ways (e.g. for mashups like FactSheetsGenerator, for publishing LOD, for warehouses, i.e. TLO-based repositories)


Other Discussions

Reported Activities

  • FORTH declared activities investigating the alternative implementation for two consecutive months
  • The corresponding tickets were updated only occasionally, and the use of a private SVN does not reflect the current status
  • FORTH justified its activities by updating the tickets, preparing an analysis document (evaluation of alternatives, queries for demonstration) and clarifying the ongoing activities

Entity mining bookmarklet

  • FORTH proposed a demonstration of the bookmarklet and asked which way would be best
  • CNR pointed out that, since there is no WoRMS endpoint, it cannot be promoted. It can be used anywhere data is accessed. It cannot rely on the vocabulary.
  • FORTH replied that it uses FLOD for identification by commercial name

First review demonstration

  • FORTH disagreed with the decision not to show xSearch in the previous review
  • TD indicated that this decision was taken by the WP leaders in the context of WP5


Suggestions for exploiting the TLO: published metadata from species, mappings. Many applications could exploit the information from the TLO.

Actions

  • Alternative implementation of xSearch components to be included in next release
  • FORTH will create two tickets:
    • implementation of the alternative approach for the xSearch components
    • how to have more than one running instance of xSearch in the production infrastructure (FORTH will liaise with Andrea on this)
  • Yannis (FORTH) will contact Anton (FAO) to check who the contact person in T10.4 is
  • Lino will contact Andrea to prepare a plan for the creation and validation of the Search VRE
  • On VRE deployment: update #1190 with the new data sources identified in this telco
  • Carlo (FORTH) will produce a document on the workflow leading to the population of the TLO; this should be analysed by FAO. Update the ticket and continue documenting the process

Notes

On user-oriented “environments” (portals) for testing the technology, demonstrating it and finally making it available to “others”:

  • Devportal is intended for development purposes and relies on the development infrastructure; it is available at dev.d4science.org
    • accesses development computational resources maintained by the development team
  • Newportal can be used to test portlets against the production infrastructure; it is available at newportal.i-marine.d4science.org
    • accesses production-quality computational resources maintained by the infrastructure team; its use is reserved to project members
  • The production portal must be used to offer the service on the production infrastructure; it is available at portal.i-marine.d4science.org
    • accesses production-quality computational resources maintained by the infrastructure team; all users can access it


The following lists characterise the data sources of the forthcoming Search VRE.

Dublin Core (DC) via OAI-PMH, accessed via TreeManager

  1. AquaticCommons
  2. BioRisk
  3. Biodiversity Heritage Library
  4. BiolineInternational
  5. Comparative Cytogenetics
  6. Cultural ... (CEEMaR)
  7. DRS
  8. Dryad
  9. International Journal of Myriapodology
  10. Journal of Hymenoptera
  11. MycoKeys
  12. Nature Conservation
  13. NeoBiota
  14. OceanDocs
  15. PANGAEA
  16. PhytoKeys
  17. Subterranean Biology
  18. WHOAS
  19. ZooKeys
  20. nature

Darwin Core (DwC), accessed via TreeManager

  1. WoRMS
  2. WoRDS
  3. ITIS
  4. IRMNG
  5. Catalogue of Life (CoL)

FAO ad-hoc format, accessed via TreeManager

  1. FAO Species Factsheet (FIGIS) (documented at #1592)
     http://figisapps.fao.org/vrmf/samples/species/FS/

OpenSearch external sources

  1. Ecoscope search (discussed privately; a ticket should be created)
  2. Bing Search (this should not be an issue; it is supported according to #1255)