17.09.2012 WP10 - X-Search Integration

From D4Science Wiki

Agenda

Time: Mon, September 17, 2012, 15:00 - 16:00 CEST

  • Discussion on the topics raised in tickets #627, #628
  • Discussion on other issues observed during the usage of the application by NKUA
  • Time frame of issue resolution in relation to the XSearch internal evaluation
  • Agreement on a concrete plan regarding the functionality that will be included in the release predating the project review
  • Status of FLOD exploitation, plan until review

Participants

  • Alex Antoniadis (NKUA)
  • Gerasimos Farantatos (NKUA)
  • John Gerbesiotis (NKUA)
  • Rena Tsantouli (NKUA)
  • Giota Koltsida (NKUA)
  • Pavlos Fafalios (FORTH)
  • Yannis Marketakis (FORTH)
  • Yannis Kitsos (FORTH)

Discussion Summary

  • Discussion on the topics raised in tickets #627, #628
    • Regarding the gRS2 usage issue raised in #627 and the suggested solution appearing as the first bullet in #628, FORTH's concern was whether adopting the solution would entail additional delays. In particular, FORTH asked whether there would be any delay in determining where the results end when reading them gradually through ResultsetConsumer. NKUA explained that as soon as a Data Source finishes producing results it closes its writer, and that the same holds for all search operators. In this way, the information that production has finished is propagated in a pipelined manner, and the consumer can immediately determine whether additional results are available. The ResultsetConsumer utility exposes this gRS2 functionality, so with the proper calls to its API FORTH will be able to follow the suggestion with no additional delays. FORTH acknowledged the reasons behind NKUA's suggestion and agreed to proceed, since no problems are expected.
    • FORTH expressed the need for a clearer picture of the performance of gRS2 in transferring records. In particular, judging by the time needed to retrieve result pages in the Result portlet, which is in the order of some seconds, FORTH wondered whether transferring e.g. 1000 results would take too long and negate the usefulness of XSearch. NKUA answered that result transfer times are not of that order of magnitude and will not pose problems. Some quick estimates of gRS2 performance using the forward reader were provided, and times were found to be in the order of milliseconds. FORTH would like some measurements. Since the delays incurred on the client side by the presentation layer are irrelevant, NKUA will provide measurements of the time needed to consume results through ResultsetConsumer and of the timing of random seeks using ResultsetConsumer. In the latter case, the timings for the number of records that fit in one result page will determine the delay incurred by using ResultsetConsumer to retrieve results instead of keeping all records in memory.
    • Regarding the second and third suggestions in ticket #628, FORTH stated that XSearch should be able to present results in real time. If the user were made to wait each time a cluster or entity is selected, XSearch would not be able to claim that it provides real-time results. For this reason, the suggestions will be evaluated further as soon as the measurements from NKUA are available.
  • Discussion on other issues observed during the usage of the application by NKUA
    • NKUA reported that the presented entities include stop words such as "and", "at", etc. FORTH stated that since XSearch performs lexical analysis, the quality of the entities depends on the quality of the snippets provided by the underlying search engine. FORTH added that in other webapps, where the underlying system is different, the quality of the entities is better. NKUA asked whether FORTH could post-process the entities by removing those contained in a list of stop words. FORTH replied that this is already done for clusters and that it will look into doing the same for entities.
    • NKUA reported that a different number of result pages and different clusters/entities were observed after submitting the same query multiple times, even though the query returned the same number of results in the same order. FORTH informed NKUA that this is most likely caused by an issue in ResultsetConsumer. In particular, FORTH reported that calls to getResultsToText of ResultsetConsumer with the same arguments return a different number of results. NKUA will investigate the issue.
    • NKUA reported that the presented results change when semantic information becomes available. FORTH believes that the underlying cause is the same, so once a solution is found both problems will be solved.
  • Additional topics of discussion
    • FORTH asked whether search results are ranked in some way. NKUA answered that unified ranking is currently not provided. In particular, the results belonging to a collection are ranked, but their union is not. Currently the results are returned by collection, i.e. all results from one collection followed by all results from another collection, etc. The unification of ranked results could be performed on the client side, or search could even be instructed to return the results in FIFO order, with the resulting order treated as a ranking estimate. However, the problem would be better addressed by the Search System itself. NKUA will work towards providing ranked search results.
    • FORTH requested javadocs. NKUA informed FORTH that javadocs are available in the service archives of all software artifacts.
  • Agreement on a concrete plan regarding the functionality that will be included in the release predating the project review
    • FORTH stated that the XSearch portlet and service are not of high priority for the project review. Since all features to be demonstrated are available in other components, specifically webapps running over other search engines such as FIGIS, FORTH aims to provide only basic functionality concerning XSearch over gCube. This means that effort will be focused on resolving issues and making the current version of the XSearch portlet work correctly with the current functionality, and that major changes will be avoided until the review. As of now, FORTH is not certain whether the XSearch portlet will be demonstrated in the review. For these reasons, no commitment can be made to the implementation of mining over result content (Resource Registry integration) or to the parameterization of XSearch for different VREs/scopes.
  • Status of FLOD exploitation, plan until review
    • FLOD is already exploited by XSearch. The functionality is offered by a webapp operating over FIGIS+FLOD, where entities are linked with FLOD descriptions. This webapp also enables navigation into FLOD.
    • This webapp is external to the iMarine infrastructure.
    • A ticket has been opened (#474) in order to register FLOD in the infrastructure. The activity is blocked by this ticket.
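
The end-of-results signalling described for gRS2 (each producer closes its writer as soon as it finishes, so the consumer learns immediately that no more records will arrive) can be illustrated with a small generic sketch. This is not the gRS2 or ResultsetConsumer API: the Writer class, the sentinel, and the function names below are all hypothetical, and a plain in-process queue stands in for the result set.

```python
import queue
import threading

# Hypothetical sentinel standing in for "the writer has been closed".
_CLOSED = object()


class Writer:
    """Hypothetical writer: puts records on a queue and signals closure."""

    def __init__(self, q):
        self._q = q

    def put(self, record):
        self._q.put(record)

    def close(self):
        # Closing is itself an item in the stream, so the consumer sees it
        # right after the last record, without any timeout or polling.
        self._q.put(_CLOSED)


def produce(writer, records):
    """Producer: emit records, then always close the writer."""
    try:
        for record in records:
            writer.put(record)
    finally:
        writer.close()


def read_all(q):
    """Consumer: yield records gradually until the close signal arrives."""
    while True:
        item = q.get()
        if item is _CLOSED:
            return
        yield item


def run_demo():
    q = queue.Queue()
    producer = threading.Thread(target=produce, args=(Writer(q), ["r1", "r2", "r3"]))
    producer.start()
    consumed = list(read_all(q))
    producer.join()
    return consumed


if __name__ == "__main__":
    print(run_demo())
```

Because the close signal travels through the same channel as the records, the consumer needs no extra round trip to find out where the results end.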
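
The stop-word post-processing of entities requested by NKUA could look roughly like the following minimal sketch. The stop-word list and the function name are assumptions for illustration only; the minutes do not say which list XSearch uses for clusters or how entities are represented.

```python
# Hypothetical stop-word list; the actual list used by XSearch is not given.
STOP_WORDS = frozenset({"and", "at", "the", "of", "in"})


def filter_entities(entities, stop_words=STOP_WORDS):
    """Return the entities whose normalized text is not a stop word.

    Normalization here is just whitespace stripping and lowercasing.
    """
    return [e for e in entities if e.strip().lower() not in stop_words]


if __name__ == "__main__":
    print(filter_entities(["Tuna", "and", "At", "FAO", " the "]))
```

Matching on the normalized form keeps the original spelling of the surviving entities while still catching variants such as "At" or " the ".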
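
The client-side unification of per-collection rankings mentioned under the ranking topic can be sketched as a k-way merge. The sketch assumes that each collection's results are already sorted by descending score and that scores are comparable across collections, which the minutes do not guarantee; the Hit record and its fields are hypothetical.

```python
import heapq
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical result record."""
    collection: str
    doc_id: str
    score: float  # assumed comparable across collections


def merge_ranked(per_collection):
    """Merge per-collection result lists into one list sorted by score.

    Each input list must already be sorted by descending score.
    """
    return list(heapq.merge(*per_collection, key=lambda hit: hit.score, reverse=True))


if __name__ == "__main__":
    a = [Hit("A", "a1", 0.9), Hit("A", "a2", 0.4)]
    b = [Hit("B", "b1", 0.7), Hit("B", "b2", 0.1)]
    for hit in merge_ranked([a, b]):
        print(hit.collection, hit.doc_id, hit.score)
```

If scores from different collections are not comparable, a merge like this gives a misleading order, which is one reason the problem is better addressed inside the Search System.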

Actions

  • FORTH is going to follow the first suggestion of ticket #628 regarding the usage of gRS2
  • NKUA will provide ResultsetConsumer timings for record transfer and random seek behavior
  • NKUA will investigate the reported issue with ResultsetConsumer
  • NKUA will implement unified result ranking
  • The next conference call will be held next week to discuss the current status and XSearch use cases.