28.08.2013 SmartFish Annotation

From D4Science Wiki
Latest revision as of 13:00, 2 September 2013

Skype call, 28 August 2013


Agenda: technical discussion on the indexing and annotation services to be re-integrated in the iMarine infrastructure for the SmartFish Fisheries Regional Information System (F-RIS)

On the call: Yann Laurent, Anton Ellenbroek, Claudio Baldassarre, Alexandros Antoniadis, Lino Pasquale

Yann introduced the meeting by recalling its main objective: to discuss the technical matters needed to clarify SmartFish's requirements for the iMarine indexing and annotation services, and to assess the time and resources needed to re-integrate these services in the infrastructure. The services are currently available, but they need better support for broader use in the future, as well as improvements to the accuracy of results, especially by adding weights to keywords when a document is indexed. Claudio gave Alex some details on the search mechanism, with particular emphasis on the use of RDF and on the existing annotation schema and knowledge base. He also highlighted that the main challenge is identifying the main concepts in a scanned document and giving this context to the annotations. It was agreed that the participants now share a better understanding of the needs.
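The keyword weighting mentioned above can be illustrated with a field-boosted query. This is a minimal sketch, assuming the documents are indexed in Elasticsearch with one field per document section; the field names `title`, `abstract` and `body` and their weights are hypothetical placeholders, since the real sections would be chosen with FAO experts. Note that it weights sections at query time (the `field^boost` syntax of a `multi_match` query) rather than at indexing time, because recent Elasticsearch versions favour query-time boosts.

```python
import json

# Hypothetical per-section weights: placeholders, not the actual SmartFish schema.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def build_weighted_query(text, weights=SECTION_WEIGHTS, size=10):
    """Return an Elasticsearch query body that boosts matches by document section."""
    # "field^boost" is the Elasticsearch syntax for a per-field boost in multi_match;
    # the :g format turns 3.0 into "3".
    fields = [f"{name}^{boost:g}" for name, boost in weights.items()]
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "type": "best_fields",
            }
        },
    }


if __name__ == "__main__":
    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(build_weighted_query("tilapia aquaculture"), indent=2))
```

With these weights, a keyword found in the title contributes three times as much to the score as the same keyword found in the body.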

Claudio also recalled the procedure he had described in his e-mail to achieve the objective:
1- move the current Lucene index to the infrastructure (I assume it can be ingested by Elasticsearch);
2- replace the address of the search invocation in the portal front-end with the infrastructure search service (the conversion of the infrastructure search response into the JSON format accepted by the front-end is encapsulated in the portal back-end). Note: this operation is transparent to the front-end, and the portal remains fully functional;
3- test the behaviour of the search through the front-end (user-based) to ensure it reproduces the current behaviour, and thus set the baseline for improvement;
4- create an improved version of the index using the Elasticsearch framework. The improvement consists in creating a custom index schema based on the structure of the documents in the collection. For this phase the NKUA engineer will be supported by FAO experts, who will point out which sections of the document structure deserve special attention. This step is a prerequisite for weighted search;
5- replace the index and assess precision and recall against the baseline set in item 3.
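Item 2 keeps the front-end unchanged by converting the infrastructure search response inside the portal back-end. This is a minimal sketch, assuming the infrastructure search service returns an Elasticsearch-style response (`hits.total`, `hits.hits[]` with `_id`, `_score`, `_source`); the output shape `{"total", "results"}` and the `title` field are hypothetical, since the JSON format actually accepted by the front-end is not described here.

```python
import json


def to_frontend_json(es_response):
    """Convert an Elasticsearch-style search response into the (hypothetical)
    JSON shape expected by the portal front-end."""
    hits = es_response.get("hits", {})
    total = hits.get("total", 0)
    # Elasticsearch 7+ reports total as {"value": n, "relation": "eq"};
    # older versions report a plain integer.
    if isinstance(total, dict):
        total = total.get("value", 0)
    results = []
    for hit in hits.get("hits", []):
        source = hit.get("_source", {})
        results.append({
            "id": hit.get("_id"),
            "score": hit.get("_score"),
            "title": source.get("title", ""),  # hypothetical field name
        })
    return json.dumps({"total": total, "results": results})


if __name__ == "__main__":
    # Invented response, for illustration only.
    sample = {"hits": {"total": {"value": 1, "relation": "eq"},
                       "hits": [{"_id": "doc-1", "_score": 2.5,
                                 "_source": {"title": "Tilapia farming"}}]}}
    print(to_frontend_json(sample))
```

Keeping this mapping in the back-end means that swapping the index in item 5 only changes the input side of the function, not the front-end.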
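Items 3 and 5 compare the new index against the baseline by precision and recall. This is a minimal sketch, assuming a set of test queries with relevance judgments (for example provided by FAO experts) and the document IDs returned by each index; the query and document IDs below are invented for illustration. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved.

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def compare_to_baseline(baseline, improved, judgments):
    """Per-query (precision, recall) for the baseline index and the improved index.

    baseline, improved: query -> list of retrieved document IDs
    judgments: query -> list of document IDs judged relevant
    """
    report = {}
    for query, relevant in judgments.items():
        report[query] = {
            "baseline": precision_recall(baseline.get(query, []), relevant),
            "improved": precision_recall(improved.get(query, []), relevant),
        }
    return report


if __name__ == "__main__":
    # Invented example data, for illustration only.
    judgments = {"tilapia": ["d1", "d2", "d5", "d6"]}
    baseline = {"tilapia": ["d1", "d2", "d3", "d4"]}
    improved = {"tilapia": ["d1", "d2", "d5"]}
    for query, scores in compare_to_baseline(baseline, improved, judgments).items():
        print(query, scores)
```

In this invented example the baseline scores precision 0.5 and recall 0.5, while the improved index scores precision 1.0 and recall 0.75.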

In conclusion, Yann asked both Alex and Claudio to rework these 5 items, clarifying each one and adding a time frame. Once this is done, another meeting will be called to validate with Lino the time to be spent on this task.

To be done: Claudio and Alex to prepare an evaluation of the time needed to bring the indexing and annotation processes back into the iMarine e-infrastructure for the SmartFish F-RIS.