NEWSLETTER
Quarterly | No. 04 - 2019
Advanced Training

PhD in Informatics
Promoting Understandability in Consumer Health Information Search
Hua Yang

Supervisor: Teresa Cristina de Freitas Gonçalves

Nowadays, in the area of Consumer Health Information Retrieval, techniques and methodologies are still far from being effective in answering complex health queries. One main challenge comes from the varying and limited medical knowledge background of consumers; the existing language gap between non-expert consumers and the complex medical resources confuses them. So, returning not only topically relevant but also understandable health information to the user is a significant and practical challenge in this area. In this work, the main research goal is to study ways to promote understandability in Consumer Health Information Retrieval. To help reach this goal, two research questions are posed: (i) how to bridge the existing language gap; (ii) how to return more understandable documents. Two modules are designed, each answering one research question. In the first module, a Medical Concept Model is proposed for use in health query processing; this model integrates Natural Language Processing techniques into state-of-the-art Information Retrieval. Moreover, aiming to integrate syntactic and semantic information, word embedding models are explored as query expansion resources. The second module is designed to learn understandability from past data; a two-stage learning to rank model is proposed with rank aggregation methods applied on single field-based ranking models. These proposed modules are assessed on FIRE’2016 CHIS track data and CLEF’2016-2018 eHealth IR data collections.
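As an illustration of rank aggregation over single field-based rankings, the sketch below fuses several ranked lists with a Borda count. The abstract does not say which aggregation methods the thesis uses, so Borda count is an assumption chosen for simplicity; the function name `borda_aggregate`, the field names, and the document ids are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Fuse ranked lists of document ids with a Borda count (illustrative only)."""
    # The pool is every document that appears in at least one ranking.
    pool = {doc for ranking in rankings for doc in ranking}
    n = len(pool)
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc in enumerate(ranking):
            # The top document earns n points, the next n - 1, and so on.
            # Documents missing from this ranking earn nothing from it.
            scores[doc] += n - position
    # Sort by descending score; break ties by document id for determinism.
    return sorted(pool, key=lambda doc: (-scores[doc], doc))


# Hypothetical rankings produced by models that each look at one field.
title_ranking = ["d1", "d2", "d3"]
body_ranking = ["d2", "d1", "d4"]
heading_ranking = ["d2", "d3", "d1"]

fused = borda_aggregate([title_ranking, body_ranking, heading_ranking])
print(fused)
```

In a two-stage setup, a fused list of this kind could serve as the candidate set or as a feature for a second-stage learning to rank model; how the thesis actually combines the stages is not described here.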
Extensive experimental comparisons with state-of-the-art baselines on the considered data collections confirmed the effectiveness of the proposed approaches: regarding understandability relevance, the improvement is 11.5%, 9.3% and 16.3% in the RBP, uRBP and uRBPgr evaluation metrics, respectively; regarding topical relevance, the improvement is 7.8%, 16.4% and 7.6% in the P@10, NDCG@10 and MAP evaluation metrics, respectively.

Keywords: Information Retrieval, Health, Consumer, Understandability, Query expansion, Learning to Rank.
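For readers unfamiliar with the understandability metrics named in the abstract, the following minimal sketch computes rank-biased precision (RBP) and an understandability-biased variant (uRBP) using their commonly cited definitions. The persistence value, the function names, and the toy judgments are assumptions for illustration, not the thesis's evaluation setup; passing graded understandability scores in [0, 1] instead of binary ones gives a graded variant in the spirit of uRBPgr.

```python
def rbp(relevance, persistence=0.8):
    """Rank-biased precision: (1 - p) * sum_k r_k * p^(k-1), ranks from 1."""
    return (1 - persistence) * sum(
        r * persistence ** k for k, r in enumerate(relevance)
    )


def urbp(relevance, understandability, persistence=0.8):
    """Understandability-biased RBP: each gain r_k is weighted by u_k."""
    return (1 - persistence) * sum(
        r * u * persistence ** k
        for k, (r, u) in enumerate(zip(relevance, understandability))
    )


# Toy judgments for a ranking of three documents (assumed values).
relevance = [1, 0, 1]          # topical relevance per rank
understandability = [1, 1, 0]  # 1 = understandable to a lay reader

print(rbp(relevance, persistence=0.5))
print(urbp(relevance, understandability, persistence=0.5))
```

With these toy values the third document is relevant but not understandable, so it contributes to RBP and not to uRBP, which is the behaviour the understandability-biased metrics are meant to capture.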
