Comprehension Workshop

Using Eye Tracking Data for Lexical Simplification

Authors:
Hodivoianu, Anamaria, anamaria.hodivoianu@s.unibuc.ro, University of Bucharest
Kuvshynova, Oleksandra, oleksandra.kuvshynova@s.unibuc.ro, University of Bucharest
Marin, Mircea, mircea.marin@s.unibuc.ro, University of Bucharest
Nisioi, Sergiu, sergiu.nisioi@unibuc.ro, University of Bucharest

Keywords: lexical complexity prediction, lexical simplification

Abstract:

Lexical complexity prediction (LCP) and lexical simplification (LS) are two related tasks that aim to transform a text from its original form into a more accessible, easier-to-understand version. Current work on LCP relies on word-level annotation, in which annotators rate complexity on a scale from 1 (simplest) to 5 (most complex). LS methods then generate substitution candidates for the complex areas identified by LCP, guided by a simplicity score.

In this work, we explore the relationship between eye-tracking reading times and explicit lexical complexity assessments. Due to the lack of direct eye-tracking data in lexical complexity datasets (and, conversely, the absence of complexity annotations in eye-tracking datasets), we build a set of annotations on the Romanian version of the MultiplEye dataset. Using pre-trained predictors as estimators of actual reading times and lexical complexity, we explore several relationships between these two modalities of collecting reading-related data.

Preliminary results show that the relationship between eye-tracking metrics and LCP is not symmetric: training on LCP data yields better predictors of reading durations than the reverse. Words perceived as highly complex take more time to read, but words that take longer to read are not necessarily labeled as complex.
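
The asymmetry described above can be illustrated with a minimal sketch that compares two conditional rates: how often complex-rated words are read slowly, and how often slowly read words are rated complex. Everything in it is an assumption made for illustration: the toy (rating, reading time) pairs are invented and are not drawn from MultiplEye or from our results, and the thresholds and the helper name `conditional_rate` are hypothetical.

```python
from typing import Callable, List, Tuple

Word = Tuple[int, int]  # (complexity rating 1-5, reading time in ms)


def conditional_rate(words: List[Word],
                     given: Callable[[Word], bool],
                     outcome: Callable[[Word], bool]) -> float:
    """Share of words satisfying `outcome` among those satisfying `given`."""
    selected = [w for w in words if given(w)]
    if not selected:
        return float("nan")
    return sum(1 for w in selected if outcome(w)) / len(selected)


# Invented toy data, for illustration only (not real measurements).
toy_words: List[Word] = [
    (5, 420), (4, 390), (5, 450),             # rated complex, read slowly
    (2, 400), (1, 380), (2, 410),             # rated simple, yet read slowly
    (1, 180), (2, 200), (1, 190), (2, 210),   # rated simple, read quickly
]

COMPLEX_MIN = 4    # hypothetical cut-off: rating >= 4 counts as "complex"
SLOW_MIN_MS = 350  # hypothetical cut-off: >= 350 ms counts as "slow"


def is_complex(word: Word) -> bool:
    return word[0] >= COMPLEX_MIN


def is_slow(word: Word) -> bool:
    return word[1] >= SLOW_MIN_MS


# P(slow | complex) versus P(complex | slow) on the toy data.
slow_given_complex = conditional_rate(toy_words, is_complex, is_slow)
complex_given_slow = conditional_rate(toy_words, is_slow, is_complex)

print(f"P(slow | complex) = {slow_given_complex:.2f}")
print(f"P(complex | slow) = {complex_given_slow:.2f}")
```

On this toy data the first rate is higher than the second, which is the shape of asymmetry the preliminary results point to. The sketch does not reproduce the predictor-based analysis itself.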

This research has been supported by InstRead: Research Instruments for Text Complexity, Simplification and Readability Assessment, CNCS-UEFISCDI project number PN-IV-P2-2.1-TE-2023-2007.