Comprehension Workshop

Reading Profiles Issued from Eye-tracking Data

Authors:
Ivchenko, Oksana, PhD candidate
Grabar, Natalia, CNRS Researcher
Nasir, Jamal, Assistant Professor

Keywords: French eye-tracking corpus, Reading strategies, Medical text simplification

Abstract:

Reading is a crucial skill for acquiring knowledge, processing information, and making informed choices. However, naturally written texts, particularly in specialized domains like medicine, often contain complex concepts that hinder comprehension [1, 2, 3]. While text simplification aims to improve readability, current approaches rely primarily on linguistic heuristics and lack objective methods for identifying difficult passages. Eye movements provide a direct and objective window into the cognitive processes underlying reading comprehension and therefore offer valuable insight into reading difficulty. We introduce a novel annotated eye-tracking corpus for French that spans three text types (medical, clinical, general) and comprises 14 texts, each presented in two versions: original and manually simplified. The simplification targeted lexical, syntactic, and semantic complexity. Participants read these texts while their eye movements were tracked and answered yes/no/I don’t know comprehension questions (13 in total, randomly distributed across slides). Our goal is to enhance the accessibility of medical and general texts by refining reading assessment methods and identifying individual differences in reading strategies. These insights can inform adaptive text simplification models and machine learning approaches for automated reading evaluation. We present analyses of forty participants (native French speakers with no medical education) who read the texts in either their original or simplified version, such that no participant saw both versions of the same text. Six key eye-movement indicators were derived from the data: total and average fixation duration, fixation count, duration of the initial gaze, and the minimum-to-maximum range. These features were aggregated per participant and used to compute pairwise Euclidean distances, forming the basis of a clustering analysis.
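As an illustration only, the per-participant distance computation described above, followed by one possible agglomerative clustering pass, might be sketched as follows. This is a minimal standard-library sketch, not the authors' pipeline: the participant labels and feature values are invented toy data, only three of the six indicators are shown, and both the z-score standardization and the average-linkage criterion are assumptions that the abstract does not specify.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data (NOT from the corpus): one row per participant with
# [total fixation duration (s), average fixation duration (s), fixation count].
features = {
    "P1": [40.0, 0.18, 220], "P2": [42.0, 0.19, 225],
    "P3": [65.0, 0.24, 270], "P4": [67.0, 0.25, 275],
    "P5": [95.0, 0.31, 310], "P6": [98.0, 0.32, 305],
}

def zscore_columns(rows):
    """Standardize each feature column (assumed step, so that no single
    indicator dominates the Euclidean distance because of its unit)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]

def pairwise_euclidean(rows):
    """Full symmetric matrix of Euclidean distances between participants."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]

def average_linkage(dist, k):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean inter-point distance until only k clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def gap(a, b):
        return mean(dist[i][j] for i in a for j in b)

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters))
                 for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: gap(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(b)  # b > a, so index a stays valid
        clusters[a].extend(merged)
    return clusters

names = list(features)
dist = pairwise_euclidean(zscore_columns(list(features.values())))
clusters = average_linkage(dist, k=3)

for members in clusters:
    print([names[i] for i in members])
```

In practice an analysis of this kind would more likely rely on an established library such as SciPy or scikit-learn; the hand-written version is used here only to keep the sketch self-contained.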
Using hierarchical clustering and k-means, we identified three distinct reading profiles:
Cluster A: Readers with the shortest fixation durations and the fewest fixations, suggesting a faster, possibly skimming reading strategy.
Cluster B: Readers with the longest fixation durations across all metrics, indicating a careful and deliberate reading approach.
Cluster C: Readers in between these two extremes, reflecting a balanced reading strategy.
These results highlight different patterns in reading strategies, which could be leveraged to improve adaptive text simplification and personalized readability models, particularly in medical contexts where complex language can be a barrier to comprehension. The results are also supported by an analysis of comprehension responses and their correlation with reading profiles, which provides further insight into the relationship between reading strategies and text understanding.

Bibliography:
[1] Alia Pugh, Devin Kearns, and Elfrieda Hiebert. 2023. Text types and their relation to efficacy in beginning reading interventions. Reading Research Quarterly, 58:710–732.
[2] Glenn Fulcher. 1997. Text difficulty and accessibility: Reading formulae and expert judgement. System, 25:497–513.
[3] Pierre Zweigenbaum, Pierre Jacquemart, Natalia Grabar, and Benoit Habert. 2001. Building a text corpus for representing the variety of medical language. In MEDINFO, pages 290–294.