Authors:
Sood, Ekta, Ekta.sood@colorado.edu, University of Colorado Boulder
Dhar, Prajit, Prajitdhar@gmail.com
Troiano, Enrica, enrica.troiano@gmail.com, Vrije Universiteit Amsterdam
Southwell, Rosy, rosy.southwell@colorado.edu, University of Colorado Boulder
D’Mello, Sidney, Sidney.dmello@colorado.edu
Keywords: Eye tracking, representation learning, NLP, reading, attention modeling, scanpath prediction
Abstract:
Accurately predicting human scanpaths during reading is vital for diverse fields and downstream tasks, from educational technologies to automatic question answering. To date, however, progress in this direction has been limited by the scarcity of gaze data. We address this issue with ScanEZ, a self-supervised framework grounded in cognitive models of reading. ScanEZ jointly models the spatial and temporal dimensions of scanpaths by leveraging synthetic data and a 3-D gaze objective inspired by masked language modeling. With this framework, we provide evidence that two factors are key to scanpath prediction during reading: (1) masked modeling of both the spatial and temporal patterns of eye movements, and (2) cognitive model simulations used as an inductive bias to kick-start training. Our approach achieves state-of-the-art results on established datasets (e.g., up to a 31.4% improvement in negative log-likelihood on CELER L1) and proves portable across different experimental conditions.
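
To make the idea of a masked gaze objective concrete, the following is a minimal sketch and not the ScanEZ implementation. It assumes that each fixation is a triple (x, y, duration), that whole fixations are masked at random, and that a model is scored only on the masked positions. The function names `mask_scanpath` and `masked_reconstruction_loss`, the masking ratio, the zero mask value, and the mean squared error (used here as a simple stand-in for the negative log-likelihood reported above) are all illustrative assumptions.

```python
import numpy as np


def mask_scanpath(scanpath, mask_ratio=0.15, mask_value=0.0, seed=0):
    """Hide a random subset of fixations in a scanpath.

    scanpath: array of shape (n_fixations, 3), assumed here to hold
              (x, y, duration) per fixation (an illustrative assumption).
    Returns the masked copy and a boolean vector marking masked fixations.
    """
    rng = np.random.default_rng(seed)
    scanpath = np.asarray(scanpath, dtype=float)
    n = scanpath.shape[0]
    # Always mask at least one fixation so the objective is defined.
    n_masked = max(1, int(round(mask_ratio * n)))
    idx = rng.choice(n, size=n_masked, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    masked = scanpath.copy()
    # Replace all three dimensions of each chosen fixation at once.
    masked[mask] = mask_value
    return masked, mask


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error computed over masked fixations only.

    MSE is a hypothetical stand-in; the abstract reports negative
    log-likelihood, whose exact form is not given there.
    """
    diff = np.asarray(pred)[mask] - np.asarray(target)[mask]
    return float(np.mean(diff ** 2))


# Toy scanpath: eight fixations with (x, y, duration in ms).
scanpath = np.array([
    [10.0, 5.0, 200.0], [20.0, 5.0, 180.0], [35.0, 5.0, 250.0],
    [30.0, 5.0, 120.0], [50.0, 5.0, 210.0], [65.0, 5.0, 190.0],
    [80.0, 5.0, 230.0], [12.0, 25.0, 240.0],
])
masked, mask = mask_scanpath(scanpath, mask_ratio=0.25, seed=1)
# A real model would predict the hidden fixations from `masked`;
# here the masked input itself serves as a trivial baseline prediction.
baseline_loss = masked_reconstruction_loss(masked, scanpath, mask)
```

Under the same assumptions, pre-training on simulated scanpaths from a cognitive model and fine-tuning on human gaze data would reuse this objective unchanged, with only the source of `scanpath` differing.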