N-gram-like Language Models Predict Reading Time Best
James A. Michaelov, Roger P. Levy
Abstract
Recent work has found that as contemporary language models such as transformers become better at next-word prediction, the probabilities they calculate can become worse predictors of human reading time. In this paper, we propose that this can be explained by reading time being sensitive to simple n-gram statistics rather than to the more complex statistics learned by state-of-the-art transformer language models. We demonstrate that the neural language models whose predictions correlate most strongly with n-gram probabilities are also those whose probabilities correlate most strongly with eye-tracking-based measures of reading time on naturalistic text.
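The abstract describes a correlational analysis relating per-word probability estimates from different language models to each other and to reading times. The sketch below is not the paper's actual pipeline; it is a minimal illustration, with invented placeholder values, of the kind of comparison described: Spearman correlations between n-gram surprisal, neural-model surprisal, and eye-tracking reading times.

# Minimal sketch of the correlational comparison described in the abstract.
# All values are hypothetical placeholders, not data from the paper.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-word surprisal (bits) from an n-gram model and a neural
# language model, plus per-word gaze durations (ms) from eye tracking.
ngram_surprisal = np.array([7.2, 3.1, 9.8, 4.4, 6.0, 12.3])
neural_surprisal = np.array([6.5, 2.8, 10.1, 3.9, 5.2, 11.7])
gaze_duration_ms = np.array([250, 190, 310, 205, 230, 340])

# How closely does the neural model's surprisal track n-gram statistics?
rho_models, _ = spearmanr(ngram_surprisal, neural_surprisal)

# How well does each model's surprisal predict reading time?
rho_ngram_rt, _ = spearmanr(ngram_surprisal, gaze_duration_ms)
rho_neural_rt, _ = spearmanr(neural_surprisal, gaze_duration_ms)

print(f"neural vs. n-gram surprisal: rho = {rho_models:.2f}")
print(f"n-gram surprisal vs. reading time: rho = {rho_ngram_rt:.2f}")
print(f"neural surprisal vs. reading time: rho = {rho_neural_rt:.2f}")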