A quantitative window on the history of statistics: topic-modelling 120 years of Biometrika


As one of the oldest continuously publishing journals in statistics (published since 1901), Biometrika provides a unique window onto the history of statistics and its epistemic development throughout the 20th and early 21st centuries. While the early history of the discipline, including the work of key figures such as Karl Pearson, Francis Galton, and Ronald Fisher, is relatively well known, the later (and longer) episodes of its intellectual development remain understudied. By applying digital tools to the full-text corpus of the journal's articles (N = 5,596), this study provides a novel quantitative exploration of the history of the statistical sciences through an all-encompassing view of 120 years of Biometrika. To this end, topic-modelling analyses are used to gain insight into the epistemic content of the journal and its evolution. Striking changes in the journal's thematic content are documented and quantified for the first time. These range from the decline of Pearsonian and Weldonian biometrical research and of the journal's tight connection to biology in the 1930s, to the rise of modern statistical methods beginning in the 1960s and 1970s. Newly developed approaches are used to infer author networks from publication topics. The resulting author network reveals several communities that align well with topic clusters and their evolution over time. It also highlights the role of specific figures across more than a century of publishing history and offers a first window onto the foundation, development, and diverse applications of the statistical sciences.


Bertoldi, N., Lareau, F., Pence, C. H., and Malaterre, C. (2023). A quantitative window on the history of statistics: topic-modelling 120 years of Biometrika. Digital Scholarship in the Humanities, 39(1), 13-29.

