Evaluating topic models: A review and guidelines
Simone Santoni and Matteo Devigili
Informed by the literature on statistical natural language processing and by the results of a survey of organisation and management applications, the paper proposes guidelines to help authors and reviewers consistently assess the validity of a topic model and, more broadly, improve the reproducibility of results. These guidelines convey the idea that there is no one-size-fits-all approach to assessing the validity of topic models. Instead, the most suitable approach is contingent upon the substantive scope of a topic modeling study and the specific goals of the empirical analysis. Four vignettes, grounded in prior studies on cultural markets, serve a twofold goal. First, they illustrate the scenarios that result from different combinations of a study’s substantive scope and empirical goals. Second, they show the guidelines in action on a dataset of song lyrics.