40 years of research on eating disorders in domain-specific journals: Bibliometrics, network analysis, and topic modeling
Carlos A. Almenara
Original Upload Date: Nov 7, 2020
Abstract
Previous studies have used a query-based approach to search for and gather scientific literature. Instead, the current study focused on domain-specific journals in the field of eating disorders. A total of 8651 documents published from 1981 to 2020, of which 7899 had an abstract, were retrieved from five journals: International Journal of Eating Disorders (n = 4185, 48.38%), Eating and Weight Disorders (n = 1540, 17.80%), European Eating Disorders Review (n = 1461, 16.89%), Eating Disorders (n = 1072, 12.39%), and Journal of Eating Disorders (n = 393, 4.54%). To analyze these data, three methodologies were employed: bibliometrics (to identify the most cited documents), network analysis (to identify the most representative scholars and their collaboration networks), and topic modeling (to retrieve major topics using text mining, natural language processing, and machine learning algorithms). The results showed that the most cited documents were related to instruments used for the screening and evaluation of eating disorders, followed by review articles on the epidemiology, course, and outcome of eating disorders. Network analysis identified well-known scholars in the field, as well as their collaboration networks. Finally, topic modeling identified 10 major topics, and a time-series analysis of these topics revealed relevant historical shifts. This study discusses the results in terms of future opportunities in the field of eating disorders.
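The collaboration-network analysis described above can be illustrated with a minimal sketch. It assumes that each document has already been reduced to a list of author names. The toy author lists below are hypothetical and are not data from the study. The sketch builds a co-authorship graph in plain Python: two scholars are linked once for each paper they co-authored. It then ranks scholars by degree centrality, meaning the share of all other scholars each one has published with. This text does not specify the study's actual network tooling or centrality measures, so treat this as one plausible approach rather than the author's method.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthor_graph(papers):
    """Return {author: {coauthor: number_of_joint_papers}} from per-paper author lists."""
    graph = defaultdict(lambda: defaultdict(int))
    for authors in papers:
        unique = sorted(set(authors))  # ignore a name repeated within one paper
        for author in unique:
            # Ensure solo-authored papers still add the author as an (isolated) node.
            graph.setdefault(author, defaultdict(int))
        for a, b in combinations(unique, 2):
            graph[a][b] += 1  # edge weight = number of joint papers
            graph[b][a] += 1
    return graph


def degree_centrality(graph):
    """Distinct coauthors divided by (n - 1); edge weights are ignored here."""
    n = len(graph)
    if n < 2:
        return {author: 0.0 for author in graph}
    return {author: len(neighbours) / (n - 1) for author, neighbours in graph.items()}


# Hypothetical author lists, one per document (not data from the study).
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author A", "Author D"],
    ["Author E"],
]

graph = build_coauthor_graph(papers)
centrality = degree_centrality(graph)
for author, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{author}: {score:.2f}")
```

With these toy lists, Author A has worked with three of the four other scholars, so this author ranks first with a centrality of 0.75. The edge weights (for example, 2 for Author A and Author B) are kept in the graph and could be used instead for a weighted measure such as node strength.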
Keywords: eating disorders, big data, data mining, bibliometrics, social network analysis, machine learning, natural language processing, topic modeling
Paper citation (preprint)
Almenara, C. A. (2022, preprint). 40 years of research on eating disorders in domain-specific journals: Bibliometrics, network analysis, and topic modeling. https://doi.org/10.31234/osf.io/hxwez
Python Code (Main Sources)
Eren, M. E. (2021). COVID-19 Literature Clustering. https://www.kaggle.com/code/maksimeren/covid-19-literature-clustering/notebook
Prabhakaran, S. (2018). Topic Modeling with Gensim (Python). https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/
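The topic-modeling and time-series steps can be sketched without third-party packages. The Gensim tutorial cited above fits Latent Dirichlet Allocation (LDA) with Gensim's `LdaModel`, which uses online variational Bayes. The sketch below swaps in collapsed Gibbs sampling instead. This is a different but standard inference method for the same model, and it lets the code run on the standard library alone. The tokenized abstracts, publication years, number of topics `k`, priors `alpha` and `beta`, and iteration count are all hypothetical illustrations, not the study's settings (the study reports 10 topics). After sampling, the sketch assigns each document its dominant topic and tabulates, per year, the share of documents in each topic. This is one simple way to build a topic time series.

```python
import random
from collections import Counter, defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    doc_topic = [[0] * k for _ in docs]       # tokens in document d assigned to topic t
    topic_word = [[0] * v for _ in range(k)]  # times word w is assigned to topic t
    topic_total = [0] * k                     # tokens assigned to topic t overall
    assignments = []
    for d, doc in enumerate(docs):            # random initial topic for every token
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_word[t][word_id[w]] += 1
            topic_total[t] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wid = assignments[d][i], word_id[w]
                # Remove the current token from all counts before resampling it.
                doc_topic[d][t] -= 1
                topic_word[t][wid] -= 1
                topic_total[t] -= 1
                # Full conditional p(z = j | other assignments), up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][wid] + beta) / (topic_total[j] + v * beta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_word[t][wid] += 1
                topic_total[t] += 1
    # Smoothed document-topic proportions (each row sums to 1).
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
        for d, doc in enumerate(docs)
    ]


def topic_shares_by_year(theta, years):
    """For each year, the share of that year's documents whose dominant topic is t."""
    counts = defaultdict(Counter)
    for proportions, year in zip(theta, years):
        dominant = max(range(len(proportions)), key=proportions.__getitem__)
        counts[year][dominant] += 1
    return {
        year: {t: n / sum(c.values()) for t, n in sorted(c.items())}
        for year, c in sorted(counts.items())
    }


# Hypothetical pre-processed abstracts (stop words removed) and publication years.
docs = [
    ["anorexia", "weight", "restriction", "adolescent"],
    ["bulimia", "binge", "purging", "episode"],
    ["anorexia", "adolescent", "weight", "family"],
    ["binge", "eating", "episode", "bulimia"],
]
years = [1990, 1990, 2015, 2015]

theta = lda_gibbs(docs, k=2)
print(topic_shares_by_year(theta, years))
```

Gibbs sampling was chosen here only to keep the example dependency-free and short. On a corpus the size of the one described in the abstract, a library implementation such as Gensim's would be the practical choice. Topic labels are arbitrary integers that still need to be interpreted from their top words.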
References (NLP and Topic Modeling)
Brank, J., Mladenić, D., Grobelnik, M., & Milić-Frayling, N. (2008). Feature selection for the classification of large document collections. Journal of Universal Computer Science, 14(10), 1562–1596. https://doi.org/10.3217/jucs-014-10-1562
Geigle, C., Mei, Q., & Zhai, C. (2018). Feature engineering for text data. In G. Dong & H. Liu (Eds.), Feature engineering for machine learning and data analytics (pp. 15–54). CRC Press.
Kalyan, K. S., & Sangeetha, S. (2020). SECNLP: A survey of embeddings in clinical natural language processing. Journal of Biomedical Informatics, 101, 103323. https://doi.org/10.1016/j.jbi.2019.103323
Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing: An introduction. Journal of the American Medical Informatics Association, 18(5), 544–551. https://doi.org/10.1136/amiajnl-2011-000464
Neumann, M., King, D., Beltagy, I., & Ammar, W. (2019). ScispaCy: Fast and robust models for biomedical natural language processing. In D. Demner-Fushman, K. B. Cohen, S. Ananiadou, & J. Tsujii (Eds.), Proceedings of the 18th BioNLP Workshop and Shared Task (pp. 319–327). Association for Computational Linguistics. https://doi.org/10.18653/v1/W19-5034
Nothman, J., Qin, H., & Yurchak, R. (2018). Stop word lists in free open-source software packages. Proceedings of Workshop for NLP Open Source Software (NLP-OSS), 7–12. https://doi.org/10.18653/v1/W18-2502
Řehůřek, R., & Sojka, P. (2010). Software framework for topic modelling with large corpora. Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, 45–50.
Wang, Y., Liu, S., Afzal, N., Rastegar-Mojarad, M., Wang, L., Shen, F., Kingsbury, P., & Liu, H. (2018). A comparison of word embeddings for the biomedical natural language processing. Journal of Biomedical Informatics, 87, 12–20. https://doi.org/10.1016/j.jbi.2018.09.008