lda_topics_metaheuristics

NLP pipeline, topic classification, and multicore hyperparameter tuning algorithms developed for the research paper "How COVID-19 Impacted Data Science: a Topic Retrieval and Analysis from GitHub Projects' Descriptions", presented at the Brazilian Symposium on Databases (SBBD) 2021.

This work compares topics of interest in Data Science projects and how they evolved over the COVID-19 pandemic by analyzing Jupyter Notebook and Python GitHub projects from the year before the pandemic and from the pandemic period itself. We employ several state-of-the-art algorithms to find topics from the repositories' descriptions, and compare their performance at tuning the hyperparameters of the topic classification model for better accuracy.

The research dataset is also available on Zenodo: Greed: Github repositories and descriptions

Libraries and Algorithms

pylang
spaCy
gensim
scikit-learn
pandas
seaborn
numpy
Term Frequency and Inverse Document Frequency (TF-IDF)
Latent Dirichlet Allocation (LDA)
Differential Evolution (DE) and its self-adaptive version (SADE)
Genetic Algorithm (GA)
Particle Swarm Optimization (PSO) and its Generational version (GPSO)
Simulated Annealing (SA)
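
To illustrate how a metaheuristic from the list above can tune topic-model hyperparameters, here is a minimal pure-Python sketch of Differential Evolution (rand/1/bin). It is not the repository's implementation: the `toy_objective` function, the parameter names (`num_topics`, `alpha`, `eta`), and the search bounds are all assumptions chosen for illustration. In a real run, the objective would train an LDA model and return a score to minimize, such as negative coherence.

```python
import random


def toy_objective(params):
    """Hypothetical stand-in for a real score (e.g. negative topic coherence).

    A real objective would round num_topics to an integer and train a model.
    """
    num_topics, alpha, eta = params
    return (num_topics - 20) ** 2 / 100 + (alpha - 0.1) ** 2 + (eta - 0.01) ** 2


def differential_evolution(objective, bounds, pop_size=20, generations=100,
                           f=0.8, cr=0.9, seed=0):
    """Minimize `objective` inside box `bounds` with DE rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Assumed search space: (num_topics, alpha, eta).
BOUNDS = [(2, 50), (0.001, 1.0), (0.001, 1.0)]
best_params, best_score = differential_evolution(toy_objective, BOUNDS)
print(best_params, best_score)
```

Each candidate evaluation is independent of the others, which is why this family of algorithms lends itself to multicore execution when the objective is expensive.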

Implementation

The Topic Aggregation_5stars.ipynb file is a Jupyter Notebook that walks through the steps for aggregating topics into domains.
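
As a rough idea of what aggregating topics into domains can look like, the sketch below counts documents per domain from each document's dominant topic. The notebook's actual procedure is not described here, so everything in this example is an assumption: the `TOPIC_TO_DOMAIN` mapping, the domain labels, and the sample topic ids are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from LDA topic ids to hand-labelled domains.
TOPIC_TO_DOMAIN = {
    0: "computer vision",
    1: "computer vision",
    2: "NLP",
    3: "health",
}


def aggregate_by_domain(dominant_topics, topic_to_domain):
    """Count documents per domain, given each document's dominant topic id."""
    counts = Counter()
    for topic_id in dominant_topics:
        # Topics without a label are grouped under "unlabelled".
        counts[topic_to_domain.get(topic_id, "unlabelled")] += 1
    return counts


# Made-up dominant topic ids for eight repository descriptions.
doc_topics = [0, 1, 1, 2, 3, 3, 3, 7]
domain_counts = aggregate_by_domain(doc_topics, TOPIC_TO_DOMAIN)
print(domain_counts)
```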

More details are available in the research paper.


amandacrtv/lda_topics_metaheuristics
