Identifying Files and Workflows Contributing to Technical Debt in GitHub Repositories Using Data Mining and Natural Language Processing Techniques

Features

  • Data Fetching from GitHub Actions Workflows
    Efficiently fetch data from GitHub Actions workflows for further analysis.

  • Data Cleaning
    Clean and preprocess the fetched data to ensure consistency and accuracy for downstream tasks.

  • Automatic Text Classification with NLP
    Leverage a pre-trained NLP model to automatically classify text as TD and Not_TD instances.

  • Technical Debt (TD) Visualization
    To visualize technical debt, we generate plots with Matplotlib, Pandas, and Seaborn. These visualizations show the distribution and impact of technical debt across different aspects of the project.

Installation

Prerequisites

Ensure that you have the following installed:

  • Python 3 (with pip)
  • Git

Clone the repository

First, clone the project to your local machine:

git clone https://github.com/Aqila-Farahmand/MasterThesis
cd MasterThesis

Create a virtual environment (optional but recommended)

python -m venv venv
source venv/bin/activate    # For Linux/macOS
venv\Scripts\activate       # For Windows

Install the dependencies

Use pip to install all the required packages from requirements.txt:

pip install -r requirements.txt

Usage

Running the Project

After installing the dependencies, you can run the project as follows:

  • Fetching Data
    To fetch the required data, run the following command (an API sketch follows this list):
python -m data_fetching
  • Data Cleaning
    For data cleaning, you can use Google Colab or Jupyter Notebook. Open the notebook at data_cleaning/clean_data.ipynb and run the code there (a minimal cleaning sketch follows this list).

  • Text Classification
    The text classification process uses an NLP model trained on a large dataset of GitHub issues. Due to GitHub's limits on large files, only the inference code is provided in this repository (an inference sketch follows this list).

  • Technical Debt (TD) Visualization
    You can generate simple plots to visualize your data using the script at data_visualization/workflow_analysis.py. Run the file to create visualizations based on your dataset (a plotting sketch follows this list).
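
As a rough sketch of what the fetching step does under the hood, the snippet below lists GitHub Actions workflow runs for one repository through the GitHub REST API. The owner/repo pair and the printed fields are placeholders for illustration; the actual data_fetching package may organize this differently.

import os
import requests

# Hypothetical target repository; replace with the repositories under study.
OWNER, REPO = "octocat", "hello-world"

# Authenticated requests get a much higher API rate limit (see Configuration below).
headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}

url = f"https://api.github.com/repos/{OWNER}/{REPO}/actions/runs"
response = requests.get(url, headers=headers, params={"per_page": 100})
response.raise_for_status()

# Each run carries metadata such as the workflow name, outcome, and timestamp.
for run in response.json()["workflow_runs"]:
    print(run["name"], run["conclusion"], run["created_at"])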
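
The notebook is the authoritative cleaning procedure; as a minimal sketch of the kind of preprocessing it performs (file and column names here are hypothetical), you could deduplicate and normalize the fetched text with Pandas:

import pandas as pd

# Hypothetical output of the fetching step.
df = pd.read_csv("fetched_workflows.csv")

# Drop exact duplicates and rows with no text to classify.
df = df.drop_duplicates().dropna(subset=["text"])

# Normalize case and whitespace for consistent downstream tokenization.
df["text"] = df["text"].str.lower().str.replace(r"\s+", " ", regex=True).str.strip()

df.to_csv("cleaned_workflows.csv", index=False)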
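
Since the trained model itself is not shipped, here is a minimal inference sketch using the Hugging Face transformers pipeline; the model path and example texts are assumptions, not the repository's actual artifacts:

from transformers import pipeline

# Hypothetical path to the trained TD/Not_TD classifier (not included in the repo).
classifier = pipeline("text-classification", model="path/to/td-classifier")

examples = [
    "TODO: this workaround should be removed once the upstream API is fixed",
    "Update the README with installation instructions",
]

# The pipeline returns one {"label", "score"} dict per input text.
for text, result in zip(examples, classifier(examples)):
    print(result["label"], round(result["score"], 3), "-", text)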
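
workflow_analysis.py is the script to run; as a small sketch of the kind of plot it could produce (the column names are assumed), the snippet below counts TD and Not_TD instances per workflow with Seaborn:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical classified dataset with 'workflow' and 'label' columns.
df = pd.read_csv("classified_workflows.csv")

# Bar chart of TD vs. Not_TD instances for each workflow.
sns.countplot(data=df, x="workflow", hue="label")
plt.xticks(rotation=45, ha="right")
plt.title("TD vs. Not_TD instances per workflow")
plt.tight_layout()
plt.savefig("td_distribution.png")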

Configuration

To fetch data, you'll need to configure authentication for making GitHub API requests.

  1. Generate a Personal Access Token (GITHUB_TOKEN) for authenticated API requests.

  2. Set the GITHUB_TOKEN as an environment variable (a quick verification sketch follows this list):

    export GITHUB_TOKEN=your_github_token    # For Linux/macOS
    set GITHUB_TOKEN=your_github_token       # For Windows
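
To verify that the token is visible to Python, here is a minimal sketch against the standard GitHub rate-limit endpoint (authenticated requests allow 5,000 calls per hour):

import os
import requests

# Fails loudly with a KeyError if GITHUB_TOKEN is not set.
token = os.environ["GITHUB_TOKEN"]

resp = requests.get(
    "https://api.github.com/rate_limit",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json()["rate"])  # shows your current authenticated rate limit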
    
    

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Steps to contribute:

  • Fork the repository
  • Create a new branch (git checkout -b feature-branch)
  • Commit your changes (git commit -m 'Add new feature')
  • Push to the branch (git push origin feature-branch)
  • Create a new pull request

License

Apache 2.0