feat: add code instructions
leomaurodesenv committed Nov 7, 2023
1 parent c860267 commit 74fb652
Showing 1 changed file with 55 additions and 6 deletions.
61 changes: 55 additions & 6 deletions README.md
@@ -11,10 +11,59 @@ This is a project study to create a NLP pipeline using DVC and Luigi. The pipeli
<p align="center"><img src="./docs/submission-score.png"></p>

This is a learning repository about DVC (Data Version Control) and Luigi pipelines.
---
## Code

- Tools: luigi, dvc, pre-commit
- Set up [pre-commit](https://pre-commit.com/); see the [available hooks](https://pre-commit.com/hooks.html)
- Set up the [Kaggle API](https://github.com/Kaggle/kaggle-api)
- Download the data: `kaggle competitions download -c sentiment-analysis-on-movie-reviews -p data`
Download or clone this repository.

### Data

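1. Set up the [Kaggle API](https://github.com/Kaggle/kaggle-api) to download the data.
2. Download the dataset into the `data` folder with `kaggle competitions download -c sentiment-analysis-on-movie-reviews -p data`.
3. Now you can run the code using `luigi`!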
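
Before running the pipeline, it can help to confirm that the download actually landed in `data`. The following is a minimal sketch, assuming the Kaggle CLI saves one or more `.zip` archives into the folder passed with `-p`; the helper name `find_archives` is hypothetical and not part of this repository.

```python
from pathlib import Path


def find_archives(data_dir: str = "data") -> list[Path]:
    """Return the .zip archives found directly inside data_dir, sorted by name."""
    folder = Path(data_dir)
    if not folder.is_dir():
        # The folder does not exist yet, so nothing was downloaded.
        return []
    return sorted(folder.glob("*.zip"))


if __name__ == "__main__":
    archives = find_archives()
    if archives:
        for archive in archives:
            print(f"found {archive} ({archive.stat().st_size} bytes)")
    else:
        print("No archive found in data/; check the Kaggle API setup.")
```

An empty result usually means the Kaggle API is not configured yet or the download went to a different folder.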

### Running

```shell
## Create a Python environment
$ python -m venv .venv
$ source .venv/bin/activate

## Install requirements
$ pip install -r src/requirements.txt
## Install pre-commit [optional for development]
$ pre-commit install

## Download the dataset
$ kaggle competitions download -c sentiment-analysis-on-movie-reviews -p data

## Running
$ cd source && python -m luigi --module model Predict --local-scheduler
## Output:
# DEBUG: Checking if Predict() is complete
# INFO: Informed scheduler that task Predict__99914b932b has status PENDING
# INFO: Informed scheduler that task TrainModel__99914b932b has status PENDING
# INFO: Informed scheduler that task Preprocessing__99914b932b has status PENDING
# [...]
# INFO: Done scheduling tasks
# INFO: Running Worker with 1 processes
# DEBUG: Asking scheduler for work...
# DEBUG: Pending tasks: 4
# INFO: [pid 13975] Worker Worker(salt=677210727, workers=1, host=CL-PE08WLYF, username=leonardo-moraes, pid=13975) running ExtractRawData()
# INFO: [pid 13975] Worker Worker(salt=677210727, workers=1, host=CL-PE08WLYF, username=leonardo-moraes, pid=13975) done ExtractRawData()
# DEBUG: 1 running tasks, waiting for next task to finish
# INFO: Informed scheduler that task ExtractRawData__99914b932b has status DONE
# DEBUG: Asking scheduler for work...
# DEBUG: Pending tasks: 3
# INFO: [pid 13975] Worker Worker(salt=677210727, workers=1, host=CL-PE08WLYF, username=leonardo-moraes, pid=13975) running Preprocessing()
# INFO: [pid 13975] Worker Worker(salt=677210727, workers=1, host=CL-PE08WLYF, username=leonardo-moraes, pid=13975) done Preprocessing()
# DEBUG: 1 running tasks, waiting for next task to finish
# INFO: Informed scheduler that task Preprocessing__99914b932b has status DONE
# DEBUG: Asking scheduler for work...
# [...]
```
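
The log above reflects how Luigi walks a dependency chain: it schedules the requested task, follows its requirements, and runs upstream tasks first, skipping any task whose output already exists. The sketch below imitates that behaviour in plain Python so it runs without Luigi installed. It is not Luigi's scheduler and not this repository's code. The task names come from the log; the strictly linear dependency order, the `Task` class, the `build` function, and the `.done` marker files are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


class Task:
    """Toy stand-in for a Luigi task: complete when its output file exists."""

    requires: tuple = ()  # upstream Task subclasses (assumed linear chain)

    def __init__(self, workdir: Path):
        self.workdir = workdir

    def output(self) -> Path:
        # Hypothetical marker file, one per task.
        return self.workdir / f"{type(self).__name__}.done"

    def complete(self) -> bool:
        return self.output().exists()

    def run(self) -> None:
        self.output().write_text("ok")


class ExtractRawData(Task):
    pass


class Preprocessing(Task):
    requires = (ExtractRawData,)


class TrainModel(Task):
    requires = (Preprocessing,)


class Predict(Task):
    requires = (TrainModel,)


def build(task_cls, workdir: Path, log: list) -> None:
    """Run upstream tasks first, skipping any task that is already complete."""
    task = task_cls(workdir)
    if task.complete():
        return
    for dependency in task_cls.requires:
        build(dependency, workdir, log)
    task.run()
    log.append(task_cls.__name__)


def demo() -> tuple:
    """Build Predict twice in a fresh folder and return both run logs."""
    with tempfile.TemporaryDirectory() as tmp:
        first, second = [], []
        build(Predict, Path(tmp), first)
        build(Predict, Path(tmp), second)
        return first, second


if __name__ == "__main__":
    first, second = demo()
    print("first run: ", first)
    print("second run:", second)
```

The second build runs nothing because every marker file already exists. Luigi behaves similarly by default: a task counts as complete once all of its outputs exist, so re-running a stage generally means removing its output first.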

---
## Also look ~

- License [MIT](LICENSE)
- Created by [leomaurodesenv](https://github.com/leomaurodesenv/)
