feat(package): Add model packages #5

Open
wants to merge 2 commits into
base: main

@its-sushant its-sushant commented Jun 28, 2022

Description

This PR adds the model training code and the Python package code for the logistic regression and linear support vector machine models that have been created.

File Structure

                ├── linearsvc
                │   ├── LICENSE
                │   ├── MANIFEST.in
                │   ├── README.md
                │   ├── setup.py
                │   └── src
                │       ├── linearsvc
                │       │   ├── data
                │       │   │   └── linearsvc
                │       │   └── __init__.py
                │       └── model_train.py
                └── logreg
                    ├── LICENSE
                    ├── MANIFEST.in
                    ├── README.md
                    ├── setup.py
                    └── src
                        ├── logreg
                        │   ├── data
                        │   │   └── logreg
                        │   └── __init__.py
                        └── model_train.py

How to train

To train a model, run: python path/to/model_train.py
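As a rough illustration of what such a training script does, here is a minimal sketch built from the pipeline steps quoted in this review. The toy TEXTS and LABELS are hypothetical stand-ins for the PR's data() loader, which is not shown here, and max_iter=1000 is an added assumption to help the solver converge on the toy data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data; the real script loads license texts via data().
TEXTS = [
    "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "Permission is hereby granted free of charge to use, copy, modify, and merge",
    "This program is free software; you can redistribute it under the GNU General Public License",
    "You can redistribute this program and modify it under the GNU General Public License",
]
LABELS = ["MIT", "MIT", "GPL-2.0", "GPL-2.0"]


def train(texts, labels):
    """Fit a bag-of-words -> TF-IDF -> logistic regression pipeline."""
    model = Pipeline(
        [
            ("vect", CountVectorizer()),    # raw text -> token counts
            ("tfidf", TfidfTransformer()),  # counts -> TF-IDF weights
            ("clf", LogisticRegression(C=1e5, max_iter=1000)),
        ]
    )
    print("Model training has started!")
    return model.fit(texts, labels)
```

Calling train(TEXTS, LABELS) returns the fitted pipeline, whose predict method accepts a list of raw license texts.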

Notes

The implemented models were tested locally by creating agents for logisticRegression and linearsvc in atarashi. The accuracy measured with evaluator.py is 63% for both models.
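For reference, an accuracy score of this kind is the fraction of test files whose predicted license matches the expected one. The sketch below is not the code of evaluator.py; the function name and the sample labels are hypothetical.

```python
def accuracy(predicted, expected):
    """Return the fraction of predictions that match the expected labels."""
    if len(predicted) != len(expected) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)
```

For example, two correct predictions out of three test files give an accuracy of 2/3.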

CC: @GMishx @Kaushl2208 @hastagAB @ag4ums @vasudevmaduri

logreg = Pipeline(
    [
        ("vect", CountVectorizer()),
        ("tfidf", TfidfTransformer()),
Member

An idea just came to mind: can we try using BM25 in place of TF-IDF and see whether there are any improvements? This would also help us compare the two for the license domain.
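To make the suggestion concrete, here is a minimal pure-Python sketch of BM25 term weighting, using the common Okapi form with a non-negative IDF. It is not code from this PR: the function name, the whitespace tokenizer, and the defaults k1=1.5 and b=0.75 are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_weights(docs, k1=1.5, b=0.75):
    """Return one {term: BM25 weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    if not tokenized:
        return []
    n_docs = len(tokenized)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs or 1.0
    # Number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Length normalization: longer documents get a larger penalty.
        norm = k1 * (1 - b + b * len(tokens) / avg_len)
        doc_weights = {}
        for term, tf in counts.items():
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            doc_weights[term] = idf * tf * (k1 + 1) / (tf + norm)
        weights.append(doc_weights)
    return weights
```

Unlike TF-IDF, the term-frequency part saturates as a term repeats, and the weight is normalized by document length. Using this in place of the "tfidf" step would require wrapping the same arithmetic in a transformer that works on the count matrix.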

Author

Yes, sure, I will try using BM25.

Author

For now, I have modified the code as suggested and also added the package for the linear support vector machine model.

Thanks :)

@its-sushant its-sushant changed the title from "feat(package): Add logreg package" to "feat(package): Add model packages" on Jul 5, 2022
@hastagAB hastagAB self-requested a review July 5, 2022 09:37
Member

@Kaushl2208 Kaushl2208 left a comment

Hey, I have found a few uniformity issues. You can fix them up in no time :)

Cheers!!

("clf", LinearSVC(C=1e5)),  # n_jobs dropped: LinearSVC does not accept it
]
)
print("Model training is started!")
Member

Suggested change
- print("Model training is started!")
+ print("Model training has started!")

Author

Done @Kaushl2208 bhaiya

train_data = data()

X_train = train_data.text
y_train = train_data.short_name
Member

Suggested change
- y_train = train_data.short_name
+ Y_train = train_data.short_name

or lowercase X_train as well :)

Author

Done

]
)
print("Model training is started!")
logreg_model = logreg.fit(X_train, y_train)
Member

Same here: the X_train and Y_train naming should be uniform.

def train():
    train_data = data()

    X_train = train_data.text
Member

Same issue @its-sushant

Author

Done

]
)
print("Model training is started!")
logreg_model = logreg.fit(X_train, y_train)
Member

Here too :)

Author

Done

("clf", LogisticRegression(n_jobs=1, C=1e5)),
]
)
print("Model training is started!")
Member

Suggested change
- print("Model training is started!")
+ print("Model training has started!")

Author

Done

4 participants