Commit

Merge pull request #48 from lisasivak/patch-6
Update README.md
vloothuis authored Apr 23, 2024
2 parents 46b7830 + 5d01fe3 commit ff43f92
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -28,9 +28,9 @@ To participate in the challenge you need to submit a method using this repositor

3. **Preprocess the data**: any steps to clean or preprocess the data need to be added to the ```clean_df``` function in the `submission.py`/`submission.R` script with documentation. *Note*: The function ```clean_df``` will also be applied to the holdout data when you submit your model. At this point, the [codebooks](https://preferdatachallenge.nl/posts/posts/2024-03-21-prefer-codebooks.html) can be useful to make sense of the data.

-4. **Train, tune, and save your model**: any steps to train your model need to be added to the `training.py`/`training.R` script with documentation (e.g., code for the model, number of folds, set seed). The only function in this script is `train_save_model` in which you can add the steps needed to run the model. The output of this script is your saved model, either ```model.joblib``` for Python or ```model.rds``` for R. Make sure that your model is saved in the same folder as `submission.py`/`submission.R` under the name `model.joblib` for Python or `model.rds` for R. The model will be applied to the holdout data when you submit your method.
+4. **Train, tune, and save your model**: any steps to train your model need to be added to the `training.py`/`training.R` script with documentation (e.g., code for the model, number of folds, set seed). The only function in this script is `train_save_model` in which you can add the steps needed to run the model. The output of this script is your saved model, e.g. ```model.joblib``` for Python or ```model.rds``` for R. Make sure that your model is saved in the same folder as `submission.py`/`submission.R` under the name `model.joblib` for Python or `model.rds` for R. You can save the model in another format as well.

-5. **Test your model on fake data**: you can test your ```clean_df``` function and your model (stored in: ```model.joblib```/```model.rds```) on the fake data (`PreFer_fake_data.csv`) with the ```predict_outcomes``` function. The ```predict_outcomes``` function in `submission.py`/`submission.R` will be run on the holdout data to generate your challenge submission result on the leaderboard. Make sure that the outputs of your model are predicted classes (i.e. 0s and 1s) rather than, for example, probabilities and to add or edit dependencies when required as described [here](https://github.com/eyra/fertility-prediction-challenge/wiki/PreFer-Challenge-Wiki#how-to-add-or-edit-dependencies-librariespackages). If your method does not run on the "fake data", it will not run on the holdout data. If you passed the test (i.e.```predict_outcomes``` led to predictions rather than errors), you can start [submitting your method](https://github.com/eyra/fertility-prediction-challenge/tree/master#submit-your-method).
+5. **Test your model on fake data**: you can test your ```clean_df``` function and your model (stored in: ```model.joblib```/```model.rds```) on the fake data (`PreFer_fake_data.csv`) with the ```predict_outcomes``` function. The ```predict_outcomes``` function in `submission.py`/`submission.R` will be run on the holdout data to generate your challenge submission result on the leaderboard. Make sure that the outputs of your model are predicted classes (i.e. 0s and 1s) rather than, for example, probabilities. If you saved the model in another format (not 'joblib' for Python or 'rds' for R), update the way of loading the model. Also, make sure to add or edit dependencies when required as described [here](https://github.com/eyra/fertility-prediction-challenge/wiki/PreFer-Challenge-Wiki#how-to-add-or-edit-dependencies-librariespackages). If your method does not run on the "fake data", it will not run on the holdout data. If you passed the test (i.e.```predict_outcomes``` led to predictions rather than errors), you can start [submitting your method](https://github.com/eyra/fertility-prediction-challenge/tree/master#submit-your-method).

ℹ️ Check out [this website](https://preferdatachallenge.nl/posts) for guides, notebooks, and blogs to guide you through this process.
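
The two points this change adds to steps 4 and 5 (a model may be saved in another format, and the loading code must then be updated to match) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the JSON format, the file name `model.json`, the helper names `save_model`, `load_model` and `predict_classes`, the toy logistic parameters, and the feature names are hypothetical and are not part of the challenge template.

```python
import json
import math
import tempfile
from pathlib import Path


def save_model(params, path):
    """Save model parameters as JSON (an illustrative alternative to joblib)."""
    Path(path).write_text(json.dumps(params))


def load_model(path):
    """Load the parameters with the reader that matches the saved format."""
    return json.loads(Path(path).read_text())


def predict_classes(params, rows, threshold=0.5):
    """Return predicted classes (0s and 1s), not probabilities."""
    classes = []
    for row in rows:
        # Linear score of a toy logistic model.
        score = params["intercept"] + sum(
            weight * row[name] for name, weight in params["coef"].items()
        )
        probability = 1.0 / (1.0 + math.exp(-score))
        # Threshold the probability so the output is a class label.
        classes.append(1 if probability >= threshold else 0)
    return classes


# Hypothetical parameters and input rows, for illustration only.
params = {"intercept": -1.0, "coef": {"feature_a": 0.05, "feature_b": 1.5}}
rows = [
    {"feature_a": 25, "feature_b": 1},
    {"feature_a": 40, "feature_b": 0},
    {"feature_a": 18, "feature_b": 0},
]

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.json"
    save_model(params, model_path)  # the training step writes the file
    predictions = predict_classes(load_model(model_path), rows)  # prediction reads it back

print(predictions)  # [1, 1, 0]
```

The saving and loading functions are kept as a matched pair so that switching formats means changing both together, which is the adjustment step 5 asks for.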
