Update README.md

Final edits to Prepare your method section.
eyra · Mar 31, 2024 · 9594844
1 parent 3e5bbdc
commit 9594844
Showing 1 changed file with 6 additions and 8 deletions.
diff --git a/README.md b/README.md
@@ -20,19 +20,17 @@ This is a template repository to prepare your submission for phase 1 of the Pred
 
 ## Prepare your method
 
-To participate in the challenge you need to submit a method (i.e. code for data preprocessing, training, and making predictions, and the trained model) using this repository. 
+To participate in the challenge you need to submit a method using this repository. 
 
-ℹ️ You can use either Python or R for your method. By default, Python is used. For Python this repo assumes that your method uses the [Anaconda](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html) Python distribution.
+1. **Choose your programming language**: the default set-up is Python. If you would like to use R, go to ```settings.json``` and change ```{"dockerfile": "python.Dockerfile"}``` into ```{"dockerfile": "r.Dockerfile"}```. Read [here](https://github.com/eyra/fertility-prediction-challenge/wiki#how-to-update-files-in-your-forked-repository) how to update files in your forked repository. ℹ️ For Python this repo assumes that your method uses the [Anaconda](https://docs.conda.io/projects/conda/en/stable/user-guide/install/index.html) Python distribution.
 
-1. **Choose your programming language**: the default set-up is Python, if you would like to use R, go to ```settings.json``` and change ```{"dockerfile": "python.Dockerfile"}``` into ```{"dockerfile": "r.Dockerfile"}```. Read [here](https://github.com/eyra/fertility-prediction-challenge/wiki#how-to-update-files-in-your-forked-repository) how to update files in your forked repository. 
+2. **Choose the main script to work with**: go to ```submission.py``` for Python or ```submission.R``` for R. 
 
-2. **Choose the main script to work with**: go to ```submission.py``` (Python) or ```submission.R``` (R) depending on your preferred programming language. 
+3. **Preprocess the data**: any steps to clean or preprocess the data need to be added to the ```clean_df``` function in the `submission.py`/`submission.R` script with documentation. *Note*: The function ```clean_df``` will also be applied to the holdout data when you submit your model. At this point, the [codebooks](https://preferdatachallenge.nl/posts/posts/2024-03-21-prefer-codebooks.html) can be useful to make sense of the data.
 
-3. **Preprocess the data**: any steps to clean or preprocess the data need to be documented within the function ```clean_df``` in the `submission.py` / `submission.R` script (depending on your preferred programming language). *Note*: The function ```clean_df``` will also be applied to the holdout data when you submit your model. At this point, the [codebooks](https://preferdatachallenge.nl/posts/posts/2024-03-21-prefer-codebooks.html) can be useful to make sense of the data.
+4. **Train, tune, and save your model**: any steps to train your model need to be added to the `training.py`/`training.R` script with documentation (e.g., code for the model, number of folds, set seed). The only function in this script is `train_save_model` in which you can add the steps needed to run the model. The output of this script is your saved model, either ```model.joblib``` for Python or ```model.rds``` for R. Make sure that your model is saved in the same folder as `submission.py`/`submission.R` under the name `model.joblib` for Python or `model.rds` for R. The model will be applied to the holdout data when you submit your method. 
 
-4. **Train, tune, and save your model**: any steps to train your model need to be documented (e.g., code for the model, number of folds, set seed) within the  `training.py` / `training.R` script. The only function in this script is `train_save_model` in which you can put the steps needed to run the model. The output of this script is your saved model, either ```model.joblib``` or  ```model.rds```. Make sure that your model is saved in the same folder as `submission.py`/`submission.R` under the name `model.joblib` (for Python) or `model.rds` (for R). The model will be applied to the holdout data when you submit your model. 
-
-5. **Test your model on fake data**: you can test your ```clean_df``` function and your model (stored in:  ```model.joblib```/```model.rds```) on fake data (`PreFer_fake_data.csv`) through the function ```predict_outcomes```. You will also need to adapt this function such that the outputs of your model are predicted classes (i.e., 0s and 1s) rather than, for example, probabilities. Make sure to add or edit dependencies as described [here](https://github.com/eyra/fertility-prediction-challenge/wiki#how-to-add-or-edit-dependencies-librariespackages). If your method does not run on the "fake data", it will not run on the holdout data. If you passed the test (i.e.```predict_outcomes``` led to predictions rather than errors), you can [submit your method](https://github.com/eyra/fertility-prediction-challenge/tree/master#submit-your-method). 
+5. **Test your model on fake data**: you can test your ```clean_df``` function and your model (stored in: ```model.joblib```/```model.rds```) on the fake data (`PreFer_fake_data.csv`) with the ```predict_outcomes``` function. The ```predict_outcomes``` function in `submission.py`/`submission.R` will be run on the holdout data to generate your challenge submission result on the leaderboard. Make sure that the outputs of your model are predicted classes (i.e., 0s and 1s) rather than, for example, probabilities, and add or edit dependencies when required as described [here](https://github.com/eyra/fertility-prediction-challenge/wiki#how-to-add-or-edit-dependencies-librariespackages). If your method does not run on the "fake data", it will not run on the holdout data. If you passed the test (i.e., ```predict_outcomes``` led to predictions rather than errors), you can start [submitting your method](https://github.com/eyra/fertility-prediction-challenge/tree/master#submit-your-method). 
 
 ℹ️ Check out [this website](https://preferdatachallenge/posts) for videos, guides, notebooks, and blogs to guide you through this process.
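
Two of the steps in the updated section are easy to get wrong, so a minimal Python sketch may help: switching the `dockerfile` entry in `settings.json` from Python to R (step 1), and turning model probabilities into predicted classes (step 5). The `"dockerfile"` key and the two Dockerfile names come from the README text. The helper names `switch_dockerfile` and `to_classes` and the 0.5 threshold are assumptions made for illustration and are not part of the template repository.

```python
import json


def switch_dockerfile(settings_text: str, language: str) -> str:
    """Return settings.json content pointing at the Dockerfile for `language`.

    Hypothetical helper: the README only says to edit the file by hand.
    """
    dockerfiles = {"python": "python.Dockerfile", "r": "r.Dockerfile"}
    settings = json.loads(settings_text)
    settings["dockerfile"] = dockerfiles[language]
    return json.dumps(settings)


def to_classes(probabilities, threshold=0.5):
    """Map predicted probabilities to 0/1 classes (1 if p >= threshold).

    The 0.5 cut-off is an assumed default; choose one that suits your model.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


if __name__ == "__main__":
    # Step 1: switch the template from Python to R.
    print(switch_dockerfile('{"dockerfile": "python.Dockerfile"}', "r"))
    # Step 5: the outputs should be 0s and 1s, not probabilities.
    print(to_classes([0.1, 0.5, 0.93]))
```

In a real submission the thresholding would happen inside `predict_outcomes`, after the saved model has produced its scores on the cleaned data.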