HSMA Session 4D: Decision Trees & Random Forests

Slides

Lecture Recording

Decision Trees:

Random Forests:

Exercises

The notebooks in the exercises folder can be downloaded and run locally if you have Python installed.

Alternatively, you can run each exercise on Google Colab, a free online platform for coding exercises. You will need to be logged in to a Google account in your browser.

Using the links below will open a fresh copy of the notebook to work on; your changes will not be visible to anyone else. However, if you want to refer back to your version of the notebook in future, make sure you click 'File > Save a copy in Drive'. Your changes will then be saved to your own account, and you can access your edited copy of the notebook from https://colab.research.google.com/.

Open Exercise 1 in Google Colab:

Open Exercise 2 in Google Colab:

Exercise Structure

Notebooks are split into core, extension and challenge sections.

All students should aim to complete the exercises in the core section. These exercises give you practice with all of the key concepts discussed in the lectures, and you can stop after this section if you wish.

Students who want to push themselves and deepen their understanding can go on to attempt the extension exercises.

The challenge section contains exercises that may go beyond what is covered in the lectures; you will be expected to look things up in documentation or on sites such as Stack Overflow, or to use tools such as perplexity.ai to obtain boilerplate code. These exercises may take significantly longer than the time allocated during the lectures and are designed to be an enjoyable challenge for those who want to push their coding skills.

Solutions

Coming Soon.

Learning Objectives

Students should be able to:

- Explain the main points of how a decision tree works
- List the methods that may be used to determine splits (Gini impurity, entropy, information gain)
- Explain why feature scaling is not required in decision trees
- Explain some benefits and downsides of decision trees
- Write code to classify a dataset with a decision tree using the sklearn library
- Write code to plot the resulting decision tree
- Explain some of the hyperparameters that may be set when using decision trees
  - Pruning
  - Minimum samples (leaf and split)
  - Maximum depth
- Explain the main points of how random forests work
- List the two ways in which randomness is introduced into the tree building process
- Explain the concept of bootstrapping
- Explain the difference between sampling with and without replacement
- Explain some benefits and downsides of random forests
- Write code to classify a dataset with a random forest using the sklearn library
- Explain what the F1 score is
- Explain the different kinds of averages that can be calculated for metrics like precision and recall
- Write code to create a confusion matrix using the sklearn library
- Explain the benefits of normalising a confusion matrix
- Write code to create a normalised confusion matrix using the sklearn library
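
As a rough illustration of the split criteria named in the objectives above, the sketch below computes Gini impurity, entropy and information gain by hand for one candidate split. The function names and the toy "yes"/"no" labels are hypothetical and are not taken from the course notebooks.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Toy data (made up): a perfectly mixed parent node and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

parent_gini = gini_impurity(parent)
parent_entropy = entropy(parent)
gain = information_gain(parent, left, right)
print(parent_gini, parent_entropy, round(gain, 3))
```

A tree-building algorithm would evaluate many candidate splits in this way and keep the one with the largest reduction in impurity.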
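
The difference between sampling with and without replacement, and how it relates to bootstrapping, can be sketched with the standard library alone. The ten row indices and the seed below are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)  # arbitrary seed so the sketch is repeatable
rows = list(range(10))   # stand-in for the row indices of a training set

# With replacement: a bootstrap sample the same size as the data.
# Some rows can appear more than once and others not at all.
bootstrap_sample = rng.choices(rows, k=len(rows))

# Without replacement: each row can be drawn at most once, so drawing
# len(rows) items simply gives a shuffled copy of the data.
without_replacement = rng.sample(rows, k=len(rows))

# Rows that were never drawn into the bootstrap sample ("out-of-bag" rows).
out_of_bag = sorted(set(rows) - set(bootstrap_sample))

print(sorted(bootstrap_sample))
print(sorted(without_replacement))
print(out_of_bag)
```

In a random forest, each tree is typically trained on its own bootstrap sample, which is one of the sources of randomness the objectives refer to.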
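
The coding objectives above might look roughly like the following minimal sketch with sklearn. It assumes the built-in iris dataset as a stand-in for the course datasets, and the hyperparameter values are arbitrary examples rather than recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: iris is used here only because it ships with sklearn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# No feature scaling step: each split compares one feature to a threshold.
tree = DecisionTreeClassifier(
    max_depth=3,          # maximum depth
    min_samples_split=4,  # minimum samples needed to split a node
    min_samples_leaf=2,   # minimum samples allowed in a leaf
    ccp_alpha=0.0,        # cost-complexity pruning strength (0 = no pruning)
    random_state=42,
)
tree.fit(X_train, y_train)

# Text rendering of the fitted tree; sklearn.tree.plot_tree(tree) draws
# the same tree with matplotlib.
tree_rules = export_text(tree)
print(tree_rules)

# Random forest: many trees, each fitted to a bootstrap sample of the rows.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)

# Macro average: unweighted mean of the per-class F1 scores.
macro_f1 = f1_score(y_test, y_pred, average="macro")

# Raw counts, then a version normalised over the true labels (rows sum to 1).
cm = confusion_matrix(y_test, y_pred)
cm_normalised = confusion_matrix(y_test, y_pred, normalize="true")
print(macro_f1)
print(cm_normalised)
```

Normalising by row makes classes of different sizes directly comparable, since each row then shows the proportion of that true class assigned to each predicted class.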


hsma-programme/h6_4d_decision_trees_random_forests
