Simple decision tree induction and classification using the ID3 algorithm with the Gini index as the impurity measure. The program shows the calculations at each step. Only 2 classes (Y, G) are supported, and each instance parameter (x1, x2, ...) can take one of 2 values (c, d).
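
For reference, this is roughly what the impurity and gain formulas printed in the output below boil down to for two classes; a minimal sketch with hypothetical method names, not necessarily the ones used in the project:

```csharp
// Illustrative sketch only (names are hypothetical, not taken from the repository).
static class GiniMath
{
    // Gini impurity of a set with countY instances of class Y and countG of class G:
    // Gini(E) = 1 - P(Y)^2 - P(G)^2
    public static double Gini(int countY, int countG)
    {
        int total = countY + countG;
        if (total == 0) return 0.0;            // an empty subset contributes nothing
        double pY = (double)countY / total;
        double pG = (double)countG / total;
        return 1.0 - pY * pY - pG * pG;
    }

    // Weighted impurity after splitting the set on a binary parameter x with values c and d:
    // Gini(E, x) = P(x=c) * Gini(E_x=c) + P(x=d) * Gini(E_x=d)
    public static double GiniForSplit(int cY, int cG, int dY, int dG)
    {
        double total = cY + cG + dY + dG;
        return (cY + cG) / total * Gini(cY, cG)
             + (dY + dG) / total * Gini(dY, dG);
    }

    // GiniGain(x) = Gini(E) - Gini(E, x); the parameter with the largest gain is selected.
    public static double GiniGain(int cY, int cG, int dY, int dG)
    {
        return Gini(cY + dY, cG + dG) - GiniForSplit(cY, cG, dY, dG);
    }
}
```

For example, at the root split in the output below x1=c gives (4 Y, 1 G) and x1=d gives (1 Y, 4 G), so GiniGain(4, 1, 1, 4) evaluates to 0.18, the value printed as GiniGain(x1).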

Created during an ML course at university, following the book "Artificial Intelligence: A Modern Approach" by S. Russell and P. Norvig and the lecture materials.

C#, VS 2015+.

The training and test sets are hardcoded in `Main` for simplicity.
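
A hardcoded set of this kind might look roughly like the following (purely illustrative; the actual types and names used in `Main` may differ):

```csharp
// Purely illustrative: each instance has five parameter values (each 'c' or 'd') and a class label.
var trainingSet = new[]
{
    //                 x1   x2   x3   x4   x5     class
    new { X = new[] { 'c', 'c', 'd', 'd', 'd' }, Label = 'Y' },  // e1
    new { X = new[] { 'd', 'c', 'd', 'd', 'd' }, Label = 'G' },  // e2
    // ... e3-e10 follow the same pattern shown in the output below
};
```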

It can be run in an online compiler here: https://dotnetfiddle.net/KNvfGJ.

Below is an example of the output for the hardcoded sets; the program prints the Gini calculations for every split and the branches of the resulting decision tree (the arithmetic behind the first split is worked out again after the output). Note that the numbers are printed with a comma as the decimal separator, which is just locale-dependent number formatting.

```
e1: x1=c x2=c x3=d x4=d x5=d - Y
e2: x1=d x2=c x3=d x4=d x5=d - G
e3: x1=c x2=d x3=d x4=c x5=c - Y
e4: x1=c x2=c x3=d x4=c x5=d - Y
e5: x1=d x2=d x3=c x4=c x5=d - G
e6: x1=c x2=d x3=d x4=d x5=c - Y
e7: x1=d x2=d x3=c x4=d x5=d - Y
e8: x1=c x2=d x3=d x4=c x5=d - G
e9: x1=d x2=c x3=d x4=c x5=d - G
e10: x1=d x2=c x3=d x4=c x5=c - G

Selecting parameter for
e1: x1=c x2=c x3=d x4=d x5=d - Y
e2: x1=d x2=c x3=d x4=d x5=d - G
e3: x1=c x2=d x3=d x4=c x5=c - Y
e4: x1=c x2=c x3=d x4=c x5=d - Y
e5: x1=d x2=d x3=c x4=c x5=d - G
e6: x1=c x2=d x3=d x4=d x5=c - Y
e7: x1=d x2=d x3=c x4=d x5=d - Y
e8: x1=c x2=d x3=d x4=c x5=d - G
e9: x1=d x2=c x3=d x4=c x5=d - G
e10: x1=d x2=c x3=d x4=c x5=c - G
Gini(E) = 0,5
    Gini(E, x1) = 0,32
    P(x1=c) = 0,5 Gini(Ex1=c) = 0,32; P(x1=d) = 0,5 Gini(Ex1=d) = 0,32
GiniGain(x1) = 0,18
    Gini(E, x2) = 0,48
    P(x2=c) = 0,5 Gini(Ex2=c) = 0,48; P(x2=d) = 0,5 Gini(Ex2=d) = 0,48
GiniGain(x2) = 0,02
    Gini(E, x3) = 0,5
    P(x3=c) = 0,2 Gini(Ex3=c) = 0,5; P(x3=d) = 0,8 Gini(Ex3=d) = 0,5
GiniGain(x3) = 0
    Gini(E, x4) = 0,42
    P(x4=c) = 0,6 Gini(Ex4=c) = 0,44; P(x4=d) = 0,4 Gini(Ex4=d) = 0,38
GiniGain(x4) = 0,08
    Gini(E, x5) = 0,48
    P(x5=c) = 0,3 Gini(Ex5=c) = 0,44; P(x5=d) = 0,7 Gini(Ex5=d) = 0,49
GiniGain(x5) = 0,02
Selected x1

x1=c

Selecting parameter for
e1: x2=c x3=d x4=d x5=d - Y
e3: x2=d x3=d x4=c x5=c - Y
e4: x2=c x3=d x4=c x5=d - Y
e6: x2=d x3=d x4=d x5=c - Y
e8: x2=d x3=d x4=c x5=d - G
Gini(E) = 0,32
    Gini(E, x2) = 0,27
    P(x2=c) = 0,4 Gini(Ex2=c) = 0; P(x2=d) = 0,6 Gini(Ex2=d) = 0,44
GiniGain(x2) = 0,05
    Gini(E, x3) = 0,32
    P(x3=c) = 0 Gini(Ex3=c) = 1; P(x3=d) = 1 Gini(Ex3=d) = 0,32
GiniGain(x3) = 0
    Gini(E, x4) = 0,27
    P(x4=c) = 0,6 Gini(Ex4=c) = 0,44; P(x4=d) = 0,4 Gini(Ex4=d) = 0
GiniGain(x4) = 0,05
    Gini(E, x5) = 0,27
    P(x5=c) = 0,4 Gini(Ex5=c) = 0; P(x5=d) = 0,6 Gini(Ex5=d) = 0,44
GiniGain(x5) = 0,05
Selected x2

x2=c

All have class Y
e1: x3=d x4=d x5=d - Y
e4: x3=d x4=c x5=d - Y

x2=d

Selecting parameter for
e3: x3=d x4=c x5=c - Y
e6: x3=d x4=d x5=c - Y
e8: x3=d x4=c x5=d - G
Gini(E) = 0,44
    Gini(E, x3) = 0,44
    P(x3=c) = 0 Gini(Ex3=c) = 1; P(x3=d) = 1 Gini(Ex3=d) = 0,44
GiniGain(x3) = 0
    Gini(E, x4) = 0,33
    P(x4=c) = 0,666666666666667 Gini(Ex4=c) = 0,5; P(x4=d) = 0,333333333333333 Gini(Ex4=d) = 0
GiniGain(x4) = 0,11
    Gini(E, x5) = 0
    P(x5=c) = 0,666666666666667 Gini(Ex5=c) = 0; P(x5=d) = 0,333333333333333 Gini(Ex5=d) = 0
GiniGain(x5) = 0,44
Selected x5

x5=c

All have class Y
e3: x3=d x4=c - Y
e6: x3=d x4=d - Y

x5=d

All have class G
e8: x3=d x4=c - G

x1=d

Selecting parameter for
e2: x2=c x3=d x4=d x5=d - G
e5: x2=d x3=c x4=c x5=d - G
e7: x2=d x3=c x4=d x5=d - Y
e9: x2=c x3=d x4=c x5=d - G
e10: x2=c x3=d x4=c x5=c - G
Gini(E) = 0,32
    Gini(E, x2) = 0,2
    P(x2=c) = 0,6 Gini(Ex2=c) = 0; P(x2=d) = 0,4 Gini(Ex2=d) = 0,5
GiniGain(x2) = 0,12
    Gini(E, x3) = 0,2
    P(x3=c) = 0,4 Gini(Ex3=c) = 0,5; P(x3=d) = 0,6 Gini(Ex3=d) = 0
GiniGain(x3) = 0,12
    Gini(E, x4) = 0,2
    P(x4=c) = 0,6 Gini(Ex4=c) = 0; P(x4=d) = 0,4 Gini(Ex4=d) = 0,5
GiniGain(x4) = 0,12
    Gini(E, x5) = 0,3
    P(x5=c) = 0,2 Gini(Ex5=c) = 0; P(x5=d) = 0,8 Gini(Ex5=d) = 0,38
GiniGain(x5) = 0,02
Selected x2

x2=c

All have class G
e2: x3=d x4=d x5=d - G
e9: x3=d x4=c x5=d - G
e10: x3=d x4=c x5=c - G

x2=d

Selecting parameter for
e5: x3=c x4=c x5=d - G
e7: x3=c x4=d x5=d - Y
Gini(E) = 0,5
    Gini(E, x3) = 0,5
    P(x3=c) = 1 Gini(Ex3=c) = 0,5; P(x3=d) = 0 Gini(Ex3=d) = 1
GiniGain(x3) = 0
    Gini(E, x4) = 0
    P(x4=c) = 0,5 Gini(Ex4=c) = 0; P(x4=d) = 0,5 Gini(Ex4=d) = 0
GiniGain(x4) = 0,5
    Gini(E, x5) = 0,5
    P(x5=c) = 0 Gini(Ex5=c) = 1; P(x5=d) = 1 Gini(Ex5=d) = 0,5
GiniGain(x5) = 0
Selected x4

x4=c

All have class G
e5: x3=c x5=d - G

x4=d

All have class Y
e7: x3=c x5=d - Y
```

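As a sanity check of the first step: the full training set has 5 instances of class Y and 5 of class G, and x1=c selects e1, e3, e4, e6, e8 (4 Y, 1 G) while x1=d selects the remaining five (1 Y, 4 G), so:

```
Gini(E)      = 1 - (5/10)^2 - (5/10)^2 = 0.5
Gini(E, x1)  = 0.5 * Gini(Ex1=c) + 0.5 * Gini(Ex1=d)
             = 0.5 * (1 - (4/5)^2 - (1/5)^2) + 0.5 * (1 - (1/5)^2 - (4/5)^2)
             = 0.5 * 0.32 + 0.5 * 0.32 = 0.32
GiniGain(x1) = Gini(E) - Gini(E, x1) = 0.5 - 0.32 = 0.18
```

x1 has the largest gain of the five parameters, so it becomes the root, and the same selection is repeated on each subset until a subset contains only one class.
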
Finally, the two hardcoded test instances are classified with the induced tree; each line shows the expected class and the class assigned by the tree:

```
e19: x1=c x2=d x3=c x4=c x5=c - Y == Y
e20: x1=d x2=c x3=c x4=d x5=c - G == G
```
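
The induced tree, read directly off the output above, looks like this:

```
x1=c: x2=c: Y
      x2=d: x5=c: Y
            x5=d: G
x1=d: x2=c: G
      x2=d: x4=c: G
            x4=d: Y
```

e19 (x1=c, x2=d, x5=c) therefore ends in a Y leaf and e20 (x1=d, x2=c) ends in a G leaf, so both test instances are classified correctly.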