[ENH] Implement proper Lift curve; keep Cumulative gains as an option #5075

janezd · 2020-11-06T20:33:11Z

Issue

Fixes #5056. The widget showed a cumulative gains curve instead of a lift chart. Now it can show both.

Description of changes

The function used to rely on sklearn's ROC curve. This PR adds its own, simpler function modelled on the one that was used.
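For readers unfamiliar with the two curves, here is a minimal sketch of how they can be computed from true labels and predicted scores. It is not the code from this PR: the function name `gains_and_lift` and its arguments are hypothetical, and it only assumes NumPy. The thresholding mirrors the general idea of an ROC-style sweep, where instances with tied scores are admitted together.

```python
import numpy as np


def gains_and_lift(y_true, scores):
    """Return (p_rate, tp_rate, lift) at every distinct score threshold.

    Hypothetical helper for illustration; not the widget's implementation.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)

    # Order instances by decreasing score (stable, so ties keep input order).
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    s_sorted = scores[order]

    # Keep the last index of each run of equal scores, so that tied
    # instances are classified as positive at the same threshold.
    last = np.r_[np.flatnonzero(np.diff(s_sorted)), len(s_sorted) - 1]

    n_selected = last + 1                 # instances classified as positive
    tp = np.cumsum(y_sorted)[last]        # true positives among them

    p_rate = n_selected / len(y_sorted)   # x-axis of both charts
    tp_rate = tp / y_sorted.sum()         # y-axis of cumulative gains
    lift = tp_rate / p_rate               # y-axis of the lift curve
    return p_rate, tp_rate, lift


# Toy data: 3 positives among 8 instances.
y = [1, 1, 0, 1, 0, 0, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
p_rate, tp_rate, lift = gains_and_lift(y, s)
```

Plotting `tp_rate` against `p_rate` gives the cumulative gains chart; plotting `lift` against `p_rate` gives the lift curve. When all instances are selected, `tp_rate` and `lift` are both 1.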

Includes

Code changes
Tests
Documentation

codecov · 2020-11-06T20:42:46Z

Codecov Report

Merging #5075 (b84cc57) into master (1277566) will increase coverage by 0.00%.
The diff coverage is 94.54%.

@@           Coverage Diff           @@
##           master    #5075   +/-   ##
=======================================
  Coverage   84.74%   84.74%           
=======================================
  Files         286      286           
  Lines       60043    60061   +18     
=======================================
+ Hits        50884    50900   +16     
- Misses       9159     9161    +2

lanzagar · 2020-11-20T08:03:30Z

Orange/widgets/evaluate/owliftcurve.py


    graph_name = "plot"

+    YLabels = ("Lift", "TPR")


"TPR" used to be "TP Rate" and we still have "P Rate", so I suggest we use the same naming.

This was unintentional. Fixed. Thanks for noticing.

lanzagar · 2020-11-20T08:10:34Z

Documentation is missing for the new widget look and functions. Not sure if you plan to add that as well or @ajdapretnar will help with that?
After that I would merge this.

janezd · 2020-11-20T11:00:10Z

@ajdapretnar, I've rewritten the documentation. Could you please stamp the picture and review the text?

doc/visual-programming/source/widgets/evaluate/liftcurve.md

ajdapretnar · 2020-11-20T11:21:24Z

doc/visual-programming/source/widgets/evaluate/liftcurve.md

-The **Lift curve** shows the relation between the number of instances which were predicted positive and those that are indeed positive and thus measures the performance of a chosen classifier against a random classifier. The graph is constructed with the cumulative number of cases (in descending order of probability) on the x-axis and the cumulative number of true positives on the y-axis. Lift curve is often used in segmenting the population, e.g., plotting the number of responding customers against the number of all customers contacted. You can also determine the optimal classifier and its threshold from the graph.
+The **Lift curve** shows to curves for analysing the proportion of true positive data instances in relation to the classifier's threshold or the number of instances that we classify as positive.
+
+Cummulative gains chart shows the proportion of true positive instances (for example, the number of clients who accept the offer) as a function of the number of positive instances (the number of clients contacted), assuming the the instances are ordered according to the models probability of being positive (e.g. ranking of clients).


according to the models --> according to the model's

Also, cumulative is written with a single m.

... which I occasionally, but seldom do.

ajdapretnar · 2020-11-20T11:26:27Z

doc/visual-programming/source/widgets/evaluate/liftcurve.md

 2. If test results contain more than one classifier, the user can choose which curves she or he wants to see plotted. Click on a classifier to select or deselect the curve.
 3. *Show lift convex hull* plots a convex hull over lift curves for all classifiers (yellow curve). The curve shows the optimal classifier (or combination thereof) for each desired TP/P rate.
 4. Press *Save Image* if you want to save the created image to your computer in a .svg or .png format.
 5. Produce a report.
-6. 2-D pane with **P rate** (population) as x-axis and **TP rate** (true positives) as a y-axis. The diagonal line represents the behavior of a random classifier. Click and drag to move the pane and scroll in or out to zoom. Click on the "*A*" sign at the bottom left corner to realign the pane.
+6. A plot with **Lift** or **true positive rate** vs. **P rate**. The dashed line represents the behavior of a random classifier.


P rate or positive rate (it would make sense to explain this as in the previous part with Lift)
Also, the second sentence refers only to cumulative gains. I'd make this a bit more obvious.

It does not. The dashed line also appears for lift, but it's horizontal, at 1. It is not always at the bottom; lift curve can go below 1.

Oh, didn't notice it! 👀
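To illustrate the point about the baseline in this thread: for a random ranking the expected TP rate equals the P rate, so the reference line is the diagonal for cumulative gains and a horizontal line at 1 for lift. The sketch below uses hypothetical toy data (not from the PR) with a deliberately bad ranking to show a lift curve that stays below 1.

```python
import numpy as np

# Hypothetical toy data, already ordered by decreasing score:
# the two positives received the lowest scores.
y_sorted = np.array([0, 0, 0, 0, 0, 0, 1, 1], dtype=bool)

n = len(y_sorted)
p_rate = np.arange(1, n + 1) / n                 # share of instances selected
tp_rate = np.cumsum(y_sorted) / y_sorted.sum()   # share of positives found
lift = tp_rate / p_rate

# Reference lines for a random classifier.
random_gains = p_rate          # diagonal in the cumulative gains chart
random_lift = np.ones(n)       # horizontal line at 1 in the lift chart

below_baseline = lift < random_lift   # True wherever the model is worse than random
```

Here `below_baseline` is true at every point except the last, where all instances are selected and lift is 1 by definition.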

ajdapretnar · 2020-11-20T11:28:18Z

doc/visual-programming/source/widgets/evaluate/liftcurve.md

-
-References
----------
+The widgets that provide the right type of the signal needed by the **Lift Curve** (evaluation data) are [Test & Score](../evaluate/testandscore.md) and [Predictions](../evaluate/predictions.md).


The Predictions part is true only for labelled data. Perhaps make this clear?

Isn't that true of both?

Yes, but Test & Score warns you about missing target variable, while Predictions doesn't. I don't know, I just think it would be clearer that way.

I understand what you meant. But it's somehow clear that the data has to have a target variable, so I'd rather keep it short.

ajdapretnar · 2020-11-20T11:30:52Z

doc/visual-programming/source/widgets/evaluate/liftcurve.md


-Handouts of the University of Notre Dame on Data Mining - Lift Curve. Available [here](https://www3.nd.edu/~busiforc/handouts/DataMining/Lift%20Charts.html).
+In the example below, we observe the lift curve and cummulative gain for the bank marketing data, where the classification goal is to predict whether the client will accept a term deposit offer based on his age, job, education, marital status and similar data. The data set is available in the Datasets widget. We run the learning algorithms in the Test Learners widget and send the results to Lift Curve.  to see their performance against a random model. Of the two algorithms tested, logistic regression outperforms the naive Bayesian classifier. The curve tells us that by picking the first 20 % of clients as ranked by the model, we are going to hit four times more positive instances than by selecting a random sample with 20 % of clients.


results to Lift Curve. to see their performance against a random model.

? remove the full stop? Also, Test Learners is now Test and Score.

Whatever I was reading after writing this documentation, it was not this documentation. Sorry that you had to check my typos.

Tell me about it. I proofread my papers three times only for the second reader to find about 10 typos in them. 😒

... with the first one being the fifth word of the abstract.
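As a side note on the documentation example reviewed in this thread, the "20 % of clients, four times more positive instances" statement can be translated between the two charts with lift = TP rate / P rate. The figures below are taken from that sentence; the variable names are illustrative only.

```python
p_rate = 0.20   # fraction of clients contacted (from the documentation text)
lift = 4.0      # "four times more positive instances" than a random sample

# Point on the cumulative gains chart that corresponds to this lift.
tp_rate = lift * p_rate

# Upper bound on lift at this P rate: reached only if the selected
# 20 % already contain every positive instance (TP rate = 1).
max_lift = 1.0 / p_rate
```

In other words, a lift of 4 at a P rate of 0.2 corresponds to a TP rate of 0.8 on the cumulative gains chart, against a theoretical maximum lift of 5 at that P rate.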

janezd force-pushed the lift-curve-cummulative branch from 9212d09 to 46f4b6a Compare November 6, 2020 21:01

janezd assigned lanzagar Nov 13, 2020

lanzagar reviewed Nov 20, 2020

lanzagar changed the title ~~Implement proper Lift curve; keep Cumulative gains as an option~~ [ENH] Implement proper Lift curve; keep Cumulative gains as an option Nov 20, 2020

janezd assigned janezd and unassigned lanzagar Nov 20, 2020

janezd added 3 commits November 20, 2020 10:54

Lift Curve: Implement proper lift curve; keep cumulative gains as an …

070e43a

…otion

Lift Curve: Minor refactoring

1588c07

Lift Curve: Add tests

2559100

janezd force-pushed the lift-curve-cummulative branch from 46f4b6a to 6bf723e Compare November 20, 2020 10:58

janezd assigned ajdapretnar and unassigned janezd Nov 20, 2020

ajdapretnar reviewed Nov 20, 2020

doc/visual-programming/source/widgets/evaluate/liftcurve.md

ajdapretnar reviewed Nov 20, 2020

janezd force-pushed the lift-curve-cummulative branch from 6bf723e to 589216b Compare November 20, 2020 11:32

LiftCurve: Update documentation

b84cc57

janezd force-pushed the lift-curve-cummulative branch from 589216b to b84cc57 Compare November 20, 2020 13:17

lanzagar merged commit d04e3cd into biolab:master Dec 8, 2020
