[ENH] Enable multitarget problem types for OWTestAndScore and OWPredictions #5848
Codecov Report
@@            Coverage Diff             @@
##           master    #5848      +/-   ##
==========================================
- Coverage   86.29%   86.28%   -0.02%
==========================================
  Files         315      315
  Lines       66830    66884      +54
==========================================
+ Hits        57674    57712      +38
- Misses       9156     9172      +16
I read the code to see the idea. My comments refer to what I spotted, and do not mean I like the idea. (I don't. :)
The problem is not the idea itself. What I dislike is that it is a patch over a bad overall design. We should consider some deeper changes, though I fear they would lead to finally discussing a move to pandas.
I think that, in light of survival analysis and similar problems, we should consider giving variables multiple roles. Currently, a variable can be an independent variable, a dependent variable or a meta, and each kind is stored in a different matrix. We had another type (a weight), which was supposed to be unique (i.e. you cannot choose between multiple weight variables on the same data). You propose to add another role...
Going pandas would mean abandoning X, Y and metas as permanently materialized matrices and switching to a column-based representation instead. Every column could then be assigned a role, and class_var would become a property returning the variable that is assigned the "target" role.
We need to decide whether to continue patching or bite into pandas.
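To make the column-based alternative concrete, here is a minimal sketch in plain Python. It is not pandas and not Orange's API: `Role`, `Column`, `ColumnTable`, `with_role` and `matrix` are hypothetical names. Each column carries a role, `class_vars`/`class_var` become properties derived from those roles, and a role's matrix is only materialized on request.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Sequence


class Role(Enum):
    # Hypothetical roles, following the comment above: features, targets,
    # metas, and a weight that must stay unique.
    FEATURE = "feature"
    TARGET = "target"
    META = "meta"
    WEIGHT = "weight"


@dataclass
class Column:
    name: str
    values: Sequence
    role: Role = Role.FEATURE


@dataclass
class ColumnTable:
    """Hypothetical column-based table: no permanently materialized X, Y, metas."""
    columns: List[Column] = field(default_factory=list)

    def with_role(self, role: Role) -> List[Column]:
        return [c for c in self.columns if c.role is role]

    @property
    def class_vars(self) -> List[Column]:
        # All columns assigned the target role, in column order.
        return self.with_role(Role.TARGET)

    @property
    def class_var(self) -> Optional[Column]:
        # Single-target convenience: defined only when exactly one target exists.
        targets = self.class_vars
        return targets[0] if len(targets) == 1 else None

    @property
    def weight(self) -> Optional[Column]:
        # The weight role stays unique, as in the current design.
        weights = self.with_role(Role.WEIGHT)
        if len(weights) > 1:
            raise ValueError("at most one column may have the weight role")
        return weights[0] if weights else None

    def matrix(self, role: Role) -> List[list]:
        # Build a row-major matrix for one role, only when it is asked for.
        cols = self.with_role(role)
        return [list(row) for row in zip(*(c.values for c in cols))]


# Survival-style data: two columns share the target role.
table = ColumnTable([
    Column("age", [63, 71, 55]),
    Column("time", [12.0, 30.5, 7.2], Role.TARGET),
    Column("event", [1, 0, 1], Role.TARGET),
])
```

With two target columns, `table.class_var` is `None` while `table.class_vars` lists both, so multi-target data needs no extra matrix or special-cased role.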
Issue
In survival analysis, the data are expected to have two target variables: the first is the duration of time until the event of interest, and the second is the censoring indicator.
Ideally, the Survival Analysis add-on would use the same infrastructure for testing and scoring survival models that is currently in place for classification and regression problems. To achieve this, we need to loosen the constraint that input data have a single target variable.
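The sketch below illustrates, in plain Python, why survival data collide with the single-target assumption. Every name here (`SurvivalTarget`, `single_target`, `survival_targets`) is hypothetical and not part of Orange or the add-on. Each row carries a duration and an event indicator (assumed here to be 1 = event observed, 0 = censored), and a rule that exposes a class variable only for exactly one target yields nothing for such data.

```python
from typing import List, NamedTuple, Optional, Sequence


class SurvivalTarget(NamedTuple):
    duration: float  # time until the event of interest, or until censoring
    event: int       # assumed encoding: 1 = event observed, 0 = censored


def single_target(class_names: Sequence[str]) -> Optional[str]:
    # Single-target convention: a class variable exists only when there is
    # exactly one target column; with two targets it is undefined.
    return class_names[0] if len(class_names) == 1 else None


def survival_targets(rows: Sequence[Sequence[float]]) -> List[SurvivalTarget]:
    """Split a two-column target matrix into (duration, event) pairs."""
    out = []
    for duration, event in rows:  # raises ValueError if a row is not length 2
        if duration < 0:
            raise ValueError("duration must be non-negative")
        if event not in (0, 1):
            raise ValueError("event indicator must be 0 or 1")
        out.append(SurvivalTarget(float(duration), int(event)))
    return out


# Target matrix with two columns: duration and event indicator.
Y = [[12.0, 1], [30.5, 0], [7.2, 1]]
targets = survival_targets(Y)
```

Here `single_target(["time", "event"])` returns `None`, so any code path that requires a single class variable would reject this data even though both targets are well formed.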
Description of changes
With this pull request, the task was not to change the interface to accommodate all future tasks one might want to support, but to:
Since the registration of Scorers is already implemented, the next step was to determine which scorers are usable for the given input data. The Scorer base class now holds additional [information](https://github.com/biolab/orange3/compare/master...JakaKokosar:multi_target?expand=1#diff-ebac791194327a764153704a5e2567261585ad3971623c2138be81e8c02b8da5R69) to distinguish built-in Scorers from those implemented in add-ons. If a Scorer is built-in, nothing changes. For non-built-in scorers, we look at the Table attributes to determine the 'problem type' of the input data. For example, in survival analysis, the As Survival Data widget sets the class variables of the output table and sets the table's attributes as follows:
Usable scorers are those whose problem_type matches that of the input data. This is not necessarily the best solution and is open to further debate. At this stage, no significant changes to the code base were needed. In theory, everything else should be handled by the Learners, Models and Scorers defined for the related tasks (biolab/orange3-survival-analysis#27).
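The selection rule described above can be sketched as follows. This is a self-contained illustration, not the code from this PR and not Orange's actual API. `Scorer`, `is_builtin`, `class_types`, `abstract`, `SCORERS`, `Table`, `target_kind`, the `"problem_type"` attribute key, the `"time_to_event"` value and the example scorer classes are all assumptions for the sketch. Built-in scorers keep a type-based rule, while add-on scorers are matched on the problem type that the table declares in its attributes.

```python
from typing import Dict, List, Optional, Type

SCORERS: List[Type["Scorer"]] = []  # hypothetical registry of concrete scorers


class Scorer:
    # Hypothetical class attributes mirroring the idea in this PR.
    is_builtin: bool = True
    problem_type: Optional[str] = None
    class_types: tuple = ()  # built-ins only: which single-target kinds they score
    abstract: bool = True

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass that does not itself declare abstract = True.
        if not cls.__dict__.get("abstract", False):
            SCORERS.append(cls)


class Table:
    """Hypothetical stand-in for a data table with an attributes dict."""
    def __init__(self, target_kind: Optional[str],
                 attributes: Optional[Dict[str, str]] = None):
        self.target_kind = target_kind  # "discrete", "continuous", or None
        self.attributes = attributes or {}


class CA(Scorer):
    class_types = ("discrete",)


class MSE(Scorer):
    class_types = ("continuous",)


class ConcordanceIndex(Scorer):  # stands in for an add-on scorer
    is_builtin = False
    problem_type = "time_to_event"


def usable_scorers(data: Table) -> List[Type[Scorer]]:
    problem_type = data.attributes.get("problem_type")
    usable = []
    for scorer in SCORERS:
        if scorer.is_builtin:
            # Built-ins: unchanged, type-based rule on the single target.
            if data.target_kind in scorer.class_types:
                usable.append(scorer)
        elif problem_type is not None and scorer.problem_type == problem_type:
            # Add-on scorers: must match the problem type the table declares.
            usable.append(scorer)
    return usable


survival = Table(None, {"problem_type": "time_to_event"})
classification = Table("discrete")
```

With these assumptions, `usable_scorers(survival)` yields only `ConcordanceIndex`, and `usable_scorers(classification)` yields only `CA`. Keeping the built-in branch untouched reflects the point that nothing changes for built-in scorers.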
Some examples:
Includes