Conventions about the representations of scalar data #86
Makes sense to me, small comments:
In addition, it might be worth considering functionality which:
Also, I find the name "kind" weird. "in type system 1 (= Julia standard) it has type X" rather than "in system 1 it has type X and also type Y". More precisely, we want to assert … Is there a more natural way to do this "double typing" (i.e., the first option) in Julia, given that we do not want to re-write all of Julia's type management in "system 2"?
Before reading the previous two comments (addressed in additional posts below): I updated the document to include the built-in Missing type and changed "kind" to "scitype".
Sure. Makes sense.
I don't see why not. So, I could have, e.g., Continuous{0}{Inf}?
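As a rough illustration of what bounds-as-type-parameters would look like, the sketch below uses a hypothetical `Continuous{L,U}` (not MLJ's actual definition). Julia's curried parameter syntax does accept `Continuous{0}{Inf}`. The sketch also shows two properties worth weighing: type parameters are invariant, so a narrower interval is not a subtype of a wider one, and `0` and `0.0` give distinct types.

```julia
# Hypothetical bounded scientific type: L and U are lower/upper bounds
# carried as type parameters (illustrative only, not MLJ's definition).
abstract type Continuous{L,U} end

# Curried application: supplying the parameters one at a time
Continuous{0}{Inf} === Continuous{0,Inf}   # true

# Every bounded variant is still a Continuous
Continuous{0,1} <: Continuous              # true

# Parameters are invariant: [0, 1] is NOT a subtype of [0, Inf)
Continuous{0,1} <: Continuous{0,Inf}       # false

# Bounds are compared by value and type: 0 (Int) and 0.0 (Float64) differ
Continuous{0,Inf} == Continuous{0.0,Inf}   # false
```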
Yes. Julia already has
Absolutely, and it is needed for the task interface (see next comment box below). There is a minor technical annoyance: while an element in a column determines its scitype, the type of the element does not. The Tables.jl interface provides the eltypes, but we have to dig inside the table to get the scitype.
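To make the annoyance concrete, here is a minimal sketch under stated assumptions. The types and the `Level` struct are hypothetical stand-ins. `Level` keeps its orderedness as runtime data, so two values of the same Julia type can have different scitypes. The column scitype therefore has to be computed by scanning the elements, not read off `eltype`.

```julia
# Illustrative scientific types (hypothetical names, not MLJ's API)
abstract type Multiclass end
abstract type OrderedFactor end
abstract type Continuous end

# Hypothetical categorical-like value: whether it is ordered is stored
# as a field, so it is not visible in the Julia type `Level` itself.
struct Level
    label::String
    ordered::Bool
end

# Element-level scitype: depends on the value, not only on its type
scitype(x::Level) = x.ordered ? OrderedFactor : Multiclass
scitype(::AbstractFloat) = Continuous
scitype(::Missing) = Missing

# Column-level scitype: the union of the element scitypes. Since
# eltype(col) is not enough, every element has to be inspected.
column_scitype(col) = mapreduce(scitype, (a, b) -> Union{a, b}, col; init = Union{})

grades = [Level("low", true), Level("high", true)]
colors = [Level("red", false), missing]

eltype(grades)          # Level
column_scitype(grades)  # OrderedFactor
column_scitype(colors)  # the union of Missing and Multiclass
```

Here `grades` and a column of unordered levels would share the eltype `Level` but differ in scitype. That is exactly why the Tables.jl eltypes alone do not suffice.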
Agreed, anticipated and corrected: kind -> scitype
Unfortunately, the Julia type does not determine the scitype. It would if
I'm not sure I understand your objection. Note that the scientific type hierarchy is a hierarchy of Julia types, so we get all the type semantics for free! So, e.g., I can specify things like:
And the logic for matching models to tasks is compactly expressed. So, e.g., if the task data has univariate target scitype
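As a rough sketch of how compactly that matching logic can be expressed, the block below defines a toy hierarchy as ordinary abstract Julia types. It then checks model/task compatibility with `<:`. All type names, the model structs, and the `target_scitype`/`matches` functions are hypothetical, chosen for illustration; they are not MLJ's actual API.

```julia
# Toy scientific type hierarchy (hypothetical names, for illustration only)
abstract type Known end
abstract type Finite <: Known end
abstract type Infinite <: Known end
abstract type Multiclass <: Finite end
abstract type OrderedFactor <: Finite end
abstract type Count <: Infinite end
abstract type Continuous <: Infinite end

# Hypothetical model types, each declaring the target scitype it supports
struct LinearRegressor end
struct TreeClassifier end
target_scitype(::Type{LinearRegressor}) = Continuous
target_scitype(::Type{TreeClassifier}) = Finite

# A model suits a task when the task's target scitype is a subtype of
# the model's declared target scitype; Julia's `<:` does all the work.
matches(model::Type, task_target::Type) = task_target <: target_scitype(model)

matches(LinearRegressor, Continuous)                  # true
matches(LinearRegressor, Count)                       # false
matches(TreeClassifier, OrderedFactor)                # true
matches(LinearRegressor, Union{Missing, Continuous})  # false: missing values not declared
```

A model that declared `Union{Missing, Continuous}` instead would pass the same check for targets with or without missing values, with no extra matching code.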
Mmm. One misgiving about parameterising
Regarding "objection": it's not an objection, just a comment that it might not be possible for a number (e.g., 42) to "have" both the type integer and the "scitype" OrderedFactorInfinite, in the sense of both being Julia types of the number.
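The point can be checked directly: a Julia value has exactly one concrete type, so 42 cannot `isa` both `Int` and a scitype. A common Julia workaround is the trait pattern. The scitype is computed by a function, and methods dispatch on its result, so nothing in Julia's own type system needs re-writing. The sketch below is a minimal example under stated assumptions. The types, `scitype`, and `describe` are hypothetical, and mapping integers to `Count` is a convention picked only for illustration.

```julia
abstract type Infinite end
abstract type Count <: Infinite end
abstract type Continuous <: Infinite end

# Assumed convention for this sketch: integers are counts,
# floating-point numbers are continuous.
scitype(::Integer) = Count
scitype(::AbstractFloat) = Continuous

# Trait-style dispatch: the one-argument method forwards to a method
# selected by the value's scitype rather than by its Julia type.
describe(x) = describe(scitype(x), x)
describe(::Type{Count}, x) = "count with value $x"
describe(::Type{Continuous}, x) = "continuous measurement $x"

typeof(42)    # Int64 (on a 64-bit system)
scitype(42)   # Count
42 isa Count  # false: 42's only concrete type is Int64
describe(42)  # "count with value 42"
describe(2.5) # "continuous measurement 2.5"
```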
Regarding the bounds of Continuous: I see, that's a bit troubling. Should there hence be an optional step of user input (e.g., triggering a type conversion to CategoricalArray)?
Now implemented as the basis of an overhaul of trait functions (metadata). See Scientific Data Types and the updated Adding New Models guide for details.
I think MLJ should have clear conventions regarding the representation of the various "scientific" data types (continuous, ordered factor, and so forth). To this end, I have drafted this document and invite collaborators' responses.
Related: #81