Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

middle is undefined for Date and DateTime (so median does not work) #47

Open
jameslefevre opened this issue Jul 15, 2020 · 7 comments
Open

Comments

@jameslefevre
Copy link

I have this working code:

using Dates
import Statistics.middle
middle(d1::Date, d2::Date) = Date(Dates.UTD(Int(round(middle(Dates.value(d1),Dates.value(d2))))))
middle(dt1::DateTime, dt2::DateTime) = DateTime(Dates.UTM(Int(round(middle(Dates.value(dt1),Dates.value(dt2))))))

But I wanted to quickly check that this is an appropriate approach before putting in a pull request. I saw open pull request #28 (relax type definition of middle), but I don't think helps - Date and DateTime should not subtype either Real or Number.

@krisztianpinter
Copy link

the problem runs deeper. apparently Statistics.jl assumes vector spaces, but most things work in affine spaces as well. in particular mean is the same concept as barycenter in affine spaces. DateTime forms an affine space, and thus mean should be defined for it, but it isn't as it is implemented now.

adding a specialization for DateTime to mean (and middle) seems like a patchwork. the true solution would be an implementation of mean which is affine space aware. consider:

mean t = t_base + sum (t_i - t_base) / n

it raises a number of issues, namely the choice of t_base, which is arbitrary in theory, but might introduce practical differences. we also don't want to work with numbers in this way for performance reasons, so some kind of Holy traits might come to the rescue.

@nalimilan
Copy link
Member

This issue is about median so that's totally independent from mean.

@quinnj Do you think the definition proposed in the OP is correct? Or would it make sense to define middle in terms of x + (y-x)/2?

@quinnj
Copy link
Member

quinnj commented Aug 18, 2020

OP definition seems.....fine. It's just treating Date/DateTime as raw integer values, then constructing directly from results. I think you can just do round(Int, ...) though instead of Int(round(...)).

But yeah, seems fine to me.

@nalimilan
Copy link
Member

nalimilan commented Aug 18, 2020

Cool. Feel free to make a PR @jameslefevre.

@nalimilan
Copy link
Member

To clarify what this addition would entail: it would mean that middle on Dates chooses the "middle" date allowing for some rounding. This would differ from date arithmetic, where e.g. x + (y-x)/2 fails if the result isn't an integer instead of returning a rounded result.

This may be OK though and would kind of justify the existence of middle. We would also need a more general version of middle which would be used by quantile, taking an argument γ and returning a rounded x + γ*(y-x) as in
https://github.com/JuliaLang/Statistics.jl/blob/74897fed33700dba92578aa0fefef5b99ba16086/src/Statistics.jl#L1006

@jariji
Copy link

jariji commented Mar 18, 2023

Or would it make sense to define middle in terms of x + (y-x)/2?

That makes most sense to me. AFAICT it's equivalent to the current definition (x + y) / 2 except it generalizes to affine spaces like dates and Fahrenheits.


I'm also not really convinced rounding is the right thing. How do we know whether to round up or down? InexactError would be safer so the user can be intentional about the decision when it's unclear what they meant, and that's what will happen automatically using the x + (y-x)/2 implementation. I'd propose middle(today, tomorrow) error and middle(yesterday, tomorrow) == today.

This will also apply for middle('a', 'b'), which is ambiguous.

@nalimilan
Copy link
Member

x + (y-x)/2 probably makes sense in general, but for Date and DateTime without rounding an error will be thrown most of the time so I'm not sure it would be useful for that use case. Maybe it would be useful for other cases though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants