Average of Dates #54542
Comments
What should this return for |
Workaround: Rounding is mandatory to avoid |
To answer your question about the average of Joking apart, following the principle of this discussion about average of integers [return a Float and let the user explicitly decide about spurious accuracy], I'd say that the mean of 2024-05-22 and 2024-05-21 is
which (extra Julia) I'd round to 2024-05-21 because t1 is in fact somewhere between 2024-05-21T00:00:00 and 2024-05-21T23:59:59 and t2 between 2024-05-22T00:00:00 and 2024-05-22T23:59:59. |
Another line of thought: For two values, their mean is their "middle". For 2024-05-21 and 2024-05-22, their middle seems to be midnight, i.e. 2024-05-22T00:00:00, so their mean should be 2024-05-22. No, it shouldn't. IMHO, we shouldn't define it, as it is unclear what it should be. Let's not define |
julia> convert(Date, Day(round(mean(Dates.value.([Date("2000-01-01"), Date("2004-01-01")])))))
2001-12-31
I'm certain someone would consider this a bug and expect 2002-01-01. Did I mention I'd prefer not to define |
I'm a little unclear what the definition of average/mean is when you can neither add nor divide by a count. Median and extrema seem well-defined to me, but mean feels a lot iffier... |
Generations of astronomers did it, however. After all, for them time is just a number, the Julian day number. Personally, I need it for a regression y=f(t) with t the time. And from time to time, I also need it when I have a bunch of events that are supposed to occur at about the same time, but are known to be normally distributed. It is just like temperature: adding or dividing by a count has no meaning, but you find average temperatures in any newspaper. |
Computationally, it is also easy to define in a rigorous way, because while Date cannot be added, delta days can be. And we can conveniently pick day 0 for the arithmetic, which makes it seem like our Date kind and Days kind are almost alike in units (although in strict mathematics, they are not):
|
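The offset idea described above can be sketched as follows. This is not the commenter's original code: `mean_offset_days` and its `ref` keyword are hypothetical names, and the sketch assumes only that subtracting two `Date`s yields a `Day` period. It returns the reference date together with a fractional number of days, so no rounding rule is chosen on the caller's behalf.

```julia
using Dates, Statistics

# Hypothetical helper (not part of Dates or Statistics):
# express the mean as a reference date plus a fractional day offset.
function mean_offset_days(dates::AbstractVector{Date}; ref::Date = first(dates))
    offsets = Dates.value.(dates .- ref)   # Day periods -> Int64 day counts
    return ref, mean(offsets)              # (reference, fractional days after it)
end

dates = [Date(2024, 5, 21), Date(2024, 5, 22)]
ref, offset = mean_offset_days(dates)      # offset is 0.5 days after 2024-05-21

# Rounding remains an explicit decision of the caller:
ref + Day(floor(Int, offset))              # 2024-05-21
```

The choice of reference only shifts the offsets: any other `ref` gives the same point in time once the offset is added back.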
I think the decision depends on the rules for dividing
julia> Day(2)/2
1 day
julia> Day(1)/2
ERROR: InexactError: Int64(0.5)
Imho it would be best if we could define precise semantics for the date type and for |
Totally agree with @vtjnash: a Date is a point in Time, and Time is continuous. |
jariji: I suggest solution 4, although I usually use solution 3, rounding down. The reason is that a Date like 2024-05-25 means any point in time between 2024-05-25 midnight and 2024-05-26 midnight. So a Date refers to a point in time which is (on average) 12 hours after its literal value. And the mean of a bunch of Dates will be on average 12 hours after However, solution 4 is in accordance with the Julia philosophy: let the rounding rule be explicitly stated by the user. Solutions 1 and 2 are just painful. |
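A minimal sketch of what "let the user state the rounding rule" could look like. The name `mean_date` is hypothetical and is not an existing method in Dates or Statistics; the sketch assumes only `Dates.value`, `round` with a `RoundingMode`, and the `convert(Date, ::Day)` method already used elsewhere in this thread.

```julia
using Dates, Statistics

# Hypothetical helper, not part of Dates or Statistics.
# The rounding mode is a required argument, so no tie-breaking rule is applied silently.
function mean_date(dates, r::RoundingMode)
    days = mean(Dates.value.(dates))                  # Float64 count of Rata Die days
    return convert(Date, Day(round(Int64, days, r)))  # back to a calendar date
end

d = [Date(2024, 5, 21), Date(2024, 5, 22)]
mean_date(d, RoundDown)   # 2024-05-21
mean_date(d, RoundUp)     # 2024-05-22
```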
Oh yes, there is a leap year at one end and not at the other, so 2001-12-31 is in fact correct, just as
Maybe I should rephrase the issue title as Average of Time (time not being a Julia type). |
Rounding is mandatory to avoid Workaround: mean_dates(dates) = convert(DateTime, Millisecond(mean(Dates.value.(dates)))) |
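As written, the workaround passes a floating-point mean straight to `Millisecond`, which only succeeds when that mean happens to be a whole number of milliseconds. A hedged variant with the rounding step made explicit might look like this; `mean_datetime` is a hypothetical name, and the sketch assumes the inputs are `DateTime` values, so that `Dates.value` returns milliseconds.

```julia
using Dates, Statistics

# Hypothetical variant of the workaround with an explicit rounding step.
# Assumes DateTime inputs: Dates.value then returns Rata Die milliseconds.
mean_datetime(ts) =
    convert(DateTime, Millisecond(round(Int64, mean(Dates.value.(ts)))))

ts = [DateTime(2024, 5, 21), DateTime(2024, 5, 22)]
mean_datetime(ts)   # 2024-05-21T12:00:00
```

Two instants one millisecond apart have a mean that falls on a half millisecond; with the `round` call this returns a `DateTime` instead of throwing.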
So... a Date is a specific but unknown (within the day) point in time? To me, a date is rather an interval (usually of 24 hours length).
Ok, but why then do I get 2006-01-01 for the mean of 2004-01-01 and 2008-01-01? Same situation wrt leap year, no? If I look at this:
julia> for y in 2000:2020
d1 = Date(y)
d2 = Date(y+4)
m = convert(Date, Day(round(mean(Dates.value.([d1, d2])))))
println("Mean of $(d1) and $(d2) is $(m)")
end
Mean of 2000-01-01 and 2004-01-01 is 2001-12-31
Mean of 2001-01-01 and 2005-01-01 is 2003-01-01
Mean of 2002-01-01 and 2006-01-01 is 2004-01-02
Mean of 2003-01-01 and 2007-01-01 is 2004-12-31
Mean of 2004-01-01 and 2008-01-01 is 2006-01-01
Mean of 2005-01-01 and 2009-01-01 is 2007-01-02
Mean of 2006-01-01 and 2010-01-01 is 2008-01-01
Mean of 2007-01-01 and 2011-01-01 is 2009-01-01
Mean of 2008-01-01 and 2012-01-01 is 2009-12-31
Mean of 2009-01-01 and 2013-01-01 is 2011-01-01
Mean of 2010-01-01 and 2014-01-01 is 2012-01-02
Mean of 2011-01-01 and 2015-01-01 is 2012-12-31
Mean of 2012-01-01 and 2016-01-01 is 2014-01-01
Mean of 2013-01-01 and 2017-01-01 is 2015-01-02
Mean of 2014-01-01 and 2018-01-01 is 2016-01-01
Mean of 2015-01-01 and 2019-01-01 is 2017-01-01
Mean of 2016-01-01 and 2020-01-01 is 2017-12-31
Mean of 2017-01-01 and 2021-01-01 is 2019-01-01
Mean of 2018-01-01 and 2022-01-01 is 2020-01-02
Mean of 2019-01-01 and 2023-01-01 is 2020-12-31
Mean of 2020-01-01 and 2024-01-01 is 2022-01-01
I do believe this makes perfect sense in some contexts - but also that it may be rather confusing in others. (And I certainly couldn't predict these results.) |
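One likely explanation for the pattern in this loop: every pair is 1461 days apart, so each mean lands exactly on a half day, and `round` defaults to `RoundNearest`, which sends ties to the even integer. Whether the result looks "early" or "late" then depends on the parity of the underlying day number, not on where the leap year sits. The sketch below compares the default with `RoundDown` for three of the pairs; it uses only calls already shown in this thread plus the `RoundingMode` argument of `round`.

```julia
using Dates, Statistics

for (y1, y2) in [(2000, 2004), (2002, 2006), (2004, 2008)]
    m = mean(Dates.value.([Date(y1), Date(y2)]))   # ends in .5 for each of these pairs
    nearest = round(Int64, m)                      # RoundNearest: ties go to the even integer
    down    = round(Int64, m, RoundDown)           # always the earlier of the two days
    println(m, " -> ", convert(Date, Day(nearest)), " (ties to even), ",
            convert(Date, Day(down)), " (RoundDown)")
end
```

With `RoundDown` each result is exactly 730 days after the first date, which makes the output predictable at the cost of a systematic half-day bias.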
I think it would be a very bad choice to make the mean of the mean of integers is non-integral... |
That's why I said (but nobody seems to have read it): the best strategy would be to return a |
I'd be ok with defining |
You hit what Julia calls the calendrical vs temporal nature of time (see doc). Since Babylonian astronomers, you record time on a calendar and compute time on |
Something like ?
Example:
|
Unitful has the same problem with regard to °C. It's interesting to consider that a generic fallback definition along the lines of In any case, it's probably more pragmatic to define specialized methods for |
in case the reference is useful, dropping the link to the |
Thanks @adienes, exactly what was expected. |
That you cannot add Dates and cannot divide a Dates by an integer seems perfectly normal. However, computing the mean of Dates is well founded and sometimes much needed. Example: