pairwise functions #123
Comments
While I'm a little concerned about (2), I'm not sure I agree about (1). Why should DataVec's version of Have you examined the low-level bits of diff(dv)[1] and diff(v)[1]? I'm not totally sure that these are floating point errors so much as a dissimilarity in how DataVecs are being printed relative to Vectors. |
#1 is from this in the documentation:
Based on how I read this, I was expecting an insertion of #2. I didn't think to look at the low-level bits (and, quite frankly, don't know how to), but I will check into this. Ideally, though, we'd like to print something neat. |
I do agree that diff(dv) should produce a vector of the same length as the And I agree with John, that the printing thing is almost certainly a On Mon, Dec 17, 2012 at 12:11 AM, milktrader notifications@gh.neting.ccwrote:
|
R gives you NAs for free:
> NAfive = matrix(c(NA,2,3,4,5))
> NAfive
[,1]
[1,] NA
[2,] 2
[3,] 3
[4,] 4
[5,] 5
> SMA(NAfive,3)
[1] NA NA NA 3 4
Duplicating this in Julia, with a little dance around getting NAs into a vector ...
julia> five = DataVec([NA,2,3,4,5])
no promotion exists for NAtype and Int64
in promote_type at promotion.jl:14
in promote_type at promotion.jl:8
in cat at abstractarray.jl:655
in vcat at abstractarray.jl:668
julia> five = DataVec([1,2,3,4,5])
5-element Int64 DataVec
1
2
3
4
5
julia> NAfive = five;
julia> NAfive[1] = NA
NA
julia> NAfive
5-element Int64 DataVec
NA
2
3
4
5
julia> moving_average(NAfive,3)
3-element Any Array:
NA
3.0
4.0
SMA (a moving average function in TTR) returns an equal-length matrix, while applying my moving_average function (basically a one-liner) in Julia results in a truncated array. //Need to understand Julia's printing routines |
Yep, same deal. It does seem to me that we should change those defaults to One note on the constructor. This should work:
Note no parens. There's a cute trick with referencing into types that lets On Mon, Dec 17, 2012 at 8:32 AM, milktrader notifications@gh.neting.ccwrote:
|
nice constructor tip, thanks! |
I'm on the train, so I may take time to respond, but I want to voice my disagreement again. To me it's much more important that behavior within Julia is consistent than that we seek agreement with R. I think it's really confusing if diff behaves so differently depending on the type of the inputs. Imagine that you test out a looping algorithm using vectors. You try to make it work with DataVecs and suddenly it's broken? That seems really terrible to me. In general, my priorities in order are:
The consistency argument also answers the printing question for me: DataVec's should print like vectors. |
I see your point. This is a case for options, maybe. How about:
When writing looping algorithms, yes, you'd want diff(DV) to act like On Mon, Dec 17, 2012 at 9:02 AM, John Myles White
|
@HarlanH Now we're cooking with fire! I'm always up for options. I'll try to get to making that change soon. @milktrader, I'd like to rewrite the manual so that there's no ambiguity. Would the following be better?
Thinking about the consistency argument more, my general principle is this: functions on DataVec's should, by default, behave as if you had called the standard Julia function on the inputs in a way that obeys the |
Sounds like a great solution. I was willing to introduce NAs at the function level, but that's definitely not ideal. I can see the point that consistency with Julia is primary. Btw, is there an Here's my version:
function moving_average(x,n)
    [sum(x[i:i+(n-1)])/n for i=1:length(x)-(n-1)]
end |
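A length-preserving variant of the one-liner above could look like the sketch below. It is only an illustration: the name `padded_moving_average` is hypothetical, and it is written for current Julia, where `missing` plays the role that `NA` plays in the DataFrames version discussed in this thread.

```julia
# Hypothetical helper, not part of DataFrames: a trailing moving average
# that keeps the input length by leaving the first n-1 slots as `missing`.
# Assumes an ordinary 1-based vector.
function padded_moving_average(x::AbstractVector, n::Integer)
    1 <= n <= length(x) || throw(ArgumentError("n must be between 1 and length(x)"))
    out = Vector{Union{Missing, Float64}}(missing, length(x))
    for i in n:length(x)
        # average of the n-element window that ends at position i
        out[i] = sum(x[i-n+1:i]) / n
    end
    return out
end

r = padded_moving_average([1, 2, 3, 4, 5], 3)
println(r)  # first two entries are missing, then 2.0, 3.0, 4.0
```

Padding inside the function keeps the result aligned with the input, so it can be attached as a new column without any index arithmetic.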
I don't think there is. I'm always tempted to try to get those functions added to Julia, but the core team is trying to keep the core language small. The reason I mention that is that my general strategy is to define something like Regarding our earlier conversation, you can check out the raw bits using the
As suspected, the real bug isn't the function |
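For readers on current Julia, one way to inspect raw bits is `bitstring`, which is in Base. Whether that corresponds to the function mentioned above is not something this thread confirms, and the values below are purely illustrative. The sketch shows how to tell a genuine difference in stored bits apart from a difference in how a value is printed.

```julia
# Illustrative values only; `bitstring` and `diff` are Base functions in current Julia.
v = [1.0, 1.1]
d = diff(v)[1]            # computed difference, close to 0.1

println(bitstring(d))     # raw IEEE 754 bits of the computed value
println(bitstring(0.1))   # raw bits of the literal 0.1

# If the bit patterns differ, the values really are different numbers;
# if they match, any visible discrepancy comes from printing.
println(d == 0.1)
println(isapprox(d, 0.1))
```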
Aha, thanks for the raw bits information. I'm planning a technical analysis package that would have functions like |
I want to point out that diff and moving average are fundamentally I also want to +1 JMW's priorities for DataFrames. On Monday, December 17, 2012, milktrader wrote:
|
On Monday, December 17, 2012, Harlan Harris wrote:
I have to agree with John, this strikes me as a really weird behavior.
This may actually be a technology issue that became baked in: until the |
Adding new things to Base is ok, but there's definitely tension between wanting to keep it small and wanting to have lots of useful stuff just available. It's easier to add things later than get rid of them, so we're biased towards conservatism. One nice thing about the way |
Maintaining length without a lot of counting is pretty important for On Mon, Dec 17, 2012 at 3:13 PM, Stefan Karpinski
|
If you view diff as doing a moving average with weights [-1,1,0] then putting the NA up front makes perfect sense, but I think that's at odds with how Matlab views diff. Maybe different functions make sense? A flexible mva function that takes a centered window of weights or just a number? |
Just to be clear, this is R's
So I'm thinking that we should use an entirely different function name for the desired behavior. |
John and I were speculating last night that R might supply the NAs if you apply diff to a data frame, but I can't get that to work:
> df = data.frame(foo=c(2,3,1))
> diff(df)
data frame with 0 columns and 3 rows
> diff(df$foo)
Admittedly my R is very rusty these days (it's hard to believe there was even a time it was pretty good), but am I missing something there? Are there situations where R does actually supply the NAs for you? I also note that diff with negative lag isn't defined:
> diff(v,lag=-1)
Error in diff.default(v, lag = -1) :
'lag' and 'differences' must be integers >= 1
Matlab does the same thing. Unfortunately, however, Matlab takes a second argument to diff to mean something rather different than R: in R I would propose a
[ 1 <= i-k <= length(v) ? v[i]-v[i-k] : NA for i=1:length(v) ]
This definition kind of makes me think that it would be very convenient for a lot of things if indexing off the end of a DataVec or DataFrame returned NAs. With that behavior, that could be written as just:
[ v[i]-v[i-k] for i=1:length(v) ]
which is simple enough that it almost doesn't even merit its own function. |
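The bounds-checked comprehension above can be wrapped into a small runnable function. This is a sketch for current Julia, with `missing` standing in for the thread's `NA`; the name `deltas` and the default `k = 1` are assumptions made for illustration, not an existing DataFrames function.

```julia
# Sketch only: lagged difference that keeps the input length and uses
# `missing` wherever element i has no partner at position i-k.
function deltas(v::AbstractVector, k::Integer = 1)
    n = length(v)
    return [1 <= i - k <= n ? v[i] - v[i - k] : missing for i in 1:n]
end

println(deltas([2, 3, 1]))       # missing, 1, -2
println(deltas([2, 3, 1], -1))   # -1, 2, missing
```

Because the bounds check is symmetric, a negative `k` simply moves the padding to the other end of the result.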
I'm down with calling it deltas() and using the second argument for lag and Would it make sense to have a version for non-DataVecs that returns a Would it make sense to have another function with the Matlab n-th order On Tue, Dec 18, 2012 at 6:53 AM, Stefan Karpinski
|
I think that |
hmm, I play more with
> class(ttrc)
[1] "data.frame"
> class(spx)
[1] "xts" "zoo"
> ttrc$diff = diff(ttrc$Close)
Error in `$<-.data.frame`(`*tmp*`, "diff", value = c(0.0299999999999998, :
replacement has 5549 rows, data has 5550
> spx$diff = diff(Cl(spx))
> head(spx, 2)
GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
1970-01-02 92.06 93.54 91.79 93.00 8050000 93.00
1970-01-05 93.00 94.25 92.53 93.46 11490000 93.46
diff
1970-01-02 NA
1970-01-05 0.46
I'm all for leaving For
> spx$lagPOS = lag(Cl(spx))
> spx$lagNEG = lag(Cl(spx), k=-1)
> head(spx[,c(4,7:9)],3)
GSPC.Close diff lagPOS lagNEG
1970-01-02 93.00 NA NA 93.46
1970-01-05 93.46 0.46 93.00 92.82
1970-01-06 92.82 -0.64 93.46 92.63 |
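The positive and negative lags in the xts output above can be mimicked with the same padding idea. The sketch below is for current Julia and uses `missing` in place of `NA`; the function name `lagged` is hypothetical, and the closing prices are the ones shown in this thread.

```julia
# Hypothetical helper: shift a series by k positions, keeping its length.
# k > 0 looks back (a lag), k < 0 looks ahead (a lead).
function lagged(v::AbstractVector, k::Integer = 1)
    n = length(v)
    return [1 <= i - k <= n ? v[i - k] : missing for i in 1:n]
end

closes = [93.00, 93.46, 92.82, 92.63]   # GSPC.Close values quoted above
println(lagged(closes))       # missing, 93.0, 93.46, 92.82
println(lagged(closes, -1))   # 93.46, 92.82, 92.63, missing
```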
That seems at odds with what everything else does, so while it's convenient, I think it would be ill-advised to change diff to behave the way only the xts-specific diff does. What does |
> foo = diff(Cl(spx))
> head(foo, 2)
GSPC.Close
1970-01-02 NA
1970-01-05 0.46 |
Admittedly, R has sort of a patchwork approach to
> ttrc$lagPOS = lag(ttrc$Close)
> ttrc$lagNEG = lag(ttrc$Close, k=-1)
> ttrc$LagPOS = Lag(ttrc$Close)
> ttrc$LagNEG = Lag(ttrc$Close, k=-1)
Error in FUN(X[[1L]], ...) : k must be a non-negative integer
> head(ttrc, 2)
Date Open High Low Close Volume lagPOS lagNEG Lag.1
1 1985-01-02 3.18 3.18 3.08 3.08 1870906 3.08 3.08 NA
2 1985-01-03 3.09 3.15 3.09 3.11 3099506 3.11 3.11 3.08
so |
I thought the best idea was to let |
That seems pretty reasonable to me. |
I'm confused here. If we're going to keep diff for non-AbstractDataVecs, No? On Tue, Dec 18, 2012 at 9:05 AM, Stefan Karpinski
|
Actually, I think @milktrader is right – just add a |
Not sure where this code is inserted. I'd be happy to try it out. From an
|
Please add optional lags at the end of operators.jl. -- John On Jan 4, 2013, at 12:59 PM, milktrader notifications@github.com wrote:
|
Okay, I've checked out a branch called padding and will investigate how this will work. Thanks for the file name, good start. |
I'm still looking for the elegant solution, but I have hacked out an interim solution in the meantime.
julia> spx = read_stock("data/spx.csv");
julia> head(spx, 3)
3x7 DataFrame:
Date Open High Low Close Volume Adj Close
[1,] 1970-01-02 92.06 93.54 91.79 93.0 8050000 93.0
[2,] 1970-01-05 93.0 94.25 92.53 93.46 11490000 93.46
[3,] 1970-01-06 93.46 93.81 92.13 92.82 11460000 92.82
julia> wat = uoo(spx, 4, "Low");
julia> head(wat)
6x8 DataFrame:
Date Open High Low Close Volume Adj Close ma.4
[1,] 1970-01-02 92.06 93.54 91.79 93.0 8050000 93.0 NA
[2,] 1970-01-05 93.0 94.25 92.53 93.46 11490000 93.46 NA
[3,] 1970-01-06 93.46 93.81 92.13 92.82 11460000 92.82 NA
[4,] 1970-01-07 92.82 93.38 91.93 92.63 10010000 92.63 92.095
[5,] 1970-01-08 92.63 93.47 91.99 92.68 10670000 92.68 92.145
[6,] 1970-01-09 92.68 93.25 91.82 92.4 9380000 92.4 91.9675
The |
I have a working solution to the original problem and wonder if we should close this long-winded issue and open a feature request for My solution takes care of the padding inside the function. Any refactor or other tips are welcome. |
Please do close this issue. We can iterate on your solution in a separate issue. |
Reincarnated as Padding with NAs (needs a feature request label) |
Using diff with a Vector:
This is the behavior we expect. But the goal of DataVec (from the docs as I understand it) is to insert an NA in the first row and return a 6-element Float64 DataVec.
So two issues. 1) we don't get the NAs. 2) we get floating point rounding errors.
What we really would like to see is this: