pairwise functions #123

Closed
milktrader opened this issue Dec 17, 2012 · 36 comments

@milktrader

Using diff with a Vector:

julia> v
6-element Float64 Array:
 44.38
 44.78
 44.95
 44.75
 45.05
 45.29

julia> diff(v)
5-element Float64 Array:
  0.4 
  0.17
 -0.2 
  0.3 
  0.24

This is the behavior we expect. But the goal of DataVec (from the docs as I understand it) is to insert an NA in the first row and return a 6-element Float64 DataVec.

julia> dv = DataVec(v)
6-element Float64 DataVec
 44.38
 44.78
 44.95
 44.75
 45.05
 45.29

julia> diff(dv)
5-element Float64 DataVec
 0.3999999999999986
 0.1700000000000017
 -0.20000000000000284
 0.29999999999999716
 0.240000000000002

So two issues. 1) we don't get the NAs. 2) we get floating point rounding errors.

What we really would like to see is this:

julia> diff(dv)
6-element Float64 DataVec:
  NA
  0.4 
  0.17
 -0.2 
  0.3 
  0.24
@johnmyleswhite
Contributor

While I'm a little concerned about (2), I'm not sure I agree about (1). Why should DataVec's version of diff always put an NA at the front? And why would it do that in spite of the different definition for Vector's? That seems kind of confusing.

Have you examined the low-level bits of diff(dv)[1] and diff(v)[1]? I'm not totally sure that these are floating point errors as much as a dissimilarity in how DataVec's are being printed relative to Vector's.

@milktrader
Author

#1 is from this in the documentation:

Functions that operate on pairs of entries of a Vector work on DataVec’s and insert NA where it would be produced by other operator rules:

Based on how I read this I was expecting an insertion of NA in the front, similar to how R treats this. For example, a 5-day moving average doesn't make sense for the first four days, so you insert an NA there. This acts as a placeholder as much as anything, to ensure the DataVec of 5-day moving averages is the same length as the original DataVec that is being averaged.

#2, I didn't think to look at low-level bits (and don't know how to quite frankly) but will check into this. Ideally though we'd like to print something neat.

@HarlanH
Contributor

HarlanH commented Dec 17, 2012

I do agree that diff(dv) should produce a vector of the same length as the
input, at least by default. That's a small change worth doing.

And I agree with John, that the printing thing is almost certainly a
printing difference, not a rounding error. Julia's printing routines are
much more likely to give you a bunch of digits than R's are. I believe that
the standard routines for floating point display generally give you enough
digits so that if you were to read the decimal back into a floating point,
you'd get the same bits. R doesn't have that philosophy.
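
A minimal sketch of the round-trip property described above. It assumes a current Julia 1.x release rather than the 2012 build used in this thread, and it uses a plain Float64 instead of a DataVec element.

```julia
using Printf

d = 44.78 - 44.38            # first difference from the example in this thread
s = repr(d)                  # default printing: enough digits to round-trip

println(s)
println(parse(Float64, s) === d)   # true: the printed decimal recovers the same bits
println(d == 0.4)                  # false: the stored value is not exactly 0.4
println(@sprintf("%.2f", d))       # fixed two decimals, closer to a short R-style display
```

The long digit string is the price of being able to read the value back exactly; a fixed-width format hides the difference but does not change the stored number.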


@milktrader
Author

R gives you NAs for free

> NAfive = matrix(c(NA,2,3,4,5))
> NAfive
     [,1]
[1,]   NA
[2,]    2
[3,]    3
[4,]    4
[5,]    5
> SMA(NAfive,3)
[1] NA NA NA  3  4

Duplicating this in Julia, with a little dance around getting NAs into a vector ...

julia> five = DataVec([NA,2,3,4,5])
no promotion exists for NAtype and Int64
 in promote_type at promotion.jl:14
 in promote_type at promotion.jl:8
 in cat at abstractarray.jl:655
 in vcat at abstractarray.jl:668

julia> five = DataVec([1,2,3,4,5])
5-element Int64 DataVec
 1
 2
 3
 4
 5

julia> NAfive = five;

julia> NAfive[1] = NA
NA

julia> NAfive
5-element Int64 DataVec
 NA
 2
 3
 4
 5

julia> moving_average(NAfive,3)
3-element Any Array:
  NA
 3.0
 4.0

SMA (a moving average function in TTR) returns an equal-length matrix, while applying my moving_average function (basically a one-liner) in Julia results in a truncated array.

//Need to understand Julia's printing routines

@HarlanH
Contributor

HarlanH commented Dec 17, 2012

Yep, same deal. It does seem to me that we should change those defaults to
match R.

One note on the constructor. This should work:

five = DataVec[NA,2,3,4,5]

Note no parens. There's a cute trick with referencing into types that lets
you use them as constructors... See datavec.jl:138 for the code...
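
A rough sketch of how that kind of trick can work, written for current Julia 1.x. `MaybeVec` is a hypothetical type invented for illustration; it is not the DataVec type, and this is not the code at datavec.jl:138.

```julia
# Hypothetical container, not the DataVec type from DataFrames.
struct MaybeVec
    data::Vector{Union{Missing,Int}}
end

# `T[a, b, c]` lowers to `getindex(T, a, b, c)`, so a method on the type itself
# turns the bracket syntax into a constructor.
Base.getindex(::Type{MaybeVec}, vals...) = MaybeVec(Union{Missing,Int}[vals...])

five = MaybeVec[missing, 2, 3, 4, 5]   # note: no parentheses
println(five.data)
```

Because the values arrive as separate arguments instead of a pre-built array, no promotion between the missing marker and the integers is attempted first.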


@milktrader
Author

nice constructor tip, thanks!

@johnmyleswhite
Contributor

I'm on the train, so I may take time to respond, but I want to voice my disagreement again. To me it's much more important that behavior within Julia is consistent than that we seek agreement with R. I think it's really confusing if diff behaves so differently depending on the type of the inputs. Imagine that you test out a looping algorithm using vectors. You try to make it work with DataVec's and suddenly it's broken? That seems really terrible to me.

In general, my priorities in order are:

  • Consistency: The transition between a Data* object and * object should be seamless. diff(DataVec[1, 2]) and diff(failNA(DataVec[1, 2])) should not have different semantics.
  • Precedent: When Julia doesn't enforce certain behaviors, strive to emulate the majority opinion of previous programming languages. For DataFrames, that's often just the vote of R, but we also need to think about Matlab, Python, Ruby, etc.

The consistency argument also answers the printing question for me: DataVec's should print like vectors.

@HarlanH
Contributor

HarlanH commented Dec 17, 2012

I see your point. This is a case for options, maybe. How about:

diff(DataVec[1,2], @options preserve_length=true) # defaults to false

When writing looping algorithms, yes, you'd want diff(DV) to act like
diff(Vector), but when writing vectorized operations on DataFrames, you
really, really want the operations to return same-length objects, with NAs
in the correct places.


@johnmyleswhite
Contributor

@HarlanH Now we're cooking with fire! I'm always up for options. I'll try to get to making that change soon.

@milktrader, I'd like to rewrite the manual so that there's no ambiguity. Would the following be better?

  • Several functions like diff and sma operate on small windows of a DataVec. By default these functions call the same operation on the underlying vector and then insert NA's where they would be induced by the rules for NA arithmetic. If you specify the option @options preserve_length=true these functions will left-pad the results with NA's to preserve length.

Thinking about the consistency argument more, my general principle is this: functions on DataVec's should, by default, behave as if you had called the standard Julia function on the inputs in a way that obeys the NA arithmetic rules. There's no NA to produce an NA for diff by default. Preserve length essentially imputes missing entries as NA's at the start of the vector and then does standard arithmetic.

@milktrader
Author

Sounds like a great solution. I was willing to introduce NAs at the function level but that's definitely not ideal.

I can see the point that consistency with Julia is primary.

Btw, is there an sma function?

Here's my version:

function moving_average(x,n)
  [sum(x[i:i+(n-1)])/n for i=1:length(x)-(n-1)]
end
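
For comparison, a length-preserving variant might look like the sketch below. It assumes current Julia 1.x and uses `missing` in place of NA; the name `moving_average_padded` is made up here and is not part of DataFrames.

```julia
# Trailing moving average that keeps the input length.
# The first n-1 slots have no full window, so they stay `missing`.
function moving_average_padded(x::AbstractVector, n::Integer)
    out = Vector{Union{Missing,Float64}}(missing, length(x))
    for i in n:length(x)
        # `sum` propagates `missing`, so a window containing one yields `missing`
        out[i] = sum(x[i-n+1:i]) / n
    end
    return out
end

println(moving_average_padded([1, 2, 3, 4, 5], 3))
println(moving_average_padded([missing, 2, 3, 4, 5], 3))
```

On the second input the result has three missing entries followed by 3.0 and 4.0, which lines up with the SMA(NAfive,3) output shown earlier in the thread.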

@johnmyleswhite
Contributor

I don't think there is. I'm always tempted to try to get those functions added to Julia, but the core team is trying to keep the core language small. The reason I mention that is that my general strategy is to define something like sma on Vector{T} first, then induce a definition for DataVec{T} using a macro. I'll work a bit on your draft function and try to incorporate it.

Regarding our earlier conversation, you can check out the raw bits using the bits function:

load("DataFrames"); using DataFrames

v = [44.38, 44.78, 44.95, 44.75, 45.05, 45.29]
dv = DataVec(v)

bits(diff(v)[1])
bits(diff(dv)[1])

As suspected, the real bug isn't the function diff, but the fact that we're not emulating Julia's approach to printing vectors correctly. I'll also get to fixing that soon.
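
The same check can be sketched in current Julia 1.x, where `bits` has been renamed `bitstring`. A vector whose element type allows `missing` stands in for the DataVec here, which is an assumption and not the original type.

```julia
v  = [44.38, 44.78, 44.95, 44.75, 45.05, 45.29]
mv = convert(Vector{Union{Missing,Float64}}, v)   # stand-in for DataVec(v)

b1 = bitstring(diff(v)[1])
b2 = bitstring(diff(mv)[1])

println(b1)
println(b2)
println(b1 == b2)   # identical bits mean any visible difference is only in printing
```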

@milktrader
Author

Aha, thanks for the raw bits information.

I'm planning a technical analysis package that would have functions like sma in it. I don't fully comprehend and appreciate the type scaffolding built around functions, yet. I know it contributes quite a bit to speed and efficiency and is worth learning.

@StefanKarpinski
Member

I want to point out that diff and moving average are fundamentally
different. Why does diff produce an NA at the front, not the back? There's
no real reason. It's just because the diff values are arbitrarily
mapped onto the original indices that way. Really, the indices of the diff
correspond to the spaces between values in the original vector, and there
are n-1 of them, not n. With moving average, on the other hand, you map
each value to the average of it and its neighbors. In particular, there is
a natural association of averages to the original vector's indices, and at
the fringes of the vector – at both the front and the back – you may
not have enough data to compute that linear combination, so NA may be a
good conservative value to use (taking an average of the available values
may be a good choice too).

I also want to +1 JMW's priorities for DataFrames.


@StefanKarpinski
Member

On Monday, December 17, 2012, Harlan Harris wrote:

> I do agree that diff(dv) should produce a vector of the same length as the
> input, at least by default. That's a small change worth doing.

I have to agree with John, this strikes me as a really weird behavior.

> And I agree with John, that the printing thing is almost certainly a
> printing difference, not a rounding error. Julia's printing routines are
> much more likely to give you a bunch of digits than R's are. I believe that
> the standard routines for floating point display generally give you enough
> digits so that if you were to read the decimal back into a floating point,
> you'd get the same bits. R doesn't have that philosophy.

This may actually be a technology issue that became baked in: until the
publication of the Grisu algorithm, I don't think anyone knew how to do
guaranteed minimal float printing efficiently. Certainly when R was created
no one knew how to do it and just printing a fixed number of digits with
rounding was standard.

@StefanKarpinski
Member

Adding new things to Base is ok, but there's definitely tension between wanting to keep it small and wanting to have lots of useful stuff just available. It's easier to add things later than get rid of them, so we're biased towards conservatism. One nice thing about the way using works is that you can safely add new exports without breaking other people's code.

@HarlanH
Contributor

HarlanH commented Dec 17, 2012

Maintaining length without a lot of counting is pretty important for
working with columns in DataFrames. I think we should support it easily, if
not by default. It's weird mathematically, yes, but it's not weird
practically.


@StefanKarpinski
Member

If you view diff as doing a moving average with weights [-1,1,0] then putting the NA up front makes perfect sense, but I think that's at odds with how Matlab views diff. Maybe different functions make sense? A flexible mva function that takes a centered window of weights or just a number?

@johnmyleswhite
Contributor

Just to be clear, this is R's diff:

> diff(c(1, 2, 3, 4))
[1] 1 1 1

So I'm thinking that we should use an entirely different function name for the desired behavior.

@StefanKarpinski
Member

John and I were speculating last night that R might supply the NAs if you apply diff to a data frame, but I can't get that to work:

> df = data.frame(foo=c(2,3,1))
> diff(df)
data frame with 0 columns and 3 rows
> diff(df$foo)

Admittedly my R is very rusty these days (it's hard to believe there was even a time it was pretty good), but am I missing something there? Are there situations where R does actually supply the NAs for you? I also note that diff with negative lag isn't defined:

> diff(v,lag=-1)
Error in diff.default(v, lag = -1) : 
  'lag' and 'differences' must be integers >= 1

Matlab does the same thing. Unfortunately, however, Matlab takes a second argument to diff to mean something rather different than R: in R diff(v,n) means take the diff with a lag of n indices; in Matlab, it means take the nth order differences, which essentially means apply the diff function n times and give the result of that. Even worse, the size of the resulting value is the same, but they produce very different results.
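
To make the two readings of the second argument concrete, here is a small sketch in current Julia 1.x. The helper names `lag_diff` and `order_diff` are invented for this illustration and exist in neither R, Matlab, nor Julia's Base.

```julia
# R-style reading: difference between elements n positions apart.
lag_diff(v, n) = [v[i] - v[i-n] for i in n+1:length(v)]

# Matlab-style reading: apply the first difference n times.
function order_diff(v, n)
    for _ in 1:n
        v = diff(v)
    end
    return v
end

v = [1, 4, 9, 16, 25]
println(lag_diff(v, 2))     # [8, 12, 16]
println(order_diff(v, 2))   # [2, 2, 2]
```

Both results have length(v) - 2 elements, so a length check alone would not reveal which convention a given function follows.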

I would propose a deltas(v,k) function that has a slightly different conceptual definition:

[ 1 <= i-k <= length(v) ? v[i]-v[i-k] : NA for i=1:length(v) ]

This definition kind of makes me think that it would be very convenient for a lot of things if indexing off the end of a DataVec or DataFrame returned NAs. With that behavior, that could be written as just:

[ v[i]-v[i-k] for i=1:length(v) ]

which is simple enough that it almost doesn't even merit its own function.
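
A runnable version of the first definition, as a sketch for current Julia 1.x with `missing` playing the role of NA. The default k=1 is an assumption added here, and the function assumes a 1-based vector.

```julia
# Same-length lagged difference: entry i is v[i] - v[i-k] when i-k is a valid
# index, and `missing` otherwise. Negative k looks forward instead of back.
function deltas(v::AbstractVector, k::Integer=1)
    n = length(v)
    return [1 <= i - k <= n ? v[i] - v[i-k] : missing for i in 1:n]
end

println(deltas([2, 3, 1]))       # missing in the first slot
println(deltas([2, 3, 1], -1))   # missing in the last slot
```

With a positive lag the padding lands at the front and with a negative lag at the back, which follows directly from the bounds check and needs no separate padding option.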

@HarlanH
Contributor

HarlanH commented Dec 18, 2012

I'm down with calling it deltas() and using the second argument for lag and
making the output DataVec the same length as the input.

Would it make sense to have a version for non-DataVecs that returns a
Vector{Float64} and pads with NaN?

Would it make sense to have another function with the Matlab n-th order
difference semantics?


@StefanKarpinski
Member

I think that diff should probably not take any second argument but use options to take lag and order options. That way neither Matlab nor R users will have to go through the pain of discovering that diff doesn't do what they think it does, but will instead be forced to read the docs. For now I think deltas can be data frame/vec only. If there's demand for it elsewhere, we can port it into base and do the NaN thing you're suggesting.

@milktrader
Author

hmm, I play more with xts than data.frame. But you can see that diff is rejected on a data.frame but xts does something internal to make sense of the computation.

> class(ttrc)
[1] "data.frame"
> class(spx)
[1] "xts" "zoo"
> ttrc$diff = diff(ttrc$Close)
Error in `$<-.data.frame`(`*tmp*`, "diff", value = c(0.0299999999999998,  : 
  replacement has 5549 rows, data has 5550
> spx$diff = diff(Cl(spx))
> head(spx, 2)
           GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
1970-01-02     92.06     93.54    91.79      93.00     8050000         93.00
1970-01-05     93.00     94.25    92.53      93.46    11490000         93.46
           diff
1970-01-02   NA
1970-01-05 0.46

I'm all for leaving diff alone and let it behave the way everyone expects.

For DataFrame and my dream type TimeSeries, I think end users would be fine with not using diff but instead calling something like deltas

lag does provide positive and negative direction

> spx$lagPOS = lag(Cl(spx))
> spx$lagNEG = lag(Cl(spx), k=-1)
> head(spx[,c(4,7:9)],3)
           GSPC.Close  diff lagPOS lagNEG
1970-01-02      93.00    NA     NA  93.46
1970-01-05      93.46  0.46  93.00  92.82
1970-01-06      92.82 -0.64  93.46  92.63

@StefanKarpinski
Member

That seems at odds with what everything else does, so while it's convenient, I think it would be ill-advised to change diff to behave the way only the xts-specific diff does. What does diff(Cl(spx)) return in that code? It's possible that the NA is introduced upon assignment.

@milktrader
Author

> foo = diff(Cl(spx))
> head(foo, 2)
           GSPC.Close
1970-01-02         NA
1970-01-05       0.46

@milktrader
Author

Admittedly, R has sort of a patchwork approach to lag. There is another function called Lag. Behavior is not intuitive.

> ttrc$lagPOS = lag(ttrc$Close)
> ttrc$lagNEG = lag(ttrc$Close, k=-1)
> ttrc$LagPOS = Lag(ttrc$Close)
> ttrc$LagNEG = Lag(ttrc$Close, k=-1)
Error in FUN(X[[1L]], ...) : k must be a non-negative integer
> head(ttrc, 2)
        Date Open High  Low Close  Volume lagPOS lagNEG Lag.1
1 1985-01-02 3.18 3.18 3.08  3.08 1870906   3.08   3.08    NA
2 1985-01-03 3.09 3.15 3.09  3.11 3099506   3.11   3.11  3.08

so lag on a data.frame fails silently (doesn't do anything but return the value passed). Lag works with padding but it doesn't work in both directions.

@milktrader
Author

I thought the best idea was to let diff behave the way everyone expects but to offer an option to pad with NAs to enforce same-length.

@StefanKarpinski
Member

That seems pretty reasonable to me.

@HarlanH
Contributor

HarlanH commented Dec 18, 2012

I'm confused here. If we're going to keep diff for non-AbstractDataVecs,
then it should presumably not pad by default (but perhaps pad with NaN
optionally). But deltas will be ADV-only, have options for lag and order
and pad (true by default).

No?


@StefanKarpinski
Member

Actually, I think @milktrader is right – just add a pad=[:none|:top|:bottom] option to the DataFrame and DataVec diff methods which pads with NAs if set.
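
One way such an option could look, sketched for current Julia 1.x with a keyword argument instead of the @options macro and `missing` instead of NA. `padded_diff` is a hypothetical wrapper, not the method that was eventually added to DataFrames.

```julia
# Wrap Base's diff and optionally pad the result back to the input length.
function padded_diff(v::AbstractVector; pad::Symbol=:none)
    d = diff(v)
    pad === :none   && return d                 # same behavior as plain diff
    pad === :top    && return vcat(missing, d)  # placeholder in the first slot
    pad === :bottom && return vcat(d, missing)  # placeholder in the last slot
    throw(ArgumentError("pad must be :none, :top, or :bottom"))
end

v = [2, 3, 1]
println(padded_diff(v))
println(padded_diff(v, pad=:top))
println(padded_diff(v, pad=:bottom))
```

Keeping :none as the default preserves the consistency with diff on a plain Vector that was argued for earlier, while the padded forms give same-length columns on request.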

@milktrader
Author

Not sure where this code is inserted. I'd be happy to try it out. From an ack diff -a search I get the following (many obviously not relevant).

dataframe.jl:1012:    newcols = _setdiff([1:ncol(df)], icols)
dataframe.jl:1124:    # TODO fix PooledDataVec columns with different pools.
dataframe.jl:1163:#     # TODO fix PooledDataVec columns with different pools.
dataframe.jl:1465:    remainingcols = _setdiff([1:ncol(df)], icols)
dataframe.jl:1477:    remainingcols = _setdiff([1:ncol(df)], [ikey, ivalue])
DataFrames.jl:99:       Base.diff,
DataFrames.jl:349:       reldiff,
indexing.jl:36:# 4`. If Indexers have different IndexedVectors (like `idv1 .== 1 |
operators.jl:58:pairwise_vector_operators = [:diff, :reldiff, :percent_change]
operators.jl:1019:# * If missingness differs, underlying values are irrelevant
statistics.jl:1:# This is multiplicative analog of diff
statistics.jl:2:function reldiff{T}(v::Vector{T})
utils.jl:14:function _setdiff(a::Vector, b::Vector)
utils.jl:25:## setdiff(a::Vector, b::Vector) = elements(Set(a...) - Set(b...))

@johnmyleswhite
Contributor

Please add optional lags at the end of operators.jl.

-- John


@milktrader
Author

Okay, I've checked out a branch called padding and will investigate how this will work. Thanks for the file name, good start.

@milktrader
Author

I'm still looking for the elegant solution, but I have hacked out an interim solution in the meantime.

julia> spx  = read_stock("data/spx.csv");

julia> head(spx, 3)
3x7 DataFrame:
              Date  Open  High   Low Close   Volume Adj Close
[1,]    1970-01-02 92.06 93.54 91.79  93.0  8050000      93.0
[2,]    1970-01-05  93.0 94.25 92.53 93.46 11490000     93.46
[3,]    1970-01-06 93.46 93.81 92.13 92.82 11460000     92.82

julia> wat = uoo(spx, 4, "Low");

julia> head(wat)
6x8 DataFrame:
              Date  Open  High   Low Close   Volume Adj Close    ma.4
[1,]    1970-01-02 92.06 93.54 91.79  93.0  8050000      93.0      NA
[2,]    1970-01-05  93.0 94.25 92.53 93.46 11490000     93.46      NA
[3,]    1970-01-06 93.46 93.81 92.13 92.82 11460000     92.82      NA
[4,]    1970-01-07 92.82 93.38 91.93 92.63 10010000     92.63  92.095
[5,]    1970-01-08 92.63 93.47 91.99 92.68 10670000     92.68  92.145
[6,]    1970-01-09 92.68 93.25 91.82  92.4  9380000      92.4 91.9675

The NAs are padded by the function which is in rough shape and needs to be integrated into my module.

@milktrader
Author

I have a working solution to the original problem and wonder if we should close this long-winded issue and open a feature request for NA padding.

My solution takes care of the padding inside the function. Any refactor or other tips are welcome.

@johnmyleswhite
Contributor

Please do close this issue. We can iterate on your solution in a separate issue.

@milktrader
Author

Reincarnated as Padding with NAs (needs a feature request label)
