Is there appetite to change the API for `TSFrame` so that it stores the name of the index column and preserves the source dataframe, rather than replacing the column with a new one named `Index` even when the user specifies the index column?
For context, I'm building a time series system with streaming and batch APIs. In my system the user defines schemas for their time series. These schemas include the time field/column, and preserving the names of fields/columns consistently throughout is important for my use case. The current `TSFrame` API makes that awkward, and I don't want to let the TSFrames column-name override govern downstream design and naming decisions.
At a more fundamental level what I would expect TSFrame to be is a pure semantic layer that verifies time ordering of rows in dataframes, guaranteeing that invariant to functions operating on time series, without changing the underlying data the way it currently does.
Now that the design is burned in, I appreciate it may not be possible to change it without breaking assumptions in dependent code, but I thought it was worth asking.
I do appreciate the design choice of having the user define the date-time (sorting/matching) column but this is one of those assumptions (having Index as the index column) which provides certainty and somewhat easier maintenance of the TSFrames functions.
One can have:
    struct TSFrame
        coredata::DataFrame
        Index::String
    end
The constructors can default to the name Index in absence of a provided index column (the current behaviour).
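To make the suggestion concrete, here is a minimal sketch of a type that records the index column's name and leaves the source dataframe untouched. `NamedIndexFrame` and its field names are hypothetical and not part of TSFrames, and the sortedness check is only one assumed way of enforcing the time-ordering invariant discussed above.

```julia
using DataFrames, Dates

# Hypothetical type (not part of TSFrames): it records which column is the
# time index instead of renaming that column to `Index`.
struct NamedIndexFrame
    coredata::DataFrame
    index::String

    function NamedIndexFrame(coredata::DataFrame, index::AbstractString)
        # The named column must exist in the source dataframe.
        index in names(coredata) ||
            throw(ArgumentError("column $index not found"))
        # Verify the ordering invariant without modifying the data.
        issorted(coredata[!, index]) ||
            throw(ArgumentError("column $index is not sorted"))
        new(coredata, String(index))
    end
end

# Fall back to the name "Index" when no column is given. Assumption: unlike
# TSFrames, this sketch requires a column with that name to already exist.
NamedIndexFrame(coredata::DataFrame) = NamedIndexFrame(coredata, "Index")

df = DataFrame(timestamp = Date(2024, 1, 1):Day(1):Date(2024, 1, 3),
               price = [1.0, 2.0, 3.0])
ts = NamedIndexFrame(df, "timestamp")
```

Because the wrapper only stores a reference and a name, `ts.coredata` keeps the `timestamp` column exactly as the user defined it, and functions that need the index can look it up via `ts.coredata[!, ts.index]`.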
Having said that, a lot of code will need to change, and, yes, many other assumptions will also need to be thought about again.
Meanwhile, would it be possible for your package to compose a TSFrame with an index string in the package struct? Would that solve your immediate problem?
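A rough sketch of that composition might look like the following. `SchemaSeries`, `index_name`, and `original_frame` are hypothetical names introduced for illustration; the sketch assumes the `TSFrame(df, column)` constructor and the `coredata` field behave as described in this thread, with the index column renamed to `Index`.

```julia
using DataFrames, Dates, TSFrames

# Hypothetical wrapper: pairs a TSFrame with the name the index column
# had in the source dataframe.
struct SchemaSeries
    ts::TSFrame
    index_name::String
end

# Build the TSFrame from the user's column and remember its original name.
function SchemaSeries(df::DataFrame, index_name::AbstractString)
    SchemaSeries(TSFrame(df, index_name), String(index_name))
end

# Return a copy of the underlying data with the original column name restored.
# Assumption: the index column in `coredata` is named `Index`.
function original_frame(s::SchemaSeries)
    rename(s.ts.coredata, :Index => Symbol(s.index_name))
end

df = DataFrame(timestamp = Date(2024, 1, 1):Day(1):Date(2024, 1, 3),
               price = [1.0, 2.0, 3.0])
s = SchemaSeries(df, "timestamp")
restored = original_frame(s)
```

This keeps TSFrames unchanged, at the cost of a rename (and a copy) every time the original column names are needed; column order in the restored frame may also differ from the source.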
In my use case a lot of the end processing happens on the underlying data frame (coredata) directly, so that's the crux of the issue. I need to preserve the column names in these. For now I'm going with plain DataFrame objects, and in the future we'll either develop our own time series wrapper, or see if TSFrames can move towards an API that doesn't touch the underlying data.
I understand. As I said, it will be useful to have this flexibility in the package. I will keep this issue open for now, in case someone wants to pick it up and submit a PR.