Consider storing index column name rather than fixing to Index #201

Open
ancapdev opened this issue May 21, 2024 · 3 comments

@ancapdev

Is there appetite to change the TSFrame API so that it stores the name of the index column and preserves the source dataframe, rather than replacing the column with a new one named Index even when the user specifies an index column?

For context, I'm building a time series system with streaming and batch APIs. In my system the user defines schemas for their time series. These schemas include the time field/column, and preserving the names of fields/columns consistently throughout is important for my use case. The current TSFrame API makes that awkward, and I don't want to let TSFrames' column name override govern downstream design and naming decisions.

At a more fundamental level, what I would expect TSFrame to be is a pure semantic layer that verifies the time ordering of rows in dataframes, guaranteeing that invariant to functions operating on time series, without changing the underlying data the way it currently does.

Now that the design is burned in, I appreciate it may not be possible to change it without breaking assumptions in dependent code, but I thought it was worth asking.

@chiraganand
Member

I do appreciate the design choice of having the user define the date-time (sorting/matching) column but this is one of those assumptions (having Index as the index column) which provides certainty and somewhat easier maintenance of the TSFrames functions.

One can have:

struct TSFrame
    coredata::DataFrame
    Index::String
end

The constructors can default to the name Index in the absence of a provided index column (the current behaviour).
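For illustration, here is a minimal sketch of how such a struct could store the index column name while only verifying time ordering, leaving the data frame untouched. It is not the TSFrames implementation: the names `NamedIndexFrame` and `timeindex` are hypothetical, and the sketch assumes only DataFrames.jl and the Dates standard library.

```julia
using DataFrames, Dates

# Hypothetical type, not part of TSFrames: it keeps the data frame as given
# and only remembers which column holds the timestamps.
struct NamedIndexFrame
    coredata::DataFrame
    index::String

    # Defaults to "Index" when no index column name is provided.
    function NamedIndexFrame(coredata::DataFrame, index::AbstractString = "Index")
        index in names(coredata) ||
            throw(ArgumentError("column \"$index\" not found in data frame"))
        issorted(coredata[!, index]) ||
            throw(ArgumentError("column \"$index\" is not sorted in ascending order"))
        new(coredata, String(index))
    end
end

# Hypothetical accessor for the time column, whatever its name is.
timeindex(f::NamedIndexFrame) = f.coredata[!, f.index]

df = DataFrame(
    timestamp = collect(Date(2024, 1, 1):Day(1):Date(2024, 1, 3)),
    price = [10.0, 10.5, 10.2],
)
f = NamedIndexFrame(df, "timestamp")
names(f.coredata)  # the column names are the ones the user supplied
```

This sketch rejects unsorted input rather than sorting it, so the wrapped data is never modified. Whether to sort or to reject is a design choice the thread leaves open.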

Having said that, a lot of code will need to change, and, yes, many other assumptions will also need to be thought about again.

Meanwhile, would it be possible for your package to compose a TSFrame with an index string in the package struct? Would that solve your immediate problem?
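A rough sketch of that kind of composition, assuming the `TSFrame(df, index)` constructor renames the chosen column to `Index` in `coredata`, as described in this issue. `SchemaSeries` and `original_frame` are hypothetical names belonging to the downstream package, not to TSFrames.

```julia
using DataFrames, Dates, TSFrames

# Hypothetical wrapper owned by the downstream package.
struct SchemaSeries
    ts::TSFrame
    index_name::String  # original name of the time column in the user's schema
end

# Build the TSFrame from the named column and remember the original name.
SchemaSeries(df::DataFrame, index_name::AbstractString) =
    SchemaSeries(TSFrame(df, Symbol(index_name)), String(index_name))

# Return a copy of the data with the time column renamed back from `Index`.
original_frame(s::SchemaSeries) =
    rename(s.ts.coredata, :Index => Symbol(s.index_name))

df = DataFrame(
    timestamp = collect(Date(2024, 1, 1):Day(1):Date(2024, 1, 3)),
    price = [10.0, 10.5, 10.2],
)
s = SchemaSeries(df, "timestamp")
names(original_frame(s))  # contains "timestamp" again instead of "Index"
```

Code that reads `s.ts.coredata` directly would still see the `Index` column, so this only helps where the data can pass through `original_frame` first.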

@ancapdev
Author

ancapdev commented Jun 7, 2024

Hi, thanks for replying.

In my use case a lot of the end processing happens on the underlying data frame (coredata) directly, so that's the crux of the issue. I need to preserve the column names in these. For now I'm going with plain DataFrame objects, and in the future we'll either develop our own time series wrapper, or see if TSFrames can move towards an API that doesn't touch the underlying data.

@chiraganand
Member

I understand. As I said, it will be useful to have this flexibility in the package. I will keep this issue open for now so that someone can pick it up and submit a PR.
