feat: add an option to define the time column in the query API #671

Closed
nitisht opened this issue Feb 22, 2024 · 0 comments
nitisht commented Feb 22, 2024

In many cases users may want to query their data based on a different time column, instead of the default time column p_timestamp that the Parseable server adds.

We'll need to extend the ingestion API to define the time column, so that the catalog etc. are created based on this column.
The query API can then use this time column for the startTime and endTime entries.
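
To make the query side concrete, here is a rough sketch of a request body bounded by startTime and endTime. Only those two field names come from this issue; the `query` field, the SQL text, and the helper name `build_query_payload` are assumptions for illustration. With a custom time column defined on the stream, the server would match this window against that column instead of p_timestamp.

```python
from datetime import datetime, timezone


def build_query_payload(stream, start, end):
    """Build a query body whose window is given by startTime and endTime.

    The "query" key and the SQL statement are assumed for illustration;
    only startTime and endTime are named in this issue.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "query": f"SELECT * FROM {stream}",
        # Normalise both bounds to UTC so the window is unambiguous.
        "startTime": start.astimezone(timezone.utc).isoformat(),
        "endTime": end.astimezone(timezone.utc).isoformat(),
    }


payload = build_query_payload(
    "app_logs",  # hypothetical stream name
    datetime(2024, 2, 22, 10, 0, tzinfo=timezone.utc),
    datetime(2024, 2, 22, 11, 0, tzinfo=timezone.utc),
)
```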

This concept may be extended further to allow configurable partitions for a given stream. This will allow users to store data partitioned on a specific column over a given time period.

@nitisht nitisht added the enhancement New feature or request label Feb 22, 2024
@nikhilsinhaparseable nikhilsinhaparseable self-assigned this Mar 7, 2024
nikhilsinhaparseable added a commit to nikhilsinhaparseable/parseable that referenced this issue Apr 20, 2024
…seablehq#683)

This PR adds an enhancement to partition ingested logs by a user-provided
timestamp instead of server time.

The user can add the optional custom header X-P-Time-Partition in the stream
creation API to allow ingestion and querying using a timestamp column from
the log data instead of the server time p_timestamp.
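
As a rough sketch of what stream creation with this header might look like from a client, the helper below only assembles the request parts and sends nothing. The header name X-P-Time-Partition is taken from the text above; the PUT method, the `/api/v1/logstream/` path, the base URL, and the column name `event_time` are assumptions for illustration.

```python
def stream_creation_request(base_url, stream, time_partition=None):
    """Describe a stream-creation request, optionally with a time partition.

    The HTTP method and URL path are assumed; only the header name
    X-P-Time-Partition comes from the commit message.
    """
    headers = {}
    if time_partition:
        # Optional: omit the header to keep the server-time default.
        headers["X-P-Time-Partition"] = time_partition
    return {
        "method": "PUT",
        "url": f"{base_url.rstrip('/')}/api/v1/logstream/{stream}",
        "headers": headers,
    }


with_partition = stream_creation_request(
    "http://localhost:8000", "app_logs", time_partition="event_time"
)
without_partition = stream_creation_request("http://localhost:8000/", "app_logs")
```

The resulting dictionary can be handed to whichever HTTP client is in use, with authentication added as the deployment requires.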

The time_partition field name is stored in stream.json and in the in-memory
STREAM_INFO.

- In the ingest API, the server checks that the timestamp column name exists
  in the log event and throws an exception if it does not. It also checks
  that the timestamp value can be parsed into a datetime and throws an
  exception if it cannot.
- The arrow file name gets the date, hr, mm from the timestamp field (if
  defined in the stream); otherwise it gets them from the server time.
- The parquet file name gets a random number attached to it. A lot of log
  data can have the same date, hr, mm value in the timestamp field, and this
  random number keeps parquet files from being overwritten.
- In the console, the query from and to dates are matched against the value
  of the timestamp column of the log data (if defined in the stream);
  otherwise they are matched against the p_timestamp column.
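
The behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the server's implementation: the function names, the ISO 8601 input format, the `date=/hour=/minute=` file name layout, and the hex suffix are all assumptions. Only the checks themselves, the random suffix idea, and the fallback to p_timestamp come from the commit message.

```python
import secrets
from datetime import datetime, timezone

DEFAULT_TIME_COLUMN = "p_timestamp"


def resolve_event_time(event, time_partition=None, server_time=None):
    """Return the timestamp used to place an event into a file."""
    if time_partition is None:
        # No custom column on the stream: fall back to server time.
        return server_time or datetime.now(timezone.utc)
    if time_partition not in event:
        raise KeyError(f"time partition column {time_partition!r} not in event")
    raw = str(event[time_partition])
    try:
        # Accept a trailing "Z" as UTC (needed before Python 3.11).
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError as exc:
        raise ValueError(f"cannot parse {raw!r} as a datetime") from exc
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC if naive
    return parsed.astimezone(timezone.utc)


def file_prefix(ts):
    """Date, hour and minute portion shared by arrow and parquet names."""
    return f"date={ts:%Y-%m-%d}.hour={ts:%H}.minute={ts:%M}"


def parquet_file_name(ts):
    """Append a random suffix so events sharing a minute do not collide."""
    return f"{file_prefix(ts)}.{secrets.token_hex(4)}.parquet"


def query_time_column(time_partition=None):
    """Column that the query from and to dates are matched against."""
    return time_partition or DEFAULT_TIME_COLUMN


ts = resolve_event_time({"event_time": "2024-04-20T10:15:30Z"}, "event_time")
name = parquet_file_name(ts)
```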

Fixes parseablehq#671 
Fixes parseablehq#685