On-demand generation of current full dataset for download #1029

Open
cbaksik opened this issue Jan 22, 2024 · 0 comments
Labels
enhancement New feature or request


We would like to be able to download a full dump of data that represents the current data set in POD for a given stream. In other words, a download that contains all of our unique records in their latest state, excluding any records deleted since the stream was created.

Currently, to get this data, we would have to download the last full dump that we sent to POD, plus every incremental and delete file sent since then, and merge them all locally. We would like to avoid this and instead have an option in the POD UI (or via the API) that does this for us on demand, so that we can fetch a single full file of our current data set.
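To illustrate the local merge described above, here is a minimal sketch of the logic. It is not POD's implementation: the record shape (a dict with an `id` key), the `("incremental", ...)` / `("delete", ...)` tuples, and the function name `compile_current_dataset` are all hypothetical, and it assumes the change files are applied in the order they were sent.

```python
from typing import Dict, Iterable, List, Tuple, Union

# Hypothetical record shape: a dict that carries a unique "id" key.
Record = Dict[str, str]
# A change file is either ("incremental", [records]) or ("delete", [ids]).
Change = Tuple[str, Union[List[Record], List[str]]]


def compile_current_dataset(
    full_dump: Iterable[Record], changes: Iterable[Change]
) -> List[Record]:
    """Apply change files, oldest first, on top of the last full dump."""
    # Start from the last full dump, keyed by record id.
    current: Dict[str, Record] = {rec["id"]: rec for rec in full_dump}

    for kind, payload in changes:
        if kind == "incremental":
            # New or updated records replace any earlier version.
            for rec in payload:
                current[rec["id"]] = rec
        elif kind == "delete":
            # Deleted ids are dropped; unknown ids are ignored.
            for rec_id in payload:
                current.pop(rec_id, None)
        else:
            raise ValueError(f"unknown change kind: {kind!r}")

    return list(current.values())


if __name__ == "__main__":
    full = [{"id": "a", "title": "A1"}, {"id": "b", "title": "B1"}]
    changes = [
        ("incremental", [{"id": "a", "title": "A2"}, {"id": "c", "title": "C1"}]),
        ("delete", ["b"]),
    ]
    for record in compile_current_dataset(full, changes):
        print(record)
```

With hundreds of daily files, even this simple loop means fetching and ordering every file since the last full dump, which is the work we are asking POD to do on our behalf.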

We do not expect to use this feature often, so the file does not need to be generated quickly. If generation takes more than 24 hours, that is fine. Ideally, the system would notify us when the file is available, but that is not a requirement; we can check periodically to see whether it is done.

We would like the feature to be available to any of our users; it does not have to be restricted to Admins. The UI should make clear when the last full file was generated, and/or whether one is currently being generated, so that a user does not start the process if it has been run recently or is already in progress. Alternatively, the UI piece could be a later enhancement, in which case the option to generate a full file should be limited to Admins, or to requests via the API (by an admin or any user), to make it less likely that someone triggers it by accident.
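As a rough sketch of the "don't start it if it was done recently or is in progress" check, something like the following could back either a UI button or an API endpoint. Everything here is an assumption for illustration: the function name `may_start_generation`, its parameters, and the seven-day default for "recently" are hypothetical and not part of POD.

```python
from datetime import datetime, timedelta
from typing import Optional


def may_start_generation(
    last_generated_at: Optional[datetime],
    in_progress: bool,
    now: datetime,
    min_interval: timedelta = timedelta(days=7),  # hypothetical threshold
) -> bool:
    """Return True if a new full-file generation may be started."""
    # Never start a second run while one is already underway.
    if in_progress:
        return False
    # No full file has been generated yet for this stream.
    if last_generated_at is None:
        return True
    # Otherwise allow it only once the last run is old enough.
    return now - last_generated_at >= min_interval
```

The same two values the check relies on, the last generation time and the in-progress flag, are what we would like the UI to display.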

We're interested in using this feature for our own data. If the feature would allow other partners to generate a new full set of our data, this is fine too.

USE CASE:

We have a project partner (unrelated to POD) and would like to supply them with a recent full set of our data. The last full dump we sent to POD as a new stream was in June 2022. We send daily incremental and delete files to POD, so we have sent hundreds of files since mid-2022. We would like to use POD to generate a full set of current data without having to merge those hundreds of incremental and delete files into the full file from mid-2022.

@cbaksik cbaksik added the enhancement New feature or request label Jan 22, 2024
@cbaksik cbaksik changed the title On-demand generation of current full file for download On-demand generation of current full dataset for download Jan 22, 2024