Improve DFP period functionality to allow for better sampling and ignoring period #912

Merged

mdemoret-nv
Contributor

Description

Currently, the DFP pipeline simulates time by breaking incoming messages up by a specific period and processing each period independently. This makes it impossible to process all of the incoming data at once for batch mode with a single trained model.

This PR adds a few things:

  • If DFPFileBatcherStage.period is None, all messages are processed in a single batch instead of one batch per period
  • Fixes period handling so that periods with a count work
    • Previously, "D" worked as expected but "5D" did not. This was caused by the use of to_period
  • The DFPFileBatcherStage.sampling_rate_s property is deprecated in favor of a more general sampling property
    • This property accepts several kinds of values
      • If it's a string, the value is interpreted as a frequency. The first row in each frequency interval is taken
      • If it's a number in [0, 1), it's a fraction. That fraction of the rows is taken
      • If it's >= 1, it's a count. That many rows are taken at random
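
The behavior listed above can be pictured with plain pandas. This is only a minimal sketch under stated assumptions, not the code in this PR: the helpers `batch_by_period` and `apply_sampling`, the `timestamp` and `value` column names, and the `seed` argument are all hypothetical, and `pd.Grouper` is used here simply as one way to bin rows that honors counts such as "5D".

```python
from typing import Optional, Union

import pandas as pd


def batch_by_period(df: pd.DataFrame, period: Optional[str]) -> list:
    """Hypothetical helper: split rows into batches by time period.

    period=None returns all rows as a single batch. Otherwise rows are
    binned with pd.Grouper, which handles counts such as "5D".
    """
    if period is None:
        return [df]
    grouped = df.groupby(pd.Grouper(key="timestamp", freq=period))
    # Drop empty bins so only periods that contain rows produce a batch
    return [batch for _, batch in grouped if not batch.empty]


def apply_sampling(df: pd.DataFrame, sampling: Union[str, float, int], seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: dispatch on the type and range of `sampling`."""
    if isinstance(sampling, str):
        # Frequency: keep the first row that falls in each interval
        grouped = df.groupby(pd.Grouper(key="timestamp", freq=sampling))
        return pd.concat([g.head(1) for _, g in grouped if not g.empty])
    if sampling < 1:
        # Fraction in [0, 1): keep that share of the rows, chosen at random
        return df.sample(frac=sampling, random_state=seed).sort_index()
    # Count >= 1: keep that many rows, chosen at random (capped at len(df))
    return df.sample(n=min(int(sampling), len(df)), random_state=seed).sort_index()


# Ten rows, one per day, starting 2023-04-01
df = pd.DataFrame({
    "timestamp": pd.date_range("2023-04-01", periods=10, freq="D"),
    "value": range(10),
})

print(len(batch_by_period(df, None)))                  # single batch
print([len(b) for b in batch_by_period(df, "5D")])     # two 5-day batches
print(apply_sampling(df, "2D")["value"].tolist())      # first row of each 2-day interval
print(len(apply_sampling(df, 0.5)), len(apply_sampling(df, 3)))
```

In this sketch a string is checked first, so any numeric value falls through to the fraction or count branch depending on whether it is below 1.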

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mdemoret-nv added the non-breaking, improvement, and 3 - Ready for Review labels Apr 28, 2023
@mdemoret-nv requested a review from a team as a code owner April 28, 2023 00:17
@mdemoret-nv
Contributor Author

@drobison00 Do any of the DFP modules need to be updated to match?

@mdemoret-nv
Contributor Author

/merge

@rapids-bot (bot) merged commit 446f452 into nv-morpheus:branch-23.07 May 10, 2023
@mdemoret-nv deleted the mdd_improve-dfp-period branch May 10, 2023 19:49
Labels
improvement, non-breaking
Projects
Archived in project
2 participants