WeightedIndex: Make it possible to update a subset of weights #866
Conversation
Sure, we can add this.
```rust
    return Err(WeightedError::InvalidWeight);
}
if i >= self.cumulative_weights.len() {
    return Err(WeightedError::TooMany);
```
Would be worth adding `InvalidIndex`, except that it's a breaking change. Perhaps do so in a separate PR which we don't land until we start preparing the next Rand version?
Yeah, I thought about this as well. Will do once this is merged.
src/distributions/weighted/mod.rs (Outdated)
```rust
    old_w -= &self.cumulative_weights[i - 1];
}

for j in i..self.cumulative_weights.len() {
```
This is `O(n*m)` where `n = cumulative_weights.len() - min_index; m = new_weights.len()`. Instead we should sort the `new_weights` by index, then apply in-turn (like in `new`); this is `O(m*log(m) + n)`.

Also, we can just take `total_weight = cumulative_weights.last().unwrap()`.
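The single-pass approach suggested above can be sketched as follows. This is a simplified stand-alone illustration, not rand's actual implementation: it stores the final total in the vector (unlike `WeightedIndex`) and hard-codes `u32` weights, and the name `update_weights` is only borrowed from the discussion.

```rust
/// Apply a batch of `(index, new_weight)` updates to a cumulative-weight
/// vector in one pass: sort the updates (O(m log m)), then walk the prefix
/// sums once (O(n)), instead of rebuilding the tail once per update (O(n*m)).
fn update_weights(cumulative_weights: &mut [u32], mut updates: Vec<(usize, u32)>) {
    updates.sort_by_key(|&(i, _)| i);
    let mut iter = updates.into_iter().peekable();
    let mut prev_new = 0u32; // running prefix sum with updates applied
    let mut prev_old = 0u32; // running prefix sum before updates
    for i in 0..cumulative_weights.len() {
        // Recover the original weight at index i from the prefix sums.
        let old_w = cumulative_weights[i] - prev_old;
        prev_old = cumulative_weights[i];
        // Substitute the new weight if an update targets this index.
        let w = if iter.peek().map_or(false, |&(j, _)| j == i) {
            iter.next().unwrap().1
        } else {
            old_w
        };
        prev_new += w;
        cumulative_weights[i] = prev_new;
    }
}

fn main() {
    // Weights [1, 2, 3] stored as prefix sums; change index 1 to weight 5.
    let mut cw = vec![1u32, 3, 6];
    update_weights(&mut cw, vec![(1, 5)]);
    assert_eq!(cw, vec![1, 6, 9]);
}
```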
> Instead we should sort the new_weights by index, then apply in-turn (like in new); this is O(m*log(m) + n).

I'll look into this.

> Also, we can just take total_weight = cumulative_weights.last().unwrap().

I don't think so; the last cumulative weight is not stored in the vector. Or are you saying we should change it such that it is?
Aha, `binary_search_by` is happy to return an index one-past-the-last-item, therefore the final weight is not needed. (And we have motive for not including the final weight: it guarantees we will never exceed the last index of the input `weights` list.)

Then yes, we need to store either the last weight or the total as an extra field.
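The one-past-the-end behaviour described above can be demonstrated in isolation. The sketch below uses `partition_point`, which is equivalent here to the `binary_search_by` call being discussed; for weights `[1, 2, 3]` only the prefix sums `[1, 3]` are stored, and a draw falling past the last stored sum still maps to the last index:

```rust
/// Map a uniform draw `x` in `[0, total_weight)` to a weight index, given
/// prefix sums with the final total omitted. `partition_point` returns the
/// first position whose stored sum exceeds `x`, which may be one past the
/// end of the slice -- so the last index is reachable without storing 6.
fn sample_index(cumulative_weights: &[u32], x: u32) -> usize {
    cumulative_weights.partition_point(|&c| c <= x)
}

fn main() {
    // Weights [1, 2, 3]: stored sums [1, 3], total weight 6 kept elsewhere.
    let cw = [1u32, 3];
    assert_eq!(sample_index(&cw, 0), 0); // x in [0, 1) -> index 0
    assert_eq!(sample_index(&cw, 2), 1); // x in [1, 3) -> index 1
    assert_eq!(sample_index(&cw, 5), 2); // x in [3, 6) -> index 2 (past end)
}
```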
> Instead we should sort the new_weights by index, then apply in-turn (like in new); this is O(m*log(m) + n).

I implemented that. It's a bit messy, because the index type might be unsigned.
Yes, that did get messy! Eventually I convinced myself that your implementation is probably right.

Fortunately we can clean it up a lot (at the cost of two clones and one extra subtraction per un-adjusted weight). I think for all the types we care about the clones will be cheap. Granted, this is probably slower than your method when only updating a small subset of many indices, but I think not hugely slower, and it's still `O(n+m)`.

I'll leave it to your preference to require ordered input vs sorting.

Finally, do we need two loops? Only if we care about not changing `self` when given invalid parameters.
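The two-loop pattern referred to above can be sketched on a plain `Vec` of weights. The names here (`try_update`, and the `InvalidIndex` variant proposed earlier in the thread) are illustrative, not rand's actual API; the point is that validation completes before any mutation, so the value is never left half-updated on error:

```rust
#[derive(Debug, PartialEq)]
enum WeightedError {
    InvalidIndex,
}

/// Apply `(index, new_weight)` updates, or fail without modifying `weights`.
fn try_update(weights: &mut [u32], updates: &[(usize, u32)]) -> Result<(), WeightedError> {
    // First loop: validate every index before touching anything.
    for &(i, _) in updates {
        if i >= weights.len() {
            return Err(WeightedError::InvalidIndex);
        }
    }
    // Second loop: all inputs are known valid, apply in place.
    for &(i, w) in updates {
        weights[i] = w;
    }
    Ok(())
}

fn main() {
    let mut w = vec![1u32, 2, 3];
    // Invalid batch: error returned, `w` untouched.
    assert_eq!(try_update(&mut w, &[(1, 5), (9, 9)]), Err(WeightedError::InvalidIndex));
    assert_eq!(w, vec![1, 2, 3]);
    // Valid batch: applied.
    assert!(try_update(&mut w, &[(1, 5)]).is_ok());
    assert_eq!(w, vec![1, 5, 3]);
}
```

Collapsing this into one loop would be slightly faster but would leave the structure partially modified when a later update turns out to be invalid.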
I think it is better to require sorted input, because usually it's trivial for the user to provide.
The problem is that this would result in

I simplified the code as you suggested. The performance seems similar enough.

Thanks; then I think this is good to go. I won't have very much time available for this for a few weeks, so I'll leave you to merge.

@dhardy Unfortunately, I'm not authorized to merge.

One timeout, one Redox failure. Good enough, I guess.
This could be useful for crates like droprate.

I had to add an additional field `total_weight` to `WeightedIndex`, which is redundant with the field `weight_distribution`. However, I cannot use the latter without making the information about the end of the sampled range public.
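The shape of the change described above might look roughly like the following. This is a hypothetical simplification (concrete `u32` weights, no distribution field, illustrative struct name), not the actual rand struct; it only shows why `total_weight` must be stored separately once the final prefix sum is omitted from the vector:

```rust
/// Simplified stand-in for the modified WeightedIndex: prefix sums with the
/// last weight omitted, plus a separate `total_weight` field so that weight
/// updates can recompute the end of the sampled range.
struct WeightedIndexSketch {
    cumulative_weights: Vec<u32>, // prefix sums, final total NOT stored here
    total_weight: u32,            // sum of all weights, kept as an extra field
}

impl WeightedIndexSketch {
    /// Build from raw weights; assumes `weights` is non-empty.
    fn new(weights: &[u32]) -> Self {
        let mut cumulative_weights = Vec::with_capacity(weights.len() - 1);
        let mut total_weight = 0u32;
        // Store prefix sums for all weights except the last ...
        for &w in &weights[..weights.len() - 1] {
            total_weight += w;
            cumulative_weights.push(total_weight);
        }
        // ... but fold the last weight into the running total only.
        total_weight += weights[weights.len() - 1];
        WeightedIndexSketch { cumulative_weights, total_weight }
    }
}

fn main() {
    let wi = WeightedIndexSketch::new(&[1, 2, 3]);
    assert_eq!(wi.cumulative_weights, vec![1, 3]); // 6 is not stored
    assert_eq!(wi.total_weight, 6);
}
```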