Add comment and change equality
JAEarly committed Feb 8, 2024
1 parent 4a85783 commit 7d3b723
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion streaming/base/partition/orig.py
@@ -81,8 +81,12 @@ def get_partitions_orig(num_samples: int,
  padding = node_ratio - overflow
  padded_samples_per_canonical_node = samples_per_canonical_node + padding

+ # For samples to be properly split across canonical nodes, there must be more samples than nodes.
+ # The edge case is when the number of samples is equal to the number of canonical nodes, but this only works when
+ # there is an equal or greater number of canonical nodes than physical nodes.
+ # If these conditions are not met, an alternative sampling approach is used that leads to many repeats.
  if num_samples > num_canonical_nodes or (num_samples == num_canonical_nodes and
-                                          num_canonical_nodes > num_physical_nodes):
+                                          num_canonical_nodes >= num_physical_nodes):
      # Create the initial sample ID matrix.
      #
      # ids: (canonical nodes, padded samples per canonical node).
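
The effect of changing `>` to `>=` can be illustrated with a minimal sketch. The two helper functions below are hypothetical and are not part of `streaming`; they only reproduce the boolean condition from the hunk above, before and after this commit, to show which inputs now take the sample ID matrix branch.

```python
def takes_matrix_branch_old(num_samples: int, num_canonical_nodes: int,
                            num_physical_nodes: int) -> bool:
    """Condition before this commit (hypothetical helper, for illustration only)."""
    return num_samples > num_canonical_nodes or (
        num_samples == num_canonical_nodes and
        num_canonical_nodes > num_physical_nodes)


def takes_matrix_branch_new(num_samples: int, num_canonical_nodes: int,
                            num_physical_nodes: int) -> bool:
    """Condition after this commit: the equality edge case now also covers
    an equal number of canonical and physical nodes."""
    return num_samples > num_canonical_nodes or (
        num_samples == num_canonical_nodes and
        num_canonical_nodes >= num_physical_nodes)


# The only inputs whose branch changes are those where all three counts are equal.
print(takes_matrix_branch_old(4, 4, 4), takes_matrix_branch_new(4, 4, 4))
# More samples than canonical nodes: both versions take the matrix branch.
print(takes_matrix_branch_old(8, 4, 4), takes_matrix_branch_new(8, 4, 4))
# Fewer canonical nodes than physical nodes with equal samples: both fall back.
print(takes_matrix_branch_old(4, 4, 8), takes_matrix_branch_new(4, 4, 8))
```

Under these assumptions, the commit changes behavior only when `num_samples == num_canonical_nodes == num_physical_nodes`; every other input takes the same branch as before.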