MRG: standardize on u32 for scaled, and introduce ScaledType #3364

Merged: ctb merged 22 commits into latest from update_scaled_type on Nov 5, 2024

ctb
Contributor

@ctb ctb commented Oct 26, 2024

Makes `scaled` a u32, and changes a few other types as well.

This PR started because we were mixing u32 and u64 in places, but I think a switch away from usize (architecture specific & mostly returned from collection lengths, and so on) to explicit u32/u64 seems good.

Fixes #3363


codecov bot commented Oct 27, 2024

Codecov Report

Attention: Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 86.45%. Comparing base (122c4b7) to head (5f49d1c).
Report is 1 commits behind head on latest.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/core/src/ffi/index/revindex.rs | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           latest    #3364      +/-   ##
==========================================
- Coverage   86.45%   86.45%   -0.01%     
==========================================
  Files         137      137              
  Lines       16090    16089       -1     
  Branches     2219     2219              
==========================================
- Hits        13911    13910       -1     
  Misses       1872     1872              
  Partials      307      307              
| Flag | Coverage Δ |
|---|---|
| hypothesis-py | 25.43% <ø> (ø) |
| python | 92.40% <ø> (ø) |
| rust | 62.16% <93.33%> (-0.02%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@ctb ctb changed the title from "WIP: standardize on u64 for scaled, and others" to "MRG: standardize on u64 for scaled, and others" Oct 27, 2024
@ctb
Contributor Author

ctb commented Oct 27, 2024

Ready for review @luizirber

@luizirber
Member

> This PR started because we were mixing u32 and u64 in places, but I think a switch away from usize (architecture specific & mostly returned from collection lengths, and so on) to explicit u32/u64 seems good.

Initially u32 started to get into the API because wasm is 32-bit nowadays, so some function calls were easier to set up across JS/Rust. But based on the scaled values we actually use (1 to ~100k), u64 is way too big and u16 is too small, so... u32 it is?

(ksize should probably be u16, has anyone ever used k > 65535? 😹 )

I would have no problem standardizing scaled on u32, too. Just want it to be one type of number :).

Maybe we can do a type alias like HashIntoType for scaled:

type HashIntoType = u64;

This would likely still need some conversions, but they can be done with `as HashIntoType`.
(It also makes it easier to try changing the type and evaluate the consequences, instead of having to fix it everywhere in the codebase.)
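
A minimal sketch of how such an alias could look for scaled. `ScaledType` is the name this PR ended up using; `max_hash_for_scaled` here is a simplified illustration and `scaled_from_usize` is a hypothetical helper, so neither should be read as the actual sourmash code.

```rust
/// Alias for scaled values. Changing the right-hand side changes the type
/// everywhere the alias is used.
pub type ScaledType = u32;

/// Hash values stay 64-bit, as in the alias quoted above.
pub type HashIntoType = u64;

/// Simplified illustration: the largest hash kept for a given scaled.
/// Widening u32 -> u64 is lossless, so `From` (or `as`) is safe here.
pub fn max_hash_for_scaled(scaled: ScaledType) -> HashIntoType {
    match scaled {
        0 => 0,
        1 => HashIntoType::MAX,
        _ => HashIntoType::MAX / HashIntoType::from(scaled),
    }
}

/// Hypothetical helper: narrowing from usize can overflow on 64-bit targets,
/// so use a checked conversion instead of a bare `as` cast.
pub fn scaled_from_usize(value: usize) -> Option<ScaledType> {
    ScaledType::try_from(value).ok()
}

fn main() {
    let scaled: ScaledType = 1000;
    println!("max_hash = {}", max_hash_for_scaled(scaled));
    println!("from usize = {:?}", scaled_from_usize(100_000));
}
```

With the alias in place, only the widening and narrowing points need attention if the underlying type changes; a value such as 100_000 does not fit in u16 (max 65_535) but fits comfortably in u32.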

@ctb
Contributor Author

ctb commented Oct 28, 2024

sounds good!

@ctb ctb changed the title from "MRG: standardize on u64 for scaled, and others" to "MRG: standardize on u32 for scaled, and others" Nov 4, 2024
@ctb ctb changed the title from "MRG: standardize on u32 for scaled, and others" to "MRG: standardize on u32 for scaled, and introduce ScaledType" Nov 5, 2024
@ctb
Contributor Author

ctb commented Nov 5, 2024

Ready for review @luizirber !

@luizirber luizirber self-requested a review November 5, 2024 18:06
@@ -155,7 +156,7 @@ pub unsafe extern "C" fn computeparams_set_num_hashes(
}

#[no_mangle]
-pub unsafe extern "C" fn computeparams_scaled(ptr: *const SourmashComputeParameters) -> u32 {
+pub unsafe extern "C" fn computeparams_scaled(ptr: *const SourmashComputeParameters) -> ScaledType {
Contributor Author

k! was wondering where to do the conversion ;)

Member

ScaledType is just an alias, so it gets resolved to u32 in include/sourmash.h. So it doesn't change much now, but easier to play with different ScaledType values in the future =]
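
A rough sketch of the point about the alias at the FFI boundary, under stated assumptions: `ComputeParameters` and `params_scaled` are hypothetical stand-ins (not the real `SourmashComputeParameters` or `computeparams_scaled`), and the export attribute used in the diff above is left out to keep the example small.

```rust
/// Alias only: the compiler treats this as plain u32, so the C-side
/// signature does not change.
pub type ScaledType = u32;

/// Hypothetical stand-in for the real parameter struct.
pub struct ComputeParameters {
    scaled: ScaledType,
}

/// Hypothetical accessor with a C ABI, returning the alias type.
///
/// # Safety
/// `ptr` must be a valid, non-null pointer to a `ComputeParameters`.
pub unsafe extern "C" fn params_scaled(ptr: *const ComputeParameters) -> ScaledType {
    unsafe { (*ptr).scaled }
}

fn main() {
    let params = ComputeParameters { scaled: 1000 };
    // A reference coerces to a raw pointer and is valid for the whole call.
    let value = unsafe { params_scaled(&params) };
    println!("scaled = {value}");
    // The alias has the same size as u32 (4 bytes).
    println!("size = {}", std::mem::size_of::<ScaledType>());
}
```

Because the alias resolves to u32, callers on the C side would see the same 4-byte return type before and after the change.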

@ctb ctb merged commit 2cc44e0 into latest Nov 5, 2024
44 checks passed
@ctb ctb deleted the update_scaled_type branch November 5, 2024 19:36
ctb added a commit that referenced this pull request Nov 5, 2024
## [0.17.0] - 2024-11-05

Changes/additions:
* standardize on u32 for scaled, and introduce `ScaledType` (#3364)
* panic when `FSStorage::load_sig` encounters more than one `Signature`
in a JSON record (#3333)

Updates:

* Bump needletail from 0.5.1 to 0.6.0 (#3376)
* Bump histogram from 0.11.0 to 0.11.1 (#3377)
* Bump serde from 1.0.210 to 1.0.214 (#3368)
* Bump serde_json from 1.0.128 to 1.0.132 (#3358)
* Fix clippy lints from 1.83 beta (#3357)
@ctb ctb mentioned this pull request Dec 5, 2024
ctb added a commit that referenced this pull request Dec 5, 2024
Developer updates:

* build: move ORCID to metadata in pyproject.toml, fix pixi (#3416)
* build: simplify Rust release (#3392)
* fix: Avoid re-calculating md5sum on clone and conversion to
KmerMinHashBTree (#3385)
* r0.15.1 release (#3304)
* update sourmash core to r0.17.0 (#3381)
* Added union method to HLL (#3293)
* Build: upgrade to newer maturin (#3366)
* CI: use supported ubuntu for codspeed (#3350)
* Fix clippy lints from 1.83 beta (#3357)
* Implement resumability for revindex (#3275)
* add `Manifest::intersect_manifest` to Rust core (#3305)
* bump sourmash core to r0.17.2 (#3399)
* change `sig_from_record` to use scaled from `Record` to downsample
(#3387)
* derive Hash for `HashFunctions` (#3344)
* enforce a single scaled on a `CollectionSet` (#3397)
* fix formatting from #3306 (#3307)
* have ruff ignore ipynb so as to avoid triggering an error during CI
(#3325)
* improve downsampling behavior on `KmerMinHash`; fix `RevIndex::gather`
bug around `scaled`. (#3342)
* panic when `FSStorage::load_sig` encounters more than one `Signature`
in a JSON record (#3333)
* propagate error from `RocksDB::open` on bad directory (#3306)
* refactor `calculate_gather_stats` to disallow repeated downsampling
(#3352)
* release core r0.17.1 (#3388)
* release sourmash rust core r0.16.0 (#3356)
* standardize on u32 for scaled, and introduce `ScaledType` (#3364)
* update plugin documentation for users (#3286)
* update sourmash core to r0.15.2 (#3338)
* when lingroups are provided, use them for `csv_summary` (#3311)
* Misc Rust updates to core (#3297)
* Resolve issue for high precision MLE estimation (#3296)

Dependabot and pre-commit CI updates:

* Bump DeterminateSystems/magic-nix-cache-action from 7 to 8 (#3319)
* Bump DeterminateSystems/nix-installer-action from 13 to 14 (#3320)
* Bump DeterminateSystems/nix-installer-action from 14 to 15 (#3374)
* Bump DeterminateSystems/nix-installer-action from 15 to 16 (#3401)
* Bump camino from 1.1.7 to 1.1.9 (#3301)
* Bump codspeed-criterion-compat from 2.6.0 to 2.7.2 (#3324)
* Bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0 (#3373)
* Bump csv from 1.3.0 to 1.3.1 (#3390)
* Bump getset from 0.1.2 to 0.1.3 (#3328)
* Bump histogram from 0.11.0 to 0.11.1 (#3377)
* Bump js-sys from 0.3.72 to 0.3.74 (#3412)
* Bump memmap2 from 0.9.4 to 0.9.5 (#3326)
* Bump myst-parser from 3.0.1 to 4.0.0 (#3277)
* Bump needletail from 0.5.1 to 0.6.0 (#3376)
* Bump pypa/cibuildwheel from 2.19.2 to 2.20.0 (#3278)
* Bump pypa/cibuildwheel from 2.20.0 to 2.21.1 (#3332)
* Bump pypa/cibuildwheel from 2.21.1 to 2.21.2 (#3345)
* Bump pypa/cibuildwheel from 2.21.2 to 2.21.3 (#3353)
* Bump pypa/cibuildwheel from 2.21.3 to 2.22.0 (#3408)
* Bump roaring from 0.10.6 to 0.10.7 (#3413)
* Bump serde from 1.0.204 to 1.0.207 (#3289)
* Bump serde from 1.0.207 to 1.0.208 (#3298)
* Bump serde from 1.0.208 to 1.0.209 (#3310)
* Bump serde from 1.0.209 to 1.0.210 (#3318)
* Bump serde from 1.0.210 to 1.0.214 (#3368)
* Bump serde from 1.0.214 to 1.0.215 (#3403)
* Bump serde_json from 1.0.120 to 1.0.121 (#3267)
* Bump serde_json from 1.0.121 to 1.0.122 (#3280)
* Bump serde_json from 1.0.122 to 1.0.124 (#3288)
* Bump serde_json from 1.0.124 to 1.0.125 (#3302)
* Bump serde_json from 1.0.125 to 1.0.127 (#3309)
* Bump serde_json from 1.0.127 to 1.0.128 (#3316)
* Bump serde_json from 1.0.128 to 1.0.132 (#3358)
* Bump serde_json from 1.0.132 to 1.0.133 (#3402)
* Bump sphinx-design from 0.5.0 to 0.6.0 (#3268)
* Bump sphinx-design from 0.6.0 to 0.6.1 (#3276)
* Bump tempfile from 3.10.1 to 3.11.0 (#3279)
* Bump tempfile from 3.11.0 to 3.12.0 (#3287)
* Bump tempfile from 3.12.0 to 3.13.0 (#3340)
* Bump tempfile from 3.13.0 to 3.14.0 (#3391)
* Bump thiserror from 1.0.63 to 1.0.64 (#3335)
* Bump thiserror from 1.0.64 to 1.0.65 (#3367)
* Bump thiserror from 1.0.65 to 1.0.68 (#3379)
* Bump thiserror from 1.0.68 to 2.0.3 (#3389)
* Bump web-sys from 0.3.69 to 0.3.70 (#3299)
* Bump web-sys from 0.3.70 to 0.3.72 (#3354)
* Bump web-sys from 0.3.72 to 0.3.74 (#3411)
* Update pytest-cov requirement from <6.0,>=4 to >=4,<7.0 (#3375)
* Update sphinx requirement from <8,>=6 to >=6,<9 (#3269)
* Upgrade rocksdb to 0.22.0, bump MSRV to 1.66  (#3383)
* [pre-commit.ci] pre-commit autoupdate (#3281)
* [pre-commit.ci] pre-commit autoupdate (#3290)
* [pre-commit.ci] pre-commit autoupdate (#3312)
* [pre-commit.ci] pre-commit autoupdate (#3330)
* [pre-commit.ci] pre-commit autoupdate (#3336)
* [pre-commit.ci] pre-commit autoupdate (#3341)
* [pre-commit.ci] pre-commit autoupdate (#3346)
* [pre-commit.ci] pre-commit autoupdate (#3360)
* [pre-commit.ci] pre-commit autoupdate (#3369)
* [pre-commit.ci] pre-commit autoupdate (#3380)
* [pre-commit.ci] pre-commit autoupdate (#3393)
* [pre-commit.ci] pre-commit autoupdate (#3404)
* [pre-commit.ci] pre-commit autoupdate (#3409)
* [pre-commit.ci] pre-commit autoupdate (#3414)
Development

Successfully merging this pull request may close these issues.

regularized 'scaled' type in Rust code
2 participants