Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert all bigints to number in H5WasmProvider without exceptions #1581

Merged
merged 1 commit into from
Feb 28, 2024

Conversation

axelboc
Copy link
Contributor

@axelboc axelboc commented Feb 27, 2024

Fix #1536

RawVis calls JSON.stringify(), which doesn't know how to serialize bigints.

The existential question is: should RawVis support bigints (perhaps by implementing BigInt.prototype.toJSON), or is RawVis not meant to receive bigints at all (like all the other visualizations)?

The ideal long-term goal is for all the visualizations to support bigints, as this would allow showing exact bingint values notably in the scalar and matrix visualizations and in the tooltip of the line and heatmap visualizations.

For the time being, however, I'm more concerned about the fact that H5WasmProvider sometimes converts bigints to numbers, and sometimes doesn't. This is what I decided to fix in this PR.

It only happens inside "non-printable" compound datasets (i.e. containing fields with non-trivial dtypes, like nested compounds). We already had a nested compound dataset in sample.h5, but it didn't contain bigints. I added bigints to it to show that in this situation, bigints were indeed not converted to numbers. (You can't see it in the diff, since that's what this PR is fixing).

This logic was implemented on purpose in H5WasmProvider: whenever we encountered a bigint array or a printable compound with at least one bigint field, we would call h5wDataset.to_array(), which returns a JSON-safe value notably by converting any bigints to numbers and any typed arrays to plain JS arrays. The rest of the time (so for nested compounds), we would just return h5wDataset.value (or slice()), which don't perform any JSON-related conversions.

Anyway, so I broadened this logic to cover all possible datasets containing bigints.

Instead of calling to_array, I now always read the raw value with h5wDataset.value (or slice()) and perform the bigint conversion myself. This has the advantage of not requiring re-flattening or manual slicing, and ensures that only bigints are converted; typed arrays, notably, remain untouched. I'm hoping that this will be more future proof. Also, my goal is to later move this bigint conversion closer to the visualizations, where needed, as the providers should return values that are as true to the content of the HDF5 as possible.

@bmaranville
Copy link
Contributor

At the moment, h5wasm is reading datasets into the same dtype as they were stored. There is nothing requiring this - when h5wasm reads the values a different output type could be specified, e.g. read uint64 into uint32 etc.

You can see the documentation for supported conversions here: https://docs.hdfgroup.org/hdf5/develop/_h5_t__u_g.html#title10

Would this be important for performance reasons, for the h5wasm provider? If so, please submit an issue to the h5wasm repo.

Copy link
Member

@loichuder loichuder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call: I think it is best to be consistent until we have proper support of BigInt

packages/h5wasm/src/utils.ts Outdated Show resolved Hide resolved
packages/h5wasm/src/utils.ts Outdated Show resolved Hide resolved
@loichuder
Copy link
Member

Would this be important for performance reasons, for the h5wasm provider?

I think our ultimate goal is to support BigInt in H5Web (there is no reason we should not) so that this conversion is only temporary.

But thanks for chiming in 😉

@axelboc
Copy link
Contributor Author

axelboc commented Feb 28, 2024

At the moment, h5wasm is reading datasets into the same dtype as they were stored.

This is great! I think that providers should return the data as accurately as possible; it should then be up to each visualization to convert it when needed (the heatmap's shader, for instance, works almost only with float32 arrays). So I do want to return bigints from the providers eventually and move the conversion to numbers further down, in the relevant visualizations.

This PR is just a first step to fix the linked issue, replace to_array() with value, and bring the bigint-to-number conversion logic into H5Web.

@axelboc axelboc force-pushed the sanitize-bigints branch 2 times, most recently from 6811b82 to 0a04729 Compare February 28, 2024 09:00
@axelboc axelboc merged commit 689d589 into main Feb 28, 2024
8 checks passed
@axelboc axelboc deleted the sanitize-bigints branch February 28, 2024 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Do not know how to serialize a BigInt
3 participants