Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make WebGL bindings library 4GB and MEMORY64 aware #15433

Closed
wants to merge 7 commits into from

Conversation

juj
Copy link
Collaborator

@juj juj commented Nov 4, 2021

Make WebGL bindings library 4GB and MEMORY64 aware, with a general machinery to manage pointers in each build mode.

We have four different build modes:

  1. MAXIMUM_MEMORY < 2GB mode. Pointers can be signed int32.
  2. 2GB <= MAXIMUM_MEMORY <= 4GB heap size mode. Need unsigned indices.
  3. MAXIMUM_MEMORY <= 2**53 build mode. Pointers can be JS Numbers.
  4. MAXIMUM_MEMORY > 2**53 build mode. Pointers require BigInts.

Mode 4 is not yet enabled today since WebAssembly/memory64#4 will not be part of initial Wasm64, but we know already today how the JS code will have to look like if/when it becomes a thing, so we might as well already today craft an Emscripten JS library programming convention that is compatible with all these four modes into the future.

Let's systematically use two conceptual "meta-type" names in a JS library helper API: a "wasm pointer", and a "wasm heap index", where

  • a wasm pointer is what comes out from Wasm to JS in a function call: a signed int32 in 2GB and 4GB heap size modes when MEMORY64=0, and a BigInt in MEMORY64=1 build mode.
  • a "wasm heap index" that is an unsigned JS Number for build modes 1-3 above, and a BigInt for mode 4 above.

Then have functions

idx = ptrToIdx(ptr, accessWidth) // returns a wasm heap index converted from a wasm pointer
convertPtrToIdx(ptr, accessWidth) // converts a wasm pointer to a wasm heap index in-place

to convert a wasm pointer to a wasm heap index, and

idx = createHeapIdx(number) // casts the given JS Number to a wasm heap index
number = idxToMemory53(heapIndex) // checked/assert casts the given wasm heap index back to a JS Number

Open to bikeshedding the function names.

This way we can obsolete/unify the current mixture of

#if CAN_ADDRESS_2GB
    ptr >>>= 0;
#endif

and

    {{{ from64('ptr') }}};

of code to a unified JS library code that will be proper to each build/memory mode without much hassle.

To get going, this PR converts the WebGL library to use this API, before mass converting the rest of the runtime and system JS libraries.

@juj juj added the GL label Nov 4, 2021
@juj
Copy link
Collaborator Author

juj commented Nov 4, 2021

Then, operations that are valid on a wasm pointer are:

  1. convert to a wasm index by calling ptrToIdx(ptr, ...) or convertPtrToIdx(ptr, ...),
  2. pass-through call a Wasm function with it (someWasmFunc(ptr)),
  3. pre/post-increment or decrement: (ptr++, ++ptr, ptr--, --ptr)
  4. test if the pointer is aligned to a particular byte boundary: isPtrAligned(ptr, 4)

Operations that are valid on a wasm index are:

  1. Index a heap object, i.e. HEAPF32[heapIdx],
  2. pre/post-increment or decrement: (heapIdx++, ++heapIdx, heapIdx--, --heapIdx)
  3. cast to a JS Number: idxToMemory53(heapIdx) (e.g. to call a web API that does not support BigInts)
  4. add/subtract from another wasm index for pointer arithmetic (heapIdx += someOtherHeapIdx, and heapIdx += createHeapIdx(offset);)

Adding more JS library functions we can expand these feature sets as needed, though I think that will be most that we'll ever need.

@kripken kripken requested a review from aardappel November 4, 2021 21:36
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

memory53 as a name seems a little odd, but I can't really think of a better one 😄 But perhaps mem53 which would match idx in being short?

It might be nice to have a helper for the common idxToMemory53(ptrToIdx(..)) case? Perhaps ptrToMemory53Idx?

assert(v != 0, 'Null pointer passed to glVertexAttrib2fv!');
#endif

GLctx.vertexAttrib2f(index, HEAPF32[v>>2], HEAPF32[v+4>>2]);
{{{ convertPtrToIdx('v', 2) }}};
GLctx.vertexAttrib2f(index, HEAPF32[v++], HEAPF32[v]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About this pattern:

Suggested change
GLctx.vertexAttrib2f(index, HEAPF32[v++], HEAPF32[v]);
GLctx.vertexAttrib2f(index, HEAPF32[v], HEAPF32[v+1]);

I think this is clearer, and the same amount of bytes? (also less at risk of breaking during a refactoring, but maybe that's unlikely for this code)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HEAPF32[v+1] would unfortunately not work, since v could be a BigInt, and BigInt+1 is not allowed (would have to be BigInt+1n.

assert(v != 0, 'Null pointer passed to glVertexAttrib3fv!');
#endif

GLctx.vertexAttrib3f(index, HEAPF32[v>>2], HEAPF32[v+4>>2], HEAPF32[v+8>>2]);
{{{ convertPtrToIdx('v', 2) }}};
GLctx.vertexAttrib3f(index, HEAPF32[v++], HEAPF32[v++], HEAPF32[v]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
GLctx.vertexAttrib3f(index, HEAPF32[v++], HEAPF32[v++], HEAPF32[v]);
GLctx.vertexAttrib3f(index, HEAPF32[v], HEAPF32[v+1], HEAPF32[v+2]);

(same here for this pattern - I think it's always going to be the same # of bytes)

@@ -1335,8 +1335,57 @@ function from64(x) {
}

function to64(x) {
if (!MEMORY64) return x;
return `BigInt(${x})`;
if (MEMORY64) return /^\d+$/.test(x) ? `${x}n` : `BigInt(${x})`;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we can do this atm. The 10n notation requires a newer version of terser than we have here, last I checked. I verified that it still breaks with -O3 on this:

#include <emscripten.h>

int main() {
  EM_ASM({
    Module['foo'](10n);
  });

That crashes in terser.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, sad. :( Would updating be possible? If not, we could as a temporary workaround unconditionally use BigInt() constructor, and once we have an update in place, switch to supporting the literals?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, can write BigInt(10), yeah. That's what we've done elsewhere.

Eventually upgrading terser would be good, but I'm not sure how easy that would be.

src/parseTools.js Outdated Show resolved Hide resolved
@aardappel
Copy link
Collaborator

Again, disagree we need a distinction between 53 and 64 at this point, and the complexity it brings, see my previous comments.

The issue you refer to as a reason we already need to be supporting these last 11 bits right now was written by me, has no support yet in the Wasm community, is likely hard to implement in browsers, and is pure blue sky thinking.

if (MAXIMUM_MEMORY > MAX_MEMORY53) .. do we have coverage for this path on a 16PB test machine? ;)
This seems a classical example of engineering complexity before you need it.

But hey, I've argued this multiple times now and it gets ignored, so I'll leave it to the project maintainers.

Also btw, both MEMORY64=1 and MEMORY64=2 result in 64-bit values at the boundary.

@juj
Copy link
Collaborator Author

juj commented Nov 4, 2021

memory53 as a name seems a little odd, but I can't really think of a better one 😄 But perhaps mem53 which would match idx in being short?

It might be nice to have a helper for the common idxToMemory53(ptrToIdx(..)) case? Perhaps ptrToMemory53Idx?

idxToMem53() sounds ok to me. One thing to note is that whenever this function appears in our JS library code, it is always due to a limitation of a web API, e.g. WebGL not supporting BigInts in entrypoints, or WebGPU not supporting them: gpuweb/gpuweb#2252 or similar. I can't think of any other reason why one would want to do this kind of conversion that would lose bits.

That is why I think that idxToMemory53(ptrToIdx(..)) might actually be a rare case, and exist more to workaround a web api limitation rather than the general thing.


let conversion;
if (MEMORY64) {
if (MAXIMUM_MEMORY > MAX_MEMORY53) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can just make this invalid for MAXIMUM_MEMORY to ever be this big? (early exit in emcc if it ever is?).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that works.

// Returns a pointer (either a BigInt or a signed int32 JS Number) shifted to an index on the heap (a BigInt or an unsigned JS Number).
function ptrToIdx(ptr, accessWidth) {
if (MEMORY64) {
if (MAXIMUM_MEMORY > MAX_MEMORY53) return /^\d+$/.test(accessWidth) ? `${ptr} >> ${accessWidth}n` : `${ptr} >> BigInt(${accessWidth})`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a helper function isIntLiteral(x) .. assuming that is what this regex is trying to achieve. Its wan't obvious to me on first readying and its a repeated patter withing this file.

@juj
Copy link
Collaborator Author

juj commented Nov 4, 2021

@aardappel could you maybe drop the snarky undertone? I don't think I deserve this attitude from you.

If you read the code, practically the same amount of complexity here is already with respect to collapsing BigInts to Numbers. Removing those four lines that have if (MAXIMUM_MEMORY > MAX_MEMORY53) would not help the complexity for the user JS library code one bit. They will still need to distinguish between what is a pointer that can e.g. flow back in to a Wasm function call as a BigInt, and how to do indexing and pointer arithmetic in a 32-bit vs 64-bit general way. The current implementation with an ad hoc mix of CAN_ADDRESS_2GB and from64s and to64s here and there does not provide the needed coherency that is needed for users to be able to adopt this.

I get that such a MAXIMUM_MEMORY > MAX_MEMORY53 path will not be exercised for a while, but that does not hurt me or you or the community to have a design that makes that a non-issue. I honestly don't understand why having that in there would offend you or warrant ridicule.

@kripken
Copy link
Member

kripken commented Nov 4, 2021

@aardappel

Again, disagree we need a distinction between 53 and 64 at this point, and the complexity it brings, see my previous comments.

I admit I do not fully understand all the issues there, which is why I haven't commented on it. However, what I like in this PR is that it makes explicit what is going on everywhere. The new functions make it easier to read the code and understand what's going on (what is a pointer, what is an index, how they are converted, where we assume that such a value can fit in a Number, etc.), and if we want to change things then we have the right hooks to do so in a single place. That's why I like this direction.

if (MAXIMUM_MEMORY > MAX_MEMORY53) {
conversion = /^\d+$/.test(accessWidth) ? `${ptr} >>= ${accessWidth}n` : `${ptr} >>= BigInt(${accessWidth})`;
} else {
conversion = `${ptr} = Number(${ptr}) / ${accessWidth}`;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am currently looking into getting a wasm64 build to work with the emscripten opengl integration and have stumbled over this pull request. While the approach looks good, there is an issue with this line. I assume that accessWidth is supposed to be the number of bits to shift. So you should not divide ptr by this but instead divide it by 2^accessWidth.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that is definitely wrong. I have been looking to get back to finishing this PR soon, once I get my other audio worklets feature through the finish line. I'll push a fix to this in a sec.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix!

@juj juj force-pushed the webgl_ptr_management branch from f94b6ab to 34c6a31 Compare January 26, 2023 12:37
# Conflicts:
#	src/library_webgl.js
#	src/library_webgl2.js
@juj
Copy link
Collaborator Author

juj commented May 3, 2023

This PR is now quite old in the tree, so I'll close this and open a new one.

@juj juj closed this May 3, 2023
juj added a commit to juj/emscripten that referenced this pull request May 3, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.
juj added a commit to juj/emscripten that referenced this pull request May 3, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.
juj added a commit to juj/emscripten that referenced this pull request May 3, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.
juj added a commit to juj/emscripten that referenced this pull request May 3, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.
juj added a commit to juj/emscripten that referenced this pull request May 4, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.

Fix typos and lint

Cleanup an use of convertPtrToIdx
juj added a commit to juj/emscripten that referenced this pull request May 4, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.
juj added a commit to juj/emscripten that referenced this pull request May 11, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.
juj added a commit to Unity-Technologies/emscripten that referenced this pull request May 11, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.
@sbc100 sbc100 added the wasm64 label Jun 7, 2023
juj added a commit to juj/emscripten that referenced this pull request Aug 29, 2023
…dxToPtr() }}} to help 2GB, 4GB and MEMORY64 memory modes. Remove {{{ to64() }}} and {{{ from64() }}} since they had an awkward mismatch of in-place vs expression styles.

This is a looser form of PR emscripten-core#15433, and it embraces the idea that pointer values in JavaScript side will not need to retain the full 64-bit values, but can be truncated to 53 bits.

Update library_html5.js for >2GB

Address review and fix convertPtrToIdx on non-byte shifts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants