-
Notifications
You must be signed in to change notification settings - Fork 43
wasm_simd128.h intrinsic naming #342
Comments
I like the current naming conventions, and have plenty of code written against current version of |
I also like the current conventions just fine, but will also be happy with explicit |
Oh yeah, I guess I was implicitly assuming that we should update the |
Is the the "flexible" SIMD stuff about a vector length agnostic proposal (similar to SVE or the RISC-V V spec)? If that's the case and there will never be explicit 256-bit (or anything else) vectors in WASM SIMD (which I think is the right decision), one possibility would be to drop the lane count. For example, One possible wrinkle here is that everything is in the |
Yes, so far there is no intention to provide 256-bit fixed-width operations. This is a good question - really it depends on operop between the two and requirements of defining user-facing intrinsics. Native intrinsics typically have different versions for different ISA extensions, even if they do the same thing, I am not sure what path Wasm would take here. BTW, if you are curious on how flexible proposal would handle some of this we should definitely talk about in that repository. |
I hadn't really considered the possibility of using the same API for flexible vectors, but I think there are some obstacles which would make that difficult. For example, you probably want an additional parameter on most functions for the predicate (unless you want to get rid of the predicate in the API). I'd prefer to see a separate namespace for flexible vectors, but having (for example) My concern was more about the possibility of future functions like If the flexible SIMD would use a different namespace and there isn't going to be a fixed 256-bit SIMD extension, I think dropping the lane counts from the API would make code using it a lot more readable (and easier to write, but I care less about that). Otherwise it could still be done by overloading the functions, which I would be good with but others may find distasteful…
I am very interested in the C/C++ side, yes. I'll watch the repo. |
I'd say flexible SIMD is highly likely to use a different namespace - that is how things usually go with the toolchain.
You are right - adding them would be beneficial for x86 at expense of everybody else, which did not make a lot of sense. To my best knowledge there are no proposals to introduce 256-bit ops as of today. BTW, flexible vectors take approach somewhat in the same direction as WAV - there are types for lanes, instead of a generic 'vector' type. Also, manipulating lane count (which a few people would like to see gone) is done via a separate instruction, rather than through an extra parameter, which means that the core API would not need a lane count argument. |
I just added a lot of tests to the tests branch of WAV to verify that everything compiles down to exactly what we expect. Functions which don't generate the expected result have been annotated in wav.h (in the main branch) as requiring unimplemented-simd128 (grep for "_UNIMPLEMENTED"). I think this should serve as a pretty good list for you, though it doesn't incorporate your change from yesterday. The exception is the const functions, which generate splat + replace_lane. One other thing I noticed is that there are 4 different builtins for any_true, but the spec only has a single variant (which makes sense). Also, if you mask the count on the shift functions in C (i.e., |
@nemequ, I landed a patch in LLVM yesterday to remove the unimplemented-simd128 target feature and the implicit definitions of the corresponding feature macro, but of course that shouldn't stop you from using the same macro for your own purposes. That same patch should enable v128.const by default. I expect to remove the current any_true_* intrinsics and replace them with a single any_true_v128 intrinsic to mirror the change we made to the instructions. I submitted https://bugs.llvm.org/show_bug.cgi?id=49655 to track the suboptimal shift issue. |
I just uploaded https://reviews.llvm.org/D101112 to add the missing intrinsics and update the existing intrinsics to match the new instruction names. The old names are kept as deprecated functions to ease transitioning to the new names. Reviews welcome! |
@tlively The updated intrinsics look good. I'd like to have |
@Maratyszcza, if you do e.g. |
It is enough for me, but is probably a bad idea overall: polymorphic macros/functions are not portable to older C codebases. |
This isn't polymorphism in the clang front-end; it's the backend instruction selection detecting that a splat of a constant is better represented with a v128.const than with a splat instruction. |
Then it is not enough: other compilers may not have the same optimization. E.g. last time I checked, Microsoft Compiler happily generated multi-instruction sequence for |
The first parameter to
Other than that, my biggest concern is that |
@tlively let me know if you want me to look into any of this. |
Thanks for the feedback @Maratyszcza and @nemequ, and thanks for the offer of help @penzn! I'm currently working on adding a test upstream in clang to directly test the codegen for all of the intrinsics to catch the i64x2.abs issue @nemequ identified and other similar issues. @penzn, if you want to look into replacing the macro intrinsics with functions using the pattern @nemequ shared, that would be really helpful and would not clash with my current work. If you're interested, I would also appreciate help fixing codegen issues after my test lands. |
Another one: Also, the input to FWIW, I should have a bunch of examples where codegen can be improved soon. My plan is to update the WASM SIMD128 implementation in SIMDe to match the current wasm_simd128.h over the weekend, then port the (verified correct) implementations to WAV so I can use the tests I have there to easily find implementations for which LLVM doesn't generate optimal code (i.e., a single instruction). I'll probably file them in the LLVM bugzilla next week. Sorry, it's going to be a lot of work on the LLVM side, but I think it should be pretty helpful for improving autovectorization. |
Excellent, I look forward to the reports. Thanks for your work on this! |
I have the codegen tests up for review here: https://reviews.llvm.org/D101684. @nemequ, could you take a look through the FIXMEs there and make sure they line up with the issues you've seen? |
LGTM. They include all the issues I've seen; your list has a few more because WAV uses the builtins directly (unless I can avoid them altogether and still get the right instructions) instead of going through wasm_simd128.h, and most of these seem to be bugs in the header. I did just notice that the pointer parameters for |
Argh, one last thing, sorry: the inputs to |
@tlively One more thing: the |
Looks like the tests need to be re-done in terms of C->IR->Wasm. @tlively since this is more work, let me know if you need help with that as well, as a baseline it should be possible to take the tests you have written and turn them into two files for two stages. |
Thanks, @penzn! I've already landed the C-> IR test here: https://reviews.llvm.org/rGf3b769e82ff3340e96c2e50ac2bd6361fbc3d797. I don't think we'll want a specific IR->Wasm test file that matches the intrinsics, though. I still have my end-to-end test in a separate branch to help me identify specific backend issues, though. For example, https://reviews.llvm.org/rG14ca2e5e22e7806c77ca3e5b126e888c9b1c4041 should fix the bug with i64x2.abs not being generated. |
How would folks feel about changing the return type of the extract_lane functions and parameters to replace_lane functions to be the specific element types, e.g. |
I'm a little worried about the C++ compiler inserting redundant extend instructions in both cases, since the underlying instructions consume and return int. I expect this is mostly not a problem, though it is possibly ABI-dependent. (Consistency is good, but could it be that it's the arguments to the splat functions that should be changed to int instead?) |
Yes, it would also be possible to change the arguments to splats to be ints, but IMO the more specific types give a better developer experience. I can add tests that check that no extra extend instructions are emitted, if that sounds good. |
Tests are good... Still wondering if this might be an issue in corner cases, but having a hard time constructing concrete examples for it, depends on compiler internals. In principle though, something as simple as |
I'm strongly in favor of this, but isn't it basically #257? I don't see why it would make sense for the builtin to return an This brings up another question I've been wondering about: is there a reason for these builtins to exist at all? Why not just drop them and implement the extract_lane and replace_lane functions using the subscript operator? The only benefit of the builtin I can think of is that it verifies that the index is a constant, but you can do that in the wrapper instead (which is what I do in WAV). It's not just these functions; a lot of the current builtins can already be replaced by normal code (I can very easily put together a fairly complete list if there is interest). There are a lot of functions in wasm_simd128.h which already work this way so obviously it's not forbidden, but what is the criteria for determining whether something needs a builtin or not? |
Ah yes, I had forgotten about #257. @Maratyszcza, on that issue you were arguing against using the smaller types; has your thinking on that evolved? I know my own focus has shifted from trying to represent the underlying instructions precisely to trying to present the best developer experience (with no difference in the produced code either way).
Generally, many of the builtins exist because getting end-to-end codegen for particular instructions was simpler using builtin functions and LLVM intrinsics than proper ISel patterns. My plan is to continue replacing those builtins with normal code patterns. The extract and replace builtins predate all that rapid prototyping work, though. I was very new to this project when I added them, and I agree that they are not useful today ;)
If normal code patterns can reliably give us good results via reasonable ISel patterns, no builtin or LLVM intrinsic should be necessary. If the underlying instruction has potentially useful semantic differences from similar C, a builtin is probably still useful to expose those semantics at the source level, though. (For example, the builtins for the trapping and non-trapping float-to-int instructions are useful because out-of-bounds float-to-int conversions are UB in C.) Builtins are also useful when the operation cannot be simply expressed in C or LLVM IR, for example for the saturating, rounding q15 mul instruction. |
That's great, though IMHO it would need some major, fundamental changes to present a good experience; that's exactly why I started working on WAV.
Ok, it seems like we're on the same page here. I'm trying to do the same. By "replacing", do you mean that you plan to remove the builtins, or just stop using them in wasm_simd128.h? Right now I'm using them directly in WAV (when normal code won't work), but since WAV isn't distributed with LLVM like wasm_simd128.h is I'm eventually going to need a stable target… I have a bunch of functions in WAV (besides extract/replace_lane) which don't call builtins but do generate the correct instructions, including many which wasm_simd128.h does use the builtins for. Once you have the tests you're working on updated and merged into LLVM do you want me to put together a patch to merge those implementations into wasm_simd128.h? |
Yes, I'm just trying to get the best experience possible without major changes ;) I'm very excited about your WAV work as well!
For now I would like to remove them entirely whenever possible, but I totally agree about a stable target. LLVM 13 will branch on July 27; let's finish all the cleanup by then and call whatever builtins make it into LLVM 13 stable. Does that seem reasonable to you?
Yes, the tests are merged upstream and patches to wasm_simd128.h would be very welcome! |
Yes. Are you open to similar wasm_simd128.h changes for roughly the same window? Here are some ideas off the top of my head:
Also, I think it's worth considering how aggressively uni-typed wasm_simd128.h should be. For example
I can see an argument for leaving things as they are for consistency but since wasm_simd128.h is going to be the official API, short of deprecating the whole thing out and switching to a typed API (which I'd be completely comfortable with ;)) I think filling in some of the gaps like this would be the least-bad option… you still don't get type safety, but at least there are fewer weird holes in the API.
Oh, that was quick. I'll try to get on that soon. |
Generally yes, although breaking changes would be a harder sell and would need to allow time for users to update their code.
I was thinking that for aligned loads and store, users could just dereference
I agree, and have started moving in that direction with https://reviews.llvm.org/D102018.
Sounds like a good idea!
Interesting! I assume C++ supports that as well? Seems worth considering.
I get where you're coming from, but I think this would make more sense if we actually exposed different types. I slightly prefer leaning into the uni-typed design in which the function/instruction names give the semantics, but the type itself is not interpreted. I guess we could have multiple names for the same operation, but I'm not sure it would be worth it. I'm open to further discussion here. |
A quick list of recent changes I have landed upstream:
|
One problem with just casting to
Conformant array parameters? No, C++ doesn't support them. They're based on the syntax C uses for VLAs which C++ (wisely) never added. They don't technically create VLAs since array arguments degrade to pointers, but clang and GCC will both emit a -Wvla diagnostic (if it's enabled) when they are encountered, and MSVC and IAR (which don't support VLAs) error. However, that's easy enough to get around with a macro, which is what I do in WAV. Even if it gets compiled away to nothing I still find it valuable for documentation. To be clear, an empty square bracket there is totally fine for C++, so compiling it away to nothing is valid on C++ and pre-C99. Also, the C2x charter adds an additional principle (the only new one) for this:
So I expect this to be more common going forward, hopefully with more tooling picking up support.
Completely agree, but IMHO it's still sensible with a single type.
Okay, so here's my pitch: For stuff like shuffle where it truly is generic, I'd be happy with There is also the readability issue. For example, consider something like: v128_t madd_u32(v128_t a, v128_t b, v128_t c) {
return wasm_u32x4_add(wasm_i32x4_mul(a, b), c);
} To me, that just looks like a bug. I'd definitely trip over something like that if I was skimming some code and have to think about it for a second. IMHO it's important enough to make sure an API is readable and unsurprising to justify adding aliases even when, if you think about it, under the hood they do exactly the same thing. Then there is the consistency issue. The above snippet isn't a bug, but this is: v128_t madd_f32(v128_t a, v128_t b, v128_t c) {
return wasm_f32x4_add(wasm_i32x4_mul(a, b), c);
} In some ways, I think it's actually more important to do stuff like this in a uni-typed API. That first example doesn't just look like a bug, it looks like a class of bug which is probably going to be very common since there is no type safety to prevent you from making that mistake. I'd expect people to be more sensitive to issues like that when reviewing code using this API, so you're going to end up with a lot of false-positive diagnostics from your brain for stuff like this. Basically, the aliases would make code easier to read and write, and the only cost is a few extra functions in a system include. |
IIRC, V8 & SM ignore alignment hints, and practically enforcing alignment hints would cost performance on platforms where they are not natively enforced by hardware (including x86 AVX and ARM64 NEON). Thus, practically alignment hints are of no use in WAsm SIMD. |
For these types of people we have automatic autovectorization in Clang. IMO it is reasonable to expect that people who directly program in low-level intrinsics understand which instructions those map to, and what operations they do. |
Compiler does not normally carry full info about extra operations or constraints intrinsics have, for use in optimizations. I would not consider this a bug in spirit of @Maratyszcza's point right above this message :) Posted in bugzilla (LLVM is migrating to Github issues some time soon): https://bugs.llvm.org/show_bug.cgi?id=49655#c1 |
Currently the intrinsic header uses a different naming convention than the underlying SIMD instructions. The intrinsic name can be determined from the instruction name by prepending
wasm_
, turning the.
into an_
, and removing any signedness suffix in favor of representing unsigned interpretations by using au
instead of ani
in the initial interpretation specifier.For example
i32x4.lt_u
becomeswasm_u32x4_lt
andi32x4.lt_s
becomeswasm_i32x4_lt
.I don't think we ever discussed this naming convention as a group, so with the recent instruction renamings, now is a good time to make sure there are no better alternatives before everything stabilizes.
One possibility, for example, would be to get rid of the
u
s in lane interpretations and use the same_u
and_s
prefixes as the instructions. What other ideas do we have? Or do we like the current scheme?cc @zeux.
The text was updated successfully, but these errors were encountered: