Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alignment issues and bad asm.js output #3261

Closed
AerialX opened this issue Mar 16, 2015 · 8 comments
Closed

Alignment issues and bad asm.js output #3261

AerialX opened this issue Mar 16, 2015 · 8 comments

Comments

@AerialX
Copy link

AerialX commented Mar 16, 2015

I've noticed some code will miscompile and produce unexpected results depending on the alignment of members in a struct. I'm compiling from LLVM IR output by the Rust compiler.

The following test case IR fails to work properly when run through emcc: main.ll (original source)
It should end in an exit code of 2 (to signify the enum being of value Enum::B), but instead exits with 1, as the object's contents are corrupt after the two clones. Various other targets using this same IR (lli, llc -filetype=obj, NaCl compilation to pexe/nexe, etc.) all produce the expected result. Changing the first field of the Test struct to u32 hides the issue and returns 2 as expected.

I have tried this on both incoming and merge-pnacl-mar-13-2015. I can provide more info or some other versions compiled at different optimization levels if needed. Unfortunately I don't know enough about LLVM IR and Emscripten to follow along what's actually happening here... It seems to be that stores into fields on a new copy of the object can sometimes just fail in strange ways.

@kripken
Copy link
Member

kripken commented Mar 16, 2015

Building that with -s SAFE_HEAP=1, it shows an alignment error. The emscripten target must have all accesses aligned, or marked as unaligned if they are not, so we can handle that (because JS typed arrays disallow unaligned reads and writes).

Was this compiled with Rust's LLVM, then read in another LLVM using emscripten, and with the target triple and datalayout manually modified?

The optimal thing would be to use a single LLVM. Building rust code should be done against the emscripten triple and datalayout,

target datalayout = "e-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-p:32:32:32-v128:32:128-n32-S128"
target triple = "asmjs-unknown-emscripten"

@kripken
Copy link
Member

kripken commented Mar 16, 2015

We could add an option to force all memory accesses to be unaligned. This would be very slow (4x or so), but would not fail on stuff like this. Would that be useful for now?

@AerialX
Copy link
Author

AerialX commented Mar 16, 2015

Aha, thanks for clearing that up! This is indeed compiled from Rust's LLVM, though it is using the target triple and data layout you can see described in the files. i386 is used as the arch instead of asmjs because rustc expects the backend target to have valid asm/output/writer/etc. (and for a few other reasons), though that output is discarded/unused for Emscripten/PNaCl targets.

There's ongoing work in actually building Rust with llvm-pnacl/emscripten-fastcomp and removing its assumptions about how LLVM targets operate (and supporting LLVM-only output for linking purposes) but it's a bit of a way's off from being usable right now. The current method involves using Rust's LLVM to generate as emscripten-compatible IR as possible and then passing that on to emcc. Does having marked unaligned reads/writes require emscripten-specific changes to LLVM? Is it inferred from the target configuration, or the data layout? Is it embedded in debug metadata that's been stripped out? Just wondering if it's possible to get this stopgap setup working until proper integration is ready.

@kripken
Copy link
Member

kripken commented Mar 17, 2015

If the emscripten-asmjs target were upstream, you should at least easily build against it. @sunfishcode , perhaps that is a reason to try to upstream it sooner than later?

Unaligned reads/writes could be done with a new LLVM pass that just strips them. They are on each load/store instruction, e.g. %7 = load i32* %6, align 4 has alignment 4, which we would want to change to 1. Or, we could add a flag on the emscripten backend I suppose. Are you going through LLVM IR in text format anyhow? If so you might want to try s/align 4/align 1 and likewise with align 2 and 8 meanwhile, but I'll try to add that soon.

@AerialX
Copy link
Author

AerialX commented Mar 17, 2015

So then to quote the LLVM reference...

It is the responsibility of the code emitter to ensure that the alignment information is correct. Overestimating the alignment results in undefined behavior.

Does that make this issue rustc's fault then? An alignment in some of the load/stores in that LLVM IR is incorrect? They all seem to be explicitly stated, so it's not falling back to a data layout default (it's using the same data layout as emscripten anyway AFAICT).

As far as assuming all accesses are unaligned, probably don't worry about a flag for that. I can try going over the IR with string replacement, which I'm already doing now to adjust the 3.6 metadata format syntax anyway (and I'm not using BC because binary version compatibility isn't very stable).

@kripken
Copy link
Member

kripken commented Mar 17, 2015

I'm not sure. Possibly rust is declaring structs for LLVM and then using LLVM utilities to calc the alignments. Rust output does work on ARM, which has similar alignment restrictions as emscripten. So my guess is that if Rust were building using the right target and datalayout, it would work.

@AerialX
Copy link
Author

AerialX commented Mar 17, 2015

Just ran a few tests here. The code works as expected on a ARMv6 board, but the same binary exits with the incorrect 1 instead on ARMv5. It does seem to be a bug on rust's side, thanks for your help!

Hopefully rust will be able to target emscripten some day! This hacky method actually works surprisingly well for fairly complex projects, but proper integration will be great to have. The main blocker seems to be the time involved in maintaining such a fork against the constantly-moving rust codebase. Changes to the compiler mostly involve working around differences between llvm-pnacl and 3.6, a few unsupported type/argument constructs, and special-casing the LLVM intrinsics that don't exist (math and integer overflow arithmetic mainly). Then that leaves the compiler being confused when it tries to generate static libraries for linking but isn't able to output anything to store in them.

@AerialX
Copy link
Author

AerialX commented Apr 19, 2015

Closing because not an emscripten bug and was also fixed in rustc: rust-lang/rust#24472

@AerialX AerialX closed this as completed Apr 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants