Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

src: don't use __builtin_bswap16() and friends #7644

Closed
wants to merge 3 commits into from

Conversation

bnoordhuis
Copy link
Member

@bnoordhuis bnoordhuis commented Jul 10, 2016

Said builtins are not supported by older versions of apple-gcc, breaking
the build on OS X 10.8.

Fixes: #7618
Refs: #4290
Refs: #7157

CI: https://ci.nodejs.org/job/node-test-pull-request/3235/

@bnoordhuis bnoordhuis added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Jul 10, 2016
@gibfahn
Copy link
Member

gibfahn commented Jul 10, 2016

This fix works for me with OSX 10.8 and Clang 3.2 using make CXX.host=c++. Does that CXX.host option need to be added to BUILDING.md?

@ChALkeR
Copy link
Member

ChALkeR commented Jul 10, 2016

Note that macOS 10.8 is not supported by Apple anymore, last version was released in 2013.

This looks like it simplifies some code, though. Are there any performance implications here?

@zbjornson
Copy link
Contributor

zbjornson commented Jul 10, 2016

There are big performance implications, yes. See the benchmark at the top of #7157 (up to 6x speedup) -- that was actually the main goal of the PR. I'd rather detect the Clang version or feature check in the macro, which is easy enough. (Or if the node maintainers decide, really drop support, as technically this isn't a supported compiler already.)

@zbjornson
Copy link
Contributor

zbjornson commented Jul 10, 2016

@gibfahn or @bnoordhuis can you try this patch instead please? This works on ubuntu with clang 3.0 and later but again I can't test on OS X and I had to jerry-rig gnu libs with clang.

+++ src/node_buffer.cc
--- src/node_buffer.cc

@@ -55,1  +55,5 @@
- #if defined(__GNUC__) || defined(__clang__)
+ #ifndef __has_builtin
+ #  define __has_builtin(x) 0 // compatibility with non-clang
+ #endif
+
+ #if defined(__GNUC__) || (defined(__clang__) && __has_builtin(__builtin_bswap16))

godbolt if you want to play with this

Note that LLVM clang 3.2 appears to have __builtin_bswap16; it's not present in LLVM clang 3.0. Perhaps Apple's fork was created after it was added?

(I like that this PR consolidates all of the bswap code though.)

Edit See follow-up PR #7645

@Fishrock123
Copy link
Contributor

Hmm, we could backport this but I doubt this is worthwhile on master? Should we then drop OS X 10.8 support in v7?

@ChALkeR
Copy link
Member

ChALkeR commented Jul 10, 2016

/cc @nodejs/build @jbergstroem, ref: nodejs/build#367. Any updates on the CI side?

@Fishrock123 Google doesn't support 10.7/10.8 since Chrome 50. They could one day just break old clang support on their side (i.e. in v8). Also, 3 years of no security updates from Apple.

So yes, +1 for dropping macOS 10.7 and 10.8.

@cjihrig
Copy link
Contributor

cjihrig commented Jul 11, 2016

+1 for dropping older, unsupported versions.

@gibfahn
Copy link
Member

gibfahn commented Jul 11, 2016

As I understand it this PR reverts the #7157 performance gains, whereas #7645 uses the newer builtins if available. So in theory there should be no performance loss with #7645.

Also, the problem here is with an outdated version of Clang. It should be possible to update to a later version of Xcode on an OSX 10.8 mac, so this issue wouldn't force dropping support for OSX 10.8. In fact, as the v6.0.0 and v4.4.7 BUILDING.md instructions already list Clang 3.4 as the minimum supported version, Clang 3.2 isn't technically supported anyway.

Obviously it may well make sense to drop OSX 10.8 support from v7 onwards anyway.

Just to clarify, does supporting a version mean that node will build on that OS, or just that it will run there?

@bnoordhuis
Copy link
Member Author

does supporting a version mean that node will build on that OS, or just that it will run there?

Strictly speaking just run, I think, although I can't really come up with an example where it's known to run but not build on a platform. CentOS 6 without devtoolset comes closest.

Said builtins are not supported by older versions of apple-gcc, breaking
the build on OS X 10.8.

Fixes: nodejs#7618
Refs: nodejs#4290
Refs: nodejs#7157
Fixes a couple of strict aliasing violations in the process.
@bnoordhuis
Copy link
Member Author

@zbjornson PTAL. I didn't go all out (see footnote) but I get comparable numbers for the aligned case and double or treble the throughput for the unaligned swap32 and swap64 cases. (x86_64 linux, g++ 6.1.1 and clang++ 3.8.0, no -march or -mtune.)

Footnote: the swap16 and swap32 cases can probably be sped up by reading and writing 64 bits at a time (on 64 bits architectures, or 32 bits with swap16 on 32 bits architectures.)

@zbjornson
Copy link
Contributor

@bnoordhuis on Windows this branch appears substantially slower (using defaults with vcbuild nosign) (only looking at buffers large enough to use C++):

aligned="true" method="swap16" len="256": node-bn.exe: 8211400 node-zb.exe: 24514000 ... -66.50%
aligned="true" method="swap16" len="512": node-bn.exe: 5197000 node-zb.exe: 21143000 ... -75.42%
aligned="true" method="swap16" len="768": node-bn.exe: 3793800 node-zb.exe: 17304000 ... -78.08%
aligned="true" method="swap16" len="1024": node-bn.exe: 2983100 node-zb.exe: 15854000 .. -81.18%
aligned="true" method="swap16" len="1536": node-bn.exe: 2081200 node-zb.exe: 13167000 .. -84.19%
aligned="true" method="swap16" len="2056": node-bn.exe: 1575300 node-zb.exe: 10470000 .. -84.95%
aligned="true" method="swap16" len="4096": node-bn.exe: 826430 node-zb.exe: 7266900 .... -88.63%
aligned="true" method="swap16" len="8192": node-bn.exe: 421840 node-zb.exe: 4015800 .... -89.50%

aligned="true" method="swap32" len="256": node-bn.exe: 9243300 node-zb.exe: 24699000 ... -62.58%
aligned="true" method="swap32" len="512": node-bn.exe: 5794900 node-zb.exe: 22149000 ... -73.84%
aligned="true" method="swap32" len="768": node-bn.exe: 4227900 node-zb.exe: 19544000 ... -78.37%
aligned="true" method="swap32" len="1024": node-bn.exe: 3321900 node-zb.exe: 17150000 .. -80.63%
aligned="true" method="swap32" len="1536": node-bn.exe: 2330500 node-zb.exe: 13268000 .. -82.43%
aligned="true" method="swap32" len="2056": node-bn.exe: 1756900 node-zb.exe: 11414000 .. -84.61%
aligned="true" method="swap32" len="4096": node-bn.exe: 932390 node-zb.exe: 7254200 .... -87.15%
aligned="true" method="swap32" len="8192": node-bn.exe: 477350 node-zb.exe: 3929900 .... -87.85%

aligned="true" method="swap64" len="256": node-bn.exe: 6482100 node-zb.exe: 24790000 ... -73.85%
aligned="true" method="swap64" len="512": node-bn.exe: 3745600 node-zb.exe: 21872000 ... -82.88%
aligned="true" method="swap64" len="768": node-bn.exe: 2640600 node-zb.exe: 19708000 ... -86.60%
aligned="true" method="swap64" len="1024": node-bn.exe: 2041600 node-zb.exe: 15523000 .. -86.85%
aligned="true" method="swap64" len="1536": node-bn.exe: 1399100 node-zb.exe: 12727000 .. -89.01%
aligned="true" method="swap64" len="2056": node-bn.exe: 1058700 node-zb.exe: 11202000 .. -90.55%
aligned="true" method="swap64" len="4096": node-bn.exe: 545650 node-zb.exe: 7561600 .... -92.78%
aligned="true" method="swap64" len="8192": node-bn.exe: 275200 node-zb.exe: 4401100 .... -93.75%

aligned="false" method="swap16" len="256": node-bn.exe: 8412100 node-zb.exe: 8760400 .... -3.98%
aligned="false" method="swap16" len="512": node-bn.exe: 5221500 node-zb.exe: 5219800 ..... 0.03%
aligned="false" method="swap16" len="768": node-bn.exe: 3789700 node-zb.exe: 3708100 ..... 2.20%
aligned="false" method="swap16" len="1024": node-bn.exe: 2975000 node-zb.exe: 2871400 .... 3.61%
aligned="false" method="swap16" len="1536": node-bn.exe: 2085500 node-zb.exe: 1986200 .... 5.00%
aligned="false" method="swap16" len="2056": node-bn.exe: 1585000 node-zb.exe: 1507700 .... 5.13%
aligned="false" method="swap16" len="4096": node-bn.exe: 824090 node-zb.exe: 780640 ...... 5.57%
aligned="false" method="swap16" len="8192": node-bn.exe: 417580 node-zb.exe: 392840 ...... 6.30%

aligned="false" method="swap32" len="256": node-bn.exe: 9199500 node-zb.exe: 9673100 .... -4.90%
aligned="false" method="swap32" len="512": node-bn.exe: 5811400 node-zb.exe: 5965200 .... -2.58%
aligned="false" method="swap32" len="768": node-bn.exe: 4207800 node-zb.exe: 4316100 .... -2.51%
aligned="false" method="swap32" len="1024": node-bn.exe: 3324500 node-zb.exe: 3367700 ... -1.28%
aligned="false" method="swap32" len="1536": node-bn.exe: 2334200 node-zb.exe: 2337200 ... -0.13%
aligned="false" method="swap32" len="2056": node-bn.exe: 1782200 node-zb.exe: 1770500 .... 0.66%
aligned="false" method="swap32" len="4096": node-bn.exe: 924390 node-zb.exe: 916570 ...... 0.85%
aligned="false" method="swap32" len="8192": node-bn.exe: 474670 node-zb.exe: 458800 ...... 3.46%

aligned="false" method="swap64" len="256": node-bn.exe: 6473300 node-zb.exe: 9449900 ... -31.50%
aligned="false" method="swap64" len="512": node-bn.exe: 3730000 node-zb.exe: 5946600 ... -37.28%
aligned="false" method="swap64" len="768": node-bn.exe: 2641800 node-zb.exe: 4304500 ... -38.63%
aligned="false" method="swap64" len="1024": node-bn.exe: 2031200 node-zb.exe: 3353800 .. -39.43%
aligned="false" method="swap64" len="1536": node-bn.exe: 1397300 node-zb.exe: 2341700 .. -40.33%
aligned="false" method="swap64" len="2056": node-bn.exe: 1056600 node-zb.exe: 1794600 .. -41.12%
aligned="false" method="swap64" len="4096": node-bn.exe: 539910 node-zb.exe: 942980 .... -42.74%
aligned="false" method="swap64" len="8192": node-bn.exe: 271870 node-zb.exe: 481670 .... -43.56%

(Are different build args used to make the precompiled binaries that would affect use of SSE and AVX?)

Testing on linux now...

@zbjornson
Copy link
Contributor

zbjornson commented Jul 12, 2016

Mighty long benchmark.

Linux with gcc and default build params I'm getting far closer (avg -5.6%) as you found. (Using std::swap is slow compared to a = d[0], d[0] = d[1], d[1] = a, so the unaligned case is faster in your branch -- easy fix.)

Not sure what to do then...

@rvagg
Copy link
Member

rvagg commented Jul 12, 2016

/cc @nodejs/build @jbergstroem, ref: nodejs/build#367. Any updates on the CI side?

no, sorry, everything is terrible still and we only have 10.10 coverage still

@bnoordhuis
Copy link
Member Author

@zbjornson You mentioned that performance on Windows was significantly worse. Did you build with VS 2013 or 2015?

@zbjornson
Copy link
Contributor

zbjornson commented Aug 4, 2016

@bnoordhuis I'm actually not positive which version the above benchmarks came from. I have definitely used 2015 for some of this, but I've also used 2013 and I think even 2012 with some service packs (which I had on one computer until this week!). Haven't looked at the benchmarks side-by-side.

I also haven't had time yet to test both PRs with different GCC flags, if only to address linux users building from source with said flags.

Will work on some more benchmark comparisons.

@gibfahn
Copy link
Member

gibfahn commented Sep 24, 2016

@bnoordhuis
Copy link
Member Author

I think this can be closed now that #7618 has been fixed by #7645 in commit 7420835.

@bnoordhuis bnoordhuis closed this Oct 17, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

osx, build: Node v6.3.0 won't build on Clang 3.2 (OSX 10.8 Darwin 12.5.0)
8 participants