Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

build: Drop endianess workaround #7

Open
wants to merge 1 commit into
base: bitcoin-fork
Choose a base branch
from

Conversation

fanquake
Copy link
Member

This mirrors a change in leveldb: google/leveldb@201f522, now that compilers can better optimise the generic code.

This change is part of bitcoin/bitcoin#29852.

@laanwj
Copy link
Member

laanwj commented Apr 15, 2024

going to close and open to test event delivery

@laanwj laanwj closed this Apr 15, 2024
@laanwj laanwj reopened this Apr 15, 2024
@hebasto
Copy link
Member

hebasto commented Apr 16, 2024

Cross posting my concerns regarding possible performance deterioration for MSVC builds.

@hebasto
Copy link
Member

hebasto commented Apr 19, 2024

bitcoin/bitcoin#29852 (comment):

What benchmarks might be appropiate for testing changes like these?

Microbenchmarks + IBD?

There's the only benchmark in the Bitcoin Core benchmark set that is noticeably affected by the touched functions performance, which is BlockFilterIndexSync.

I'm going to provide numbers for MSVC builds shortly. They will use the CMake staging branch as the master branch sets HAVE_CRC32C=0 for the leveldb library.

@hebasto
Copy link
Member

hebasto commented Apr 20, 2024

I did benchmarks on the following system:

  1. Windows 11 Pro 23H2.
  2. CPU Intel Core i5-8350U.
  3. Visual Studio v17.9.6.
  4. MSBuild version 17.9.8+b34f75857.

Compiled with /O2 flag ("Release" configuration).

  • No patch applied:
>src\bench\Release\bench_bitcoin.exe -filter=BlockFilterIndexSync

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    3,664,848,800.00 |                0.27 |    0.7% |    219.34 | `BlockFilterIndexSync`

>src\bench\Release\bench_bitcoin.exe -filter=BlockFilterIndexSync

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    3,690,305,700.00 |                0.27 |    1.1% |    217.64 | `BlockFilterIndexSync`

>src\bench\Release\bench_bitcoin.exe -filter=BlockFilterIndexSync

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    3,603,970,600.00 |                0.28 |    0.8% |    213.21 | `BlockFilterIndexSync`

  • With the patch:
>src\bench\Release\bench_bitcoin.exe -filter=BlockFilterIndexSync

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    3,736,477,540.00 |                0.27 |    1.3% |    219.36 | `BlockFilterIndexSync`

>src\bench\Release\bench_bitcoin.exe -filter=BlockFilterIndexSync

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    3,677,849,060.00 |                0.27 |    0.4% |    216.40 | `BlockFilterIndexSync`

>src\bench\Release\bench_bitcoin.exe -filter=BlockFilterIndexSync

|               ns/op |                op/s |    err% |     total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
|    3,654,515,533.33 |                0.27 |    0.7% |    215.83 | `BlockFilterIndexSync`

I have to admit that there isn't any significant changes.

Compiling with MSVC without crc32c library, as it is done in the master branch, does not have any performance impact as well.

@hebasto
Copy link
Member

hebasto commented Apr 20, 2024

For crc32c microbenchmarks, the performance deterioration (x86_64, MSVC) is quite obvious:

  • the main branch:
> .\Release\crc32c_bench.exe
2024-04-20T11:41:18+01:00
Running C:\Users\hebasto\crc32c\build-master\Release\crc32c_bench.exe
Run on (8 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1280 KiB (x8)
  L3 Unified 24576 KiB (x8)
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
CRC32CBenchmark/Public/256              20.9 ns         20.5 ns     32000000 bytes_per_second=11.6257G/s
CRC32CBenchmark/Public/4096              120 ns          122 ns      6400000 bytes_per_second=31.25G/s
CRC32CBenchmark/Public/65536            1783 ns         1758 ns       373333 bytes_per_second=34.7222G/s
CRC32CBenchmark/Public/1048576         44380 ns        45200 ns        16593 bytes_per_second=21.6055G/s
CRC32CBenchmark/Public/16777216      1496335 ns      1506024 ns          498 bytes_per_second=10.375G/s
CRC32CBenchmark/Portable/256            59.0 ns         58.6 ns     11200000 bytes_per_second=4.06901G/s
CRC32CBenchmark/Portable/4096            787 ns          802 ns       896000 bytes_per_second=4.75543G/s
CRC32CBenchmark/Portable/65536         13495 ns        13497 ns        49778 bytes_per_second=4.52198G/s
CRC32CBenchmark/Portable/1048576      268905 ns       272042 ns         2240 bytes_per_second=3.58974G/s
CRC32CBenchmark/Portable/16777216    6816839 ns      6944444 ns           90 bytes_per_second=2.25G/s
CRC32CBenchmark/Sse42/256               21.0 ns         20.9 ns     34461538 bytes_per_second=11.4313G/s
CRC32CBenchmark/Sse42/4096               119 ns          117 ns      5600000 bytes_per_second=32.5521G/s
CRC32CBenchmark/Sse42/65536             1987 ns         1988 ns       298667 bytes_per_second=30.7018G/s
CRC32CBenchmark/Sse42/1048576          56126 ns        55804 ns        11200 bytes_per_second=17.5G/s
CRC32CBenchmark/Sse42/16777216       1510869 ns      1534598 ns          448 bytes_per_second=10.1818G/s
  • this PR:
> .\Release\crc32c_bench.exe
2024-04-20T11:32:12+01:00
Running C:\Users\hebasto\crc32c\build-pr64\Release\crc32c_bench.exe
Run on (8 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1280 KiB (x8)
  L3 Unified 24576 KiB (x8)
--------------------------------------------------------------------------------------------
Benchmark                                  Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------
CRC32CBenchmark/Public/256              44.1 ns         44.4 ns     17230769 bytes_per_second=5.36573G/s
CRC32CBenchmark/Public/4096              621 ns          628 ns      1120000 bytes_per_second=6.07639G/s
CRC32CBenchmark/Public/65536           11199 ns        11230 ns        64000 bytes_per_second=5.43478G/s
CRC32CBenchmark/Public/1048576        193586 ns       196725 ns         3733 bytes_per_second=4.9641G/s
CRC32CBenchmark/Public/16777216      3245298 ns      3278460 ns          224 bytes_per_second=4.76596G/s
CRC32CBenchmark/Portable/256            87.2 ns         88.9 ns      8960000 bytes_per_second=2.68076G/s
CRC32CBenchmark/Portable/4096           1264 ns         1283 ns       560000 bytes_per_second=2.97215G/s
CRC32CBenchmark/Portable/65536         20889 ns        21484 ns        32000 bytes_per_second=2.84091G/s
CRC32CBenchmark/Portable/1048576      389843 ns       386151 ns         1659 bytes_per_second=2.52896G/s
CRC32CBenchmark/Portable/16777216    7098817 ns      7118056 ns           90 bytes_per_second=2.19512G/s
CRC32CBenchmark/Sse42/256               40.1 ns         40.1 ns     17920000 bytes_per_second=5.94429G/s
CRC32CBenchmark/Sse42/4096               612 ns          614 ns      1120000 bytes_per_second=6.21449G/s
CRC32CBenchmark/Sse42/65536            11558 ns        11719 ns        64000 bytes_per_second=5.20833G/s
CRC32CBenchmark/Sse42/1048576         186027 ns       188354 ns         3733 bytes_per_second=5.18472G/s
CRC32CBenchmark/Sse42/16777216       3123212 ns      3138951 ns          224 bytes_per_second=4.97778G/s

@fanquake fanquake mentioned this pull request Aug 28, 2024
fanquake added a commit that referenced this pull request Aug 30, 2024
21fc8ef Fix typo (google#59) (Dimitris Apostolou)
89f6984 Fix misspelled "Proccess" in comment (Munkybooty)
02e65f4 Bump deps (google#56) (Victor Costan)
b9d6e82 Fix Windows CI build. (google#54) (Victor Costan)
bbbb93a Switch CI to GitHub Actions (google#55) (Victor Costan)
d46cd17 Add clangd cache directory to .gitignore. (Victor Costan)

Pull request description:

  Pulls the few changes from upstream (last commit ~ 2 years ago).
  Seems reasonable to do before making any other changes. i.e #7, or if we are going to make further changes to the CMake build system etc.

Top commit has no ACKs.

Tree-SHA512: ca1cd5a085584d6a4ae65c5d83ae80db7bdd9ab251dc56fa899787dfa0b6fe0a60d32e89b5af5d066fa925a7120f02749aaa854e013f35c442f37a37a30caf23
This mirrors a change in leveldb:
google/leveldb@201f522,
now that compilers can better optimise the generic code.
@fanquake fanquake force-pushed the drop_endian_workaround branch from dfc289c to 1466756 Compare August 30, 2024 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants