Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<algorithm>: reverse_copy optimization is not activated under C++20/C++23 mode #2415

Closed
statementreply opened this issue Dec 13, 2021 · 2 comments · Fixed by #2416
Closed
Labels
fixed Something works now, yay! performance Must go faster

Comments

@statementreply
Copy link
Contributor

Describe the bug

STL/stl/inc/algorithm

Lines 4366 to 4387 in 2f9f567

#pragma warning(suppress : 6326) // Potential comparison of a constant with another constant
if constexpr (_Allow_vectorization && _Nx <= 8 && (_Nx & (_Nx - 1)) == 0) {
#if _HAS_CXX20
if (!_STD is_constant_evaluated())
#endif // _HAS_CXX20
{
if constexpr (_Nx == 1) {
__std_reverse_copy_trivially_copyable_1(_To_address(_UFirst), _To_address(_ULast), _To_address(_UDest));
} else if constexpr (_Nx == 2) {
__std_reverse_copy_trivially_copyable_2(_To_address(_UFirst), _To_address(_ULast), _To_address(_UDest));
} else if constexpr (_Nx == 4) {
__std_reverse_copy_trivially_copyable_4(_To_address(_UFirst), _To_address(_ULast), _To_address(_UDest));
} else {
__std_reverse_copy_trivially_copyable_8(_To_address(_UFirst), _To_address(_ULast), _To_address(_UDest));
}
_UDest += _ULast - _UFirst;
_Seek_wrapped(_Dest, _UDest);
return _Dest;
}
}
#endif // _USE_STD_VECTOR_ALGORITHMS

There is an optimized code path in reverse_copy for contiguous ranges of small trivial types. However, for some reasons it is only activated under /std:c++14 and /std:c++17, not under /std:c++20 or /std:c++latest.

Command-line test case
https://godbolt.org/z/9bc17xT8e

C:\Users\He\source\temp>type reverse_copy.cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <random>

constexpr int len = 100'000;
constexpr int iters = 100'000;

unsigned char src[len];
unsigned char dst[len];

int main() {
    std::mt19937 rng;
    std::uniform_int_distribution<> dist{0, 255};
    for (auto& x : src) {
        x = static_cast<unsigned char>(dist(rng));
    }

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        std::reverse_copy(std::cbegin(src), std::cend(src), std::begin(dst));
    }
    const auto t1 = std::chrono::steady_clock::now();

    std::cout << std::chrono::duration<double>{t1 - t0}.count() << " s\n";
    return 0;
}
C:\Users\He\source\temp>cl /EHsc /W4 /O2 /std:c++17 reverse_copy.cpp
用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.31.30818 版
版权所有(C) Microsoft Corporation。保留所有权利。

reverse_copy.cpp
Microsoft (R) Incremental Linker Version 14.31.30818.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:reverse_copy.exe
reverse_copy.obj

C:\Users\He\source\temp>.\reverse_copy.exe
0.194523 s

C:\Users\He\source\temp>cl /EHsc /W4 /O2 /std:c++20 reverse_copy.cpp
用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.31.30818 版
版权所有(C) Microsoft Corporation。保留所有权利。

reverse_copy.cpp
Microsoft (R) Incremental Linker Version 14.31.30818.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:reverse_copy.exe
reverse_copy.obj

C:\Users\He\source\temp>.\reverse_copy.exe
3.23843 s

STL version
2f9f567

@AlexGuteniev
Copy link
Contributor

Check for contiguous_iterator fails for a pointer with top level const

template<std::contiguous_iterator I>
struct S {};

int main()
{
    S<char*> s1;
    S<const char*> s2;
    S<char* const> s3; // fails
}

@miscco
Copy link
Contributor

miscco commented Dec 13, 2021

As discussed in discord, this is actually a copy & paste bug:

bool_constant<_Iterators_are_contiguous<decltype(_UFirst), decltype(_UDest)>>, is_trivially_copyable<_Elem>,

We are checking the type of _UFirst, which is declared as a const variable as we are actually iterating over _ULast, because this is a reverse algorithm.

So the solution would be to change that line to use _ULast rather than _UFirst

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fixed Something works now, yay! performance Must go faster
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants