Releases: kimwalisch/primecount
primecount-7.4
This is a new minor release, the API and ABI of primecount are backwards compatible. However, this release requires the latest primesieve-8.* library, it won't work with primesieve-7.*.
ChangeLog
CMakeLists.txt
: primecount now requires primesieve-8.0 or later.CMakeLists.txt
: Simplify, split up into multiple *.cmake files.primecount.pc.in
: primecount now requires primesieve-8.0 or later.Sieve.cpp
: Reduce branch mispredictions.B.cpp
: Faster counting of primes.P2.cpp
: Faster counting of primes.isqrt.cpp
: Improve code gen for 128-bit integers.- Update to latest libprimesieve-8.0 with
AVX512
support.
primecount-7.3
This is a new minor release, the API and ABI of primecount are backwards compatible. This release improves primecount's CMake build system (.e.g. improved OpenMP detection on macOS, build time POPCNT detection & new warning messages) and improves primecount's performance on big.LITTLE CPUs due to an update to the latest libprimesieve version.
ChangeLog
util.cpp
: Tuned gourdon alpha factors for primecount-7.3.nth_prime.cpp
: Improved performance for small n.FactorTable.hpp
: Reduced memory allocations.DFactorTable.hpp
: Reduced memory allocations.S2_trivial.cpp
: Reduced memory allocations.SegmentedPiTable.cpp
: Reduced memory allocations.CMakeLists.txt
: Detect at build time if the C++ compiler and the C++ Standard Library supportint128_t
. If not,int128_t
will be disabled. The-std=c++*
option usually disablesint128_t
, use-std=gnu++*
instead (or omit both-std=c++*
and-std=gnu++*
).CMakeLists.txt
: Detect at build time if the host CPU (x86) supports thePOPCNT
instruction. If not, disable thePOPCNT
instruction.CMakeLists.txt
: Fix AppleClang OpenMP detection.CMakeLists.txt
: Print warning message if OpenMP library is missing.CMakeLists.txt
: Add optionSTATICALLY_LINK_LIBPRIMECOUNT=ON/OFF
.appveyor.yml
: Test primecount using many different C++ compilers: GCC-5, GCC-7, GCC-8, GCC-9, Clang-9, AppleClang 13, MSVC 2019.ci.yml
: New Github Actions tests: Windows/MinGW-w64 and Linux with Valgrind and PrimePi(10^20) 128-bit test.- Update to latest libprimesieve-7.9.
primecount-7.2
This is a new minor release, the API and ABI of primecount are backwards compatible. The CPU cache size detection has been improved on big.LITTLE CPUs such as Intel Alder Lake, the documentation has been improved and primecount now has pkg-config/pkgconf support.
- Hard-Special-Leaves.md: Many new paragraphs: Multiple levels of counters, Batch counting, Runtime complexity and Appendix.
Sieve.hpp
: Add Counter struct.Sieve.cpp
: Use new Counter struct.LoadBalancerS2.cpp
: Increase default sieve array size to 128 KiB.primecount.pc.in
: Add pkg-config/pkgconf support.CMakeLists.txt
: AddWITH_MSVC_CRT_STATIC
option to force static linking.- Updated to libprimesieve-7.7 (improved CPU cache size detection).
primecount-7.1
PrimePi(x) is now computed in O(1) for small values of x < 64 * 240 using a lookup table. primecount's partial sieve function phi(x, a) implementation now uses a compressed lookup table for small values of a. It can compute phi(x, a) in O(1) for a β€ 8 (previously 6). The improved phi(x, a) also benefits the computation of the hard special leaves which now does more pre-sieving and hence runs slightly faster.
- Partial-Sieve-Function.md: Description of phi(x, a) algorithm.
PhiTiny.cpp
: Use compressed phi(x, a) lookup table.phi.cpp
: More correct usage of recursive phi(x, a) formula.PiTable.cpp
: Add PrimePi(x) lookup table for x < 64 * 240.generate_phi.hpp
: More correct usage of recursive phi(x, a) formula.nth_prime.cpp
: Cache small primes β€ 1009.nth_prime.cpp
: Improve error handling, n must be β₯ 1.appveyor.yml
: Fix debug build.
primecount-7.0
In primecout-6.x the AC algorithm (which is part of Gourdon's algorithm) scales very badly on PCs & servers with a large number of CPU cores. The two root causes of this scaling issue are cache misses and thread synchronization overhead. In primecount-7.0 the AC algorithm has been completely rewritten, now all threads are independent from each other and require only minimal synchronization. Furthermore each thread operates on its own tiny chunk of memory that conveniently fits into the CPU's cache. On my dual socket AMD EPYC server this improves performance by more than 2x for large AC computations with x β₯ 1e23. The P2 & B algorithms have also been rewritten so that the threads are independent from each other.
The Easy-Special-Leaves.md document contains an in-depth description of the new AC algorithm.
AC.cpp
: New algorithm with improved scaling.AC_libdivide.cpp
: New algorithm with improved scaling.B.cpp
: Improved scaling due to independent threads.P2.cpp
: Improved scaling due to independent threads.LoadBalancerAC.cpp
: New thread scheduler for AC algorithm.LoadBalancerS2.cpp
: Improve load balancing of S2_hard & D.LoadBalancerP2.cpp
: Rewritten, now similar to other load balancers.SegmentedPiTable.cpp
: Decrease size to x^(1/4).util.cpp
: Improve scaling using larger default alpha_z = 2.imath.hpp
: Optimizeilog2()
&next_power_of_2()
using__builtin_clzll()
.- Remove MPI support: No users, too much work to maintain.
primecount-6.4
This is a minor new release, the API and ABI (Application binary interface) are backwards compatible.
This version fixes a critical integer overflow in the B formula which is part of Xavier Gourdon's prime counting algorithm. The integer overflow occurred when computing B(x) with x > 1e25, this bug has been present in primecount-6.1, 6.2 and 6.3. This version also adds a workaround for a severe scaling issue in Clang's OpenMP library (Clang 11, 2021) that occurs when using dynamic thread scheduling combined with long running computations. Because of this issue primecount-6.3 compiled with Clang takes 2.5x longer to compute AC(1e24) than primecount-6.3 compiled with GCC on my dual-socket AMD EPYC Rome server. I have submitted a bug report to LLVM which contains more information: https://bugs.llvm.org/show_bug.cgi?id=49588
B.cpp
: Fix integer overflow (thanks to David Baugh for reporting it)B_mpi.cpp
: Fix integer overflow (same as inB.cpp
).P2.cpp
: Fix integer overflow (same as inB.cpp
).for_atomic.hpp
: Lock-free for loop built withstd::atomic
.AC.cpp
: Fix Clang/OpenMP scaling issue.AC.cpp
: Decrease size of SegmentedPiTable by 1.5x.AC_libdivide.cpp
: Fix Clang/OpenMP scaling issue.AC_libdivide.cpp
: Decrease size of SegmentedPiTable by 1.5x.S2_easy.cpp
: Fix Clang/OpenMP scaling issue.S2_easy_libdivide.cpp
: Fix Clang/OpenMP scaling issue.SegmentedPiTable.cpp
: Increase minimum segment size.SegmentedPiTable.cpp
: Annotate pi(x) withALWAYS_INLINE
.Sieve.cpp
: Simplify counters_dist initialization.PiTable.cpp
: Annotate pi(x) withALWAYS_INLINE
.LoadBalancerP2.cpp
: Fix last iteration detection.
primecount-backup-6.4
See the ChangeLog for what's new.
Warning: primecount-backup-6.4 cannot resume AC computations from backup files produced by previous versions of primecount-backup (6.3, 6.2 & 6.0). primecount-backup-6.4 can however resume all other computations from backup files produced by primecount-backup-6.x.
primecount-6.3
This is a minor new release, the API and ABI (Application binary interface) are backwards compatible.
The memory usage of the PiTable
and SegmentedPiTable
has been reduced by about 2x, this speeds up the computation of the easy special leaves (S2_easy & AC formulas) by up to 30% for large pi(x) computations with x β₯ 1e23. Furthermore, the partial sieve function a.k.a. phi(x, a) has been improved significantly, it now runs 3 to 5x faster for most input. phi(x, a) is used extensively by many other functions such as pi_legendre(x)
and pi_meissel(x)
which now also run up to 5x faster.
PiTable.cpp
: Reduce memory usage by 2x.SegmentedPiTable.cpp
: Reduce memory usage by 2x.Sieve.cpp
: Reduce memory usage of counters array by 2x.phi.cpp
: Fixed integer overflow #39.phi.cpp
: Cache 40x more phi(x, a) results.phi.cpp
: Use pi(x) if a > pi(sqrt(x)).phi.cpp
: Use larger c constant ifphi(x, larger_c)
is cached.generate_phi.hpp
: Same optimizations as phi.cpp.LoadBalancer.cpp
: Tune for new phi(x, a) implementation.FactorTable.hpp
: Reduce number of branches.DFactorTable.hpp
: Reduce number of branches.
primecount-backup-6.3
See the ChangeLog for what's new.
primecount-6.2
This is a minor new release, the API and ABI (Application binary interface) are backwards compatible.
The sieving algorithm has been improved by annotating branches with likely/unlikely and __builtin_unreachable()
. This improves the performance of the S2_hard and D formulas by up to 5%. GCC benefits most from these changes (Clang's performance was already very good before). With this release there is also a new primecount-backup version (last primecount-backup release was 6.0).
In order to reduce the amount of work for myself there will be no precompiled binaries anymore going forward (except for Windows). You can now install primecount using your operating system's package manager (if it is available there) or by compiling it from source yourself.
ChangeLog
- Use Appveyor-CI instead of Travis-CI for testing.
README.md
: primecount can now be installed using package managers.Sieve.cpp
: Annotate branches with unlikely, up to 5% speedup.Sieve.cpp
: Annotate switch cases with fallthrough.AC.cpp
: Reduce number of function paramaters.AC_libdivide.cpp
: Reduce number of function paramaters.B.cpp
: Improve GCC performance.P2.cpp
: Improve GCC performance.popcnt.hpp
: Improve GCC performance.