Detect the number of performance cores in the M1 macs via the new macos12 API #44072

Merged
merged 3 commits into JuliaLang:master from add-ncpus-apple on Feb 8, 2022

Conversation

gbaraldi
Member

@gbaraldi gbaraldi commented Feb 8, 2022

Should fix #44067. The logic is a bit convoluted right now due to older mac versions not having the API.
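
As a rough illustration of the fallback logic described above (not the code in this PR, which lives in C in src/sys.c), the sketch below asks `sysctl` for the `hw.perflevel0.physicalcpu` key that macOS 12 exposes, and falls back to `hw.physicalcpu` and then to `os.cpu_count()` when the key is missing. The helper names `read_sysctl_int` and `performance_core_count` are hypothetical, and shelling out to the `sysctl` binary is an assumption made for brevity.

```python
import os
import subprocess


def read_sysctl_int(name, run=subprocess.run):
    """Return the integer value of a sysctl key, or None if it cannot be read."""
    try:
        out = run(["sysctl", "-n", name], capture_output=True, text=True, check=True)
        return int(out.stdout.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        # Key absent (older macOS), sysctl binary missing, or non-numeric output.
        return None


def performance_core_count(read=read_sysctl_int):
    """Prefer the performance-core count, then fall back to coarser counts."""
    for key in ("hw.perflevel0.physicalcpu", "hw.physicalcpu"):
        n = read(key)
        if n is not None and n > 0:
            return n
    # Last resort: whatever the interpreter reports, never less than 1.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(performance_core_count())
```

The `read` parameter exists only so the lookup can be exercised without a Mac, for example by passing a function that returns canned values.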

Contributor

@navidcy navidcy left a comment

I tried this on a MacBook Pro M1 Max and it works.

navid:gbaraldi_julia/ (add-ncpus-apple) $ /Users/navid/gbaraldi_julia/julia --threads=auto
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-DEV.1459 (2022-02-08)
 _/ |\__'_|_|_|\__'_|  |  add-ncpus-apple/304e20b1b6 (fork: 109 commits, 19 days)
|__/                   |

julia> versioninfo()
Julia Version 1.8.0-DEV.1459
Commit 304e20b1b6 (2022-02-08 02:45 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.0 (ORCJIT, cyclone)
Environment:
  JULIA_EDITOR = vim

julia> Threads.nthreads()
8

@oscardssmith oscardssmith merged commit d30f403 into JuliaLang:master Feb 8, 2022
@vchuravy
Member

vchuravy commented Feb 8, 2022

If someone can also work out how to do that for Intel Alder Lake (on Linux/Windows), that might make @chriselrod happy.

antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request Feb 17, 2022
…os12 API (JuliaLang#44072)

* Use the new macos12 api to query perf cores
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
…os12 API (JuliaLang#44072)

* Use the new macos12 api to query perf cores
@gbaraldi gbaraldi deleted the add-ncpus-apple branch February 24, 2022 15:06
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
…os12 API (JuliaLang#44072)

* Use the new macos12 api to query perf cores
@giordano
Contributor

giordano commented Mar 25, 2022

This is macOS-specific, but now we can also run Linux on this CPU 🙂 Do we know of a similar API for Linux? I get 8 virtual cores in Asahi Linux
image (sorry for the screenshot, can't boot this system right now)

Edit: well, I guess this is the same question as the one above 😄

@staticfloat
Sponsor Member

Just shooting in the dark here, but can you look around in e.g. /sys/devices/system/cpu/cpu*/topology/ to see if there's anything interesting?

@giordano
Contributor

Difference between cpu0 and cpu3 (same "family"):

$ diff -ur /sys/devices/system/cpu/cpu{0,3}/topology/ 2>| /dev/null
diff -ur /sys/devices/system/cpu/cpu0/topology/cluster_cpus /sys/devices/system/cpu/cpu3/topology/cluster_cpus
--- /sys/devices/system/cpu/cpu0/topology/cluster_cpus  2022-03-27 18:30:03.818188313 +0100
+++ /sys/devices/system/cpu/cpu3/topology/cluster_cpus  2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-01
+08
diff -ur /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list /sys/devices/system/cpu/cpu3/topology/cluster_cpus_list
--- /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list     2022-03-27 18:30:03.818188313 +0100
+++ /sys/devices/system/cpu/cpu3/topology/cluster_cpus_list     2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-0
+3
diff -ur /sys/devices/system/cpu/cpu0/topology/core_cpus /sys/devices/system/cpu/cpu3/topology/core_cpus
--- /sys/devices/system/cpu/cpu0/topology/core_cpus     2022-03-27 18:30:03.818188313 +0100
+++ /sys/devices/system/cpu/cpu3/topology/core_cpus     2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-01
+08
diff -ur /sys/devices/system/cpu/cpu0/topology/core_cpus_list /sys/devices/system/cpu/cpu3/topology/core_cpus_list
--- /sys/devices/system/cpu/cpu0/topology/core_cpus_list        2022-03-27 18:30:03.818188313 +0100
+++ /sys/devices/system/cpu/cpu3/topology/core_cpus_list        2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-0
+3
diff -ur /sys/devices/system/cpu/cpu0/topology/core_id /sys/devices/system/cpu/cpu3/topology/core_id
--- /sys/devices/system/cpu/cpu0/topology/core_id       2022-03-27 18:30:03.818188313 +0100
+++ /sys/devices/system/cpu/cpu3/topology/core_id       2022-03-27 18:31:44.204678091 +0100
@@ -1 +1 @@
-0
+3
diff -ur /sys/devices/system/cpu/cpu0/topology/thread_siblings /sys/devices/system/cpu/cpu3/topology/thread_siblings
--- /sys/devices/system/cpu/cpu0/topology/thread_siblings       2022-03-27 18:30:03.818188313 +0100
+++ /sys/devices/system/cpu/cpu3/topology/thread_siblings       2022-03-27 18:32:38.190584322 +0100
@@ -1 +1 @@
-01
+08
diff -ur /sys/devices/system/cpu/cpu0/topology/thread_siblings_list /sys/devices/system/cpu/cpu3/topology/thread_siblings_list
--- /sys/devices/system/cpu/cpu0/topology/thread_siblings_list  2022-03-27 18:30:03.818188313 +0100
+++ /sys/devices/system/cpu/cpu3/topology/thread_siblings_list  2022-03-27 18:32:38.190584322 +0100
@@ -1 +1 @@
-0
+3

Difference between cpu3 and cpu4 (different "families"):

$ diff -ur /sys/devices/system/cpu/cpu{3,4}/topology/ 2>| /dev/null
diff -ur /sys/devices/system/cpu/cpu3/topology/cluster_cpus /sys/devices/system/cpu/cpu4/topology/cluster_cpus
--- /sys/devices/system/cpu/cpu3/topology/cluster_cpus  2022-03-27 18:32:38.189584324 +0100
+++ /sys/devices/system/cpu/cpu4/topology/cluster_cpus  2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-08
+10
diff -ur /sys/devices/system/cpu/cpu3/topology/cluster_cpus_list /sys/devices/system/cpu/cpu4/topology/cluster_cpus_list
--- /sys/devices/system/cpu/cpu3/topology/cluster_cpus_list     2022-03-27 18:32:38.189584324 +0100
+++ /sys/devices/system/cpu/cpu4/topology/cluster_cpus_list     2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-3
+4
diff -ur /sys/devices/system/cpu/cpu3/topology/core_cpus /sys/devices/system/cpu/cpu4/topology/core_cpus
--- /sys/devices/system/cpu/cpu3/topology/core_cpus     2022-03-27 18:32:38.189584324 +0100
+++ /sys/devices/system/cpu/cpu4/topology/core_cpus     2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-08
+10
diff -ur /sys/devices/system/cpu/cpu3/topology/core_cpus_list /sys/devices/system/cpu/cpu4/topology/core_cpus_list
--- /sys/devices/system/cpu/cpu3/topology/core_cpus_list        2022-03-27 18:32:38.189584324 +0100
+++ /sys/devices/system/cpu/cpu4/topology/core_cpus_list        2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-3
+4
diff -ur /sys/devices/system/cpu/cpu3/topology/core_id /sys/devices/system/cpu/cpu4/topology/core_id
--- /sys/devices/system/cpu/cpu3/topology/core_id       2022-03-27 18:31:44.204678091 +0100
+++ /sys/devices/system/cpu/cpu4/topology/core_id       2022-03-27 18:31:44.204678091 +0100
@@ -1 +1 @@
-3
+0
diff -ur /sys/devices/system/cpu/cpu3/topology/core_siblings /sys/devices/system/cpu/cpu4/topology/core_siblings
--- /sys/devices/system/cpu/cpu3/topology/core_siblings 2022-03-27 18:32:38.189584324 +0100
+++ /sys/devices/system/cpu/cpu4/topology/core_siblings 2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-0f
+f0
diff -ur /sys/devices/system/cpu/cpu3/topology/core_siblings_list /sys/devices/system/cpu/cpu4/topology/core_siblings_list
--- /sys/devices/system/cpu/cpu3/topology/core_siblings_list    2022-03-27 18:32:38.189584324 +0100
+++ /sys/devices/system/cpu/cpu4/topology/core_siblings_list    2022-03-27 18:32:38.189584324 +0100
@@ -1 +1 @@
-0-3
+4-7
diff -ur /sys/devices/system/cpu/cpu3/topology/package_cpus /sys/devices/system/cpu/cpu4/topology/package_cpus
--- /sys/devices/system/cpu/cpu3/topology/package_cpus  2022-03-27 18:32:38.190584322 +0100
+++ /sys/devices/system/cpu/cpu4/topology/package_cpus  2022-03-27 18:32:38.190584322 +0100
@@ -1 +1 @@
-0f
+f0
diff -ur /sys/devices/system/cpu/cpu3/topology/package_cpus_list /sys/devices/system/cpu/cpu4/topology/package_cpus_list
--- /sys/devices/system/cpu/cpu3/topology/package_cpus_list     2022-03-27 18:32:38.190584322 +0100
+++ /sys/devices/system/cpu/cpu4/topology/package_cpus_list     2022-03-27 18:32:38.190584322 +0100
@@ -1 +1 @@
-0-3
+4-7
diff -ur /sys/devices/system/cpu/cpu3/topology/physical_package_id /sys/devices/system/cpu/cpu4/topology/physical_package_id
--- /sys/devices/system/cpu/cpu3/topology/physical_package_id   2022-03-27 18:32:05.788637996 +0100
+++ /sys/devices/system/cpu/cpu4/topology/physical_package_id   2022-03-27 18:32:05.788637996 +0100
@@ -1 +1 @@
-0
+1
diff -ur /sys/devices/system/cpu/cpu3/topology/thread_siblings /sys/devices/system/cpu/cpu4/topology/thread_siblings
--- /sys/devices/system/cpu/cpu3/topology/thread_siblings       2022-03-27 18:32:38.190584322 +0100
+++ /sys/devices/system/cpu/cpu4/topology/thread_siblings       2022-03-27 18:32:38.190584322 +0100
@@ -1 +1 @@
-08
+10
diff -ur /sys/devices/system/cpu/cpu3/topology/thread_siblings_list /sys/devices/system/cpu/cpu4/topology/thread_siblings_list
--- /sys/devices/system/cpu/cpu3/topology/thread_siblings_list  2022-03-27 18:32:38.190584322 +0100
+++ /sys/devices/system/cpu/cpu4/topology/thread_siblings_list  2022-03-27 18:32:38.190584322 +0100
@@ -1 +1 @@
-3
+4
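
The `core_siblings_list` values in the diffs above (`0-3` for cpu3, `4-7` for cpu4) suggest the two "families" can be recovered by grouping CPUs on that file. The following is a minimal sketch under that assumption; `parse_cpu_list` and `cpu_clusters` are hypothetical helper names, and the sketch only groups the CPUs, it does not say which group is the performance one.

```python
import glob
import os


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0-3" or "0,2-3" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_clusters(root="/sys/devices/system/cpu"):
    """Return the distinct CPU groups reported by core_siblings_list."""
    clusters = set()
    pattern = os.path.join(root, "cpu[0-9]*", "topology", "core_siblings_list")
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                clusters.add(tuple(parse_cpu_list(f.read())))
        except OSError:
            continue  # CPU offline or file unreadable; skip it.
    return sorted(clusters)


if __name__ == "__main__":
    for cluster in cpu_clusters():
        print(cluster)
```

On the machine from this comment, the values shown in the diff would give two groups, CPUs 0 to 3 and CPUs 4 to 7.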

Also, looking at /proc/cpuinfo I noticed that CPUs from the two different "families" have different "CPU part" value:

$ grep -A6 'processor.*:.*[34]' /proc/cpuinfo
processor       : 3
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x61
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0x022
--
processor       : 4
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
CPU implementer : 0x61
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0x023
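
To make the "CPU part" observation concrete, here is a small sketch that groups processor numbers by that field. It works on the text of `/proc/cpuinfo` passed in as a string; the function name `cpus_by_part` is hypothetical, and nothing here decides which part number corresponds to the performance cores.

```python
from collections import defaultdict


def cpus_by_part(cpuinfo_text):
    """Map each "CPU part" value to the processor numbers that report it."""
    groups = defaultdict(list)
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # Blank lines and grep's "--" separators have no colon.
        key, value = key.strip(), value.strip()
        if key == "processor":
            current = int(value)
        elif key == "CPU part" and current is not None:
            groups[value].append(current)
    return dict(groups)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(cpus_by_part(f.read()))
```

Fed the two records shown above, it would put processor 3 under `0x022` and processor 4 under `0x023`.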

I'd also note that the "About this System" window in Asahi Linux correctly identifies the two different CPUs (4 firestorm and 4 icestorm):
image
So I'm moderately sure there is a way to programmatically distinguish the two CPUs (I'd probably need to dig out the code of this window), although I'm not sure there is a macOS-like way to query the "performance" ones only, and hardcoding this logic isn't very scalable.

@yuyichao
Contributor

So I'm moderately sure there is a way to programmatically distinguish the two CPUs

// /sys/devices/system/cpu/cpu<n>/regs/identification/midr_el1 reader
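
For readers unfamiliar with that register, the sketch below reads the `midr_el1` file and splits the value into the standard MIDR_EL1 fields (implementer, variant, architecture, part number, revision). The helper names `decode_midr` and `read_midr` are hypothetical. The example value `0x611F0220` is not a real reading from this thread: it is assembled from the implementer, variant, and part values in the /proc/cpuinfo output above, with an assumed architecture field of `0xF` and an assumed revision of 0.

```python
def decode_midr(value):
    """Split a MIDR_EL1 value into its architectural bit fields."""
    return {
        "implementer": (value >> 24) & 0xFF,   # bits [31:24]
        "variant": (value >> 20) & 0xF,        # bits [23:20]
        "architecture": (value >> 16) & 0xF,   # bits [19:16]
        "part": (value >> 4) & 0xFFF,          # bits [15:4]
        "revision": value & 0xF,               # bits [3:0]
    }


def read_midr(cpu, root="/sys/devices/system/cpu"):
    """Read MIDR_EL1 for one CPU from sysfs (arm64 Linux only)."""
    path = f"{root}/cpu{cpu}/regs/identification/midr_el1"
    with open(path) as f:
        return int(f.read().strip(), 16)


if __name__ == "__main__":
    # Illustrative value only; see the assumptions stated above.
    fields = decode_midr(0x611F0220)
    print({name: hex(v) for name, v in fields.items()})
```

Comparing the `part` field across CPUs gives the same split as the "CPU part" line in /proc/cpuinfo, without parsing text records.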

@chritkhalil

I'm currently running the newly released Julia version 1.9.1 on an M2 MacBook Air, and I'm facing the same issue.

After reading through this thread about detecting the number of performance cores in M1 CPUs, I wonder whether the same patch needs to be applied to the M2 chips or whether there's a different problem. I believe that BLAS.get_num_threads() should return 8 on an 8-core system, unless it's intentionally restricted to using only some cores due to the nature of the architecture (maybe?).

Testing with 1.7.1 and 1.8.0-beta1~x64, BLAS.get_num_threads() returns 8

@gbaraldi
Member Author

gbaraldi commented Jul 3, 2023

So, from measurements, using all cores leads to slower performance than limiting to only the performance cores; it also made the system very sluggish. I might be wrong, but the 8-core M2 has 4 performance and 4 efficiency cores, which means we tell BLAS to use 4 cores.

@staticfloat
Sponsor Member

That is correct, we default to the number of performance cores, as the efficiency cores provide very little benefit to Julia, but create a large performance issue for the rest of the system. You can see how many performance cores you have by opening Activity Monitor and then clicking Window -> CPU History, which gives you a nice core-by-core activity graph, and annotates which are efficiency and performance. As an example, my M1 Pro has 8 performance cores:

image

@oscardssmith oscardssmith added multithreading Base.Threads and related functionality system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips labels Jun 13, 2024
Successfully merging this pull request may close these issues.

MacBook Pro M1 Max doesn't give correct number of threads with --threads=auto
8 participants