Benchmark Turing Pi 2 cluster with 4x RK1 Nodes #27

geerlingguy · 2024-02-23T15:05:31Z

I am going to see how well 4x RK1 nodes (each with an 8-core RK3588 SoC) perform on the Turing Pi 2 cluster.

geerlingguy · 2024-02-23T16:06:16Z

I was having some trouble getting the test to run (I am using an ssh_user of ubuntu), then realized avahi-daemon isn't running on this install, so mDNS / .local discovery doesn't work. (I was testing with hostnames like turing1.local, turing2.local, etc.)

So I installed it on all four nodes: sudo apt-get install avahi-daemon

Now .local hostname resolution works on each of the cluster nodes. I'll add a note in the README.
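To confirm each node can resolve the others' .local names before kicking off a long run, a small check like the sketch below can help. It assumes hostnames following the turing1.local through turing4.local pattern mentioned above (the `HOSTS` list and `check_resolution` helper are hypothetical), and it goes through the system resolver via `socket.gethostbyname`, so it tests whatever mDNS setup the node has rather than talking to avahi directly.

```python
import socket
from typing import Callable, Dict, List, Optional

# Hypothetical hostnames, assumed from the turing1.local, turing2.local, ... naming above.
HOSTS = [f"turing{i}.local" for i in range(1, 5)]


def check_resolution(
    hosts: List[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each hostname to its resolved IPv4 address, or None if lookup fails."""
    results: Dict[str, Optional[str]] = {}
    for host in hosts:
        try:
            results[host] = resolve(host)
        except OSError:
            # socket.gaierror and socket.herror both subclass OSError; an
            # unresolvable .local name (e.g. no mDNS running) lands here.
            results[host] = None
    return results


if __name__ == "__main__":
    for host, addr in check_resolution(HOSTS).items():
        print(f"{host:16} {addr or 'UNRESOLVED'}")
```

The resolver is passed in as a parameter so the check can be exercised without a live network.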

geerlingguy · 2024-02-23T16:20:34Z

Benchmark is running now. First test, with 4/8 Ps and Qs: 228.94 Gflops.

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :  100009
NB     :     256
PMAP   : Row-major process mapping
P      :       4
Q      :       8
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4      100009   256     4     8            2912.76             2.2894e+02
HPL_pdgesv() start time Fri Feb 23 16:15:13 2024

HPL_pdgesv() end time   Fri Feb 23 17:03:46 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.41638360e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
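The Gflops figure in the output follows directly from N and the wall time. As I understand HPL's test driver, each run is credited with a nominal (2/3)N^3 + (3/2)N^2 floating-point operations, so a minimal sketch of that arithmetic (the `hpl_gflops` name is hypothetical) reproduces the 228.94 Gflops above:

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Gflops for an order-n HPL solve that took `seconds` of wall time.

    Uses HPL's nominal operation count of (2/3)*n^3 + (3/2)*n^2.
    """
    flops = (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2
    return flops / seconds / 1e9


if __name__ == "__main__":
    # N and Time taken from the WR11C2R4 result line above.
    n, seconds = 100009, 2912.76
    print(f"{hpl_gflops(n, seconds):.2f} Gflops")
```

Plugging in the N and Time from the later runs in this thread (50004 / 1393.69 s and 100009 / 2969.05 s) gives their reported 59.81 and 224.60 Gflops as well. Because Time is printed rounded to two decimals, a recomputed value can differ from HPL's in the last digit.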

geerlingguy · 2024-02-23T18:58:19Z

Testing with one node (P/Q of 1/4) while measuring power consumption: 59.810 Gflops at 18.1 W, for 3.30 Gflops/W.

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   50004
NB     :     256
PMAP   : Row-major process mapping
P      :       1
Q      :       4
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       50004   256     1     4            1393.69             5.9810e+01
HPL_pdgesv() start time Fri Feb 23 18:28:10 2024

HPL_pdgesv() end time   Fri Feb 23 18:51:24 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   3.65051815e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

geerlingguy · 2024-02-23T21:24:40Z

Full cluster again, with Ps/Qs of 4/8: 224.60 Gflops at 73 W, for 3.08 Gflops/W.

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :  100009
NB     :     256
PMAP   : Row-major process mapping
P      :       4
Q      :       8
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4      100009   256     4     8            2969.05             2.2460e+02
HPL_pdgesv() start time Fri Feb 23 19:03:56 2024

HPL_pdgesv() end time   Fri Feb 23 19:53:25 2024

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.41638360e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

geerlingguy · 2024-02-23T22:44:35Z

With 1/32 Ps/Qs, the result is 171.50 Gflops at 68 W average, so 2.52 Gflops/W.
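The efficiency numbers in these comments are just sustained Gflops divided by average power draw. A short sketch (the `gflops_per_watt` helper is hypothetical; the data are the measurements reported in this thread) tabulates the three power-measured runs side by side:

```python
# (label, Gflops, average watts) as reported in this thread.
MEASUREMENTS = [
    ("1 node, P/Q 1/4", 59.810, 18.1),
    ("4 nodes, P/Q 4/8", 224.60, 73.0),
    ("4 nodes, P/Q 1/32", 171.50, 68.0),
]


def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency: sustained Gflops divided by average power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


if __name__ == "__main__":
    for label, gflops, watts in MEASUREMENTS:
        print(f"{label:20} {gflops_per_watt(gflops, watts):.2f} Gflops/W")
```

Rounded to two decimals this gives 3.30, 3.08, and 2.52 Gflops/W, matching the figures above.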

I think 4/8 is the most efficient layout on this cluster: of the two layouts I tested, it gave both the higher Gflops and the higher Gflops/W.
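For a 32-process run there are only a handful of P/Q layouts to choose from. HPL's tuning guidance generally favors P ≤ Q with a grid close to square, which is consistent with 4/8 beating 1/32 here. The sketch below (the `process_grids` helper is hypothetical) enumerates the candidates in that order. The ordering is a heuristic, not a measurement; 2/16 was not tested in this thread.

```python
from typing import List, Tuple


def process_grids(total: int) -> List[Tuple[int, int]]:
    """All (P, Q) pairs with P * Q == total and P <= Q, most square first."""
    grids = [
        (p, total // p)
        for p in range(1, total + 1)
        if total % p == 0 and p <= total // p
    ]
    # Sort by the aspect ratio Q/P, so near-square grids come first.
    return sorted(grids, key=lambda pq: pq[1] / pq[0])


if __name__ == "__main__":
    total_processes = 32  # P * Q used by the cluster runs above (4x8 and 1x32)
    for p, q in process_grids(total_processes):
        print(f"P={p:2} Q={q:2}")
```

For 32 processes this lists 4/8, then 2/16, then 1/32.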

geerlingguy mentioned this issue Feb 23, 2024

Turing RK1 - 32GB geerlingguy/sbc-reviews#38 (Open)

geerlingguy added a commit that referenced this issue Feb 23, 2024

Issue #27: Add Turing Pi 2 cluster with 4x RK1 32GB Nodes.

bd3cea5

geerlingguy closed this as completed Feb 23, 2024

geerlingguy reopened this Feb 23, 2024

geerlingguy closed this as completed Feb 23, 2024
