Omnibus PR - Oct 2022 (#678)
Details:
- This is an "omnibus" commit, consisting of multiple medium-sized
  commits that affect non-trivial aspects of BLIS. The major highlights:
  - Relocated the pba, sba pool (from the rntm_t), and mem_t (from the
    cntl_t) to the thrinfo_t object. This allows the rntm_t to be
    effectively const (although it is sometimes copied internally and
    modified to reflect different ways of parallelism). Moving the mem_t
    sets the stage for sharing a global control tree amongst all
    threads.
  - De-templatized the macrokernels for gemmt, trmm, and trsm to match
    the macrokernel for gemm, which has been de-templatized since
    54fa28b.
  - Reimplemented bli_l3_determine_kc() by separating out the logic for
    adjusting KC based on MR/NR for triangular A and/or B into a new
    function, bli_l3_adjust_kc(). For now, this function is still called
    from bli_l3_determine_kc(), but in the future we plan to have it
    called once when constructing the control tree.
  - Refactored the level-3 thread decorator into two parts:
    - One part deals only with launching threads, each one calling a 
      generic thread entry function. This code resides in frame/thread  
      and constitutes the definition of bli_thread_launch(). Note that 
      it is specific to the threading implementation (OpenMP, pthreads, 
      single, etc.).
    - The other part deals with passing the matrix operands and related
      information into bli_thread_launch(). This is the "l3 decorator" 
      and now resides in frame/3. It is agnostic to the threading
      implementation.
  - Modified the "level" of the thread control tree passed in at each
    operation. Previously, each operation (e.g. bli_gemm_blk_var1()) was
    passed a communicator representing the active thread teams that
    would share the available work. Now, the *parent* thread comm is
    passed in. The operation then grabs the child comm and uses it to
    partition the work. The difference shows up in bli_trsm_blk_var1(),
    where there are now two child nodes for this single operation (i.e.
    the thread control tree is split one level above where the control
    tree is). The sub-prenode is used for the trsm subproblem while the
    normal sub-node is used for the gemm part. Importantly, the parent
    comm is used for the barrier between them.
- Removed cntl_t* arguments from bli_*_front() functions. These will be
  added back in the future when the control tree's creation is moved so
  that it happens much sooner (provided that bli_*_front() have not been
  absorbed into their respective bli_*_ex() functions).
- Renamed various bli_thread_*() query functions to bli_thrinfo_*(),
  for consistency. This includes _num_threads(), _thread_id(), _n_way(), 
  _work_id(), _sba_pool(), _pba(), _mem(), _barrier(), _broadcast(), and
  _am_chief().
- Removed extraneous barrier from _blk_var3() of gemm and trsm.
- Fixed a typo in bli_type_defs.h where BLIS_BLAS_INT_TYPE_SIZE was
  misspelled.
devinamatthews authored Oct 27, 2022
1 parent c803b03 commit aeb5f0c
Showing 206 changed files with 5,013 additions and 11,035 deletions.
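
The two-part decorator split described in the commit message can be illustrated with a minimal sketch. Everything below is hypothetical: `demo_thread_launch`, `demo_entry_ft`, `demo_sum_params_t`, and `demo_sum_decorator` are invented for illustration and are not BLIS symbols, and the launcher is a serial stand-in for the place where an OpenMP or pthreads implementation would spawn workers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic thread entry signature (not a BLIS type). */
typedef void (*demo_entry_ft)( size_t tid, size_t n_threads, const void* params );

/* Part 1: the only code that knows how threads are launched.
   ASSUMPTION: this serial loop stands in for a real OpenMP/pthreads launch. */
static void demo_thread_launch( size_t n_threads, demo_entry_ft entry, const void* params )
{
    assert( n_threads > 0 );
    for ( size_t tid = 0; tid < n_threads; ++tid )
        entry( tid, n_threads, params );
}

/* Operand bundle handed to every thread by the decorator. */
typedef struct
{
    const double* x;       /* input vector */
    double*       partial; /* one slot per thread */
    size_t        n;       /* length of x */
} demo_sum_params_t;

/* Per-thread work: each thread sums the elements tid, tid+nt, tid+2*nt, ... */
static void demo_sum_entry( size_t tid, size_t n_threads, const void* params )
{
    const demo_sum_params_t* p = params;
    double acc = 0.0;
    for ( size_t i = tid; i < p->n; i += n_threads )
        acc += p->x[ i ];
    p->partial[ tid ] = acc;
}

/* Part 2: the operation-specific "decorator". It packages the operands and
   calls the launcher, without knowing which threading model is in use. */
static double demo_sum_decorator( const double* x, size_t n, size_t n_threads, double* scratch )
{
    demo_sum_params_t params = { x, scratch, n };
    demo_thread_launch( n_threads, demo_sum_entry, &params );

    /* Reduce the per-thread partial sums. */
    double total = 0.0;
    for ( size_t t = 0; t < n_threads; ++t )
        total += scratch[ t ];
    return total;
}
```

In this sketch, swapping the threading model only requires replacing the body of `demo_thread_launch()`; the decorator and the entry function stay unchanged, which is the separation the commit message describes.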
10 changes: 5 additions & 5 deletions addon/gemmd/attic/bao_gemmd_bp_var2.c
@@ -386,8 +386,8 @@ void PASTECH2(bao_,ch,varname) \
/* Query the number of threads and thread ids for the JR loop.
NOTE: These values are only needed when computing the next
micropanel of B. */ \
- const dim_t jr_nt = bli_thread_n_way( thread_jr ); \
- const dim_t jr_tid = bli_thread_work_id( thread_jr ); \
+ const dim_t jr_nt = bli_thrinfo_n_way( thread_jr ); \
+ const dim_t jr_tid = bli_thrinfo_work_id( thread_jr ); \
\
/* Compute number of primary and leftover components of the JR loop. */ \
dim_t jr_iter = ( nc_cur + NR - 1 ) / NR; \
@@ -416,8 +416,8 @@ void PASTECH2(bao_,ch,varname) \
/* Query the number of threads and thread ids for the IR loop.
NOTE: These values are only needed when computing the next
micropanel of A. */ \
- const dim_t ir_nt = bli_thread_n_way( thread_ir ); \
- const dim_t ir_tid = bli_thread_work_id( thread_ir ); \
+ const dim_t ir_nt = bli_thrinfo_n_way( thread_ir ); \
+ const dim_t ir_tid = bli_thrinfo_work_id( thread_ir ); \
\
/* Compute number of primary and leftover components of the IR loop. */ \
dim_t ir_iter = ( mc_cur + MR - 1 ) / MR; \
@@ -476,7 +476,7 @@ void PASTECH2(bao_,ch,varname) \
/* This barrier is needed to prevent threads from starting to pack
the next row panel of B before the current row panel is fully
computed upon. */ \
- bli_thread_barrier( thread_pb ); \
+ bli_thrinfo_barrier( thread_pb ); \
} \
} \
\
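
The JR-loop bookkeeping in the hunks above (a ceiling division by NR, plus an n_way/work_id pair per thread) can be sketched in isolation. This is a minimal illustration under stated assumptions: the `demo_*` names are hypothetical, the leftover is assumed to be `n % nr`, and the round-robin assignment of iterations to work ids is only one possible scheme; the diff itself shows just the ceiling division and the two queries.

```c
#include <assert.h>
#include <stddef.h>

/* Number of micropanels needed to cover n columns with panels nr wide.
   Same ceiling-division form as ( nc_cur + NR - 1 ) / NR in the diff. */
static size_t demo_num_iters( size_t n, size_t nr )
{
    assert( nr > 0 );
    return ( n + nr - 1 ) / nr;
}

/* Width of micropanel j: nr for full panels, the leftover for a partial
   final panel. ASSUMPTION: the leftover is computed as n % nr. */
static size_t demo_panel_width( size_t n, size_t nr, size_t j )
{
    const size_t n_iter = demo_num_iters( n, nr );
    const size_t n_left = n % nr;
    assert( j < n_iter );
    return ( j == n_iter - 1 && n_left != 0 ) ? n_left : nr;
}

/* Count the iterations handled by work id `tid` out of `nt` ways.
   ASSUMPTION: iterations are dealt out round-robin (tid, tid+nt, ...). */
static size_t demo_iters_for_thread( size_t n_iter, size_t nt, size_t tid )
{
    assert( nt > 0 && tid < nt );
    size_t count = 0;
    for ( size_t j = tid; j < n_iter; j += nt )
        ++count;
    return count;
}
```

For example, 10 columns with `nr = 4` give three micropanels of widths 4, 4, and 2; with two ways of parallelism, work id 0 takes panels 0 and 2 and work id 1 takes panel 1.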
10 changes: 5 additions & 5 deletions addon/gemmd/bao_gemmd_bp_var1.c
@@ -370,8 +370,8 @@ void PASTECH2(bao_,ch,varname) \
/* Query the number of threads and thread ids for the JR loop.
NOTE: These values are only needed when computing the next
micropanel of B. */ \
- const dim_t jr_nt = bli_thread_n_way( thread_jr ); \
- const dim_t jr_tid = bli_thread_work_id( thread_jr ); \
+ const dim_t jr_nt = bli_thrinfo_n_way( thread_jr ); \
+ const dim_t jr_tid = bli_thrinfo_work_id( thread_jr ); \
\
/* Compute number of primary and leftover components of the JR loop. */ \
dim_t jr_iter = ( nc_cur + NR - 1 ) / NR; \
@@ -400,8 +400,8 @@ void PASTECH2(bao_,ch,varname) \
/* Query the number of threads and thread ids for the IR loop.
NOTE: These values are only needed when computing the next
micropanel of A. */ \
- const dim_t ir_nt = bli_thread_n_way( thread_ir ); \
- const dim_t ir_tid = bli_thread_work_id( thread_ir ); \
+ const dim_t ir_nt = bli_thrinfo_n_way( thread_ir ); \
+ const dim_t ir_tid = bli_thrinfo_work_id( thread_ir ); \
\
/* Compute number of primary and leftover components of the IR loop. */ \
dim_t ir_iter = ( mc_cur + MR - 1 ) / MR; \
@@ -458,7 +458,7 @@ void PASTECH2(bao_,ch,varname) \
/* This barrier is needed to prevent threads from starting to pack
the next row panel of B before the current row panel is fully
computed upon. */ \
- bli_thread_barrier( rntm, thread_pb ); \
+ bli_thrinfo_barrier( thread_pb ); \
} \
} \
\
10 changes: 5 additions & 5 deletions addon/gemmd/bao_l3_packm_a.c
@@ -61,7 +61,7 @@ void PASTECH2(bao_,ch,opname) \
\
/* Barrier to make sure all threads are caught up and ready to begin the
packm stage. */ \
- bli_thread_barrier( rntm, thread ); \
+ bli_thrinfo_barrier( thread ); \
\
/* Compute the size of the memory block needed. */ \
siz_t size_needed = sizeof( ctype ) * m_pack * k_pack; \
@@ -90,7 +90,7 @@ void PASTECH2(bao_,ch,opname) \
\
/* Broadcast the address of the chief thread's passed-in mem_t to all
threads. */ \
- mem_t* mem_p = bli_thread_broadcast( rntm, thread, mem ); \
+ mem_t* mem_p = bli_thrinfo_broadcast( thread, mem ); \
\
/* Non-chief threads: Copy the contents of the chief thread's
passed-in mem_t to the passed-in mem_t for this thread. (The
@@ -139,7 +139,7 @@ void PASTECH2(bao_,ch,opname) \
\
/* Broadcast the address of the chief thread's passed-in mem_t
to all threads. */ \
- mem_t* mem_p = bli_thread_broadcast( rntm, thread, mem ); \
+ mem_t* mem_p = bli_thrinfo_broadcast( thread, mem ); \
\
/* Non-chief threads: Copy the contents of the chief thread's
passed-in mem_t to the passed-in mem_t for this thread. (The
@@ -313,13 +313,13 @@ void PASTECH2(bao_,ch,opname) \
d, incd, \
a, rs_a, cs_a, \
*p, *rs_p, *cs_p, \
- pd_p, *ps_p, \
+ pd_p, *ps_p, \
cntx, \
thread \
); \
\
/* Barrier so that packing is done before computation. */ \
- bli_thread_barrier( rntm, thread ); \
+ bli_thrinfo_barrier( thread ); \
}

//INSERT_GENTFUNC_BASIC0( packm_a )
10 changes: 5 additions & 5 deletions addon/gemmd/bao_l3_packm_b.c
@@ -61,7 +61,7 @@ void PASTECH2(bao_,ch,opname) \
\
/* Barrier to make sure all threads are caught up and ready to begin the
packm stage. */ \
- bli_thread_barrier( rntm, thread ); \
+ bli_thrinfo_barrier( thread ); \
\
/* Compute the size of the memory block needed. */ \
siz_t size_needed = sizeof( ctype ) * k_pack * n_pack; \
@@ -90,7 +90,7 @@ void PASTECH2(bao_,ch,opname) \
\
/* Broadcast the address of the chief thread's passed-in mem_t to all
threads. */ \
- mem_t* mem_p = bli_thread_broadcast( rntm, thread, mem ); \
+ mem_t* mem_p = bli_thrinfo_broadcast( thread, mem ); \
\
/* Non-chief threads: Copy the contents of the chief thread's
passed-in mem_t to the passed-in mem_t for this thread. (The
@@ -139,7 +139,7 @@ void PASTECH2(bao_,ch,opname) \
\
/* Broadcast the address of the chief thread's passed-in mem_t
to all threads. */ \
- mem_t* mem_p = bli_thread_broadcast( rntm, thread, mem ); \
+ mem_t* mem_p = bli_thrinfo_broadcast( thread, mem ); \
\
/* Non-chief threads: Copy the contents of the chief thread's
passed-in mem_t to the passed-in mem_t for this thread. (The
@@ -313,13 +313,13 @@ void PASTECH2(bao_,ch,opname) \
d, incd, \
b, rs_b, cs_b, \
*p, *rs_p, *cs_p, \
- pd_p, *ps_p, \
+ pd_p, *ps_p, \
cntx, \
thread \
); \
\
/* Barrier so that packing is done before computation. */ \
- bli_thread_barrier( rntm, thread ); \
+ bli_thrinfo_barrier( thread ); \
}

//INSERT_GENTFUNC_BASIC0( packm_b )
4 changes: 2 additions & 2 deletions addon/gemmd/bao_l3_packm_var1.c
@@ -127,8 +127,8 @@ void PASTECH2(bao_,ch,varname) \
\
/* Query the number of threads and thread ids from the current thread's
packm thrinfo_t node. */ \
- const dim_t nt = bli_thread_n_way( thread ); \
- const dim_t tid = bli_thread_work_id( thread ); \
+ const dim_t nt = bli_thrinfo_n_way( thread ); \
+ const dim_t tid = bli_thrinfo_work_id( thread ); \
\
/* Suppress warnings in case tid isn't used (ie: as in slab partitioning). */ \
( void )nt; \
4 changes: 2 additions & 2 deletions addon/gemmd/bao_l3_packm_var2.c
@@ -127,8 +127,8 @@ void PASTECH2(bao_,ch,varname) \
\
/* Query the number of threads and thread ids from the current thread's
packm thrinfo_t node. */ \
- const dim_t nt = bli_thread_n_way( thread ); \
- const dim_t tid = bli_thread_work_id( thread ); \
+ const dim_t nt = bli_thrinfo_n_way( thread ); \
+ const dim_t tid = bli_thrinfo_work_id( thread ); \
\
/* Suppress warnings in case tid isn't used (ie: as in slab partitioning). */ \
( void )nt; \