Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip 1m optimization when forcing hemm_l/symm_l. #697

Merged
merged 1 commit into from
Dec 16, 2022

Conversation

fgvanzee
Copy link
Member

@fgvanzee fgvanzee commented Dec 16, 2022

Details:

  • Fixed a bug in right-sided hemm when:
    • using the 1m method,
    • #defining BLIS_DISABLE_HEMM_RIGHT in the active subconfiguration, and
    • the storage of C matches the gemm microkernel IO preference PRIOR to the right-sidedness being detected and recast in terms of the left-side code path.
  • It turns out that bli_gemm_ind_recast_1m_params() was applying its optimization (recasting a complex-domain macrokernel calling a 1m virtual microkernel to a real-domain macrokernel calling the real-domain microkernel) in situations in which it should not have. The optimization was silently assuming that the storage of C always matched that of the microkernel preference, since the front-end would have already had a chance to transpose the operation to bring the two into agreement. However, by disabling right-sided hemm, we deprive BLIS of that flexibility, and thus suddenly the assumption was no longer holding in all cases. Thanks to Nisanth M P (@nisanthmp) for reporting this bug in Issue Tests fail on POWER machines #621.
  • The original bug, and this bugfix, also extend to symm when BLIS_DISABLE_SYMM_RIGHT is defined.
  • Comment updates.
  • CREDITS file update.

Details:
- Fixed a bug in right-sided hemm when:
  - using the 1m method
  - #defining BLIS_DISABLE_HEMM_RIGHT in the active subconfiguration
  - the storage of C matches the gemm microkernel IO preference PRIOR to
    the right-sidedness being detected and recast in terms of the left
    side code path.
  It turns out that bli_gemm_ind_recast_1m_params() was applying its
  optimization (recasting a complex-domain macrokernel calling a 1m
  virtual microkernel to a real-domain macrokernel calling the real-
  domain microkernel) in situations in which it should not have. The
  optimization was silently assuming that the storage of C always
  matched that of the microkernel preference, since the front-end
  would have already had a chance to transpose the operation to bring
  the two into agreement. However, by disabling right-sided hemm, we
  deprive BLIS of that flexiblity, and thus suddenly the assumption was
  no longer holding in all cases. Thanks to Nisanth M P for reporting
  this bug in Issue #621.
- The original bug, and this bugfix, also extend to symm when
  BLIS_DISABLE_SYMM_RIGHT is defined.
- Comment updates.
- CREDITS file update.
@fgvanzee fgvanzee merged commit 3accacf into master Dec 16, 2022
@fgvanzee fgvanzee deleted the 1m_hemm_forceleft_fix branch December 16, 2022 16:26
fgvanzee added a commit that referenced this pull request May 20, 2024
Details:
- Fixed a bug in right-sided hemm when:
  - using the 1m method,
  - #defining BLIS_DISABLE_HEMM_RIGHT in the active subconfiguration,
    and
  - the storage of C matches the gemm microkernel IO preference PRIOR to
    the right-sidedness being detected and recast in terms of the left-
    side code path.
  It turns out that bli_gemm_ind_recast_1m_params() was applying its
  optimization (recasting a complex-domain macrokernel calling a 1m
  virtual microkernel to a real-domain macrokernel calling the real-
  domain microkernel) in situations in which it should not have. The
  optimization was silently assuming that the storage of C always
  matched that of the microkernel preference, since the front-end (in
  this case, bli_hemm_front()) would have already had a chance to
  transpose the operation to bring the two into agreement. However, by
  disabling right-sided hemm, we deprive BLIS of that flexibility (as a
  transposed left-sided case would necessarily have to become a right-
  sided case), and thus the assumption was no longer holding in all
  cases. Thanks to Nisanth M P for reporting this bug in Issue #621.
- The aforementioned bug, and its bugfix, also apply to symm when
  BLIS_DISABLE_SYMM_RIGHT is defined.
- Comment updates.
- CREDITS file update.
- (cherry picked from commit 3accacf)

Fixed perf of mt sup with packing, and mt gemmlike. (#696)

Details:
- Brought the gemmsup code path up to date relative to the latest
  thrinfo_t semantics introduced in the October Omnibus commit
  (aeb5f0c). This was done by passing the prenode (instead of the
  current node) into the packm variant within bli_l3_sup_packm.c as well
  as creating the prenodes and attaching them to the thrinfo_t tree in
  bli_l3_sup_thrinfo_create(). These changes erase the performance
  degradation introduced in the omnibus when running multithreaded sup
  with optional packing enabled. Special thanks to Devin Matthews for
  sussing out this fix in short order.
- Fixed the gemmlike sandbox in a manner similar to that of sup with
  packing, described above. This also involved passing the prenode into
  the local gemmlike packm variant. (Recall that gemmlike recycles the
  use of bli_l3_sup_thrinfo_create(), so it automatically inherits that
  part of the sup fix described above.)
- Updated bls_l3_packm_var[123].c to use bli_thrinfo_n_way() and
  bli_thrinfo_work_id() instead of bli_thrinfo_num_threads() and
  bli_thrinfo_thread_id(), respectively.
- (cherry picked from 4833ba2)

Fixed _gemm_small() prototype; disabled gemm_small.

Details:
- Fixed a mismatch between the prototype for bli_gemm_small() in
  bli_gemm_front.h and the actual definition of bli_gemm_small() in
  kernels/zen/3/bli_gemm_small.c. The former was erroneously declaring
  the cntl_t* argument as 'const'. Thanks to Jeff Diamond for reporting
  this issue.
- Commented out BLIS_ENABLE_SMALL_MATRIX, BLIS_ENABLE_SMALL_MATRIX_TRSM
  macro definitions in config/zen3/bli_family_zen3.h. AMD's small matrix
  implementation should probably remain disabled in vanilla BLIS, at
  least for now.
- (cherry picked from db10dd8)

Trival whitespace/comment tweaks.

Details:
- Trivial whitespace and comment changes, most of which ideally would
  have been part of the previous commit pertaining to HPX (2b05948).
- (cherry picked from f0337b7)

blis support for hpx (#682)

- Implement threading backend via HPX.
- HPX is an asynchronous many task runtime system used in high
  performance computing applications. The runtime implements the
  ISO C++ parallelism specification and provides a user-space
  thread implementation.
- This PR provides BLIS a thread backend implementation using HPX
  and resolves feature request #681. The configuration script,
  makefiles, and testsuite have been updated to support an HPX
  build option. The addition of HPX support provides other
  developers an exemplar for integrating other C++ threading
  backends into BLIS.
- (cherry picked from 2b05948)

Fixed subtle barrier_fpa bug in bli_thrcomm.c. (#690)

Details:
- In bli_thrcommo.c, correctly initialize the BLIS_OPENMP element of the
  barrier function pointer array (barrier_fpa) to NULL when
  BLIS_ENABLE_OPENMP is *not* defined. Similarly, initialize the
  BLIS_POSIX element of barrier_fpa to NULL when BLIS_ENABLE_PTHREADS is
  not enabled. This bug was introduced in a1a5a9b and was likely the
  result of an incomplete edit. The effects of the bug would have
  likely manifested when querying a thrcomm_t that was initialized with
  a timpl_t value corresponding to a threading implementation that was
  omitted from the -t option at configure-time.
- (cherry picked from e1ea25d)

Enhance emacs formatting of C files to remove trailing whitespace and ensure
 a newline at the end of file
- (cherry picked from dc6e5f3)

Delete mpi_test garbage. (#689)

Details:
- tlrmchlsmth: "What even is this? No comments, no commit message, not
  used by anything. Trash."
- (cherry picked from 713d078)

Some decluttering of the top-level directory.

Details:
- Relocated 'mpi_test' directory to test/mpi_test.
- Relocated 'so_version' and 'version' files from top-level directory to
  'build' directory.
- Updated build/bump-version.sh script to accommodate relocation of
  'version' file to 'build' directory.
- Updated configure script to accommodate relocation of 'so_version'
  file to 'build' directory.
- Updated INSTALL file to replace pointers to blis-devel mailing list
  with a pointer to docs/Discord.md.
- Updated RELEASING file to contain a reminder to consider whether the
  so_version file should be updated prior to the release.
- (cherry picked from 8d813f7)

Fix typo in configure --help text. (#686)

Details:
- Fixed a misspelling in the --help description for the --int-size (-i)
  configure option.
- (cherry picked from 6774bf0)

Support --nosup, --sup configure options. (#684)

Details:
- Added --nosup and --sup as alternative ways of requesting that sup be
  disabled or enabled. These are analagous to --disable-sup-handling and
  --enable-sup-handling, respectively. (I got tired of typing out
  --disable-sup-handling and needed a shorthand notation.)
- Tweaked message output by configure when sup is enable/disabled for
  clarity and specificity.
- Whitespace changes.
- (cherry picked from edcc2f9)

Add mention of Wilkinson Prize to README.md. (#683)

Details:
- Added blurbs and links to Wilkinson Prize to README.md.
- Added mention of both Best Paper and Wilkinson Prizes to the top of
  README.md.
- Other minor tweaks.
- (cherry picked from 5eea6ad)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant