Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multiple axes #31

Open
thomasahle opened this issue Dec 5, 2021 · 1 comment
Open

Support multiple axes #31

thomasahle opened this issue Dec 5, 2021 · 1 comment

Comments

@thomasahle
Copy link

I want to reduce Hadamard Transform every column in my matrix.
Is there currently a way to do this with FFHT without using a python outer loop?

@W-Wuxian
Copy link

W-Wuxian commented Jan 21, 2022

Hi, I am also interested by a blocked FFHT. For instance, I am using a dumb FFHT with a rotation kernel based on the cblas?rotm_ (code below) witch allow me to compute 1-D Hadamard transform of a block vector W (W is m x bs). But in fact cblas?rotm_ is considering data as vector of length 2, not of length bs with leads to poor performance.

void _preAlps_ROTATEData_MAT_d(double* data, int nIn, int ncols, unsigned c1, unsigned c2,
 Timing_FWHT_t* TimerFWHT){
  TimerFWHT->Level3TrashTime = MPI_Wtime();
  // param := flag, h11, h21, h12, h22
  static const double rotatedata_mat_param[] = {-1.0, 1.0, 1.0, 1.0, -1.0};
  // Alias
  double* x = &data[c1];
  double* y = &data[c2];
  TimerFWHT->PrepareROTATEDataTime += MPI_Wtime() - TimerFWHT->Level3TrashTime;
  TimerFWHT->Level3TrashTime = MPI_Wtime();
  // Performs modified Givens rotation of points in the plane
  cblas_drotm(ncols, x, nIn, y, nIn, rotatedata_mat_param);
  TimerFWHT->ROTATEDataTime += MPI_Wtime() - TimerFWHT->Level3TrashTime;
}
int preAlps_FWHT_MAT_d( double* data, unsigned n, int ncols,
 Timing_FWHT_t* TimerFWHT) {
  TimerFWHT->FWHTSample += 1;
  TimerFWHT->Level1TrashTime = MPI_Wtime();
  int ierr = 0;
  // Require localsize to be a power of 2
  unsigned localsize = n;
  unsigned l2 = preAlps_ILOG2(localsize) - 1;
  unsigned l3 = 1 << l2;
  if (localsize != l3) {
    CPLM_Abort("Data row length should be a power of 2,"
	       " localsize: %u ; l3: %u", localsize, l3);
  }
  // Compute the FWHT
  unsigned tmp;
  for (unsigned i = 0; i<l2; ++i) {
    l3 = (unsigned)(1 << i);
    for (unsigned j = 0; j < localsize; j += (1 << (i+1))) {
      for (unsigned k = 0; k < l3; ++k) {
        tmp = j + k;
        TimerFWHT->Level2TrashTime = MPI_Wtime();
        _preAlps_ROTATEData_MAT_d(data, localsize, ncols, tmp, tmp+l3, TimerFWHT);
        TimerFWHT->FUNROTATEDataTime += MPI_Wtime() - TimerFWHT->Level2TrashTime;
      }
    }
  }
  TimerFWHT->FWHTTotalTime += MPI_Wtime() - TimerFWHT->Level1TrashTime;
  return ierr;
}

I did a benchmark using np=8 mpi process (computer is intel) to compute a 1-D Blocked Fast Walsh Hadamard transform of a Matrix W of size m x n. W is Column Major and distributed by rows on each mpi process. The global size is denoted by m=np*mp and the local size is denoted by mp=6.25E+5. I used m=5E+6 and n=256 and applied a zero-padding on mp. The block size is denoted by bs. I am computing the 1-D block Fast Walsh Hadamard transform (almost like) as the following:

int remainder = n%bs;
int coef = (remainder==0) ? n/bs : ((n - remainder) / bs) + 1;
int curbs = 0; /**< current block size */
for (int j = 0; j < coef; ++j){
  curbs = (remainder==0) ? bs : ((j<coef-1) ? bs : remainder);
  preAlps_FWHT_MAT_d( *(W+j*mp*bs), mp, curbs); 
}

who gave me the following plot:
TpsBSRHT_d.pdf
We should only focus on blue bars (FWHT) and the bs facet parameters. From the plot TpsBSRHT_d.pdf, we can see that the computation time decreases for intermediate block size (bs), probably due to the cache effect. But from my point of view the computation time is still terrible!?

So my point is could you update the FFHT lib in order to allow yours users to used blocked FFHT/FWHT? Or help me to optimize my code? From your home page, I am targetting bs*0.50 sec for m=2^27 as computation time. (TY)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants