Vectorization of Fortran loops and other optimizations #227

AlexKurek · 2024-06-09T15:54:42Z

Just a question / suggestion: this are the results of profiling:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      571 7110.513   12.453 7110.513   12.453 {built-in method posix.waitpid}
        4 1365.157  341.289 2005.113  501.278 wavelet_atrous.py:667(check_islands_for_overlap)
        1  830.799  830.799  830.799  830.799 psf_vary.py:685(tess_simple)
     3407  381.475    0.112  381.497    0.112 necompiler.py:978(re_evaluate)
     1544  312.586    0.202  312.586    0.202 {method 'partition' of 'numpy.ndarray' objects}
       20  305.964   15.298  305.964   15.298 {built-in method scipy.ndimage._nd_image.geometric_transform}
       44  288.652    6.560  288.652    6.560 {built-in method gc.collect}
        5  270.768   54.154 1012.391  202.478 functions.py:2223(bstat)
      214  168.266    0.786  190.097    0.888 wavelet_atrous.py:649(renumber_islands)
   314391  137.325    0.000  225.320    0.001 _methods.py:196(_var)
  5110219  134.055    0.000  134.055    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        6  117.511   19.585  251.401   41.900 make_residimage.py:25(__call__)
    27523  106.675    0.004  106.675    0.004 {method 'flatten' of 'numpy.ndarray' objects}
10056453/7133976   97.635    0.000  817.598    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
    31858   55.493    0.002   55.493    0.002 {built-in method posix.read}

Have you tried to vectorize loops there:
https://github.com/lofar-astron/PyBDSF/blob/master/src/fortran/pytess_simple.f#L16C1-L29C15
?
ChatGPT seems to be doing this:

program vectorized_code
  implicit none
  integer, parameter :: m = 100, n = 100, ngens = 10
  integer :: i, j, k
  real(8), dimension(n, m) :: volrank
  real(8), dimension(ngens) :: xgens, ygens, wts
  real(8), dimension(n, m, ngens) :: dist
  real(8) :: dumr
  integer, dimension(n, m) :: minind

  ! Initialize generators and weights (example values)
  do k = 1, ngens
     xgens(k) = k
     ygens(k) = k
     wts(k) = 1.0d0
  end do

  ! Vectorized computation
  volrank = 0.d0
  do k = 1, ngens
     ! Calculate distance from generator k to all points (i, j)
     dist(:, :, k) = sqrt((reshape(([(i, i = 1, n)], [n, m]) - xgens(k)), [n, m]) ** 2 + &
                          (reshape(([(j, j = 1, m)], [m, n]) - ygens(k)), [m, n]) ** 2) / wts(k)
  end do

  ! Find the minimum distance and corresponding generator index
  do j = 1, m
     do i = 1, n
        dumr = minval(dist(i, j, :))
        minind(i, j) = minloc(dist(i, j, :), dim = 1)
     end do
  end do
end program vectorized_code

but Im bad at Fortran so I cannot verify if this is OK.

The text was updated successfully, but these errors were encountered:

gmloose · 2024-06-10T13:51:49Z

Short answer: no. My Fortran knowledge is rusty and certainly not up to par with modern standards.
However, before trying to optimize code manually, have you tried to let the compiler do the optimizations for you. In other words, how did you compile the code that you profiled?

AlexKurek · 2024-06-11T04:25:07Z

I just installed PyBDSF using pip install git+https://github.com/lofar-astron/PyBDSF.git. First I used cProfile to find slowest functions, then line_profiler to profile within those functions, which looks like this:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   632                                           @line_profiler.profile                                               
   633                                           def copy_gaussians(img, isl1, isl2):                                 
   634                                               """Copies Gaussians from isl2 to isl1                            
   635                                                                                                                
   636                                               img.gaussians is also updated                                    
   637                                               """                                                              
   638        31        148.5      4.8      0.3      if img.ngaus == 0:                                               
   639                                                   gaus_id = -1                                                 
   640                                               else:                                                            
   641        31        105.1      3.4      0.2          gaus_id = img.gaussians[-1].gaus_num                         
   642        65         28.1      0.4      0.1      for g in isl2.gaul:                                              
   643        34         11.6      0.3      0.0          gaus_id += 1                                                 
   644        34      45805.1   1347.2     98.3          gcp = Gaussian(img, g.parameters[:], isl1.island_id, gaus_id)
   645        34          9.6      0.3      0.0          gcp.gaus_num = gaus_id                                       
   646        34         12.3      0.4      0.0          gcp.jlevel = g.jlevel                                        
   647        34         14.8      0.4      0.0          isl1.gaul.append(gcp)                                        
   648        34        378.7     11.1      0.8          img.ngaus += 1                                               
   649        34        100.5      3.0      0.2          img.gaussians.append(gcp)

Do you think the compiler is capable of doing the vectorizations that well?

gmloose · 2024-06-11T14:39:35Z

I'm confused. AFAICS Gaussian is a class written in pure Python. I don't see a link with the Fortran snippet you referred to earlier. It may very well be that you found a hot-spot in copy_gaussians() that can be improved, but that's hard for me to tell by quickly glancing over the code (which I didn't write BTW 😄). It's also not completely clear to me what exactly you're profiling.

AlexKurek · 2024-06-11T14:53:52Z

I'm confused. AFAICS Gaussian is a class written in pure Python. I don't see a link with the Fortran snippet you referred to earlier.

No, it is not, that is a different bottleneck.

It's also not completely clear to me what exactly you're profiling.

I profiled an entire run on a small FITS image.

It may very well be that you found a hot-spot in copy_gaussians() that can be improved.

FYI: this is how this class looks when profiled:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   936                                               @line_profiler.profile                                                                                                                
   937                                               def __init__(self, img, gaussian, isl_idx, g_idx, flg=0):                                                                             
   938                                                   """Initialize Gaussian object from fitting data                                                                                   
   939                                                                                                                                                                                     
   940                                                   Parameters:                                                                                                                       
   941                                                   img: PyBDSM image object                                                                                                          
   942                                                   gaussian: 6-tuple of fitted numbers                                                                                               
   943                                                   isl_idx: island serial number                                                                                                     
   944                                                   g_idx: gaussian serial number                                                                                                     
   945                                                   flg: flagging (if any)                                                                                                            
   946                                                   """                                                                                                                               
   947       458       1109.3      2.4      0.2          from . import functions as func                                                                                                   
   948       458       1124.8      2.5      0.2          from .const import fwsig                                                                                                          
   949       458        222.0      0.5      0.0          import numpy as N                                                                                                                 
   950                                                                                                                                                                                     
   951                                                   # Add attribute definitions needed for output                                                                                     
   952       458       3947.7      8.6      0.7          self.source_id_def           = Int(doc="Source index", colname='Source_id')                                                       
   953       458       2647.0      5.8      0.5          self.code_def                = String(doc='Source code S, C, or M', colname='S_Code')                                             
   954       458       2198.5      4.8      0.4          self.gaus_num_def = Int(doc="Serial number of the gaussian for the image", colname='Gaus_id')                                     
   955       458       2284.2      5.0      0.4          self.island_id_def   = Int(doc="Serial number of the island", colname='Isl_id')                                                   
   956       458       2197.6      4.8      0.4          self.flag_def        = Int(doc="Flag associated with gaussian", colname='Flag')                                                   
   957       458       2487.5      5.4      0.4          self.total_flux_def  = Float(doc="Total flux density, Jy", colname='Total_flux', units='Jy')                                      
   958       916       2242.1      2.4      0.4          self.total_fluxE_def = Float(doc="Total flux density error, Jy", colname='E_Total_flux',                                          
   959       458         68.7      0.2      0.0                              units='Jy')                                                                                                   
   960       916       2236.5      2.4      0.4          self.peak_flux_def   = Float(doc="Peak flux density/beam, Jy/beam", colname='Peak_flux',                                          
   961       458         67.9      0.1      0.0                              units='Jy/beam')                                                                                              
   962       916       2189.3      2.4      0.4          self.peak_fluxE_def  = Float(doc="Peak flux density/beam error, Jy/beam",                                                         
   963       458         76.9      0.2      0.0                              colname='E_Peak_flux', units='Jy/beam')                                                                       
   964       916       6217.4      6.8      1.1          self.centre_sky_def  = List(Float(), doc="Sky coordinates of gaussian centre",                                                    
   965       458        150.0      0.3      0.0                             colname=['RA', 'DEC'], units=['deg', 'deg'])                                                                   
   966       916       5131.7      5.6      0.9          self.centre_skyE_def = List(Float(), doc="Error on sky coordinates of gaussian centre",                                           
   967       458        127.0      0.3      0.0                             colname=['E_RA', 'E_DEC'], units=['deg', 'deg'])                                                               
   968       916       5134.0      5.6      0.9          self.centre_pix_def  = List(Float(), doc="Pixel coordinates of gaussian centre",                                                  
   969       458        141.0      0.3      0.0                             colname=['Xposn', 'Yposn'], units=['pix', 'pix'])                                                              
   970       916       5030.1      5.5      0.9          self.centre_pixE_def = List(Float(), doc="Error on pixel coordinates of gaussian centre",                                         
   971       458        133.0      0.3      0.0                             colname=['E_Xposn', 'E_Yposn'], units=['pix', 'pix'])                                                          
   972       916       4949.9      5.4      0.9          self.size_sky_def   = List(Float(), doc="Shape of the gaussian FWHM, PA, deg",                                                    
   973       458        165.2      0.4      0.0                            colname=['Maj', 'Min', 'PA'], units=['deg', 'deg',                                                              
   974                                                                     'deg'])                                                                                                         
   975       916       4918.2      5.4      0.9          self.size_skyE_def  = List(Float(), doc="Error on shape of the gaussian FWHM, PA, deg",                                           
   976       458        139.7      0.3      0.0                            colname=['E_Maj', 'E_Min', 'E_PA'], units=['deg', 'deg',                                                        
   977                                                                     'deg'])                                                                                                         
   978       916       4966.7      5.4      0.9          self.deconv_size_sky_def = List(Float(), doc="Deconvolved shape of the gaussian FWHM, PA, deg",                                   
   979       458        127.1      0.3      0.0                            colname=['DC_Maj', 'DC_Min', 'DC_PA'], units=['deg', 'deg',                                                     
   980                                                                     'deg'])                                                                                                         
   981       916       4907.4      5.4      0.9          self.deconv_size_skyE_def = List(Float(), doc="Error on deconvolved shape of the gaussian FWHM, PA, deg",                         
   982       458        123.9      0.3      0.0                            colname=['E_DC_Maj', 'E_DC_Min', 'E_DC_PA'], units=['deg', 'deg',                                               
   983                                                                     'deg'])                                                                                                         
   984       916       4853.6      5.3      0.9          self.size_sky_uncorr_def   = List(Float(), doc="Shape in image plane of the gaussian FWHM, PA, deg",                              
   985       458        121.3      0.3      0.0                            colname=['Maj_img_plane', 'Min_img_plane', 'PA_img_plane'], units=['deg', 'deg',                                
   986                                                                     'deg'])                                                                                                         
   987       916       4985.4      5.4      0.9          self.size_skyE_uncorr_def  = List(Float(), doc="Error on shape in image plane of the gaussian FWHM, PA, deg",                     
   988       458        125.0      0.3      0.0                            colname=['E_Maj_img_plane', 'E_Min_img_plane', 'E_PA_img_plane'], units=['deg', 'deg',                          
   989                                                                     'deg'])                                                                                                         
   990       916       5015.5      5.5      0.9          self.deconv_size_sky_uncorr_def = List(Float(), doc="Deconvolved shape in image plane of the gaussian FWHM, PA, deg",             
   991       458        118.7      0.3      0.0                            colname=['DC_Maj_img_plane', 'DC_Min_img_plane', 'DC_PA_img_plane'], units=['deg', 'deg',                       
   992                                                                     'deg'])                                                                                                         
   993       916       4936.5      5.4      0.9          self.deconv_size_skyE_uncorr_def = List(Float(), doc="Error on deconvolved shape in image plane of the gaussian FWHM, PA, deg",   
   994       458        128.2      0.3      0.0                            colname=['E_DC_Maj_img_plane', 'E_DC_Min_img_plane', 'E_DC_PA_img_plane'], units=['deg', 'deg',                 
   995                                                                     'deg'])                                                                                                         
   996       458       2137.0      4.7      0.4          self.rms_def        = Float(doc="Island rms Jy/beam", colname='Isl_rms', units='Jy/beam')                                         
   997       458       2105.7      4.6      0.4          self.mean_def       = Float(doc="Island mean Jy/beam", colname='Isl_mean', units='Jy/beam')                                       
   998       458       2714.3      5.9      0.5          self.total_flux_isl_def = Float(doc="Island total flux from sum of pixels", colname='Isl_Total_flux', units='Jy')                 
   999       458       2132.5      4.7      0.4          self.total_flux_islE_def = Float(doc="Error on island total flux from sum of pixels", colname='E_Isl_Total_flux', units='Jy')     
  1000       458       2091.6      4.6      0.4          self.gresid_rms_def = Float(doc="Island rms in Gaussian residual image", colname='Resid_Isl_rms', units='Jy/beam')                
  1001       458       2048.9      4.5      0.4          self.gresid_mean_def= Float(doc="Island mean in Gaussian residual image", colname='Resid_Isl_mean', units='Jy/beam')              
  1002       458       2093.8      4.6      0.4          self.sresid_rms_def = Float(doc="Island rms in Shapelet residual image", colname='Resid_Isl_rms', units='Jy/beam')                
  1003       458       2073.2      4.5      0.4          self.sresid_mean_def= Float(doc="Island mean in Shapelet residual image", colname='Resid_Isl_mean', units='Jy/beam')              
  1004       458       2193.4      4.8      0.4          self.jlevel_def     = Int(doc="Wavelet number to which Gaussian belongs", colname='Wave_id')                                      
  1005       458       2146.3      4.7      0.4          self.spec_indx_def = Float(doc = "Spectral index", colname='Spec_Indx', units=None)                                               
  1006       458       2066.4      4.5      0.4          self.e_spec_indx_def = Float(doc = "Error in spectral index", colname='E_Spec_Indx', units=None)                                  
  1007       458       5359.7     11.7      0.9          self.specin_flux_def = List(Float(), doc = "Total flux density per channel, Jy", colname=['Total_flux'], units=['Jy'])            
  1008       458       5074.1     11.1      0.9          self.specin_fluxE_def = List(Float(), doc = "Error in total flux density per channel, Jy", colname=['E_Total_flux'], units=['Jy'])
  1009       458       5005.9     10.9      0.9          self.specin_freq_def = List(Float(), doc = "Frequency per channel, Hz", colname=['Freq'], units=['Hz'])                           
  1010                                                                                                                                                                                     
  1011       458        102.4      0.2      0.0          use_wcs = True                                                                                                                    
  1012       458        112.7      0.2      0.0          self.gaussian_idx = g_idx                                                                                                         
  1013       458        125.4      0.3      0.0          self.gaus_num = 0 # stored later                                                                                                  
  1014       458         96.2      0.2      0.0          self.island_id = isl_idx                                                                                                          
  1015       458       1621.6      3.5      0.3          self.jlevel = img.j                                                                                                               
  1016       458         95.8      0.2      0.0          self.flag = flg                                                                                                                   
  1017       458        119.2      0.3      0.0          self.parameters = gaussian                                                                                                        
  1018                                                                                                                                                                                     
  1019       458         72.4      0.2      0.0          p = gaussian                                                                                                                      
  1020       458        142.1      0.3      0.0          self.peak_flux = p[0]                                                                                                             
  1021       458        172.3      0.4      0.0          self.centre_pix = p[1:3]                                                                                                          
  1022       458        124.7      0.3      0.0          size = p[3:6]                                                                                                                     
  1023       916       5363.2      5.9      0.9          if func.approx_equal(size[0], img.pixel_beam()[0]*1.1) and \                                                                      
  1024                                                           func.approx_equal(size[1], img.pixel_beam()[1]) and \                                                                     
  1025                                                           func.approx_equal(size[2], img.pixel_beam()[2]+90.0) or \                                                                 
  1026       458       2068.8      4.5      0.4                  img.opts.fix_to_beam:                                                                                                     
  1027                                                       # Check whether fitted Gaussian is just the distorted pixel beam given as an                                                  
  1028                                                       # initial guess (always set to [bm_maj*1.1, bm_min, bm_pa+90]) or if size was                                                 
  1029                                                       # fixed to the beam. If so, reset the size to the undistorted beam. Note:                                                     
  1030                                                       # these are sigma sizes, not FWHM sizes.                                                                                      
  1031                                                       size = img.pixel_beam()                                                                                                       
  1032                                                       size = (size[0], size[1], size[2]+90.0) # adjust angle so that corrected_size() works correctly                               
  1033       458       2300.4      5.0      0.4          size = func.corrected_size(size)  # gives fwhm and P.A.                                                                           
  1034       458        100.7      0.2      0.0          self.size_pix = size # FWHM in pixels and P.A. CCW from +y axis                                                                   
  1035                                                                                                                                                                                     
  1036                                                   # Use img.orig_beam for flux calculation and deconvolution on wavelet                                                             
  1037                                                   # images, as img.beam has been altered to match the wavelet scale.                                                                
  1038                                                   # Note: these are all FWHM sizes.                                                                                                 
  1039       458       1270.1      2.8      0.2          if img.waveletimage:                                                                                                              
  1040        40        392.1      9.8      0.1              bm_pix = N.array(img.beam2pix(img.orig_beam))                                                                                 
  1041                                                   else:                                                                                                                             
  1042       418       2961.3      7.1      0.5              bm_pix = N.array(img.beam2pix(img.beam))                                                                                      
  1043                                                                                                                                                                                     
  1044                                                   # Calculate fluxes, sky sizes, etc. All sizes are FWHM.                                                                           
  1045       458        673.3      1.5      0.1          tot = p[0]*size[0]*size[1]/(bm_pix[0]*bm_pix[1])                                                                                  
  1046       458        106.4      0.2      0.0          if flg == 0:                                                                                                                      
  1047                                                       # These are good Gaussians                                                                                                    
  1048       446      28791.4     64.6      5.1              errors = func.get_errors(img, p+[tot], img.islands[isl_idx].rms, fixed_to_beam=img.opts.fix_to_beam)                          
  1049       446      20891.4     46.8      3.7              self.centre_sky = img.pix2sky(p[1:3])                                                                                         
  1050       446     101608.3    227.8     17.9              self.centre_skyE = img.pix2coord(errors[1:3], self.centre_pix, use_wcs=use_wcs)                                               
  1051       446      89407.0    200.5     15.7              self.size_sky = img.pix2gaus(size, self.centre_pix, use_wcs=use_wcs) # FWHM in degrees and P.A. east from north               
  1052       446       4037.4      9.1      0.7              self.size_sky_uncorr = img.pix2gaus(size, self.centre_pix, use_wcs=False) # FWHM in degrees and P.A. east from +y axis        
  1053       446      47263.1    106.0      8.3              self.size_skyE = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=use_wcs, is_error=True)                                   
  1054       446       3448.7      7.7      0.6              self.size_skyE_uncorr = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=False, is_error=True)                              
  1055       446       6299.3     14.1      1.1              gaus_dc, err = func.deconv2(bm_pix, size)                                                                                     
  1056       446      61356.6    137.6     10.8              self.deconv_size_sky = img.pix2gaus(gaus_dc, self.centre_pix, use_wcs=use_wcs)                                                
  1057       446       3714.9      8.3      0.7              self.deconv_size_sky_uncorr = img.pix2gaus(gaus_dc, self.centre_pix, use_wcs=False)                                           
  1058       446      47178.3    105.8      8.3              self.deconv_size_skyE  = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=use_wcs, is_error=True)                           
  1059       446       3418.6      7.7      0.6              self.deconv_size_skyE_uncorr = img.pix2gaus(errors[3:6], self.centre_pix, use_wcs=False, is_error=True)                       
  1060                                                   else:                                                                                                                             
  1061                                                       # These are flagged Gaussians, so don't calculate sky values or errors                                                        
  1062        12          6.8      0.6      0.0              errors = [0]*7                                                                                                                
  1063        12          4.1      0.3      0.0              self.centre_sky = [0., 0.]                                                                                                    
  1064        12          3.6      0.3      0.0              self.centre_skyE = [0., 0.]                                                                                                   
  1065        12          4.2      0.3      0.0              self.size_sky = [0., 0., 0.]                                                                                                  
  1066        12          3.2      0.3      0.0              self.size_sky_uncorr = [0., 0., 0.]                                                                                           
  1067        12          4.1      0.3      0.0              self.size_skyE = [0., 0.]                                                                                                     
  1068        12          3.0      0.2      0.0              self.size_skyE_uncorr = [0., 0., 0.]                                                                                          
  1069        12          3.8      0.3      0.0              self.deconv_size_sky = [0., 0., 0.]                                                                                           
  1070        12          3.6      0.3      0.0              self.deconv_size_sky_uncorr = [0., 0., 0.]                                                                                    
  1071        12          3.1      0.3      0.0              self.deconv_size_skyE  = [0., 0., 0.]                                                                                         
  1072        12          2.8      0.2      0.0              self.deconv_size_skyE_uncorr = [0., 0., 0.]                                                                                   
  1073       458        130.9      0.3      0.0          self.total_flux = tot                                                                                                             
  1074       458        185.7      0.4      0.0          self.total_fluxE = errors[6]                                                                                                      
  1075       458        139.6      0.3      0.0          self.peak_fluxE = errors[0]                                                                                                       
  1076       458        121.3      0.3      0.0          self.total_fluxE = errors[6]                                                                                                      
  1077       458        158.7      0.3      0.0          self.centre_pixE = errors[1:3]                                                                                                    
  1078       458        124.9      0.3      0.0          self.size_pixE = errors[3:6]                                                                                                      
  1079       458       1386.1      3.0      0.2          self.rms = img.islands[isl_idx].rms                                                                                               
  1080       458       1235.0      2.7      0.2          self.mean = img.islands[isl_idx].mean                                                                                             
  1081       458       1297.7      2.8      0.2          self.total_flux_isl = img.islands[isl_idx].total_flux                                                                             
  1082       458       1260.2      2.8      0.2          self.total_flux_islE = img.islands[isl_idx].total_fluxE

I dont understand the whole flow of PyBDSFs code, but maybe @cache decorator would help for pix2coord and pix2gaus.

gmloose · 2024-06-11T15:51:25Z

Maybe, but this is yet another potential hot-spot 😄. It's very well possible that there are numerous performance bottle-necks in PyBDSF. Unfortunately, I lack the time to dive into this. What's also not helping is that there's no reliable suite of unit tests, so I'm a bit hesitant to make changes to the code without knowing if I break something. Maybe @darafferty can give some insights here?

darafferty · 2024-06-12T07:02:52Z

Interesting results -- as far as I know, this kind of profiling has never really been done. So I'm not surprised that there are lots of bottlenecks! The pytess_simple code is I think very rarely used in practice (it is part of the the module that estimates the PSF variations across the image, and I'm aware of only a few people who have used it). But the copy_gaussians() function is used in a lot of LOFAR processing (e.g., by the DDF pipeline). And yes, the lack of unit tests makes it difficult to be sure something doesn't break, but I guess it would be a monumental task to add them now.

So I would say that it would be worthwhile to experiment with improving some of the worst bottlenecks that Alex has identified. Perhaps for those at least we can write unit tests without too much work.

AlexKurek · 2024-06-12T08:56:41Z

As far as I can remember, there were also issues with images that took forever or a very long time to calculate. This may be a good way to identify where PyBDSF stalls in such cases. But I would use statistical profiler for this.

Tagging @mhardcastle since you may be interested to see this.

AlexKurek · 2024-06-12T10:00:28Z

BTW: for each --> Wavelet scale # I see in the logs: -> Calculating background rms and mean images. Does it have to be done for each scale anew? If I understand well, wavelets are larger, but the image is not decimated (?).

darafferty · 2024-06-12T10:19:03Z

Yes, unfortunately I think it is necessary to recalculate the rms and mean maps for each scale, as they do change (as it turns out, this fact was relevant to the last PR that was just merged).

AlexKurek · 2024-06-12T15:52:13Z

This is how it looks in general for 150 MB image:

Profile of rmsimage.py:
rmsimage_prof.zip
In short: here line 265: mean, rms = self.calculate_maps(img, data, mean, rms, mask, map_opts, [...])
takes 86.1% of CPU time. In calculate_maps self.map_2d takes 99.7%. And there:

line 539:   32.4 %     axes, mean_map1, rms_map1 = self.rms_mean_map(arr, mask_small, kappa, box, ncores)
line 548:   50.0 %     axes2, mean_map2, rms_map2 = self.rms_mean_map(arr, mask, kappa, box2, ncores)

And there:

712        10   54550644.9    5e+06     40.9%              cm_cr_list = mp.parallel_map(func.eval_func_tuple
717        10   75166237.6    8e+06     56.4 %             cm_cr_list = mp.parallel_map(func.eval_func_tuple

gmloose · 2024-06-13T15:47:26Z

Thanks for this work. We definitely need to follow up on this, but we do need to find the time for it.

AlexKurek changed the title ~~Vectorization of Fortran loops~~ Vectorization of Fortran loops and other optimizations Jun 12, 2024

darafferty mentioned this issue Jun 20, 2024

Add checks of output to tests #231

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorization of Fortran loops and other optimizations #227

Vectorization of Fortran loops and other optimizations #227

AlexKurek commented Jun 9, 2024

gmloose commented Jun 10, 2024

AlexKurek commented Jun 11, 2024 •

edited

Loading

gmloose commented Jun 11, 2024

AlexKurek commented Jun 11, 2024

gmloose commented Jun 11, 2024

darafferty commented Jun 12, 2024 •

edited

Loading

AlexKurek commented Jun 12, 2024 •

edited

Loading

AlexKurek commented Jun 12, 2024

darafferty commented Jun 12, 2024

AlexKurek commented Jun 12, 2024 •

edited

Loading

gmloose commented Jun 13, 2024

Vectorization of Fortran loops and other optimizations #227

Vectorization of Fortran loops and other optimizations #227

Comments

AlexKurek commented Jun 9, 2024

gmloose commented Jun 10, 2024

AlexKurek commented Jun 11, 2024 • edited Loading

gmloose commented Jun 11, 2024

AlexKurek commented Jun 11, 2024

gmloose commented Jun 11, 2024

darafferty commented Jun 12, 2024 • edited Loading

AlexKurek commented Jun 12, 2024 • edited Loading

AlexKurek commented Jun 12, 2024

darafferty commented Jun 12, 2024

AlexKurek commented Jun 12, 2024 • edited Loading

gmloose commented Jun 13, 2024

AlexKurek commented Jun 11, 2024 •

edited

Loading

darafferty commented Jun 12, 2024 •

edited

Loading

AlexKurek commented Jun 12, 2024 •

edited

Loading

AlexKurek commented Jun 12, 2024 •

edited

Loading