PERF: Make AdvancedNormalizedCorrelationImageToImageMetric faster #556
Added a new member function to `AdvancedImageToImageMetric`, `FastEvaluateMovingImageValueAndDerivative`, which indirectly calls the multi-threaded overload of ITK's `itk::BSplineInterpolateImageFunction::EvaluateValueAndDerivativeAtContinuousIndex`. (It does so via `EvaluateMovingImageValueAndDerivativeWithOptionalThreadId`, another new member function.)

Made `AdvancedNormalizedCorrelationImageToImageMetric::ThreadedGetValueAndDerivative` faster by calling this new `FastEvaluateMovingImageValueAndDerivative`.

A large performance improvement was observed for the GoogleTest unit test `itkElastixRegistrationMethod.EulerDiscRotation2D` (which uses the "AdvancedNormalizedCorrelation" metric): from ~1.5 seconds before this commit down to ~0.9 seconds after it, using Visual Studio 2019, Release configuration. (For a Debug configuration the improvement was even larger: from ~15 seconds before, down to ~4 seconds after this commit.)
@mstaring @stefanklein

Here is the "Slower, but safer"

Here is the faster overload (which has a "FIXME -- Review this "fix" and ensure it works."

The performance improvement comes from skipping the memory allocation of local

For the record, the "FIXME" was there already with commit InsightSoftwareConsortium/ITK@6abbc79 (6 August 2010): https://github.com/InsightSoftwareConsortium/ITK/blob/6abbc7969a90786c4c73f5d191f634db536c2d1d/Code/BasicFilters/itkBSplineInterpolateImageFunction.txx#L297
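To illustrate the general idea discussed here, the following is a minimal C++17 sketch, not the ITK or elastix code. It assumes that a "safe" overload allocates its scratch storage on every call, while a thread-id overload reuses storage pre-allocated per thread. All names (`ToyInterpolator`, `Evaluate`, `EvaluateWithOptionalThreadId`) are hypothetical, and the toy uses linear interpolation on a 1D sample array rather than B-splines.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an interpolator; this is NOT the ITK class.
class ToyInterpolator
{
public:
  ToyInterpolator(std::vector<double> samples, unsigned numberOfThreads)
    : m_Samples(std::move(samples))
    , m_Scratch(numberOfThreads, std::vector<double>(2))
  {}

  // "Slower, but safer" style: allocates local scratch storage on every call.
  void
  Evaluate(double x, double & value, double & derivative) const
  {
    std::vector<double> weights(2); // heap allocation per call
    Compute(x, weights, value, derivative);
  }

  // Faster style: reuses scratch storage reserved for the given thread.
  // Each thread must pass its own, unique threadId.
  void
  Evaluate(double x, double & value, double & derivative, unsigned threadId) const
  {
    assert(threadId < m_Scratch.size());
    Compute(x, m_Scratch[threadId], value, derivative);
  }

private:
  // Linear interpolation between the two neighbouring samples of x.
  void
  Compute(double x, std::vector<double> & weights, double & value, double & derivative) const
  {
    const auto i = static_cast<std::size_t>(x);
    assert(x >= 0.0 && i + 1 < m_Samples.size());
    const double t = x - static_cast<double>(i);
    weights[0] = 1.0 - t;
    weights[1] = t;
    value = weights[0] * m_Samples[i] + weights[1] * m_Samples[i + 1];
    derivative = m_Samples[i + 1] - m_Samples[i];
  }

  std::vector<double>                      m_Samples;
  mutable std::vector<std::vector<double>> m_Scratch; // one buffer per thread
};

// Hypothetical dispatcher: picks the thread-id overload only when an id is given.
inline void
EvaluateWithOptionalThreadId(const ToyInterpolator &       interpolator,
                             double                        x,
                             double &                      value,
                             double &                      derivative,
                             const std::optional<unsigned> threadId = std::nullopt)
{
  if (threadId)
  {
    interpolator.Evaluate(x, value, derivative, *threadId);
  }
  else
  {
    interpolator.Evaluate(x, value, derivative);
  }
}
```

Both paths compute the same result; in this sketch they differ only in where the scratch buffer lives. The thread-id path is only safe when no two threads share an id, which is the kind of precondition that such a "fast" overload would have to document.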
Replace `this->EvaluateMovingImageValueAndDerivative(...)` calls by `this->FastEvaluateMovingImageValueAndDerivative(..., threadId)` in four more metrics:
- itkAdvancedKappaStatisticImageToImageMetric
- itkParzenWindowMutualInformationImageToImageMetric
- itkAdvancedMeanSquaresImageToImageMetric
- itkPCAMetric_F_multithreaded

Follow-up to pull request #556 commit c8c4a6a "PERF: Make AdvancedNormalizedCorrelationImageToImageMetric faster"
Replace `this->EvaluateMovingImageValueAndDerivative(...)` calls by `this->FastEvaluateMovingImageValueAndDerivative(..., threadId)` in the "Threaded" member functions of six more metrics:
- itkAdvancedKappaStatisticImageToImageMetric
- itkAdvancedMeanSquaresImageToImageMetric
- itkParzenWindowHistogramImageToImageMetric
- itkParzenWindowMutualInformationImageToImageMetric
- itkPCAMetric_F_multithreaded
- itkSumSquaredTissueVolumeDifferenceImageToImageMetric

A significant reduction in run time is expected.

Follow-up to pull request #556 commit c8c4a6a "PERF: Make AdvancedNormalizedCorrelationImageToImageMetric faster"
Replaced `this->EvaluateMovingImageValueAndDerivative(...)` calls by `this->FastEvaluateMovingImageValueAndDerivative(..., threadId)` in the "Threaded" member functions of six more metrics:
- itkAdvancedKappaStatisticImageToImageMetric
- itkAdvancedMeanSquaresImageToImageMetric
- itkParzenWindowHistogramImageToImageMetric
- itkParzenWindowMutualInformationImageToImageMetric
- itkPCAMetric_F_multithreaded
- itkSumSquaredTissueVolumeDifferenceImageToImageMetric

A significant reduction in run time is expected.

`AdvancedKappaStatisticImageToImageMetric::ThreadedGetValueAndDerivative` is being tested by:
- 55/161 - elastix_run_3DCT_lung.Kappa.bspline.ASGD.001_OUTPUT

`AdvancedMeanSquaresImageToImageMetric::ThreadedGetValueAndDerivative` is being tested by:
- 43/161 - elastix_run_3DCT_lung.SSD.bspline.ASGD.001_OUTPUT
- 98/161 - elastix_run_3DCT_lung.SSD.bspline.ASGD.002_OUTPUT
- 102/161 - elastix_run_3DCT_lung.SSD.bspline.ASGD.003_OUTPUT
- 106/161 - elastix_run_3DCT_lung.SSD.bspline.ASGD.001-Threads1_OUTPUT
- 110/161 - elastix_run_3DCT_lung.SSD.bspline.ASGD.001-Threads2_OUTPUT
- 114/161 - elastix_run_3DCT_lung.SSD.bspline.ASGD.001-Threads4_OUTPUT

`ParzenWindowHistogramImageToImageMetric::ThreadedComputePDFs` is being tested by:
- 20/161 - elastix_run_example_OUTPUT
- 47/161 - elastix_run_3DCT_lung.MI.bspline.ASGD.001_OUTPUT
- 82/161 - elastix_run_3DCT_lung.MI.bspline.SGD.001_OUTPUT
- 86/161 - elastix_run_3DCT_lung.MI.bspline.SGD.002_OUTPUT
- 90/161 - elastix_run_3DCT_lung.MI.bspline.SGD.003_OUTPUT
- 94/161 - elastix_run_3DCT_lung.MI.bspline.SGD.004_OUTPUT
- 130/161 - elastix_run_3DCT_lung.MI.bspline.SGD.001-Threads1_OUTPUT
- 134/161 - elastix_run_3DCT_lung.MI.bspline.SGD.001-Threads2_OUTPUT
- 138/161 - elastix_run_3DCT_lung.MI.bspline.SGD.001-Threads4_OUTPUT

Follow-up to pull request #556 commit c8c4a6a "PERF: Make AdvancedNormalizedCorrelationImageToImageMetric faster"