Optimize values known at compile time. #547

carlo-bramini · 2019-07-15T16:13:52Z

When doing float math PR, I have seen that into fluid_rev.c there is something that could be optimized with little effort.
In several lines it is used delay_length[NBR_DELAYS - 1], but this value is known at compile time and it is equal to DELAY_L7 or DELAY_L11, depending on the value of macro NBR_DELAYS.
So, I introduced a macro called DELAY_LEN_MAX, which points to that value.
In this way, some code has been simplified to a constant value instead of a number of load-multiply calculations.
I also introduced a small array called delay_length_x3[], which just stores the values of delay_length[] but multiplied by -3. This table is so small and its size can be comparable to the code generated previously, so in my opinion it would not be a bad idea for reducing a little the overhead.

jjceresa · 2019-07-16T10:46:30Z

Sure , however please may i suggest to keep original formulas as comment just above the new formula optimized. Also please comment the reason of new table delay_length_x3.

derselbst · 2019-07-16T13:25:33Z

Sry, but DELAY_LEN_MAX doesn't optimize anything. delay_length[] is static const, an optimizing compiler will not read the array. It will fold the constants at compile time. Here's the assembly compiled with MSVC 19.21.27702.2 with "Maximum Optimization (Favor Speed) (/O2)":

Current master:

; 778  :             gi_tmp = gi_min + roomsize * (gi_max - gi_min);

	subsd	xmm0, xmm6
	mulsd	xmm0, xmm7
	addsd	xmm0, xmm6

; 779  :             /* Computes T60DC from gi using inverse of relation E2.*/
; 780  :             dc_rev_time = -3 * FLUID_M_LN10 * delay_length[NBR_DELAYS - 1] * sample_period / FLUID_LOGF(gi_tmp);

	call	log ; gi_tmp is in xmm0

; 781  :         }
; 782  : #endif /* ROOMSIZE_RESPONSE_LINEAR */
; 783  :         /*--------------------------------------------
; 784  :             Computes alpha
; 785  :         ----------------------------------------------*/
; 786  :         /* Computes alpha from damp,ai_tmp,gi_tmp using relation R */
; 787  :         /* - damp (0 to 1) controls concave reverb time for fs/2 frequency (T60DC to 0) */
; 788  :         ai_tmp = 1.0f * damp;
; 789  : 
; 790  :         /* Preserve the square of R */
; 791  :         alpha2 = 1.f / (1.f - ai_tmp / ((20.f / 80.f) * FLUID_LOGF(gi_tmp)));

	movsd	xmm10, QWORD PTR __real@3fd0000000000000
	movaps	xmm9, xmm11
; magic const folded number (-3 * FLUID_M_LN10 * delay_length[NBR_DELAYS - 1]) goes below
	mulsd	xmm9, QWORD PTR __real@c0be76db0fcf0397
	movaps	xmm6, xmm14
	divsd	xmm9, xmm0 ;
	mulsd	xmm0, xmm10
	divsd	xmm8, xmm0
	movaps	xmm0, xmm14
	subsd	xmm0, xmm8
	divsd	xmm6, xmm0

; 792  : 
; 793  :         alpha = FLUID_SQRT(alpha2); /* R */

	movaps	xmm0, xmm6
	call	sqrt

Your modification is the same, only register allocation differs:

; 792  :             gi_tmp = gi_min + roomsize * (gi_max - gi_min);

	subsd	xmm0, xmm6
	mulsd	xmm0, xmm7
	addsd	xmm0, xmm6

; 793  :             /* Computes T60DC from gi using inverse of relation E2.*/
; 794  :             dc_rev_time = (-3 * FLUID_M_LN10 * DELAY_LEN_MAX) * sample_period / FLUID_LOGF(gi_tmp);

	call	log

; 795  :         }
; 796  : #endif /* ROOMSIZE_RESPONSE_LINEAR */
; 797  :         /*--------------------------------------------
; 798  :             Computes alpha
; 799  :         ----------------------------------------------*/
; 800  :         /* Computes alpha from damp,ai_tmp,gi_tmp using relation R */
; 801  :         /* - damp (0 to 1) controls concave reverb time for fs/2 frequency (T60DC to 0) */
; 802  :         ai_tmp = 1.0f * damp;
; 803  : 
; 804  :         /* Preserve the square of R */
; 805  :         alpha2 = 1.f / (1.f - ai_tmp / ((20.f / 80.f) * FLUID_LOGF(gi_tmp)));

	movsd	xmm10, QWORD PTR __real@3fd0000000000000
	movaps	xmm9, xmm12
	mulsd	xmm9, QWORD PTR __real@c0be76db0fcf0397
	movaps	xmm6, xmm13
	divsd	xmm9, xmm0
	mulsd	xmm0, xmm10
	divsd	xmm8, xmm0
	movaps	xmm0, xmm13
	subsd	xmm0, xmm8
	divsd	xmm6, xmm0

; 806  : 
; 807  :         alpha = FLUID_SQRT(alpha2); /* R */

	movaps	xmm0, xmm6
	call	sqrt

Below is how it would look like if delay_length[] was neither const nor static; access to the array is pretty obvious:

; 778  :             gi_tmp = gi_min + roomsize * (gi_max - gi_min);

	subsd	xmm0, xmm6
	mulsd	xmm0, xmm8
	addsd	xmm0, xmm6

; 779  :             /* Computes T60DC from gi using inverse of relation E2.*/
; 780  :             dc_rev_time = -3 * FLUID_M_LN10 * delay_length[NBR_DELAYS - 1] * sample_period / FLUID_LOGF(gi_tmp);

	call	log

; 781  :         }
; 782  : #endif /* ROOMSIZE_RESPONSE_LINEAR */
; 783  :         /*--------------------------------------------
; 784  :             Computes alpha
; 785  :         ----------------------------------------------*/
; 786  :         /* Computes alpha from damp,ai_tmp,gi_tmp using relation R */
; 787  :         /* - damp (0 to 1) controls concave reverb time for fs/2 frequency (T60DC to 0) */
; 788  :         ai_tmp = 1.0f * damp;
; 789  : 
; 790  :         /* Preserve the square of R */
; 791  :         alpha2 = 1.f / (1.f - ai_tmp / ((20.f / 80.f) * FLUID_LOGF(gi_tmp)));

	movsd	xmm11, QWORD PTR __real@3fd0000000000000
	xorps	xmm10, xmm10
	cvtsi2sd xmm10, DWORD PTR delay_length+28
	movaps	xmm6, xmm14
	mulsd	xmm10, QWORD PTR __real@c01ba18a998fffa1
	mulsd	xmm10, xmm13
	divsd	xmm10, xmm0
	mulsd	xmm0, xmm11
	divsd	xmm9, xmm0
	movaps	xmm0, xmm14
	subsd	xmm0, xmm9
	divsd	xmm6, xmm0

; 792  : 
; 793  :         alpha = FLUID_SQRT(alpha2); /* R */

	movaps	xmm0, xmm6
	call	sqrt

Same is true for delay_length_x3. Full disassembly is attached below.

The only optimization you've made is rearranging the constants when calculating gi_max and gi_min, which expectedly saves a few multiplications. I would accept this change. However, keep in mind that the code you are about to optimize is only executed when the user changes reverb parameters or the sample rate, which I believe is rarely the case.

fluid_rev_carlo.txt
fluid_rev_master.txt
fluid_rev_non_const.txt

carlo-bramini · 2019-07-16T15:39:13Z

keep in mind that the code you are about to optimize is only executed when the user changes reverb parameters or the sample rate, which I believe is rarely the case.

Thank you, I did not know that.

Optimize values known at compile time.

f413600

Revert fixes that could be done by C optimizer

111edfd

derselbst merged commit d6c51cd into FluidSynth:master Jul 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize values known at compile time. #547

Optimize values known at compile time. #547

carlo-bramini commented Jul 15, 2019

jjceresa commented Jul 16, 2019

derselbst commented Jul 16, 2019

carlo-bramini commented Jul 16, 2019

Optimize values known at compile time. #547

Optimize values known at compile time. #547

Conversation

carlo-bramini commented Jul 15, 2019

jjceresa commented Jul 16, 2019

derselbst commented Jul 16, 2019

carlo-bramini commented Jul 16, 2019