Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize values known at compile time. #547

Merged
merged 2 commits into from
Jul 16, 2019
Merged

Optimize values known at compile time. #547

merged 2 commits into from
Jul 16, 2019

Conversation

carlo-bramini
Copy link
Contributor

When doing float math PR, I have seen that into fluid_rev.c there is something that could be optimized with little effort.
In several lines it is used delay_length[NBR_DELAYS - 1], but this value is known at compile time and it is equal to DELAY_L7 or DELAY_L11, depending on the value of macro NBR_DELAYS.
So, I introduced a macro called DELAY_LEN_MAX, which points to that value.
In this way, some code has been simplified to a constant value instead of a number of load-multiply calculations.
I also introduced a small array called delay_length_x3[], which just stores the values of delay_length[] but multiplied by -3. This table is so small and its size can be comparable to the code generated previously, so in my opinion it would not be a bad idea for reducing a little the overhead.

@jjceresa
Copy link
Collaborator

Sure , however please may i suggest to keep original formulas as comment just above the new formula optimized. Also please comment the reason of new table delay_length_x3.

@derselbst
Copy link
Member

Sry, but DELAY_LEN_MAX doesn't optimize anything. delay_length[] is static const, an optimizing compiler will not read the array. It will fold the constants at compile time. Here's the assembly compiled with MSVC 19.21.27702.2 with "Maximum Optimization (Favor Speed) (/O2)":

Current master:

; 778  :             gi_tmp = gi_min + roomsize * (gi_max - gi_min);

	subsd	xmm0, xmm6
	mulsd	xmm0, xmm7
	addsd	xmm0, xmm6

; 779  :             /* Computes T60DC from gi using inverse of relation E2.*/
; 780  :             dc_rev_time = -3 * FLUID_M_LN10 * delay_length[NBR_DELAYS - 1] * sample_period / FLUID_LOGF(gi_tmp);

	call	log ; gi_tmp is in xmm0

; 781  :         }
; 782  : #endif /* ROOMSIZE_RESPONSE_LINEAR */
; 783  :         /*--------------------------------------------
; 784  :             Computes alpha
; 785  :         ----------------------------------------------*/
; 786  :         /* Computes alpha from damp,ai_tmp,gi_tmp using relation R */
; 787  :         /* - damp (0 to 1) controls concave reverb time for fs/2 frequency (T60DC to 0) */
; 788  :         ai_tmp = 1.0f * damp;
; 789  : 
; 790  :         /* Preserve the square of R */
; 791  :         alpha2 = 1.f / (1.f - ai_tmp / ((20.f / 80.f) * FLUID_LOGF(gi_tmp)));

	movsd	xmm10, QWORD PTR __real@3fd0000000000000
	movaps	xmm9, xmm11
; magic const folded number (-3 * FLUID_M_LN10 * delay_length[NBR_DELAYS - 1]) goes below
	mulsd	xmm9, QWORD PTR __real@c0be76db0fcf0397
	movaps	xmm6, xmm14
	divsd	xmm9, xmm0 ;
	mulsd	xmm0, xmm10
	divsd	xmm8, xmm0
	movaps	xmm0, xmm14
	subsd	xmm0, xmm8
	divsd	xmm6, xmm0

; 792  : 
; 793  :         alpha = FLUID_SQRT(alpha2); /* R */

	movaps	xmm0, xmm6
	call	sqrt

Your modification is the same, only register allocation differs:

; 792  :             gi_tmp = gi_min + roomsize * (gi_max - gi_min);

	subsd	xmm0, xmm6
	mulsd	xmm0, xmm7
	addsd	xmm0, xmm6

; 793  :             /* Computes T60DC from gi using inverse of relation E2.*/
; 794  :             dc_rev_time = (-3 * FLUID_M_LN10 * DELAY_LEN_MAX) * sample_period / FLUID_LOGF(gi_tmp);

	call	log

; 795  :         }
; 796  : #endif /* ROOMSIZE_RESPONSE_LINEAR */
; 797  :         /*--------------------------------------------
; 798  :             Computes alpha
; 799  :         ----------------------------------------------*/
; 800  :         /* Computes alpha from damp,ai_tmp,gi_tmp using relation R */
; 801  :         /* - damp (0 to 1) controls concave reverb time for fs/2 frequency (T60DC to 0) */
; 802  :         ai_tmp = 1.0f * damp;
; 803  : 
; 804  :         /* Preserve the square of R */
; 805  :         alpha2 = 1.f / (1.f - ai_tmp / ((20.f / 80.f) * FLUID_LOGF(gi_tmp)));

	movsd	xmm10, QWORD PTR __real@3fd0000000000000
	movaps	xmm9, xmm12
	mulsd	xmm9, QWORD PTR __real@c0be76db0fcf0397
	movaps	xmm6, xmm13
	divsd	xmm9, xmm0
	mulsd	xmm0, xmm10
	divsd	xmm8, xmm0
	movaps	xmm0, xmm13
	subsd	xmm0, xmm8
	divsd	xmm6, xmm0

; 806  : 
; 807  :         alpha = FLUID_SQRT(alpha2); /* R */

	movaps	xmm0, xmm6
	call	sqrt

Below is how it would look like if delay_length[] was neither const nor static; access to the array is pretty obvious:

; 778  :             gi_tmp = gi_min + roomsize * (gi_max - gi_min);

	subsd	xmm0, xmm6
	mulsd	xmm0, xmm8
	addsd	xmm0, xmm6

; 779  :             /* Computes T60DC from gi using inverse of relation E2.*/
; 780  :             dc_rev_time = -3 * FLUID_M_LN10 * delay_length[NBR_DELAYS - 1] * sample_period / FLUID_LOGF(gi_tmp);

	call	log

; 781  :         }
; 782  : #endif /* ROOMSIZE_RESPONSE_LINEAR */
; 783  :         /*--------------------------------------------
; 784  :             Computes alpha
; 785  :         ----------------------------------------------*/
; 786  :         /* Computes alpha from damp,ai_tmp,gi_tmp using relation R */
; 787  :         /* - damp (0 to 1) controls concave reverb time for fs/2 frequency (T60DC to 0) */
; 788  :         ai_tmp = 1.0f * damp;
; 789  : 
; 790  :         /* Preserve the square of R */
; 791  :         alpha2 = 1.f / (1.f - ai_tmp / ((20.f / 80.f) * FLUID_LOGF(gi_tmp)));

	movsd	xmm11, QWORD PTR __real@3fd0000000000000
	xorps	xmm10, xmm10
	cvtsi2sd xmm10, DWORD PTR delay_length+28
	movaps	xmm6, xmm14
	mulsd	xmm10, QWORD PTR __real@c01ba18a998fffa1
	mulsd	xmm10, xmm13
	divsd	xmm10, xmm0
	mulsd	xmm0, xmm11
	divsd	xmm9, xmm0
	movaps	xmm0, xmm14
	subsd	xmm0, xmm9
	divsd	xmm6, xmm0

; 792  : 
; 793  :         alpha = FLUID_SQRT(alpha2); /* R */

	movaps	xmm0, xmm6
	call	sqrt

Same is true for delay_length_x3. Full disassembly is attached below.

The only optimization you've made is rearranging the constants when calculating gi_max and gi_min, which expectedly saves a few multiplications. I would accept this change. However, keep in mind that the code you are about to optimize is only executed when the user changes reverb parameters or the sample rate, which I believe is rarely the case.

fluid_rev_carlo.txt
fluid_rev_master.txt
fluid_rev_non_const.txt

@carlo-bramini
Copy link
Contributor Author

keep in mind that the code you are about to optimize is only executed when the user changes reverb parameters or the sample rate, which I believe is rarely the case.

Thank you, I did not know that.

@derselbst derselbst merged commit d6c51cd into FluidSynth:master Jul 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants