Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@asmcall should take an option for emitting as volatile #320

Closed
Seelengrab opened this issue Dec 1, 2022 · 4 comments
Closed

@asmcall should take an option for emitting as volatile #320

Seelengrab opened this issue Dec 1, 2022 · 4 comments

Comments

@Seelengrab
Copy link

While helping out debugging some slowness on an A64FX machine, I encountered this:

julia> using MCAnalyzer
[ Info: Precompiling MCAnalyzer [a81df072-f4bb-11e8-03d3-cfaeda626d18]

julia> function mysum(A)
           acc = zero(eltype(A))
           for i in eachindex(A)
               mark_start()
               @inbounds acc += A[i]
           end
           mark_end()
           return acc
       end
mysum (generic function with 1 method)

julia> analyze(mysum, (Vector{Int},))
/tmp/jl_zCoQt8/a.S:67:3: error: found an invalid region end directive
        # LLVM-MCA-END
         ^
/tmp/jl_zCoQt8/a.S:67:3: note: unable to find an active anonymous region
        # LLVM-MCA-END
         ^

This is reproducible with https://github.com/JuliaPerf/MCAnalyzer.jl/tree/no_iaca, while the example above works fine on the most recent release of MCAnalyzer.jl. The reason this fails seems to be due to the mark in the assembly being emitted twice, which of course llvm-mca then doesn't like:

	.text
	.file	"start"
	.globl	julia_mysum_1409                # -- Begin function julia_mysum_1409
	.p2align	4, 0x90
	.type	julia_mysum_1409,@function
julia_mysum_1409:                       # @julia_mysum_1409
	.cfi_startproc
# %bb.0:                                # %top
	movq	8(%rdi), %rcx
	testq	%rcx, %rcx
	je	.LBB0_1
# %bb.2:                                # %L13.preheader
	xorl	%edx, %edx
	xorl	%eax, %eax
	.p2align	4, 0x90
.LBB0_3:                                # %L13
                                        # =>This Inner Loop Header: Depth=1
	#APP
	# LLVM-MCA-BEGIN

	#NO_APP
	movq	(%rdi), %rsi
	addq	(%rsi,%rdx,8), %rax
	incq	%rdx
	cmpq	%rdx, %rcx
	jne	.LBB0_3
# %bb.4:                                # %L31
	#APP
	# LLVM-MCA-END

	#NO_APP
	retq
.LBB0_1:
	xorl	%eax, %eax
	#APP
	# LLVM-MCA-END

	#NO_APP
	retq
.Lfunc_end0:
	.size	julia_mysum_1409, .Lfunc_end0-julia_mysum_1409
	.cfi_endproc
                                        # -- End function
	.section	".note.GNU-stack","",@progbits

I think this wouldn't happen if this were marked as volatile, but @asmcall doesn't have this in its interface.

@vchuravy
Copy link
Collaborator

vchuravy commented Dec 1, 2022

The LLVM LangRef does not specify volatile for assembly https://llvm.org/docs/LangRef.html#inline-assembler-expressions

IIRC C level volatile get's translated to side-effect=true. This is more a codegen problem that the return block got split and the assembly duplicated.

@Seelengrab
Copy link
Author

Seelengrab commented Dec 1, 2022

Hm, that makes sense. I misremembered that I used volatile in a similar situation before, but that too only has sideeffect. Where should I look into to fix this (if necessary just for MCAnalyzer)? Base? Some custom pass in the IR using GPUCompiler..?

@vchuravy
Copy link
Collaborator

vchuravy commented Dec 2, 2022

Uhm... It's probably somewhere in LLVM. We could put in a pseudo intrinsic that we noop after optimization.

@Seelengrab
Copy link
Author

Yeah, that might be necessary.. I've noticed that adding this via the sideeffect/memory clobber breaks autovectorization pretty bad :/ Should I close this and track this in MCAnalyzer instead?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants