diff --git a/doc/src/manual/profile.md b/doc/src/manual/profile.md index 5b18f57a186be..60e2d8042ee18 100644 --- a/doc/src/manual/profile.md +++ b/doc/src/manual/profile.md @@ -310,7 +310,122 @@ the amount of memory allocated by each line of code. ### Line-by-Line Allocation Tracking -To measure allocation line-by-line, start Julia with the `--track-allocation=` command-line +While [`@time`](@ref) logs high-level stats about memory usage and garbage collection over the course +of evaluating an expression, it can be useful to log each garbage collection event, to get an +intuitive sense of how often the garbage collector is running, how long it's running each time, +and how much garbage it collects each time. This can be enabled with +[`GC.enable_logging(true)`](@ref), which causes Julia to log to stderr every time +a garbage collection happens. + +### [Allocation Profiler](@id allocation-profiler) + +!!! compat "Julia 1.8" + This functionality requires at least Julia 1.8. + +The allocation profiler records the stack trace, type, and size of each +allocation while it is running. It can be invoked with +[`Profile.Allocs.@profile`](@ref). + +This information about the allocations is returned as an array of `Alloc` +objects, wrapped in an `AllocResults` object. The best way to visualize these is +currently with the [PProf.jl](https://github.com/JuliaPerf/PProf.jl) and +[ProfileCanvas.jl](https://github.com/pfitzseb/ProfileCanvas.jl) packages, which +can visualize the call stacks which are making the most allocations. + +The allocation profiler does have significant overhead, so a `sample_rate` +argument can be passed to speed it up by making it skip some allocations. +Passing `sample_rate=1.0` will make it record everything (which is slow); +`sample_rate=0.1` will record only 10% of the allocations (faster), etc. + +!!! compat "Julia 1.11" + + Older versions of Julia could not capture types in all cases. 
In older versions of + Julia, if you see an allocation of type `Profile.Allocs.UnknownType`, it means that + the profiler doesn't know what type of object was allocated. This mainly happened when + the allocation was coming from generated code produced by the compiler. See + [issue #43688](https://github.com/JuliaLang/julia/issues/43688) for more info. + + Since Julia 1.11, all allocations should have a type reported. + +For more details on how to use this tool, please see the following talk from JuliaCon 2022: +https://www.youtube.com/watch?v=BFvpwC8hEWQ + +##### Allocation Profiler Example + +In this simple example, we use PProf to visualize the alloc profile. You could use another +visualization tool instead. We collect the profile (specifying a sample rate), then we visualize it. +```julia +using Profile, PProf +Profile.Allocs.clear() +Profile.Allocs.@profile sample_rate=0.0001 my_function() +PProf.Allocs.pprof() +``` + +Here is a more in-depth example, showing how we can tune the sample rate. A +good number of samples to aim for is around 1 - 10 thousand. Too many, and the +profile visualizer can get overwhelmed, and profiling will be slow. Too few, +and you don't have a representative sample. + + +```julia-repl +julia> import Profile + +julia> @time my_function() # Estimate allocations from a (second-run) of the function + 0.110018 seconds (1.50 M allocations: 58.725 MiB, 17.17% gc time) +500000 + +julia> Profile.Allocs.clear() + +julia> Profile.Allocs.@profile sample_rate=0.001 begin # 1.5 M * 0.001 = ~1.5K allocs. + my_function() + end +500000 + +julia> prof = Profile.Allocs.fetch(); # If you want, you can also manually inspect the results. + +julia> length(prof.allocs) # Confirm we have expected number of allocations. +1515 + +julia> using PProf # Now, visualize with an external tool, like PProf or ProfileCanvas. + +julia> PProf.Allocs.pprof(prof; from_c=false) # You can optionally pass in a previously fetched profile result. 
+Analyzing 1515 allocation samples... 100%|████████████████████████████████| Time: 0:00:00 +Main binary filename not available. +Serving web UI on http://localhost:62261 +"alloc-profile.pb.gz" +``` +Then you can view the profile by navigating to http://localhost:62261, and the profile is saved to disk. +See PProf package for more options. + +##### Allocation Profiling Tips + +As stated above, aim for around 1-10 thousand samples in your profile. + +Note that we are uniformly sampling in the space of _all allocations_, and are not weighting +our samples by the size of the allocation. So a given allocation profile may not give a +representative profile of where most bytes are allocated in your program, unless you had set +`sample_rate=1`. + +Allocations can come from users directly constructing objects, but can also come from inside +the runtime or be inserted into compiled code to handle type instability. Looking at the +"source code" view can be helpful to isolate them, and then other external tools such as +[`Cthulhu.jl`](https://github.com/JuliaDebug/Cthulhu.jl) can be useful for identifying the +cause of the allocation. + +##### Allocation Profile Visualization Tools + +There are several profiling visualization tools now that can all display Allocation +Profiles. Here is a small list of some of the main ones we know about: +- [PProf.jl](https://github.com/JuliaPerf/PProf.jl) +- [ProfileCanvas.jl](https://github.com/pfitzseb/ProfileCanvas.jl) +- VSCode's built-in profile visualizer (`@profview_allocs`) [docs needed] +- Viewing the results directly in the REPL + - You can inspect the results in the REPL via [`Profile.Allocs.fetch()`](@ref), to view + the stacktrace and type of each allocation. 
+ +#### Line-by-Line Allocation Tracking + +An alternative way to measure allocations is to start Julia with the `--track-allocation=` command-line option, for which you can choose `none` (the default, do not measure allocation), `user` (measure memory allocation everywhere except Julia's core code), or `all` (measure memory allocation at each line of Julia code). Allocation gets measured for each line of compiled code. When you quit diff --git a/src/gc-alloc-profiler.h b/src/gc-alloc-profiler.h index 3fd8bf4388a0a..fcd8e45caa2d8 100644 --- a/src/gc-alloc-profiler.h +++ b/src/gc-alloc-profiler.h @@ -35,6 +35,7 @@ void _maybe_record_alloc_to_profile(jl_value_t *val, size_t size, jl_datatype_t extern int g_alloc_profile_enabled; +// This should only be used from _deprecated_ code paths. We shouldn't see UNKNOWN anymore. #define jl_gc_unknown_type_tag ((jl_datatype_t*)0xdeadaa03) static inline void maybe_record_alloc_to_profile(jl_value_t *val, size_t size, jl_datatype_t *typ) JL_NOTSAFEPOINT { diff --git a/src/gc.c b/src/gc.c index ea207bbab7698..8abe092b1c249 100644 --- a/src/gc.c +++ b/src/gc.c @@ -1177,7 +1177,7 @@ static inline jl_value_t *jl_gc_big_alloc_inner(jl_ptls_t ptls, size_t sz) return jl_valueof(&v->header); } -// Instrumented version of jl_gc_big_alloc_inner, called into by LLVM-generated code. +// Deprecated version, supported for legacy code. JL_DLLEXPORT jl_value_t *jl_gc_big_alloc(jl_ptls_t ptls, size_t sz) { jl_value_t *val = jl_gc_big_alloc_inner(ptls, sz); @@ -1185,6 +1185,13 @@ JL_DLLEXPORT jl_value_t *jl_gc_big_alloc(jl_ptls_t ptls, size_t sz) maybe_record_alloc_to_profile(val, sz, jl_gc_unknown_type_tag); return val; } +// Instrumented version of jl_gc_big_alloc_inner, called into by LLVM-generated code. 
+JL_DLLEXPORT jl_value_t *jl_gc_big_alloc_instrumented(jl_ptls_t ptls, size_t sz, jl_value_t *type) +{ + jl_value_t *val = jl_gc_big_alloc_inner(ptls, sz); + maybe_record_alloc_to_profile(val, sz, (jl_datatype_t*)type); + return val; +} // This wrapper exists only to prevent `jl_gc_big_alloc_inner` from being inlined into // its callers. We provide an external-facing interface for callers, and inline `jl_gc_big_alloc_inner` @@ -1494,7 +1501,7 @@ static inline jl_value_t *jl_gc_pool_alloc_inner(jl_ptls_t ptls, int pool_offset return jl_valueof(v); } -// Instrumented version of jl_gc_pool_alloc_inner, called into by LLVM-generated code. +// Deprecated version, supported for legacy code. JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc(jl_ptls_t ptls, int pool_offset, int osize) { @@ -1503,6 +1510,14 @@ JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc(jl_ptls_t ptls, int pool_offset, maybe_record_alloc_to_profile(val, osize, jl_gc_unknown_type_tag); return val; } +// Instrumented version of jl_gc_pool_alloc_inner, called into by LLVM-generated code. +JL_DLLEXPORT jl_value_t *jl_gc_pool_alloc_instrumented(jl_ptls_t ptls, int pool_offset, + int osize, jl_value_t* type) +{ + jl_value_t *val = jl_gc_pool_alloc_inner(ptls, pool_offset, osize); + maybe_record_alloc_to_profile(val, osize, (jl_datatype_t*)type); + return val; +} // This wrapper exists only to prevent `jl_gc_pool_alloc_inner` from being inlined into // its callers. We provide an external-facing interface for callers, and inline `jl_gc_pool_alloc_inner` @@ -4054,7 +4069,10 @@ static void *gc_managed_realloc_(jl_ptls_t ptls, void *d, size_t sz, size_t olds SetLastError(last_error); #endif errno = last_errno; - maybe_record_alloc_to_profile((jl_value_t*)b, sz, jl_gc_unknown_type_tag); + // gc_managed_realloc_ is currently used exclusively for resizing array buffers. 
+ if (allocsz > oldsz) { + maybe_record_alloc_to_profile((jl_value_t*)b, allocsz - oldsz, (jl_datatype_t*)jl_buff_tag); + } return b; } diff --git a/src/jl_exported_funcs.inc b/src/jl_exported_funcs.inc index 7f29176e67755..6511555286de7 100644 --- a/src/jl_exported_funcs.inc +++ b/src/jl_exported_funcs.inc @@ -160,6 +160,7 @@ XX(jl_gc_alloc_3w) \ XX(jl_gc_alloc_typed) \ XX(jl_gc_big_alloc) \ + XX(jl_gc_big_alloc_instrumented) \ XX(jl_gc_collect) \ XX(jl_gc_conservative_gc_support_enabled) \ XX(jl_gc_counted_calloc) \ @@ -186,6 +187,7 @@ XX(jl_gc_new_weakref_th) \ XX(jl_gc_num) \ XX(jl_gc_pool_alloc) \ + XX(jl_gc_pool_alloc_instrumented) \ XX(jl_gc_queue_multiroot) \ XX(jl_gc_queue_root) \ XX(jl_gc_safepoint) \ diff --git a/src/llvm-final-gc-lowering.cpp b/src/llvm-final-gc-lowering.cpp index 1da37a249fbd2..0a43c52ddfbc4 100644 --- a/src/llvm-final-gc-lowering.cpp +++ b/src/llvm-final-gc-lowering.cpp @@ -205,12 +205,13 @@ Value *FinalLowerGC::lowerQueueGCBinding(CallInst *target, Function &F) Value *FinalLowerGC::lowerGCAllocBytes(CallInst *target, Function &F) { ++GCAllocBytesCount; - assert(target->arg_size() == 2); + assert(target->arg_size() == 3); CallInst *newI; IRBuilder<> builder(target); builder.SetCurrentDebugLocation(target->getDebugLoc()); auto ptls = target->getArgOperand(0); + auto type = target->getArgOperand(2); Attribute derefAttr; if (auto CI = dyn_cast(target->getArgOperand(1))) { @@ -221,19 +222,19 @@ Value *FinalLowerGC::lowerGCAllocBytes(CallInst *target, Function &F) if (offset < 0) { newI = builder.CreateCall( bigAllocFunc, - { ptls, ConstantInt::get(getSizeTy(F.getContext()), sz + sizeof(void*)) }); + { ptls, ConstantInt::get(getSizeTy(F.getContext()), sz + sizeof(void*)), type }); derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), sz + sizeof(void*)); } else { auto pool_offs = ConstantInt::get(Type::getInt32Ty(F.getContext()), offset); auto pool_osize = ConstantInt::get(Type::getInt32Ty(F.getContext()), osize); - newI = 
builder.CreateCall(poolAllocFunc, { ptls, pool_offs, pool_osize }); + newI = builder.CreateCall(poolAllocFunc, { ptls, pool_offs, pool_osize, type }); derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), osize); } } else { auto size = builder.CreateZExtOrTrunc(target->getArgOperand(1), getSizeTy(F.getContext())); size = builder.CreateAdd(size, ConstantInt::get(getSizeTy(F.getContext()), sizeof(void*))); - newI = builder.CreateCall(allocTypedFunc, { ptls, size, ConstantPointerNull::get(Type::getInt8PtrTy(F.getContext())) }); + newI = builder.CreateCall(allocTypedFunc, { ptls, size, type }); derefAttr = Attribute::getWithDereferenceableBytes(F.getContext(), sizeof(void*)); } newI->setAttributes(newI->getCalledFunction()->getAttributes()); diff --git a/src/llvm-late-gc-lowering.cpp b/src/llvm-late-gc-lowering.cpp index eaba9c7b10d98..9c8959ae7874a 100644 --- a/src/llvm-late-gc-lowering.cpp +++ b/src/llvm-late-gc-lowering.cpp @@ -2324,22 +2324,6 @@ bool LateLowerGCFrame::CleanupIR(Function &F, State *S, bool *CFGModified) { IRBuilder<> builder(CI); builder.SetCurrentDebugLocation(CI->getDebugLoc()); - // Create a call to the `julia.gc_alloc_bytes` intrinsic, which is like - // `julia.gc_alloc_obj` except it doesn't set the tag. - auto allocBytesIntrinsic = getOrDeclare(jl_intrinsics::GCAllocBytes); - auto ptlsLoad = get_current_ptls_from_task(builder, CI->getArgOperand(0), tbaa_gcframe); - auto ptls = builder.CreateBitCast(ptlsLoad, Type::getInt8PtrTy(builder.getContext())); - auto newI = builder.CreateCall( - allocBytesIntrinsic, - { - ptls, - builder.CreateIntCast( - CI->getArgOperand(1), - allocBytesIntrinsic->getFunctionType()->getParamType(1), - false) - }); - newI->takeName(CI); - // LLVM alignment/bit check is not happy about addrspacecast and refuse // to remove write barrier because of it. 
// We pretty much only load using `T_size` so try our best to strip @@ -2378,7 +2362,36 @@ bool LateLowerGCFrame::CleanupIR(Function &F, State *S, bool *CFGModified) { builder.CreateAlignmentAssumption(DL, tag, 16); } } - // Set the tag. + + // Create a call to the `julia.gc_alloc_bytes` intrinsic, which is like + // `julia.gc_alloc_obj` except it specializes the call based on the constant + // size of the object to allocate, to save one indirection, and doesn't set + // the type tag. (Note that if the size is not a constant, it will call + // gc_alloc_obj, and will redundantly set the tag.) + auto allocBytesIntrinsic = getOrDeclare(jl_intrinsics::GCAllocBytes); + auto ptlsLoad = get_current_ptls_from_task(builder, CI->getArgOperand(0), tbaa_gcframe); + auto ptls = builder.CreateBitCast(ptlsLoad, Type::getInt8PtrTy(builder.getContext())); + auto newI = builder.CreateCall( + allocBytesIntrinsic, + { + ptls, + builder.CreateIntCast( + CI->getArgOperand(1), + allocBytesIntrinsic->getFunctionType()->getParamType(1), + false), + builder.CreatePtrToInt(tag, T_size), + }); + newI->takeName(CI); + + // Now, finally, set the tag. We do this in IR instead of in the C alloc + // function, to provide possible optimization opportunities. (I think? TBH + // the most recent editor of this code is not entirely clear on why we + // prefer to set the tag in the generated code. Providing optimization + // opportunities is the most likely reason; the tradeoff is slightly + // larger code size and increased compilation time, compiling this + // instruction at every allocation site, rather than once in the C alloc + // function.) 
+ auto &M = *builder.GetInsertBlock()->getModule(); StoreInst *store = builder.CreateAlignedStore( tag, EmitTagPtr(builder, tag_type, newI), Align(sizeof(size_t))); store->setOrdering(AtomicOrdering::Unordered); diff --git a/src/llvm-pass-helpers.cpp b/src/llvm-pass-helpers.cpp index fa3437ffdce48..f589cb5672365 100644 --- a/src/llvm-pass-helpers.cpp +++ b/src/llvm-pass-helpers.cpp @@ -120,6 +120,12 @@ namespace jl_intrinsics { static const char *QUEUE_GC_ROOT_NAME = "julia.queue_gc_root"; static const char *QUEUE_GC_BINDING_NAME = "julia.queue_gc_binding"; + static auto T_size_t(const JuliaPassContext &context) { + return sizeof(size_t) == sizeof(uint32_t) ? + Type::getInt32Ty(context.getLLVMContext()) : + Type::getInt64Ty(context.getLLVMContext()); + } + // Annotates a function with attributes suitable for GC allocation // functions. Specifically, the return value is marked noalias and nonnull. // The allocation size is set to the first argument. @@ -150,9 +156,8 @@ namespace jl_intrinsics { FunctionType::get( context.T_prjlvalue, { Type::getInt8PtrTy(context.getLLVMContext()), - sizeof(size_t) == sizeof(uint32_t) ? 
- Type::getInt32Ty(context.getLLVMContext()) : - Type::getInt64Ty(context.getLLVMContext()) }, + T_size_t(context), + T_size_t(context) }, // type false), Function::ExternalLinkage, GC_ALLOC_BYTES_NAME); @@ -227,12 +232,18 @@ namespace jl_intrinsics { } namespace jl_well_known { - static const char *GC_BIG_ALLOC_NAME = XSTR(jl_gc_big_alloc); - static const char *GC_POOL_ALLOC_NAME = XSTR(jl_gc_pool_alloc); + static const char *GC_BIG_ALLOC_NAME = XSTR(jl_gc_big_alloc_instrumented); + static const char *GC_POOL_ALLOC_NAME = XSTR(jl_gc_pool_alloc_instrumented); static const char *GC_QUEUE_ROOT_NAME = XSTR(jl_gc_queue_root); static const char *GC_QUEUE_BINDING_NAME = XSTR(jl_gc_queue_binding); static const char *GC_ALLOC_TYPED_NAME = XSTR(jl_gc_alloc_typed); + static auto T_size_t(const JuliaPassContext &context) { + return sizeof(size_t) == sizeof(uint32_t) ? + Type::getInt32Ty(context.getLLVMContext()) : + Type::getInt64Ty(context.getLLVMContext()); + } + using jl_intrinsics::addGCAllocAttributes; const WellKnownFunctionDescription GCBigAlloc( @@ -242,9 +253,8 @@ namespace jl_well_known { FunctionType::get( context.T_prjlvalue, { Type::getInt8PtrTy(context.getLLVMContext()), - sizeof(size_t) == sizeof(uint32_t) ? 
- Type::getInt32Ty(context.getLLVMContext()) : - Type::getInt64Ty(context.getLLVMContext()) }, + T_size_t(context), + T_size_t(context) }, false), Function::ExternalLinkage, GC_BIG_ALLOC_NAME); @@ -258,7 +268,7 @@ namespace jl_well_known { auto poolAllocFunc = Function::Create( FunctionType::get( context.T_prjlvalue, - { Type::getInt8PtrTy(context.getLLVMContext()), Type::getInt32Ty(context.getLLVMContext()), Type::getInt32Ty(context.getLLVMContext()) }, + { Type::getInt8PtrTy(context.getLLVMContext()), Type::getInt32Ty(context.getLLVMContext()), Type::getInt32Ty(context.getLLVMContext()), T_size_t(context) }, false), Function::ExternalLinkage, GC_POOL_ALLOC_NAME); @@ -301,10 +311,8 @@ namespace jl_well_known { FunctionType::get( context.T_prjlvalue, { Type::getInt8PtrTy(context.getLLVMContext()), - sizeof(size_t) == sizeof(uint32_t) ? - Type::getInt32Ty(context.getLLVMContext()) : - Type::getInt64Ty(context.getLLVMContext()), - Type::getInt8PtrTy(context.getLLVMContext()) }, + T_size_t(context), + T_size_t(context) }, // type false), Function::ExternalLinkage, GC_ALLOC_TYPED_NAME); diff --git a/stdlib/Profile/test/allocs.jl b/stdlib/Profile/test/allocs.jl index c2ec7d2f6cb54..ae0cbab945f01 100644 --- a/stdlib/Profile/test/allocs.jl +++ b/stdlib/Profile/test/allocs.jl @@ -121,3 +121,34 @@ end @test length(prof.allocs) >= 1 @test length([a for a in prof.allocs if a.type == String]) >= 1 end + +@testset "alloc profiler catches allocs from codegen" begin + @eval begin + struct MyType x::Int; y::Int end + Base.:(+)(n::Number, x::MyType) = n + x.x + x.y + foo(a, x) = a[1] + x + wrapper(a) = foo(a, MyType(0,1)) + end + a = Any[1,2,3] + # warmup + wrapper(a) + + @eval Allocs.@profile sample_rate=1 wrapper($a) + + prof = Allocs.fetch() + Allocs.clear() + + @test length(prof.allocs) >= 1 + @test length([a for a in prof.allocs if a.type == MyType]) >= 1 +end + +@testset "alloc profiler catches allocs from buffer resize" begin + a = Int[] + Allocs.@profile sample_rate=1 for _ 
in 1:100; push!(a, 1); end + + prof = Allocs.fetch() + Allocs.clear() + + @test length(prof.allocs) >= 1 + @test length([a for a in prof.allocs if a.type == Profile.Allocs.BufferType]) >= 1 +end diff --git a/test/llvmpasses/alloc-opt-gcframe.jl b/test/llvmpasses/alloc-opt-gcframe.jl index 3b5fc3a51a606..ad4be12be0840 100644 --- a/test/llvmpasses/alloc-opt-gcframe.jl +++ b/test/llvmpasses/alloc-opt-gcframe.jl @@ -14,11 +14,11 @@ target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128" # CHECK-LABEL: @return_obj # CHECK-NOT: @julia.gc_alloc_obj # CHECK: %current_task = getelementptr inbounds {}*, {}** %gcstack, i64 -12 -# CHECK-NEXT: [[ptls_field:%.*]] = getelementptr inbounds {}*, {}** %current_task, i64 15 +# CHECK: [[ptls_field:%.*]] = getelementptr inbounds {}*, {}** %current_task, i64 15 # CHECK-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 # CHECK-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** # CHECK-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -# CHECK-NEXT: %v = call noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc(i8* [[ptls_i8]], i32 [[SIZE_T:[0-9]+]], i32 16) +# CHECK-NEXT: %v = call noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc_instrumented(i8* [[ptls_i8]], i32 [[SIZE_T:[0-9]+]], i32 16, i64 {{.*}} @tag {{.*}}) # CHECK: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* {{.*}} unordered, align 8, !tbaa !4 println(""" define {} addrspace(10)* @return_obj() { @@ -260,8 +260,8 @@ L3: """) # CHECK-LABEL: }{{$}} -# CHECK: declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc(i8*, -# CHECK: declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc(i8*, +# CHECK: declare noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc_instrumented(i8*, +# CHECK: declare noalias nonnull {} addrspace(10)* @ijl_gc_big_alloc_instrumented(i8*, println(""" declare void @external_function() declare {}*** @julia.get_pgcstack() diff --git a/test/llvmpasses/final-lower-gc.ll 
b/test/llvmpasses/final-lower-gc.ll index 4af43f748020b..840c911d0874f 100644 --- a/test/llvmpasses/final-lower-gc.ll +++ b/test/llvmpasses/final-lower-gc.ll @@ -13,7 +13,7 @@ declare noalias nonnull {} addrspace(10)** @julia.new_gc_frame(i32) declare void @julia.push_gc_frame({} addrspace(10)**, i32) declare {} addrspace(10)** @julia.get_gc_frame_slot({} addrspace(10)**, i32) declare void @julia.pop_gc_frame({} addrspace(10)**) -declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_bytes(i8*, i64) #0 +declare noalias nonnull {} addrspace(10)* @julia.gc_alloc_bytes(i8*, i64, i64) #0 attributes #0 = { allocsize(1) } @@ -59,8 +59,8 @@ top: %pgcstack = call {}*** @julia.get_pgcstack() %ptls = call {}*** @julia.ptls_states() %ptls_i8 = bitcast {}*** %ptls to i8* -; CHECK: %v = call noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc - %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 8) +; CHECK: %v = call noalias nonnull {} addrspace(10)* @ijl_gc_pool_alloc_instrumented + %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 8, i64 12341234) %0 = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* %1 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %0, i64 -1 store {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* %1, align 8, !tbaa !0 @@ -74,8 +74,8 @@ top: %ptls = call {}*** @julia.ptls_states() %ptls_i8 = bitcast {}*** %ptls to i8* ; CHECK: %0 = add i64 %size, 8 -; CHECK: %v = call noalias nonnull {} addrspace(10)* @ijl_gc_alloc_typed(i8* %ptls_i8, i64 %0, i8* null) - %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 %size) +; CHECK: %v = call noalias nonnull {} addrspace(10)* @ijl_gc_alloc_typed(i8* %ptls_i8, i64 %0, i64 12341234) + %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* %ptls_i8, i64 %size, i64 12341234) %0 = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* %1 = getelementptr {} addrspace(10)*, {} addrspace(10)* addrspace(10)* %0, i64 -1 
store {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* %1, align 8, !tbaa !0 diff --git a/test/llvmpasses/late-lower-gc-addrspaces.ll b/test/llvmpasses/late-lower-gc-addrspaces.ll index 7497febf1e846..7bb8c76b07e63 100644 --- a/test/llvmpasses/late-lower-gc-addrspaces.ll +++ b/test/llvmpasses/late-lower-gc-addrspaces.ll @@ -49,7 +49,7 @@ top: ; CHECK-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; CHECK-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; CHECK-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; CHECK-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; CHECK-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; CHECK-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 @@ -74,7 +74,7 @@ top: ; CHECK-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; CHECK-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; CHECK-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; CHECK-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; CHECK-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; CHECK-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 diff --git 
a/test/llvmpasses/late-lower-gc.ll b/test/llvmpasses/late-lower-gc.ll index 65a67c78d7810..77599290f8ef7 100644 --- a/test/llvmpasses/late-lower-gc.ll +++ b/test/llvmpasses/late-lower-gc.ll @@ -46,7 +46,7 @@ top: ; CHECK-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; CHECK-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; CHECK-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; CHECK-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; CHECK-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; CHECK-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 @@ -71,7 +71,7 @@ top: ; CHECK-NEXT: [[ptls_load:%.*]] = load {}*, {}** [[ptls_field]], align 8, !tbaa !0 ; CHECK-NEXT: [[ppjl_ptls:%.*]] = bitcast {}* [[ptls_load]] to {}** ; CHECK-NEXT: [[ptls_i8:%.*]] = bitcast {}** [[ppjl_ptls]] to i8* -; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8) +; CHECK-NEXT: %v = call {} addrspace(10)* @julia.gc_alloc_bytes(i8* [[ptls_i8]], [[SIZE_T:i.[0-9]+]] 8, i64 {{.*}} @tag {{.*}}) ; CHECK-NEXT: [[V2:%.*]] = bitcast {} addrspace(10)* %v to {} addrspace(10)* addrspace(10)* ; CHECK-NEXT: [[V_HEADROOM:%.*]] = getelementptr inbounds {} addrspace(10)*, {} addrspace(10)* addrspace(10)* [[V2]], i64 -1 ; CHECK-NEXT: store atomic {} addrspace(10)* @tag, {} addrspace(10)* addrspace(10)* [[V_HEADROOM]] unordered, align 8, !tbaa !4 @@ -154,7 +154,7 @@ define void @decayar([2 x {} addrspace(10)* addrspace(11)*] %ar) { %l0 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %e0 %e1 = 
extractvalue [2 x {} addrspace(10)* addrspace(11)*] %ar, 1 %l1 = load {} addrspace(10)*, {} addrspace(10)* addrspace(11)* %e1 - %r = call i32 @callee_root({} addrspace(10)* %l0, {} addrspace(10)* %l1) + %r = call i32 @callee_root({} addrspace(10)* %l0, {} addrspace(10)* %l1) ret void } diff --git a/test/llvmpasses/pipeline-o0.jl b/test/llvmpasses/pipeline-o0.jl index ff9cd0aace704..3cbd5a9174cc2 100644 --- a/test/llvmpasses/pipeline-o0.jl +++ b/test/llvmpasses/pipeline-o0.jl @@ -9,7 +9,7 @@ include(joinpath("..", "testhelpers", "llvmpasses.jl")) # CHECK-NOT: julia.get_pgcstack # CHECK: asm # CHECK-NOT: julia.gc_alloc_obj -# CHECK: ijl_gc_pool_alloc +# CHECK: ijl_gc_pool_alloc_instrumented # COM: we want something vaguely along the lines of asm load from the fs register -> allocate bytes function simple() Ref(0)