systematic, efficient approach to string construction #3

StefanKarpinski · 2011-04-27T04:13:04Z

The current approach uses polymorphism to make RopeString objects. This is pretty inefficient for the typical small string use-case. To efficiently construct a C-style string in the current framework, one makes the current output stream a memio object and then prints to it. The general pattern I've used is to write a print_whatever function and then wrap it in a whatever function that returns a string using print_to_string. Should we stick with this pattern? It has the advantage of allowing the printing version to be very efficient, but it's kind of awkward to write. Should we figure out a different pattern? Something like C#'s StringBuilder pattern?

Perhaps it suffices to make strcat check the size and encodings of its arguments and use print_to_string approach to concatenate them into a copied string where appropriate — namely when the arguments are of compatible encodings (e.g. any mixture of ASCIIString and UTF8String), and if concatenated they would be below some size threshold. For larger strings, we should continue to use the RopeString approach. Also, string slices should copy their contents as well unless the resulting string is above the "large string" threshold, in which case, they can continue to use the current SubString with the known issue that this pins the superstring in memory.

The text was updated successfully, but these errors were encountered:

JeffBezanson · 2011-05-03T05:35:45Z

Changing types based on string lengths makes it too hard to infer the types of these rather common operations. Instead, we should have the option to wrap a string as BigString(s) if s might be large, and BigString can use the memory-saving versions of these operations.
There's not much difference between print_to_string and StringBuilder. with_output_stream can be skipped in some cases by using write() with an explicit destination argument. It's also nice to be able to write output directly to an I/O endpoint without building a temporary string first.
Simple string building cases can also be handled by pushing characters into an array.

StefanKarpinski · 2011-05-03T16:38:40Z

Makes sense. I can make the BigString change easily.

Is this an argument for continuing to implement core string building functionality by writing the printing version first and then defining the string creating version by applying print_to_string to the printing version?

JeffBezanson · 2011-05-03T17:40:54Z

Somewhat, but multiple approaches can be used. For example, if you're just
combining strings and characters you can use write() instead of going
through print_to_string. We might want to provide some nicer names for
memio, takebuf_string, and write, and make it look more like StringBuilder.
Or for something like strcat I would determine the size of the result,
allocate it once, and use memcpy.

The trouble is that if I do something like

write(io, strcat(a,b,c))

what you ideally want is to write each string without forming the temporary.
Even if strcat is written using an i/o buffer you don't get that
automatically here. I might have to say

strcat_to(io, a, b, c)

but that's not a very nice interface. If a, b, or c is a BigString though,
the strcat is done lazily and you get the desired behavior of writing all
the pieces with no copying. This seems to convince me that there's no
advantage to writing all the string functions in terms of printing. So do
whatever's simplest/fastest/convenient, and let BigString handle other
concerns. How's that sound?

print_escaped is a bit different since we know that a main use of it is
doing output. So strcat etc. doesn't necessarily need to imitate it.

On Tue, May 3, 2011 at 12:38 PM, StefanKarpinski <
reply@reply.github.com>wrote:

Makes sense. I can make the BigString change easily.

Is this an argument for continuing to implement core string building
functionality by writing the printing version first and then defining the
string creating version by applying print_to_string to the printing version?

Reply to this email directly or view it on GitHub:
#3 (comment)

ViralBShah · 2011-07-09T18:11:45Z

This seems like a 2.0 thing.

StefanKarpinski · 2011-07-09T18:25:39Z

We're actually pretty good on this at this point. All strcat and string ref (substring) operations on ASCIIString and UTF8String objects use memcpy now, so they're fast and they don't create exotic string objects (RopeString, SubString, etc.). Repeating a string does create aRepString` object, but I think that's probably acceptable. I could make a copying implementation of that rather easily.

If someone wants to use a StringBuilder pattern, they can write the printing version and then use print_to_string on it. I feel like that's a reasonable approach if one is worried about strcat efficiency, with the added bonus of providing a version of the same functionality that can print without having to build a string at all.

I think this issue is not fully addressed, but well enough for v1.0 for now. Will reassign to v2.0.

JeffBezanson · 2011-07-09T19:06:24Z

Can I replace memcpy(a) with copy(a)?

StefanKarpinski · 2011-07-09T19:09:08Z

Is copy(a::Array{Uint8,1}) as efficient as memcpy is?

JeffBezanson · 2011-07-09T19:13:17Z

It should be now that we changed copy_to to use memcpy for arrays where possible.

StefanKarpinski · 2011-07-09T19:20:35Z

We can get rid of memcpy entirely then. I'll do it.

ViralBShah · 2011-07-09T19:52:57Z

We also need to experiment with some sizes at which memcpy is faster. It is actually slower for small arrays. Copy_to should have these smarts.

On 10-Jul-2011, at 12:43 AM, JeffBezansonreply@reply.github.com wrote:

It should be now that we changed copy_to to use memcpy for arrays where possible.

Reply to this email directly or view it on GitHub:
#3 (comment)

Rebase to a5b5d64

This commit improves SROA pass by extending the `unswitchtupleunion` optimization to handle the general parametric types, e.g.: ```julia julia> struct A{T} x::T end; julia> function foo(a1, a2, c) t = c ? A(a1) : A(a2) return getfield(t, :x) end; julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA")) ``` > Before ``` 2 1 ─ goto #3 if not _4 │ 2 ─ %2 = %new(A{Int64}, _2)::A{Int64} │╻ A └── goto #4 │ 3 ─ %4 = %new(A{Float64}, _3)::A{Float64} │╻ A 4 ┄ %5 = φ (#2 => %2, #3 => %4)::Union{A{Float64}, A{Int64}} │ 3 │ %6 = Main.getfield(%5, :x)::Union{Float64, Int64} │ └── return %6 │ => Union{Float64, Int64} ``` > After ``` julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA")) 2 1 ─ goto #3 if not _4 │ 2 ─ nothing::A{Int64} │╻ A └── goto #4 │ 3 ─ nothing::A{Float64} │╻ A 4 ┄ %8 = φ (#2 => _2, #3 => _3)::Union{Float64, Int64} │ │ nothing::Union{A{Float64}, A{Int64}} 3 │ %6 = %8::Union{Float64, Int64} │ └── return %6 │ => Union{Float64, Int64} ```

Part of #3

Together with previous commits, fixes #3

Properly resolve the library symbol in :foreigncall (fixes #3)

Fixes: #33147 Replaces/Closes: #40445 The difference here, compared to past implementations, is that we use the zero-cost `isiterable` check on every intermediate step, instead of wrapping the call in a try/catch and then trying to re-approximate the `isiterable` afterwards. Some samples: ```julia julia> Dict(i for i in 1:3) ERROR: ArgumentError: AbstractDict(kv): kv needs to be an iterator of 2-tuples or pairs Stacktrace: [1] _throw_dict_kv_error() @ Base ./dict.jl:118 [2] grow_to! @ ./dict.jl:132 [inlined] [3] dict_with_eltype @ ./abstractdict.jl:592 [inlined] [4] Dict(kv::Base.Generator{UnitRange{Int64}, typeof(identity)}) @ Base ./dict.jl:120 [5] top-level scope @ REPL[1]:1 julia> Dict(i => error("$i") for i in 1:3) ERROR: 1 Stacktrace: [1] error(s::String) @ Base ./error.jl:35 [2] (::var"#3#4")(i::Int64) @ Main ./none:0 [3] iterate @ ./generator.jl:48 [inlined] [4] grow_to! @ ./dict.jl:124 [inlined] [5] dict_with_eltype @ ./abstractdict.jl:592 [inlined] [6] Dict(kv::Base.Generator{UnitRange{Int64}, var"#3#4"}) @ Base ./dict.jl:120 [7] top-level scope @ REPL[2]:1 ``` The other unrelated change here is that `dest = empty(dest, typeof(k), typeof(v))` is made conditional, so we do not unconditionally construct an empty Dict in order to discard it and allocate an exact duplicate of it, but only do so if inference wasn't precise originally. Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>

@inbounds

For example, we seek to eliminate the gc frame from this function, as observed here: ```julia julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end ; Function Signature: var"#3"(Base.BitSet) ; @ REPL[1]:1 within `#3` define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 { top: call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22 %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2 %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23 %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2 %safepoint = load ptr, ptr %0, align 8, !tbaa !27 fence syncscope("singlethread") seq_cst %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22 fence syncscope("singlethread") seq_cst ; ┌ @ Base.jl:49 within `getproperty` %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42 ; └ ; ┌ @ gcutils.jl:253 within `safepoint` %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23 %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43 %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27 fence syncscope("singlethread") seq_cst, !dbg !43 %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43 fence syncscope("singlethread") seq_cst, !dbg !43 ; └ ; ┌ @ essentials.jl:892 within `getindex` %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53 %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58 ret i64 %5, !dbg !46 ; └ } ```

@inbounds

For example, we seek to eliminate the gc frame from this function, as observed here: ```julia julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end ; Function Signature: var"#3"(Base.BitSet) ; @ REPL[1]:1 within `#3` define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 { top: call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22 %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2 %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23 %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2 %safepoint = load ptr, ptr %0, align 8, !tbaa !27 fence syncscope("singlethread") seq_cst %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22 fence syncscope("singlethread") seq_cst ; ┌ @ Base.jl:49 within `getproperty` %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42 ; └ ; ┌ @ gcutils.jl:253 within `safepoint` %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23 %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43 %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27 fence syncscope("singlethread") seq_cst, !dbg !43 %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43 fence syncscope("singlethread") seq_cst, !dbg !43 ; └ ; ┌ @ essentials.jl:892 within `getindex` %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53 %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58 ret i64 %5, !dbg !46 ; └ } ```

@inbounds

For example, we seek to eliminate the gc frame from this function, as observed here: ```julia julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end ; Function Signature: var"#3"(Base.BitSet) ; @ REPL[1]:1 within `#3` define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 { top: call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22 %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2 %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23 %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2 %safepoint = load ptr, ptr %0, align 8, !tbaa !27 fence syncscope("singlethread") seq_cst %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22 fence syncscope("singlethread") seq_cst ; ┌ @ Base.jl:49 within `getproperty` %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42 ; └ ; ┌ @ gcutils.jl:253 within `safepoint` %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23 %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43 %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27 fence syncscope("singlethread") seq_cst, !dbg !43 %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43 fence syncscope("singlethread") seq_cst, !dbg !43 ; └ ; ┌ @ essentials.jl:892 within `getindex` %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53 %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58 ret i64 %5, !dbg !46 ; └ } ```

@inbounds

For example, we seek to eliminate the gc frame from this function, as observed here: ```julia julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end ; Function Signature: var"https://github.com/JuliaLang/julia/issues/3"(Base.BitSet) ; @ REPL[1]:1 within `https://github.com/JuliaLang/julia/issues/3` define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 { top: call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22 %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2 %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23 %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2 %safepoint = load ptr, ptr %0, align 8, !tbaa !27 fence syncscope("singlethread") seq_cst %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22 fence syncscope("singlethread") seq_cst ; ┌ @ Base.jl:49 within `getproperty` %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42 ; └ ; ┌ @ gcutils.jl:253 within `safepoint` %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23 %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43 %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27 fence syncscope("singlethread") seq_cst, !dbg !43 %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43 fence syncscope("singlethread") seq_cst, !dbg !43 ; └ ; ┌ @ essentials.jl:892 within `getindex` %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53 %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58 ret i64 %5, !dbg !46 ; └ } ```

@inbounds

For example, we seek to eliminate the gc frame from this function, as observed here: ```julia julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end ; Function Signature: var"JuliaLang#3"(Base.BitSet) ; @ REPL[1]:1 within `JuliaLang#3` define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 { top: call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22 %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2 %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23 %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2 %safepoint = load ptr, ptr %0, align 8, !tbaa !27 fence syncscope("singlethread") seq_cst %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22 fence syncscope("singlethread") seq_cst ; ┌ @ Base.jl:49 within `getproperty` %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42 ; └ ; ┌ @ gcutils.jl:253 within `safepoint` %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23 %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43 %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27 fence syncscope("singlethread") seq_cst, !dbg !43 %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43 fence syncscope("singlethread") seq_cst, !dbg !43 ; └ ; ┌ @ essentials.jl:892 within `getindex` %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53 %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58 ret i64 %5, !dbg !46 ; └ } ```

Add a README

The functions `toms`, `tons`, and `days` uses `sum` over a vector of `Period`s to obtain the conversion of a `CompoundPeriod`. However, the compiler cannot infer the return type because those functions can return either `Int` or `Float` depending on the type of the `Period`. This PR forces the result of those functions to be `Float64`, fixing the type stability. Before this PR we had: ```julia julia> using Dates julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1) 1 year, 1 minute, 1 second julia> @code_warntype Dates.tons(p) MethodInstance for Dates.tons(::Dates.CompoundPeriod) from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458 Arguments #self#::Core.Const(Dates.tons) c::Dates.CompoundPeriod Body::Any 1 ─ %1 = Dates.isempty::Core.Const(isempty) │ %2 = Base.getproperty(c, :periods)::Vector{Period} │ %3 = (%1)(%2)::Bool └── goto #3 if not %3 2 ─ return 0.0 3 ─ %6 = Dates.Float64::Core.Const(Float64) │ %7 = Dates.sum::Core.Const(sum) │ %8 = Dates.tons::Core.Const(Dates.tons) │ %9 = Base.getproperty(c, :periods)::Vector{Period} │ %10 = (%7)(%8, %9)::Any │ %11 = (%6)(%10)::Any └── return %11 julia> @code_warntype Dates.toms(p) MethodInstance for Dates.toms(::Dates.CompoundPeriod) from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454 Arguments #self#::Core.Const(Dates.toms) c::Dates.CompoundPeriod Body::Any 1 ─ %1 = Dates.isempty::Core.Const(isempty) │ %2 = Base.getproperty(c, :periods)::Vector{Period} │ %3 = (%1)(%2)::Bool └── goto #3 if not %3 2 ─ return 0.0 3 ─ %6 = Dates.Float64::Core.Const(Float64) │ %7 = Dates.sum::Core.Const(sum) │ %8 = Dates.toms::Core.Const(Dates.toms) │ %9 = Base.getproperty(c, :periods)::Vector{Period} │ %10 = (%7)(%8, %9)::Any │ %11 = (%6)(%10)::Any └── return %11 julia> @code_warntype Dates.days(p) MethodInstance for Dates.days(::Dates.CompoundPeriod) from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468 Arguments #self#::Core.Const(Dates.days) c::Dates.CompoundPeriod Body::Any 1 ─ %1 = Dates.isempty::Core.Const(isempty) │ %2 = Base.getproperty(c, :periods)::Vector{Period} │ %3 = (%1)(%2)::Bool └── goto #3 if not %3 2 ─ return 0.0 3 ─ %6 = Dates.Float64::Core.Const(Float64) │ %7 = Dates.sum::Core.Const(sum) │ %8 = Dates.days::Core.Const(Dates.days) │ %9 = Base.getproperty(c, :periods)::Vector{Period} │ %10 = (%7)(%8, %9)::Any │ %11 = (%6)(%10)::Any └── return %11 ``` After this PR we have: ```julia julia> using Dates julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1) 1 year, 1 minute, 1 second julia> @code_warntype Dates.tons(p) MethodInstance for Dates.tons(::Dates.CompoundPeriod) from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458 Arguments #self#::Core.Const(Dates.tons) c::Dates.CompoundPeriod Body::Float64 1 ─ %1 = Dates.isempty::Core.Const(isempty) │ %2 = Base.getproperty(c, :periods)::Vector{Period} │ %3 = (%1)(%2)::Bool └── goto #3 if not %3 2 ─ return 0.0 3 ─ %6 = Dates.Float64::Core.Const(Float64) │ %7 = Dates.sum::Core.Const(sum) │ %8 = Dates.tons::Core.Const(Dates.tons) │ %9 = Base.getproperty(c, :periods)::Vector{Period} │ %10 = (%7)(%8, %9)::Any │ %11 = (%6)(%10)::Any │ %12 = Dates.Float64::Core.Const(Float64) │ %13 = Core.typeassert(%11, %12)::Float64 └── return %13 julia> @code_warntype Dates.toms(p) MethodInstance for Dates.toms(::Dates.CompoundPeriod) from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454 Arguments #self#::Core.Const(Dates.toms) c::Dates.CompoundPeriod Body::Float64 1 ─ %1 = Dates.isempty::Core.Const(isempty) │ %2 = Base.getproperty(c, :periods)::Vector{Period} │ %3 = (%1)(%2)::Bool └── goto #3 if not %3 2 ─ return 0.0 3 ─ %6 = Dates.Float64::Core.Const(Float64) │ %7 = Dates.sum::Core.Const(sum) │ %8 = Dates.toms::Core.Const(Dates.toms) │ %9 = Base.getproperty(c, :periods)::Vector{Period} │ %10 = (%7)(%8, %9)::Any │ %11 = (%6)(%10)::Any │ %12 = Dates.Float64::Core.Const(Float64) │ %13 = Core.typeassert(%11, %12)::Float64 └── return %13 julia> @code_warntype Dates.days(p) MethodInstance for Dates.days(::Dates.CompoundPeriod) from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468 Arguments #self#::Core.Const(Dates.days) c::Dates.CompoundPeriod Body::Float64 1 ─ %1 = Dates.isempty::Core.Const(isempty) │ %2 = Base.getproperty(c, :periods)::Vector{Period} │ %3 = (%1)(%2)::Bool └── goto #3 if not %3 2 ─ return 0.0 3 ─ %6 = Dates.Float64::Core.Const(Float64) │ %7 = Dates.sum::Core.Const(sum) │ %8 = Dates.days::Core.Const(Dates.days) │ %9 = Base.getproperty(c, :periods)::Vector{Period} │ %10 = (%7)(%8, %9)::Any │ %11 = (%6)(%10)::Any │ %12 = Dates.Float64::Core.Const(Float64) │ %13 = Core.typeassert(%11, %12)::Float64 └── return %13 ```

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

@noinline

E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.

ghost assigned StefanKarpinski Apr 27, 2011

JeffBezanson closed this as completed Oct 12, 2011

JeffBezanson mentioned this issue Dec 17, 2011

separate LU economy mode #272

Closed

ViralBShah mentioned this issue Mar 16, 2012

WIP: libuv rewrite and Windows port #521

Merged

vtjnash mentioned this issue Mar 17, 2012

build fails at sys.ji #600

Closed

ViralBShah mentioned this issue Mar 18, 2012

Building julia with clang: tests fail #602

Closed

StefanKarpinski mentioned this issue Apr 14, 2012

TSV parsing is extremely slow #661

Closed

timholy mentioned this issue May 3, 2012

Dispatch method selection (RangeIndex... more specific than Integer?) #795

Closed

kk49 mentioned this issue May 9, 2012

finfer causes segmentation fault #816

Closed

forkandwait mentioned this issue May 12, 2012

Failed build on FreeBSD #829

Closed

6e441f9c mentioned this issue May 30, 2012

Julia segfaults while loading stage0.jl #901

Closed

spott mentioned this issue Jul 16, 2012

problem bootstrapping the sys0.ji file #1052

Closed

dioptre mentioned this issue Jul 30, 2012

error during bootstrap on OS X #1089

Closed

fundamental mentioned this issue Apr 11, 2013

Regression in reload() Results in REPL Hanging #2829

Closed

railskipl mentioned this issue May 3, 2013

Tab-Expansion of Base in REPL causes SIGABRT #2064

Closed

staticfloat mentioned this issue Jan 18, 2014

Build failure on Fedora #5422

Closed

cmundi mentioned this issue Mar 18, 2014

RFC: Windows CI #6028

Merged

burrowsa pushed a commit to burrowsa/julia that referenced this issue Mar 24, 2014

Merge pull request JuliaLang#3 from JuliaLang/master

5915346

Rebase to a5b5d64

SimonDanisch mentioned this issue Apr 11, 2014

inv(Matrix{Float64}) makes julia exit without any error #6504

Closed

svaksha mentioned this issue May 3, 2014

[openblas] make testall - error in spawn.jl #6728

Closed

rekabrnalla mentioned this issue Oct 4, 2023

Base.@ccallable with a docstring attached to the function errors #51586

Closed

Keno added a commit that referenced this issue Oct 9, 2023

Support for local frame exceptions

9ab250f

Part of #3

Keno added a commit that referenced this issue Oct 9, 2023

Fix cross-frame exceptions

8d69e37

Together with previous commits, fixes #3

Keno added a commit that referenced this issue Oct 9, 2023

Fix #3

8b45785

Keno pushed a commit that referenced this issue Oct 9, 2023

Properly resolve the library symbol in :foreigncall (fixes #3)

dd5cc4c

Keno pushed a commit that referenced this issue Oct 9, 2023

Merge pull request #4 from JuliaDebug/teh/fix_3

e9c1bc4

Properly resolve the library symbol in :foreigncall (fixes #3)

vchuravy mentioned this issue Dec 2, 2023

GC error on 1.10 #52363

Closed

Andrew-Milestone mentioned this issue Apr 28, 2024

Error adding NeuralOperators in Julia v1.10.2 #54292

Closed

Keno pushed a commit that referenced this issue Jun 5, 2024

Merge pull request #3 from jpsamaroo/jps/readme

281b9e1

Add a README

gbaraldi mentioned this issue Jul 2, 2024

Bump LLVM to v18 #54848

Merged

aplavin mentioned this issue Jul 28, 2024

Add meaningful names to AnnotatedString annotations to make their usage understandable #55249

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

systematic, efficient approach to string construction #3

systematic, efficient approach to string construction #3

StefanKarpinski commented Apr 27, 2011

JeffBezanson commented May 3, 2011

StefanKarpinski commented May 3, 2011

JeffBezanson commented May 3, 2011

ViralBShah commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

JeffBezanson commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

JeffBezanson commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

ViralBShah commented Jul 9, 2011

systematic, efficient approach to string construction #3

systematic, efficient approach to string construction #3

Comments

StefanKarpinski commented Apr 27, 2011

JeffBezanson commented May 3, 2011

StefanKarpinski commented May 3, 2011

JeffBezanson commented May 3, 2011

ViralBShah commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

JeffBezanson commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

JeffBezanson commented Jul 9, 2011

StefanKarpinski commented Jul 9, 2011

ViralBShah commented Jul 9, 2011