systematic, efficient approach to string construction #3
Comments
Changing types based on string lengths makes it too hard to infer the types of these rather common operations. Instead, we should have the option to wrap a string as BigString(s) if s might be large, and BigString can use the memory-saving versions of these operations.
Makes sense. I can make the BigString change easily. Is this an argument for continuing to implement core string building functionality by writing the printing version first and then defining the string creating version by applying print_to_string to the printing version?
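The "printing version first" pattern asked about above can be sketched as follows. This is only an illustration in current Julia: it uses `sprint` in the role the thread gives to `print_to_string`, and `print_joined` and `joined_string` are hypothetical names, not functions from the issue.

```julia
# Printing version: writes the items straight to any IO stream.
# `print_joined` is a hypothetical name used only for this sketch.
function print_joined(io::IO, items, sep::AbstractString = ", ")
    first_item = true
    for item in items
        first_item || print(io, sep)   # separator before every item but the first
        print(io, item)
        first_item = false
    end
    return nothing
end

# String-creating version, defined by capturing the printing version's output.
# `sprint(f, args...)` calls `f(io, args...)` on a buffer and returns a String.
joined_string(items, sep::AbstractString = ", ") = sprint(print_joined, items, sep)
```

With this layout only the printing version has to be written by hand; callers who already have a stream can skip the intermediate string entirely.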
Somewhat, but multiple approaches can be used. For example, if you're just The trouble is that if I do something like write(io, strcat(a,b,c)) what you ideally want is to write each string without forming the temporary. strcat_to(io, a, b, c) but that's not a very nice interface. If a, b, or c is a BigString though, print_escaped is a bit different since we know that a main use of it is
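To make the "write each string without forming the temporary" point concrete, here is a minimal sketch in current Julia. `write_pieces` is a hypothetical helper standing in for the `strcat_to(io, a, b, c)` idea; it is not the interface the thread settled on.

```julia
# Write each piece directly to the stream instead of concatenating first.
# Returns the total number of bytes written, like `write` does.
function write_pieces(io::IO, pieces::AbstractString...)
    nbytes = 0
    for piece in pieces
        nbytes += write(io, piece)   # no intermediate concatenated string
    end
    return nbytes
end

# For comparison: this form builds the temporary string first, then writes it.
write_concatenated(io::IO, pieces::AbstractString...) = write(io, string(pieces...))
```

Both functions put the same bytes on the stream; the difference is only whether a concatenated copy is allocated along the way.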
This seems like a 2.0 thing.
We're actually pretty good on this at this point. All If someone wants to use a I think this issue is not fully addressed, but well enough for v1.0 for now. Will reassign to v2.0.
Can I replace
Is
It should be now that we changed
We can get rid of
We also need to experiment with some sizes at which memcpy is faster. It is actually slower for small arrays. Copy_to should have these smarts.
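A size-dependent copy of the kind suggested above might look like the sketch below. The threshold value is an assumption picked purely for illustration (the comment says it still needs to be measured), and `copy_adaptive!` is a hypothetical name rather than the `copy_to` from the thread.

```julia
# Hypothetical cutoff; the right value would have to come from benchmarks.
const SMALL_COPY_THRESHOLD = 16

# Copy `src` into the front of `dest`, choosing the strategy by size.
function copy_adaptive!(dest::Vector{T}, src::Vector{T}) where {T}
    n = length(src)
    length(dest) >= n || throw(ArgumentError("destination is too short"))
    if n <= SMALL_COPY_THRESHOLD
        # Small arrays: a plain element loop avoids the bulk-copy call overhead.
        @inbounds for i in 1:n
            dest[i] = src[i]
        end
    else
        # Larger arrays: defer to the bulk copy provided by Base.
        copyto!(dest, 1, src, 1, n)
    end
    return dest
end
```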
Rebase to a5b5d64
This commit improves the SROA pass by extending the `unswitchtupleunion` optimization to handle general parametric types, e.g.:
```julia
julia> struct A{T}
           x::T
       end;

julia> function foo(a1, a2, c)
           t = c ? A(a1) : A(a2)
           return getfield(t, :x)
       end;

julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
```
> Before
```
2 1 ─      goto #3 if not _4                                     │
  2 ─ %2 = %new(A{Int64}, _2)::A{Int64}                          │╻ A
  └──      goto #4                                               │
  3 ─ %4 = %new(A{Float64}, _3)::A{Float64}                      │╻ A
  4 ┄ %5 = φ (#2 => %2, #3 => %4)::Union{A{Float64}, A{Int64}}   │
3 │   %6 = Main.getfield(%5, :x)::Union{Float64, Int64}          │
  └──      return %6                                             │
   => Union{Float64, Int64}
```
> After
```
julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
2 1 ─      goto #3 if not _4                                     │
  2 ─      nothing::A{Int64}                                     │╻ A
  └──      goto #4                                               │
  3 ─      nothing::A{Float64}                                   │╻ A
  4 ┄ %8 = φ (#2 => _2, #3 => _3)::Union{Float64, Int64}         │
  │        nothing::Union{A{Float64}, A{Int64}}
3 │   %6 = %8::Union{Float64, Int64}                             │
  └──      return %6                                             │
   => Union{Float64, Int64}
```
Properly resolve the library symbol in :foreigncall (fixes #3)
Fixes: #33147
Replaces/Closes: #40445

The difference here, compared to past implementations, is that we use the zero-cost `isiterable` check on every intermediate step, instead of wrapping the call in a try/catch and then trying to re-approximate the `isiterable` afterwards.

Some samples:
```julia
julia> Dict(i for i in 1:3)
ERROR: ArgumentError: AbstractDict(kv): kv needs to be an iterator of 2-tuples or pairs
Stacktrace:
 [1] _throw_dict_kv_error()
   @ Base ./dict.jl:118
 [2] grow_to!
   @ ./dict.jl:132 [inlined]
 [3] dict_with_eltype
   @ ./abstractdict.jl:592 [inlined]
 [4] Dict(kv::Base.Generator{UnitRange{Int64}, typeof(identity)})
   @ Base ./dict.jl:120
 [5] top-level scope
   @ REPL[1]:1

julia> Dict(i => error("$i") for i in 1:3)
ERROR: 1
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] (::var"#3#4")(i::Int64)
   @ Main ./none:0
 [3] iterate
   @ ./generator.jl:48 [inlined]
 [4] grow_to!
   @ ./dict.jl:124 [inlined]
 [5] dict_with_eltype
   @ ./abstractdict.jl:592 [inlined]
 [6] Dict(kv::Base.Generator{UnitRange{Int64}, var"#3#4"})
   @ Base ./dict.jl:120
 [7] top-level scope
   @ REPL[2]:1
```

The other unrelated change here is that `dest = empty(dest, typeof(k), typeof(v))` is made conditional, so we do not unconditionally construct an empty Dict in order to discard it and allocate an exact duplicate of it, but only do so if inference wasn't precise originally.

Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
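The "check on every intermediate step, no try/catch" idea can be illustrated with a simplified sketch. This is not the code from the commit: `build_dict` is a hypothetical function, it uses the public `applicable(iterate, x)` test in place of Base's internal check, and it always builds a `Dict{Any,Any}`.

```julia
# Hypothetical helper mirroring the wording of the error message shown above.
throw_kv_error() = throw(ArgumentError("kv needs to be an iterator of 2-tuples or pairs"))

# Build a dictionary while validating each element as it is consumed.
# Because nothing is wrapped in try/catch, an exception raised while the
# iterator itself is being evaluated propagates to the caller unchanged.
function build_dict(itr)
    dest = Dict{Any,Any}()
    for kv in itr
        applicable(iterate, kv) || throw_kv_error()
        step = iterate(kv)
        step === nothing && throw_kv_error()      # element has no key
        k, state = step
        step = iterate(kv, state)
        step === nothing && throw_kv_error()      # element has no value
        v, _ = step
        dest[k] = v
    end
    return dest
end
```

A bare integer passes the iterability test in Julia (numbers iterate once), so it is the missing second element that triggers the `ArgumentError` for input such as `(i for i in 1:3)`.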
For example, we seek to eliminate the gc frame from this function, as observed here:
```julia
julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end
; Function Signature: var"#3"(Base.BitSet)
;  @ REPL[1]:1 within `#3`
define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 {
top:
  call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23
  %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %0, align 8, !tbaa !27
  fence syncscope("singlethread") seq_cst
  %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22
  fence syncscope("singlethread") seq_cst
; ┌ @ Base.jl:49 within `getproperty`
   %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42
; └
; ┌ @ gcutils.jl:253 within `safepoint`
   %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23
   %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43
   %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27
   fence syncscope("singlethread") seq_cst, !dbg !43
   %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43
   fence syncscope("singlethread") seq_cst, !dbg !43
; └
; ┌ @ essentials.jl:892 within `getindex`
   %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53
   %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58
   ret i64 %5, !dbg !46
; └
}
```
The functions `toms`, `tons`, and `days` use `sum` over a vector of `Period`s to obtain the conversion of a `CompoundPeriod`. However, the compiler cannot infer the return type because those functions can return either `Int` or `Float` depending on the type of the `Period`. This PR forces the result of those functions to be `Float64`, fixing the type stability.

Before this PR we had:
```julia
julia> using Dates

julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1)
1 year, 1 minute, 1 second

julia> @code_warntype Dates.tons(p)
MethodInstance for Dates.tons(::Dates.CompoundPeriod)
  from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458
Arguments
  #self#::Core.Const(Dates.tons)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.tons::Core.Const(Dates.tons)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11

julia> @code_warntype Dates.toms(p)
MethodInstance for Dates.toms(::Dates.CompoundPeriod)
  from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454
Arguments
  #self#::Core.Const(Dates.toms)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.toms::Core.Const(Dates.toms)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11

julia> @code_warntype Dates.days(p)
MethodInstance for Dates.days(::Dates.CompoundPeriod)
  from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468
Arguments
  #self#::Core.Const(Dates.days)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.days::Core.Const(Dates.days)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11
```

After this PR we have:
```julia
julia> using Dates

julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1)
1 year, 1 minute, 1 second

julia> @code_warntype Dates.tons(p)
MethodInstance for Dates.tons(::Dates.CompoundPeriod)
  from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458
Arguments
  #self#::Core.Const(Dates.tons)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.tons::Core.Const(Dates.tons)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13

julia> @code_warntype Dates.toms(p)
MethodInstance for Dates.toms(::Dates.CompoundPeriod)
  from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454
Arguments
  #self#::Core.Const(Dates.toms)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.toms::Core.Const(Dates.toms)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13

julia> @code_warntype Dates.days(p)
MethodInstance for Dates.days(::Dates.CompoundPeriod)
  from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468
Arguments
  #self#::Core.Const(Dates.days)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.days::Core.Const(Dates.days)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13
```
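The same fix can be reproduced outside `Dates` with a small self-contained sketch. All types and functions here (`Span`, `Secs`, `Mins`, `to_ns`, `total_ns`) are hypothetical stand-ins; the pattern mirrors the `typeassert` to `Float64` visible in the "after" output, but it is not the code from the PR.

```julia
# Toy stand-ins for `Period` subtypes.
abstract type Span end
struct Secs <: Span
    n::Int
end
struct Mins <: Span
    n::Int
end

# The per-element conversion returns different types, just as in the PR
# description: an Int for one subtype and a Float64 for the other.
to_ns(s::Secs) = s.n * 1_000_000_000
to_ns(m::Mins) = m.n * 60.0e9

# Summing over an abstractly typed vector leaves the compiler unsure of the
# result type, so convert and assert `Float64` on the way out.
function total_ns(spans::Vector{Span})
    isempty(spans) && return 0.0
    return Float64(sum(to_ns, spans))::Float64
end
```

Running `@code_warntype total_ns(Span[Secs(1), Mins(1)])` on this sketch is a quick way to compare the inferred `Body` type with and without the trailing `::Float64`.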
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.
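What "inserting a `finalize` call after the last use" would amount to can be shown by doing it by hand at the source level. The sketch below is only an illustration of the effect, not of the compiler transformation; `TrackedBuffer`, `freed_count`, `tracked_alloc`, and `use_and_release` are hypothetical names.

```julia
# A heap buffer owned by a mutable wrapper, released by its finalizer.
mutable struct TrackedBuffer
    ptr::Ptr{Cvoid}
end

# Counts how many buffers have been released, so the effect is observable.
const freed_count = Ref(0)

function tracked_alloc(nbytes::Integer)
    buf = TrackedBuffer(Libc.malloc(nbytes))
    return finalizer(buf) do b
        Libc.free(b.ptr)
        freed_count[] += 1
    end
end

function use_and_release(nbytes::Integer)
    buf = tracked_alloc(nbytes)
    # Last real use of the buffer: read its address while it is kept alive.
    addr = GC.@preserve buf UInt(buf.ptr)
    # Run the registered finalizer now instead of waiting for a GC cycle.
    finalize(buf)
    return addr != 0
end
```

Calling `finalize` eagerly releases the memory at a predictable point, which is the benefit the bullet above hopes to recover even when full finalizer inlining is not possible.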
E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. 
Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.
E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. 
Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.
E.g. this allows `finalizer` inlining in the following case: ```julia mutable struct ForeignBuffer{T} const ptr::Ptr{T} end const foreign_buffer_finalized = Ref(false) function foreign_alloc(::Type{T}, length) where T ptr = Libc.malloc(sizeof(T) * length) ptr = Base.unsafe_convert(Ptr{T}, ptr) obj = ForeignBuffer{T}(ptr) return finalizer(obj) do obj Base.@assume_effects :notaskstate :nothrow foreign_buffer_finalized[] = true Libc.free(obj.ptr) end end function f_EA_finalizer(N::Int) workspace = foreign_alloc(Float64, N) GC.@preserve workspace begin (;ptr) = workspace Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr) end end ``` ```julia julia> @code_typed f_EA_finalizer(42) CodeInfo( 1 ── %1 = Base.mul_int(8, N)::Int64 │ %2 = Core.lshr_int(%1, 63)::Int64 │ %3 = Core.trunc_int(Core.UInt8, %2)::UInt8 │ %4 = Core.eq_int(%3, 0x01)::Bool └─── goto #3 if not %4 2 ── invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{} └─── unreachable 3 ── goto #4 4 ── %9 = Core.bitcast(Core.UInt64, %1)::UInt64 └─── goto #5 5 ── goto #6 6 ── goto #7 7 ── goto #8 8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing} └─── goto #9 9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64} │ %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64} └─── goto #10 10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17))) │ %20 = Base.getfield(%17, :ptr)::Ptr{Float64} │ invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing │ $(Expr(:gc_preserve_end, :(%19))) │ %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool} │ Base.setfield!(%23, :x, true)::Bool │ %25 = Base.getfield(%17, :ptr)::Ptr{Float64} │ %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing} │ $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing └─── return nothing ) => Nothing ``` However, this is still a WIP. 
Before merging, I want to improve EA's precision a bit and at least fix the test case that is currently marked as `broken`. I also need to check its impact on compiler performance. Additionally, I believe this feature is not yet practical. In particular, there is still significant room for improvement in the following areas: - EA's interprocedural capabilities: currently EA is performed ad-hoc for limited frames because of latency reasons, which significantly reduces its precision in the presence of interprocedural calls. - Relaxing the `:nothrow` check for finalizer inlining: the current algorithm requires `:nothrow`-ness on all paths from the allocation of the mutable struct to its last use, which is not practical for real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary optimizations such as inserting a `finalize` call after the last use might still be possible.
The current approach uses polymorphism to make `RopeString` objects. This is pretty inefficient for the typical small-string use case. To efficiently construct a C-style string in the current framework, one makes the current output stream a `memio` object and then prints to it. The general pattern I've used is to write a `print_whatever` function and then wrap it in a `whatever` function that returns a string using `print_to_string`. Should we stick with this pattern? It has the advantage of allowing the printing version to be very efficient, but it's kind of awkward to write. Should we figure out a different pattern? Something like C#'s `StringBuilder` pattern?

Perhaps it suffices to make `strcat` check the size and encodings of its arguments and use the `print_to_string` approach to concatenate them into a copied string where appropriate: namely, when the arguments are of compatible encodings (e.g. any mixture of `ASCIIString` and `UTF8String`) and the concatenated result would be below some size threshold. For larger strings, we should continue to use the `RopeString` approach. Also, string slices should copy their contents as well unless the resulting string is above the "large string" threshold, in which case they can continue to use the current `SubString`, with the known issue that this pins the superstring in memory.
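As a rough sketch of the two ideas above, written against present-day Julia rather than the API of the time: `sprint` stands in for `print_to_string`, and `print_joined`, `joined`, `COPY_THRESHOLD`, `LazyConcat`, and `strcat_sketch` are hypothetical names chosen for illustration. The threshold value is arbitrary.

```julia
# Printing version: writes each piece straight to `io`, so no temporary
# string is built when the destination is a file, socket, or buffer.
function print_joined(io::IO, items, delim::AbstractString)
    isfirst = true
    for item in items
        isfirst || print(io, delim)
        print(io, item)
        isfirst = false
    end
    return nothing
end

# String-returning version, defined by capturing the printing version's output.
# `sprint(f, args...)` calls `f(io, args...)` on an in-memory buffer.
joined(items, delim::AbstractString) = sprint(print_joined, items, delim)

# Hypothetical cutoff (in bytes) below which concatenation copies eagerly.
const COPY_THRESHOLD = 1024

# Stand-in for a rope: keeps the pieces without copying them.
struct LazyConcat
    parts::Vector{String}
end

# Size-checked concatenation: copy small results, stay lazy for large ones.
function strcat_sketch(parts::String...)
    total = sum(sizeof, parts; init=0)
    if total <= COPY_THRESHOLD
        return sprint(io -> foreach(p -> print(io, p), parts))
    else
        return LazyConcat(collect(parts))
    end
end
```

Here `joined(["a", "b", "c"], ", ")` returns `"a, b, c"`, while `print_joined(stdout, 1:3, "-")` writes the same kind of output directly to a stream. Note that `strcat_sketch` returns one of two types depending on length, which is exactly the inference concern raised in the first comment on this issue.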