Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

systematic, efficient approach to string construction #3

Closed
StefanKarpinski opened this issue Apr 27, 2011 · 10 comments
Closed

systematic, efficient approach to string construction #3

StefanKarpinski opened this issue Apr 27, 2011 · 10 comments
Assignees

Comments

@StefanKarpinski
Copy link
Sponsor Member

The current approach uses polymorphism to make RopeString objects. This is pretty inefficient for the typical small string use-case. To efficiently construct a C-style string in the current framework, one makes the current output stream a memio object and then prints to it. The general pattern I've used is to write a print_whatever function and then wrap it in a whatever function that returns a string using print_to_string. Should we stick with this pattern? It has the advantage of allowing the printing version to be very efficient, but it's kind of awkward to write. Should we figure out a different pattern? Something like C#'s StringBuilder pattern?

Perhaps it suffices to make strcat check the size and encodings of its arguments and use print_to_string approach to concatenate them into a copied string where appropriate — namely when the arguments are of compatible encodings (e.g. any mixture of ASCIIString and UTF8String), and if concatenated they would be below some size threshold. For larger strings, we should continue to use the RopeString approach. Also, string slices should copy their contents as well unless the resulting string is above the "large string" threshold, in which case, they can continue to use the current SubString with the known issue that this pins the superstring in memory.

@ghost ghost assigned StefanKarpinski Apr 27, 2011
@JeffBezanson
Copy link
Sponsor Member

Changing types based on string lengths makes it too hard to infer the types of these rather common operations. Instead, we should have the option to wrap a string as BigString(s) if s might be large, and BigString can use the memory-saving versions of these operations.
There's not much difference between print_to_string and StringBuilder. with_output_stream can be skipped in some cases by using write() with an explicit destination argument. It's also nice to be able to write output directly to an I/O endpoint without building a temporary string first.
Simple string building cases can also be handled by pushing characters into an array.

@StefanKarpinski
Copy link
Sponsor Member Author

Makes sense. I can make the BigString change easily.

Is this an argument for continuing to implement core string building functionality by writing the printing version first and then defining the string creating version by applying print_to_string to the printing version?

@JeffBezanson
Copy link
Sponsor Member

Somewhat, but multiple approaches can be used. For example, if you're just
combining strings and characters you can use write() instead of going
through print_to_string. We might want to provide some nicer names for
memio, takebuf_string, and write, and make it look more like StringBuilder.
Or for something like strcat I would determine the size of the result,
allocate it once, and use memcpy.

The trouble is that if I do something like

write(io, strcat(a,b,c))

what you ideally want is to write each string without forming the temporary.
Even if strcat is written using an i/o buffer you don't get that
automatically here. I might have to say

strcat_to(io, a, b, c)

but that's not a very nice interface. If a, b, or c is a BigString though,
the strcat is done lazily and you get the desired behavior of writing all
the pieces with no copying. This seems to convince me that there's no
advantage to writing all the string functions in terms of printing. So do
whatever's simplest/fastest/convenient, and let BigString handle other
concerns. How's that sound?

print_escaped is a bit different since we know that a main use of it is
doing output. So strcat etc. doesn't necessarily need to imitate it.

On Tue, May 3, 2011 at 12:38 PM, StefanKarpinski <
reply@reply.github.com>wrote:

Makes sense. I can make the BigString change easily.

Is this an argument for continuing to implement core string building
functionality by writing the printing version first and then defining the
string creating version by applying print_to_string to the printing version?

Reply to this email directly or view it on GitHub:
#3 (comment)

@ViralBShah
Copy link
Member

This seems like a 2.0 thing.

@StefanKarpinski
Copy link
Sponsor Member Author

We're actually pretty good on this at this point. All strcat and string ref (substring) operations on ASCIIString and UTF8String objects use memcpy now, so they're fast and they don't create exotic string objects (RopeString, SubString, etc.). Repeating a string does create aRepString` object, but I think that's probably acceptable. I could make a copying implementation of that rather easily.

If someone wants to use a StringBuilder pattern, they can write the printing version and then use print_to_string on it. I feel like that's a reasonable approach if one is worried about strcat efficiency, with the added bonus of providing a version of the same functionality that can print without having to build a string at all.

I think this issue is not fully addressed, but well enough for v1.0 for now. Will reassign to v2.0.

@JeffBezanson
Copy link
Sponsor Member

Can I replace memcpy(a) with copy(a)?

@StefanKarpinski
Copy link
Sponsor Member Author

Is copy(a::Array{Uint8,1}) as efficient as memcpy is?

@JeffBezanson
Copy link
Sponsor Member

It should be now that we changed copy_to to use memcpy for arrays where possible.

@StefanKarpinski
Copy link
Sponsor Member Author

We can get rid of memcpy entirely then. I'll do it.

@ViralBShah
Copy link
Member

We also need to experiment with some sizes at which memcpy is faster. It is actually slower for small arrays. Copy_to should have these smarts.

On 10-Jul-2011, at 12:43 AM, JeffBezansonreply@reply.github.com wrote:

It should be now that we changed copy_to to use memcpy for arrays where possible.

Reply to this email directly or view it on GitHub:
#3 (comment)

burrowsa pushed a commit to burrowsa/julia that referenced this issue Mar 24, 2014
aviatesk added a commit that referenced this issue Jul 11, 2023
This commit improves SROA pass by extending the `unswitchtupleunion`
optimization to handle the general parametric types, e.g.:
```julia
julia> struct A{T}
           x::T
       end;

julia> function foo(a1, a2, c)
           t = c ? A(a1) : A(a2)
           return getfield(t, :x)
       end;

julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
```

> Before
```
2 1 ─      goto #3 if not _4                                          │
  2 ─ %2 = %new(A{Int64}, _2)::A{Int64}                               │╻ A
  └──      goto #4                                                    │
  3 ─ %4 = %new(A{Float64}, _3)::A{Float64}                           │╻ A
  4 ┄ %5 = φ (#2 => %2, #3 => %4)::Union{A{Float64}, A{Int64}}        │
3 │   %6 = Main.getfield(%5, :x)::Union{Float64, Int64}               │
  └──      return %6                                                  │
   => Union{Float64, Int64}
```

> After
```
julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
2 1 ─      goto #3 if not _4                                           │
  2 ─      nothing::A{Int64}                                           │╻ A
  └──      goto #4                                                     │
  3 ─      nothing::A{Float64}                                         │╻ A
  4 ┄ %8 = φ (#2 => _2, #3 => _3)::Union{Float64, Int64}               │
  │        nothing::Union{A{Float64}, A{Int64}}
3 │   %6 = %8::Union{Float64, Int64}                                   │
  └──      return %6                                                   │
   => Union{Float64, Int64}
```
aviatesk added a commit that referenced this issue Jul 11, 2023
This commit improves SROA pass by extending the `unswitchtupleunion`
optimization to handle the general parametric types, e.g.:
```julia
julia> struct A{T}
           x::T
       end;

julia> function foo(a1, a2, c)
           t = c ? A(a1) : A(a2)
           return getfield(t, :x)
       end;

julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
```

> Before
```
2 1 ─      goto #3 if not _4                                          │
  2 ─ %2 = %new(A{Int64}, _2)::A{Int64}                               │╻ A
  └──      goto #4                                                    │
  3 ─ %4 = %new(A{Float64}, _3)::A{Float64}                           │╻ A
  4 ┄ %5 = φ (#2 => %2, #3 => %4)::Union{A{Float64}, A{Int64}}        │
3 │   %6 = Main.getfield(%5, :x)::Union{Float64, Int64}               │
  └──      return %6                                                  │
   => Union{Float64, Int64}
```

> After
```
julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
2 1 ─      goto #3 if not _4                                           │
  2 ─      nothing::A{Int64}                                           │╻ A
  └──      goto #4                                                     │
  3 ─      nothing::A{Float64}                                         │╻ A
  4 ┄ %8 = φ (#2 => _2, #3 => _3)::Union{Float64, Int64}               │
  │        nothing::Union{A{Float64}, A{Int64}}
3 │   %6 = %8::Union{Float64, Int64}                                   │
  └──      return %6                                                   │
   => Union{Float64, Int64}
```
aviatesk added a commit that referenced this issue Jul 11, 2023
This commit improves SROA pass by extending the `unswitchtupleunion`
optimization to handle the general parametric types, e.g.:
```julia
julia> struct A{T}
           x::T
       end;

julia> function foo(a1, a2, c)
           t = c ? A(a1) : A(a2)
           return getfield(t, :x)
       end;

julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
```

> Before
```
2 1 ─      goto #3 if not _4                                          │
  2 ─ %2 = %new(A{Int64}, _2)::A{Int64}                               │╻ A
  └──      goto #4                                                    │
  3 ─ %4 = %new(A{Float64}, _3)::A{Float64}                           │╻ A
  4 ┄ %5 = φ (#2 => %2, #3 => %4)::Union{A{Float64}, A{Int64}}        │
3 │   %6 = Main.getfield(%5, :x)::Union{Float64, Int64}               │
  └──      return %6                                                  │
   => Union{Float64, Int64}
```

> After
```
julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
2 1 ─      goto #3 if not _4                                           │
  2 ─      nothing::A{Int64}                                           │╻ A
  └──      goto #4                                                     │
  3 ─      nothing::A{Float64}                                         │╻ A
  4 ┄ %8 = φ (#2 => _2, #3 => _3)::Union{Float64, Int64}               │
  │        nothing::Union{A{Float64}, A{Int64}}
3 │   %6 = %8::Union{Float64, Int64}                                   │
  └──      return %6                                                   │
   => Union{Float64, Int64}
```
aviatesk added a commit that referenced this issue Jul 11, 2023
This commit improves SROA pass by extending the `unswitchtupleunion`
optimization to handle the general parametric types, e.g.:
```julia
julia> struct A{T}
           x::T
       end;

julia> function foo(a1, a2, c)
           t = c ? A(a1) : A(a2)
           return getfield(t, :x)
       end;

julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
```

> Before
```
2 1 ─      goto #3 if not _4                                          │
  2 ─ %2 = %new(A{Int64}, _2)::A{Int64}                               │╻ A
  └──      goto #4                                                    │
  3 ─ %4 = %new(A{Float64}, _3)::A{Float64}                           │╻ A
  4 ┄ %5 = φ (#2 => %2, #3 => %4)::Union{A{Float64}, A{Int64}}        │
3 │   %6 = Main.getfield(%5, :x)::Union{Float64, Int64}               │
  └──      return %6                                                  │
   => Union{Float64, Int64}
```

> After
```
julia> only(Base.code_ircode(foo, (Int,Float64,Bool); optimize_until="SROA"))
2 1 ─      goto #3 if not _4                                           │
  2 ─      nothing::A{Int64}                                           │╻ A
  └──      goto #4                                                     │
  3 ─      nothing::A{Float64}                                         │╻ A
  4 ┄ %8 = φ (#2 => _2, #3 => _3)::Union{Float64, Int64}               │
  │        nothing::Union{A{Float64}, A{Int64}}
3 │   %6 = %8::Union{Float64, Int64}                                   │
  └──      return %6                                                   │
   => Union{Float64, Int64}
```
Keno added a commit that referenced this issue Oct 9, 2023
Keno added a commit that referenced this issue Oct 9, 2023
Together with previous commits, fixes #3
Keno added a commit that referenced this issue Oct 9, 2023
Keno pushed a commit that referenced this issue Oct 9, 2023
 Properly resolve the library symbol in :foreigncall (fixes #3)
vtjnash added a commit that referenced this issue Feb 10, 2024
Fixes: #33147
Replaces/Closes: #40445

The difference here, compared to past implementations, is that we use
the zero-cost `isiterable` check on every intermediate step, instead of
wrapping the call in a try/catch and then trying to re-approximate the
`isiterable` afterwards. Some samples:

```julia
julia> Dict(i for i in 1:3)                                                        
ERROR: ArgumentError: AbstractDict(kv): kv needs to be an iterator of 2-tuples or pairs                                                                               
Stacktrace:                                                                                                                                                           
 [1] _throw_dict_kv_error()                                                        
   @ Base ./dict.jl:118                                                                                                                                               
 [2] grow_to!                                                                      
   @ ./dict.jl:132 [inlined]                                                       
 [3] dict_with_eltype                                                                                                                                                 
   @ ./abstractdict.jl:592 [inlined]                                                                                                                                  
 [4] Dict(kv::Base.Generator{UnitRange{Int64}, typeof(identity)})                                                                                                     
   @ Base ./dict.jl:120                                                            
 [5] top-level scope                                                                                                                                                  
   @ REPL[1]:1                                                                     
                                                                                                                                                                      
julia> Dict(i => error("$i") for i in 1:3)                                         
ERROR: 1                                                                                                                                                              
Stacktrace:                                                                                                                                                           
 [1] error(s::String)                                                                                                                                                 
   @ Base ./error.jl:35                                                                                                                                               
 [2] (::var"#3#4")(i::Int64)                                                       
   @ Main ./none:0                                                                 
 [3] iterate                                                                                                                                                          
   @ ./generator.jl:48 [inlined]                                                   
 [4] grow_to!                                                                      
   @ ./dict.jl:124 [inlined]                                                       
 [5] dict_with_eltype                                                              
   @ ./abstractdict.jl:592 [inlined]                                               
 [6] Dict(kv::Base.Generator{UnitRange{Int64}, var"#3#4"})                                                                                                            
   @ Base ./dict.jl:120                                                                                                                                               
 [7] top-level scope                                                               
   @ REPL[2]:1                                                                                                                                                        
```

The other unrelated change here is that `dest = empty(dest, typeof(k),
typeof(v))` is made conditional, so we do not unconditionally construct
an empty Dict in order to discard it and allocate an exact duplicate of
it, but only do so if inference wasn't precise originally.

Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
vtjnash added a commit that referenced this issue Feb 26, 2024
For example, we seek to eliminate the gc frame from this function, as
observed here:

```julia
julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end
; Function Signature: var"#3"(Base.BitSet)
;  @ REPL[1]:1 within `#3`
define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 {
top:
  call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23
  %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %0, align 8, !tbaa !27
  fence syncscope("singlethread") seq_cst
  %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22
  fence syncscope("singlethread") seq_cst
; ┌ @ Base.jl:49 within `getproperty`
   %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42
; └
; ┌ @ gcutils.jl:253 within `safepoint`
   %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23
   %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43
   %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27
   fence syncscope("singlethread") seq_cst, !dbg !43
   %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43
   fence syncscope("singlethread") seq_cst, !dbg !43
; └
; ┌ @ essentials.jl:892 within `getindex`
   %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53
   %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58
   ret i64 %5, !dbg !46
; └
}
```
vtjnash added a commit that referenced this issue Feb 27, 2024
For example, we seek to eliminate the gc frame from this function, as
observed here:

```julia
julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end
; Function Signature: var"#3"(Base.BitSet)
;  @ REPL[1]:1 within `#3`
define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 {
top:
  call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23
  %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %0, align 8, !tbaa !27
  fence syncscope("singlethread") seq_cst
  %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22
  fence syncscope("singlethread") seq_cst
; ┌ @ Base.jl:49 within `getproperty`
   %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42
; └
; ┌ @ gcutils.jl:253 within `safepoint`
   %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23
   %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43
   %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27
   fence syncscope("singlethread") seq_cst, !dbg !43
   %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43
   fence syncscope("singlethread") seq_cst, !dbg !43
; └
; ┌ @ essentials.jl:892 within `getindex`
   %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53
   %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58
   ret i64 %5, !dbg !46
; └
}
```
vtjnash added a commit that referenced this issue Mar 4, 2024
For example, we seek to eliminate the gc frame from this function, as
observed here:

```julia
julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end
; Function Signature: var"#3"(Base.BitSet)
;  @ REPL[1]:1 within `#3`
define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 {
top:
  call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23
  %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %0, align 8, !tbaa !27
  fence syncscope("singlethread") seq_cst
  %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22
  fence syncscope("singlethread") seq_cst
; ┌ @ Base.jl:49 within `getproperty`
   %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42
; └
; ┌ @ gcutils.jl:253 within `safepoint`
   %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23
   %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43
   %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27
   fence syncscope("singlethread") seq_cst, !dbg !43
   %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43
   fence syncscope("singlethread") seq_cst, !dbg !43
; └
; ┌ @ essentials.jl:892 within `getindex`
   %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53
   %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58
   ret i64 %5, !dbg !46
; └
}
```
vtjnash referenced this issue Mar 5, 2024
For example, we seek to eliminate the gc frame from this function, as
observed here:

```julia
julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end
; Function Signature: var"https://github.com/JuliaLang/julia/issues/3"(Base.BitSet)
;  @ REPL[1]:1 within `https://github.com/JuliaLang/julia/issues/3`
define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 {
top:
  call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23
  %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %0, align 8, !tbaa !27
  fence syncscope("singlethread") seq_cst
  %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22
  fence syncscope("singlethread") seq_cst
; ┌ @ Base.jl:49 within `getproperty`
   %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42
; └
; ┌ @ gcutils.jl:253 within `safepoint`
   %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23
   %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43
   %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27
   fence syncscope("singlethread") seq_cst, !dbg !43
   %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43
   fence syncscope("singlethread") seq_cst, !dbg !43
; └
; ┌ @ essentials.jl:892 within `getindex`
   %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53
   %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58
   ret i64 %5, !dbg !46
; └
}
```
mkitti pushed a commit to mkitti/julia that referenced this issue Apr 13, 2024
For example, we seek to eliminate the gc frame from this function, as
observed here:

```julia
julia> code_llvm((BitSet,), raw=true) do x; r = x.bits; GC.safepoint(); @inbounds r[1]; end
; Function Signature: var"JuliaLang#3"(Base.BitSet)
;  @ REPL[1]:1 within `JuliaLang#3`
define swiftcc i64 @"julia_#3_494"(ptr nonnull swiftself %pgcstack, ptr noundef nonnull align 8 dereferenceable(16) %"x::BitSet") #0 !dbg !5 {
top:
  call void @llvm.dbg.declare(metadata ptr %"x::BitSet", metadata !21, metadata !DIExpression()), !dbg !22
  %ptls_field = getelementptr inbounds ptr, ptr %pgcstack, i64 2
  %ptls_load = load ptr, ptr %ptls_field, align 8, !tbaa !23
  %0 = getelementptr inbounds ptr, ptr %ptls_load, i64 2
  %safepoint = load ptr, ptr %0, align 8, !tbaa !27
  fence syncscope("singlethread") seq_cst
  %1 = load volatile i64, ptr %safepoint, align 8, !dbg !22
  fence syncscope("singlethread") seq_cst
; ┌ @ Base.jl:49 within `getproperty`
   %"x::BitSet.bits" = load atomic ptr, ptr %"x::BitSet" unordered, align 8, !dbg !29, !tbaa !27, !alias.scope !33, !noalias !36, !nonnull !11, !dereferenceable !41, !align !42
; └
; ┌ @ gcutils.jl:253 within `safepoint`
   %ptls_load4 = load ptr, ptr %ptls_field, align 8, !dbg !43, !tbaa !23
   %2 = getelementptr inbounds ptr, ptr %ptls_load4, i64 2, !dbg !43
   %safepoint5 = load ptr, ptr %2, align 8, !dbg !43, !tbaa !27
   fence syncscope("singlethread") seq_cst, !dbg !43
   %3 = load volatile i64, ptr %safepoint5, align 8, !dbg !43
   fence syncscope("singlethread") seq_cst, !dbg !43
; └
; ┌ @ essentials.jl:892 within `getindex`
   %4 = load ptr, ptr %"x::BitSet.bits", align 8, !dbg !46, !tbaa !49, !alias.scope !52, !noalias !53
   %5 = load i64, ptr %4, align 8, !dbg !46, !tbaa !54, !alias.scope !57, !noalias !58
   ret i64 %5, !dbg !46
; └
}
```
Keno pushed a commit that referenced this issue Jun 5, 2024
aviatesk pushed a commit that referenced this issue Jul 14, 2024
The functions `toms`, `tons`, and `days` uses `sum` over a vector of
`Period`s to obtain the conversion of a `CompoundPeriod`. However, the
compiler cannot infer the return type because those functions can return
either `Int` or `Float` depending on the type of the `Period`. This PR
forces the result of those functions to be `Float64`, fixing the type
stability.

Before this PR we had:

```julia
julia> using Dates

julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1)
1 year, 1 minute, 1 second

julia> @code_warntype Dates.tons(p)
MethodInstance for Dates.tons(::Dates.CompoundPeriod)
  from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458
Arguments
  #self#::Core.Const(Dates.tons)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.tons::Core.Const(Dates.tons)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11


julia> @code_warntype Dates.toms(p)
MethodInstance for Dates.toms(::Dates.CompoundPeriod)
  from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454
Arguments
  #self#::Core.Const(Dates.toms)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.toms::Core.Const(Dates.toms)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11


julia> @code_warntype Dates.days(p)
MethodInstance for Dates.days(::Dates.CompoundPeriod)
  from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468
Arguments
  #self#::Core.Const(Dates.days)
  c::Dates.CompoundPeriod
Body::Any
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.days::Core.Const(Dates.days)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
└──       return %11
```

After this PR we have:

```julia
julia> using Dates

julia> p = Dates.Second(1) + Dates.Minute(1) + Dates.Year(1)
1 year, 1 minute, 1 second

julia> @code_warntype Dates.tons(p)
MethodInstance for Dates.tons(::Dates.CompoundPeriod)
  from tons(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:458
Arguments
  #self#::Core.Const(Dates.tons)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.tons::Core.Const(Dates.tons)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13


julia> @code_warntype Dates.toms(p)
MethodInstance for Dates.toms(::Dates.CompoundPeriod)
  from toms(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:454
Arguments
  #self#::Core.Const(Dates.toms)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.toms::Core.Const(Dates.toms)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13


julia> @code_warntype Dates.days(p)
MethodInstance for Dates.days(::Dates.CompoundPeriod)
  from days(c::Dates.CompoundPeriod) @ Dates ~/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Dates/src/periods.jl:468
Arguments
  #self#::Core.Const(Dates.days)
  c::Dates.CompoundPeriod
Body::Float64
1 ─ %1  = Dates.isempty::Core.Const(isempty)
│   %2  = Base.getproperty(c, :periods)::Vector{Period}
│   %3  = (%1)(%2)::Bool
└──       goto #3 if not %3
2 ─       return 0.0
3 ─ %6  = Dates.Float64::Core.Const(Float64)
│   %7  = Dates.sum::Core.Const(sum)
│   %8  = Dates.days::Core.Const(Dates.days)
│   %9  = Base.getproperty(c, :periods)::Vector{Period}
│   %10 = (%7)(%8, %9)::Any
│   %11 = (%6)(%10)::Any
│   %12 = Dates.Float64::Core.Const(Float64)
│   %13 = Core.typeassert(%11, %12)::Float64
└──       return %13
```
aviatesk added a commit that referenced this issue Oct 1, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 1, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 2, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 2, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 2, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 4, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 4, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
aviatesk added a commit that referenced this issue Oct 4, 2024
E.g. this allows `finalizer` inlining in the following case:
```julia
mutable struct ForeignBuffer{T}
    const ptr::Ptr{T}
end
const foreign_buffer_finalized = Ref(false)
function foreign_alloc(::Type{T}, length) where T
    ptr = Libc.malloc(sizeof(T) * length)
    ptr = Base.unsafe_convert(Ptr{T}, ptr)
    obj = ForeignBuffer{T}(ptr)
    return finalizer(obj) do obj
        Base.@assume_effects :notaskstate :nothrow
        foreign_buffer_finalized[] = true
        Libc.free(obj.ptr)
    end
end
function f_EA_finalizer(N::Int)
    workspace = foreign_alloc(Float64, N)
    GC.@preserve workspace begin
        (;ptr) = workspace
        Base.@assume_effects :nothrow @noinline println(devnull, "ptr = ", ptr)
    end
end
```
```julia
julia> @code_typed f_EA_finalizer(42)
CodeInfo(
1 ── %1  = Base.mul_int(8, N)::Int64
│    %2  = Core.lshr_int(%1, 63)::Int64
│    %3  = Core.trunc_int(Core.UInt8, %2)::UInt8
│    %4  = Core.eq_int(%3, 0x01)::Bool
└───       goto #3 if not %4
2 ──       invoke Core.throw_inexacterror(:convert::Symbol, UInt64::Type, %1::Int64)::Union{}
└───       unreachable
3 ──       goto #4
4 ── %9  = Core.bitcast(Core.UInt64, %1)::UInt64
└───       goto #5
5 ──       goto #6
6 ──       goto #7
7 ──       goto #8
8 ── %14 = $(Expr(:foreigncall, :(:malloc), Ptr{Nothing}, svec(UInt64), 0, :(:ccall), :(%9), :(%9)))::Ptr{Nothing}
└───       goto #9
9 ── %16 = Base.bitcast(Ptr{Float64}, %14)::Ptr{Float64}
│    %17 = %new(ForeignBuffer{Float64}, %16)::ForeignBuffer{Float64}
└───       goto #10
10 ─ %19 = $(Expr(:gc_preserve_begin, :(%17)))
│    %20 = Base.getfield(%17, :ptr)::Ptr{Float64}
│          invoke Main.println(Main.devnull::Base.DevNull, "ptr = "::String, %20::Ptr{Float64})::Nothing
│          $(Expr(:gc_preserve_end, :(%19)))
│    %23 = Main.foreign_buffer_finalized::Base.RefValue{Bool}
│          Base.setfield!(%23, :x, true)::Bool
│    %25 = Base.getfield(%17, :ptr)::Ptr{Float64}
│    %26 = Base.bitcast(Ptr{Nothing}, %25)::Ptr{Nothing}
│          $(Expr(:foreigncall, :(:free), Nothing, svec(Ptr{Nothing}), 0, :(:ccall), :(%26), :(%25)))::Nothing
└───       return nothing
) => Nothing
```

However, this is still a WIP. Before merging, I want to improve EA's
precision a bit and at least fix the test case that is currently marked as
`broken`. I also need to check its impact on compiler performance.

Additionally, I believe this feature is not yet practical.
In particular, there is still significant room for improvement in the
following areas:
- EA's interprocedural capabilities: currently EA is performed ad-hoc
  for limited frames because of latency reasons, which significantly
  reduces its precision in the presence of interprocedural calls.
- Relaxing the `:nothrow` check for finalizer inlining: the current
  algorithm requires `:nothrow`-ness on all paths from the allocation of
  the mutable struct to its last use, which is not practical for
  real-world cases. Even when `:nothrow` cannot be guaranteed, auxiliary
  optimizations such as inserting a `finalize` call after the last use
  might still be possible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants