BoundsError when joining AnnotatedStrings with distinct label orderings #54860

Closed
caleb-allen opened this issue Jun 20, 2024 · 1 comment · Fixed by #54917

Labels
strings "Strings!"

@caleb-allen

I think I've encountered a bug when joining AnnotatedStrings whose annotations were not constructed the way StyledStrings constructs them; specifically, when the annotation labels are ordered differently between the strings.

With simple annotations on two AnnotatedString instances, join works as expected:

julia> import Base: AnnotatedString, annotatedstring, annotations, annotate!

julia> a = AnnotatedString("the quick fox ", [(1:14, :FOO => "bar")])
"the quick fox "

julia> b = AnnotatedString("jumped over the lazy dog", [(1:24, :FOO => "bar")])
"jumped over the lazy dog"

julia> annotations(a * b) # concat only, without joining the annotations
2-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:14, :FOO => "bar")
 (15:38, :FOO => "bar")

julia> annotations(join([a, b]))
1-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:38, :FOO => "bar")

However, joining the string a above with an annotated string whose labels are inserted in a different order throws a BoundsError:

julia> c = AnnotatedString("jumped over the lazy dog", [(1:5, :BAZ => "bar"), (1:24, :FOO => "bar")])
"jumped over the lazy dog"

julia> annotations(a * c)
3-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:14, :FOO => "bar")
 (15:19, :BAZ => "bar")
 (15:38, :FOO => "bar")

julia> annotations(join([a, c]))
ERROR: BoundsError: attempt to access 1-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}} at index [0]
Stacktrace:
  [1] throw_boundserror(A::Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}, I::Tuple{Int64})
    @ Base ./essentials.jl:14
  [2] getindex
    @ ./essentials.jl:892 [inlined]
  [3] _insert_annotations!(io::Base.AnnotatedIOBuffer, annotations::Vector{Tuple{UnitRange{…}, Pair{…}}}, offset::Int64)
    @ Base ./strings/annotated.jl:600
  [4] _insert_annotations!
    @ ./strings/annotated.jl:591 [inlined]
  [5] write
    @ ./strings/annotated.jl:499 [inlined]
  [6] print
    @ ~/.julia/juliaup/julia-1.11.0-beta2+0.x64.linux.gnu/share/julia/stdlib/v1.11/StyledStrings/src/io.jl:255 [inlined]
  [7] join(io::Base.AnnotatedIOBuffer, iterator::Vector{AnnotatedString{String}}, delim::String)
    @ Base ./strings/io.jl:352
  [8] join
    @ ./strings/io.jl:349 [inlined]
  [9] _join_preserve_annotations(::Vector{AnnotatedString{String}})
    @ Base ./strings/io.jl:359
 [10] join(iterator::Vector{AnnotatedString{String}})
    @ Base ./strings/io.jl:366
 [11] top-level scope
    @ REPL[30]:1
Some type information was truncated. Use `show(err)` to see complete types.

The BoundsError does not appear to occur if the annotation being joined is listed first on both strings (:FOO first for each):

julia> d = AnnotatedString("jumped over the lazy dog", [(1:24, :FOO => "bar"), (1:5, :BAZ => "bar")])
"jumped over the lazy dog"

julia> join([a, d])
"the quick fox jumped over the lazy dog"

julia> join([a, d]) |> annotations
2-element Vector{Tuple{UnitRange{Int64}, Pair{Symbol, Any}}}:
 (1:38, :FOO => "bar")
 (15:19, :BAZ => "bar")

This may be related to #54561, as the stacktrace shows the print call made inside join being dispatched to StyledStrings.

This bug is present on Julia 1.11.0-beta2, installed via juliaup:

julia> versioninfo()
Julia Version 1.11.0-beta2
Commit edb3c92d6a6 (2024-05-29 09:37 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = auto
  JULIA_PKG_USE_CLI_GIT = true
  JULIA_TEST_FAILFAST = true
  JULIA_PKG_PRESERVE_TIERED_INSTALLED = true

Seelengrab added the strings "Strings!" label on Jun 20, 2024
@tecosaur (Contributor)

Thanks for the detailed bug report! I'll have a look at this on the weekend.

tecosaur added a commit to tecosaur/julia that referenced this issue Jun 24, 2024
As raised in JuliaLang#54860, when writing to an AnnotatedIOBuffer, should the
new content have more annotations than the AnnotatedIOBuffer, we may
attempt to index non-existent annotations.

This bug occurs in the process of looking for runs of matched
annotations. Since it is impossible for a run to be longer than the
number of existing annotations, we can add this as a sanity check and
not bother trying to check for run lengths where this is not the case.

Reported-by: caleb-allen <caleb.e.allen@gmail.com>
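
The sanity check described in the commit message can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code in base/strings/annotated.jl: the names `Annot`, `matched_run_length`, and `append_annotations!` are hypothetical, and annotations are modelled as plain `(range, label => value)` tuples in a vector. The idea it shows is that a run of matched annotations is searched for only up to the smaller of the two annotation counts, so the existing list is never indexed before its first element.

```julia
# Hypothetical, simplified model of appending annotated content.
# It does not reproduce Base's implementation; it only illustrates the guard.
const Annot = Tuple{UnitRange{Int}, Pair{Symbol, Any}}

# Length of the longest run `n` such that the last `n` existing annotations
# all end at `offset` and equal, in order, the first `n` incoming annotations,
# each of which must start at position 1.
function matched_run_length(existing::Vector{Annot}, incoming::Vector{Annot}, offset::Int)
    # The guard: a run cannot be longer than either list of annotations.
    for n in min(length(existing), length(incoming)):-1:1
        matched = all(1:n) do k
            old_region, old_value = existing[end - n + k]
            new_region, new_value = incoming[k]
            last(old_region) == offset && first(new_region) == 1 && old_value == new_value
        end
        matched && return n
    end
    return 0
end

function append_annotations!(existing::Vector{Annot}, incoming::Vector{Annot}, offset::Int)
    run = matched_run_length(existing, incoming, offset)
    # Extend each matched existing annotation over the appended text.
    for k in 1:run
        idx = length(existing) - run + k
        old_region, value = existing[idx]
        new_region, _ = incoming[k]
        existing[idx] = (first(old_region):(last(new_region) + offset), value)
    end
    # Shift the remaining incoming annotations by `offset` and append them.
    for k in (run + 1):length(incoming)
        region, value = incoming[k]
        push!(existing, ((first(region) + offset):(last(region) + offset), value))
    end
    return existing
end

# Annotation lists mirroring strings a, c, and d from the report.
a = Annot[(1:14, :FOO => "bar")]
c = Annot[(1:5, :BAZ => "bar"), (1:24, :FOO => "bar")]
d = Annot[(1:24, :FOO => "bar"), (1:5, :BAZ => "bar")]

append_annotations!(copy(a), c, 14)  # no run found, both annotations are shifted
append_annotations!(copy(a), d, 14)  # :FOO is extended to 1:38, :BAZ is shifted
```

Without the `min(...)` bound, a candidate run length of 2 against a single existing annotation would evaluate `existing[end - 2 + 1]`, that is `existing[0]`, which is the kind of out-of-bounds access reported above.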
KristofferC pushed a commit that referenced this issue Jul 23, 2024
Fixes #54860, see the commit message for more details.

The added test serves as a MWE of the original bug report.

(cherry picked from commit e621c74)
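
For readers who want to check their own Julia build, a regression check along the following lines could be used. It is a sketch built from the strings in the report, not the test that was added by the fix, and it assumes the `(range, label => value)` constructor form shown in this issue, which may differ on other Julia versions. It deliberately asserts only that `join` completes and keeps some annotations, since the exact merged annotations are not stated in this thread.

```julia
# Sketch of a regression check; NOT the test added by the fixing commit.
# Assumes the (range, label => value) constructor form used in this report.
using Test
import Base: AnnotatedString, annotations

a = AnnotatedString("the quick fox ", [(1:14, :FOO => "bar")])
c = AnnotatedString("jumped over the lazy dog",
                    [(1:5, :BAZ => "bar"), (1:24, :FOO => "bar")])

@testset "join with differently ordered labels" begin
    joined = join([a, c])                 # threw BoundsError on 1.11.0-beta2
    @test length(joined) == 38            # 14 + 24 characters
    @test !isempty(annotations(joined))   # annotations survive the join
end
```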