Arrays and Regex Substitution Strings #15069

Closed
Betawolf opened this issue Feb 14, 2016 · 8 comments
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@Betawolf

I'm new to the language, so can't be confident, but I was pointed to submit this issue by the #julia IRC channel. I couldn't find a reference to this behaviour in the issues listed here or on the mailing list.

The bug (if that is what it is) appears when a substitution string is placed inside an array.

str = "tough"
f = r"(^[crt])ough"
t = s"\1ou2f"

println(replace(str, f, t))
#gives 'tou2f' (as expected)

z = [f,""]
y = [t,""]
println(replace(str, z[1], y[1]))
#gives '\1ou2f'

println(replace(str, z[1], t))
#'tou2f'

println(replace(str, f, y[1]))
#'\1ou2f'
#y[1] rather than t is the issue.

The same behaviour appears if you do e.g. y = [s"\1ou2f",""], and the use of \g<1> rather than \1 does not resolve it. I would guess that the capture group is not being passed to the substitution string at all (\0 similarly produces \0ou2f in the above).

Version Info:

Julia Version 0.4.3
Commit a2f713d (2016-01-12 21:37 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4710HQ CPU @ 2.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libm
  LLVM: libLLVM-3.3
@yuyichao
Contributor

The issue is that in y = [t, ""], t is promoted to a string so you get plain string substitution. Not sure if this counts as a bug or not....
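
To illustrate what "plain string substitution" means here, the following is a minimal sketch. It assumes Julia 1.x, where replace takes a pattern => replacement pair (the syntax used later in this thread), and the variable name plain is only illustrative. Whether the array element keeps its SubstitutionString type depends on the Julia version, so the last line prints the check rather than asserting a result.

```julia
str = "tough"
f = r"(^[crt])ough"
t = s"\1ou2f"

# A SubstitutionString expands capture-group references such as \1.
println(replace(str, f => t))        # tou2f

# A plain String holding the same characters is inserted literally.
plain = raw"\1ou2f"                  # illustrative name, not from the thread
println(replace(str, f => plain))    # \1ou2f

# Check whether mixing t with "" in an array literal kept the element type.
# On Julia 0.4.3 (per this thread) the element was promoted to a plain string.
y = [t, ""]
println(y[1] isa Base.SubstitutionString)
```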

@Betawolf
Author

After kicking it around with the guys in the IRC channel, we discovered the cause was the promotion, as yuyichao commented. We also found that it could be worked around with y = map(x -> Base.SubstitutionString{ASCIIString}(x), y) to get an array of substitution strings. However, it does seem to be at least an unexpected/undocumented behaviour.
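
A rough modern equivalent of that workaround, as a sketch only: it assumes Julia 1.x (where ASCIIString no longer exists and String is the default string type) and uses the hypothetical name raw_templates for a vector of plain strings that should be treated as substitution templates.

```julia
str = "tough"
f = r"(^[crt])ough"

# Plain strings holding substitution templates (hypothetical input data).
raw_templates = [raw"\1ou2f", ""]

# Wrap each plain string so that replace treats \1 as a group reference.
subs = map(Base.SubstitutionString, raw_templates)

println(replace(str, f => subs[1]))  # tou2f
```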

@Betawolf
Author

We also discovered how to avoid the promotion: using y = [t, s""] means y remains an Array{Base.SubstitutionString{ASCIIString},1} rather than being promoted to the common type of "" and t.
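
The same idea can be checked directly. This sketch assumes Julia 1.x and the pair-based replace syntax; the exact element type printed will differ from the 0.4-era Array{Base.SubstitutionString{ASCIIString},1} quoted above.

```julia
str = "tough"
f = r"(^[crt])ough"
t = s"\1ou2f"

# Every element is a substitution string, so no common supertype is needed.
y = [t, s""]
println(eltype(y) <: Base.SubstitutionString)  # true

println(replace(str, f => y[1]))               # tou2f
```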

@vtjnash
Sponsor Member

vtjnash commented Feb 14, 2016

that sounds like a bug in the way Base.SubstitutionString behaves to me.

@StefanKarpinski
Sponsor Member

This is an interesting issue. Two possible solutions occur to me:

  1. Make SubstitutionString not a subtype of AbstractString.
  2. Change promotion of SubstitutionString with other kinds of string not to produce UTF8String.

I'm not sure that SubstitutionString really should be a kind of AbstractString. A string represents a sequence of characters; what character is \1?

@nalimilan
Member

Changing promotion rules sounds like a good idea. It seems natural to promote standard strings to SubstitutionString rather than the reverse, as SubstitutionString is unlikely to appear in a context where you wouldn't expect it.

As regards inheriting from AbstractString, I'm not sure. Given the name SubstitutionString, it may be surprising for it not to be a subtype of AbstractString. But OTOH Regex isn't a subtype of AbstractString, and I can see no reason why one type would be and not the other. Maybe in practice it doesn't matter.
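
The type relationships being discussed can be inspected as follows. This is a small sketch assuming Julia 1.x, where SubstitutionString is still declared as a subtype of AbstractString; it only reports the current hierarchy and does not implement either proposal.

```julia
# SubstitutionString is declared as a kind of AbstractString...
println(Base.SubstitutionString <: AbstractString)  # true

# ...whereas Regex is not.
println(Regex <: AbstractString)                    # false

# So a substitution string literal passes anywhere an AbstractString is accepted.
println(s"\1ou2f" isa AbstractString)               # true
```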

@vtjnash vtjnash added the bug Indicates an unexpected problem or unintended behavior label Mar 9, 2016
@inkydragon
Sponsor Member

replace(::String, ::Regex, ::SubstitutionString{String}) now throws a MethodError.
The new Pair-based API works fine.

str = "tough";
f = r"(^[crt])ough";
t = s"\1ou2f";

replace(str, f, t) # throw
replace(str, f => t)

z = [f,""];
y = [t,""];
replace(str, z[1] => y[1])
replace(str, z[1] => t)
replace(str, f => y[1])

julia> str = "tough";
julia> f = r"(^[crt])ough";
julia> t = s"\1ou2f";

julia> replace(str, f, t) # throw
ERROR: MethodError: no method matching replace(::String, ::Regex, ::SubstitutionString{String})
Closest candidates are:
  replace(::Union{Function, Type}, ::Any; count) at set.jl:605
  replace(::String, ::Pair{var"#s75", B} where {var"#s75"<:AbstractChar, B}; count) at strings/util.jl:513
  replace(::String, ::Pair{var"#s72", B} where {var"#s72"<:Union{Tuple{Vararg{AbstractChar, N} where N}, Set{var"#s53"} where var"#s53"<:AbstractChar, AbstractVector{var"#s54"} where var"#s54"<:AbstractChar}, B}; count) at strings/util.jl:518
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[33]:1

julia> replace(str, f => t)
"tou2f"

julia> z = [f,""];
julia> y = [t,""];

julia> replace(str, z[1] => y[1])
"tou2f"
julia> replace(str, z[1] => t)
"tou2f"
julia> replace(str, f => y[1])
"tou2f"

Related pr:

@brenhinkeller
Sponsor Contributor

So the replace syntax has changed a bit: a regex and its substitution string must now be passed as a pair, joined by the => operator:

str = "tough"
f = r"(^[crt])ough"
t = s"\1ou2f"

julia> replace(str, f=>t)
"tou2f"

Whether related to this change or not, the original problem seems to have been fixed in the intervening years:

julia> z = [f,""]
2-element Vector{Any}:
 r"(^[crt])ough"
 ""

julia> y = [t,""]
2-element Vector{AbstractString}:
 s"\1ou2f"
 ""

julia> replace(str, z[1] => y[1])
"tou2f"

julia> replace(str, z[1] => t)
"tou2f"

julia> replace(str, f => y[1])
"tou2f"

julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161e (2022-11-14 20:14 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores

So I think we can finally close this!


7 participants