Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: deprecate * in favor of new ++ concatenation operator #22461

Closed
wants to merge 2 commits into from

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Jun 21, 2017

As discussed in #11030, this PR deprecates * in favor of a new all-purpose concatenation operator ++. (++ has been parsed as a binary operator since Julia 0.5, so Compat support shouldn't be a problem.)

a ++ b defaults to vcat, and since ++ is chaining you get vcat(a,b,c) from a++b++c. However, there is a general Base.concat_rule mechanism (analogous to promote_rule) that can be used to customize concatenation. Concatenation of an AbstractString with anything that is not an array calls string, e.g. "foo" ++ '-' ++ "bar" ++ 3 produces "foo-bar3", but "foo" ++ [1] produces ["foo", 1].

string^n is deprecated in favor of repeat(string,n).

Also, I lowered the precedence of ++. Previously, it was the same precedence as +, but that seems wrong for concatenation. You want a ++ 1:n to parse as a ++ (1:n), not (a++1):n. Hence, this is technically a breaking change, but I doubt that anyone is affected.

  • Pro: ++ eliminates the type-pun with * (see also one for AbstractString #19536), and gives us an explicit operator dedicated only to concatenation. It is already much more general than * for strings (e.g. string++char works), and it is more easily extensible if we want to support char ++ char == string, for example, or define ++ for concatenation of Set, Dict, and the like. When optimizing code, it makes it easy to search for instances of repeated concatenation (since this tends to be inefficient).

  • Con: Will cause a lot of code churn (witness the size of this PR).

To do:

  • Get tests passing. (So far, I just got it to the point of being buildable.)
  • More documentation
  • Concatenation rules for other container types, e.g. Set?

The basic question here is whether people like this overall approach: ++ as a general concatenation operator (not just strings), with a concat_rule mechanism to decide on the resulting container. (One issue with concat_rule is that the resulting concatenation may not be associative, i.e. a ++ b ++ c may be different from a ++ (b ++ c).)

cc: @StefanKarpinski

@stevengj stevengj added deprecation This change introduces or involves a deprecation strings "Strings!" labels Jun 21, 2017
@tknopp
Copy link
Contributor

tknopp commented Jun 21, 2017

I am against this. @ararslan made a good comment in #11030 (comment) which I fully agree with.

I teach automata theory and formal languages and in this field it is very common to use * and ^ on strings. I don't see what problem is solved by this change.

@stevengj
Copy link
Member Author

I don't have strong feelings, but the main advantage as I see it is that this allows us to easily extend the concatenation operation beyond string*string.

@tknopp
Copy link
Contributor

tknopp commented Jun 21, 2017

yes but string concatenation is not the same as array concatenation. Your example with foo-bar3 shows this best. If you want to concatenate two things currently hcat and vcat always give me an array back. Now, if for some reason the two things are Strings, I will get something different. From my perspective this is confusing.

@iamed2
Copy link
Contributor

iamed2 commented Jun 21, 2017

@tknopp This doesn't actually change the behaviour of hcat or vcat

@iamed2
Copy link
Contributor

iamed2 commented Jun 21, 2017

Would it be reasonable to define:

Base.concat_rule(::Number, ::Number) = (@_inline_meta; Array)

?

@tknopp
Copy link
Contributor

tknopp commented Jun 21, 2017

I know, but this also implies that one really should not use ++ for arrays since hcat and vcat are more reliable in what they return. This then again makes me wonder, which problem is supposed to be solved by ++?

@stevengj
Copy link
Member Author

@iamed2, vcat is already the default in this PR, so no need for such a rule.

@ararslan
Copy link
Member

I am very much against this. My specific objections:

  • Using * makes sense mathematically, as is noted in the documentation
  • This is a lot of churn and solves no actual problem for strings
  • General concatenation is a separate issue and shouldn't apply to strings, which should generally not be considered like arrays
  • IMO a special infix operator for general concatenation is unnecessary (I don't care too much about this though)

@carlobaldassi
Copy link
Member

Besides the usual arguments against this change that were written a hundred times, which I won't repeat here, I'm not convinced by the pro's at all:

  1. Defining string*char to perform concatenation is perfectly doable right now, as it's currently a method error.
  2. Concatenating Sets: isn't that union? We already have that. How else would it be defined?
  3. Concatenating Dicts: isn't that merge, which we also already have and describes much better the operation? Also, both union and merge have a mutating counterpart, which the concatenation operator would not have.
  4. More generally, I'm skeptical that there is much relevant code that could apply ++ in a generic way, regardless of the type, and thus I'm not sold on the benefit of having a generic concatenation operation. Unless that operation is basically vcat, and we already have that. I may be wrong, of course.
  5. If vcat is the generic default concatenation operation, we even already have a dedicated syntax for that: [a; b; c]. With ++, we would have two, which seems at least a little weird. (I think that the vcat name is a little unfortunate if it needs to be generic, but that's another story.)
  6. By the way if we really wanted we could even define vcat(s::Union{AbstractString,Char}...) = string(s...), no? That would settle the generic code thing. Also, the proposed concat_rule scheme could still apply, it does not depend on having ++ as an operator at all. And people that deeply hate * for string concatenation with a passion could even use ["abc"; "def"; 'g'; 'h'; "ijk"]... (except I guess they'd hate it even more?)

(Please don't take the last two points too seriously. What I'm arguing is that ++ really only makes sense/is needed for AbstractStrings and Chars and thus its merits should be judged only on that, i.e. on disambiguation from multiplication.)

Also, more (minor) cons: we lose ^ for repetition, and *= for appending, which I find mildly annoying. At least the latter could be provided via ++=. Yes, it's not-mutating for Strings, but none of the op= operators are, without the leading dot, so it's not hard to understand.

@nalimilan
Copy link
Member

Also, more (minor) cons: we lose ^ for repetition, and *= for appending, which I find mildly annoying. At least the latter could be provided via ++=. Yes, it's not-mutating for Strings, but none of the op= operators are, without the leading dot, so it's not hard to understand.

That could be an argument in favor of using ++: since ++= is often a performance trap when used to build a string by repeated concatenation (because of repeated allocations), that operator shouldn't exist at all.

Regarding concat_rule, I really think we need something like this, and I had made a PR at #20815. See discussion there.

@stevengj
Copy link
Member Author

stevengj commented Jun 22, 2017

I've been skeptical of the need for a renamed string-concatenation operator, but arguments about this have been going on for years, and ++ seemed like the most reasonable proposed replacement. So, I implemented this PR as an experiment, to help us resolve this issue (hopefully) once and for all:

  • How much code churn/inconvenience would be caused by a rename?
  • Would the ability to have a generic concatenation operator expose some elegant new possibilities/simplifications?
  • Is string-handling code significantly nicer with this PR?

Personally, my conclusion from this experiment is that it's probably not worth it. None of the string code seems to be much improved by this change, and the elimination of string^n is a pain. There's a lot of churn (and it's not even done yet!), and it's really hard to be sure that I've fixed all the usages since our tests might not exercise all code paths. As others have pointed out, a generic concatenation operator doesn't seem too useful because you're unlikely to share code between operations on strings and arrays. And we already have [a; b] as a vcat shortcut. Besides which, making concatenation easier for other types may be counterproductive as concatenation (as opposed to in-place mutation) is usually pretty inefficient.

However, I would be in favor of defining char*char = string and string*char = string and char^n = string, now that Char is no longer a subtype of Integer. I find the lack of these operations really annoying, and in making this PR I found zillions of places (> 50) where a single-character String was allocated just to do concatenation or " "^n or similar.

@nalimilan
Copy link
Member

I don't think the main arguments in favor of deprecating * were about code simplifications or new possibilities. They were more about that fact that * has a very different meaning for strings than for numbers or matrices, and that it has no precedent for string concatenation in any other language. Maybe a general concatenation operator isn't so useful after all, but that doesn't completely settle this debate; ++ could be reserved for string operations, which would at least give it a clear meaning.

Anyway no discussion will solve this, somebody just needs to make a decision.

@tknopp
Copy link
Contributor

tknopp commented Jun 22, 2017

They were more about that fact that * has a very different meaning for strings than for numbers or matrices

Where is the issue? For numbers and matrices * also has a pretty different meaning. The non-commutative "issue" is the same for matrices. By this argument matrix multiplication also requires a new operator. (see also #11030 (comment))

@nalimilan
Copy link
Member

@tknopp I'll just point to arguments @StefanKarpinski gave in the other issue here and here. It's true that multiplication has very different meanings for numbers and matrices, but at least there's a very strong tradition of calling these operations "multiplication". That's not the case for strings. (Let's not repeat this discussion, which has already happened in #1130 AFAICT.)

@tknopp
Copy link
Contributor

tknopp commented Jun 22, 2017

sure, I will be quiet. But bringing such a PR on the table is a proactive action to get this change in. So whats the way to argument against a PR if discussions are just allowed in #1130?

@nalimilan
Copy link
Member

Sorry, I didn't want to imply that comments are prohibited on this PR, I was just feeling guilty of reviving the same discussion again. If you have new arguments, by all means give them.

@simonbyrne
Copy link
Contributor

We can't use 🐈? I'm sure that would make @ararslan happy.

@pkofod
Copy link
Contributor

pkofod commented Jun 24, 2017

[noise](if this gets merged, my only public, solo-authored package is never going to take off as I had expected https://github.com/pkofod/PlusPlus.jl )[/noise]

@KristofferC
Copy link
Sponsor Member

Whatever happens to this, when it happens #11030 should be closed.

@ararslan
Copy link
Member

Given that consensus seems to be against this change, and indeed the author of this PR himself does not support the change, plus we now have * for strings and chars, I think we should just close this and finally lay the issue to rest.

@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Jul 18, 2017
@andyferris
Copy link
Member

Completely separately to strings, it's my feeling that having a general concatenation operator and naming it ++ will be useful. The main argument I can see against it is performance (though I think this is surmountable with laziness and specializations for reduce(++, collection) and so on). I think a lazy concatenation iterator (like zip but sequential iteration - maybe we have this?) as a fallback for ++ would mean that, yes, this could totally be used in generic code.

The consequence of doing that though is that ++ starts to make sense for strings. It was interesting that someone pointed out that [a; b] calls vcat... though I feel that such syntax is misleading for strings (as a reader I'd expect an array coming out). (And if repeat is too painful, there is always **...)

Anyway, that's my opinion for the record but I won't stand in the way of closing this.

@ararslan
Copy link
Member

Strings shouldn't be conflated with arrays, which is why it makes sense to separate general iterable concatenation from string concatenation. vcat is a perfectly good general iterable concatenation operator, but should not be used on strings, which is fine.

@JeffBezanson
Copy link
Sponsor Member

vcat is very different from sequence concatenation: if you vcat two matrices, the result has a different linear iteration order than iterating the first argument followed by iterating the second argument.

@ararslan
Copy link
Member

Sequence concatenation doesn't really make sense for matrices anyway. I guess if you want to iterate one after the other you can do vcat(vec(a), vec(b)).

@andyferris
Copy link
Member

@ararslan Iteration is the basic interface I'm talking about here... we don't want const ++ = vcat. If strings iterate characters, then you can in theory make an object that iterates characters from two strings in order. It might be a String or some other AbstractString or even something else entirely. (Arrays are orthogonal, IMO, they can do their own thing as long as they don't violate the iteration property).

@iamed2
Copy link
Contributor

iamed2 commented Jul 20, 2017

@andyferris Iterators.flatten or IterTools.chain does lazy sequence concatenation

@JeffBezanson
Copy link
Sponsor Member

Triage thinks we should consider ++ later as a new feature (generic concatenation) and leave strings alone for now.

@ararslan
Copy link
Member

Thanks for doing the science here, @stevengj. Feels good to finally have this settled. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation This change introduces or involves a deprecation strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.