Decision: remove `_names` and make `names` not perform a copy #1655

bkamins · 2018-12-26T12:34:50Z

We could decide to rename _names to names and avoid making a copy of names held in index (see #1648 (comment)).

The text was updated successfully, but these errors were encountered:

pdeffebach · 2018-12-26T17:38:31Z

names(df) = [:a, :b, :c] always seems intuitive. Would be cool if it were just a getindex / setindex call with some error checking.

bkamins · 2018-12-26T19:09:06Z

The idea is that names are stored in the index either as a Vector{Symbol} or as a view (in case of SubDataFrame or DataFrameRow) and we could give a direct access to the user to these objects (this is a bit risky, as they should really be read-only).

@nalimilan maybe names should return a read-only custom version of an AbstractVector? (similar to what we have with codeunits and String)

nalimilan · 2018-12-26T19:23:33Z

Yes, the point isn't to support names(df) = ..., which isn't idiomatic Julia AFAIK.

We could indeed return a custom AbstractVector to ensure it cannot be modified, but may be a bit overkill. Or maybe a package should provide a simple ReadOnlyVector wrapper that we could use here and in CategoricalArrays? I think we should ask for opinions among Julia developers, as this issue arises in many other situations.

bkamins · 2018-12-26T20:43:56Z

@nalimilan a general purpose package would be good (or a feature in some standard package e.g. from https://github.com/JuliaCollections).
Could you please ask as I will be out for several days and let us know (please mention me in the question so that I can track it)? Thank you.

bkamins · 2019-01-05T21:18:42Z

@nalimilan Have you had a chance to discuss this with Base devs? If no I can ask a question on Discourse.

nalimilan · 2019-01-07T15:56:25Z

See https://discourse.julialang.org/t/convention-for-read-only-array-getters/19356/1.

bkamins · 2019-01-20T16:30:48Z

@nalimilan I have created a proposal of a read-only array type here https://github.com/bkamins/ReadOnlyArrays.jl.

bkamins · 2019-01-24T12:56:21Z

Any opinion on https://github.com/bkamins/ReadOnlyArrays.jl?

is it OK?
should we use it (then probably I should register this package)?
or should we move this code somewhere else?

nalimilan · 2019-01-25T08:51:07Z

Sounds good, but I'd like a broad agreement that ReadOnlyArray should be the standard way to do this. It would be weird if this particular function in DataFrames was the only user of that feature.

bkamins · 2019-02-06T08:37:59Z

Given the response and discussion I would propose that we remove _names and expose internal vector names (like levels in CategoricalArrays.jl). We should also document it.

Probably we should throw a depwarn here then giving a hint to use copy(names(df)) if one wants a copy.

Do we go ahead this way?

nalimilan · 2019-02-06T09:05:17Z

Fine we me. The advantage with this solution is that we can change it later to return a ReadOnlyArray if we want, it won't break valid code. Adding a deprecation would be annoying since there's no way to avoid it (except for adding a temporary argument to names, which then will have to be deprecated later...).

An alternative solution would be to have names return the Index object and make it <:AbstractVector{Symbol} as evoked at #1712. That way it couldn't be modified as a plain Vector.

bkamins · 2019-02-06T11:06:39Z

The problem with:

An alternative solution would be to have names return the Index object

is that getindex on AbstractIndex does not return Symbol but Int (or vector of them). That is why in #1712 I considered returning AbstractIndex but not when names is called, but when keyindex/colindex is called.

bkamins · 2019-09-02T21:56:27Z

@nalimilan - I am not sure what to do with this. But for 1.0 I think we can leave names as is as I would assume everyone is used to it and it is almost surely not called in performance critical code (if you agree please close - or comment).

nalimilan · 2019-09-03T10:04:18Z

How about returning the internal vector without copying, as you suggested above? It probably won't break existing code, and that way we can do whatever we want after 1.0 (including making a copy if it turns out it creates bugs in user code).

bkamins · 2019-09-03T13:03:16Z

This goes against "safety" approach we have taken recently. Who knows where people used names(df) in their code. Maybe it gets modified.

I would go for this if the performance gain would be really important, but in this case I think it does not really matter if we are fast with this operation or not, so it is safer to perform copying.

nalimilan · 2019-09-03T14:01:36Z

OK, as you prefer.

bkamins mentioned this issue Dec 26, 2018

better parent for view #1648

Merged

bkamins mentioned this issue Jan 15, 2019

DataFrames.jl roadmap #1678

Closed

31 tasks

bkamins mentioned this issue Feb 6, 2019

Make names return internal names #1715

Closed

nalimilan closed this as completed Sep 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decision: remove `_names` and make `names` not perform a copy #1655

Decision: remove `_names` and make `names` not perform a copy #1655

bkamins commented Dec 26, 2018

pdeffebach commented Dec 26, 2018 •

edited

Loading

bkamins commented Dec 26, 2018

nalimilan commented Dec 26, 2018

bkamins commented Dec 26, 2018

bkamins commented Jan 5, 2019

nalimilan commented Jan 7, 2019

bkamins commented Jan 20, 2019

bkamins commented Jan 24, 2019

nalimilan commented Jan 25, 2019

bkamins commented Feb 6, 2019

nalimilan commented Feb 6, 2019

bkamins commented Feb 6, 2019

bkamins commented Sep 2, 2019

nalimilan commented Sep 3, 2019

bkamins commented Sep 3, 2019

nalimilan commented Sep 3, 2019

Decision: remove _names and make names not perform a copy #1655

Decision: remove _names and make names not perform a copy #1655

Comments

bkamins commented Dec 26, 2018

pdeffebach commented Dec 26, 2018 • edited Loading

bkamins commented Dec 26, 2018

nalimilan commented Dec 26, 2018

bkamins commented Dec 26, 2018

bkamins commented Jan 5, 2019

nalimilan commented Jan 7, 2019

bkamins commented Jan 20, 2019

bkamins commented Jan 24, 2019

nalimilan commented Jan 25, 2019

bkamins commented Feb 6, 2019

nalimilan commented Feb 6, 2019

bkamins commented Feb 6, 2019

bkamins commented Sep 2, 2019

nalimilan commented Sep 3, 2019

bkamins commented Sep 3, 2019

nalimilan commented Sep 3, 2019

Decision: remove `_names` and make `names` not perform a copy #1655

Decision: remove `_names` and make `names` not perform a copy #1655

pdeffebach commented Dec 26, 2018 •

edited

Loading