Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include only required fields in data #195

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

tkf
Copy link
Contributor

@tkf tkf commented Oct 11, 2019

This PR makes the following snippet work

using ElectronDisplay
using VegaLite
[
    (
        x = randn(),
        y = randn(),
        dummy = randn(10^5),
    )
    for _ in 1:100
] |>
@vlplot(:point, x=:x, y=:y)

Before this PR, it hangs as described in davidanthoff/Electron.jl#37. While this is not the direct solution for davidanthoff/Electron.jl#37, I think this still is a nice feature to have as this can drastically reduce generated JSON sometimes. The only downside may be that you can't open the plot in Vega Editor and use different fields on the fly. I think we can add global and per-plot configuration for this if that's needed.

@codecov-io
Copy link

codecov-io commented Oct 11, 2019

Codecov Report

Merging #195 into master will increase coverage by 0.7%.
The diff coverage is 87.5%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #195     +/-   ##
=========================================
+ Coverage   84.61%   85.32%   +0.7%     
=========================================
  Files          12       12             
  Lines         416      436     +20     
=========================================
+ Hits          352      372     +20     
  Misses         64       64
Impacted Files Coverage Δ
src/rendering/show.jl 74.54% <66.66%> (ø) ⬆️
src/rendering/render.jl 7.4% <66.66%> (+7.4%) ⬆️
src/vlspec.jl 96.36% <94.44%> (-0.94%) ⬇️
src/spec_utils.jl 88.88% <0%> (+5.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cd3ca34...35bad05. Read the comment docs.

@davidanthoff
Copy link
Member

Yeah, this is a good idea!

Does it handle nested specs, and fields that appear in transforms and things like that? Essentially there are many places in a spec where a field name can appear, and I think we need to find all of them, right?

@tkf
Copy link
Contributor Author

tkf commented Oct 12, 2019

The code in this PR accumulates all values in the dicts with the key named field. This happens recursively (e.g., dict-in-dict, dict-in-vector-in-dict...):

VegaLite.jl/src/vlspec.jl

Lines 94 to 103 in 35bad05

push_field!(fields, _) = fields
push_field!(fields, xs::AbstractVector) = foldl(push_field!, xs; init=fields)
function push_field!(fields, dict::AbstractDict)
f = get(dict, "field", nothing)
f !== nothing && push!(fields, string(f))
for v in values(dict)
push_field!(fields, v)
end
return fields
end

except for data property which can contain anything (e.g., a Dataframe can have a column named field.)

I think this covers transforms. Is there other ways that fields can be referred to in vega-lite spec?

@davidanthoff
Copy link
Member

davidanthoff commented Oct 12, 2019

Hm, filter transforms for example can reference columns in their filter expression? Lookup can have something called fields that is a vector. And that is just a very cursory look at things, who knows what else there might be, I actually don't have a very good sense of the scope of vega-lite :)

@davidanthoff
Copy link
Member

I wonder whether there is some function in vega that one can pass a spec to and that spits out which columns are actually needed...

@tkf
Copy link
Contributor Author

tkf commented Oct 12, 2019

Oh, I wasn't aware of filtering expression. That's pretty cool but it makes this approach super difficult...

@tkf
Copy link
Contributor Author

tkf commented Oct 12, 2019

Maybe do this only when there is no transform? I think it'd cover a lot of cases.

@davidanthoff
Copy link
Member

I've asked on the vega-dev channel whether there is some function that might tell us all the columns that are needed, if that exists it would be an easy solution :)

And yes, if not, only doing this when there are no transforms might work. I'm just a bit nervous that there might be still other places that we are missing where they reference a field in some other way. But maybe we could just ask that on the vega-lite dev channel, they would know.

@tkf
Copy link
Contributor Author

tkf commented Oct 12, 2019

Thanks for asking this to the vega team!

@davidanthoff
Copy link
Member

I'm pretty optimistic that they might have something like that, after all they also support a scenario where you can have a SQL backend, and they ship a SQL query back to you that filters exactly the data they need, and I would think that they also need a list of columns they need for that...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants