can i summarize a datasource? #83

Closed
floswald opened this issue Feb 2, 2017 · 2 comments


floswald commented Feb 2, 2017

Hi!

Sorry if this is a dumb question, but I would like to do this:

julia> data = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> x = @from i in data begin
           @select mean(i)
           @collect
       end
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

when really I wanted mean(data). Is there a way to do this? I guess I'm looking for something like @transform or @summarize. Thanks!

@davidanthoff
Member

There is no elegant solution at this point, but I hope to add something like @summarize down the road (#84).

For now, here is what you can do. If you want to summarize the whole query, i.e. not based on some grouping, then you can just do this by hand after the query has run:

df = DataFrame(a=[1,2,3], b=[4,5,6])
x = @from i in df begin
    @select i
end
x2 = mean(collect(@select(x,i->i.a)))

This query is slightly more involved than yours because it uses columns. If your query just returned a list of Ints, you could skip the @select call in the last line entirely. The collect in the last line is annoying, but it is required for now (#85).
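
To illustrate why the original query returned three values instead of one, here is a minimal sketch in plain Julia, with no Query.jl involved. It assumes a Julia version where `mean` lives in the `Statistics` standard library, and the names `per_element`, `overall`, `rows` and `mean_a` are made up for illustration. The idea is that a per-element projection behaves like `map`, so the aggregate has to be computed once on the collected values.

```julia
using Statistics  # assumption: Julia 0.7 or later, where mean lives in Statistics

data = [1, 2, 3]

# Element-wise: mean is called once per element, much like `@select mean(i)`
per_element = map(i -> mean(i), data)

# Aggregate: mean is called once on the whole collection
overall = mean(data)

# Column case: pull one field out of each row first, then aggregate
# (hypothetical rows standing in for the DataFrame rows above)
rows = [(a = 1, b = 4), (a = 2, b = 5), (a = 3, b = 6)]
mean_a = mean(map(r -> r.a, rows))
```

Here `per_element` keeps one entry per input, while `overall` and `mean_a` are single numbers, which is the "summarize by hand after the query has run" step described above.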


floswald commented Nov 2, 2017

closed:

df = DataFrame(name=repeat(["John", "Sally", "Kirk"], inner=[1], outer=[2]),
               age=vcat([10., 20., 30.], [10., 20., 30.] .+ 3),
               children=repeat([3, 2, 2], inner=[1], outer=[2]),
               state=[:a, :a, :a, :b, :b, :b])

x = @from i in df begin
    @group i by i.state into g
    @select {group=g.key, mage=mean(g..age), oldest=maximum(g..age), youngest=minimum(g..age)}
    @collect DataFrame
end
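
For readers who want to check what the grouped query above computes, here is a rough plain-Julia sketch of the same group-then-summarize idea. It is not how Query.jl implements grouping. The `rows` vector is a hypothetical stand-in for the `state` and `age` columns, and `ages_by_state` and `summary_rows` are illustrative names.

```julia
using Statistics  # assumption: Julia 0.7 or later

# Hypothetical rows mirroring the state and age columns of the DataFrame above
rows = [(state = :a, age = 10.0), (state = :a, age = 20.0), (state = :a, age = 30.0),
        (state = :b, age = 13.0), (state = :b, age = 23.0), (state = :b, age = 33.0)]

# Collect the ages belonging to each state
ages_by_state = Dict{Symbol,Vector{Float64}}()
for r in rows
    push!(get!(ages_by_state, r.state, Float64[]), r.age)
end

# Build one summary row per group, ordered by group key
summary_rows = [(group = k, mage = mean(v), oldest = maximum(v), youngest = minimum(v))
                for (k, v) in sort(collect(ages_by_state); by = first)]
```

Each entry of `summary_rows` holds the group key plus the mean, maximum and minimum age for that state, matching the fields named in the `@select` clause.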

floswald closed this as completed Nov 2, 2017