Column-wise operations in Learning Networks #271
Good question. Answer is yes. You can think of a node as a proxy for data. Anything you can do to data you can do to a node that is a proxy for that kind of data, using `node`:

```julia
julia> X = DataFrame(rand(10,3))
10×3 DataFrame
│ Row │ x1       │ x2        │ x3         │
│     │ Float64  │ Float64   │ Float64    │
├─────┼──────────┼───────────┼────────────┤
│ 1   │ 0.518405 │ 0.869467  │ 0.676809   │
│ 2   │ 0.16694  │ 0.381958  │ 0.840221   │
│ 3   │ 0.832859 │ 0.285973  │ 0.166679   │
│ 4   │ 0.32379  │ 0.183428  │ 0.00638633 │
│ 5   │ 0.24599  │ 0.282592  │ 0.539567   │
│ 6   │ 0.980281 │ 0.945239  │ 0.267164   │
│ 7   │ 0.544812 │ 0.0105223 │ 0.480417   │
│ 8   │ 0.74989  │ 0.444539  │ 0.64379    │
│ 9   │ 0.041312 │ 0.606745  │ 0.100776   │
│ 10  │ 0.119828 │ 0.930362  │ 0.419426   │

julia> Xs = source(X)
Source{:input} @ 3…09

julia> Y = node(df -> df[:, [:x1, :x2]], Xs) # pick off multiple columns
Node @ 1…84 = #15(3…09)

julia> Y()
10×2 DataFrame
│ Row │ x1       │ x2        │
│     │ Float64  │ Float64   │
├─────┼──────────┼───────────┤
│ 1   │ 0.518405 │ 0.869467  │
│ 2   │ 0.16694  │ 0.381958  │
│ 3   │ 0.832859 │ 0.285973  │
│ 4   │ 0.32379  │ 0.183428  │
│ 5   │ 0.24599  │ 0.282592  │
│ 6   │ 0.980281 │ 0.945239  │
│ 7   │ 0.544812 │ 0.0105223 │
│ 8   │ 0.74989  │ 0.444539  │
│ 9   │ 0.041312 │ 0.606745  │
│ 10  │ 0.119828 │ 0.930362  │

julia> y = node(df -> df.x3, Xs) # pick off a single column
Node @ 1…50 = #17(3…09)

julia> y()
10-element Array{Float64,1}:
 0.6768089324906859
 0.8402212806559988
 0.16667879796758434
 0.0063863346857631065
 0.5395665542748085
 0.2671635276250435
 0.48041697446815457
 0.6437900249878024
 0.10077583630960008
 0.41942618943917687
```

To do the same thing for generic tables (which include DataFrames) you could use the built-in `selectcols`:

```julia
julia> Y = node(table -> selectcols(table, [:x1, :x2]), Xs)
Node @ 8…98 = #21(3…09)

julia> Y()
10×2 DataFrame
│ Row │ x1       │ x2        │
│     │ Float64  │ Float64   │
├─────┼──────────┼───────────┤
│ 1   │ 0.518405 │ 0.869467  │
│ 2   │ 0.16694  │ 0.381958  │
│ 3   │ 0.832859 │ 0.285973  │
│ 4   │ 0.32379  │ 0.183428  │
│ 5   │ 0.24599  │ 0.282592  │
│ 6   │ 0.980281 │ 0.945239  │
│ 7   │ 0.544812 │ 0.0105223 │
│ 8   │ 0.74989  │ 0.444539  │
│ 9   │ 0.041312 │ 0.606745  │
│ 10  │ 0.119828 │ 0.930362  │
```

(Maybe we should provide a version of `selectcols` that acts on nodes directly.)

To merge columns, you can do things like this:

```julia
julia> t(v1, v2) = (a=v1, b=v2) # define a function to put two vectors into a column table
t (generic function with 1 method)

julia> t(rand(3), [1, 2, 3])
(a = [0.0223417, 0.723267, 0.511597],
 b = [1, 2, 3],)

julia> as = source(rand(3))
Source{:input} @ 1…65

julia> bs = source([1,2,3])
Source{:input} @ 2…88

julia> X = node(t, as, bs)
┌ Warning: A node referencing multiple origins when called has been defined:
│ MLJ.Source{:input}[Source{:input} @ 1…65, Source{:input} @ 2…88].
└ @ MLJ ~/Dropbox/Julia7/MLJ/MLJ/src/networks.jl:197
Node @ 1…00 = t(1…65, 2…88)

julia> X()
(a = [0.750533, 0.4004, 0.808384],
 b = [1, 2, 3],)
```

(You can ignore the warning here.) Does this answer your question?
Thank you very much for your help, greatly appreciated!
@ablaom sorry for the repeated questions, but I have another one. Let's say I want to first pre-train and then fine-tune a neural net. Thus, in the training phase of the learning network, the training data is split into 'pre' and 'fine'. Model 'NNpre' is trained and the weights are saved. Then model 'NNfine' is trained with initial weights equal to the previously saved weights. All of the above are shown in the diagram below. The problem I have is the following: if I do not include any 'predict' nodes after 'NNpre' or 'NNfine', they are simply not trained, because their respective nodal machines do not lead to the final node. However, if I do include 'predict' nodes, the test data will also be split into 'pre' and 'fine' (?). The general question is whether there is a way to separate the flow of information between the training and predicting phases. Could, for example, some nodes be activated only during training and not during prediction, and vice versa? Thanks again!
@petrostat13 Thanks for the interesting use-case. I don't immediately see a way for the existing API to wrap everything you want in a single composite model. As I understand it, what you want to do is update the learned parameters of some model (a self-tuning neural network) based on new data. That is, you want online learning, which isn't currently supported, but hopefully will be at some point (#60).
Re the original comment:

```julia
julia> using DataFrames, MLJ

julia> X = source(DataFrame(rand(2,3)))
Source{:input} @ 6…49

julia> X()
2×3 DataFrame
│ Row │ x1       │ x2       │ x3       │
│     │ Float64  │ Float64  │ Float64  │
├─────┼──────────┼──────────┼──────────┤
│ 1   │ 0.23562  │ 0.914597 │ 0.414423 │
│ 2   │ 0.529633 │ 0.280383 │ 0.641822 │

julia> v = selectcols(X, :x1)
Node @ 6…70 = #40(6…49)

julia> v()
2-element Array{Float64,1}:
 0.2356195529795364
 0.5296331600177344

julia> Xsmall = selectcols(X, 1:2)
Node @ 1…11 = #40(6…49)

julia> Xsmall()
2×2 DataFrame
│ Row │ x1       │ x2       │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.23562  │ 0.914597 │
│ 2   │ 0.529633 │ 0.280383 │
```
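For generic tables stored as named tuples of column vectors (the simplest Tables.jl-style column table), the selection and merging behavior shown above reduces to plain named-tuple manipulation. A minimal Base-Julia sketch, with no MLJ required — the helper name `pickcols` is made up here and is not part of MLJ:

```julia
# Select a subset of columns from a named-tuple column table.
# Constructs a new NamedTuple keyed by the requested names.
pickcols(t::NamedTuple, names::Tuple) =
    NamedTuple{names}(map(n -> getfield(t, n), names))

t = (x1 = [0.1, 0.2], x2 = [0.3, 0.4], x3 = [1, 2])

small  = pickcols(t, (:x1, :x2))     # drop column :x3
merged = merge(small, (x3 = t.x3,))  # splice a column back in
```

This is roughly what `selectcols` does for this table type, modulo MLJ's dispatch over the general Tables.jl interface.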
Is there a way to perform column-wise operations on the nodes of a learning network?
One can use FeatureSelector to manipulate the input source, but I couldn't do the same with the target source.
For example, let's say I want to train two models on the same input data but with different target variables. So if I define the target source as a two-column array, how do I separate the columns inside the network? Or is there a way to merge specific columns of different nodes into a new node? In general, is there a way to manipulate the columns of nodes in such a network?
Thank you!