Parallelism in Learning Networks #739

Open
olivierlabayle opened this issue Feb 10, 2022 · 7 comments
Labels: brainstorm, enhancement

Comments

@olivierlabayle (Collaborator) commented Feb 10, 2022

Hi,

I would be quite keen on having parallel training for learning networks.

I have seen that there may be a plan to use Dagger.jl (in this issue, for instance) and wanted to know whether that is still in scope.

On a much less ambitious note, I have played around a bit, and it seems that a small change along the lines sketched below enables multithreaded fitting. One downside is that the acceleration cannot be passed to the fit! function; the user has to call default_resource(CPUThreads()) instead. The good news is that it seems quick and easy to provide this additional feature. I am by no means knowledgeable about parallel computing, so if there is anything wrong with it, please let me know. For instance, I am not even sure how to test such an implementation.
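
A minimal sketch of the idea (fit_levels! and the grouping of a network's machines into dependency levels are hypothetical, not existing MLJBase API):

using MLJBase

# Hypothetical helper: fit the machines of a learning network level by
# level, spawning one task per machine within a level. Machines in the
# same level are assumed to have no mutual dependencies, so they can
# safely be fitted concurrently.
function fit_levels!(levels; verbosity=0)
    for level in levels        # levels: a vector of vectors of machines
        tasks = [Threads.@spawn fit!(m; verbosity=verbosity) for m in level]
        foreach(wait, tasks)   # synchronize before starting the next level
    end
end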

olivierlabayle added the brainstorm and enhancement labels on Feb 10, 2022
@ablaom (Member) commented Feb 10, 2022

Thanks @olivierlabayle for looking into yet another interesting area for enhancement. It's exciting to hear you may be interested in helping out here!

Re the multithreading: thanks for this POC! What you suggest is precisely what I had in mind when I carried out a big refactor of learning network training some time ago to make training "asynchronous". However, I think we should enlist the guidance of someone strong in this area, as it's easy to implement multithreading in unsafe ways, which is why I stopped short. I have already mentioned this project to @OkonSamuel, who would be ideal, but he is quite busy just now.

As you note, there is also the issue of a user interface point for acceleration. I had in mind allowing the user to optionally add this as a hyper-parameter of her composite model (that is, the model implemented using learning networks). Then the return! method, which you call at the end of the fit! method wrapping the learning network, is modified to inspect model.acceleration (if defined) and pass that on to the subsequent fit! call on the glb node of the network; then something similar for the update fallback for composites. But perhaps you have a better idea...
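
For concreteness, a rough sketch of what I have in mind (names are hypothetical, and return! does not yet inspect any acceleration field):

using MLJBase

# Hypothetical composite: `acceleration` is just another hyper-parameter
mutable struct MyComposite <: DeterministicComposite
    atom₁
    atom₂
    acceleration   # e.g. CPU1() or CPUThreads()
end

function MLJBase.fit(model::MyComposite, verbosity, X, y)
    Xs, ys = source(X), source(y)
    mach₁ = machine(model.atom₁, Xs, ys)
    mach₂ = machine(model.atom₂, Xs, ys)
    yhat = 0.5 * (predict(mach₁, Xs) + predict(mach₂, Xs))
    network_mach = machine(Deterministic(), Xs, ys; predict=yhat)
    # proposed change: return! inspects model.acceleration (if defined)
    # and passes it on to the fit! call on the glb node of the network
    return!(network_mach, model, verbosity)
end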

Re the distributed computing: we actually had the maintainer of Dagger.jl, @jpsamaroo, look into this. However, this was before the big refactor, making it an ambitious undertaking at the time, and it was ultimately unsuccessful. Perhaps he would be willing to revisit it given the refactoring, especially if you are available to slog out some of the details.

Note that for both multithreading and distributed computing, the existing testing is already asynchronous. That is, we have tests that check that various parts of the network do what they are expected to do, but we do not insist that nodes running in parallel execute in any particular order. What will be important is to add tests showing that outcomes are independent of the acceleration mode. And we already have the @test_accelerated macro (courtesy of @jpsamaroo) to repeat tests over multiple modes.

@ablaom (Member) commented Feb 10, 2022

BTW, if you are interested in pushing the POC a little further, you could already do some testing to check that multithreaded stacks and CPU1 stacks give the same answers (assuming no RNGs!).
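
Something like this, say (a sketch; stack_serial and stack_threaded stand for two otherwise identical stacks differing only in their acceleration setting):

using Test

mach_serial   = machine(stack_serial, X, y)
mach_threaded = machine(stack_threaded, X, y)
fit!(mach_serial)
fit!(mach_threaded)

# outcomes should not depend on the acceleration mode
@test predict(mach_serial, X) ≈ predict(mach_threaded, X)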

@jpsamaroo commented

Thanks for the ping @ablaom!

Dagger has grown in capability and scope since I first tried to implement parallel training, and I also have a much better handle on how to safely do distributed programming in Julia. However, it's also been a while since I've looked at MLJ, so I'd definitely need to spend time wrapping my head around how models are initialized and how data moves around.

It's possible that we can do less implementation work in MLJ and instead allow a user to wrap MLJ calls with Dagger API calls (like Dagger.@spawn and Dagger.@mutable). We've had success with this approach with Flux.jl (https://github.com/DhairyaLGandhi/DaggerFlux.jl), so I suspect it should be possible for MLJ as well. It might require some work on the user's side to decide where models should reside (since the APIs don't currently provide a great way to distribute some set of arbitrary data uniformly, but we can figure that out).
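
For instance, something like the following (an untested sketch; since machines mutate when fitted, Dagger.@mutable or careful ownership handling would likely be needed in practice):

using Dagger, MLJBase

mach = machine(model, X, y)
t_fit  = Dagger.@spawn fit!(mach)            # schedule training as a thunk
t_pred = Dagger.@spawn predict(t_fit, Xnew)  # runs only after t_fit completes
yhat = fetch(t_pred)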

Anyway, I don't have much time to do this right now, but if anyone wants to give this a shot without waiting on me, I'd be happy to help provide guidance! Just ping me if you run into trouble or have questions, or file issues on the Dagger repo.

@ablaom (Member) commented Feb 11, 2022

@jpsamaroo Thanks for the quick response and update. Things look promising. I suggest that, if and when @olivierlabayle is ready to look at the distributed case, we all have a call to get your best advice.

@ablaom (Member) commented Feb 11, 2022

For the record, here is the original, open (but quite stale) issue for adding distributed computing via Dagger: JuliaAI/MLJ.jl#72

@olivierlabayle (Collaborator, Author) commented

@ablaom @jpsamaroo Thanks for your instructive replies! You seem to make a distinction between the multithreaded and the distributed versions. From the README, I had the impression that Dagger actually abstracts this away: "It can run computations represented as DAGs efficiently on many Julia worker processes and threads, as well as GPUs". Is that not correct (I have never used Dagger before)? I initially thought we could represent learning networks as Dagger DAGs, or something like that, and then we would be "done".

Given my current personal workload and deadlines, in the near future I could only envision pushing the current POC, if that is deemed useful. However, if Dagger does indeed abstract the resource representation away, and nobody else has taken up the subject in the meantime, I would be very happy to give it a try in a few months, when things are a bit quieter for me.

@olivierlabayle (Collaborator, Author) commented

I have been playing around a bit with Dagger.jl today, and I think the following represents some kind of proof of concept that, in theory, it could work! The main caveat is that I don't currently see how to take this approach without breaking everything.

My idea is as follows:

Create a @composite macro that a user would place before a composite model definition, as in userdefined below, with some @register statements to declare OPERATIONS and additional reports. This macro will generate a bunch of methods for the particular composite machine's model:

  1. A fit! method, for fitting and retrieving additional reports.
  2. A method for each registered operation in OPERATIONS, like predict below.

As you can see, I have also played with a composite of composites, to check whether that runs fine too.

At first glance, the repercussions I see are:

  • MLJBase.Nodes are replaced by Dagger.EagerThunks.
  • Each thunk is executed exactly once, contrary to the current design, where nodes are potentially called multiple times.
  • I have also tried to introduce the definition of the network at the machine level, to be able to forward the machine's cache (see: Control caching of composite models #756). I hope it does not obscure the POC too much.

Of course, I haven't really done anything here, since all the complexity will lie in the macro, which is not provided. I just wanted to have your opinion before moving forward, since this represents a big piece of work and is potentially not in line with your perspectives for MLJBase.

Happy to discuss in more detail over a call!

using Pkg
Pkg.activate(".")
using Dagger
using DataFrames
using MLJBase
using MLJLinearModels
using Statistics   # for mean, var

struct MyModel <: MLJBase.Model
    model₁
    model₂
end

"""
We can define a macro, as is done in most probabilistic programming languages.
The user could define something like this:

@composite function userdefined(mach::Machine{MyModel, C}) where C
    X₁, X₂, y = mach.args()

    mach₁ = machine(mach.model.model₁, X₁, y, cache=C)
    y₁ = predict(mach₁)

    mach₂ = machine(mach.model.model₂, X₂, y, cache=C)
    y₂ = predict(mach₂)

    @register ypred = (y₁ + y₂) ./ 2, :predict
    @register mean_ = mean(ypred), :mean
    @register var_ = var(ypred), :var
end

"""

"""
From the previous chunk of code we can generate a fit method from the
computational graph that would result here in:
"""
function MLJBase.fit!(mach::Machine{MyModel, C}; verbosity=0) where C
    X₁, y = (src() for src in mach.args)

    # Each sub-machine is fitted in its own thunk; Dagger resolves the
    # dependencies between fitting and predicting automatically.
    mach₁ = machine(mach.model.model₁, X₁, y, cache=C)
    mach₁ = Dagger.spawn(m -> fit!(m, verbosity=verbosity), mach₁)
    y₁ = Dagger.spawn(predict, mach₁, X₁)

    mach₂ = machine(mach.model.model₂, X₁, y, cache=C)
    mach₂ = Dagger.spawn(m -> fit!(m, verbosity=verbosity), mach₂)
    y₂ = Dagger.spawn(predict, mach₂, X₁)

    # average the two predictions, as registered in the example above
    ypred = Dagger.spawn((a, b) -> (a .+ b) ./ 2, y₁, y₂)
    mean_ = Dagger.spawn(mean, ypred)

    # Encapsulate in a return!
    mach.fitresult = (machines=[fetch(mach₁), fetch(mach₂)], mean=fetch(mean_))
    return mach
end


"""
For each registered operation in OPERATIONS, generate the corresponding method from the graph.
Here only X₁ and X₂ are required for the prediction so only them are included in the signature.
"""
function MLJBase.predict(mach::Machine{MyModel,}, X₁)
    mach₁ = mach.fitresult.machines[1]
    y₁ = Dagger.spawn(predict, mach₁, X₁)

    mach₂ = mach.fitresult.machines[2]
    y₂ = Dagger.spawn(predict, mach₂, X₁)

    ypred = Dagger.spawn(+, y₁, y₂)

    return fetch(ypred)
end

###### Data
n = 1000
X₁ = MLJBase.table(rand(n, 3))
y = rand(n)
C = false

###### Machine
mymodel = MyModel(LinearRegressor(), RidgeRegressor(lambda=1))
mach = machine(mymodel, X₁, y, cache=C)
fit!(mach, verbosity=1)
predict(mach, X₁)


###### Composite of composite: hangs forever

newmodel = MyModel(mymodel, LinearRegressor())
mach = machine(newmodel, X₁, y, cache=C)
fit!(mach, verbosity=1)
