Add nthread arg to XGBoostRegressor, XGBoostCount and XGBoostClassifier #13

Merged (1 commit) on Sep 28, 2021

dmahasen
Contributor

Hi, I have added an nthread argument to XGBoostRegressor, XGBoostCount and XGBoostClassifier, so that the number of threads used can be limited.

Member

@ablaom ablaom left a comment

@dmahasen Thanks for this contribution!

Looks good to me and will merge now.

However, I didn't see any speed-up from increasing the number of threads on my machine. What have your observations been?

julia> mach.model.nthread
12

julia> @time fit!(mach, force=true, verbosity=-1)
  3.513411 seconds (104.47 k allocations: 6.427 MiB, 0.27% gc time)
Machine{XGBoostClassifier,} @287 trained 7 times; caches data
  args: 
    1:  Source @985`Table{AbstractVector{Continuous}}`
    2:  Source @642`AbstractVector{Multiclass{3}}`


julia> mach.model.nthread = 1
1

julia> @time fit!(mach, force=true, verbosity=-1)
  3.522144 seconds (104.47 k allocations: 6.427 MiB)
Machine{XGBoostClassifier,} @287 trained 8 times; caches data
  args: 
    1:  Source @985`Table{AbstractVector{Continuous}}`
    2:  Source @642`AbstractVector{Multiclass{3}}`

@ablaom ablaom merged commit 9401b34 into JuliaAI:master Sep 28, 2021
@ablaom ablaom mentioned this pull request Sep 28, 2021
@dmahasen
Contributor Author

dmahasen commented Sep 30, 2021

@ablaom, it is my pleasure to make this small contribution.

I do see a difference in fit time when I change the number of threads.

Here, X has 1000 observations with 100 features each.

julia> xgb_thread_2 = xgbReg(num_round=1000, nthread=2);

julia> m_xgb_thread_2 = machine(xgb_thread_2, X, y);

julia> @time fit!(m_xgb_thread_2, verbosity=-1);
  2.038276 seconds (39.83 k allocations: 4.895 MiB)

julia> xgb_thread_6 = xgbReg(num_round=1000, nthread=6);

julia> m_xgb_thread_6 = machine(xgb_thread_6, X, y);

julia> @time fit!(m_xgb_thread_6, verbosity=-1);
  0.881676 seconds (39.83 k allocations: 4.895 MiB)

My main requirement was to limit the number of cores used when fitting the model. Otherwise, fitting consumes all the cores on my machine.
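
For readers who want to reproduce this, here is a minimal, self-contained sketch of capping the thread count. It assumes MLJ and MLJXGBoostInterface are installed in the active environment. The synthetic data from make_regression, the "half of the logical threads" rule, and num_round=100 are illustrative choices, not part of this PR.

```julia
using MLJ

# Load the model type. Assumes MLJXGBoostInterface is installed.
XGBoostRegressor = @load XGBoostRegressor pkg=XGBoost

# Illustrative cap: half of the logical CPU threads, but at least one.
nthread = max(1, Sys.CPU_THREADS ÷ 2)

# Synthetic stand-in data: 1000 observations, 100 features.
X, y = make_regression(1000, 100)

model = XGBoostRegressor(num_round=100, nthread=nthread)
mach = machine(model, X, y)
fit!(mach, verbosity=-1)

println("fitted with nthread = ", mach.model.nthread)
```

The same nthread keyword should apply to XGBoostCount and XGBoostClassifier, since this PR adds it to all three models.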

@ablaom
Member

ablaom commented Sep 30, 2021

Ah, yes, of course! In tree methods the parallelism occurs across features, not observations. I had a lot of observations but few features in my test.
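
One way to check this explanation is to time the same model on a tall, narrow table and on a short, wide one. The sketch below is only an assumption-laden experiment, not a benchmark from this thread: the helper time_fit, the data shapes, the thread counts and num_round=100 are all hypothetical choices, and the timings will depend on the machine and on the XGBoost tree method in use.

```julia
using MLJ

# Load the model type. Assumes MLJXGBoostInterface is installed.
XGBoostRegressor = @load XGBoostRegressor pkg=XGBoost

# Hypothetical helper: seconds taken by one forced refit on synthetic
# data with n observations and p features, using `nthread` threads.
function time_fit(n, p, nthread)
    X, y = make_regression(n, p)
    model = XGBoostRegressor(num_round=100, nthread=nthread)
    mach = machine(model, X, y)
    fit!(mach, verbosity=-1)  # warm-up fit, so compilation is not timed
    return @elapsed fit!(mach, force=true, verbosity=-1)
end

# Tall and narrow (many observations, few features) versus
# short and wide (few observations, many features).
for (n, p) in ((100_000, 4), (1_000, 100)), nthread in (1, 4)
    t = time_fit(n, p, nthread)
    println("n = $n, p = $p, nthread = $nthread: $(round(t, digits=3)) s")
end
```

If the explanation holds, the wide table should benefit more from extra threads than the narrow one.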

Thanks again!
