-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Fix #469] Print log progress on import rake task #787
[Fix #469] Print log progress on import rake task #787
Conversation
55a36dc
to
6455a0b
Compare
209859d
to
2c00e1b
Compare
lib/chewy/type/adapter/object.rb
Outdated
args.flatten(1).compact | ||
end | ||
|
||
collection.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't it be a "bit" inefficient for huge data sources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually is hard to tell the advantages/disadvantages, because for huge data sources it could be particularly more interesting having a progress bar instead of just waiting without no feedback.
But yes, counting could take some time... But at least this is only calculated if someone asks for a progress bar.
WDYT? Keep as this or do you have other suggestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, somebody will feel that calculations take too much time and then will turn off the progressbar.
Or is it possible and does it makes sence to count the seconds of collection.count
executing? And we can stop the process in case of too long duration (like 1 seconds) and pass (for example) 1
as resulted value. So progressbar.title
will be set to Too many elements to output progressbar
.
In this case we will see just this title without progressbar line in the console.
I have added code like this here:
https://github.com/toptal/chewy/blob/print-log-progress-on-import-rake-task/lib/chewy/type/import.rb#L159-L168
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am totally ok with any of those approaches:
- Using the progress bar not as default so coping with the possibility of this performance cost happens with some kind of consent
- Changing this to a simple count of elements imported, of course handling the concurrency edge cases
- Having progress bar or count depending on env parameters, having simple count as default
Since @barthez doesn't seem to much fan of this introduced extra query for count, I guess the most promising of those 3 points are the 2nd and 3rd.
Between those 2 last, I prefer the 2nd due to its simplicity and it would not add any new dependency to the project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bartek suggested to use thread to increment progressbar.total
eventually during parallel import executing:
progressbar = ProgressBar.create total: nil if routine.options[:progressbar] == 'true'
Thread.new { progressbar.total = adapter.import_count(objects) if progressbar }
117b75f
lib/chewy/type/import.rb
Outdated
adapter.import(*objects, routine.options) do |action_objects| | ||
routine.process(**action_objects) | ||
progressbar.progress += action_objects.values.flatten.length if routine.options[:progressbar] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe? flatten
is bad, maybe flatten(1)
instead?
progressbar.progress += action_objects.values.flatten.length if routine.options[:progressbar] | |
progressbar.progress += action_objects.map { _, v| v.size }.sum if routine.options[:progressbar] |
If you chose to apply it, do it in other places too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I applyed this change.
Is this approach better because it is more explicit? @barthez
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just asked for a few tests, if it doesn't bother you much
lib/chewy/type/adapter/object.rb
Outdated
args.flatten(1).compact | ||
end | ||
|
||
collection.count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually is hard to tell the advantages/disadvantages, because for huge data sources it could be particularly more interesting having a progress bar instead of just waiting without no feedback.
But yes, counting could take some time... But at least this is only calculated if someone asks for a progress bar.
WDYT? Keep as this or do you have other suggestion?
lib/chewy/type/import.rb
Outdated
@@ -150,8 +151,10 @@ def empty_objects_or_scope?(objects_or_scope) | |||
|
|||
def import_linear(objects, routine) | |||
ActiveSupport::Notifications.instrument 'import_objects.chewy', type: self do |payload| | |||
progressbar = ProgressBar.create total: adapter.import_count(objects) if routine.options[:progressbar] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add a test case that mocks this progressbar object and make sure it has its progress incremented many times as expected
3a7b5ab
to
d141602
Compare
76670b0
to
057142f
Compare
057142f
to
0b3a054
Compare
…ring import" This reverts commit 9e30ae1.
@barthez Bartek, please take a look again on the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Just added a few minor comments
@@ -1,6 +1,6 @@ | |||
require 'database_cleaner' | |||
|
|||
ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: 'file::memory:?cache=shared', pool: 10) | |||
ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: 'file::memory:?cache=shared', pool: 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be interesting to explain here that test suite doesn't run properly with more than one database connection, otherwise it does not find tables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think merge was a little bit hasty.
Mentioning the feature in the README would be nice.
Why did we switch import parallelization from a process-based to a thread-based one?
@@ -33,6 +33,9 @@ | |||
* in rake tasks output (e.g. `Imported CitiesIndex::City in 1s, stats: index 3` -> `Imported CitiesIndex in 1s, stats: index 3`) | |||
* Use index name instead of type name in loader additional scope | |||
* e.g. `CitiesIndex.filter(...).load(city: {scope: City.where(...)})` -> `CitiesIndex.filter(...).load(cities: {scope: City.where(...)})` | |||
* [#469](https://github.com/toptal/chewy/issues/469): Add ability to output progressbar with `ENV['PROGRESS']` during `reset` rake tasks ([@Vitalina-Vakulchyk][]): | |||
* for `rake chewy:reset` and `rake chewy:parallel:reset` | |||
* progressbar is hidden by default, set `ENV['PROGRESS']` to `true` to display it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been in the unreleased
section, between lines 7 and 9. Or probably better in the new features sections (5-7).
@rabotyaga Because variables are protected from change for processes. I took the information from Readme of |
@rabotyaga Thank you |
…nce degradation in parallel import (toptal#800)
Allow printing of import progress during
rake chewy:reset
andrake chewy:parallel:reset
.[Fix #issue-number]
(if the related issue exists).master
(if not - rebase it).