Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fix #469] Print log progress on import rake task #787

Merged
merged 19 commits into from
Apr 23, 2021

Conversation

Vitalina-Vakulchyk
Copy link
Contributor

@Vitalina-Vakulchyk Vitalina-Vakulchyk commented Apr 15, 2021

Allow printing of import progress during rake chewy:reset and rake chewy:parallel:reset.

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Updated tests.
  • Added an entry to the changelog if the new code introduces user-observable changes. See changelog entry format for details.

@Vitalina-Vakulchyk Vitalina-Vakulchyk force-pushed the print-log-progress-on-import-rake-task branch from 55a36dc to 6455a0b Compare April 16, 2021 11:32
@Vitalina-Vakulchyk Vitalina-Vakulchyk force-pushed the print-log-progress-on-import-rake-task branch 2 times, most recently from 209859d to 2c00e1b Compare April 16, 2021 12:07
args.flatten(1).compact
end

collection.count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it be a "bit" inefficient for huge data sources?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually is hard to tell the advantages/disadvantages, because for huge data sources it could be particularly more interesting having a progress bar instead of just waiting without no feedback.

But yes, counting could take some time... But at least this is only calculated if someone asks for a progress bar.

WDYT? Keep as this or do you have other suggestion?

Copy link
Contributor Author

@Vitalina-Vakulchyk Vitalina-Vakulchyk Apr 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, somebody will feel that calculations take too much time and then will turn off the progressbar.

Or is it possible and does it makes sence to count the seconds of collection.count executing? And we can stop the process in case of too long duration (like 1 seconds) and pass (for example) 1 as resulted value. So progressbar.title will be set to Too many elements to output progressbar.

In this case we will see just this title without progressbar line in the console.

I have added code like this here:
https://github.com/toptal/chewy/blob/print-log-progress-on-import-rake-task/lib/chewy/type/import.rb#L159-L168

@barthez @dalthon

Copy link
Contributor Author

@Vitalina-Vakulchyk Vitalina-Vakulchyk Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@barthez
Should I change the approach and just output the count of imported records to avoid collection.count? Or any other way?

And I can try to use Mutex to synchronize the count of records to output during parallel import.
@dalthon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am totally ok with any of those approaches:

  • Using the progress bar not as default so coping with the possibility of this performance cost happens with some kind of consent
  • Changing this to a simple count of elements imported, of course handling the concurrency edge cases
  • Having progress bar or count depending on env parameters, having simple count as default

Since @barthez doesn't seem to much fan of this introduced extra query for count, I guess the most promising of those 3 points are the 2nd and 3rd.

Between those 2 last, I prefer the 2nd due to its simplicity and it would not add any new dependency to the project

Copy link
Contributor Author

@Vitalina-Vakulchyk Vitalina-Vakulchyk Apr 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bartek suggested to use thread to increment progressbar.total eventually during parallel import executing:
progressbar = ProgressBar.create total: nil if routine.options[:progressbar] == 'true'
Thread.new { progressbar.total = adapter.import_count(objects) if progressbar }
117b75f

adapter.import(*objects, routine.options) do |action_objects|
routine.process(**action_objects)
progressbar.progress += action_objects.values.flatten.length if routine.options[:progressbar]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe? flatten is bad, maybe flatten(1) instead?

Suggested change
progressbar.progress += action_objects.values.flatten.length if routine.options[:progressbar]
progressbar.progress += action_objects.map { _, v| v.size }.sum if routine.options[:progressbar]

If you chose to apply it, do it in other places too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I applyed this change.

Is this approach better because it is more explicit? @barthez

Copy link
Contributor

@dalthon dalthon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just asked for a few tests, if it doesn't bother you much

args.flatten(1).compact
end

collection.count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually is hard to tell the advantages/disadvantages, because for huge data sources it could be particularly more interesting having a progress bar instead of just waiting without no feedback.

But yes, counting could take some time... But at least this is only calculated if someone asks for a progress bar.

WDYT? Keep as this or do you have other suggestion?

@@ -150,8 +151,10 @@ def empty_objects_or_scope?(objects_or_scope)

def import_linear(objects, routine)
ActiveSupport::Notifications.instrument 'import_objects.chewy', type: self do |payload|
progressbar = ProgressBar.create total: adapter.import_count(objects) if routine.options[:progressbar]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a test case that mocks this progressbar object and make sure it has its progress incremented many times as expected

lib/chewy/rake_helper.rb Outdated Show resolved Hide resolved
@Vitalina-Vakulchyk Vitalina-Vakulchyk force-pushed the print-log-progress-on-import-rake-task branch 2 times, most recently from 3a7b5ab to d141602 Compare April 21, 2021 11:15
@Vitalina-Vakulchyk Vitalina-Vakulchyk force-pushed the print-log-progress-on-import-rake-task branch from 76670b0 to 057142f Compare April 22, 2021 08:18
@Vitalina-Vakulchyk Vitalina-Vakulchyk force-pushed the print-log-progress-on-import-rake-task branch from 057142f to 0b3a054 Compare April 22, 2021 08:56
@Vitalina-Vakulchyk
Copy link
Contributor Author

@barthez Bartek, please take a look again on the PR.

Copy link
Contributor

@dalthon dalthon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Just added a few minor comments

Gemfile Outdated Show resolved Hide resolved
gemfiles/rails.6.0.activerecord.gemfile Outdated Show resolved Hide resolved
@@ -1,6 +1,6 @@
require 'database_cleaner'

ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: 'file::memory:?cache=shared', pool: 10)
ActiveRecord::Base.establish_connection(adapter: 'sqlite3', database: 'file::memory:?cache=shared', pool: 1)
Copy link
Contributor

@dalthon dalthon Apr 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be interesting to explain here that test suite doesn't run properly with more than one database connection, otherwise it does not find tables

@Vitalina-Vakulchyk Vitalina-Vakulchyk merged commit cb7e4c4 into master Apr 23, 2021
@Vitalina-Vakulchyk Vitalina-Vakulchyk deleted the print-log-progress-on-import-rake-task branch April 23, 2021 10:02
Copy link
Contributor

@rabotyaga rabotyaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think merge was a little bit hasty.
Mentioning the feature in the README would be nice.
Why did we switch import parallelization from a process-based to a thread-based one?

@@ -33,6 +33,9 @@
* in rake tasks output (e.g. `Imported CitiesIndex::City in 1s, stats: index 3` -> `Imported CitiesIndex in 1s, stats: index 3`)
* Use index name instead of type name in loader additional scope
* e.g. `CitiesIndex.filter(...).load(city: {scope: City.where(...)})` -> `CitiesIndex.filter(...).load(cities: {scope: City.where(...)})`
* [#469](https://github.com/toptal/chewy/issues/469): Add ability to output progressbar with `ENV['PROGRESS']` during `reset` rake tasks ([@Vitalina-Vakulchyk][]):
* for `rake chewy:reset` and `rake chewy:parallel:reset`
* progressbar is hidden by default, set `ENV['PROGRESS']` to `true` to display it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have been in the unreleased section, between lines 7 and 9. Or probably better in the new features sections (5-7).

@Vitalina-Vakulchyk
Copy link
Contributor Author

Vitalina-Vakulchyk commented Apr 23, 2021

@rabotyaga Because variables are protected from change for processes. I took the information from Readme of Parallel gem
https://github.com/grosser/parallel

@Vitalina-Vakulchyk
Copy link
Contributor Author

@rabotyaga Thank you
I will create a PR to update CHANGELOG and Readme files.

rabotyaga pushed a commit that referenced this pull request May 24, 2021
rabotyaga pushed a commit that referenced this pull request May 24, 2021
rabotyaga pushed a commit that referenced this pull request May 24, 2021
rabotyaga pushed a commit that referenced this pull request May 24, 2021
cyucelen pushed a commit to cyucelen/chewy that referenced this pull request Jan 28, 2023
cyucelen pushed a commit to cyucelen/chewy that referenced this pull request Jan 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants