Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use joblib to parallelize updates #37

Merged
merged 4 commits into from
Sep 15, 2023
Merged

Use joblib to parallelize updates #37

merged 4 commits into from
Sep 15, 2023

Conversation

dwijn
Copy link
Contributor

@dwijn dwijn commented Jul 31, 2023

Modifications to parallelize updates, see #36.

@yymao
Copy link
Owner

yymao commented Aug 1, 2023

Thank you for the PR @dwijn! The code looks great but I haven't got time to test it yet. I'll approve once I have a chance to run it.

@dwijn
Copy link
Contributor Author

dwijn commented Sep 5, 2023

Any progress on this?

@yymao
Copy link
Owner

yymao commented Sep 10, 2023

@dwijn sorry about the delay. I tried running it but immediately hit a bug and haven't gotten a chance to debug. So if I have a tex file with citations but no bib file yet (or just an empty bib file), and I run adstex main.tex. This new version will immediately say "Done!" without doing any searches. The original version will correctly loop through all the citations.

I haven't had a chance to look any deeper and plan to return to this when I have more time. But if you happen to have a chance to check, that would be great too.

parallelize updates
Bugfix
Add short argument for parallel processing
@dwijn
Copy link
Contributor Author

dwijn commented Sep 11, 2023

Dumb bug, fixed it. I also added a short option for parallel updates (-P).

@syrte
Copy link
Contributor

syrte commented Sep 12, 2023

Nice work, the parallelized adstex would be very helpful.

I'd like to raise two questions for potential discussion:

  1. Just curious if it would be simpler to have a single argument "-N" instead of two args "-P" and "-N".
    We can use threads=1 for a single thread and threads>1 to enable parallel.
  2. Not sure if we want to set parallel by default or leave it optional.
    I'd slightly tend to make a single thread as the default, so keeping consistency with the original behavior.

@dwijn
Copy link
Contributor Author

dwijn commented Sep 12, 2023

Making it a single argument is definitely possible as @syrte proposes, and it would be trivial to implement. I chose two arguments in order to have just a single flag to turn on parallel processing with a default number of threads for the sake of simplicity. Let me know if you want me to change the code to a single argument. I'd probably make it -P [N], with N=8 default.
I did not make parallelism the default out of caution. I think it's OK, but I wanted to make sure the code works properly first, and also I had some mild concerns about hammering ADS with simultaneous connections. I'm not sure if they have some policies etc. on this, I haven't had a chance to look into it.

@yymao
Copy link
Owner

yymao commented Sep 14, 2023

Thanks for identifying and fixing the bugs @dwijn. Thanks for the comments @syrte.

@dwijn -- I wonder if you'd be ok if I added some minor changes directly as a commit to your PR? I think this version is mostly ready to go but I want to make some code style/convention edits if you are ok with it.

@dwijn
Copy link
Contributor Author

dwijn commented Sep 14, 2023

Yes, please go ahead!

@yymao
Copy link
Owner

yymao commented Sep 14, 2023

@dwijn thank you. If you have a chance, take a look at 3d186df. In particular, I'd like to merge -P and -N options with the following behavior:

  • -P: run in parallel with 8 threads
  • -P=x if x >= 2, run in parallel with x threads
  • No -P at all: run in series

This is the main change in this commit. All other edits are minor style issues.

And I agree that we should keep parallel running as optional for now. I imagine we might turn this on as default in the future after some more testing.

@dwijn
Copy link
Contributor Author

dwijn commented Sep 14, 2023

Yeah, this patch works and produces the desired behavior. The one issue I've noticed is that you now can't simply do
adstex -P main.tex
since that produces an error:
adstex: error: argument --parallel/-P: invalid int value: 'main.tex'
The solution is to use the meta-argument --:
adstex -P -- main.tex
But I guess this is slightly surprising behavior. I looked at argparse but I don't see a simple solution.

@syrte
Copy link
Contributor

syrte commented Sep 15, 2023

I guess adstex main.tex -P should work. Still it would be very surprising when one found adstex -P main.tex not work.

Hmm, so it sounds that either we use two options "-P" and "-N" as @dwijn did before, or we always ask users to explicitly specify the number of threads such as adstex -P 8 main.tex.
Maybe the latter one is not so bad. Only need to type two more characters (shorter than adstex -P -- main.tex!).

Does argparse support omitting the space between the option and number (as many linux commands), i.e., does adstex -P8 main.tex work?
If yes, then one needs to type one more character (with explicit control of the number of threads), which should be totally fine.

@dwijn
Copy link
Contributor Author

dwijn commented Sep 15, 2023

Yes, adstex -P8 main.tex works as expected. This appears to be a known issue with argparse (see, e.g., here).

I guess I would favor the solution with two arguments over requiring a user to always provide the number of threads to use. Another option would be to catch this error and handle it gracefully, but I think that will be kind of complicated and it'll be difficult to guarantee it will always do the right thing.

@yymao
Copy link
Owner

yymao commented Sep 15, 2023

Thanks for noticing this behavior! I am ok with adding back the --threads option (and make -P just store true). But I'd rather not having -N as a shorthand for --threads for two reasons. (1) Overall I'd like to reserve one-letter options for really commonly used options, and setting the number of threads doesn't seem to be so important. (2) There are a few existing options like --no-update and --no-backup that start with "N".

I wonder if you two would prefer the current setup, or reintroduce --threads but without shorthand?

@dwijn
Copy link
Contributor Author

dwijn commented Sep 15, 2023

I agree with not having the short option for the number of threads. My vote would be to have -P/--parallel option that just stores true/false, and add back the --threads N option to set the number of threads.

@syrte
Copy link
Contributor

syrte commented Sep 15, 2023

I am fine with either way ;)
Just in case anyone likes shorthand, an alternative option for -N is -j or --jobs, which is adopted by make.

@yymao yymao linked an issue Sep 15, 2023 that may be closed by this pull request
@yymao
Copy link
Owner

yymao commented Sep 15, 2023

Thank you both! Ready to merge.

@yymao yymao merged commit 1011f93 into yymao:master Sep 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parallelize updates
3 participants