Hi!
First of all, thank you for your script, it really helped me!
I wanted to download a 350,000-message group. As you can imagine, it was taking a while.
I realized the script was creating a wget process, and therefore a single connection to Google, for each and every forum page, thread page, and message, which is really inefficient.
So I improved the generated script so that a bunch of messages are retrieved in one connection, thanks to curl and its `-o $output1 $url1 -o $output2 $url2` syntax (a rough sketch follows below). I even made a nice text UI to monitor the jobs.

Here's my script. I license it under the same terms as your code.

I'm not doing a PR because:
- there's still room for improvement (e.g. my script only handles the actual messages, not the thread list and message list)
- I didn't look that much into the code, only into the generated script, so there are probably a few tweaks needed
- there are a few design decisions involved: my version requires curl, there are parameters for the number of processes and the number of URLs per batch, the cookie file is its own parameter instead of going through WGET_OPTIONS, etc.
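The core idea looks roughly like this (a simplified sketch, not my actual script; the file names, parameters, and the urls.txt format are illustrative):

```bash
#!/bin/bash
# Simplified sketch of the batching idea; not the actual script.
# Assumes urls.txt holds one "output_file url" pair per line.

BATCH_SIZE=10     # URLs fetched per curl invocation
JOBS=4            # concurrent curl processes
COOKIE_FILE=cookies.txt

fetch_batch() {
  # Build a single "-o file1 url1 -o file2 url2 ..." argument list so one
  # curl invocation (one connection) downloads the whole batch.
  local args=() pair
  for pair in "$@"; do
    args+=(-o "${pair%% *}" "${pair#* }")
  done
  curl -sS -b "$COOKIE_FILE" "${args[@]}"
}

# Slice the pair list into batches and keep at most $JOBS curls running.
mapfile -t pairs < urls.txt
for ((i = 0; i < ${#pairs[@]}; i += BATCH_SIZE)); do
  fetch_batch "${pairs[@]:i:BATCH_SIZE}" &
  while (( $(jobs -rp | wc -l) >= JOBS )); do wait -n; done
done
wait
```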
Hope this proves useful. :)
That's excellent, @Pikrass. I will update the README to mention your script, and I think that's enough for now.
My intention was to query Google's servers slowly so the script wouldn't get blocked; that's why I didn't have parallel support out of the box. Once you have the generated script, you have a few options to continue, e.g. your way, or you can also feed it to the GNU parallel tool.
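For instance, something along these lines might work (a rough sketch; it assumes the generated script is named wget.sh and that its download commands start with wget):

```bash
# Run the wget commands from the generated script a few at a time.
# --delay keeps a small gap between job starts so Google isn't hammered.
grep '^wget' wget.sh | parallel -j 4 --delay 0.5
```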