PDF conversion progress insight and logging #235

Open
toncid opened this issue Apr 3, 2024 · 3 comments
@toncid

toncid commented Apr 3, 2024

Hello, we are often getting various Grover or protocol timeout errors, which are not very helpful, as it is unclear at which stage the timeout happened.

I haven't found a way to enable logging in order to trace the progress of PDF conversion. It would be good if Grover could take a logger and invoke it as the to_pdf call makes progress:

  • Grover initialized
  • HTML URL/source received
  • Puppeteer started
  • Chrome v123.456 started
  • Page is loading
  • Page loaded (e.g. when DOMContentLoaded, load, networkidle0/networkidle2, etc. have fired)
  • Conversion started
  • Conversion ended
  • Capturing PDF

Hope the above paints the picture of what kind of insight is desirable.

If it is already possible, please share and we can work on updating the README.
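For the Ruby side of the request, a minimal sketch of what "invoke a logger as to_pdf makes progress" could look like from application code. `log_step` and `elapsed_ms` are hypothetical helpers, not part of Grover, and the conversion is replaced by a stub so the sketch runs without Chrome.

```ruby
require 'logger'
require 'stringio'

# Milliseconds elapsed since a monotonic start time.
def elapsed_ms(started)
  ((Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000).round
end

# Hypothetical helper (not part of Grover): logs when a named step starts and
# how long it took, and logs then re-raises any error the step raised.
def log_step(logger, name)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  logger.info("#{name}: started")
  result = yield
  logger.info("#{name}: finished in #{elapsed_ms(started)} ms")
  result
rescue StandardError => e
  logger.error("#{name}: failed after #{elapsed_ms(started)} ms (#{e.class}: #{e.message})")
  raise
end

# Stand-in for the real conversion. In an application the block would be
# something like: Grover.new(html).to_pdf
log_output = StringIO.new
logger = Logger.new(log_output)
pdf = log_step(logger, 'to_pdf') { '%PDF-stub' }
```

This only records what is visible from Ruby: when the call started, when it returned or failed, and how long it took. Stages inside the Node.js process (Puppeteer started, page loaded, and so on) stay invisible to a wrapper like this.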

@abrom
Contributor

abrom commented Apr 7, 2024

Hi @toncid, have you read the section in the README on debugging?

It's not clear exactly what you mean by "invoking a logger", given that the props passed through to the Node.js process are serialised. There could be an option for dumping progress out to a log file, but this starts to get messy given you could have multiple processes dumping entries at the same time. You'd need some unique log tagging key to go with it, or a unique log file per invocation. Either way, going down the debugging route already laid out in the README would seem a better option all round.
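To illustrate the "unique log tagging key" idea, here is a small sketch using Ruby's standard Logger. `tagged_logger` is a hypothetical helper, and the StringIO stands in for a log file shared by several workers; none of this is an existing Grover feature.

```ruby
require 'logger'
require 'securerandom'
require 'stringio'

# Hypothetical helper: builds a logger whose every line carries a tag and the
# process ID, so entries from concurrent conversions can be told apart.
def tagged_logger(io, tag)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, _progname, message|
    stamp = time.utc.strftime('%Y-%m-%dT%H:%M:%S.%LZ')
    "#{stamp} [#{tag}] pid=#{Process.pid} #{severity} #{message}\n"
  end
  logger
end

shared_log = StringIO.new # stands in for a log file shared by several workers
first_tag = SecureRandom.uuid
second_tag = SecureRandom.uuid

tagged_logger(shared_log, first_tag).info('conversion started')
tagged_logger(shared_log, second_tag).info('conversion started')
tagged_logger(shared_log, first_tag).info('conversion ended')

# Entries for one invocation can then be filtered out by its tag.
first_entries = shared_log.string.lines.grep(/\[#{Regexp.escape(first_tag)}\]/)
```

A tag per invocation keeps a single shared file readable; the alternative mentioned above, one file per invocation, avoids interleaving altogether at the cost of more files to manage.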

@toncid
Author

toncid commented Apr 22, 2024

Hello @abrom, thank you for your response. I was thinking of logging steps from the Grover side, around the actual invocation of Puppeteer, but I assume there isn't much to log.

However, in production systems, there can be multiple workers running Grover and Puppeteer, so it is practically impossible to get live debugging when needed.

Do you know any options to gather console and telemetry output from such setups? I wasn't able to find any way to do it (e.g. setting dumpio doesn't seem to show any output in server logs).

@abrom
Contributor

abrom commented Jun 5, 2024

hmm.. good question. Because Grover is already using the stdout/stderr channels for result/error comms, it'd likely need to be passed some other IO target (i.e. a file path) that it can be told to log to. Then it'd be pretty trivial to have debug information piped there. If the calling service controlled the log path as an option, it'd also be easier to "manage" multiple concurrent invocations by just giving each a different path. Some log management/cleanup would likely be prudent!

BUT.. given that this could be an exceptionally destructive action (e.g. someone runs the process as a super user, then a bad actor passes through some important system file as the "log file" option), any log file option would need to be excluded from anything that could be configured via a request. That shouldn't be a big deal, but it's not to be dealt with lightly.
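One way to read the safety concern above, as a hedged sketch: drop the option from anything request-derived, and build the path on the server under a fixed directory. The option name `debug_log_file` is the one proposed in this thread and does not exist in Grover today; `strip_unsafe_options`, `debug_log_path` and the log directory are assumptions for illustration.

```ruby
require 'tmpdir'

# Drops the (hypothetical) log file option from anything that came from a request.
def strip_unsafe_options(request_options)
  request_options.reject { |key, _value| key.to_s == 'debug_log_file' }
end

# Builds the log path on the server from an identifier and rejects any result
# that would resolve outside `root` (for example via "../" or an absolute path).
def debug_log_path(identifier, root)
  root = File.expand_path(root)
  path = File.expand_path("#{identifier}.log", root)
  raise ArgumentError, "log path escapes #{root}" unless path.start_with?(root + File::SEPARATOR)

  path
end

# Assumption: the application picks one directory that logs may be written to.
log_root = File.join(Dir.tmpdir, 'grover-debug-logs')

safe_options = strip_unsafe_options('format' => 'A4', 'debug_log_file' => '/etc/passwd')
log_path = debug_log_path('3f2c9a', log_root)
```

`File.expand_path` normalises `..` segments but does not resolve symlinks, so this check is a first line of defence rather than a complete one; running the workers as an unprivileged user still matters.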

In the processor JS it'd be something like:

const fs = require('node:fs/promises');
....

// Read the log path, then remove it from the options
const debugLogFile = options.debugLogFile;
delete options.debugLogFile;
....

// Append an entry only when a log file was requested
if (debugLogFile) await fs.appendFile(debugLogFile, '... the log message ...\n');
... etc

Then in Ruby land you'd use it as such (or similar):

Grover.new(<content>, debug_log_file: File.join('/tmp', "#{request[:uuid]}.log")).to_pdf
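The "log management/cleanup" mentioned above could be as simple as pruning old per-invocation files. A minimal sketch, assuming one `.log` file per invocation in a single directory; `prune_debug_logs` is a hypothetical helper and the one-hour threshold is arbitrary.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: deletes *.log files in `dir` last modified more than
# `max_age_seconds` ago, and returns the paths it removed.
def prune_debug_logs(dir, max_age_seconds, now: Time.now)
  Dir.glob('*.log', base: dir).map { |name| File.join(dir, name) }.select do |path|
    next false unless now - File.mtime(path) > max_age_seconds

    FileUtils.rm_f(path)
    true
  end
end

# Demonstration in a throwaway directory.
dir = Dir.mktmpdir('grover-debug-logs')
old_log = File.join(dir, 'old.log')
new_log = File.join(dir, 'new.log')
File.write(old_log, "conversion started\n")
File.write(new_log, "conversion started\n")
two_hours_ago = Time.now - 7200
File.utime(two_hours_ago, two_hours_ago, old_log) # pretend this log is two hours old

removed = prune_debug_logs(dir, 3600)
remaining = Dir.children(dir)
FileUtils.remove_entry(dir)
```

Something like this could run from a scheduled job, so that logs from failed conversions stay around long enough to be inspected but do not accumulate.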
