console.log() in JavaScript blocks should go to stdout #101

Closed
simonw opened this issue Jan 28, 2023 · 9 comments
Labels
bug Something isn't working

Comments

simonw commented Jan 28, 2023

I was trying to debug a javascript: block in a shots.yml file and realized that console.log() would be useful.

https://playwright.dev/python/docs/api/class-consolemessage#console-message-args says it's as easy as doing this:

# Listen for all console logs
page.on("console", lambda msg: print(msg.text))

simonw added the bug label Jan 28, 2023

simonw commented Jan 28, 2023

That msg object actually has properties that look like this:

{'args': [<JSHandle preview=Hello world>, <JSHandle preview=JSHandle@node>], 'location': {'url': '', 'lineNumber': 15, 'columnNumber': 8}, 'text': 'Hello world JSHandle@node', 'type': 'log'}

That was triggered by console.log("Hello world", document.body).

I think I can just use text though, which is the default string representation.

simonw commented Jan 28, 2023

One catch with this: anything that a web page writes to console.log() will end up on standard output as well.

This is a problem for shot-scraper javascript, since a page's own logging could get mixed into the command's desired output.

As such, I think this mechanism should write to stderr, not stdout.
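
A minimal sketch of that idea, assuming only that the message object exposes a `text` attribute as in the Playwright snippet above. The function name `log_console_message` and its `stream` parameter are hypothetical and exist just to make the example easy to try; this is not necessarily how shot-scraper implements it.

```python
import sys


def log_console_message(msg, stream=None):
    """Write a console message's text to stderr, keeping stdout clean."""
    # `msg` is assumed to have a `.text` attribute (as a Playwright
    # ConsoleMessage does). `stream` is a hypothetical override for testing.
    target = stream if stream is not None else sys.stderr
    print(msg.text, file=target)
```

With a Playwright page this could be registered as `page.on("console", log_console_message)`, so output such as the result of `shot-scraper javascript` would stay on stdout while page logging goes to stderr.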

simonw commented Jan 28, 2023

Maybe I should control this with a --console-log option, otherwise things are going to get noisy for all sorts of pages that use console.log() in unexpected ways.

simonw commented Jan 28, 2023

For example, here's what it does on https://www.facebook.com/

% shot-scraper facebook.com



                                            
 .d8888b.  888                       888    
d88P  Y88b 888                       888    
Y88b.      888                       888    This is a browser feature intended for 
 "Y888b.   888888  .d88b.  88888b.   888    developers. If someone told you to copy-paste 
    "Y88b. 888    d88""88b 888 "88b  888    something here to enable a Facebook feature 
      "888 888    888  888 888  888  Y8P    or "hack" someone's account, it is a 
Y88b  d88P Y88b.  Y88..88P 888 d88P         scam and will give them access to your 
 "Y8888P"   "Y888  "Y88P"  88888P"   888    Facebook account.
                           888              
                           888              
                           888              

See https://www.facebook.com/selfxss for more information.

Screenshot of 'http://facebook.com' written to 'facebook-com.png'

simonw commented Jan 28, 2023

I think --log is clear enough.

simonw commented Jan 28, 2023

I'm going to call it --log-console for consistency with the existing --log-requests option.
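
As a rough sketch of the opt-in behaviour, the listener could be attached only when the flag is set. The helper name `attach_console_logging` and the `log_console` boolean are assumptions for illustration; the actual code in the referenced commit is not shown in this thread.

```python
import sys


def attach_console_logging(page, log_console):
    """Register a console listener on `page` only when logging is requested."""
    # `log_console` is assumed to hold the value of a --log-console flag.
    if log_console:
        # Send page console output to stderr so stdout stays clean.
        page.on("console", lambda msg: print(msg.text, file=sys.stderr))
    return page
```

Leaving the listener unregistered by default means noisy pages produce no extra output unless the user asks for it.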

simonw commented Jan 28, 2023

% shot-scraper javascript facebook.com 'document.title'
"Facebook - log in or sign up"
% shot-scraper javascript facebook.com 'document.title' --log-console



                                            
 .d8888b.  888                       888    
d88P  Y88b 888                       888    
Y88b.      888                       888    This is a browser feature intended for 
 "Y888b.   888888  .d88b.  88888b.   888    developers. If someone told you to copy-paste 
    "Y88b. 888    d88""88b 888 "88b  888    something here to enable a Facebook feature 
      "888 888    888  888 888  888  Y8P    or "hack" someone's account, it is a 
Y88b  d88P Y88b.  Y88..88P 888 d88P         scam and will give them access to your 
 "Y8888P"   "Y888  "Y88P"  88888P"   888    Facebook account.
                           888              
                           888              
                           888              

See https://www.facebook.com/selfxss for more information.

"Facebook - log in or sign up"

simonw added a commit that referenced this issue Jan 28, 2023
simonw closed this as completed Jan 28, 2023

simonw commented Jan 28, 2023

Re-opening because I need to do some manual testing and think about how to document this.

simonw reopened this Jan 28, 2023

simonw commented Jan 29, 2023

Need to manually test all of these, each with and without the --log-console option:

  • shot-scraper accessibility facebook.com --log-console
  • shot-scraper auth https://facebook.com /tmp/auth.json --log-console
  • shot-scraper html facebook.com --log-console
  • shot-scraper javascript facebook.com "document.title" --log-console
  • shot-scraper pdf facebook.com --log-console
  • shot-scraper shot facebook.com --log-console
  • echo '- url: https://facebook.com' | shot-scraper multi - --log-console
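
The checklist above can be expanded into its "with and without" pairs mechanically. This is only a sketch: the helper name `command_matrix` is hypothetical, it prints the command lines rather than running them, and the `multi` entry still needs its YAML piped in on stdin as in the last bullet.

```python
import shlex

# Argument lists copied from the checklist above.
COMMANDS = [
    ["accessibility", "facebook.com"],
    ["auth", "https://facebook.com", "/tmp/auth.json"],
    ["html", "facebook.com"],
    ["javascript", "facebook.com", "document.title"],
    ["pdf", "facebook.com"],
    ["shot", "facebook.com"],
    ["multi", "-"],  # expects YAML on stdin
]


def command_matrix(commands=COMMANDS):
    """Return each command twice: first without, then with --log-console."""
    matrix = []
    for args in commands:
        base = ["shot-scraper", *args]
        matrix.append(base)
        matrix.append(base + ["--log-console"])
    return matrix


if __name__ == "__main__":
    # Print shell-quoted command lines to copy and run by hand.
    for argv in command_matrix():
        print(shlex.join(argv))
```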

simonw closed this as completed in fceddad Jan 29, 2023
simonw added a commit that referenced this issue Jan 30, 2023