[REGRESSION]: chromium.connect does not work with vanilla CDP servers anymore #4054

berstend · 2020-10-05T11:03:39Z

Hi there,
apologies if this has been asked before but I didn't find anything.

Up until v1.3.0 it was possible to .connect to a vanilla browser websocket URL (either created through launching Chrome with --remote-debugging) or using Puppeteer).

v1.4.0 introduced the new "New Client / Server Wire protocol", since then it's not possible to use .connect with other original CDP websocket servers anymore (it will hang here). I also noticed through logging that the new wire protocol looks much different from the regular CDP chatter.

My question would be if there's still a way to use chromium.connect with the traditional CDP websocket URLs or if there are pointers on how to best write a translation layer to accomplish that.

My use-case is to have a single websocket url which exposes a fleet of chromium browsers by proxying the websocket connections (similar to browserless.io) and both puppeteer & playwright being able to connect to it and control the browsers.

The following worked prior to v1.4.0:

const puppeteer = require("puppeteer")

puppeteer
  .launch({
    headless: true,
    defaultViewport: null,
    args: [
      "--remote-debugging-port=9222",
      "--remote-debugging-address=0.0.0.0",
    ],
  })
  .then(async (browser) => {
    console.log(browser._connection.url())
  })
// => ws://0.0.0.0:9222/devtools/browser/490a2b32-ce62-4b6b-a530-d4931fdeb046

const pw = require("playwright")

;(async () => {
  const browser = await pw.chromium.connect({
    wsEndpoint: "ws://0.0.0.0:9222/devtools/browser/490a2b32-ce62-4b6b-a530-d4931fdeb046",
  })

  const context = await browser.newContext()
  const page = await context.newPage()

  await page.goto("https://www.example.com/")
  await page.screenshot({ path: "example.png" })

  await browser.close()
})()

The text was updated successfully, but these errors were encountered:

dgozman · 2020-10-06T03:02:59Z

Hi @berstend!

The underlying protocol has indeed changed in 1.4, and we are aware that this usecase is not supported anymore. However, it was never a goal for Playwright to work against CDP-based remote websockets. Playwright supports multiple browsers, and we try for all features to be equally supported by all of them. Our new protocol is the same for all browsers we support.

I'd suggest to use Playwright launchServer API to spawn the server and proxy its websocket connection. Does that work for you?

My question would be if there's still a way to use chromium.connect with the traditional CDP websocket URLs or if there are pointers on how to best write a translation layer to accomplish that.

There is no API to connect to CDP websocket directly. A translation layer would be equivalent to ~50% of Playwright code base, so it does not look feasible.

berstend · 2020-10-06T06:44:11Z

@dgozman Thanks a lot for the quick response on that matter. :-)

Oh wow, that's unfortunate to hear. The CDP websocket has always felt like the underlying and unifying standard or common denominator that all CDP based automation frameworks could interface with.

Note: I really appreciate the efforts of the Playwright team and the project contributors and can understand why switching to a custom wire protocol was desired.

Having that said, I wonder what the best course of action would be if someone would be determined enough to develop a custom websocket server compatible with Playwright wire protocol and vanilla CDP clients.

In my scenario I have full control over the websocket server consumed by pptr/pw clients but the server itself has to connect to existing browsers through vanilla CDP (playwright cannot be used to launch them).

Would this be feasible:

Modify launchServer to not launch a browser but use an existing one through CDP (should be possible as internally Playwright still uses the vanilla CDP websocket)
Develop a custom websocket server that can differentiate between Playwright and Puppeteer clients (through e.g. the initial handshake or similar) and forward the websocket frames to either the custom Playwright websocket or the vanilla CDP socket

Thanks a lot for your time and input, much appreciated. :-)

pavelfeldman · 2020-10-06T14:16:59Z

playwright cannot be used to launch them

Could you share why this is the case? We assumed you would simply run browser.launchServer and that would be it?

CDP websocket has always felt like the underlying and unifying standard or common denominator that all CDP based automation frameworks could interface with.

Since we are more than just Chromium, it does not really work for us. Having said that, you can modify launchServer locally to attach over CDP instead of launching Chromium. Things will work, but I wonder if we can support your use case with launchServer out of the box!

pavelfeldman · 2020-10-06T16:01:57Z

Btw, if you join #rpc channel on our Slack (link in our main README.md), we can discuss this and other topics there!

joelgriffith · 2020-10-06T19:44:29Z

We landed support for this in browserless/browserless#1018. There's a new /playwright route that we've added, and it forks the default WebSocket behavior of proxying through to CDP, and instead proxies over into the browserServer endpoint. Some things about browserless don't yet work, like "head-full" and --user-data-dir's since it's not supported in the browserServer methods. We'll add those back in over time once we get broader playwright support on our stack.

You can see the change in behavior here: https://github.com/browserless/chrome/blob/master/src/puppeteer-provider.ts#L400

Niek · 2020-10-06T19:45:53Z

@pavelfeldman Not OP, but I want to highlight that there are various reasons to connect to existing Chrome CDP websockets that are not launched by Playwright. For example, if you have a scalable Chrome (e.g. k8s) cluster that already works with Puppeteer/Playwright. Or if you use Playwright with a third party infra solution like Browserless (see comment above of @joelgriffith) or HeadlessTesting.

It would be awesome if there would be a way to call launchServer() with an existing CDP websocket for these usecases.

pavelfeldman · 2020-10-06T21:25:01Z

This all makes sense.

When we are reasoning about it internally, we refer to the model where the k8s cluster includes Playwright Server and uses it to launch the browser. I believe Selenoid is now doing that already. This enables us (and you, cloud providers) operate much more efficiently:

We support non-Chromium browsers uniformly. Switching to this model gives WebKit and Firefox for free
Protocol traffic drops, instead of 4-8 roundtrips for each click you get 1.
We can be efficient with the resources written by the browser, including downloads, recently added videos and work-in-progress traces. They reside next to the browser, are accessible to the Playwright server and can be exposed to the user over the wire.

So putting Playwright server into the cloud sounds like a clear win to us, by good margin. If there is some code that you are using to manage the executed instances that is missing from the Playwright Server, we would be happy to consider adding it upstream.

We understand that it isn't free: solutions that support both Puppeteer and Playwright would need to maintain different code paths to support both. We also realize that it isn't feature complete (launchPersistent is not supported by connect currently). But we think multi-browser support, videos, traces and all the new features this structure enables us to do are totally worth it.

joelgriffith · 2020-10-06T22:12:40Z

@pavelfeldman thanks for going over this. I think the things you've outlined make a lot of sense, in particular the network traffic (an almost 10x drop in network chatter is huge). This is a big one for us.

For us, the problem is maintenance burden as we support playwright, puppeteer, selenium as well as our own REST APIs. Adding the new playwright server was quick, but over the long term the cost does add up, so it's not a trivial change. As much as the change was a bit of a shock, I think it's worth the burden given what you've said.

A few things that would be nice:

Support for user-data-dir in browser-server. Not sure what the complications are there since it's not part of the current API.
Allow for a newContext event so we can inject network interceptors for ad-blocking in our stack: https://github.com/browserless/chrome/blob/master/src/chrome-helper.ts#L139. There's a few other things as well here, but it sounds like playwright takes care of them mostly, so just this ad-blocker would be nice.

berstend · 2020-10-07T07:37:31Z

@joelgriffith congrats on your very fast fix :-)

My use-case differs a bit from browserless, in the sense that I can run playwright on the server but I cannot use the current launchServer to spawn browsers locally as they're remote browser instances that are already running.

If there is some code that you are using to manage the executed instances that is missing from the Playwright Server, we would be happy to consider adding it upstream.

@pavelfeldman Would the Playwright project be open to upstream patches for the following:

Extend launchServer with additional support for connecting to an existing CDP session (e.g. by adding options.cdpSession)
Add a user-agent: playwright/${version} request header to the internal cdpSession Websocket request (ws supports setting those, the change would be trivial) - I currently have a PoC that can differentiate between Playwright/Puppeteer clients by analyzing the first websocket frame, but looking at a user-agent would be much more robust 😄

Happy to work on those changes (they should be side-effect free) and implement them in a way that fits the Playwright code style & naming preferences.

edit, just to visualize what those changes would allow me to do 😄:

                   +------------+                             +---------------+
pptr client  +-->  |            |  +----------------------->  |               |
                   |  custom    |                             |  Chromium CDP |
                   |  websocket |        playwright           |  websocket    |
  pw client  +-->  |            |  +-->  wire protocol  +-->  |               |
                   +------------+        websocket            +---------------+

dgozman · 2020-10-07T09:40:46Z

Support for user-data-dir in browser-server. Not sure what the complications are there since it's not part of the current API.

Do you mind sharing the usecase for this one? IIUC, the user data dir won't be accessible to any remote client, because it lives on the server, so how is it intended to be used?

Also note that Playwright assumes exclusive access to the browser context to implement all kinds of context-wide method, e.g. routes. Since there is just a single persistent context that makes use of the user data dir, perhaps there is/should be a separate mechanism that ensures exclusive access to this persistent context between clients?

Allow for a newContext event so we can inject network interceptors for ad-blocking in our stack: https://github.com/browserless/chrome/blob/master/src/chrome-helper.ts#L139. There's a few other things as well here, but it sounds like playwright takes care of them mostly, so just this ad-blocker would be nice.

Are you talking about something like browser.on('context', (context: BrowserContext) => {}) event?

dgozman · 2020-10-07T09:44:09Z

Extend launchServer with additional support for connecting to an existing CDP session (e.g. by adding options.cdpSession)

I think we'll be open to something like that. I'd prefer options.cdpWebsocketEndpoint name, sorry for bikeshedding 😄

Add a user-agent: playwright/${version} request header to the internal cdpSession Websocket request (ws supports setting those, the change would be trivial) - I currently have a PoC that can differentiate between Playwright/Puppeteer clients by analyzing the first websocket frame, but looking at a user-agent would be much more robust 😄

Are you talking about sending this header when connecting to the new CDP webscoket from the launchServer? Sounds good to me.

berstend · 2020-10-07T16:37:56Z

I'd prefer options.cdpWebsocketEndpoint name, sorry for bikeshedding 😄

Haha, no worries - no strong opinions on the naming here (hence me asking in advance) 😄 cdpWebsocketEndpoint sounds good.

Are you talking about sending this header when connecting to the new CDP webscoket from the launchServer? Sounds good to me.

Yup, I'll be also checking with the puppeteer team to add a similar patch to their CDP websocket client.

Great, I'm gonna get familiar with the playwright build/test setup and submit a draft PR once I find an elegant way to add cdpWebsocketEndpoint support to launchServer.

Thanks for the feedback!

headlesstesting · 2020-11-03T14:40:30Z

Hi,

We're in the same boat as @berstend and are eagerly awaiting the possibility for a cdpWebsocketEndpoint option as well as a user-agent option (#1729)

People using HeadlessTesting.com can not use Playwright v1.4.0 with our service due to this change.
If there's anything we can do to help, please let us know.

ZeroBrowser · 2020-12-27T04:19:54Z

We are also advising our users to not use version v1.4.0 or higher at 0browser.com. Please let us know if we can help with anything. Otherwise similar to browserless we have to create a dedicated endpoint for Playwright. Thanks!

Niek · 2020-12-27T04:58:18Z

Is there any progress on reviewing #4344 @dgozman?

mxschmitt · 2021-05-08T10:56:33Z

Closing, since there is now BrowserType.connectOverCDP since a few releases: https://playwright.dev/docs/api/class-browsertype#browsertypeconnectovercdpparams

Niek mentioned this issue Oct 6, 2020

Playwright support broken as of v1.4.0 browserless/browserless#1016

Closed

berstend mentioned this issue Nov 4, 2020

feat: add support for using existing cdp websocket in launchServer #4344

Closed

mxschmitt mentioned this issue Jan 7, 2021

[Question] How to connect to a browser running on a specific port or launch browser at a specific port. #4918

Closed

pavelfeldman added the P2-bug label Jan 16, 2021

pavelfeldman assigned dgozman Jan 16, 2021

pavelfeldman added the v1.9 label Jan 18, 2021

danez mentioned this issue Feb 4, 2021

[Question] How to get websocket url from running server #5314

Closed

pavelfeldman added P3-collecting-feedback and removed P2-bug labels Feb 19, 2021

mxschmitt closed this as completed May 8, 2021

liweixi100 mentioned this issue May 30, 2021

BrowserContext support for Playwright browserless/browserless#1508

Closed

bsyouness mentioned this issue Jun 17, 2021

Can't connect to WS after launching separate headless chrome instance jupyterlab/galata#59

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[REGRESSION]: chromium.connect does not work with vanilla CDP servers anymore #4054

[REGRESSION]: chromium.connect does not work with vanilla CDP servers anymore #4054

berstend commented Oct 5, 2020 •

edited

Loading

dgozman commented Oct 6, 2020

berstend commented Oct 6, 2020 •

edited

Loading

pavelfeldman commented Oct 6, 2020

pavelfeldman commented Oct 6, 2020

joelgriffith commented Oct 6, 2020

Niek commented Oct 6, 2020

pavelfeldman commented Oct 6, 2020

joelgriffith commented Oct 6, 2020

berstend commented Oct 7, 2020 •

edited

Loading

dgozman commented Oct 7, 2020

dgozman commented Oct 7, 2020

berstend commented Oct 7, 2020

headlesstesting commented Nov 3, 2020

ZeroBrowser commented Dec 27, 2020

Niek commented Dec 27, 2020

mxschmitt commented May 8, 2021

[REGRESSION]: chromium.connect does not work with vanilla CDP servers anymore #4054

[REGRESSION]: chromium.connect does not work with vanilla CDP servers anymore #4054

Comments

berstend commented Oct 5, 2020 • edited Loading

dgozman commented Oct 6, 2020

berstend commented Oct 6, 2020 • edited Loading

pavelfeldman commented Oct 6, 2020

pavelfeldman commented Oct 6, 2020

joelgriffith commented Oct 6, 2020

Niek commented Oct 6, 2020

pavelfeldman commented Oct 6, 2020

joelgriffith commented Oct 6, 2020

berstend commented Oct 7, 2020 • edited Loading

dgozman commented Oct 7, 2020

dgozman commented Oct 7, 2020

berstend commented Oct 7, 2020

headlesstesting commented Nov 3, 2020

ZeroBrowser commented Dec 27, 2020

Niek commented Dec 27, 2020

mxschmitt commented May 8, 2021

berstend commented Oct 5, 2020 •

edited

Loading

berstend commented Oct 6, 2020 •

edited

Loading

berstend commented Oct 7, 2020 •

edited

Loading