Replace `java.net.HTTPUrlConnection` with `java.net.http.HttpClient` (500USD Bounty) #150

lihaoyi · 2024-01-24T09:38:39Z

This would fix #123, and put us in a better state going forward.

HttpClient was standardized on Java 11, which is probably old enough we can rely on it already

HttpClient provides an async API, and we should expose it as requests.get.async, requests.post.async, etc.. Apart from that, the rest of the API and test suite should be unchanged. All existing tests should pass, with any unavoidable failures or necessary changes in the tests called out in the ticket

This should also resolve #123

To incentivize contribution, I'm putting a 500USD bounty on resolving this ticket. This is payable via bank transfer, and at my discretion in case of ambiguity.

The text was updated successfully, but these errors were encountered:

nafg · 2024-05-14T22:52:20Z

I'll take a stab at this now

nafg · 2024-05-15T01:15:01Z

HTTPUrlConnection has you write the request body by exposing an OutputStream, and RequestBlob is designed around this.

OTOH with HttpClient you pass a BodyPublisher, which is an abstraction analogous to RequestBlob other than the headers, that has helpers given you have a byte array or InputStream or other common cases. Otherwise it's based on java.util.concurrent.Flow.Publisher so you can wrap one of those or extend it.

Can I change the RequestBlob API, or do we have to work with it? In particular I'm not sure how to adapt it to a BodyPublisher without either loading it all into memory or starting another thread.

TBH I'm not sure why it has to take something as powerful as Publisher which can have multiple subscribers.

lihaoyi · 2024-05-15T04:36:21Z

@nafg does HttpResponse.BodyHandlers.ofInputStream do what we want? At at glance it seems to do the right thing - streaming the data and making it available to whoever reads the stream as it downloads.

nafg · 2024-05-15T06:34:24Z

@nafg does HttpResponse.BodyHandlers.ofInputStream do what we want? At at glance it seems to do the right thing - streaming the data and making it available to whoever reads the stream as it downloads.

I guess you mean BodyPublisher (BodyHandler would be for the response body). It does "the right thing" as you say, but I don't know how to adapt a RequestBlob to it (without reading it into memory or starting another thread).

nafg · 2024-05-15T06:36:09Z

As I said, the cleanest implementation would be to completely break the RequestBlob API. (Of course we would still support implicitly adapting the same types of values.)

nafg · 2024-05-15T06:41:51Z

Maybe I could just create a connected PipedOutputStream and PipedInputStream and pass the former to data.write and the latter to BodyPublisher.ofInputStream. But where would write be called?

lihaoyi · 2024-05-15T09:51:21Z

Sorry, I mixed up BodyHandler and BodyPublisher.

If I understand the problem right, it is that RequestBlob exposes a push-based def write(out: java.io.OutputStream): Unit for people to implement, whereas HttpRequest.BodyPublisher.ofInputStream requires a pull-based Supplier[? extends InputStream], and the pull-based approach is fundamentally harder to implement that the original push-based approach and thus incompatible.

Making RequestBlob pull-based rather than push-based is a no-go. Pull-based APIs are much harder to implement than pull-based APIs, and it would mean the Writables coming from e.g. uPickle or Scalatags or other libraries would not be compatible with this API. Making uPickle, Scalatags, etc. support pull-based serialization would be a full rewrite and is unfeasible

Spawning a thread is an option, but an expensive one. What if we instead use the Java HTTPClient's sendAsync, and then use the calling thread to call RequestBlob#write and block on it? The calling thread isn't doing anything anyway when using sendAsync, so it costs us nothing to use it to call .write. That way we don't need to pay the cost of a new thread while the calling thread sits idle

I'm actually not sure exactly how the async concurrency story works with java.net.http.HttpClient. Does it need an Executor, ExecutionContext, or Threadpool of some sort? We need to sort that out as part of this ticket so we can understand what the consequences of this replacement will be

nafg · 2024-05-15T15:08:22Z

I'm actually not sure exactly how the async concurrency story works with java.net.http.HttpClient. Does it need an Executor, ExecutionContext, or Threadpool of some sort? We need to sort that out as part of this ticket so we can understand what the consequences of this replacement will be

Yeah, the builder takes an Executor IIRC

nafg · 2024-05-15T15:10:55Z

Spawning a thread is an option, but an expensive one. What if we instead use the Java HTTPClient's sendAsync, and then use the calling thread to call RequestBlob#write and block on it? The calling thread isn't doing anything anyway when using sendAsync, so it costs us nothing to use it to call .write. That way we don't need to pay the cost of a new thread while the calling thread sits idle

Does that mean that we give it an InputStream that will block until we write to the OutputStream? Not sure I follow.

lihaoyi · 2024-05-15T15:22:39Z

I'm not totally sure myself TBH. But if you said you can solve the problem by spawning a thread, it seems like you should be able to solve the problem by using sendAsync and taking advantage of the calling thread.

nafg · 2024-05-16T05:37:04Z

Do I need to handle chunkedUpload==false

nafg · 2024-05-16T05:41:00Z

I don't see an equivalent of connection.getResponseMessage

lihaoyi · 2024-05-16T12:00:13Z

@nafg You'll have to dig through the code and tell me what the tradeoffs and necessary changes are

nafg · 2024-05-16T21:12:05Z

Re response message (reason phrase) see https://stackoverflow.com/a/63576759/333643

Some points from there: HTTP 2 and some HTTP 1 servers don't provide it, HttpClient has no API to get it, and it doesn't really have any purpose.

Two options:

Preserve compatibility using a table of strings per status code
Remove the field from case class Response

nafg · 2024-05-16T21:53:00Z

I guess we'll need more breaking changes or fake compatibility: https://stackoverflow.com/q/53617574/333643 (can't control keep-alive)

nafg · 2024-05-16T22:12:53Z

It doesn't let you set Content-Length manually; it should be determined by the BodyPublisher. Which means RequestBlob can't define it among an arbitrary list of headers.

If I can make breaking changes as desired the most natural replacement would be

trait RequestBlob {
  def bodyPublisher: BodyPublisher
  def contentType: String
}

There are less breaking ways to do it though.

I guess that's really the theme of all my questions. Is this supposed to be "preserve the interface, swap the implementation" or is this going to be a V2 designed around HttpClient? If it's somewhere in the middle then I have to consult on all the specifics I guess.

lihaoyi · 2024-05-16T23:56:15Z

I guess that's really the theme of all my questions. Is this supposed to be "preserve the interface, swap the implementation" or is this going to be a V2 designed around HttpClient? If it's somewhere in the middle then I have to consult on all the specifics I guess.

I think it's somewhere in the middle:

Breaking changes are fine, especially ones that the Java 11 HTTP design explicitly decided were low priority enough to not support. Requests-Scala isn't 1.0.0 yet, so we are not promising people that we'll keep things strictly compatible, so it's just a best effort thing and if users need to tweak code in some edge cases to update that's fine.
But integration with uPickle, Scalatags, etc. via geny.Writable should not break, if at all possible. Being able to stream stuff seamlessly between the different libraries is a big deal, and not a capability I would give up lightly. Swapping over to geny.Readable to require it read from an InputStream is not feasible, because Readable is tremendously harder to implement than Writable

At least for now, it seems we have "use sendAsync and use the calling thread to call .write" as a path forward for keeping (2) working. If that doesn't work, then we'll have to reconsider: maybe we don't want to do this at all, maybe we want to pull in some third-party HTTP library, maybe something else. But we can cross that bridge when we reach it

nafg · 2024-05-20T19:42:19Z

I have all tests passing except gzipError which does requests.head("https://api.github.com/users/lihaoyi").

For some reason it returns a 400. I don't know why, and I don't know what it has to do with gzip. Any thoughts?

lihaoyi · 2024-05-21T02:50:44Z

@nafg not sure, you'll have to dig into it to take a look. Maybe try setting up a proxy to intercept the requests and compare the difference between the old and new?

nafg · 2024-05-23T21:30:02Z

Just logging this here to remember it. I used webhook.site to see what the gzipError test was doing different. I saved each as HAR JSON and compared with https://jsoncompare.com/#!/diff/id=a5efdd49952712265bd06d4e8be477cb&fullscreen&sortAlphabetically/ and got this:

nafg · 2024-05-23T21:32:00Z

so:

The old version includes cache-control: no-cache
The old version includes content-length: while the new version does content-length: 0
The old version includes pragma: no-cache
The new version includes transfer-encoding: chunked

I'm guessing the issue (if any of the above and not something more subtle) is (4).

But I still don't know what the purpose of the test is.

nafg · 2024-05-23T21:37:44Z

Ok it's from c65a2be and seems to be part of #95 trying to fix #37.

So basically it's trying to assert that HEAD requests that accept gzipped encoding don't throw an exception trying to read a non-existent body.

nafg · 2024-05-24T21:02:05Z

So using a BodyPublisher with a fixed length when contentLength is known (as would be correct anyway) fixes the 400.

But it creates another issue, which I think is a bug in the current behavior that now crashes: Content-Length should be the size after compression.

Since there's no way to know the length after compression without compressing it, that basically means that known-content length RequestBlobs must be written into memory, which means there's no point in them specifying their content length at all. They should just say if they should be streamed or not.

lihaoyi · 2024-05-25T06:09:57Z

The gzip + content length thing seems like it might need an API change to make it work.

What if we defaulted to yes streaming + no content length, and provided a Compress.GzipBuffered that did the compression in memory and thus was yes content length + no streaming?

I don't see an easy way to make the decision automatically in a principled fashion, so maybe the best we can do is let the user choose

nafg · 2024-05-26T07:38:46Z

The gzip + content length thing seems like it might need an API change to make it work.

What if we defaulted to yes streaming + no content length, and provided a Compress.GzipBuffered that did the compression in memory and thus was yes content length + no streaming?

I don't see an easy way to make the decision automatically in a principled fashion, so maybe the best we can do is let the user choose

I didn't fully understand that.

I thought we needed an API change anyway and that was okay. I'll implement my last idea in the near future and push my code up. I think that should help us communicate ;)

nafg · 2024-05-27T08:58:37Z

Regarding the last issue, what I did for now is that the request body is fixed-length if it's empty or has a known length and is uncompressed. If it is compressed or its length is unknown then it will be streamed.

In both cases the request is first initiated with sendAsync and then in the calling thread we call .write just as was done until now (as you suggested).

(Their API is a bit high level but IIUC fixed-length would mean it includes a Content-Length header and streaming means it will be Transfer-Encoding: chunked.)

This means that uploading even a large file without compression will load it all (compressed) into memory (using the provided implicit conversion). I don't know if there is alternative (other than perhaps picking a maximum size above which we always stream, or else saying that certain RequestBlobs such as files always should stream). Similar for adapting an arbitrary Writable that defines a contentLength.

lihaoyi · 2024-06-21T03:18:28Z

@nafg I cut 0.9.0-RC1. Will leave it for a few weeks before cutting 0.9.0 final.

If you email me your bank transfer details I can close out the bounty

nafg · 2024-06-24T08:22:16Z

I don't know your email address

lihaoyi · 2024-06-24T08:54:03Z

***@***.***

…

On Mon, 24 Jun 2024 at 5:22 PM, Naftoli Gugenheim ***@***.***> wrote: I don't know your email address — Reply to this email directly, view it on GitHub <#150 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAHEB7DFLMWWY2EAPEPPSCDZI7JM7AVCNFSM6AAAAABCINXRLSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBVHEYDCMZTGI> . You are receiving this because you modified the open/close state.Message ID: ***@***.***>

sideeffffect · 2024-06-24T10:21:38Z

@nafg https://github.com/lihaoyi

in case the email address is censored, because I see this as the response

lihaoyi · 2024-06-24T10:27:55Z

huh, github censors emails???

haoyi.sg@gmail.com

sideeffffect · 2024-06-24T10:43:52Z

Maybe only in email responses? 🤷

lihaoyi changed the title ~~Replace java.net.HTTPUrlConnection with java.net.http.HttpClient~~ Replace java.net.HTTPUrlConnection with java.net.http.HttpClient (500USD Bounty) May 14, 2024

nafg mentioned this issue May 27, 2024

Use JDK 11 HttpClient #158

Merged

lihaoyi closed this as completed in #158 Jun 16, 2024

lihaoyi closed this as completed in 1a911bb Jun 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace `java.net.HTTPUrlConnection` with `java.net.http.HttpClient` (500USD Bounty) #150

Replace `java.net.HTTPUrlConnection` with `java.net.http.HttpClient` (500USD Bounty) #150

lihaoyi commented Jan 24, 2024 •

edited

Loading

nafg commented May 14, 2024 •

edited

Loading

nafg commented May 15, 2024

lihaoyi commented May 15, 2024

nafg commented May 15, 2024

nafg commented May 15, 2024 •

edited

Loading

nafg commented May 15, 2024

lihaoyi commented May 15, 2024 •

edited

Loading

nafg commented May 15, 2024

nafg commented May 15, 2024

lihaoyi commented May 15, 2024

nafg commented May 16, 2024

nafg commented May 16, 2024

lihaoyi commented May 16, 2024

nafg commented May 16, 2024 •

edited

Loading

nafg commented May 16, 2024

nafg commented May 16, 2024

lihaoyi commented May 16, 2024

nafg commented May 20, 2024

lihaoyi commented May 21, 2024

nafg commented May 23, 2024

nafg commented May 23, 2024

nafg commented May 23, 2024

nafg commented May 24, 2024

lihaoyi commented May 25, 2024

nafg commented May 26, 2024

nafg commented May 27, 2024 •

edited

Loading

lihaoyi commented Jun 21, 2024

nafg commented Jun 24, 2024

lihaoyi commented Jun 24, 2024 via email

sideeffffect commented Jun 24, 2024

lihaoyi commented Jun 24, 2024

sideeffffect commented Jun 24, 2024

Replace java.net.HTTPUrlConnection with java.net.http.HttpClient (500USD Bounty) #150

Replace java.net.HTTPUrlConnection with java.net.http.HttpClient (500USD Bounty) #150

Comments

lihaoyi commented Jan 24, 2024 • edited Loading

nafg commented May 14, 2024 • edited Loading

nafg commented May 15, 2024

lihaoyi commented May 15, 2024

nafg commented May 15, 2024

nafg commented May 15, 2024 • edited Loading

nafg commented May 15, 2024

lihaoyi commented May 15, 2024 • edited Loading

nafg commented May 15, 2024

nafg commented May 15, 2024

lihaoyi commented May 15, 2024

nafg commented May 16, 2024

nafg commented May 16, 2024

lihaoyi commented May 16, 2024

nafg commented May 16, 2024 • edited Loading

nafg commented May 16, 2024

nafg commented May 16, 2024

lihaoyi commented May 16, 2024

nafg commented May 20, 2024

lihaoyi commented May 21, 2024

nafg commented May 23, 2024

nafg commented May 23, 2024

nafg commented May 23, 2024

nafg commented May 24, 2024

lihaoyi commented May 25, 2024

nafg commented May 26, 2024

nafg commented May 27, 2024 • edited Loading

lihaoyi commented Jun 21, 2024

nafg commented Jun 24, 2024

lihaoyi commented Jun 24, 2024 via email

sideeffffect commented Jun 24, 2024

lihaoyi commented Jun 24, 2024

sideeffffect commented Jun 24, 2024

Replace `java.net.HTTPUrlConnection` with `java.net.http.HttpClient` (500USD Bounty) #150

Replace `java.net.HTTPUrlConnection` with `java.net.http.HttpClient` (500USD Bounty) #150

lihaoyi commented Jan 24, 2024 •

edited

Loading

nafg commented May 14, 2024 •

edited

Loading

nafg commented May 15, 2024 •

edited

Loading

lihaoyi commented May 15, 2024 •

edited

Loading

nafg commented May 16, 2024 •

edited

Loading

nafg commented May 27, 2024 •

edited

Loading