
Proposal for a K6 stream API #592

Closed
danron opened this issue Apr 20, 2018 · 6 comments

Comments


danron commented Apr 20, 2018

Proposal for a K6 stream API

The open() function becomes very memory-heavy when using many VUs and opening a big file, since each VU loads the whole file into memory. Some kind of stream functionality would solve this problem and allow k6 to work with huge files as input data.

The init context

openStream( URI )

Description

Open a buffered input stream reader.

Parameters

URI : file://...

      (Maybe in the future: http://... | https://... | ssh://)

Return value

A handle to the stream which should be used by other calls when operating on this stream.

The stream module

stream.readUntilChar( streamHandle, stopChar )

Description

Read data until stopChar occurs in the stream, or until EOF. This also advances the shared read pointer (used by the next read operation from any VU) to just past the position of stopChar, or to EOF.

Parameters

streamHandle : A handle to a stream, as returned by openStream().

stopChar : A single character, for example '\n'.

           (Maybe allow a stopString if needed.)

Return value

Returns a string from the read pointer position up to and including stopChar, or, if stopChar does not occur, a string from the read pointer position to EOF.

Example:

import http from "k6/http"
import stream from "k6/stream"

var mystream = openStream("file://mydata.csv")

export default function() {
  let [next, eof] = stream.readUntilChar(mystream, '\n')
  http.post("http://localhost/save", next)
  if (eof) {
    stop(); // this does not exist yet
  }
}

Implementation and promises

A buffered reader in Go that fetches more data and refills the buffer when the buffer's unread data falls below a threshold. The threshold and buffer size should be parameterized. The VUs pick data from this buffer and advance a global pointer to ensure no other VU reads the same data. This global pointer requires some locking, but the performance impact should be negligible, since each VU spends only a very small part of its time fetching data from the stream.

Expandability

This proposal probably covers most use cases, but it would be possible to extend the stream API in the future with functions like readBytes() and/or readJsonObject().

na-- (Member) commented Apr 22, 2018

As I mentioned in the slack chat, I think something like this could be very useful, but I'm not sure what the most flexible and future-proof way to implement it would be. Here are some of the concerns:

  • Sometimes it would be useful if all VUs are able to read all of the file data efficiently, i.e. instead of having a shared stream, to have something like a shared read-only buffer.
  • We'd have to implement some constraints when users execute k6 archive or k6 cloud (which executes k6 archive internally), since all external resources are packaged into the archive - there should be a file size limit. Or this is another point in favor of supporting HTTP streaming early on.
  • It would be impossible to synchronize that stream for VUs in the cloud or in the future cluster mode.

As for implementation details, Go generally has a very nice API for dealing with streams and buffers. I think it may be useful to try to expose streams and buffers to the JS code through the Go conventions, since if we manage that, we could get a lot of great functionality for free. Some examples:

danron (Author) commented Apr 23, 2018

When VUs want a shared buffer, I think that would need to be a completely different API. This proposal is specifically about reading huge files, and there is no way around implementing that as some kind of stream.

Regarding k6 archive and cloud, I'm not familiar enough with that but maybe the stream API would be unavailable or have some constraints when running k6 like that. Having a file size limit on the streamed file when running k6 locally makes no sense.

My proposal does not concern syncing between different instances of k6; that would be solved by pre-partitioning the data, or by consuming data from a database via HTTP if ordered data across all VUs in a k6 cluster is important.

I don't think exposing the Go io libraries as-is (except for some read functions) to JS is a good idea in this particular case. I guess you mean that you could then implement whatever behaviour you need in your k6 script? Maybe I misunderstand you, but if not: you'd need to think about variable scope and semaphores when VUs are reading from a stream, and a lot of calls would need to take the round trip via goja. There are not that many ways to consume data as input to a test (shared buffer, streaming, and memory-mapped files), so it would be much simpler for the user if k6 provided simplified APIs for those cases. But can you explain in more detail what you mean? :)

cyberw commented May 2, 2018

@na-- , any thoughts? :)

na-- (Member) commented May 2, 2018

Ah, sorry, I'd missed @danron's last comment, thanks for pinging me! As requested, thoughts:

  • I agree, a shared buffer and a stream would need completely different APIs. But it would be useful to be able to use a buffer (for example, the response body of an HTTP request made in setup()) as a shared stream. Also, I see the benefits of having 2 types of streams - shared (where each VU reads only a slice of the stream) and individual (where each VU has read-only access to the whole stream). Not sure how to best express this in the k6 API though.
  • For the moment, I think that there's no difference between the k6 API when executing locally or in the cloud. I'm not sure a difference is even needed in this case; we just have to add some sort of size restriction for the embedded files in the archives.
  • Regarding the Go libraries, I didn't mean we should directly expose them in the JS. My thought here is that we should probably internally use something close to the bufio Reader/Scanner, since that would allow us to trivially support every type that implements io.Reader (files, pipes, network connections, buffers, HTTP responses, etc.). Then we could carefully (and synchronously) expose some of the read methods to the script without having to write a lot of code. stream.readUntilChar() is a bit restrictive - take a look at the bufio.Reader API to see how many useful read functions it offers.

danron (Author) commented May 3, 2018

readUntilChar is basically just a first example (a more flexible version of readLine); together with "read n bytes", it is probably one of the two most used read operations. As I mentioned in the first post, there are a lot of different read functions that could be implemented, and I agree that we should have an API similar to the Go io libraries, just at a higher level, hiding everything that is not necessarily relevant. A good first step is to implement it as described in the first post: one global stream pointer with a lock. It should be trivial to extend that so each VU has an individual pointer into the stream, but that would consume more memory, and at least for us it is not required functionality.

imiric (Contributor) commented Mar 29, 2023

We recently started work on the new HTTP API (initial design document), and part of that will involve a solution that addresses this. We've decided to implement the web Streams API instead, which would be more familiar to JS developers. This work is tracked in #2978.

Since this is a very old issue that describes a purpose-built streams API just for k6, we won't be implementing it as proposed, so I'll close this. Thanks for the proposal anyway @danron! 🙇

@imiric closed this as not planned on Mar 29, 2023