
Proposal for a K6 stream API #592

Closed
danron opened this issue Apr 20, 2018 · 6 comments

Comments


danron commented Apr 20, 2018

Proposal for a K6 stream API

The open() function becomes very memory-heavy when using many VUs and opening a big file, since each VU loads the whole file into memory. Some kind of stream functionality would solve this problem and allow k6 to work with huge files as input data.

The init context

openStream( URI )

Description

Open a buffered input stream reader.

Parameters

URI : file://...

      (Maybe in the future: http://... | https://... | ssh://)

Return value

A handle to the stream which should be used by other calls when operating on this stream.

The stream module

stream.readUntilChar( streamHandle, stopChar )

Description

Read data until stopChar occurs in the stream, or until EOF. This also advances the shared read pointer (used by the next read operation from any VU) to just past the position of stopChar, or to EOF.

Parameters

streamHandle : A handle to a stream, as returned by openStream().

stopChar : A single character, for example '\n'.

           (Maybe allow a stopString if needed.)

Return value

Returns a string from the read pointer position up to and including stopChar, or, if stopChar does not occur, a string from the read pointer position to EOF.

Example:

import http from "k6/http"
import stream from "k6/stream"

var mystream = openStream("file://mydata.csv")

export default function() {
  let [next, eof] = stream.readUntilChar(mystream, '\n')
  http.post("http://localhost/save", next)
  if (eof) {
    stop(); // this does not exist yet
  }
}

Implementation and promises

A buffered reader in Go that fetches more data and refills the buffer when the buffer's unread data falls below a threshold. The threshold and buffer size should be parameterized. The VUs pick data from this buffer and advance a global pointer to ensure no other VU reads the same data. This global pointer requires some locking, but the performance impact should be negligible, since each VU spends only a very small part of its time fetching data from the stream.

Expandability

This proposal probably covers most use cases, but it would be possible to extend the stream API in the future with functions like readBytes() and/or readJsonObject().

na-- (Member) commented Apr 22, 2018

As I mentioned in the slack chat, I think something like this could be very useful, but I'm not sure what the most flexible and future-proof way to implement it would be. Here are some of the concerns:

  • Sometimes it would be useful if all VUs are able to read all of the file data efficiently, i.e. instead of having a shared stream, to have something like a shared read-only buffer.
  • We'd have to implement some constraints when users execute k6 archive or k6 cloud (which executes k6 archive internally), since all external resources are packaged into the archive - there should be a file size limit. Or this is another point in favor of supporting HTTP streaming early on.
  • It would be impossible to synchronize that stream for VUs in the cloud or in the future cluster mode.

As for implementation details, Go generally has a very nice API for dealing with streams and buffers. I think it may be useful to try to expose streams and buffers to the JS code through the Go conventions, since if we manage that, we could get a lot of great functionality for free. Some examples:

danron (Author) commented Apr 23, 2018

When VUs want a shared buffer, I think that would need to be a completely different API. This proposal is specifically about reading huge files, and there is no way around implementing that as some kind of stream.

Regarding k6 archive and cloud, I'm not familiar enough with that but maybe the stream API would be unavailable or have some constraints when running k6 like that. Having a file size limit on the streamed file when running k6 locally makes no sense.

My proposal does not concern syncing between different instances of k6; that would be solved by pre-partitioning the data, or by consuming data from a database via HTTP if ordered data across all VUs in a k6 cluster is important.

I don't think exposing the Go io libraries as-is (except for some read functions) to JS is a good idea in this particular case. I guess you mean that you could then implement whatever behaviour you need in your k6 script? Maybe I misunderstand you, but if not: you'd need to think about variable scope and semaphores when VUs are reading from a stream, and a lot of calls would need to take the round trip via goja. There are not that many ways to consume data as input to a test (shared buffer, streaming, and memory-mapped files), so it would be much simpler for the user if k6 provided simplified APIs for those cases. But can you explain in more detail what you mean? :)

cyberw commented May 2, 2018

@na-- , any thoughts? :)

na-- (Member) commented May 2, 2018

Ah, sorry, I'd missed @danron's last comment, thanks for pinging me! As requested, thoughts:

  • I agree, a shared buffer and a stream would need completely different APIs. But it would be useful to be able to use a buffer (for example, the response body of an HTTP request made in setup()) as a shared stream. Also, I see the benefits of having 2 types of streams - shared (where each VU reads only a slice of the stream) and individual (where each VU has read-only access to the whole stream). Not sure how to best express this in the k6 API though.
  • For the moment, I think that there's no difference between the k6 API when executing locally or in the cloud. I'm not sure a difference is even needed in this case; we just have to add some sort of size restriction for the embedded files in the archives.
  • Regarding the Go libraries, I didn't mean we should directly expose them in the JS. My thought here is that we should probably internally use something close to the bufio Reader/Scanner, since that would allow us to trivially support every type that implements io.Reader (files, pipes, network connections, buffers, HTTP responses, etc.). Then we could carefully (and synchronously) expose some of the read methods to the script without having to write a lot of code. stream.readUntilChar() is a bit restrictive - take a look at the bufio.Reader API to see how many useful read functions it offers.

danron (Author) commented May 3, 2018

readUntilChar is basically just a first example (a more flexible version of readLine); together with "read n bytes", it is probably one of the two most used read operations. As I mentioned in the first post, there are a lot of different read functions that could be implemented, and I agree that we should have an API similar to the Go io libraries, just at a higher level, hiding everything that is not necessarily relevant. A good first step is to implement it as described in the first post: one global stream pointer with a lock. It should be trivial to extend that so each VU has an individual pointer into the stream, but that would consume more memory, and at least for us it is not required functionality.

imiric (Contributor) commented Mar 29, 2023

We recently started work on the new HTTP API (initial design document), and part of that will involve a solution that addresses this. We've decided to implement the web Streams API instead, which would be more familiar to JS developers. This work is tracked in #2978.

Since this is a very old issue that describes a purpose-built streams API just for k6, we won't be implementing it as proposed, so I'll close this. Thanks for the proposal anyway @danron! 🙇

@imiric closed this as not planned on Mar 29, 2023