fetching a binary file from http & sending it => corruption #1375

emmanueltouzery · 2020-03-27T07:36:44Z

Environment

k6 version: 0.26.1
OS and version: linux, fedora 31
Docker version and image, if applicable: -

Expected Behavior

I fetch a binary file over HTTP in setup() and print its size both in setup() and in the VU function. In my real program, of course, I want to send the binary over HTTP.

Actual Behavior

I would expect the length of the binary that I fetched to be the same in the setup and in the VU, but it's not. The binary is garbled:

INFO[0000] whew.png body size: 3399
INFO[0000] body size is: 3302

The first line is from the setup, the second from the VU sender.

The discrepancy is a lot worse if I don't use base64 encode/decode (in that case the size in the VU sender is about twice as large).

Steps to Reproduce the Problem

I have this test code:

import http from "k6/http";
import { sleep } from "k6";
import encoding from "k6/encoding";

function getFile(path) {
    // replace this IP with the IP of your machine
    const body = http.get(`http://192.168.178.76:8000/${path}`, {
        responseType: "binary"
    }).body;
    console.log(`${path} body size: ${body.length}`);
    return encoding.b64encode(body);
}

export function setup() {
    // put the name of a binary file in the current folder
    return getFile(`whew.png`);
}

export default function(data) {
    console.log("body size is: " + encoding.b64decode(data).length);
    sleep(25);
}

I used python to serve files from the local folder:

python2 -m SimpleHTTPServer 8000

As an aside, the reason I'm fetching the files from HTTP is that I have lots of files to send, different for each VU. If I fetch the files in the per-VU init code (as I think I'm meant to do), I don't have the __VU there (I'm getting __VU is not defined). So I'd have to fetch the files for all the VUs in each VU init code, which would be way too much: I have about 250Mb of data for all VUs together, and 2000 VUs -- if each VU did fetch all the data, I'd load 250Mb*2000. So what I tried to do is to load the data for all the VUs together just once, in the setup. But now I'm hitting this issue.

EDIT: If I do the HTTP read and write in the VU sending function (not using setup()), then it works:
INFO[0000] whew.png body size: 3399
INFO[0000] body size is: 3399


imiric · 2020-03-27T10:31:20Z

Hi,

the difference you're seeing here is because of a type difference: body in this case is a raw binary array whereas b64decode() returns a string, so their .length will be different, even though the data is the same. You can confirm this by console.log-ing the base64 string and decoding it manually (e.g. base64 --decode < enc.b64 > dec.png) and you'll see the image is not corrupted.

This behavior is part of the discrepancies in how k6 handles binary data. See issue #1020 for details. Ideally both body and the value returned by b64decode() would be of the same type and actually be usable, and there wouldn't be a difference in their .length values.

But to address your use case, even with this binary issue aside, currently you wouldn't be able to achieve the memory savings you expect by loading all data in setup() once, since that data is passed to each VU, so if you load and return 250Mb from setup() you'd still need 250Mb*$K6_VUS amount of memory during the test.

One workaround you can consider is manually splitting the data for each VU, as suggested here. Since you're not dealing with JSON, you would need to request only images for each specific VU, but that pattern would probably work for you.

Note that sharing setup data efficiently across VUs has already been discussed and planned (see #532), and with the upcoming #1007/#997 distributed execution changes this kind of setup will be easier and more efficient.

I'll close this as these are known issues, but let us know if you have additional questions, and for further support you're welcome to use the community forum.

emmanueltouzery · 2020-03-27T11:32:00Z

No, you misread: the length difference is not because of base64. I do decode, and I'm pretty sure there is in fact a bug. Thank you for the tips -- now I'm preparing my data by fetching it in the VU loop (I could also fetch it just in the first iteration). But the bug I described does stand.

emmanueltouzery · 2020-03-27T11:35:47Z

here is an example of the corruption

mstoykov · 2020-03-27T16:18:35Z

After ... too much digging (and reading your screenshot backwards, which cost some time), the problem in the particular case of b64encode->b64decode yielding different data is the combination of b64decode returning a string and how goja (the JS VM k6 uses) works with strings ...

The short of it is that b64decode returns a string instead of a []byte (even though b64encode takes a []byte). If it returned a []byte it would've worked, but unfortunately changing that is (probably) a breaking change.

A possible workaround that I found is to JSON.stringify(data) and then JSON.parse(stringified). This works for the first three lines of bytes in the screenshot above; I don't know if it works for all of them ;).
Again, in your case this will NOT save you any memory, because the setup data is copied to all VUs ... actually, because you will need to decode it, it will use even more memory :).

Longer explanation :D (I have probably gotten something wrong, but it seems like my conclusions agree with the experiments I have done)

JS uses UTF-16 (kind of 😑) for its strings. k6 is written in Go, which uses UTF-8 for its strings.

My knowledge on the matter is limited, but the important fact is that the byte representation of a character in one doesn't match the other. So the goja VM translates strings that are not ASCII-only (fun fact ... UTF-8 ASCII is not the same as UTF-16 ASCII, so no idea why :D) from k6's internal UTF-8 to UTF-16 when k6 returns a string to the JS VM, and translates back when a string goes from goja to k6 (it's a little bit more complicated, but ... close enough).
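To make the representation mismatch concrete, here is a minimal Go sketch that counts how many UTF-8 bytes and how many UTF-16 code units the same characters need. The helper name utf16Units is made up for this illustration; it only uses the standard library and says nothing about how goja stores strings internally.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// utf16Units (hypothetical helper) returns how many UTF-16 code units
// are needed to represent s.
func utf16Units(s string) int {
	return len(utf16.Encode([]rune(s)))
}

func main() {
	// Escapes are used so the example does not depend on the file's encoding:
	// U+0041 'A', U+00E9 'é', U+20AC '€', U+1F611 (the emoji used above).
	samples := []string{"A", "\u00e9", "\u20ac", "\U0001F611"}
	for _, s := range samples {
		// len(s) is the number of UTF-8 bytes in a Go string.
		fmt.Printf("%q: %d UTF-8 bytes, %d UTF-16 code units\n", s, len(s), utf16Units(s))
	}
	// Byte/unit counts: A 1/1, é 2/1, € 3/1, emoji 4/2.
}
```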

Now if we look at this code in the golang playground:

package main

import (
	"fmt"
	"reflect"
	"unicode/utf16"

	"github.com/davecgh/go-spew/spew"
)

func main() {
	b := []byte{
		0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x08, 0x08, 0x08, 0x00, 0x8a, 0x81, 0x7a, 0x50, 0x00, 0x00,
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2a, 0x00, 0x00, 0x00, 0x54, 0x52,
		0x41, 0x4e, 0x5f, 0x56, 0x41, 0x4c, 0x5f, 0x42, 0x41, 0x4b, 0x55, 0x56, 0x41, 0x4c, 0x30, 0x30,
	}

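	s := string(b)                    // copies the bytes as-is, without validating them as UTF-8
	utf16s := utf16.Encode([]rune(s)) // []rune(s) turns each invalid byte into U+FFFD before encoding
	utf8s := utf16.Decode(utf16s)     // UTF-16 code units back to runes

	fmt.Println(reflect.DeepEqual([]byte(s), b)) // true: string(b) preserved the bytes
	fmt.Println(s == string(utf8s))              // false: the round trip changed them
	spew.Dump([]byte(s))
	spew.Dump([]byte(string(utf8s)))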
}

// output:
true
false
([]uint8) (len=48 cap=48) {
 00000000  50 4b 03 04 14 00 08 08  08 00 8a 81 7a 50 00 00  |PK..........zP..|
 00000010  00 00 00 00 00 00 00 00  00 00 2a 00 00 00 54 52  |..........*...TR|
 00000020  41 4e 5f 56 41 4c 5f 42  41 4b 55 56 41 4c 30 30  |AN_VAL_BAKUVAL00|
}
([]uint8) (len=52 cap=64) {
 00000000  50 4b 03 04 14 00 08 08  08 00 ef bf bd ef bf bd  |PK..............|
 00000010  7a 50 00 00 00 00 00 00  00 00 00 00 00 00 2a 00  |zP............*.|
 00000020  00 00 54 52 41 4e 5f 56  41 4c 5f 42 41 4b 55 56  |..TRAN_VAL_BAKUV|
 00000030  41 4c 30 30                                       |AL00|
}

Program exited.

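we can see that going to UTF-16 from UTF-8 and back isn't lossless in this case: in the second dump the bytes 8a 81 have each become ef bf bd, which is the UTF-8 encoding of U+FFFD, the Unicode replacement character. My gut feeling is that some of those bytes are not actually valid UTF-8, and string([]byte{...}) doesn't do any checks or fixes, it just copies the data bytes blindly, whereas the Encode/Decode from UTF-16 does :)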
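The gut feeling about invalid UTF-8 can be checked with the standard library alone. The sketch below takes a six-byte slice out of the data above (including 0x8a and 0x81), asks utf8.Valid about it, and pushes it through a string -> runes -> string round trip. The helper name roundTrip is invented for this example, and the round trip is only an approximation of what happens at the k6/goja boundary, not k6's actual code path.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// roundTrip (hypothetical helper) converts bytes to a string, then to runes,
// then back to bytes. The []rune conversion replaces every byte that is not
// part of a valid UTF-8 sequence with U+FFFD.
func roundTrip(b []byte) []byte {
	return []byte(string([]rune(string(b))))
}

func main() {
	// Bytes 8..13 of the sample above; 0x8a and 0x81 are lone continuation bytes.
	b := []byte{0x08, 0x00, 0x8a, 0x81, 0x7a, 0x50}

	fmt.Println(utf8.Valid(b)) // false

	out := roundTrip(b)
	fmt.Printf("% x\n", out)      // 08 00 ef bf bd ef bf bd 7a 50
	fmt.Println(len(b), len(out)) // 6 10
}
```

Each invalid byte grows from one byte to three, which is consistent with the 48 -> 52 byte growth in the dump above.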

I would argue the exact reason is not important as this is clearly not how binary data should be handled in k6 and this should just be fixed by using typed arrays for []byte and so on.

Additionally, the k6 b64decode returns string, which definitely makes the whole thing look more and more like utf16.Encode skips/tries to fix what it doesn't understand from the supposedly UTF-8 encoded string that b64decode returns.

I would argue b64decode should have either always returned []byte or should have had a mode to do that, but I am not certain it can be worked around now, and I would argue this should happen after #1020 (or as part of it :D).

Another issue found along the way: because the data returned from setup is encoded as JSON using Go's json package, []byte values get base64-encoded ... but they are not decoded when we put the data back in each VU, as they are strings by then. The goja JSON implementation does the correct thing ™️ and marshals the data to an array of ints, which is why the workaround above works.

This should be fixed, but unfortunately, the internal goja JSON implementation is not exported ... there is Object.MarshalJSON, but in the code we have a ... goja.Value so I'm pretty sure it will take more than two lines :(
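The JSON behavior described above is easy to reproduce in isolation. This sketch marshals a []byte with Go's encoding/json and unmarshals the result into an untyped value, which is roughly the situation when the receiving side does not know the field was binary. The helper name jsonRoundTrip is made up for illustration, and this is not the actual k6 setup-data code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonRoundTrip (hypothetical helper) marshals v to JSON and unmarshals it
// into an untyped value, as a receiver without type information would.
func jsonRoundTrip(v interface{}) (interface{}, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var out interface{}
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// encoding/json encodes []byte as a base64 string.
	out, err := jsonRoundTrip([]byte{0x50, 0x4b, 0x03, 0x04})
	if err != nil {
		panic(err)
	}
	// The value comes back as a string, not as bytes.
	fmt.Printf("%T %v\n", out, out) // string UEsDBA==
}
```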

mstoykov · 2021-12-21T09:57:43Z

I think that this should be fixed with a4927b6#diff-787f834ad3403248052890ea97f946bffc88d39d2821b3157b22451081c7c393, so I am closing it

emmanueltouzery added the bug label Mar 27, 2020

imiric closed this as completed Mar 27, 2020

na-- reopened this Mar 27, 2020

na-- added the "evaluation needed" label Mar 27, 2020

na-- removed the "evaluation needed" label Mar 30, 2020

mstoykov mentioned this issue Jan 14, 2021: encoding.b64decode/encode fails on binary data #1798 (Closed)

This was referenced Jan 27, 2021:

Add FormData example grafana/k6-docs#203 (Closed)

Expand ArrayBuffer support #1800 (Merged)

mstoykov closed this as completed Dec 21, 2021
