Offset between selection and what is sent to stdin when using 'Process Selection' and multi-byte character encoding #26

DidierA · 2021-10-06T12:18:09Z

Hi,
Using Piper with a 'Context Menu Item' defined as sh -c "cat >/tmp/Piper", which basically dumps whatever it receives to a file, I noticed that in some cases, when invoking it from the context menu Extensions>Piper>Process Selection, the content of the file differs from what I selected: it has the same length, but starts a few characters before the selection.
For instance, if the sentence "Hello World" is in the Response tab and I select "World", I get "Hello" sent to the script.

By trial and error, I managed to understand that it occurs when the HTTP Response is encoded in utf-8 (Content-Type: text/html; charset=utf-8), and only in portions of the body that appear after characters that take more than one byte to encode (such as 'é' or 'è'). In fact I have noticed that there will be an offset of one position for each of those specific characters that appear anywhere before the selected text.

So if the body contains "Nous sommes désolés de vous voir partir":
If I select anything before "désolés", there is no offset. If I select the word "partir", the string "r part" will be sent to the script, as if the selection had been made 2 characters earlier.
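
The drift described above can be reproduced outside Burp. The following is only a minimal sketch: the helper names are hypothetical and not part of Piper, and it assumes the selection bounds are character offsets that end up being applied to the raw UTF-8 bytes.

```kotlin
// Hypothetical helpers for illustration only; none of these names exist in Piper.

// Applies the offsets to the UTF-8 bytes, which is what the report suggests is happening.
fun sliceByByteOffsets(text: String, start: Int, end: Int): String =
    text.toByteArray(Charsets.UTF_8).copyOfRange(start, end).toString(Charsets.UTF_8)

// Applies the same offsets to the decoded text, which is what the user expects.
fun sliceByCharOffsets(text: String, start: Int, end: Int): String =
    text.substring(start, end)

// Number of extra bytes UTF-8 needs for the text that precedes `charIndex`
// (counted in UTF-16 code units, which is enough for characters such as 'é').
fun extraBytesBefore(text: String, charIndex: Int): Int =
    text.substring(0, charIndex).toByteArray(Charsets.UTF_8).size - charIndex
```

With the sentence from the report, "partir" sits at character offsets 33 to 39. Slicing the decoded text with those offsets gives "partir", slicing the bytes gives "r part", and `extraBytesBefore` reports 2, one for each 'é' that precedes the selection.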


DidierA · 2021-10-10T00:35:25Z

In function private fun messagesToMap(), if I replace the line

val body = bytes.copyOfRange(bounds[0], bounds[1])

with

val body = bytes.toString(Charsets.UTF_8).substring(bounds[0], bounds[1]).toByteArray(Charsets.UTF_8)

I get the expected result, but this assumes the content is UTF-8 encoded, which depends on what the HTTP server sent.
Pushed in DidierA@318ced7, but it needs testing to check that it does not break other use cases.
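
To illustrate why the UTF-8 assumption matters, here is a small sketch of the replacement line wrapped in a standalone function. The function name is hypothetical, and `start`/`end` stand in for `bounds[0]`/`bounds[1]`. Kotlin's `ByteArray.toString(Charsets.UTF_8)` does not throw on malformed input; it substitutes U+FFFD, so a non-UTF-8 body would be silently altered when re-encoded.

```kotlin
// Hypothetical wrapper around the replacement line quoted above.
// `start` and `end` play the role of bounds[0] and bounds[1].
fun selectAssumingUtf8(bytes: ByteArray, start: Int, end: Int): ByteArray =
    bytes.toString(Charsets.UTF_8)      // malformed bytes become U+FFFD, no exception
        .substring(start, end)          // offsets are now character offsets
        .toByteArray(Charsets.UTF_8)    // U+FFFD is re-encoded as EF BF BD
```

For a UTF-8 body this returns the bytes of the selected text. For a body encoded as ISO-8859-1, selecting "désolés" would return bytes that no longer match what the server sent, because each lone 0xE9 byte is replaced during decoding.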

DidierA · 2021-10-10T21:56:07Z

I did further tests and it seems Burp is using some sort of 'auto-detection' technique to check whether the content is UTF-8 or not. A strategy would be to use the same technique: try to decode the whole message as UTF-8. If that fails, use bytes.copyOfRange() to retrieve the selection; otherwise use the above method with Charsets.UTF_8.
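
The strategy above could be sketched roughly as follows. This is an assumption about how one might implement it, not necessarily what the PR does: the function names are hypothetical, and a strict `CharsetDecoder` is used because plain `toString(Charsets.UTF_8)` never fails on invalid input.

```kotlin
import java.nio.ByteBuffer
import java.nio.charset.CharacterCodingException
import java.nio.charset.CodingErrorAction

// Hypothetical helper: returns the decoded text, or null if `bytes` is not valid UTF-8.
fun decodeStrictUtf8OrNull(bytes: ByteArray): String? =
    try {
        Charsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT)
            .decode(ByteBuffer.wrap(bytes))
            .toString()
    } catch (e: CharacterCodingException) {
        null
    }

// Hypothetical helper: treats the bounds as character offsets when the whole
// message is valid UTF-8, and as byte offsets otherwise.
fun selectionBytes(bytes: ByteArray, start: Int, end: Int): ByteArray {
    val text = decodeStrictUtf8OrNull(bytes)
    return if (text != null) {
        text.substring(start, end).toByteArray(Charsets.UTF_8)
    } else {
        bytes.copyOfRange(start, end)
    }
}
```

With the sentence from the report and bounds 33 to 39, this returns the bytes of "partir" both when the message is UTF-8 and when it is ISO-8859-1. A pure-ASCII message is also valid UTF-8, and there the two branches give the same result.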

DidierA · 2021-10-11T16:33:23Z

Submitted PR #27.

dnet · 2021-10-12T10:40:11Z

Thanks for the detailed report and the PR; with that being merged, I'll close this as well.

DidierA changed the title ~~offset between selection and what is sent to stdin when using 'Process Selection'~~ Offset between selection and what is sent to stdin when using 'Process Selection' and multi-byte character encoding Oct 6, 2021

DidierA pushed a commit to DidierA/burp-piper that referenced this issue Oct 11, 2021

Fix issue silentsignal#26

fffcf98

dnet pushed a commit that referenced this issue Oct 12, 2021

Fix issue #26

1a83102

dnet closed this as completed Oct 12, 2021
