Feature request: support for streaming input delimited by null characters #2659
Comments
This would not be very useful since
If you are using It does not work and it is not "secure".
Adding support for something like this will only be useful once jq supports byte strings, which I assume will happen, if ever, only after jq 1.7.
I am aware that UNIX file names can contain invalid UTF-8 sequences. And while I would like jq to work with any byte string, I'd argue that jq erroring out on a string it can't deal with is vastly preferable to it interpreting one input as two. In many cases it is perfectly acceptable to be unable to process some file names, while silently confusing file names, with all the security implications that entails, is not. I was not aware, though, that jq silently replaces byte sequences which do not represent valid UTF-8 characters. The occasion for wanting this feature, however, was not that I expected to be dealing with untrusted or malformed input; in fact, I would be very surprised if my particular input could ever contain newlines. Also, please look beyond my example of processing the output of Regardless of what other ways there might be to solve each instance of this problem, it would still be a useful tool in the toolbox.
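The concern about lossy decoding can be made concrete. The Python sketch below uses `bytes.decode(..., errors="replace")` as a stand-in for the replacement behaviour described in the comment (an assumption: this is not jq's actual decoder). It shows two distinct invalid-UTF-8 names collapsing into the same string, whereas strict decoding refuses them outright.

```python
# Sketch: Python's errors="replace" is used here only as an analogy for
# lossy decoding; it is not jq's implementation.


def decode_lossy(name: bytes) -> str:
    """Replace invalid UTF-8 sequences with U+FFFD, never failing."""
    return name.decode("utf-8", errors="replace")


def decode_strict(name: bytes) -> str:
    """Raise UnicodeDecodeError on invalid UTF-8 instead of guessing."""
    return name.decode("utf-8")


# Two different file names, both invalid UTF-8 (Latin-1 style bytes).
name_a = b"caf\xe9"
name_b = b"caf\xff"

# Lossy decoding maps both names to the same string.
print(decode_lossy(name_a) == decode_lossy(name_b))  # True

# Strict decoding refuses the name instead of silently changing it.
try:
    decode_strict(name_a)
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

Erroring out, as in `decode_strict`, is the behaviour the comment argues is preferable to silently merging two names.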
I agree that we need this, and we can have this independently of adding support for a binary type.
So when zero-byte-separated input is enabled, there would be two modes, with and without

Example if the zero byte is treated as whitespace:
$ echo -ne '"a"\x00\x00\n "b"' | jq --raw-input0
"a"
"b"

Example if the input is split only on the zero byte:
$ echo -ne '"a"\x00\x00\n "b"' | jq --raw-input0
"a"
"" # or error?
"b" # skip whitespace around JSON?

Example with
$ echo -ne 'a\x00\x00\n b' | jq -R --raw-input0
"a"
""
"\n b"

So
What I had in mind is that you'd use You would be using If your input consists of JSON values, there would be no need to delimit them with null bytes; normal whitespace would work just fine.
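The comment with the `echo -ne` examples distinguishes two JSON-mode interpretations plus a raw mode. A minimal Python model of all three follows. The function names are hypothetical, Python's `json` module stands in for jq's parser, and skipping empty chunks in the split-only mode is just one of the policies the comment leaves open (raising an error would be the other).

```python
import json
from typing import Iterator

_decoder = json.JSONDecoder()
_JSON_WS = " \t\r\n"  # the four whitespace characters JSON allows


def parse_nul_as_whitespace(text: str) -> Iterator[object]:
    """Mode 1: treat NUL like whitespace between JSON texts."""
    text = text.replace("\0", " ")
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos] in _JSON_WS:
            pos += 1
        if pos == len(text):
            return
        value, pos = _decoder.raw_decode(text, pos)
        yield value


def parse_split_on_nul(text: str) -> Iterator[object]:
    """Mode 2: each NUL-separated chunk must hold exactly one JSON text."""
    for chunk in text.split("\0"):
        if chunk.strip(_JSON_WS) == "":
            continue  # one possible policy for empty chunks; could raise instead
        yield json.loads(chunk)  # json.loads tolerates surrounding whitespace


def split_raw(text: str) -> list[str]:
    """Raw mode (-R with NUL separation): every chunk is a literal string."""
    return text.split("\0")


print(list(parse_nul_as_whitespace('"a"\0\0\n "b"')))  # ['a', 'b']
print(list(parse_split_on_nul('"a"\0\0\n "b"')))       # ['a', 'b']
print(split_raw("a\0\0\n b"))                           # ['a', '', '\n b']
```

The modes differ once a chunk holds more than one value: `parse_split_on_nul('"a" "b"')` raises `json.JSONDecodeError`, while the whitespace mode accepts it and yields both strings.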
Is there a need to Are there tools that can produce
@svdb0 OK, that makes sense, thanks for clarifying.
This feature request makes me concerned about
Yes! See #2683. I think we want to a) remove
c) keep
Yes, that is also an option. Probably the least disruptive one now.
d) no
I used the option name Following the same reasoning, and also for symmetry and the resulting intuitiveness, I would suggest that the null-delimited version of One more reason for getting rid of
Historically, many Linux command-line tools took their input (in particular file names) as newline-terminated strings, or produced newline-terminated output strings.
When an input or output entry includes an internal newline character, that entry can be interpreted as multiple entries.
This causes the tool processing these entries to misbehave, and can pose a security risk.
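To make the failure mode concrete, the hypothetical Python helper `split_terminated` below frames the same two file names with newline terminators (as `find -print` would) and with NUL terminators (as `find -print0` would). Only the NUL framing preserves a name that contains a newline.

```python
def split_terminated(stream: str, terminator: str) -> list[str]:
    """Split a stream in which every record is followed by `terminator`."""
    if not stream:
        return []
    if not stream.endswith(terminator):
        raise ValueError("stream does not end with a terminator")
    # Drop the final terminator, then split on the remaining ones.
    return stream[: -len(terminator)].split(terminator)


# Two file names; the first one contains an embedded newline.
names = ["report\n.txt", "notes.txt"]

newline_stream = "".join(n + "\n" for n in names)  # like find -print
nul_stream = "".join(n + "\0" for n in names)      # like find -print0

print(split_terminated(newline_stream, "\n"))  # ['report', '.txt', 'notes.txt']
print(split_terminated(nul_stream, "\0"))      # ['report\n.txt', 'notes.txt']
```

With newline framing, two names are read back as three; since a NUL byte cannot occur inside a UNIX file name, NUL framing is unambiguous.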
Increasingly, Linux command-line tools can use null bytes instead of newlines as terminators in their input and output. Examples of such tools are Bash (`read`, `mapfile`), `grep`, `cut`, `head`, `tail`, `sort`, `uniq`, `sed`, `find`, `xargs`, `env`, `wc`, `du`, `stat`, `file`, `id`, `tar`, `rsync`, and `sha256sum`. In addition, a few pseudo-files in `/proc/`, namely `/proc/<pid>/cmdline` and `/proc/<pid>/environ`, use null bytes as terminators.

It would be nice if jq, too, could be used in shell pipelines where null characters are used as terminators.
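The `/proc/<pid>/environ` layout (`KEY=VALUE` entries, each followed by a NUL byte) can be parsed with a few lines of Python. This is a sketch with a hypothetical `parse_environ` helper. It keeps everything as `bytes`, so values that are not valid UTF-8, or that contain newlines, survive unchanged.

```python
def parse_environ(data: bytes) -> dict[bytes, bytes]:
    """Parse NUL-terminated KEY=VALUE entries, as found in /proc/<pid>/environ."""
    env: dict[bytes, bytes] = {}
    for entry in data.split(b"\0"):
        if not entry:
            continue  # the final terminator leaves an empty trailing piece
        # Split on the first "=" only; values may themselves contain "=".
        key, _, value = entry.partition(b"=")
        env[key] = value
    return env


sample = b"HOME=/root\0PATH=/usr/bin:/bin\0MSG=line1\nline2\0"
print(parse_environ(sample)[b"MSG"])  # b'line1\nline2'
```

On Linux, `parse_environ(pathlib.Path("/proc/self/environ").read_bytes())` would apply the same parser to the current process.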
Issue #1990 addressed this for output generated by jq.
As far as I have been able to find, there is no similar functionality yet for reading null-terminated data as input to jq.

This feature request is for the addition of a variant of `--raw-input` which uses null bytes as terminators instead of newline characters. For the purpose of this feature request, I will refer to it as `--raw-input0`.

A common use case for `--raw-input0` would be to securely read a file listing into jq as input:

`find / -print0 | jq --raw-input0`
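Until something like `--raw-input0` exists, a small pre-filter could convert NUL-terminated records into a stream of JSON strings, which jq already reads value by value. The script below is a hypothetical sketch: the name `nul2json.py` and its `-` argument are made up for illustration. It uses `read1` so that records are emitted as soon as they arrive rather than after a full buffer fills. It also decodes strictly, so invalid UTF-8 aborts the pipeline instead of being replaced.

```python
"""Hypothetical nul2json.py: turn NUL-terminated records into JSON strings.

A stand-in for the proposed --raw-input0 (not an existing jq option):
    find / -print0 | python3 nul2json.py - | jq .
"""
import io
import json
import sys
from typing import Iterator, TextIO


def iter_nul_records(stream: io.BufferedIOBase, chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield each NUL-terminated record as soon as its terminator arrives."""
    pending = b""
    while True:
        # read1 returns whatever is available instead of waiting for a full buffer.
        chunk = stream.read1(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Everything before the last NUL is complete; keep the remainder.
        *complete, pending = pending.split(b"\0")
        yield from complete
    if pending:
        # Design choice: a final record without a terminator is still emitted.
        yield pending


def main(stdin: io.BufferedIOBase, stdout: TextIO) -> None:
    for record in iter_nul_records(stdin):
        # Strict decoding: invalid UTF-8 raises UnicodeDecodeError instead of
        # being silently replaced.
        stdout.write(json.dumps(record.decode("utf-8")) + "\n")
        stdout.flush()  # favour latency so jq sees each record promptly


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    main(sys.stdin.buffer, sys.stdout)
```

Flushing after every record trades throughput for latency; for very large listings, dropping the per-record `flush()` would be faster. Emitting a final unterminated chunk as a record is a choice made for this sketch, not something the proposal specifies.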
In some cases, the following construct could be used instead of `--raw-input0`:

`jq --raw-input --slurp 'split("\u0000")[]'`

However, this does not stream its input, and hence would often be less suitable in combination with long-running commands like the `find / -print0` statement above.
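The limitation of the slurp workaround can be modelled in Python. In this sketch (the helper names are hypothetical), a fake producer logs each chunk it hands over, and the slurp model consumes every chunk before returning anything. It also shows a second wrinkle: jq's `split/1`, like Python's `str.split`, keeps a trailing empty string when the input ends with the separator. As a result, `find -print0` output yields an extra `""` at the end unless it is dropped.

```python
from typing import Iterable, Iterator


def slurp_split(chunks: Iterable[str]) -> list[str]:
    """Model of jq --raw-input --slurp 'split("\\u0000")[]'.

    All chunks are joined before splitting, so no record is available
    until the producer has finished writing.
    """
    return "".join(chunks).split("\0")


def drop_trailing_empty(records: list[str]) -> list[str]:
    """Drop the empty string left behind by a final NUL terminator."""
    return records[:-1] if records[-1:] == [""] else records


def producer(log: list[str]) -> Iterator[str]:
    """Stand-in for `find . -print0` that logs each chunk it hands over."""
    for chunk in ["./a\0./b", "\0./c\0"]:
        log.append(chunk)
        yield chunk


log: list[str] = []
records = slurp_split(producer(log))
print(len(log))                      # 2: the whole input was read first
print(records)                       # ['./a', './b', './c', '']
print(drop_trailing_empty(records))  # ['./a', './b', './c']
```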