Switch serialization format for variables and history #3341
Comments
This is probably an unfortunate consequence of fish using \x1D internally and exposing that fact publicly. See
in env.cpp. I suspect you are reporting this based on other open fish issues since it is very unlikely you noticed this in real world use of fish. If I'm wrong about that please provide more details about the scenario which caused you to notice this problem. |
Update: It looks like it is actually a bug with setting the variable rather than with looking it up and expanding it. I've updated the issue to reflect this. |
While working on something unrelated, I realized I had a very hazy understanding of what was actually in my variables, and whether my experiments were actually testing the things I thought that they were. Many languages address this by providing a way to introspect:
# Ruby has `p` for a quick debugging print
$ ruby -e 'p ["abc", "def", "ghi"]'
["abc", "def", "ghi"]
# which is equivalent to `puts(variable.inspect)`
$ ruby -e 'puts ["abc", "def", "ghi"].inspect'
["abc", "def", "ghi"]
# Haskell has `show`
$ ghci -e 'putStrLn (show ["abc", "def", "ghi"])'
["abc","def","ghi"]
# Javascript has console.dir
$ node -e 'console.dir(["abc", "def", "ghi"])'
[ 'abc', 'def', 'ghi' ]
# Which is equivalent to `log(inspect(variable))`
$ node -e 'console.log(require("util").inspect(["abc", "def", "ghi"]))'
[ 'abc', 'def', 'ghi' ]
So, I was about to open an issue requesting an But then I realized I've built tools like that before (1, 2) so I'm reasonably familiar with the problem / expectation, and I really ought to just try building it myself instead of continually expecting everyone else to do work for me. So, I tried to build the function I was wanting (currently working well"ish", source is here). Here are the difficulties I've hit while working on it so far:
|
Why use
If you'd like to display e.g. To make a function that just gets the variable name, you'd need to add the Then you could do something like
function inspect --no-scope-shadowing
for ___inspect_arg in $argv
string escape -- $___inspect_arg $$___inspect_arg
end
end
inspect fish_color_autosuggestion
set -l smkx (tput smkx)
inspect smkx
Well, under Unix, programs receive arguments as NUL-terminated strings, so you can't actually pass a NUL inside an argument. (AFAIK, anyway) We could fake this for |
You could have it escape them, then unescape (eval unfortunately) them later,
These days I'd probably prefer to use JSON or something between tools if we're talking about arrays and stuff. First fish would need dicts to imagine anything very cool here. Another way one might want to represent things as output might be: |
Quoting is your friend. The "*" and "?" are interpreted as globs, like they are everywhere else. Do |
I've wondered for a while whether such an 'inspect' behavior might be a better thing for fish to do than this error:
I'm often just letting it complete $foo for me so I can snoop on what |
I need to apply highlighting between characters, so my plan was to iterate over them 1 at a time. Iterating over the result of
Oh, that's really cool! I haven't used arrays in this way yet, but was thinking it should check the results and display like an associative array if they line up with other vars.
Oh, yeah, good point >.< I guess you'd have to pipe the names into the function.
Aye, though you do still have to run it through sed (eg using sed, I can make my test pass for everything except
Aye, that would be nice. Or if they had ability to set mime types, then could use whatever format is appropriate for their type of data (presumably tools like
Ahh, yeah, you're right, I did have one level of quoting, but needed two:
Oh, that's a nice trick! That's pretty much the same use case I was trying to address. |
Note that the |
I'd advocate it, I did hit one with that somewhere along the way. Didn't document the experiment that elicited the issue, though. |
See also #436. |
Fixing this so that \x1D can be assigned to a var is going to be really hard. Why? Because that magic value is serialized to the universal var file and exported vars. If that magic value was just used internally it would be easy to change. Consider this var in my ~/.config/fish/fishd.$mac_addr file:
That entry does not mean the var was set to the empty string; e.g., I have a proof of concept change that lets me assign \x1D to vars. This even works for universal or exported vars. However, it means existing vars with the current magic On the one hand I think fish should not be using any ASCII or valid Unicode char for strictly internal purposes. That includes Unicode private-use chars. The latter isn't going to be possible as long as we're using wide chars internally since MS Windows limits us to 16 bit wide chars. But we can, at least, restrict our magic chars to a small range of the Unicode private-use range. On the other hand fish isn't meant to be a general purpose language that can handle arbitrary character values. So do we really want to make this change? I'll make a pull-request with my transitional solution and we can then discuss the merits of merging it. Obviously the above discussion applies to the magic \x1E value (symbol |
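To make the ambiguity described above concrete, here is a minimal Python sketch of a flat, separator-based encoding. It is not fish's actual C++ code: the names `ARRAY_SEP`, `encode_flat`, and `decode_flat` are hypothetical, and the only assumption taken from the thread is that \x1D is written verbatim between list elements.

```python
ARRAY_SEP = "\x1d"  # separator value named in the thread; everything else is illustrative


def encode_flat(values):
    """Join list elements into one flat string, as a separator-based format would."""
    return ARRAY_SEP.join(values)


def decode_flat(text):
    """Split a flat string back into list elements."""
    return text.split(ARRAY_SEP)


ordinary = ["abc", "def"]
assert decode_flat(encode_flat(ordinary)) == ordinary  # ordinary values round-trip

tricky = ["\x1d"]  # a single element that is exactly the separator
print(decode_flat(encode_flat(tricky)))  # two empty strings, not the original value
```

Once the separator is stored unescaped, a reader has no way to tell a literal \x1D from an element boundary, which is why the change affects both the universal variable file and exported variables.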
Way more folks will be inconvenienced by changing the array separator to some private use character when they try to load up a previous version of fish than folks who want to use that one character. I'd like to think the day we changed the array separator coincided with us gaining something really worthwhile to show for it, like key:value dicts, or a way to actually "return an array". Hopefully there would be something nice this would enable that we'd introduce at the same time. |
@krader1961 I'm extremely curious: how does your experiment handle exporting? What does The current behavior is certainly not really intuitive to use if you're not fish and are digesting the variables. The pre-2.2.0 automatic colon behavior was pretty much "as good as it gets" in that regard. We can't export a private use character - so that'd make us come up with something. Anyhow, what did you do? |
(I know that this is about NULL only, but you mentioned the same transformation you did would work for both situations, and that one is more interesting.) |
Might it be worth it to rename the file (i.e. increase the file's version)? Have a new fish read from fishd2 and try to import from fishd if no such file exists? That still wouldn't enable sharing across multiple fish versions, but at least you wouldn't get variables that appear to be corrupted. |
A bit of research reveals that 0xFFFF (and 0xFFFE) are classified as "not a character" by Unicode but are legal in unicode strings. In other words 0xFFFF is truly a private use character that should never be associated with a glyph or grapheme. So we can replace both the 0x1D (
Exactly as it does today, @floam. The only difference is the magic value in the UTF-8 encoded string will be \xEF\xBF\xBF (0xFFFF) rather than \x1D or \x1E. What I think you're asking is how will vars with more than one value be exported so that they're useful by other programs. That's a different problem that this issue won't address. It is being discussed in issue #436 as you noted.
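The byte sequence quoted above can be checked with a short Python sketch. This only illustrates the UTF-8 encoding of U+FFFF; the variable names are hypothetical and nothing here is taken from fish's source.

```python
NONCHAR = "\uffff"  # U+FFFF, classified by Unicode as a noncharacter

encoded = NONCHAR.encode("utf-8")
print(encoded)  # b'\xef\xbf\xbf'

# The legacy magic values are single bytes in UTF-8, so the replacement
# takes three bytes wherever one was used before.
legacy = "\x1d".encode("utf-8")
print(len(legacy), len(encoded))  # 1 3
```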
I need to think about that. I wasn't going to automatically rewrite universal vars affected by my proposed change. So unless you create a new uvar that is null (i.e., has no value) it won't cause problems with older versions of fish. We will definitely need to do something like what you're proposing before we can remove all references to the legacy 0x1D and 0x1E magic values. And the only solution I too can think of is to introduce a new uvar file name (i.e., "version" of the name). Unless a better solution is identified we'll do that before closing this issue. But I'd like to do that work in a separate change. In no small part because it's problematic. Consider running fish versions pre and post this change. In both of them a uvar is created or modified whose encoding is affected by this change. Obviously we can have the new, post change, fish write to both uvar files using the encoding appropriate for each so that the old fish version sees the change. But the old fish isn't going to update the new uvar file. So how do we merge updates to the old uvar format into the new uvar format? It seems to me this is inherently a one way change. If you run a fish with this proposed change any uvars it modifies won't be visible to the old fish. Similarly, any uvar changes made by the old fish won't be visible to the new fish after the new uvar fishd2 file is created. |
Of course we're not going to have perfect compatibility (well, it would be possible to have a couple of fish versions that always synchronize both files and then switch over after a couple of years, but that's a bit much IMHO). But consider the case where someone has problems with the new fish - maybe there's a bug, maybe the distro package is broken, maybe a third-party script does not yet work with it. If we used one file, switching back to the old fish would make it read "corrupt" variables. If we used two, downgrading would mean switching to either the old variable state if we just one-way imported the old file or keep the new variable state if we wrote both. I don't want us to support running multiple fish versions side-by-side, but switching back should be okay. |
What if we just appended version identifiers to the end of the fishd files, always? I guess the idea would be that any particular version of fish might try to read and convert a (recent) previous version's
I've been trying to switch back and forth a bit these last few months, mostly to test things, often older Linux builds - it's really painful. I think we should "support" running fishes side-by-side with other versions of fish. But don't expect them to integrate. 5 year old versions of fish might run side-by-side with fish git master, basically unaffected on most computers today, except that they are likely loading up functions and completions for a totally different version of fish from the future. If we can segregate this, it'd go a long way towards making fish more reliable. What we do with the OS X .app bundles works really nicely. There's still |
Yes, though the idea is that after this there's no need to change it again. Even if it did, it would do so very rarely and we could then again add the code. Since each fish version will only read the files it knows about, there's nothing special to do to protect them from reading a newer file version. Also, just to clarify, $FISH_VERSION is not useful for this as it changes with every single commit when doing a build from fish. Having hundreds of fishd files and every new build import again would be quite annoying.
What more would need to be done to enable that? Old fishes could read their stuff, new fishes would read their stuff (importing the old when needed). UVars wouldn't necessarily be synchronized, but the old fish wouldn't be broken by this. The biggest problem is going to be the user configuration - the other dirs in $fish_function_path et al should of course be set per-fish when you build it (in essence do what NixOS does). |
Actually, that was my suggestion: really use $FISH_VERSION so that 2.3.1 and 2.3.0 for example will not even try to share the content of a joint fishd file back and forth, aside from the 2.3.0 -> 2.3.1 migration. Because it's stuff like that that is likely to break scripts, and actually people don't need two-way stuff during upgrades. The idea is that if you roll back to 2.2.0 right now you'll have the functions that came with 2.2.0 being autoloaded and whatever was |
And 2.3.1-501-gb895a50 vs 2.3.1-500-gXXXXXX? This would mean going full NixOS. |
It'd be important that it matched whatever happened to the scripts we include. I'd mimic Homebrew: for the versioned install directories in Cellar that get symlinked into place in |
In issue #4200 I'm working to replace the flat string representation used internally for fish script arrays with an actual array structure (e.g., std::vector). For fish 3.0 we should consider switching to a different representation when serializing vars into the environment. JSON, XML, and Google protobufs are the obvious choices. Personally I'm not a fan of XML for this as it is overkill for this purpose and code to create and read the XML representation is considerably more complicated than for the other two representations. There are plenty of C++ JSON libraries (just google "c++ json"). The Google protobuf project is here and a short tutorial is here. Protobufs are also overkill for this specific use case but are extremely efficient in both space and time and the flexibility could be extremely useful in the future. Another option is to use a variation of our current serialization format. Specifically, we would need to provide a way to escape \x1D and \x1E characters so that they can be used in var values and unambiguously decide whether they are part of a literal value or have their current magic meaning. It might even be possible to do this in a way that allows fish 3.0 to read vars using the current serialization format with little risk of misinterpretation. |
All of these kinda seem like overkill. However, do note that we already use yaml for our history files. It might be possible to repurpose that. |
I really dislike YAML. And we use our own implementation that has known bugs rather than a high quality library. Also, we can't have literal newlines in the serialized format, which means escaping the newlines in the YAML format, which negates the primary reason to use it. Having slept on this, I'm inclined to use a variant of our current serialization format. The idea is to prefix \x1D and \x1E with \x1B (escape) to remove the special meaning of those symbols. If an escape character appears in a string it is also prefixed with an escape character. Since the probability that anyone has a uvar with consecutive escape characters is very close to zero, this allows us to read vars serialized using the current encoding. Having said that, I'd still love to see us use JSON or protobufs. But that can probably only be justified if we decide to change the history file format. |
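The escape-prefix idea in the comment above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the proposed fish implementation: the function names are hypothetical, only \x1D is modeled as the element separator (the magic meaning of \x1E is not modeled), and the empty list is not distinguished from a list holding one empty string.

```python
ESC, SEP = "\x1b", "\x1d"
SPECIAL = {ESC, "\x1d", "\x1e"}  # characters that get an escape prefix


def escape_value(value):
    """Prefix every special character in one element with ESC."""
    return "".join(ESC + ch if ch in SPECIAL else ch for ch in value)


def encode_list(values):
    """Escape each element, then join with the unescaped separator."""
    return SEP.join(escape_value(v) for v in values)


def decode_list(text):
    """Split on unescaped separators and drop the escape prefixes."""
    values, current, chars = [], [], iter(text)
    for ch in chars:
        if ch == ESC:
            # Take the next character literally; a trailing lone ESC is
            # kept as-is (a choice made for this sketch).
            current.append(next(chars, ESC))
        elif ch == SEP:
            values.append("".join(current))
            current = []
        else:
            current.append(ch)
    values.append("".join(current))
    return values


sample = ["\x1d", "a\x1bb", "\x1e", "plain"]
assert decode_list(encode_list(sample)) == sample
```

A string written by the current format that contains no escape characters decodes the same way under this sketch, which is the compatibility property the comment relies on.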
That's rather clever. Note that we also have #1257 (which asks for the uvar file to be moved out of ~/.config) and #1912 (which asks for it to not be machine-specific - which also requires a renaming). Doing all three at the same time is probably easier - try to get the new file, if not, try to find the old file. If you've found that, deserialize (with the old semantics) and save it (with the new ones) in the new place. |
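The "try the new file, else import the old one" flow described above might look roughly like this in Python. Everything here is a hypothetical sketch: the function `load_with_migration`, the toy `name:value` old format, and the use of JSON as a stand-in for the new format are assumptions made for illustration, and the file names are placeholders.

```python
import json
import tempfile
from pathlib import Path


def load_with_migration(new_file, old_file, decode_old, decode_new, encode_new):
    """Return variables, importing once from the old file if the new one is missing."""
    if new_file.exists():
        return decode_new(new_file.read_text())
    if old_file.exists():
        variables = decode_old(old_file.read_text())  # old semantics
        new_file.parent.mkdir(parents=True, exist_ok=True)
        new_file.write_text(encode_new(variables))  # new semantics, new place
        return variables
    return {}


def toy_old(text):
    """Toy stand-in for the legacy format: one name:value pair per line."""
    return dict(line.split(":", 1) for line in text.splitlines())


with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "fishd.old"          # placeholder for the legacy file
    new_file = Path(tmp) / "data" / "fishd2"    # placeholder for the new location
    old_file.write_text("greeting:hello\n")
    first = load_with_migration(new_file, old_file, toy_old, json.loads, json.dumps)
    migrated = new_file.exists()
    second = load_with_migration(new_file, old_file, toy_old, json.loads, json.dumps)
```

The old file is never modified, so an older fish keeps reading its own state; this matches the one-way import discussed earlier in the thread rather than any two-way synchronization.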
I like JSON for this. I recall noticing that the current ksh 'beta' branch
on AT&T's ast GitHub project can read/print data as JSON and thinking that
was cool.
|
Using JSON for serializing exported and universal vars (if not also for our history) might make it worthwhile to also make that format usable as a first class citizen in fish in a manner like the latest Korn shell version (not to mention Javascript). Probably via a new |
@floam, can you clarify how you see JSON being useful in the context of fish script? I've thought about this some more and it isn't really useful unless we implement compound variables. That is, the ability to nest vars inside vars. Something I don't see happening any time soon. Nor can we use it for communicating directly with the So while I still want to see us switch to some other format for fish 3.0 I don't see a clear argument for JSON over Google protobufs. I'm slightly biased as a former Google software engineer but protobufs are the superior encoding. Note that Google protobufs normally use a binary encoding for storage and RPC but also have a robust, easy to read, text format. And by "easy to read" I mean that it is no harder to read, or write by hand, than JSON. |
Well, I at least know of a couple of tools to handle json on the commandline. The most popular is probably jq, I like jshon. That could be how I'm not sure that is a business we want to get into, but it would be useful without needing compound variables.
So... what are the arguments here? I mean
sounds nice, but what does that mean? Like I said, there's tooling around json, is there something similar for protobufs? Which is easier to implement? Do we need to use a library (which makes it harder to build fish yourself)? What are the available libraries (licensing, ease of use, availability)? Are there performance concerns i.e. with a common variable set size, does this add or reduce overhead compared to the existing format? How about history? |
Protobufs offer a space and time efficient binary representation where performance is critical. Where performance is less important it offers a text representation that is superficially similar to JSON. A protobuf schema allows you to define attributes like the field type, a default value and whether the field can be repeated. This is like our new The primary tooling for protobufs is the open source project from Google. The protobuf license is compatible with fish AFAICT.
Yes, regardless of whether we choose JSON or protobufs we should pick a high quality library. We should not roll our own implementation like we did for the history file pseudo-YAML format. If you google "google protobuf vs json" you'll find numerous articles such as these: http://blog.codeclimate.com/blog/2014/06/05/choose-protocol-buffers/ I looked at jq and jshon and shuddered. If we do implement fish script support for either JSON or protobufs we can do a lot better than either of those. But at the moment I'm only considering these encodings for serializing variables to the environment or uvar storage and the history file format. |
Indeed JSON would be a lot cooler if we had compound variables. But even without them, to me the big advantage of JSON vs protobufs is that we can exchange lists with other tools more easily and it's human readable rather than binary. Also, to me, it makes more sense to not use Probably |
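As a rough illustration of the "exchange lists with other tools" point, here is a Python sketch using the standard `json` module. It does not show how fish would serialize variables; the sample values are made up, and it only demonstrates that a JSON string list carries control characters and newlines without a magic separator.

```python
import json

values = ["abc", "\x1d", "two words", ""]
text = json.dumps(values)
print(text)  # ["abc", "\u001d", "two words", ""]

# The separator character survives as an escaped code point.
assert json.loads(text) == values

# Newlines are escaped too, so one variable still fits on one line.
one_line = json.dumps(["line1\nline2"])
assert "\n" not in one_line
```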
I didn't know this. |
I think that's debatable. There is first-class support for protobufs in C++, Java, Python, Go, Ruby, Node.js and others. See the three articles I linked to and you'll find plenty more. And note that we're not talking about supporting either in fish script. I haven't seen a good use case. Let alone two or three that would justify implementing that support. At this stage I'm focused solely on a single serialization format for replacing our two ad hoc mechanisms for command history and storing vars. |
I just learned that protobuf version 3 (aka proto3) has native support for JSON. I didn't know that because when I left Google four years ago proto3 didn't exist and thus the projects I worked on used proto2. This means that should we decide to implement a |
Also, this discussion thread about proto3 support for JSON: https://news.ycombinator.com/item?id=9666213 |
It's what the FSF calls a "Modified BSD license". Yes, it's compatible with our GPLv2. |
Note that variables which consist of a single element need to be exported as a simple string. However, such variables written to our universal variable file should be encoded as a google protobuf. |
One user's thoughts on the history format is at https://news.ycombinator.com/item?id=15911598 |
Happened upon this on Hacker News: https://news.ycombinator.com/item?id=20732197 |
fish version installed:
~/ref/tools/fish-shell/fish -v
fish, version 2.3.1-492-g3702616
OS/terminal used:
Issue:
Variables do not get set when their value is exactly the character \x1d.
Reproduction steps