Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse string context elements properly #7543

Merged
merged 2 commits into from
Jan 11, 2023

Conversation

Ericson2314
Copy link
Member

Prior to this change, we had a bunch of ad-hoc string manipulation code scattered around. This made it hard to figure out what data model for string contexts is.

Now, we still store string contexts most of the time as encoded strings --- I was wary of the performance implications of changing that --- but whenever we parse them we do so only through the
NixStringContextElem::parse method, which handles all cases. This creates a data type that is very similar to DerivedPath but:

  • Represents the funky =<drvpath> case as properly distinct from the others.

  • Only encodes a single output, no wildcards and no set, for the "built" case.

(I would like to deprecate =<path>, after which we are in spitting distance of DerivedPath and could maybe get away with fewer types, but that is another topic for another day.)

Additionally, App now uses DerivedPath not StorePathWithOutputs for its (simplified) context. This is just better as StorePathWithOutputs is the deprecated old type just for the old CLI.

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleaning up :)

Left a bunch of small comments, but LGTM overall

src/libexpr/value/context.cc Show resolved Hide resolved
src/libexpr/value/context.cc Outdated Show resolved Hide resolved
src/libcmd/installables.hh Show resolved Hide resolved
src/libexpr/primops.cc Outdated Show resolved Hide resolved
src/libexpr/primops/context.cc Show resolved Hide resolved
src/libexpr/primops/context.cc Show resolved Hide resolved
Comment on lines +86 to +90
[&](const NixStringContextElem::DrvDeep & d) -> DerivedPath {
/* We want all outputs of the drv */
return DerivedPath::Built {
.drvPath = d.drvPath,
.outputs = {},
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just throw an error here instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps perform hash scanning?

Copy link
Member Author

@Ericson2314 Ericson2314 Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a breaking change (unfortunately) so I would rather save this for a subsequent PR that tries to constraint DrvDeep.

@fricklerhandwerk
Copy link
Contributor

fricklerhandwerk commented Jan 10, 2023

Reviewed in Nix team meeting on 2023-01-09:

  • @Ericson2314: deliberately gave the types cryptic names, because not sure what the code actually does in detail.
  • agreement: this PR is the right thing to do
  • @thufschmitt: we had the discussion about the variant pattern
    • @Ericson2314: it's more verbose in the header but more readable at call sites
  • @roberth: parsing and serialising the type should be called fromString and toString
  • @fricklerhandwerk: is this syntax completely internal?
    • @Ericson2314: we think so
    • @thufschmitt: the next step would be to remove it altogether and only leave the data structures
      • one exception: eval cache, where the string context is stored to disk
        • but that one is transient and we can bump the version on an incompatible serialisation format
  • @thufschmitt: somewhat unrelated: not clear what PathSet & context means in each different case
    • @Ericson2314: yes, but we should address this in a different PR

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-01-09-nix-team-meeting-minutes-22/24577/1

- Add a comment

- Put `OutputsSpec` in a different header (First part of NixOS#6815)

- Make a few stray uses of it in new code use `DerivedPath` instead.
Prior to this change, we had a bunch of ad-hoc string manipulation code
scattered around. This made it hard to figure out what data model for
string contexts is.

Now, we still store string contexts most of the time as encoded strings
--- I was wary of the performance implications of changing that --- but
whenever we parse them we do so only through the
`NixStringContextElem::parse` method, which handles all cases. This
creates a data type that is very similar to `DerivedPath` but:

 - Represents the funky `=<drvpath>` case as properly distinct from the
   others.

 - Only encodes a single output, no wildcards and no set, for the
   "built" case.

(I would like to deprecate `=<path>`, after which we are in spitting
distance of `DerivedPath` and could maybe get away with fewer types, but
that is another topic for another day.)
@Ericson2314
Copy link
Member Author

I called it to_string matching the convention elsewhere, but I kept parse as parse because that, and not from_string or fromString, is the convention today. I suppose we can deal with that later.

I ended up going with an exception not tests, because this (for now) exposed in the eval cache, and therefore we should have nice errors when reading a corrupted eval cache. Also, as a matter of practicality, I am not sure how to write failing parse tests when errors are caught with assert.

I think this should be ready now!

@roberth
Copy link
Member

roberth commented Jan 10, 2023

  • @roberth: parsing and serialising the type should be called fromString and toString

I pronounced it in snake case ;), and I'm also ok with parse.

Comment on lines +14 to +70
TEST_F(NixStringContextElemTest, empty_invalid) {
EXPECT_THROW(
NixStringContextElem::parse(store(), ""),
BadNixStringContextElem);
}

TEST_F(NixStringContextElemTest, single_bang_invalid) {
EXPECT_THROW(
NixStringContextElem::parse(store(), "!"),
BadNixStringContextElem);
}

TEST_F(NixStringContextElemTest, double_bang_invalid) {
EXPECT_THROW(
NixStringContextElem::parse(store(), "!!/"),
BadStorePath);
}

TEST_F(NixStringContextElemTest, eq_slash_invalid) {
EXPECT_THROW(
NixStringContextElem::parse(store(), "=/"),
BadStorePath);
}

TEST_F(NixStringContextElemTest, slash_invalid) {
EXPECT_THROW(
NixStringContextElem::parse(store(), "/"),
BadStorePath);
}

TEST_F(NixStringContextElemTest, opaque) {
std::string_view opaque = "/nix/store/g1w7hy3qg1w7hy3qg1w7hy3qg1w7hy3q-x";
auto elem = NixStringContextElem::parse(store(), opaque);
auto * p = std::get_if<NixStringContextElem::Opaque>(&elem);
ASSERT_TRUE(p);
ASSERT_EQ(p->path, store().parseStorePath(opaque));
ASSERT_EQ(elem.to_string(store()), opaque);
}

TEST_F(NixStringContextElemTest, drvDeep) {
std::string_view drvDeep = "=/nix/store/g1w7hy3qg1w7hy3qg1w7hy3qg1w7hy3q-x.drv";
auto elem = NixStringContextElem::parse(store(), drvDeep);
auto * p = std::get_if<NixStringContextElem::DrvDeep>(&elem);
ASSERT_TRUE(p);
ASSERT_EQ(p->drvPath, store().parseStorePath(drvDeep.substr(1)));
ASSERT_EQ(elem.to_string(store()), drvDeep);
}

TEST_F(NixStringContextElemTest, built) {
std::string_view built = "!foo!/nix/store/g1w7hy3qg1w7hy3qg1w7hy3qg1w7hy3q-x.drv";
auto elem = NixStringContextElem::parse(store(), built);
auto * p = std::get_if<NixStringContextElem::Built>(&elem);
ASSERT_TRUE(p);
ASSERT_EQ(p->output, "foo");
ASSERT_EQ(p->drvPath, store().parseStorePath(built.substr(5)));
ASSERT_EQ(elem.to_string(store()), built);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shiny new tests! :)

Copy link
Member

@thufschmitt thufschmitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that looks good now :)

@thufschmitt thufschmitt merged commit a3ba803 into NixOS:master Jan 11, 2023
@Ericson2314 Ericson2314 deleted the typed-string-context branch January 11, 2023 06:47
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Jan 29, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. In reflection of this, the `typedef` is now
moved to the eval cache header, and the function a static function in
the eval cache c++ file.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `eval-cache.hh`: `NixStringContextExact = std::vector<StringContextElem>`
  - `eval-cache.rcc`: `NixStringContext getExactContext(const Value &)`

This makes `NixStringContext` the common thing, and narrows the scope of
the odder and rarely used `NixStringContextExact`.

----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Jan 29, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. In reflection of this, the `typedef` is now
moved to the eval cache header, and the function a static function in
the eval cache c++ file.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `eval-cache.hh`: `NixStringContextExact = std::vector<StringContextElem>`
  - `eval-cache.rcc`: `NixStringContext getExactContext(const Value &)`

This makes `NixStringContext` the common thing, and narrows the scope of
the odder and rarely used `NixStringContextExact`.

----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Jan 31, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. In reflection of this, the `typedef` is now
moved to the eval cache header, and the function a static function in
the eval cache c++ file.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `eval-cache.hh`: `NixStringContextExact = std::vector<StringContextElem>`
  - `eval-cache.rcc`: `NixStringContext getExactContext(const Value &)`

This makes `NixStringContext` the common thing, and narrows the scope of
the odder and rarely used `NixStringContextExact`.

----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Feb 1, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. In reflection of this, the `typedef` is now
moved to the eval cache header, and the function a static function in
the eval cache c++ file.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `eval-cache.hh`: `NixStringContextExact = std::vector<StringContextElem>`
  - `eval-cache.rcc`: `NixStringContext getExactContext(const Value &)`

This makes `NixStringContext` the common thing, and narrows the scope of
the odder and rarely used `NixStringContextExact`.

----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Apr 20, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. In reflection of this, the `typedef` is now
moved to the eval cache header, and the function a static function in
the eval cache c++ file.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `eval-cache.hh`: `NixStringContextExact = std::vector<StringContextElem>`
  - `eval-cache.rcc`: `NixStringContext getExactContext(const Value &)`

This makes `NixStringContext` the common thing, and narrows the scope of
the odder and rarely used `NixStringContextExact`.

----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Apr 20, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. In reflection of this, the `typedef` is now
moved to the eval cache header, and the function a static function in
the eval cache c++ file.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `eval-cache.hh`: `NixStringContextExact = std::vector<StringContextElem>`
  - `eval-cache.rcc`: `NixStringContext getExactContext(const Value &)`

This makes `NixStringContext` the common thing, and narrows the scope of
the odder and rarely used `NixStringContextExact`.

----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.

Co-authored-by: Théophane Hufschmitt <7226587+thufschmitt@users.noreply.github.com>
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Apr 21, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. It can be deleted altogether since the tyes are
now unified and the preexisting `copyContext` function already suffices.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`
  - `value.hh`: `copyContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `value.hh`: `copyContext(...)`
----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.

Co-authored-by: Théophane Hufschmitt <7226587+thufschmitt@users.noreply.github.com>
Ericson2314 added a commit to obsidiansystems/nix that referenced this pull request Apr 21, 2023
Motivation

`PathSet` is not correct because string contexts have other forms
(`Built` and `DrvDeep`) that are not rendered as plain store paths.
Instead of wrongly using `PathSet`, or "stringly typed" using
`StringSet`, use `std::std<StringContextElem>`.

-----

In support of this change, `NixStringContext` is now defined as
`std::std<StringContextElem>` not `std:vector<StringContextElem>`. The
old definition was just used by a `getContext` method which was only
used by the eval cache. It can be deleted altogether since the types are
now unified and the preexisting `copyContext` function already suffices.

Summarizing the previous paragraph:

Old:

  - `value/context.hh`: `NixStringContext = std::vector<StringContextElem>`
  - `value.hh`: `NixStringContext Value::getContext(...)`
  - `value.hh`: `copyContext(...)`

New:

  - `value/context.hh`: `NixStringContext = std::set<StringContextElem>`
  - `value.hh`: `copyContext(...)`
----

The string representation of string context elements no longer contains
the store dir. The diff of `src/libexpr/tests/value/context.cc` should
make clear what the new representation is, so we recommend reviewing
that file first. This was done for two reasons:

Less API churn:

`Value::mkString` and friends did not take a `Store` before. But if
`NixStringContextElem::{parse, to_string}` *do* take a store (as they
did before), then we cannot have the `Value` functions use them (in
order to work with the fully-structured `NixStringContext`) without
adding that argument.

That would have been a lot of churn of threading the store, and this
diff is already large enough, so the easier and less invasive thing to
do was simply make the element `parse` and `to_string` functions not
take the `Store` reference, and the easiest way to do that was to simply
drop the store dir.

Space usage:

Dropping the `/nix/store/` (or similar) from the internal representation
will safe space in the heap of the Nix programming being interpreted. If
the heap contains many strings with non-trivial contexts, the saving
could add up to something significant.

----

The eval cache version is bumped.

The eval cache serialization uses `NixStringContextElem::{parse,
to_string}`, and since those functions are changed per the above, that
means the on-disk representation is also changed.

This is simply done by changing the name of the used for the eval cache
from `eval-cache-v4` to eval-cache-v5`.

----

To avoid some duplication `EvalCache::mkPathString` is added to abstract
over the simple case of turning a store path to a string with just that
string in the context.

Context

This PR picks up where NixOS#7543 left off. That one introduced the fully
structured `NixStringContextElem` data type, but kept `PathSet context`
as an awkward middle ground between internal `char[][]` interpreter heap
string contexts and `NixStringContext` fully parsed string contexts.

The infelicity of `PathSet context` was specifically called out during
Nix team group review, but it was agreeing that fixing it could be left
as future work. This is that future work.

A possible follow-up step would be to get rid of the `char[][]`
evaluator heap representation, too, but it is not yet clear how to do
that. To use `NixStringContextElem` there we would need to get the STL
containers to GC pointers in the GC build, and I am not sure how to do
that.

----

PR NixOS#7543 effectively is writing the inverse of a `mkPathString`,
`mkOutputString`, and one more such function for the `DrvDeep` case. I
would like that PR to have property tests ensuring it is actually the
inverse as expected.

This PR sets things up nicely so that reworking that PR to be in that
more elegant and better tested way is possible.

Co-authored-by: Théophane Hufschmitt <7226587+thufschmitt@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

6 participants