Allow defining libs with same name in multiple contexts #10179
Signed-off-by: Javier Chávarri <javier.chavarri@gmail.com>
@rgrinberg I am trying to find a solution for this issue: whenever I define two libraries in the same folder in 2 separate contexts using dune/src/dune_rules/ml_sources.ml Line 66 in 91f6b39
(see eif-library-name-collision-same-folder.t for an example). The reason is that when Dune creates the dune/src/dune_rules/dir_contents.ml Line 272 in 91f6b39
Afaiu, all library DBs are created early in the build process, at a time when we don't yet know which context the DB will be queried from. I know we previously discussed having some new "library id" that includes the build path etc., but I think that approach would be useless, because at the time where One solution could be to defer the DB creation (
It doesn't take into account any context information because the scopes are already per context. Also, the library database is only needed to resolve the virtual libraries of implementations. So it can be used to fetch all the libraries in the entire scope, rather than the directory.
I think you've mentioned this a couple of times, but it's incorrect. Every library database belongs to a particular context. It might help if you explain why you think they're context independent, so that we could point out the source of the confusion.
Let me clarify the approach a bit. Your modifications to the library database are fine. For Ml_sources, you can do the following:
Thinking about this issue a little more, I'm actually not sure if we support two executables in the same directory either. E.g.:
I would imagine that we'd get the same duplicate executable error. I think it would be helpful to solve the issue for executables and melange.emit first. It should be a bit simpler, and it would demonstrate what would be necessary to make this work.
Yes, they are created by context, but what I mean is that the database doesn't reflect any context-related info that the library might use in
This is the tricky part. The way In the case of having two libraries with the same name in the same folder, I don't see how
That makes sense, thanks for the idea, will work on this now then.
(Note to self) This is the main reason why "cutting off early" (i.e. just avoid adding the rules of a library in
brings the changes from ocaml#10231 Signed-off-by: Javier Chávarri <javier.chavarri@gmail.com>
Error: Library foo is defined twice:
- dune:6
- dune:3
Error: Multiple rules generated for _build/default/foo.cmxs:
@rgrinberg @anmonteiro I'm having some trouble while trying to maintain the behavior in this test case. Writing here my stream of thoughts in case you see some obvious path forward.
The current behavior in the `main` branch is driven by these lines:
Lines 73 to 74 in a18c530
| Redirect (loc1, lib1), Redirect (loc2, lib2) ->
  if Lib_name.equal lib1 lib2 then Ok v1 else Error (loc1, loc2)
In the case of two public libs with names `baz.foo` and `bar.foo`, the code above would run through the error branch, as the private names are equal but the public ones are not. Afaics, there's some logic in `Lib_name` that transforms public names into private ones (by removing the dot) to use as keys of the map, so that's why this collision can be detected.
But in this PR, we have to defer this error until a later point when the current context is available, so the case above just adds the two libraries to the list set as payload of `Redirect` and then carries on. Eventually rules are added for both public libraries and the "Multiple rules" error is triggered.
What I can't figure out is that all the queries for lib information are done against the public libs DB using the public names `baz.foo` and `bar.foo`, so there's no way to detect the duplication after the fact. Maybe we have to track duplicates and keep them stored in the `Project` variant, so that we can error out later on when the current build context is defined?
Lines 139 to 140 in a18c530
| Library (_, { project; visibility = Public p; _ }) ->
  Some (Public_lib.name p, Project project)
Still pretty blocked 😞 There seems to be a database for all libraries, but then a separate DB for public libs. The latter is apparently the one used to resolve public libraries, and it's indexed by the public libs' public names.
So when there are only public libraries that conflict, the collision detection won't work because the name resolution will go through the `Project` branch:
Lines 119 to 121 in 40ea41f
| Some (Project project) ->
  let scope = find_by_project (Fdecl.get t) project in
  Lib.DB.Resolve_result.redirect scope.db (Loc.none, name)
At that point, all the potential duplicates are lost, so the rules creation proceeds and the conflicting cmxs rules are created.
@rgrinberg do you have any pointers? Should we somehow keep 2 indexes in `public_libs`, one keyed by public names and another by private names? Thanks.
I attempted a fix in b970f3c.
The idea is to query the DB using the libraries' private names before adding their rules in `Gen_rules.of_stanza`. Then check the result of this query for `Invalid` values, and proceed accordingly to show the user error.
After taking a brief look at the code, I would have thought:
- we create a DB with all the public libraries in the project from the stanzas
  - this DB has `~parent:(Some installed_libraries)`
- when we create the DB with all the public libraries in the project, we could change the map that we generate:
- Found_or_redirect.t Lib_name.Map.t
+ Found_or_redirect.t list Lib_name.Map.t
Given this change, I think you could perform your check like you did for the private libraries?
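To make the suggested type change concrete, here is a minimal standalone sketch of a name-indexed map whose values are lists, so that a duplicate key accumulates instead of raising an error at DB creation time. The `entry` type, `String_map` and `add` are hypothetical, simplified stand-ins and are not Dune's actual `Found_or_redirect` or `Lib_name.Map` definitions.

```ocaml
(* Hypothetical, simplified stand-ins for Dune's internal types. *)
module String_map = Map.Make (String)

type entry =
  | Found of string (* source directory of the library *)
  | Redirect of string (* name the lookup is redirected to *)

(* Append instead of failing on a duplicate key, so that the decision about
   which definition applies can be taken later, once the context is known. *)
let add name entry map =
  String_map.update name
    (function
      | None -> Some [ entry ]
      | Some entries -> Some (entries @ [ entry ]))
    map

let db =
  String_map.empty
  |> add "foo" (Found "a")
  |> add "foo" (Found "b")
  |> add "bar" (Found "c")

let () =
  String_map.iter
    (fun name entries ->
      Printf.printf "%s: %d definition(s)\n" name (List.length entries))
    db
```

With this shape, the duplicate check moves from insertion time to lookup time, which is where the context information would be available.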
The `Found_or_redirect.t Lib_name.Map.t` type is the one used to create the "private" database; the public database type is `redirect_to Lib_name.Map.t`.
I tried changing the former type as suggested, the result can be seen in jchavarri#11 (draft PR). This change doesn't affect behavior, but I think it makes the pattern matching a bit more noisy.
The main problem, as far as I understand, is that the public DB is indexed by public lib names, but the name collisions happen when the private names are used (i.e. in the private DB). Because public lib installation doesn't query the private DB but the public one, then the conflicts are never detected. That's why I had to add some code that tries to resolve the library private names:
dune/src/dune_rules/gen_rules.ml
Lines 115 to 118 in 531e4d3
(* This check surfaces conflicts between private names of public libraries,
   without it the user might get duplicated rules errors for cmxs
   when the libraries are defined in the same folder and have the same private name *)
let* res = Lib.DB.find_invalid db (Library.private_name lib) in
I see. The code in `Gen_rules` doesn't match my intuition about this change, but I'm not intimately familiar with `Lib` and `Scope` apart from my cross-compilation changes.
Since you have a significant amount of context about this now, could you summarize the invariants/requirements we're trying to achieve? I'll take a stab at an incomplete list:
- requirement: resolving a library by its private name should return all libraries matching that name
  - we filter them based on the context they belong to, expect to find only 1, error if that's not the case
  - invariant: resolving a private library returns 1 library per context, or errors out if `libs_in_current_context > 1`
- requirement: resolving a public library by its public name should return all libraries matching that name
  - however, for public libraries, there could exist public libraries `a` and `b` with a common private name `priv`, e.g. `(library (name priv) (public_name a)) (library (name priv) (public_name b))`
    - In this case, we must go look for the private library name to check whether there's another conflicting library (which could be either public or private -- we might need a test with a public library that has the same private name as a private lib?)
  - invariant: the transitive closure of libraries for a module group must not contain multiple public libraries with the same private name
I tried to summarize my understanding of the current requirements to help me review the PR better. Did I get anything wrong, or is there something I missed?
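The first requirement in the list above (return every candidate, filter by context, expect exactly one) can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: the `lib` record, the `enabled_in` field (a plain list of context names standing in for an evaluated `enabled_if`), and `resolve` are all hypothetical and do not mirror Dune's `Lib.DB` API.

```ocaml
(* Hypothetical model: each definition lists the contexts in which its
   enabled_if condition would evaluate to true. *)
type lib =
  { name : string
  ; dir : string
  ; enabled_in : string list
  }

type resolved =
  | Not_found_in_context
  | Resolved of lib
  | Conflict of lib list

(* Keep only the candidates enabled in [context]; exactly one must remain. *)
let resolve ~context candidates =
  match List.filter (fun l -> List.mem context l.enabled_in) candidates with
  | [] -> Not_found_in_context
  | [ l ] -> Resolved l
  | ls -> Conflict ls

(* Three definitions sharing the private name "foo". *)
let candidates =
  [ { name = "foo"; dir = "a"; enabled_in = [ "default" ] }
  ; { name = "foo"; dir = "b"; enabled_in = [ "alt" ] }
  ; { name = "foo"; dir = "c"; enabled_in = [ "alt" ] }
  ]

let describe = function
  | Not_found_in_context -> "not found"
  | Resolved l -> Printf.sprintf "resolved to %s/%s" l.dir l.name
  | Conflict ls -> Printf.sprintf "conflict between %d libraries" (List.length ls)

let () =
  List.iter
    (fun context ->
      Printf.printf "%s: %s\n" context (describe (resolve ~context candidates)))
    [ "default"; "alt"; "other" ]
```

The point of the sketch is that the same candidate list gives a different answer per context, which is why the duplicate check cannot run before the context is known.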
To be honest, I don't think I understand everything that is going on to the level that I can explain the invariants in too much detail. At the (very) basic level, I could say that:
- before this PR, there was an invariant (implemented in the two DB creation functions) that prevented having two libraries sharing either private names or public names.
- after this PR, we need to relax this invariant to allow two or more libraries with the same private name or public name, because they might exist in different contexts, and at library DB creation time we can't resolve the `enabled_if` conditions as we don't have the expander available yet.
Maybe one important piece of information (that I recently understood) is that the "redirects" in the private DB are redirecting from the private names to the public names, which is a bit counterintuitive. I would have expected redirects to go from public names to private ones, because (I assume) all libs have a private name, but not all have a public name. But I assume there are good reasons for that.
This means that the private DB can contain `Found` values with either public names or private names. This can be seen in the usage of `Library.best_name conf` when `Found` values are added into it:
Line 69 in 171c231
Library.best_name conf, Found_or_redirect.found info)
As an example of the above, if we have a public library with private name `foo` and public name `bar.foo`, there will be two entries in the private DB map:
- a `Redirect` value, indexed with key `foo`
- a `Found` value, indexed with key `bar.foo`
In cases where the conflict exists on the private names of public libraries (like the one tracked in `lib-collision-public-same-folder.t`), the conflict is detected if we ever have to resolve the private names. But for installing public libs, that's never the case: we will just look for a library called `bar.foo`, get a `Found` value back, and then continue (until we get an error because both libs are trying to install cmxs under the same folder, because for that installation the private names are used).
> The code in `Gen_rules` doesn't match my intuition about this change
Maybe the call to `Lib.DB.find_invalid` could be added somewhere else; it doesn't necessarily need to be in `Gen_rules`, I just didn't know where to place it.
I just realized I was missing a few cases from the tests, mostly mixing public and private libs that collide. These cases are not covered by the current code. So while I thought the PR was ready to review, I am currently looking into what should be modified to cover those cases as well, and I don't know how much code will change.
1b662f8 to 583255c
This is ready to review. The case with a mix of public and private library is now covered.
The error messages can be improved, but at least the behavior is there already. I included some inline comments for extra context. Thanks.
if_available_buildable
  ~loc:lib.buildable.loc
  (fun () -> Lib_rules.rules lib ~sctx ~dir ~scope ~dir_contents ~expander)
  (enabled_in_context && available))
As I understand it, "available" really means "exists". For nonexistent libraries, `enabled_in_context` might surprisingly return `true`, so we have to keep both conditions in the check.
| Redirect of db * (Loc.t * Lib_name.t)
| Deprecated_library_name of (Loc.t * Lib_name.t)
Adding a new variant to distinguish between a regular redirect (public libs) and deprecated libs. Treating them both the same way leads to all the tests in `test/blackbox-tests/test-cases/deprecated-library-name/features.t` failing because of duplicated errors. See related PR #10231 (those changes were added directly into this PR).
@@ -1084,7 +1098,10 @@ end = struct
   module Input = struct
     type t = Lib_name.t * Path.t Lib_info.t * string option

-    let equal (x, _, _) (y, _, _) = Lib_name.equal x y
+    let equal (lib_name, info, _) (lib_name', info', _) =
+      Lib_name.equal lib_name lib_name' && Lib_info.equal info info'
If we don't properly distinguish between libraries with different info, the paths will be broken (when getting the info for library `foo` in folder `b`, we would get the lib `foo` in folder `a`).
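A toy memoization table shows the failure mode described above: if the key equality compares names only, the cached result for `foo` in folder `a` is returned for `foo` in folder `b`. Everything here (`info`, `memoize`, `describe`) is a hypothetical simplification for illustration and is not Dune's `Memo` API.

```ocaml
(* Hypothetical stand-in for library info: only the source directory. *)
type info = { src_dir : string }

type input = string * info

let equal_by_name ((n, _) : input) ((n', _) : input) = String.equal n n'

let equal_by_name_and_info ((n, i) : input) ((n', i') : input) =
  String.equal n n' && String.equal i.src_dir i'.src_dir

(* Toy memo table: a cached value is reused whenever [equal] says the new
   input matches a previously seen one. *)
let memoize ~equal f =
  let cache = ref [] in
  fun input ->
    match List.find_opt (fun (key, _) -> equal key input) !cache with
    | Some (_, value) -> value
    | None ->
      let value = f input in
      cache := (input, value) :: !cache;
      value

let describe ((name, info) : input) =
  Printf.sprintf "%s in %s" name info.src_dir

let by_name = memoize ~equal:equal_by_name describe
let by_name_and_info = memoize ~equal:equal_by_name_and_info describe

let lib_a : input = "foo", { src_dir = "a" }
let lib_b : input = "foo", { src_dir = "b" }

let () =
  (* Name-only equality: the second call hits the entry cached for folder a. *)
  print_endline (by_name lib_a);
  print_endline (by_name lib_b);
  (* Name and info equality: each folder gets its own cached result. *)
  print_endline (by_name_and_info lib_a);
  print_endline (by_name_and_info lib_b)
```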
Memo.List.filter_map
  ~f:(function
This is roughly the same treatment for each variant as the one in the "non-multiple" case, but we filter out disabled libs in the `Found` branch.
Loc.equal t.loc loc
&& Lib_name.equal t.name name
&& Lib_kind.equal t.kind kind
&& path_equal src_dir t.src_dir
&& Option.equal path_equal orig_src_dir t.orig_src_dir
&& Obj_dir.equal obj_dir t.obj_dir
This is not as exhaustive as the check it had originally (see #9964), but it seems to get the job done, for the purposes of memoization.
From my brief look at the PR, I would have expected the following changes: The redirect constructor is now only active if its corresponding
Crucially, we only follow the redirect if we resolve the
And to summarize the main uses of this API:
The API above is meant to be simple to explain, but I'm sure it can be tweaked so that minimal downstream code changes are needed.
I tried implementing Rudi's proposal in jchavarri#12, but there's no There are changes in the cram test snapshots that lead me to believe we're doing less evaluation (fewer errors in some tests), but I'm not sure if I got the entire implementation right.
Continues in #10307.
Continuation of #10220. Supersedes #9839.
While #10220 fixed name collisions in exes and emits declared in separate contexts through `enabled_if`, this PR implements the same improvement for the `library` stanzas.
Changes in invariants
- before this PR, there was an invariant (implemented in the two DB creation functions in `Scope`) that prevented having two libraries sharing either private names or public names.
- after this PR, we need to relax this invariant to allow two or more libraries with the same private name or public name, because they might exist in different contexts, and at library DB creation time we can't resolve the `enabled_if` conditions as we don't have the expander available yet.
Implementation
- `Scope`. While before the duplication checking logic was performed over there, and the functions to resolve libraries from their names would return a single value, now they can return multiple ones. This is done through new variants `Many` and `Multiple_results`.
- `enabled_if` field. Now we check for it and only create the rules if the library is actually enabled.