Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune unused dependencies from manifests generated from spago.dhall files #667

Merged
merged 7 commits into from
Nov 6, 2023

Conversation

thomashoneyman
Copy link
Member

Fixes #662. The registry API will now detect unused dependencies in manifests generated from spago.dhall files. Those dependencies will be removed from the manifest before it is written to disk and we kick off the publishing process.

The general problem is that spago.dhall files frequently contain test dependencies and we have no way to distinguish a test from a non-test dependency from the config file alone. The main reason why these dependencies are mixed together is that spago.dhall files have historically been internal-only: the package set repository contained the "correct" list of source-only dependencies (though this is frequently incorrect, too). When we import a legacy package and it contains a spago.dhall file then we just take its dependencies at face value, but the result is over-constrained package dependencies as seen in #662.

This PR prunes unused dependencies from the manifests generated from legacy packages. Specifically, we prune unused dependencies any time we use the fetchLegacyManifest function, because that creates a manifest from the aggregate of bower.json, spago.dhall, and package-sets files.

The pruning process is as follows. As soon as we have the project source code and its generated manifest, we:

  1. Determine all of the project's dependencies (including transitive) by running the solver on the generated manifest.
  2. Download all dependencies so we have access to their source code
  3. Provide the dependency directory and source globs to purs graph (only usable from 0.13.8 onward, FYI) to determine, for any given module name, its path and the modules it directly depends on.
  4. Associate each module in the graph with the package it belongs to by parsing the path segment — we can do this because packages are installed in a standard form, ie. <tmp>/<package-name>-<version>/...
  5. Extract all modules that belong to the package source code (ie. those that have a path which begins with the package source directory) from the graph
  6. For each module in the package source code, determine its full transitive module dependencies, and merge all of them together into a set of module names reachable from the package source. Then, look up the package names these modules belong to, and merge that together to produce the set of package names containing a module reachable by the project source.

The result is a Set PackageName representing packages that are actually in use. We can then walk through the generated manifest checking each dependency to see if it is in the "used packages" set; if not, then it is removed from the manifest.

Finally, we write that updated manifest to disk and proceed with publishing the package. Publishing the package involves re-solving the manifest, downloading dependencies, and compiling, so we can be sure that the adjusted package indeed works. (This doesn't affect packages with a purs.json or spago.yaml file — they are solved only once, since we don't have to prune unused dependencies from them).

Internal Notes

  • I have included a PursGraph module that Spago can reuse if it wants, as well as a checkUnusedDependencies validation function, but I don't see any need to couple the code bases together over the use of purs graph as a utility. Spago has a more varied set of needs than the registry does. I would prefer not to try and share something like checkImports between the code bases.
  • I've added a number of tests for each function of any complexity that I added in this PR
  • I have updated the end-to-end tests to include an unnecessary dependency on type-equality in the effect@4.0.0 fixture and added a step to verify that the packaged tarball contains the correct, adjusted manifest.

@@ -16,6 +16,7 @@
"package.json"
],
"dependencies": {
"purescript-prelude": "^6.0.0"
"purescript-prelude": "^6.0.0",
"purescript-type-equality": "^4.0.0"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an unused dependency; since we solve for it, though, we need to add it to the local registry, registry-index, and storage fixtures (hence the other added files).

Comment on lines +407 to +410
Right string -> Env.askResourceEnv >>= \{ dhallTypes } -> Run.liftAff (jsonToDhallManifest dhallTypes string) >>= case _ of
Left error -> do
Log.error $ "Manifest does not typecheck: " <> error
Except.throw $ "Found a valid purs.json file in the package source, but it does not typecheck."
Copy link
Member Author

@thomashoneyman thomashoneyman Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used to always write a purs.json file, and then read it back and verify the dhall types. Now, we only verify the dhall types when a purs.json file already exists. That's because we already have tests in place to ensure that our manifest format complies with the dhall types, and in all cases except this one we're using our manifest codec to write the file.

Otherwise, this is the same as how we've always read the purs.json file out of the package source code.

Log.debug $ "Read a valid purs.json manifest from the package source:\n" <> stringifyJson Manifest.codec manifest
pure manifest

else if hasSpagoYaml then do
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is unchanged other than returning rather than writing the manifest.

Log.debug "Successfully converted a spago.yaml into a purs.json manifest"
pure manifest
else do
Comment.comment $ "Package source does not have a purs.json file. Creating one from your bower.json and/or spago.dhall files..."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is unchanged other than returning rather than writing the manifest. That allows us to verify and modify it before writing it later on.

Comment on lines +535 to +536
-- Now that we've verified the package we can write the manifest to the source
-- directory and then publish it.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the divergence between the old and new implementations begins.

let depGlobs = Path.concat [ tmpDepsDir, "*", "src", "**", "*.purs" ]
let command = Purs.Graph { globs: [ srcGlobs, depGlobs ] }
-- We need to use the minimum compiler version that supports 'purs graph'
let minGraphCompiler = unsafeFromRight (Version.parse "0.13.8")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

purs graph was introduced in 0.13.8, though that implementation had a bug in which it listed all transitive dependencies instead of direct dependencies only. In our case that doesn't matter: we want the transitive dependencies anyway.

We need to do our best to associate the call to 'purs graph' with a compiler version that can parse its source, so a package on 0.14.1 should use a purs graph call from the 0.14.0 series.

At this moment I'm allowing failed purs graph calls to just let the package through anyway if it's via the legacy importer since we can't accurately determine what compilers the source code might be valid with. However, when we implement #255 then we will be able to choose a correct compiler and can remove that escape hatch.

Copy link
Member

@f-f f-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A small code-location note, but it's otherwise good to go

Comment on lines +606 to +635
let
-- We need access to a graph that _doesn't_ include the package
-- source, because we only care about dependencies of the package.
noSrcGraph = Map.filter (isNothing <<< String.stripPrefix (String.Pattern packageDirectory) <<< _.path) graph

pathParser = map _.name <<< parseInstalledModulePath <<< { prefix: tmpDepsDir, path: _ }

case PursGraph.associateModules pathParser noSrcGraph of
Left errs ->
Except.throw $ String.joinWith "\n"
[ "Failed to associate modules with packages while finding unused dependencies:"
, flip NonEmptyArray.foldMap1 errs \{ error, module: ModuleName moduleName, path } ->
" - " <> moduleName <> " (" <> path <> "): " <> error <> "\n"
]
Right modulePackageMap -> do
Log.debug "Associated modules with their package names. Finding all modules used in package source..."
-- The modules used in the package source code are any that have
-- a path beginning with the package source directory. We only
-- care about dependents of these modules.
let sourceModules = Map.keys $ Map.filter (isJust <<< String.stripPrefix (String.Pattern packageDirectory) <<< _.path) graph

Log.debug "Found all modules used in package source. Finding all modules used by those modules..."
-- We find all dependencies of each module used in the source,
-- which results in a set containing every module name reachable
-- by the source code.
let allReachableModules = Set.fromFoldable $ Array.fold $ Array.mapMaybe (flip PursGraph.allDependencies graph) $ Set.toUnfoldable sourceModules

-- Then we can associate each reachable module with its package
-- name to get the full set of used package names.
let allUsedPackages = Set.mapMaybe (flip Map.lookup modulePackageMap) allReachableModules
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This chunk could probably sit in the Graph module - it's pretty close to what Spago does for unused packages, and could probably be reused?

Copy link
Member Author

@thomashoneyman thomashoneyman Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add it there because we're relying on specific directory names and path formats to pull out the source vs. non-source code, so the only part that feels "shareable" is determining the reachable modules and used packages (lines 631-635); if you think that's usable by Spago I could move it to the Graph module but I'll fully defer to you on that. If there's no active plan to reuse it then I'll keep it where it is; it's easy to copy the implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Legacy spago.dhall imports overconstrain package dependencies.
2 participants