Class dependencies #86
Conversation
The inferred type is not great so let's enforce the right existential explicitly.
Remove implementation of Relations that supports the old incremental compilation algorithm. This will simplify changing relations to be class-based.
Since the introduction of TextAnalysisFormat, most of AnalysisFormat is dead code. Some bits seem to still be used in various parts of sbt, though. Updated TextAnalysisFormatSpecification to not use deprecated methods.
Introduce a new relation declaredClasses in Relations. It tracks the names of classes declared in a given source file. Names of classes are fully qualified names as seen at pickler phase (i.e. before flattening). Objects are represented as the object name with a dollar sign appended to it. The `declaredClasses` relation will be needed to map invalidated classes back to the source files where they are defined. It's worth mentioning that there's already a relation called `classes`. However, that relation tracks names of classes as seen by the Scala compiler backend; these are the names that will later appear in bytecode. For our purposes, we want the names of classes as they are declared in source code. Declared classes are tracked properly for both Scala and Java files. While updating ScalaCompilerForUnitTesting I flipped the nameHashing flag to be true by default. That's in sync with sbt's current default value for that flag.
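As a rough illustration of the naming scheme described above, here is a toy model of the `declaredClasses` relation (this is not Zinc's actual `Relations` API; all names in the sketch are hypothetical). Objects get a trailing `$`, and the relation can be queried in reverse to map an invalidated class back to its source file:

```scala
// Toy model: declaredClasses as a Map from source file path to the set of
// declared class names (pickler-phase, fully qualified; objects end in '$').
object DeclaredClassesSketch {
  type DeclaredClasses = Map[String, Set[String]]

  def declare(rel: DeclaredClasses, src: String, cls: String): DeclaredClasses =
    rel.updated(src, rel.getOrElse(src, Set.empty) + cls)

  def main(args: Array[String]): Unit = {
    var rel: DeclaredClasses = Map.empty
    // A.scala declares class a.b.A and object a.b.O (recorded as "a.b.O$")
    rel = declare(rel, "A.scala", "a.b.A")
    rel = declare(rel, "A.scala", "a.b.O$")
    // reverse lookup: which source file declares a.b.O$?
    val sources = rel.collect { case (src, cs) if cs("a.b.O$") => src }
    println(sources.toList) // List(A.scala)
  }
}
```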
Remove AnalysisTest that tests functionality not supported by name hashing.
We do not support it in class-based dependency tracking. Remove tests that are specific to the old incremental compiler and are failing now.
Everything compiles but tests are failing. This commit should be split into two parts:
- refactoring of Relations, TextAnalysisFormat, etc. that introduces support for relations where the first element of the pair is not a file. At the moment, there are many places where the code assumes that all relations are Relation[File, T].
- the rest of the code
Just get an empty set instead.
The helper method in IncrementalNameHashing was using a wrong relation and had bugs in logic emitting fatal error messages.
Dependencies coming from classes defined locally are not tracked properly and proper accounting of such dependencies will require refactoring of the Dependency phase. We'll need to make an explicit decision where dependencies of locally defined classes should go. They cannot be tracked at the level of a class that introduces the dependency because we track only classes publicly visible from other compilation units (this is left to be defined precisely later on). As for now, we just mark a test in DependencySpecification as pending.
Refinements are represented as classes internally but we want to record dependencies on named classes only. Therefore, we ignore the refinement class and only look at symbols referred from the refinement.
Introduce a dedicated ClassToSourceMapper and use it throughout the incremental compiler. The ClassToSourceMapper takes care of properly mapping classes that have been moved between source files by looking at relations before recompilation and after recompilation.
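The core idea behind the mapper can be sketched as follows (a hedged toy model with hypothetical names, not the real ClassToSourceMapper): resolve a class name against the declared-classes relation from before the run as well as the one from after the run, so a class that moved between files still resolves to both candidates:

```scala
// Toy model of class -> source mapping across a recompilation boundary.
object ClassToSourceMapperSketch {
  type DeclaredClasses = Map[String, Set[String]] // source file -> class names

  private def sourcesOf(rel: DeclaredClasses, cls: String): Set[String] =
    rel.collect { case (src, cs) if cs(cls) => src }.toSet

  // Consult relations from before AND after recompilation so that classes
  // moved between source files are still mapped correctly.
  def mapToSources(before: DeclaredClasses, after: DeclaredClasses, cls: String): Set[String] =
    sourcesOf(before, cls) ++ sourcesOf(after, cls)

  def main(args: Array[String]): Unit = {
    val before = Map("A.scala" -> Set("Foo"))
    val after  = Map("B.scala" -> Set("Foo")) // Foo moved from A.scala to B.scala
    println(mapToSources(before, after, "Foo")) // both files are candidates
  }
}
```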
We used to use scala.Symbol to easily identify sources in a file-system-independent manner. Since we switched to class-based dependency tracking, we can use class names directly.
An object's members are declared in its module class, but references to objects in trees are typechecked with the module symbol assigned to them. We handle references to objects by mapping module symbols to module class symbols.
We would forget to consider declared classes for mapping. This would cause invalid mappings in the case of mixed compilation (of Java and Scala files).
This commit fixes invalidation caused by API changes to an external class (one defined in another project). The problem was that we tried to convert the internal class -> external class dependency relation to src -> src, but we would end up with an empty relation because we can't map an external class to a src (it's defined outside of the current project). We fix this problem by converting the relation to internal src -> external class. This conversion is always possible. We also fix a bug in ClassToSourceMapper where the conversion would lose some relations in case the left side of the relation (a class name) mapped to several different source files. This is possible when a class is moved between source files.
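The conversion and the bug fix can be sketched like this (hypothetical shapes, not the actual Zinc code): when turning class -> external-class dependencies into src -> external-class ones, every source file a class name maps to must be kept, not just one:

```scala
// Toy model of converting internal-class -> external-class dependencies
// into source-file -> external-class, keeping ALL source files a class
// name maps to (the fixed bug was dropping all but one).
object ExternalDepConversionSketch {
  def convert(
      classDeps: Map[String, Set[String]],  // internal class -> external classes
      classToSrcs: String => Set[String]    // internal class -> source files
  ): Map[String, Set[String]] =             // source file -> external classes
    classDeps.toSeq
      .flatMap { case (cls, exts) => classToSrcs(cls).map(src => src -> exts) }
      .groupBy(_._1)
      .map { case (src, pairs) => src -> pairs.flatMap(_._2).toSet }

  def main(args: Array[String]): Unit = {
    val deps = Map("Foo" -> Set("ext.Lib"))
    // "Foo" maps to two source files (it was moved between them)
    val result = convert(deps, _ => Set("A.scala", "B.scala"))
    println(result.keySet.toList.sorted) // List(A.scala, B.scala)
  }
}
```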
The `classes` relation does the same thing as the `declaredClasses` relation except that it uses a different string representation for class names. The `classes` relation uses names as seen after the flatten phase, but `declaredClasses` looks at symbols at the pickler phase. I think the choice doesn't matter as long as the same naming scheme is used consistently everywhere. Let's give flattened class names a try and find out if we can remove `declaredClasses`.
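For reference, here are the two naming schemes side by side. The values below follow the conventional Scala encoding (inner classes joined to their owner with `$` after flattening); this is an illustration, not output produced by Zinc:

```scala
// Declared (pickler-phase) name vs. flattened (post-flatten) name.
object NameSchemesSketch {
  val examples: Seq[(String, String)] = Seq(
    "a.b.Outer"       -> "a.b.Outer",       // top-level class: identical
    "a.b.Outer.Inner" -> "a.b.Outer$Inner", // inner class: flattened with '$'
    "a.b.O$"          -> "a.b.O$"           // object: dollar suffix in both
  )

  def main(args: Array[String]): Unit =
    examples.foreach { case (declared, flat) => println(s"$declared -> $flat") }
}
```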
Modify xsbti.api.Source to carry api hash separately for each top level definition. Modify api extraction phase to compute hashes for each top-level definition separately. This will enable us to invalidate individual top-level classes.
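A minimal sketch of what per-definition hashing buys us (the case classes and field names here are hypothetical, not `xsbti.api.Source`'s actual shape, and MurmurHash3 stands in for whatever hash is actually used): with one hash per top-level definition, a change to one class can be detected without invalidating siblings in the same file:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical model: one API hash per top-level definition.
final case class TopLevelApi(name: String, apiText: String)
final case class SourceApi(definitions: Seq[TopLevelApi]) {
  def hashes: Map[String, Int] =
    definitions.map(d => d.name -> MurmurHash3.stringHash(d.apiText)).toMap
}

object PerClassHashSketch {
  def main(args: Array[String]): Unit = {
    val before = SourceApi(Seq(TopLevelApi("A", "class A { def x: Int }"),
                               TopLevelApi("B", "class B")))
    val after  = SourceApi(Seq(TopLevelApi("A", "class A { def x: String }"),
                               TopLevelApi("B", "class B")))
    // Only definitions whose own hash changed need invalidation.
    val changed = before.hashes.collect {
      case (name, h) if after.hashes.get(name).exists(_ != h) => name
    }
    println(changed.toList) // only A changed, B is untouched
  }
}
```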
This commit implements invalidation of dependencies at the class level both for inheritance and memberRef dependencies. However, the information that determines API changes is still tracked mostly at the source file level. Name hashes are tracked at the source level (classes considered for invalidation are mapped back to source files before the name hash lookup). For that reason, #2319 is still not fixed. API hashes are tracked per class but only for top-level classes. It's not a final design but a quick hack to prove that class-based invalidation is working correctly. It was enough to get the scenario described in #2320 fixed. This is shown in the `class-based-inheritance` scripted test. The final design of API hashing is that all publicly accessible classes will get their own separate hash. The sealed test is marked pending as described in the notes of #1104. Another thing to note is that we're getting close to being able to strip the abstractions that deal with the fact that dependencies were tracked either as Relation[File, File] (for internal dependencies) or Relation[File, String] (for external dependencies). We track all dependencies uniformly as Relation[String, String] now.
We tried to switch to source class names everywhere in the incremental compiler, but that broke the logic that looks up generated class files. As a result, we wouldn't record inner classes properly due to the difference between source and flat class names. I've added a scripted test that verifies proper recording of inner classes.
This is a large commit so I'll describe the high-level changes first and then go through each individual source file. At the center of this commit is the differentiation between binary and source class names. The incremental compiler needs to track both. The primary way of tracking classes and referring to them is through source (canonical, in Java's terminology) class names. Binary names are needed for classpath lookups and for turning binary dependencies into source dependencies. We perform that process each time there's a dependency on a class that was compiled in a different compiler run and is seen from the current compiler run as a binary dependency (a dependency on a class file). We maintain the mapping between source and binary class names in the `Relations.classes` relation. We track in that relation only the names of non-local classes. Non-local classes are the ones that can be referred to from other source files. Top-level, public inner, package private, and protected classes are all examples of non-local classes. Local classes are ones defined within a scope not owned by a class (e.g. in a block or a method), or anonymous classes. We maintain names only for non-local classes because only those classes have canonical names and only those can be dependencies of other classes (hence we might need to perform mapping between binary and source names). Most changes in this commit are driven by changes to AnalysisCallback, the Java interface. I changed indentation from tabs to spaces in that file so the diff is hard to read. Let me explain the most important changes:
- classDependency and binaryDependency methods have clear parameter names that indicate whether they expect binary or source class names
- generatedClass has been split into two: generatedNonLocalClass and generatedLocalClass. The first one takes both forms of the name as parameters, whereas the second variant takes just the source file and class file and records just that. We used to pass some garbage names for local classes before.
Now we make a clear distinction in AnalysisCallback between local and non-local classes and the amount of information that is tracked about each. This change reduces the size of the Analysis object and its serialized form. The generatedNonLocalClass method overlaps with the declaredClass method. I haven't decided what to do about that yet, but I think declaredClasses can be merged into generatedNonLocalClass. The change to `generatedClass` caused slightly different tracking of compilation products (class files). We do not track a binary name for every class file produced; we track it only for class files corresponding to non-local classes. For that reason we split products into products and classes in Analysis and Relations. All changes to Compile.scala (the AnalysisCallback implementation) are about changes to how names and classes are tracked. They are all straightforward changes. Analyzer makes it clearer what kind of names it's tracking and passing to AnalysisCallback. Analyze, the class that extracts dependencies from class files compiled by the Java compiler, has been adapted to the changes in AnalysisCallback and clarifies when binary and source class names are used. I added a test, AnalyzeSpecification, that checks whether class dependencies on and from inner classes are extracted properly. I also added a scripted test that tests the same thing but verifies the information recorded in Analysis. The reason for a duplicated test is that the scripted test checks a whole lot more:
- it verifies the correct behavior of the AnalysisCallback implementation
- it verifies that adding class dependencies is routed properly in Analysis data structures
The scripted test is an integration test, whereas AnalyzeSpecification is a unit test. In order to implement AnalyzeSpecification I had to introduce a class that lets me compile Java sources on the fly: JavaCompilerForUnitTesting.
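The local/non-local distinction the two commits above revolve around can be modeled very roughly like this (a hand-rolled model, not compiler symbols): a class is non-local exactly when every enclosing scope up to the top level is a class/object/trait body rather than a method or block:

```scala
// Toy model of the local vs. non-local class classification.
sealed trait Scope
case object TemplateScope extends Scope // body of a class/object/trait
case object BlockScope extends Scope    // method body, block, anonymous scope

object LocalitySketch {
  // Non-local iff the whole owner chain consists of template scopes.
  def isNonLocal(enclosingScopes: List[Scope]): Boolean =
    enclosingScopes.forall(_ == TemplateScope)

  def main(args: Array[String]): Unit = {
    println(isNonLocal(Nil))                             // top-level class: true
    println(isNonLocal(List(TemplateScope)))             // inner class: true
    println(isNonLocal(List(BlockScope, TemplateScope))) // class in a method: false
  }
}
```

Only classes in the first two categories get binary names recorded, since only they can be referenced from other compilation units.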
The common functionality between java and scala compiler for unit testing that extracts data from TestCallback has been pushed to its companion object. For that to work, I had to add dependency on interfaceProj's test scope from classfileProj in build.sbt.
Introduce a ClassName trait that centralizes the logic for creating both source and binary names for a class symbol. In addition to using it in Analyzer, Dependency, and ExtractDeclaredClasses, we also use it in ExtractAPI. We want every component of the incremental compiler to use a consistent naming scheme. Add ClassNameSpecification that documents the expected behavior of ClassName.
The `binaryClassName` relation maintains the mapping between source and binary class names. This mapping is needed to map binary dependencies back to source dependencies in the case of separate compilation (where we see dependencies on class files). You can see that mapping being used in the `binaryDependency` method implementation of the AnalysisCallback. Previously, we would map a class file to the source file it was produced from and then assume that the dependency was on any (all) of the classes declared in that source file. The introduction of `binaryClassName` lets us map a dependency back to a source class name directly and removes that imprecision of dependency tracking. We maintain the mapping between source and binary class names just for non-local classes. Check sbt/sbt#1104 (comment) for the discussion of local and non-local classes. We also rework the tracking of products in Analysis by explicitly introducing the concept of local and non-local products corresponding to local and non-local classes. This helps clarify for which classes we track source and binary class names.
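What `binaryClassName` buys us can be shown with a small sketch (hypothetical shapes, not Zinc's actual relation types): a dependency on a class *file* resolves to exactly one source class name, instead of conservatively depending on every class declared in the producing source file:

```scala
// Toy model: precise binary -> source class name resolution.
object BinaryToSourceDepSketch {
  // source class name -> binary class name, for non-local classes only
  val binaryClassName: Map[String, String] =
    Map("a.Outer" -> "a.Outer", "a.Outer.Inner" -> "a.Outer$Inner")

  // inverted relation, used when a classpath lookup yields a class file
  val sourceFromBinary: Map[String, String] = binaryClassName.map(_.swap)

  def main(args: Array[String]): Unit =
    // a dependency on a/Outer$Inner.class resolves to one precise class:
    println(sourceFromBinary("a.Outer$Inner")) // a.Outer.Inner
}
```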
Otherwise, scalariform reformats this file on every `compile` invocation.
Implements a strategy for recording dependencies introduced by top-level imports by assigning those dependencies to the first top-level class. In case there are top-level imports but no top-level class/trait/object defined in a compilation unit, a warning is issued. The rationale for this strategy can be found at: sbt/sbt#1104 (comment) Add a unit test covering different cases of top-level imports (e.g. defined in nested packages). Mark the scripted test source-dependencies/import-class as passing after a small modification of adding a top-level class.
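The strategy reduces to a very small decision procedure, sketched here as a simplified model (not the real Dependency phase; names are hypothetical):

```scala
// Toy model: charge top-level-import dependencies to the first top-level
// class; with no top-level class at all, we can only warn.
object TopLevelImportsSketch {
  def ownerForImports(topLevelClasses: List[String]): Either[String, String] =
    topLevelClasses.headOption.toRight(
      "warning: top-level imports but no top-level class to attribute them to")

  def main(args: Array[String]): Unit = {
    println(ownerForImports(List("a.First", "a.Second"))) // Right(a.First)
    println(ownerForImports(Nil))                         // Left(warning: ...)
  }
}
```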
Dependencies introduced by local classes require special handling because we track only non-local classes (those that can be referred to from other files) in dependency relations. To overcome this problem, dependencies introduced by a local class are recorded as introduced by an outer class that is non-local. However, this introduces a new problem with dependencies introduced by inheritance. We don't want local inheritance dependencies to cause a transitive invalidation of all classes that inherit from the outer class containing the local class. Check the comment in Relations.scala that this patch introduces or follow the discussion of this problem at: sbt/sbt#1104 (comment) To capture the subtlety of inheritance dependencies from local classes, we introduce a `LocalDependencyByInheritance` case to the `DependencyContext` enum. TestCallback has been modified to return extracted local inheritance dependencies, and a test in DependencySpecification has been updated accordingly. The Dependency phase has been reworked to handle local classes properly by mapping dependencies to outer, non-local classes. Check the implementation for details of the mapping. It's worth mentioning that the mapping is implemented as an amortized O(1) lookup, so this change doesn't affect the performance of the extraction phase negatively. The invalidation logic has been modified to take into account inheritance dependencies introduced by local classes. The patch is small because the invalidation logic is very straightforward: we invalidate local inheritance dependencies non-transitively and we do not apply name hashing dependency pruning. Lastly, we mark the local-class-inheritance scripted test as passing.
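The invalidation rule can be sketched as follows (the data model is hypothetical; this is not Zinc's invalidation code): regular inheritance dependencies are followed transitively, while `LocalDependencyByInheritance` edges are followed exactly one step, with no name-hash pruning:

```scala
// Toy model of invalidation with local inheritance dependencies.
object LocalInheritanceSketch {
  def invalidate(
      inheritance: Map[String, Set[String]],      // parent -> subclasses (followed transitively)
      localInheritance: Map[String, Set[String]], // parent -> outer classes of local subclasses
      changed: String
  ): Set[String] = {
    def transitive(seen: Set[String]): Set[String] = {
      val next = seen ++ seen.flatMap(c => inheritance.getOrElse(c, Set.empty))
      if (next == seen) seen else transitive(next)
    }
    // local inheritance deps: invalidated non-transitively, no pruning
    transitive(Set(changed)) ++ localInheritance.getOrElse(changed, Set.empty)
  }

  def main(args: Array[String]): Unit = {
    val inh  = Map("A" -> Set("B"), "B" -> Set("C")) // A <- B <- C
    val linh = Map("A" -> Set("Outer"))              // a local class in Outer extends A
    // Outer is invalidated, but nothing that merely inherits from Outer is.
    println(invalidate(inh, linh, "A").toList.sorted) // List(A, B, C, Outer)
  }
}
```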
Force-pushed 67d7a5a to f475de2
I added a small refactoring of the code that performs classpath and Analysis lookups; my motivation was to streamline those bits of code. Is this good to go for you?
LGTM in the limited testing I did; however, I'm still a bit concerned by the performance regressions on scala/scala (sbt/sbt#1104 (comment)). It would be good to know why this happens and whether it happens in other big projects too.
Also the tests seem to be failing, no idea why.
LGTM!
# Conflicts:
#	.gitignore
The test failure was a Travis glitch.
Merged!
Excellent news! I'll work on remaining issues. Just as a convenience, here they are:
// to types but that might be a bad thing because it might expand aliases eagerly which
// is not what we need
case t: TypeTree if t.original != null =>
  t.original.foreach(traverse)
I believe this line (and the one above for macro originals) is a performance regression. Both `foreach` and `traverse` will recurse into the tree. See #187. Either `t.original.foreach(handleTreeNode)` or simply `traverse(t.original)` seems to be the right approach.
FTR, fixed in #193.
Reverted while implementing class-based NameHashing. See: - sbt/zinc@189a2a5 - sbt/zinc#86
Not used by Zinc since 2016 sbt/zinc#86
This PR implements class-based dependency tracking. The basic idea is simple: let's track dependencies at the class level instead of at the source level. Dependencies on inner classes are tracked separately. Also, APIs (used to detect which dependencies have changed) are tracked for each class separately.
The detailed list of what has been done and what's left to be done was tracked in sbt/sbt#1104. I'm pasting it here for convenience.
Tasks
Class-based dependency tracking can be implemented in two phases. The third phase is dedicated to testing on large projects.
Invalidate by inheritance based on class dependencies (phase 1)
Tasks marked as completed are completed only in a prototype implementation. No changes have been merged to sbt yet.
- `declaredClasses` (to immediately map them back to files until the rest of the algorithm is implemented)
- `API` object corresponding to sealed parent; this will enable us to perform proper invalidation when a new case (class) is introduced to a sealed hierarchy
- `binaryClassName: Relation[String, String]` and use it whenever the conversion is needed.

Track name hashes per class (phase 2)
- `declaredClasses` in almost entire algorithm) (would fix #2320)

Only once the last bullet is merged will we see an improvement to incremental compilation when a big number of nested classes is involved (e.g. in the Scala compiler's case).

Testing, bug fixing and benchmarking (phase 3)
- `;clean;compile` performance

Merge changes upstream, prepare for shipping (phase 4)
Benefits
The improved dependency tracking delivers up to 40x speedups of incremental compilation in scenarios tested. Check benchmarking results here: sbt/sbt#1104 (comment)
The speedups are caused by fixing two main issues:
Issue 1 is likely to be triggered by any code base that defines more than one class in a single source file. Issue 2 affects code bases with a large number of nested classes that are inherited.
One example is Scala compiler itself. Even with name hashing, we invalidate too much and code edit cycle becomes long whenever a new member is introduced.
Known issues
Backwards compatibility
This PR affects backwards compatibility in several ways:
- the `Analysis.groupBy` operation has been dropped (it's not clear who needs it)
- the `Analysis` object is very different now because most methods take class names as arguments instead of file names