-
Notifications
You must be signed in to change notification settings - Fork 441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved layer grouping #1326
Improved layer grouping #1326
Conversation
The main purpose of this change is to make layer grouping more robust. Instead of using a heuristic based on whether a library is in the same `organization` as the build (which is not always what you want, some builds may have sub projects with different organizations, and some builds may depend on external dependencies from the same organization), it checks whether the library in question is present in `projectDependencyArtifacts` (ie, whether its a dependency built by this build), if it is, then it goes in a higher layer. Doing this change however required changing `dockerLayerGrouping` to be a `TaskKey` instead of a `SettingKey`, which is a non binary compatible change. So, `dockerLayerGrouping` has been deprecated, and a new `TaskKey`, `dockerGroupLayers` has been introduced. Since a new task has been introduced, I also took advantage of that to improve the types, instead of just passing the destination path, which requires you to know how destination paths are built to check if a particular file should be in there, it passes the whole mapping, ie, the source file and the destination path, which means that you can match against the source file (which is what `projectDependencyArtifacts` gives you) as well. Also, I changed the type from a function that returns an `Option` to a `PartialFunction`, this makes composing it, to introduce new mappings, far simpler, as `PartialFunction[_, _]` composes trivially compared to `Function[_, Option[_]]`.
ac253ae
to
d47d4fb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jroper for improving this feature 🤗
Only a small note on documentation and then this is ready to merge 😄
@@ -1 +1 @@ | |||
dockerLayerGrouping in Docker := (_ => None) | |||
dockerGroupLayers in Docker := PartialFunction.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to reflect this in the docs as well:
https://github.com/sbt/sbt-native-packager/blame/master/src/sphinx/formats/docker.rst#L147-L154
@ppiotrow any remarks from your side? |
val oldPartialFunction = Function.unlift((tuple: (File, String)) => oldFunction(tuple._2)) | ||
|
||
val libDir = dockerBaseDirectory + "/lib/" | ||
val binDir = dockerBaseDirectory + "/bin/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you also add "/conf/" to the same layer? I noticed that /conf/application.ini
gets created when javaOptions
are used.
javaOptions in Universal := Seq(
"-Djava.net.preferIPv4Stack=true",
"-Dlogback.configurationFile=logback.packaged.xml"
),
val binDir = dockerBaseDirectory + "/bin/" | ||
|
||
oldPartialFunction.orElse { | ||
case (file, _) if artifacts(file) => 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for me artifacts.contains
would be more explicit.
val dockerLayerGrouping = settingKey[String => Option[Int]]( | ||
"Group files by path into in layers to increase docker cache hits. " + | ||
"Lower index means the file would be a part of an earlier layer." | ||
) | ||
val dockerGroupLayers = taskKey[PartialFunction[(File, String), Int]]( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very convinced to expose File as this function argument.
- Not sure how plugin users will do matching on file, but most likely they end up comparing location/path
- It will simplify chmod related code that will give up find.getOrElse and /dev/null and become simply layerToPath.lift(...)
- To make it work, just simple change is necessary: sth like
.map(_.data.path)
in body of dockerGroupLayers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how plugin users will do matching on file, but most likely they end up comparing location/path
I think there are use cases for either. With the path, the user has to know exactly the structure of the Docker image, including what that base directory is. The file I think makes more sense because it's the file that's changing, not the path. I know that this source file changes frequently, so every mapping that includes this source file, I want to put it in a higher layer. That's more direct and robust then having to work out what the destination path for that file is, and then matching based on that.
It will simplify chmod related code that will give up find.getOrElse and /dev/null and become simply layerToPath.lift(...)
Yes, but that code isn't that complex, and I think the advantages outweigh the minuses here.
To make it work, just simple change is necessary: sth like .map(_.data.path) in body of dockerGroupLayers
No, that's the point, that doesn't work, and that's why the File
is being passed. If I want to match the slf4j api jar, the path for that is /home/jroper/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.7.30.jar
, meanwhile, the path that gets passed into the function is opt/docker/lib/org.slf4j.slf4j-api-1.7.30.jar
. They are completely different paths, even the file name doesn't match. So, in order to match based on the path, I need to know the completely undocumented method that sbt native packager uses to construct that file name, I also need to know the completely undocumented method that sbt native packager uses to construct the the path to it. What if sbt native packager ever changes that logic? Then my logic for matching will break. In contrast, this logic will never break, no matter what sbt native packager does:
dockerGroupLayers := {
val slf4jFile = for {
artifact <- externalDependencyClasspath.value
moduleId <- artifact.metadata.get(AttributeKey[ModuleID]("moduleID"))
if moduleId.organization == "org.slf4j" && moduleId.name == "slf4j-api"
} yield artifact.data
dockerGroupLayers.value.orElse {
case (`slf4jFile`, _) => Some(2)
}
}
Perhaps a bit wordy, but if we find it's popular, we could always add a helper function to sbt and then they could use:
dockerGroupLayers := {
val slf4jFile = getClasspathFileFor(externalDependencyClasspath.value, "org.slf4j", "slf4j-api")
dockerGroupLayers.value.orElse {
case (`slf4jFile`, _) => Some(2)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dockerGroupLayers := {
val slf4jFile = for {
artifact <- externalDependencyClasspath.value
moduleId <- artifact.metadata.get(AttributeKey[ModuleID]("moduleID"))
if moduleId.organization == "org.slf4j" && moduleId.name == "slf4j-api"
} yield artifact.data
dockerGroupLayers.value.orElse {
case (`slf4jFile`, _) => Some(2)
}
}
maybe we can add this to the docker.rst
docs as well?
@ppiotrow I'm going to merge this and we address the small hints in another pull request. Do you have time for this 😄 ? If not, that's totally fine. I'm going to add the two notes. |
Sure, I have plans to contribute more. |
The main purpose of this change is to make layer grouping more robust. Instead of using a heuristic based on whether a library is in the same
organization
as the build (which is not always what you want, some builds may have sub projects with different organizations, and some builds may depend on external dependencies from the same organization), it checks whether the library in question is present inprojectDependencyArtifacts
(ie, whether its a dependency built by this build), if it is, then it goes in a higher layer.Doing this change however required changing
dockerLayerGrouping
to be aTaskKey
instead of aSettingKey
, which is a non binary compatible change. So,dockerLayerGrouping
has been deprecated, and a newTaskKey
,dockerGroupLayers
has been introduced.Since a new task has been introduced, I also took advantage of that to improve the types, instead of just passing the destination path, which requires you to know how destination paths are built to check if a particular file should be in there, it passes the whole mapping, ie, the source file and the destination path, which means that you can match against the source file (which is what
projectDependencyArtifacts
gives you) as well. Also, I changed the type from a function that returns anOption
to aPartialFunction
, this makes composing it, to introduce new mappings, far simpler, asPartialFunction[_, _]
composes trivially compared toFunction[_, Option[_]]
.