-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Docker images non-root, by default, and OpenShift compliant #13376
Make Docker images non-root, by default, and OpenShift compliant #13376
Conversation
@michaeljmarshall:Thanks for your contribution. For this PR, do we need to update docs? |
Mailing list thread: https://lists.apache.org/thread/lrt5ycdbms69npblp4j1jp75138wvn62 |
Looks like the test failures are related to containers failing to start. I'll take a closer look in the morning. |
@@ -18,6 +18,8 @@ | |||
# under the License. | |||
# | |||
|
|||
# NOTE: this script is intentionally not executable. It is only meant to be sourced for environment variables. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about removing the execution permission ? in this way you can run source conf/bkenv.sh
but you can't run it using ./conf/bkenv.sh
btw not sure it's worth to do that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conf/bkenv.sh
file does not currently have execute permissions. It is used in a few places with commands like:
source $BK_HOME/conf/bkenv.sh
and
# Check bookkeeper env and load bkenv.sh
if [ -f "$PULSAR_HOME/conf/bkenv.sh" ]
then
. "$PULSAR_HOME/conf/bkenv.sh"
fi
I added this note because it surprised me that a .sh
file was in the conf
directory. If the note is obvious/confusing, I can remove it.
The Tiered FileSystem test failed because the broker couldn't write to the location. Since it is unrealistic for brokers to persist to a local file system, I am going to update the test to write to the
|
Still failing with the same error. Need to dig into the test a bit more. In my opinion, we should keep the docker image as locked down as possible, so I'd prefer to modify the failing test instead of the docker image.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested with michaelmarshall/pulsar:2.10.0-SNAPSHOT
, please check my comment, seems the download dir cannot be created.
I have no name!@d48234a40a05:/pulsar$ mkdir download
mkdir: cannot create directory 'download': Permission denied
docker/pulsar/Dockerfile
Outdated
RUN chmod -R g+w /pulsar/conf | ||
RUN mkdir /pulsar/data && chmod -R g+w /pulsar/data | ||
RUN mkdir /pulsar/logs && chmod -R g+w /pulsar/logs | ||
RUN chmod -R u-w /pulsar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the /pulsar
do not have write permission anymore, so it will cause functions failed to download with default functions_worker configs. Since downloadDirectory: download/pulsar_functions
and the /pulsar/download/pulsar_functions
cannot be created by pulsar
user.
We should also check other folder permissions for functions as well, like narExtractionDirectory
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing this @freeznet. I hadn't come across that error yet. We could create the /pulsar/download/pulsar_functions
directory as part of this docker build so that we can ensure the build has the right permissions. That solution won't help anyone who has overridden narExtractionDirectory
, though.
I would like to avoid giving the user unnecessary permissions. The permission to write to the /pulsar
directory seems like more permission than the user needs.
@timmyyuan Please help to review this PR when you have time, thanks. |
I removed the |
@freeznet - I discovered that our default hdfs configuration also requires arbitrary writing to the pulsar/conf/filesystem_offload_core_site.xml Lines 27 to 30 in f7abada
I don't think this is a very good default, but I don't think we can change this default without breaking current implementation. With the latest commit, I gave the root group write permission to |
@congbobo184 - is it possible to update the default pulsar/conf/filesystem_offload_core_site.xml Lines 27 to 30 in f7abada
|
The Pulsar SQL tests have been failing because presto writes some logs to |
New docker image built from the most recent commit: Note that I ran the dive tool (https://github.com/wagoodman/dive) against the RUN apt-get update \
&& apt-get -y dist-upgrade \
&& apt-get -y install openjdk-11-jdk-headless netcat dnsutils less procps iputils-ping \
python3 python3-dev python3-setuptools python3-yaml python3-kazoo \
libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev \
curl \
&& apt-get -y --purge autoremove \
&& apt-get autoclean \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but I do have some more questions about non-root image, please check my comments, thanks.
USER root | ||
|
||
# We need to define the user in order for supervisord to work correctly | ||
# We don't need a user defined in the public docker image, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please show more context about why we don't need a user defined for the public image? Since the test image here is from pulsar image, and they both using UID 10000 here, why not add the user in the public image?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UID needs to be a defined user in order to work with supervisord. Before I defined the user, the tests failed with this error:
Error: Invalid user id 10000 in section 'program:bookie' (file: '/etc/supervisord/conf.d/bookie.conf')
The public docker image does not need the user defined because we don't use supervisord to run the different pulsar processes. Further, when the docker image is used in OpenShift, it will receive a random UID that is guaranteed to be a member of the root group. My design here is similar. I don't think we should name the pulsar user because we don't want any logic tied to the UID. For reference, all bitnami docker images behave this way. They are well known for making non-root docker images. Here is one of their blogs that I consulted while preparing this change: https://engineering.bitnami.com/articles/running-non-root-containers-on-openshift.html.
docker/pulsar/Dockerfile
Outdated
|
||
# The UID must be non-zero. Otherwise, it is arbitrary. No logic should rely on its specific value. | ||
USER 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the best practices to debug the system when the container is in non-root mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debugging can be challenging, especially because a non-root user does not have the privilege to download packages. The bitnami blog I referenced has some documentation on their experience debugging non root containers: https://engineering.bitnami.com/articles/running-non-root-containers-on-openshift.html.
Depending on your environment, there are several concrete options. In kubernetes, you can set the pod's securityContext
to runAsUser: 0
, if you really need to run it as root. If you have access to the docker container's host, you should be able to exec
into the container as the root user. We could also produce a "debug" image that contains additional debug tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add some documentation about this, too. My initial focus for this PR was to get the technical part done first. Once we have solidified the design, I'll add docs.
after committing this patch we should take into consideration this #10815 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
The conflict was only on white space. No code/configuration was changed. |
@codelipenghui - I haven't yet been able to test the function upgrade scenario that @sijie requested. It'd be great to get this included in branch-2.10, but I want to be clear about the status of testing before this gets merged. |
@michaeljmarshall Ok, I see. Thanks for the update. I will also ask @freeznet for testing this case if he has time to confirm it. Thanks again for the great work! |
@michaeljmarshall this brings me a one more question, do we need to add |
@freeznet - yes, it'll be important to set the |
@michaeljmarshall thanks for the reply, so will this PR add the |
@freeznet - no. The |
@michaeljmarshall but for Pulsar Functions Kubernetes Runtime, it will creates the statefulset for each function, and the |
@freeznet - why does the function's pod need a |
@codelipenghui @freeznet @michaeljmarshall Can we merge this PR? |
@freeznet Are you planning to do some testing before this is merged? Please test this asap. Thank you /cc @codelipenghui |
@lhotari yes I am working on it, and should be done in this week. |
fda6ae7
@freeznet I noticed your mailing list post that you have completed verification. Thanks for verifying. Ready to merge? |
) Master Issue: #11269 ### Motivation In order to increase the overall security of our Pulsar docker images, they should default to run as the non-root user. While updating these permissions, I make sure to comply with the OpenShift spec so the docker image can run on that platform out of the box. Once we finalize these changes, we will need to update the Apache Pulsar Helm chart to make sure that deployments take advantage of this feature. We'll use the `fsGroup` to make sure that k8s sets the appropriate file system permissions for the zookeeper, bookkeeper, and function pods. ### Modifications * Default to run as UID 10000. As noted in the `Dockerfile`, this UID is arbitrary. No logic should rely on this id. * Update filesystem permissions so that the group user has sufficient write permission. The group user is 0 (root). * Remove unnecessary write access. * The `/pulsar/{conf,data,logs}` directories and their members must be writable by the root group. I don't know of any other directories that need to be written to. Note that the `bin/pulsar-admin` too creates a log file in the `/pulsar/logs` directory. Please let me know if there are any additional * Note also that the executable file permissions are already set in our git repo. Those permissions are inherited by the docker image when we run the `COPY` directive in the `Dockerfile`. * There are no changes to the function worker in the k8s runtime. We do not need them because we already merged 04b5da0. * Add note to `conf/bkenv.sh`, as it is a `.sh` script that is not executable (and doesn't need to be). * Update test docker image and `supervisord` configuration. Note: it's unclear to me how the OpenShift spec handles restarts. I know that the UID is arbitrary. It's possible that the umask needs to be switched from `022` to `002`. Setting the umask in the docker image does not persist for consumers of the image, so this would need to be set in a helm chart. ### Verifying this change You can access a test image built with these changes here: `michaelmarshall/pulsar:2.10.0-SNAPSHOT`. I have already run some manual tests like `bin/pulsar standalone` in the container. I still need to deploy an actual cluster to verify that all of the unique components work correctly. Because we already merged 04b5da0, the upgrade scenarios are already simplified. If this change is in 2.10.0, that means 2.8 and 2.9 will be compatible for certain function worker upgrade scenarios. I wrote test criteria in #11269. I'll need to follow up on that criteria using my newly build image. I should be able to look closer at this tomorrow. We'll also need tests to pass, as I modified some tests with this PR. ### References The following links were useful in understanding how to make these changes: * https://engineering.bitnami.com/articles/running-non-root-containers-on-openshift.html * https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids ### Does this pull request potentially affect one of the following parts: This PR updates our Docker images in a breaking way. It could result in bookkeepers, zookeepers, or functions with insufficient permissions. We will mitigate these permissions by updating the helm chart. These changes are easily overridden by extending the docker image. In k8s, you can use the pod's `securityContext` to override the user or group. (cherry picked from commit f7f8619)
…che#13376) Master Issue: apache#11269 ### Motivation In order to increase the overall security of our Pulsar docker images, they should default to run as the non-root user. While updating these permissions, I make sure to comply with the OpenShift spec so the docker image can run on that platform out of the box. Once we finalize these changes, we will need to update the Apache Pulsar Helm chart to make sure that deployments take advantage of this feature. We'll use the `fsGroup` to make sure that k8s sets the appropriate file system permissions for the zookeeper, bookkeeper, and function pods. ### Modifications * Default to run as UID 10000. As noted in the `Dockerfile`, this UID is arbitrary. No logic should rely on this id. * Update filesystem permissions so that the group user has sufficient write permission. The group user is 0 (root). * Remove unnecessary write access. * The `/pulsar/{conf,data,logs}` directories and their members must be writable by the root group. I don't know of any other directories that need to be written to. Note that the `bin/pulsar-admin` too creates a log file in the `/pulsar/logs` directory. Please let me know if there are any additional * Note also that the executable file permissions are already set in our git repo. Those permissions are inherited by the docker image when we run the `COPY` directive in the `Dockerfile`. * There are no changes to the function worker in the k8s runtime. We do not need them because we already merged apache@04b5da0. * Add note to `conf/bkenv.sh`, as it is a `.sh` script that is not executable (and doesn't need to be). * Update test docker image and `supervisord` configuration. Note: it's unclear to me how the OpenShift spec handles restarts. I know that the UID is arbitrary. It's possible that the umask needs to be switched from `022` to `002`. Setting the umask in the docker image does not persist for consumers of the image, so this would need to be set in a helm chart. ### Verifying this change You can access a test image built with these changes here: `michaelmarshall/pulsar:2.10.0-SNAPSHOT`. I have already run some manual tests like `bin/pulsar standalone` in the container. I still need to deploy an actual cluster to verify that all of the unique components work correctly. Because we already merged apache@04b5da0, the upgrade scenarios are already simplified. If this change is in 2.10.0, that means 2.8 and 2.9 will be compatible for certain function worker upgrade scenarios. I wrote test criteria in apache#11269. I'll need to follow up on that criteria using my newly build image. I should be able to look closer at this tomorrow. We'll also need tests to pass, as I modified some tests with this PR. ### References The following links were useful in understanding how to make these changes: * https://engineering.bitnami.com/articles/running-non-root-containers-on-openshift.html * https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids ### Does this pull request potentially affect one of the following parts: This PR updates our Docker images in a breaking way. It could result in bookkeepers, zookeepers, or functions with insufficient permissions. We will mitigate these permissions by updating the helm chart. These changes are easily overridden by extending the docker image. In k8s, you can use the pod's `securityContext` to override the user or group. (cherry picked from commit f7f8619)
@freeznet - thanks for testing this feature out and helping get this merged in time for the 2.10 release! |
…che#13376) Master Issue: apache#11269 ### Motivation In order to increase the overall security of our Pulsar docker images, they should default to run as the non-root user. While updating these permissions, I make sure to comply with the OpenShift spec so the docker image can run on that platform out of the box. Once we finalize these changes, we will need to update the Apache Pulsar Helm chart to make sure that deployments take advantage of this feature. We'll use the `fsGroup` to make sure that k8s sets the appropriate file system permissions for the zookeeper, bookkeeper, and function pods. ### Modifications * Default to run as UID 10000. As noted in the `Dockerfile`, this UID is arbitrary. No logic should rely on this id. * Update filesystem permissions so that the group user has sufficient write permission. The group user is 0 (root). * Remove unnecessary write access. * The `/pulsar/{conf,data,logs}` directories and their members must be writable by the root group. I don't know of any other directories that need to be written to. Note that the `bin/pulsar-admin` too creates a log file in the `/pulsar/logs` directory. Please let me know if there are any additional * Note also that the executable file permissions are already set in our git repo. Those permissions are inherited by the docker image when we run the `COPY` directive in the `Dockerfile`. * There are no changes to the function worker in the k8s runtime. We do not need them because we already merged apache@04b5da0. * Add note to `conf/bkenv.sh`, as it is a `.sh` script that is not executable (and doesn't need to be). * Update test docker image and `supervisord` configuration. Note: it's unclear to me how the OpenShift spec handles restarts. I know that the UID is arbitrary. It's possible that the umask needs to be switched from `022` to `002`. Setting the umask in the docker image does not persist for consumers of the image, so this would need to be set in a helm chart. ### Verifying this change You can access a test image built with these changes here: `michaelmarshall/pulsar:2.10.0-SNAPSHOT`. I have already run some manual tests like `bin/pulsar standalone` in the container. I still need to deploy an actual cluster to verify that all of the unique components work correctly. Because we already merged apache@04b5da0, the upgrade scenarios are already simplified. If this change is in 2.10.0, that means 2.8 and 2.9 will be compatible for certain function worker upgrade scenarios. I wrote test criteria in apache#11269. I'll need to follow up on that criteria using my newly build image. I should be able to look closer at this tomorrow. We'll also need tests to pass, as I modified some tests with this PR. ### References The following links were useful in understanding how to make these changes: * https://engineering.bitnami.com/articles/running-non-root-containers-on-openshift.html * https://cloud.redhat.com/blog/a-guide-to-openshift-and-uids ### Does this pull request potentially affect one of the following parts: This PR updates our Docker images in a breaking way. It could result in bookkeepers, zookeepers, or functions with insufficient permissions. We will mitigate these permissions by updating the helm chart. These changes are easily overridden by extending the docker image. In k8s, you can use the pod's `securityContext` to override the user or group.
Master Issue: #11269
Motivation
In order to increase the overall security of our Pulsar docker images, they should default to run as the non-root user. While updating these permissions, I make sure to comply with the OpenShift spec so the docker image can run on that platform out of the box.
Once we finalize these changes, we will need to update the Apache Pulsar Helm chart to make sure that deployments take advantage of this feature. We'll use the
fsGroup
to make sure that k8s sets the appropriate file system permissions for the zookeeper, bookkeeper, and function pods.Modifications
Dockerfile
, this UID is arbitrary. No logic should rely on this id./pulsar/{conf,data,logs}
directories and their members must be writable by the root group. I don't know of any other directories that need to be written to. Note that thebin/pulsar-admin
too creates a log file in the/pulsar/logs
directory. Please let me know if there are any additionalCOPY
directive in theDockerfile
.conf/bkenv.sh
, as it is a.sh
script that is not executable (and doesn't need to be).supervisord
configuration.Note: it's unclear to me how the OpenShift spec handles restarts. I know that the UID is arbitrary. It's possible that the umask needs to be switched from
022
to002
. Setting the umask in the docker image does not persist for consumers of the image, so this would need to be set in a helm chart.Verifying this change
You can access a test image built with these changes here:
michaelmarshall/pulsar:2.10.0-SNAPSHOT
. I have already run some manual tests likebin/pulsar standalone
in the container. I still need to deploy an actual cluster to verify that all of the unique components work correctly. Because we already merged 04b5da0, the upgrade scenarios are already simplified. If this change is in 2.10.0, that means 2.8 and 2.9 will be compatible for certain function worker upgrade scenarios.I wrote test criteria in #11269. I'll need to follow up on that criteria using my newly build image. I should be able to look closer at this tomorrow.
We'll also need tests to pass, as I modified some tests with this PR.
References
The following links were useful in understanding how to make these changes:
Does this pull request potentially affect one of the following parts:
This PR updates our Docker images in a breaking way. It could result in bookkeepers, zookeepers, or functions with insufficient permissions. We will mitigate these permissions by updating the helm chart. These changes are easily overridden by extending the docker image. In k8s, you can use the pod's
securityContext
to override the user or group.Documentation
doc-required
I need to add docs for this PR. I'll do that in a follow up commit tomorrow.