Error 135 in Initdb deploying inside Kubernetes #451

Closed
dlohin opened this issue May 29, 2018 · 13 comments
Labels
question Usability question, not directly related to an error with the image

Comments

@dlohin

dlohin commented May 29, 2018

When I attempt to run the Postgres container using Kubernetes, I get an error and the container crashes. I have been banging my head against this for a few days but can't find anything that points me in the right direction as to what to debug. I have tried running the Postgres container with plain Docker on the same host, and that works fine. I have also tested on a different Kubernetes cluster, where it works fine, so I believe it is something environment-specific.

When I set the container to skip the entrypoint, I can reproduce the initdb error by hand.

Here is the output I get when I run initdb:

postgres@postgresql-844495667c-fdtzw:/$ /usr/lib/postgresql/10/bin/initdb -d -n /db
Running in debug mode.
Running in no-clean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

VERSION=10.4 (Debian 10.4-2.pgdg90+1)
PGDATA=/db
share_path=/usr/share/postgresql/10
PGPATH=/usr/lib/postgresql/10/bin
POSTGRES_SUPERUSERNAME=postgres
POSTGRES_BKI=/usr/share/postgresql/10/postgres.bki
POSTGRES_DESCR=/usr/share/postgresql/10/postgres.description
POSTGRES_SHDESCR=/usr/share/postgresql/10/postgres.shdescription
POSTGRESQL_CONF_SAMPLE=/usr/share/postgresql/10/postgresql.conf.sample
PG_HBA_SAMPLE=/usr/share/postgresql/10/pg_hba.conf.sample
PG_IDENT_SAMPLE=/usr/share/postgresql/10/pg_ident.conf.sample
The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /db ... ok
creating subdirectories ... ok
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2018-05-29 13:59:16.693 UTC [249] DEBUG: invoking IpcMemoryCreate(size=3055616)
Bus error (core dumped)
child process exited with exit code 135
initdb: data directory "/db" not removed at user's request

Things I have tried:
Increasing SHM on the host and the container
Running as privileged
Running older versions of Postgres
Running strace to see if anything jumped out at me
Increasing CPU limits and requests

Even if someone can point me in the right direction I would be forever grateful. Right now I am stuck banging my head against a wall.

postgresdump.zip

@dlohin
Author

dlohin commented May 29, 2018

Not sure if this helps, but I tried a CentOS 7 Postgres image and got a similar error, though with a bit more detail:

fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
sh: line 1: 24 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 26 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 28 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 30 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 32 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 34 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
selecting default max_connections ... 10
sh: line 1: 36 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=16384 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 38 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=8192 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 40 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=4096 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 42 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=3584 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 44 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=3072 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 46 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=2560 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 48 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=2048 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 50 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=1536 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 52 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 54 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=900 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 56 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=800 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 58 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=700 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 60 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=600 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 62 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 64 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 66 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 68 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 70 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
sh: line 1: 72 Bus error (core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=50 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
child process was terminated by signal 7: Bus error
initdb: removing contents of data directory "/var/lib/pgsql/data/userdata"

@wglambert wglambert added the question Usability question, not directly related to an error with the image label May 29, 2018
@yosifkit
Member

We have no great ideas on how to debug this as we are neither experts in Postgres code nor Kubernetes code and we cannot realistically debug all issues with running the Official Images in random environment X.

Since you have been able to get it to work on plain Docker on the "broken" machine (and on a separate Kubernetes cluster), it is not a problem with the Docker image. I would recommend finding out what differs between your two clusters, and between the plain Docker run and the Kubernetes deployment config (cgroups like memory limits, --shm-size, etc). Maybe #416.

In the future, it'd be better to post questions like this in the Docker Community Forums, the Docker Community Slack, Stack Overflow, or a Kubernetes specific help group.

@micdoher

micdoher commented Jun 9, 2018

Hi dlohin, I'm getting the same issue. To expand slightly on this: it works on my local minikube but not on a private K8s cluster running on VMware datastores, so I suspect it may be related to that. I will keep fighting it and report back if I stumble across the fix.

@midokura-agustin

Same issue here. This image does not work on K8s (simple kubectl run), but running it on plain Docker (docker run) on the same host does.
The error code is 135 with a bus error.
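
For readers wondering how exit code 135 relates to the bus error: shells report a process killed by signal N as exit status 128 + N, and the CentOS log earlier in this thread shows "signal 7: Bus error". A minimal Python sketch of that arithmetic follows; the function name signal_from_exit_code is made up for illustration, and the signal name lookup depends on the platform (signal 7 is SIGBUS on x86 Linux, but not everywhere).

```python
import signal


def signal_from_exit_code(code: int):
    """Return the signal number implied by a shell-style exit code, or None.

    Hypothetical helper: shells report "killed by signal N" as 128 + N.
    """
    if code > 128:
        return code - 128
    return None


if __name__ == "__main__":
    signum = signal_from_exit_code(135)
    print("signal number:", signum)
    try:
        # Name lookup is platform-specific; on x86 Linux signal 7 is SIGBUS.
        print("signal name:", signal.Signals(signum).name)
    except ValueError:
        print("signal name: unknown on this platform")
```

So 135 is not a Postgres-specific error code; it only says the child process died from signal 7.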

@gopinatht

Hey guys, any progress or info on this issue? I am stuck on this for an urgent demo.

@wglambert

This is the only relevant thing I found relating to a bus error in Kubernetes/Docker:
pytorch/pytorch#2244

It looks like the shared memory of the Docker container wasn't set high enough there. The fix was to set a higher amount by adding --shm-size 8G to the docker run . . .
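
To check whether shared memory size is even a plausible suspect, one option is to look at how large /dev/shm is inside the failing container and compare it with the working one. The sketch below is only a diagnostic aid under the assumption that Python is available in the container; shm_usage is a hypothetical helper name, and the numbers it prints say nothing about huge pages or other memory settings.

```python
import os
import shutil


def shm_usage(path: str = "/dev/shm"):
    """Return (total_bytes, free_bytes) for the filesystem mounted at path."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.free


if __name__ == "__main__":
    if os.path.exists("/dev/shm"):
        total, free = shm_usage()
        mib = 1024 * 1024
        print(f"/dev/shm total: {total / mib:.0f} MiB, free: {free / mib:.0f} MiB")
    else:
        print("/dev/shm is not present on this system")
```

If both environments report the same size, a too-small /dev/shm is unlikely to explain the difference.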

@ZzEeKkAa

ZzEeKkAa commented Nov 19, 2018

I have the same issue, and I also see it with the gitlab image (error in postgres) and the richarvey/nginx-php-fpm, webdevops/php-nginx, and wordpress images (with php-fpm). Docker runs fine on the same host. The problem appeared on version 1.12; 1.9 worked fine for me with all images.

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
running bootstrap script ...

I've tried running PostgreSQL 9.6.5. I've also tried mounting /dev/shm/ both from the same host path and from an empty dir. Neither helped, so I guess it's not a problem with shared memory.

Host system: Ubuntu 16.04.

@henrywangx

So is there any clue about this issue?
I tried all of the approaches above, and none of them works.
When I try the same image on another machine, the error is gone. It's weird.

@ZzEeKkAa

For me, a temporary workaround was to keep one node at 1.9.11 and run these kinds of images on it.
P.S.: you can connect a 1.9.11 node to a 1.11 cluster.

@nbartos

nbartos commented Dec 14, 2018

I believe I hit the same issue (postgres works through docker run, but not k8s). The issue I hit was that huge pages were enabled, but they were not working through k8s, and Postgres wouldn't fall back properly to not using huge pages. I think there are several possible solutions to the problem:

  1. Modify the docker image to be able to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before initdb is run (this is what I did).
  2. Turn off huge page support on the system (vm.nr_hugepages = 0 in /etc/sysctl.conf).
  3. Fix Postgres's fallback mechanism when huge_pages = try is set (the default).
  4. Modify the k8s manifest to enable huge page support (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/).
  5. Modify k8s to show that huge pages are not supported on the system, when they are not enabled for a specific container.
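
Before picking one of the options above, it may help to confirm that huge pages are actually reserved on the node. On Linux the counters are exposed in /proc/meminfo (HugePages_Total, HugePages_Free, Hugepagesize). The following is a rough Python sketch, assuming Python is available where you run it; parse_hugepages is a hypothetical helper name, and inside a container /proc/meminfo typically reflects the host's values rather than what the pod is allowed to use.

```python
from pathlib import Path


def parse_hugepages(meminfo_text: str) -> dict:
    """Extract the huge page counters from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        # Keep only the explicit huge page counters and the page size.
        if sep and key.startswith(("HugePages_", "Hugepagesize")):
            fields[key] = int(rest.split()[0])
    return fields


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        for name, value in parse_hugepages(meminfo.read_text()).items():
            print(f"{name}: {value}")
    else:
        print("/proc/meminfo not available on this system")
```

A non-zero HugePages_Total on the affected node, and zero on the node where the image works, would be consistent with the explanation above, though it does not prove it.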

@henrywangx

henrywangx commented Dec 18, 2018

As nbartos suggested, I set vm.nr_hugepages = 0 in /etc/sysctl.conf, and Postgres now works well.
Thanks, nbartos.
We will keep looking for the root cause.

@recall704

I hit the same issue and worked around it by changing the Docker image:

FROM postgres:11.8

# Replace the commented-out huge_pages line in the sample config so initdb starts with huge pages off.
RUN sed -i -r 's/#huge_pages.*/huge_pages = off/g' /usr/share/postgresql/postgresql.conf.sample
# docker build --no-cache -t postgres:11.8-huge-pages -f Dockerfile .

Then I used the new image in k8s, and it works well.
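
For anyone who wants to see what that sed substitution does to the config text, here is a small Python sketch of the same rewrite. It is not what the image above uses; disable_huge_pages is a hypothetical helper, and the sample line in the usage part is an assumption about how postgresql.conf.sample spells the setting.

```python
import re

# Matches an optionally commented-out "huge_pages = ..." line, up to end of line.
_HUGE_PAGES_LINE = re.compile(r"^#?[ \t]*huge_pages[ \t]*=.*$", re.MULTILINE)


def disable_huge_pages(conf_text: str) -> str:
    """Return conf_text with any huge_pages setting replaced by 'huge_pages = off'."""
    return _HUGE_PAGES_LINE.sub("huge_pages = off", conf_text)


if __name__ == "__main__":
    # Assumed excerpt of a sample config, for illustration only.
    sample = (
        "shared_buffers = 128MB\n"
        "#huge_pages = try\t\t\t# on, off, or try\n"
        "#temp_buffers = 8MB\n"
    )
    print(disable_huge_pages(sample))
```

Unlike the sed expression, this pattern requires an equals sign after huge_pages, so it would leave differently named settings that merely start with the same prefix untouched.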

@MurzNN

MurzNN commented Sep 10, 2021

Thanks for the solution, @nbartos! It also resolved my problem with running Solr in Kubernetes, which failed with "fatal error has been detected by the Java Runtime Environment"!

Here is the output of the crashing Solr pod in k8s before setting vm.nr_hugepages=0:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f10759a53c3, pid=11, tid=60
#
# JRE version:  (11.0.12+7) (build )
# Java VM: OpenJDK 64-Bit Server VM (11.0.12+7, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x81d3c3]  CodeHeap::allocate(unsigned long)+0x293
#
# Core dump will be written. Default location: /opt/solr-8.9.0/server/core
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid11.log

@docker-library docker-library locked as resolved and limited conversation to collaborators Sep 10, 2021