Error 135 in Initdb deploying inside Kubernetes #451

dlohin · 2018-05-29T14:09:56Z

When I attempt to run the Postgres container using Kubernetes I get an error and the container crashes. I have been banging my head against this for a few days but can't find anything that points me in the right direction as to what to debug. I have tried running the Postgres container with plain Docker on the same host and this works fine. I have also tested on a different Kubernetes cluster, where it works fine, so I believe the problem is environment-specific.

When I set the container to not enter into the entrypoint I can then recreate the initdb error.

Here is the output I get when I run initdb:

postgres@postgresql-844495667c-fdtzw:/$ /usr/lib/postgresql/10/bin/initdb -d -n /db
Running in debug mode.
Running in no-clean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

VERSION=10.4 (Debian 10.4-2.pgdg90+1)
PGDATA=/db
share_path=/usr/share/postgresql/10
PGPATH=/usr/lib/postgresql/10/bin
POSTGRES_SUPERUSERNAME=postgres
POSTGRES_BKI=/usr/share/postgresql/10/postgres.bki
POSTGRES_DESCR=/usr/share/postgresql/10/postgres.description
POSTGRES_SHDESCR=/usr/share/postgresql/10/postgres.shdescription
POSTGRESQL_CONF_SAMPLE=/usr/share/postgresql/10/postgresql.conf.sample
PG_HBA_SAMPLE=/usr/share/postgresql/10/pg_hba.conf.sample
PG_IDENT_SAMPLE=/usr/share/postgresql/10/pg_ident.conf.sample
The database cluster will be initialized with locale "C".
The default database encoding has accordingly been set to "SQL_ASCII".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /db ... ok
creating subdirectories ... ok
selecting default max_connections ... 20
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... 2018-05-29 13:59:16.693 UTC [249] DEBUG: invoking IpcMemoryCreate(size=3055616)
Bus error (core dumped)
child process exited with exit code 135
initdb: data directory "/db" not removed at user's request
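
For readers decoding the status above: by shell convention, a child process killed by signal N is reported as exit status 128 + N, so 135 corresponds to signal 7, which is SIGBUS on Linux and matches the "Bus error" printed just before it. A minimal sketch of that arithmetic follows; decode_exit_status is a hypothetical helper name, not part of initdb or the image:

```shell
#!/bin/sh
# decode_exit_status is a hypothetical helper, named here only for illustration.
# Shells report 128 + N when a child process is killed by signal N.
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        signum=$((status - 128))
        # kill -l N prints the name of signal number N on this platform.
        echo "signal $signum ($(kill -l "$signum"))"
    else
        echo "exit code $status"
    fi
}

decode_exit_status 135
```

On Linux this reports signal 7, i.e. a bus error, so the exit code itself carries no information beyond the "Bus error" message.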

Things I have tried:
Increasing SHM on the host and the container
Running as privileged
Running older versions of Postgres
Running strace to see if anything jumped out at me
Increasing CPU limits and requests

Even if someone can point me in the right direction I would be forever grateful. Right now I am stuck banging my head against a wall.

postgresdump.zip

dlohin · 2018-05-29T14:26:29Z

Not sure if this helps, but I tried a CentOS 7 Postgres image and got a similar error, though with a bit more detail:

fixing permissions creating subdirectories ... ok
sh: line 1: 24 Bus error sh: line 1: 26 Bus error sh: line 1: 28 Bus error sh: line 1: 30 Bus error sh: line 1: 32 Bus error sh: line 1: 34 Bus error selecting default sh: line 1: 36 Bus error sh: line 1: 38 Bus error sh: line 1: 40 Bus error sh: line 1: 42 Bus error sh: line 1: 44 Bus error sh: line 1: 46 Bus error sh: line 1: 48 Bus error sh: line 1: 50 Bus error sh: line 1: 52 Bus error sh: line 1: 54 Bus error sh: line 1: 56 Bus error sh: line 1: 58 Bus error sh: line 1: 60 Bus error sh: line 1: 62 Bus error sh: line 1: 64 Bus error sh: line 1: 66 Bus error sh: line 1: 68 Bus error sh: line 1: 70 Bus error sh: line 1: 72 Bus error selecting default selecting dynamic creating configuration files ... ok
child process was initdb: removing contents on existing directory /var/lib/pgsql/data/userdata ... ok
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=50 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=40 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=30 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=20 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
max_connections ... 10
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=16384 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=8192 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=4096 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=3584 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=3072 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=2560 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=2048 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=1536 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=1000 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=900 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=800 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=700 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=600 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=500 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=400 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=300 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=200 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=100 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
(core dumped) "/opt/rh/rh-postgresql96/root/usr/bin/postgres" --boot -x0 -F -c max_connections=10 -c shared_buffers=50 -c dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
shared_buffers ... 400kB
shared memory implementation ... posix
terminated by signal 7: Bus error
of data directory "/var/lib/pgsql/data/userdata"

yosifkit · 2018-05-29T20:18:05Z

We have no great ideas on how to debug this as we are neither experts in Postgres code nor Kubernetes code and we cannot realistically debug all issues with running the Official Images in random environment X.

Since you have been able to get it to work on plain Docker on the "broken" machine (and on a separate Kubernetes cluster), then it is not a problem with the Docker image. I would recommend trying to find out what is different between your two clusters and what is different between the plain Docker run and the Kubernetes deployment config (cgroups like memory limits, --shm-size, etc). Maybe #416.

In the future, it'd be better to post questions like this in the Docker Community Forums, the Docker Community Slack, Stack Overflow, or a Kubernetes specific help group.

micdoher · 2018-06-09T16:01:37Z

Hi dlohin, I'm getting the same issue. To expand on this slightly: it works on my local minikube but not on a private K8s cluster running on VMware datastores, so I suspect that aspect may be involved. I will keep fighting it and report back if I stumble across the fix.

midokura-agustin · 2018-07-18T08:15:07Z

Same issue here. This image does not work on K8s (simple kubectl run), but running it on plain Docker (docker run) on the same host does.
The error code is 135 with a bus error.

gopinatht · 2018-09-04T23:20:06Z

Hey Guys, any progress or info for this issue? I am stuck on this for an urgent Demo.

wglambert · 2018-09-04T23:31:59Z

This is the only thing relevant I found relating to a bus error in kubernetes/docker
pytorch/pytorch#2244

Looks like the shared memory of the docker container wasn't set high enough. Setting a higher amount by adding --shm-size 8G to the docker run . . .
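
One way to compare the plain Docker run with the Kubernetes pod on this point is to print the size of /dev/shm from inside each container. A rough sketch, assuming df and awk are available in the image; shm_size_kb is a hypothetical helper name:

```shell
#!/bin/sh
# shm_size_kb is a hypothetical helper: print the total size, in KiB, of the
# filesystem backing the given path (default: /dev/shm).
shm_size_kb() {
    path=${1:-/dev/shm}
    # -P forces the POSIX one-line-per-filesystem layout; -k reports 1024-byte blocks.
    df -Pk "$path" | awk 'NR == 2 { print $2 }'
}

if [ -d /dev/shm ]; then
    echo "/dev/shm size: $(shm_size_kb) KiB"
fi
```

Running it once under docker run and once via kubectl exec on the same host shows whether the two environments actually differ in shared memory size.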

ZzEeKkAa · 2018-11-19T21:27:36Z

I have the same issue, and I also see it with the gitlab image (the error is in postgres) and with the richarvey/nginx-php-fpm, webdevops/php-nginx, and wordpress images (with php-fpm). Docker runs fine on the same host. The problem appeared on version 1.12; 1.9 worked fine for me with all images.

The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
fixing permissions on existing directory /var/lib/postgresql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 10
selecting default shared_buffers ... 400kB
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
Bus error (core dumped)
child process exited with exit code 135
initdb: removing contents of data directory "/var/lib/postgresql/data"
running bootstrap script ...

I've tried running PostgreSQL 9.6.5. I've also tried mounting /dev/shm/ both from the same host path and as an emptyDir volume. Neither helped, so I guess it's not a problem with shared memory.

Host system: ubuntu 16.04.

henrywangx · 2018-12-14T07:15:36Z

So is there any clue about this issue?
I tried all of the approaches above; none of them works.
When I try the same image on another machine, the error is gone. It's weird.

ZzEeKkAa · 2018-12-14T11:43:44Z

For me, a temporary workaround was to run one node at 1.9.11 and schedule these images on it.
P.S.: you can connect a 1.9.11 node to a 1.11 cluster.

nbartos · 2018-12-14T20:34:55Z

I believe I hit the same issue (postgres works through docker run, but not k8s). The issue I hit was that huge pages were enabled, but they were not working through k8s, and Postgres wouldn't fall back properly to not using huge pages. I think there are several possible solutions to the problem:

Modify the docker image to set huge_pages = off in /usr/share/postgresql/postgresql.conf.sample before initdb is run (this is what I did).
Turn off huge page support on the system (vm.nr_hugepages = 0 in /etc/sysctl.conf).
Fix Postgres's fallback mechanism when huge_pages = try is set (the default).
Modify the k8s manifest to enable huge page support (https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/).
Modify k8s to show that huge pages are not supported on the system, when they are not enabled for a specific container.
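
To see whether a node is in the situation described above, one can check how many huge pages the kernel has preallocated. A minimal diagnostic sketch, assuming a Linux host that exposes /proc/meminfo; hugepages_total is a hypothetical helper name:

```shell
#!/bin/sh
# hugepages_total is a hypothetical helper: print the HugePages_Total value
# from a meminfo-style file (default: /proc/meminfo).
hugepages_total() {
    meminfo=${1:-/proc/meminfo}
    awk '/^HugePages_Total:/ { print $2 }' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    total=$(hugepages_total)
    # An empty value means the kernel does not expose the field at all.
    if [ "${total:-0}" -gt 0 ]; then
        echo "huge pages preallocated on this host: $total"
    else
        echo "no huge pages preallocated on this host"
    fi
fi
```

A non-zero count on a node where the pod spec requests no huge pages matches the setup described in this comment; vm.nr_hugepages is the sysctl that controls this count.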

henrywangx · 2018-12-18T06:24:58Z

As nbartos suggested, I set vm.nr_hugepages = 0 in /etc/sysctl.conf.
Thanks to nbartos, postgres now works well.
We will continue to look for the root cause.

recall704 · 2021-06-30T05:34:41Z

I hit the same issue, so I tried changing the Docker image:

FROM postgres:11.8

RUN sed -i -r 's/#huge_pages.*?/huge_pages = off/g' /usr/share/postgresql/postgresql.conf.sample

# docker build --no-cache -t postgres:11.8-huge-pages -f Dockerfile .

Then I used the new image in k8s, and it works well.
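
The sed substitution in the Dockerfile above can be tried on a sample line before building the image. The sketch below uses an anchored variation of the pattern (a tightened form, not the exact command from the Dockerfile) and assumes the sample file ships the setting commented out as #huge_pages = try; disable_huge_pages is a hypothetical helper name:

```shell
#!/bin/sh
# disable_huge_pages is a hypothetical filter: read a postgresql.conf-style
# file on stdin and force huge_pages to off, whether or not the line is
# commented out. All other lines pass through unchanged.
disable_huge_pages() {
    sed -E 's/^#?huge_pages[[:space:]]*=.*/huge_pages = off/'
}

# Assumed sample line; the real file may differ between PostgreSQL versions.
printf '#huge_pages = try\t\t\t# on, off, or try\n' | disable_huge_pages
```

Anchoring at the start of the line and requiring the = sign keeps the substitution from touching unrelated lines that merely mention huge_pages in a comment.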

MurzNN · 2021-09-10T13:02:15Z

Thanks for the solution, @nbartos! It also resolves for me the problem with running Solr in Kubernetes with "fatal error has been detected by the Java Runtime Environment"!

Here is the output of the crashing Solr pod in k8s before setting vm.nr_hugepages=0:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f10759a53c3, pid=11, tid=60
#
# JRE version:  (11.0.12+7) (build )
# Java VM: OpenJDK 64-Bit Server VM (11.0.12+7, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x81d3c3]  CodeHeap::allocate(unsigned long)+0x293
#
# Core dump will be written. Default location: /opt/solr-8.9.0/server/core
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid11.log

wglambert added the question Usability question, not directly related to an error with the image label May 29, 2018

yosifkit closed this as completed May 29, 2018

ZzEeKkAa mentioned this issue Nov 19, 2018

Bus error (core dumped) kubernetes/kubernetes#71233

Closed

henrywangx mentioned this issue May 5, 2019

Why the huge_page can't be closed? patroni/patroni#1051

Closed

sskurapati mentioned this issue Jul 5, 2021

Is postgres-operator supports k3s? zalando/postgres-operator#1548

Closed

viorel-anghel mentioned this issue Aug 16, 2021

not working on k3s zalando/postgres-operator#1583

Closed

docker-library locked as resolved and limited conversation to collaborators Sep 10, 2021
