New spilo from the most recent master crashes #384

Closed · ghost opened this issue Nov 21, 2019 · 15 comments · Fixed by CyberDem0n/pam-oauth2#4 or #412

Comments

@ghost

ghost commented Nov 21, 2019

I just cloned the repo and built the image. When I use it in the postgres-operator, it crashes. Am I missing something?

@CyberDem0n
Contributor

Please define "crashes" and provide some evidence, e.g. logs.

@ghost
Author

ghost commented Nov 26, 2019

From docker:

```
$ docker run -it ************/spilo:1.6-p1-13-gb75771e
root@a9f7c5777c92:/# psql
Warning: No existing local cluster is suitable as a default target. Please see man pg_wrapper(1) how to specify one.
Error: You must install at least one postgresql-client-<version> package
root@a9f7c5777c92:/# which psql
/usr/bin/psql
root@a9f7c5777c92:/# cat /var/log/
alternatives.log  bootstrap.log  dpkg.log  lastlog  tallylog  apt/  btmp  faillog  postgresql/  wtmp
root@a9f7c5777c92:/# cat /var/log/apt/
eipp.log.xz  history.log  term.log
root@a9f7c5777c92:/# cat /var/log/apt/history.log
root@a9f7c5777c92:/# cat /var/log/lastlog
root@a9f7c5777c92:/# cat /var/log/faillog
```
From Kubernetes (postgres-operator):

```
Name: platops-test-cluster-0
Namespace: default
Priority: 0
Node: aks-agentpool-30264578-1/10.240.0.6
Start Time: Tue, 26 Nov 2019 18:21:53 +0000
Labels: application=spilo
controller-revision-hash=platops-test-cluster-849cf9f68d
statefulset.kubernetes.io/pod-name=platops-test-cluster-0
team=PLATOPS
version=platops-test-cluster
Annotations:
Status: Running
IP: 10.244.0.229
IPs:
Controlled By: StatefulSet/platops-test-cluster
Init Containers:
date:
Container ID: docker://a3a3a61df751f4a3fdb8686a9be189dd2ea228d29e5036ada70d4fed0e1b8b6f
Image: busybox
Image ID: docker-pullable://busybox@sha256:1303dbf110c57f3edf68d9f5a16c082ec06c4cf7604831669faf2c712260b5a0
Port:
Host Port:
Command:
/bin/date
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 26 Nov 2019 18:22:32 +0000
Finished: Tue, 26 Nov 2019 18:22:32 +0000
Ready: True
Restart Count: 0
Environment:
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from zalando-postgres-operator-token-4nn6j (ro)
Containers:
postgres:
Container ID: docker://**************************
Image: /spilo:1.6-p1-13-gb75771e
Image ID: docker-pullable:///spilo@sha256:f64535847e790347e522151da40ccb8cc46f8ef1dc22d989c81dc3bcf5a74957
Ports: 8008/TCP, 5432/TCP, 8080/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 26 Nov 2019 18:25:49 +0000
Finished: Tue, 26 Nov 2019 18:25:49 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Tue, 26 Nov 2019 18:24:21 +0000
Finished: Tue, 26 Nov 2019 18:24:21 +0000
Ready: False
Restart Count: 5
Limits:
cpu: 300m
memory: 300Mi
Requests:
cpu: 10m
memory: 100Mi
Environment:
SCOPE: platops-test-cluster
PGROOT: /home/postgres/pgdata/pgroot
POD_IP: (v1:status.podIP)
POD_NAMESPACE: default (v1:metadata.namespace)
PGUSER_SUPERUSER: postgres
KUBERNETES_SCOPE_LABEL: version
KUBERNETES_ROLE_LABEL: spilo-role
KUBERNETES_LABELS: application=spilo
PGPASSWORD_SUPERUSER: <set to the key 'password' in secret 'postgres.platops-test-cluster.credentials'> Optional: false
PGUSER_STANDBY: standby
PGPASSWORD_STANDBY: <set to the key 'password' in secret 'standby.platops-test-cluster.credentials'> Optional: false
PAM_OAUTH2: https://info.example.com/oauth2/tokeninfo?access_token= uid realm=/employees
HUMAN_ROLE: zalandos
SPILO_CONFIGURATION: {"postgresql":{"bin_dir":"/usr/lib/postgresql/11/bin","parameters":{"log_statement":"all","shared_buffers":"32MB"},"pg_hba":["hostssl all all 0.0.0.0/0 md5","host all all 0.0.0.0/0 md5"]},"bootstrap":{"initdb":[{"auth-host":"md5"},{"auth-local":"trust"},"data-checksums",{"encoding":"UTF8"},{"locale":"en_US.UTF-8"}],"users":{"zalandos":{"password":"","options":["CREATEDB","NOLOGIN"]}},"dcs":{"ttl":30,"loop_wait":10,"retry_timeout":10,"maximum_lag_on_failover":33554432,"postgresql":{"parameters":{"max_connections":"10"}}}}}
DCS_ENABLE_KUBERNETES_API: true
Mounts:
/dev/shm from dshm (rw)
/home/postgres/pgdata from pgdata (rw)
/var/run/secrets/kubernetes.io/serviceaccount from zalando-postgres-operator-token-4nn6j (ro)
/var/secret from azkey (rw)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
pgdata:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pgdata-platops-test-cluster-0
ReadOnly: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit:
azkey:
Type: Secret (a volume populated by a Secret)
SecretName: azkey
Optional: false
zalando-postgres-operator-token-4nn6j:
Type: Secret (a volume populated by a Secret)
SecretName: zalando-postgres-operator-token-4nn6j
Optional: false
QoS Class: Burstable
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Warning FailedScheduling 4m19s (x6 over 4m30s) default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
Normal Scheduled 4m19s default-scheduler Successfully assigned default/platops-test-cluster-0 to aks-agentpool-30264578-1
Normal SuccessfulAttachVolume 3m49s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-902fee47-1079-11ea-ae02-32c1c317e03d"
Normal Pulling 3m27s kubelet, aks-agentpool-30264578-1 pulling image "busybox"
Normal Pulled 3m27s kubelet, aks-agentpool-30264578-1 Successfully pulled image "busybox"
Normal Created 3m26s kubelet, aks-agentpool-30264578-1 Created container
Normal Started 3m26s kubelet, aks-agentpool-30264578-1 Started container
Normal Pulling 3m25s kubelet, aks-agentpool-30264578-1 pulling image "platops.azurecr.io/spilo:1.6-p1-13-gb75771e"
Normal Pulled 3m21s kubelet, aks-agentpool-30264578-1 Successfully pulled image "platops.azurecr.io/spilo:1.6-p1-13-gb75771e"
Normal Pulled 2m30s (x3 over 3m12s) kubelet, aks-agentpool-30264578-1 Container image "platops.azurecr.io/spilo:1.6-p1-13-gb75771e" already present on machine
Normal Created 2m29s (x4 over 3m13s) kubelet, aks-agentpool-30264578-1 Created container
Normal Started 2m29s (x4 over 3m13s) kubelet, aks-agentpool-30264578-1 Started container
Warning BackOff 113s (x8 over 3m11s) kubelet, aks-agentpool-30264578-1 Back-off restarting failed container
```

From `kubectl get pods`:

```
platops-test-cluster-0              0/1   CrashLoopBackOff   5   7m6s
postgres-operator-cf4659865-rv5hp   1/1   Running            0   6d
```
All I did was download the spilo repo, build the image, and upload it to my private registry.

@CyberDem0n
Contributor

What is in the pod logs?

@ghost
Author

ghost commented Nov 26, 2019

No logs in the pods. It crashes from the beginning and doesn't produce any logs.

@CyberDem0n
Contributor

`psql` printing `Warning: No existing local cluster is suitable as a default target.` -- this is fishy; it means there are no postgres binaries available in the container.

@ghost
Author

ghost commented Nov 26, 2019

All I did is:

  1. Download the repo
  2. `cd postgres-appliance`
  3. `./build.sh --build-arg COMPRESS=true --tag $YOUR_TAG .`

No changes, just as it is.

@CyberDem0n
Contributor

Well, if you use COMPRESS, you also have to decompress: https://github.com/zalando/spilo/blob/master/postgres-appliance/launch.sh#L4-L7

@ghost
Author

ghost commented Nov 26, 2019

OK, I will not use COMPRESS.

@CyberDem0n
Contributor

docker run executes launch.sh, so it is already strange that you get a shell.

@CyberDem0n
Contributor

registry.opensource.zalan.do/acid/spilo-cdp-11:1.6-p96 was built from master 12 days ago:

```
$ docker run --rm -ti registry.opensource.zalan.do/acid/spilo-cdp-11:1.6-p96
Unable to find image 'registry.opensource.zalan.do/acid/spilo-cdp-11:1.6-p96' locally
1.6-p96: Pulling from acid/spilo-cdp-11
e03cf5b1164f: Pull complete
f00f4f04b803: Pull complete
4b5e21f94914: Pull complete
be6f8e16173c: Pull complete
28240a8321f0: Pull complete
175e06471d49: Pull complete
118621e2ed9d: Pull complete
101baf41bc80: Pull complete
Digest: sha256:7ccae2edfa11c06cd4e0d8814e2a0ff4de5d6dfb761fed672e452a61ed946825
Status: Downloaded newer image for registry.opensource.zalan.do/acid/spilo-cdp-11:1.6-p96
decompressing spilo image...
2019-11-26 19:03:47,550 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2019-11-26 19:03:49,554 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2019-11-26 19:03:49,555 - bootstrapping - INFO - No meta-data available for this provider
2019-11-26 19:03:49,555 - bootstrapping - INFO - Looks like your running local
2019-11-26 19:03:49,587 - bootstrapping - INFO - Configuring bootstrap
2019-11-26 19:03:49,587 - bootstrapping - INFO - Configuring standby-cluster
2019-11-26 19:03:49,587 - bootstrapping - INFO - Configuring patronictl
2019-11-26 19:03:49,587 - bootstrapping - INFO - Configuring pam-oauth2
2019-11-26 19:03:49,587 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2019-11-26 19:03:49,587 - bootstrapping - INFO - Configuring pgbouncer
2019-11-26 19:03:49,587 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2019-11-26 19:03:49,587 - bootstrapping - INFO - Configuring renice
2019-11-26 19:03:49,595 - bootstrapping - INFO - Skipping creation of renice cron job due to lack of permissions
2019-11-26 19:03:49,595 - bootstrapping - INFO - Configuring patroni
2019-11-26 19:03:49,606 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2019-11-26 19:03:49,606 - bootstrapping - INFO - Configuring wal-e
2019-11-26 19:03:49,606 - bootstrapping - INFO - Configuring crontab
2019-11-26 19:03:49,615 - bootstrapping - INFO - Configuring certificate
2019-11-26 19:03:49,615 - bootstrapping - INFO - Generating ssl certificate
2019-11-26 19:03:49,767 - bootstrapping - INFO - Configuring log
2019-11-26 19:03:49,767 - bootstrapping - INFO - Configuring pgqd

$ docker exec -ti vibrant_mahavira bash
 ____        _ _
/ ___| _ __ (_) | ___
\___ \| '_ \| | |/ _ \
 ___) | |_) | | | (_) |
|____/| .__/|_|_|\___/
      |_|

This container is managed by runit, when stopping/starting services use sv

Examples:

sv stop cron
sv restart patroni

Current status: (sv status /etc/service/*)

run: /etc/service/cron: (pid 39) 213s
run: /etc/service/etcd: (pid 40) 213s
run: /etc/service/patroni: (pid 37) 213s
run: /etc/service/pgqd: (pid 36) 213s
root@a56dcbb86742:/home/postgres# cat /scm-source.json
{
    "url": "git:https://github.com/zalando/spilo.git",
    "revision": "b75771ec85bcb3425d39d25f6b4e6cb99bf04f19",
    "author": "Alexander Kukushkin <cyberdemn@gmail.com>",
    "status": ""
}
```

@ghost
Author

ghost commented Nov 26, 2019

docker.txt
Here are the logs from building the docker image. I am getting some errors.

@ghost
Author

ghost commented Nov 26, 2019

Getting this error while building the image:

```
+ git clone -b v1.0 --recurse-submodules https://github.com/CyberDem0n/pam-oauth2.git
Cloning into 'pam-oauth2'...
Note: checking out 'cffc8f76dc930bf58bef7030cd7571081e293d54'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

Submodule 'jsmn' (git://github.com/zserge/jsmn.git) registered for path 'jsmn'
Cloning into '/builddeps/pam-oauth2/jsmn'...
fatal: unable to connect to github.com:
github.com[0: 140.82.114.3]: errno=Connection refused

fatal: clone of 'git://github.com/zserge/jsmn.git' into submodule path '/builddeps/pam-oauth2/jsmn' failed
Failed to clone 'jsmn'. Retry scheduled
Cloning into '/builddeps/pam-oauth2/jsmn'...
fatal: unable to connect to github.com:
github.com[0: 140.82.114.3]: errno=Connection refused

fatal: clone of 'git://github.com/zserge/jsmn.git' into submodule path '/builddeps/pam-oauth2/jsmn' failed
Failed to clone 'jsmn' a second time, aborting
```
Any ideas?

@ghost
Author

ghost commented Dec 3, 2019

I inspected the image I created and noticed a difference in the CMD. The Dockerfile's CMD is:

```
CMD ["/bin/sh", "/launch.sh"]
```

but the image I built shows:

```
"Cmd": [ "/bin/sh", "-c", "#(nop) ",
```

What can cause this? The only thing I did is:

```
$ cd postgres-appliance
$ ./build.sh --tag $YOUR_TAG .
```

@ghost
Author

ghost commented Dec 9, 2019

I couldn't build it due to firewall issues. Closing the case.

@gertvdijk
Contributor

gertvdijk commented Feb 18, 2020

I see the build error is caused by use of the old-style Git protocol instead of HTTPS; I had the same problem. It is fixable locally with an insteadOf Git config (e.g. https://gist.github.com/Kovrinic/ea5e7123ab5c97d451804ea222ecd78a). I created a PR to the repo of @CyberDem0n too.

Ugh. Next failure:

```
+ git clone git://www.sigaev.ru/plantuner.git
Cloning into 'plantuner'...
fatal: unable to connect to www.sigaev.ru:
www.sigaev.ru[0: 93.180.27.50]: errno=Connection timed out
```

(Why does the build depend on a random Russian site anyway, without any version pin? This is very bad dependency management, and the use of the insecure Git protocol could let malicious software into builds too.)

Fixed that one with PR #412.

With those two changes, I was able to build the Spilo image.

CyberDem0n pushed a commit to CyberDem0n/pam-oauth2 that referenced this issue Feb 19, 2020
To overcome corporate/enterprise firewall issues at build time of Spilo.

Fixes zalando/spilo#384
This issue was closed.