This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

Chart deployment fails due to Patroni pgbackrest_restore.sh error with PodSecurityPolicy #492

Closed
rituraj-AU opened this issue Nov 14, 2022 · 10 comments
Labels: bug Something isn't working

@rituraj-AU

What happened?
Get the following error in all my timescale pods:

2022-11-14 18:14:17,077 ERROR: Error creating replica using method pgbackrest: /etc/timescaledb/scripts/pgbackrest_restore.sh exited with code=1
2022-11-14 18:14:17,077 ERROR: failed to bootstrap from leader 'my-cluster-timescaledb-0'
2022-11-14 18:14:27,076 ERROR: Permission denied
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 498, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 1088, in touch_member
    ret = self._api.patch_namespaced_pod(self._name, self._namespace, body)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 483, in wrapper
    return getattr(self._core_v1_api, func)(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 419, in wrapper
    return self._api_client.call_api(method, path, headers, body, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 388, in call_api
    return self._handle_server_response(response, _preload_content)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 218, in _handle_server_response
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'a34c6fd2-a1c3-4cdf-99cf-1288fddf8817', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 14 Nov 2022 18:14:27 GMT', 'Content-Length': '289'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"my-cluster-timescaledb-1\\" is forbidden: PodSecurityPolicy: unable to validate pod: []","reason":"Forbidden","details":{"name":"my-cluster-timescaledb-1","kind":"pods"},"code":403}\n'

In addition, no Postgres server is running:

Readiness probe failed: /var/run/postgresql:5432 - no response

Did you expect to see something different?
Expected the pods to have no errors and the Postgres server to be up and running.

How to reproduce it (as minimally and precisely as possible):

helm install my-cluster ../timescaledb \
  --namespace=default \
  --set image.tag=latest \
  -f values.yaml

Environment
Development or Minikube. Same behaviour in both.

  • Which helm chart and what version are you using?
    0.20.0

  • What is in your values.yaml ?
    Here is my values.yaml

timescaledb-single:
  image:
    pullPolicy: IfNotPresent

  service:
    primary:
      labels:
        team: my-team
    replica:
      labels:
        team: my-team

  prometheus:
    enabled: true
  
  persistentVolumes:
    wal:
      size: 20G
    data:
      size: 5G

  replicaCount: 3

  serviceAccount:
    create: false  # There's an existing service account already from an earlier install.
    name: my-cluster-timescaledb
  • Kubernetes version information:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.17", GitCommit:"f3abc15296f3a3f54e4ee42e830c61047b13895f", GitTreeState:"clean", BuildDate:"2021-01-13T13:21:12Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.15", GitCommit:"58178e7f7aab455bc8de88d3bdd314b64141e7ee", GitTreeState:"clean", BuildDate:"2021-09-15T19:18:00Z", GoVersion:"go1.15.15", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes cluster kind:

    Tried both on minikube:
    minikube start --memory 9216 --cpus 5 --disk-size 50g --driver hyperkit
    As well as my company's existing kubernetes cluster

Anything else we need to know?:
No

@rituraj-AU rituraj-AU added the bug Something isn't working label Nov 14, 2022
@nhudson
Contributor

nhudson commented Nov 15, 2022

This is/was most likely a bug in Patroni which was fixed with patroni/patroni#1132 & patroni/patroni#2390.

We have merged the change to add these fixes into the timescaledb container image timescale/timescaledb-docker-ha#319

The image changes are currently in the process of being tested. We hope to have a release soon.

@rituraj-AU
Author

Thanks! Does the pg14.6-ts2.8.1-latest version of timescale/timescaledb-ha contain the patch that you've applied? If not, is there an experimental version I can use to test whether it works now? Thanks.

@rituraj-AU
Author

I don't know if the fix is part of pg14.6-ts2.8.1-latest or not. I tried again with pg14.6-ts2.8.1-latest and chart version 0.22.0 and the issue persists.

@rituraj-AU
Author

Also, patroni/patroni#1132 is a different issue, where the serviceAccount isn't authorized to create services. That isn't the issue I mentioned above. My issue is related to a PodSecurityPolicy:

HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure",
"message":"pods \\"my-cluster-timescaledb-1\\" is forbidden: PodSecurityPolicy: unable to validate pod: []",
"reason":"Forbidden","details":{"name":"my-cluster-timescaledb-1","kind":"pods"},"code":403}\n'

@nhudson
Contributor

nhudson commented Nov 15, 2022

The timescaledb container image with the fix has not been released yet. So there are currently no images to test with or to use until the team is ready to release it.

As for the error message, I believe it is the same code that is triggering it, but I am not entirely sure. It is a Patroni issue, so if you want to raise it there just to be thorough, I don't think it would hurt.

Otherwise, for the time being you can work around this by following timescale/timescaledb-docker-ha#319 (comment) and giving the service account access to Service and PodSecurityPolicy resources.

One caution, though: Patroni's own documentation states that it does not need access to these Kubernetes APIs.
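To confirm whether RBAC is actually what's blocking the pod, you can ask the API server directly whether the chart's service account may use the policy and patch its own pod (which is what Patroni's touch_member call in the traceback above is doing). The account and policy names below are assumptions based on the values.yaml earlier in this thread; adjust them for your release:

```shell
# Can the TimescaleDB service account "use" the PodSecurityPolicy? (prints yes/no)
kubectl auth can-i use podsecuritypolicy/your-pod-security-policy \
  --as=system:serviceaccount:default:my-cluster-timescaledb

# Can it patch pods, which Patroni needs in order to update member annotations?
kubectl auth can-i patch pods \
  --as=system:serviceaccount:default:my-cluster-timescaledb
```

If either command prints "no", the 403 in the original report is an RBAC/PSP gap rather than a Patroni bug.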

@nhudson
Contributor

nhudson commented Nov 16, 2022

@rituraj-AU pg14.6-ts2.8.1-latest & pg14.6-ts2.8.1-p0 images contain the fix for patroni/patroni#1132. So if it's not working, I would urge you to open a new issue with Patroni so that it can get fixed.

@rituraj-AU
Author

Thanks for the update, Nick. I did some more digging and found out that there was actually a PodSecurityPolicy preventing Patroni from doing its thing.
For others who might be facing the same issue, below are the psp and role changes that worked for me:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: your-pod-security-policy
spec:
  fsGroup:
    rule: 'RunAsAny'
  hostIPC: false
  hostNetwork: false
  hostPID: false
  hostPorts:
  - max: 0
    min: 0
  privileged: false
  readOnlyRootFilesystem: false
  runAsGroup:
    rule: 'RunAsAny'
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  volumes:
  - configMap
  - downwardAPI
  - emptyDir
  - persistentVolumeClaim
  - projected
  - secret

The role:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: your-role
  namespace: your-namespace
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - create
  - get
  - list
  - patch
  - update
  - watch
  - delete
- apiGroups:
  - ""
  resources:
  - endpoints
  - endpoints/restricted
  verbs:
  - create
  - get
  - patch
  - update
  - list
  - watch
  - delete
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - patch
  - update
  - watch
- apiGroups: [""]
  resources: ["services"]
  verbs: ["create"]
- apiGroups: 
  - extensions
  resources:
  - podsecuritypolicies
  verbs:
  - use
- apiGroups:
  - extensions
  resources:
  - podsecuritypolicies
  resourceNames:
  - your-pod-security-policy  # The one you created above
  verbs:
  - use
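For the Role above to take effect, it has to be bound to the chart's service account with a RoleBinding. A minimal sketch, assuming the account name from the values.yaml earlier in this thread (adjust the names and namespace for your own release):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: your-role-binding
  namespace: your-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: your-role  # The Role defined above
subjects:
- kind: ServiceAccount
  name: my-cluster-timescaledb
  namespace: your-namespace
```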

Please feel free to close this issue. Thanks

@nhudson
Contributor

nhudson commented Nov 16, 2022

Awesome! Thanks for the info and happy you got it resolved!

@nhudson nhudson closed this as completed Nov 16, 2022
@zfy0701
Contributor

zfy0701 commented Dec 2, 2022


@nhudson it seems the new image only fixes the "static primary" case, which I assume only works for clusters with a single node.
For the general image, is the issue still on the Patroni side? If not, is there an ETA on the fix?

@nhudson
Copy link
Contributor

nhudson commented Feb 13, 2023

@zfy0701 Yes, I had thought we updated Patroni in our standard image. It looks like we are waiting on a Patroni update in the Debian repos:

timescale/timescaledb-docker-ha#343
