Open
Labels
bug: Something isn't working
Description
Describe the expected outcome
The CertificateRequest status should be Pending, so that issuance can be retried once the network connection is restored.
Describe the actual outcome
The CertificateRequest status becomes Failed after 3 attempts, and the aws-privateca-issuer then ignores the CertificateRequest (upstream code: https://github.com/cert-manager/aws-privateca-issuer/blob/main/pkg/controllers/certificaterequest_controller.go#L96-L104).
Would it be possible to create a PR similar to #384 so that the status is set to Pending instead of Failed when a network error is encountered?
Steps to reproduce
- Install cert-manager v1.16.4
- Install aws privateca issuer v1.6.0
- Make the AWS PCA service unreachable from the aws-privateca-issuer pods in the Kubernetes cluster (to simulate a network error) by:
  - Changing the CoreDNS ConfigMap so that the AWS PCA service is unreachable for the aws-privateca-issuer pods
  - Changing the security group of the VPC client that connects the aws-privateca-issuer pods to the AWS PCA service
- After 3 attempts, the CertificateRequest status becomes Failed
- Restore the network and wait for the next retry
- The CertificateRequest is never retried because its status is Failed
(We have to delete the failed CertificateRequest and then run cmctl renew <certificate name> to retry.)
Relevant log output
The aws-privateca-issuer ignores the CertificateRequest (logs captured with zap-log-level=9):
k logs -n cert-manager aws-privateca-issuer-7678d777b9-k7vz2 -f | grep test-cert-20
{"level":"Level(-5)","ts":"2025-08-29T07:29:29Z","msg":"Reconciling","controller":"certificaterequest","controllerGroup":"cert-manager.io","controllerKind":"CertificateRequest","CertificateRequest":{"name":"test-cert-20","namespace":"cert-manager"},"namespace":"cert-manager","name":"test-cert-20","reconcileID":"eb8a3171-2e21-4dd7-b95d-649d76dfc69b"}
{"level":"Level(-4)","ts":"2025-08-29T07:29:29Z","logger":"controllers.CertificateRequest","msg":"CertificateRequest is Failed. Ignoring.","certificaterequest":{"name":"test-cert-20","namespace":"cert-manager"}}
{"level":"Level(-5)","ts":"2025-08-29T07:29:29Z","msg":"Reconcile successful","controller":"certificaterequest","controllerGroup":"cert-manager.io","controllerKind":"CertificateRequest","CertificateRequest":{"name":"test-cert-20","namespace":"cert-manager"},"namespace":"cert-manager","name":"test-cert-20","reconcileID":"eb8a3171-2e21-4dd7-b95d-649d76dfc69b"}
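The "CertificateRequest is Failed. Ignoring." line above is consistent with a gate of roughly the following shape. This is an illustrative sketch only: `condition`, `certificateRequest`, `hasCondition` and `shouldProcess` are hypothetical stand-ins, not the cert-manager API types or the issuer's real functions.

```go
package main

import "fmt"

// condition is a hypothetical, simplified stand-in for a
// CertificateRequest status condition.
type condition struct {
	Type, Status, Reason string
}

// certificateRequest is a hypothetical, simplified stand-in for the
// CertificateRequest resource.
type certificateRequest struct {
	Name       string
	Conditions []condition
}

// hasCondition reports whether cr carries a condition equal to want.
func hasCondition(cr certificateRequest, want condition) bool {
	for _, c := range cr.Conditions {
		if c == want {
			return true
		}
	}
	return false
}

// shouldProcess returns false for a request whose Ready condition is
// False with reason Failed; such a request is skipped on every reconcile.
func shouldProcess(cr certificateRequest) bool {
	failed := condition{Type: "Ready", Status: "False", Reason: "Failed"}
	return !hasCondition(cr, failed)
}

func main() {
	failedCR := certificateRequest{
		Name:       "test-cert-20",
		Conditions: []condition{{Type: "Ready", Status: "False", Reason: "Failed"}},
	}
	pendingCR := certificateRequest{
		Name:       "test-cert-21",
		Conditions: []condition{{Type: "Ready", Status: "False", Reason: "Pending"}},
	}

	fmt.Println(shouldProcess(failedCR))  // false
	fmt.Println(shouldProcess(pendingCR)) // true
}
```

Under this model a request left in Pending would still be picked up on the next reconcile, which is the behavior asked for in this issue.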
cert-manager keeps backing off from retrying the issuance:
I0829 16:21:28.058473 1 controller.go:152] "re-queuing item due to optimistic locking on resource" logger="cert-manager.controller" error="Operation cannot be fulfilled on certificates.cert-manager.io \"test-cert\": the object has been modified; please apply your changes to the latest version and try again"
W0829 16:21:28.074812 1 warnings.go:70] unknown field "status.failedIssuanceAttempts"
I0829 16:21:28.074833 1 trigger_controller.go:202] "Backing off from issuance due to previously failed issuance(s). Issuance will next be attempted at 2025-08-29 17:21:28.000000942 +0000 UTC m=+210011.565085580" logger="cert-manager.controller" key="cert-manager/test-cert"
I0829 16:21:28.105934 1 trigger_controller.go:202] "Backing off from issuance due to previously failed issuance(s). Issuance will next be attempted at 2025-08-29 17:21:28.000000756 +0000 UTC m=+210011.565085390" logger="cert-manager.controller" key="cert-manager/test-cert"
I0829 16:21:28.122107 1 controller.go:152] "re-queuing item due to optimistic locking on resource" logger="cert-manager.controller" error="Operation cannot be fulfilled on certificates.cert-manager.io \"test-cert\": the object has been modified; please apply your changes to the latest version and try again"
I0829 17:21:28.002067 1 trigger_controller.go:223] "Certificate must be re-issued" logger="cert-manager.controller" key="cert-manager/test-cert" reason="Renewing" message="Renewing certificate as renewal was scheduled at 2025-08-29 07:21:25 +0000 UTC"
I0829 17:21:28.002172 1 conditions.go:192] Found status change for Certificate "test-cert" condition "Issuing": "False" -> "True"; setting lastTransitionTime to 2025-08-29 17:21:28.002164224 +0000 UTC m=+210011.567248879
I0829 17:21:28.053873 1 conditions.go:192] Found status change for Certificate "test-cert" condition "Issuing": "True" -> "False"; setting lastTransitionTime to 2025-08-29 17:21:28.053862881 +0000 UTC m=+210011.618947526
I0829 17:21:28.061472 1 controller.go:152] "re-queuing item due to optimistic locking on resource" logger="cert-manager.controller" error="Operation cannot be fulfilled on certificates.cert-manager.io \"test-cert\": the object has been modified; please apply your changes to the latest version and try again"
W0829 17:21:28.073831 1 warnings.go:70] unknown field "status.failedIssuanceAttempts"
I0829 17:21:28.075364 1 trigger_controller.go:202] "Backing off from issuance due to previously failed issuance(s). Issuance will next be attempted at 2025-08-29 18:21:28.000000646 +0000 UTC m=+213611.565085280" logger="cert-manager.controller" key="cert-manager/test-cert"
I0829 17:21:28.104046 1 trigger_controller.go:202] "Backing off from issuance due to previously failed issuance(s). Issuance will next be attempted at 2025-08-29 18:21:28.000000441 +0000 UTC m=+213611.565085073" logger="cert-manager.controller" key="cert-manager/test-cert"
The result of `k describe -n cert-manager certificaterequests.cert-manager.io test-cert-20`
Name: test-cert-20
Namespace: cert-manager
Labels: <none>
Annotations: cert-manager.io/certificate-name: test-cert
cert-manager.io/certificate-revision: 20
cert-manager.io/private-key-secret-name: test-cert-576rm
API Version: cert-manager.io/v1
Kind: CertificateRequest
Metadata:
Creation Timestamp: 2025-08-29T07:21:25Z
Generation: 1
Owner References:
API Version: cert-manager.io/v1
Block Owner Deletion: true
Controller: true
Kind: Certificate
Name: test-cert
UID: e128b7f7-ecf7-44cc-9a28-283f10d3a573
Resource Version: 810583
UID: 38fac994-9da4-48b1-9287-a6069b021e07
Spec:
Duration: 1h0m0s
Extra:
authentication.kubernetes.io/credential-id:
JTI=2e04894e-03f3-4e7b-b839-9e6ef08cee2f
authentication.kubernetes.io/node-name:
test-certmanager-awspca-wk01-md-0-v7xvt-kqbkr
authentication.kubernetes.io/node-uid:
bfbb90dc-0d85-4f6a-a42b-568b9b94c561
authentication.kubernetes.io/pod-name:
generated-cert-manager-64996dc5c5-8xtrx
authentication.kubernetes.io/pod-uid:
3ffd1e1c-3bc2-42d5-b28f-f81670619d5a
Groups:
system:serviceaccounts
system:serviceaccounts:cert-manager
system:authenticated
Issuer Ref:
Group: awspca.cert-manager.io
Kind: AWSPCAClusterIssuer
Name: test-aws-pca-issuer
Request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQ29qQ0NBWW9DQVFBd0Z<ignore here>LQo=
UID: ac1d2b9b-5335-43da-8fbf-1bd2dfc5c80b
Username: system:serviceaccount:cert-manager:generated-cert-manager
Status:
Conditions:
Last Transition Time: 2025-08-29T07:21:25Z
Message: Certificate request has been approved by cert-manager.io
Reason: cert-manager.io
Status: True
Type: Approved
Last Transition Time: 2025-08-29T07:21:28Z
Message: failed to request certificate from PCA: operation error ACM PCA: DescribeCertificateAuthority, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://vpce-01c10d5ae6bd7f703-7gzleefh.acm-pca.us-west-2.vpce.amazonaws.com/": dial tcp: lookup vpce-01c10d5ae6bd7f703-7gzleefh.acm-pca.us-west-2.vpce.amazonaws.com on 10.96.0.10:53: server misbehaving
Reason: Failed
Status: False
Type: Ready
Events: <none>
Version
- aws-privateca-issuer: v1.6.0
- cert-manager: v1.16.4
Have you tried the following?
- Check the Troubleshooting section
- Search open issues
Category
Supported Workflow Broken
Severity
Severity 2