@sergelogvinov sergelogvinov commented Oct 12, 2023

https://jira.percona.com/browse/K8SPSMDB-1003


Problem:
There is currently no way to use read/write concerns based on the Kubernetes zone/region.

Cause:
For example, reading from a single zone can reduce latency, while writing to multiple zones enhances redundancy
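
To illustrate the single-zone read case, here is a minimal sketch of how a client could target tagged members through the connection string. `readPreference` and `readPreferenceTags` are standard MongoDB connection string options; the host, replica set name, and the `zone` tag name are assumptions made for this example only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// readURI builds a MongoDB connection string that prefers replica set
// members carrying the given tags. Host, replica set name, and tag names
// are hypothetical values chosen for illustration.
func readURI(host, replSet string, tags map[string]string) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // keep the output deterministic

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+":"+tags[k])
	}

	uri := fmt.Sprintf("mongodb://%s/?replicaSet=%s&readPreference=nearest", host, replSet)
	if len(pairs) > 0 {
		uri += "&readPreferenceTags=" + strings.Join(pairs, ",")
	}
	return uri
}

func main() {
	fmt.Println(readURI("my-cluster-rs0.example.svc:27017", "rs0",
		map[string]string{"zone": "us-east-1a"}))
}
```

Note that a tag set with no matching member makes reads fail unless the client also supplies a fallback tag set, so this is something to decide on the application side.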

Solution:
Simple changes.
The operator will read the Kubernetes node properties (if it has permission to do so) and add zone/region tags to the mongo nodes.

Plus, we need to add the RBAC policy to the helm chart too.

{{- if or .Values.watchNamespace .Values.watchAllNamespaces }}
  - apiGroups:
    - ""
    resources:
    - nodes
    verbs:
    - get
    - list
    - watch
{{- end }}
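
For reviewers, a rough sketch of the idea (not the actual code in this PR): take the well-known topology labels from the Kubernetes node and turn them into replica set member tags. The function name and the tag names `zone` and `region` are assumptions for illustration; the label keys are the standard Kubernetes ones.

```go
package main

import "fmt"

// Well-known Kubernetes node labels that describe topology.
const (
	labelZone   = "topology.kubernetes.io/zone"
	labelRegion = "topology.kubernetes.io/region"
)

// topologyTags maps node labels to replica set member tags.
// The tag names "zone" and "region" are assumed for this sketch.
// Labels that are missing or empty produce no tag.
func topologyTags(nodeLabels map[string]string) map[string]string {
	tags := map[string]string{}
	if v := nodeLabels[labelZone]; v != "" {
		tags["zone"] = v
	}
	if v := nodeLabels[labelRegion]; v != "" {
		tags["region"] = v
	}
	return tags
}

func main() {
	labels := map[string]string{
		labelZone:                "us-east-1a",
		labelRegion:              "us-east-1",
		"kubernetes.io/hostname": "worker-1",
	}
	fmt.Println(topologyTags(labels))
}
```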

Thanks.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are the manifests (crd/bundle) regenerated if needed?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size pull-request-size bot added the size/M 30-99 lines label Oct 12, 2023
@sergelogvinov sergelogvinov changed the title from "K8SPSMDB-1003: Kubernetes node zone/region" to "K8SPSMDB-1003: Kubernetes node zone/region tag" Oct 12, 2023
@sergelogvinov sergelogvinov marked this pull request as draft October 13, 2023 10:08
@it-percona-cla

it-percona-cla commented Oct 18, 2023

CLA assistant check
All committers have signed the CLA.

@hors hors added the community label Jan 11, 2024
@egegunes
Contributor

@sergelogvinov are you willing to work on this further? Looking at the test results, I don't think it works right now but I think it's a useful feature. If you don't want to work on this further, we can take over.

@egegunes egegunes added this to the v1.16.0 milestone Jan 12, 2024
@egegunes
Contributor

@sergelogvinov ping

@sergelogvinov
Contributor Author

Hello, sorry for the delay.

I ran some tests on my application side with these changes, and everything works as expected.
But I think we need more changes here.

I know of some clouds that do not allow ClusterRole permissions (only single-namespace permissions).
So this feature should be optional (a CRD option).

The option proposal:
If topologyPrimaryKey exists (and is non-empty), we will add labels to the mongo nodes.

# Try to set higher priority for nodes which zone = us-east-1a
topologyPrimaryPrefer: us-east-1a
# Can be kubernetes.io/hostname or topology.kubernetes.io/region or topology.kubernetes.io/zone
topologyPrimaryKey: topology.kubernetes.io/zone

And it can be done with https://jira.percona.com/browse/K8SPSMDB-1002
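
A minimal sketch of how the proposed options could drive member priority, assuming the operator already has the node labels at hand. The function name and the base/boost values are hypothetical; only the two option names come from the proposal above.

```go
package main

import "fmt"

// memberPriority returns a raised priority when the node label selected by
// topologyPrimaryKey equals topologyPrimaryPrefer. When either option is
// empty the feature is treated as disabled and the base priority is kept.
// The base and boost values are placeholders chosen by the caller.
func memberPriority(nodeLabels map[string]string, topologyPrimaryKey, topologyPrimaryPrefer string, base, boost int) int {
	if topologyPrimaryKey == "" || topologyPrimaryPrefer == "" {
		return base
	}
	if nodeLabels[topologyPrimaryKey] == topologyPrimaryPrefer {
		return base + boost
	}
	return base
}

func main() {
	key := "topology.kubernetes.io/zone"
	nodes := []map[string]string{
		{key: "us-east-1a"},
		{key: "us-east-1b"},
	}
	for i, labels := range nodes {
		fmt.Printf("member %d priority %d\n", i, memberPriority(labels, key, "us-east-1a", 2, 1))
	}
}
```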

What do you think?

@egegunes
Contributor

@sergelogvinov yes, namespace permission can be a problem since by default we don't use ClusterRole. So unless operator is deployed cluster-wide, this won't work. It'd be great if we can offer something for namespace scoped deployments too, what do you think @hors @spron-in ?

@sergelogvinov I think K8SPSMDB-1002 should be implemented in another PR, wdyt?

@hors
Collaborator

hors commented Feb 1, 2024

> @sergelogvinov yes, namespace permission can be a problem since by default we don't use ClusterRole. So unless operator is deployed cluster-wide, this won't work. It'd be great if we can offer something for namespace scoped deployments too, what do you think @hors @spron-in ?

@egegunes I think we can start with cluster-wide (CW) deployments and then we will see.

@egegunes
Contributor

egegunes commented Feb 2, 2024

@sergelogvinov we'll start working on v1.16.0 this month, and if you want to have this included we can assist you

@egegunes
Contributor

egegunes commented Mar 1, 2024

@sergelogvinov ping

@sergelogvinov sergelogvinov marked this pull request as ready for review March 6, 2024 12:55
@sergelogvinov
Contributor Author

sergelogvinov commented Mar 6, 2024

@egegunes Sorry for the delay.

I've rebased the PR and checked both cluster-wide and namespace-scoped deployments. It won't fail if it does not have the cluster role permission.
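
The "won't fail without permission" behaviour can be pictured roughly like this. It is only a sketch: `errForbidden` and the getter type are stand-ins invented for the example, while a real operator would typically check the API error with `apierrors.IsForbidden` from `k8s.io/apimachinery/pkg/api/errors`.

```go
package main

import (
	"errors"
	"fmt"
)

// errForbidden is a stand-in for the API server's 403 response.
// It is hypothetical and exists only to keep this sketch self-contained.
var errForbidden = errors.New("forbidden")

// nodeLabelGetter abstracts "fetch the labels of a Kubernetes node".
type nodeLabelGetter func(nodeName string) (map[string]string, error)

// labelsOrEmpty returns the node labels, or an empty map when the operator
// is not allowed to read nodes. Any other error is passed to the caller.
func labelsOrEmpty(get nodeLabelGetter, nodeName string) (map[string]string, error) {
	labels, err := get(nodeName)
	if errors.Is(err, errForbidden) {
		// Namespace-scoped deployment: skip topology tags instead of failing.
		return map[string]string{}, nil
	}
	if err != nil {
		return nil, err
	}
	return labels, nil
}

func main() {
	// Simulates an operator that has no ClusterRole for nodes.
	denied := func(string) (map[string]string, error) {
		return nil, fmt.Errorf("get node: %w", errForbidden)
	}
	labels, err := labelsOrEmpty(denied, "worker-1")
	fmt.Println(len(labels), err)
}
```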

@sergelogvinov
Contributor Author

I've checked the logs of the failed runs. Is it a CI issue?

Thanks.

@egegunes
Contributor

egegunes commented Mar 8, 2024

@sergelogvinov I think we have problems with backups and restores because of these changes. I don't think it's just a CI issue

@sergelogvinov
Contributor Author

> @sergelogvinov I think we have problems with backups and restores because of these changes. I don't think it's just a CI issue

I've checked the logs, the shell scripts, and other PRs. Recent PRs show the same error:

2024-03-08T14:38:20.000+0000 D [resync] bcp: 2024-03-08T14:37:40Z.pbm.json
2024-03-08T14:38:20.000+0000 W [resync] skip snapshot 2024-03-08T14:37:40Z: file "2024-03-08T14:37:40Z/shard1/oplog": no such file

I noticed that we run the operator in cluster-wide mode, so an operator in another namespace is probably affecting our e2e tests.
Could you check the CI cluster, please?

Thanks.

@egegunes
Contributor

egegunes commented Apr 4, 2024

@nmarukovich could you please check this

Add kubernetes node tags zone/region to the mongo nodes.
@nmarukovich nmarukovich requested a review from egegunes April 22, 2024 11:37
inelpandzic previously approved these changes Apr 23, 2024
@pull-request-size pull-request-size bot added size/L 100-499 lines and removed size/M 30-99 lines labels Apr 23, 2024
hors previously approved these changes Apr 24, 2024
| egrep -v 'I NETWORK|W NETWORK|Error saving history file|Percona Server for MongoDB|connecting to:|Unable to reach primary for set|Implicit session:|versions do not match|Error saving history file:|bye' \
| $sed -re 's/ObjectId\("[0-9a-f]+"\)//; s/-[0-9]+.svc/-xxx.svc/')

echo "$nodes_amount"

It would be better to have a more descriptive log message, like: "${nodes_amount} members are in replset ${rsName} configuration"


And it'd be good to print "waiting for all members to be configured in ${replsetName}" before the until loop

@hors hors self-requested a review April 24, 2024 08:00
@JNKPercona
Collaborator

Test name Status
arbiter passed
balancer passed
custom-replset-name passed
cross-site-sharded passed
data-at-rest-encryption passed
data-sharded passed
demand-backup passed
demand-backup-eks-credentials passed
demand-backup-physical passed
demand-backup-physical-sharded passed
demand-backup-sharded passed
expose-sharded passed
ignore-labels-annotations passed
init-deploy passed
finalizer passed
ldap passed
ldap-tls passed
limits passed
liveness passed
mongod-major-upgrade passed
mongod-major-upgrade-sharded passed
monitoring-2-0 passed
multi-cluster-service passed
non-voting passed
one-pod passed
operator-self-healing-chaos passed
pitr passed
pitr-sharded passed
pitr-physical passed
pvc-resize passed
recover-no-primary passed
rs-shard-migration passed
scaling passed
scheduled-backup passed
security-context passed
self-healing-chaos passed
service-per-pod passed
serviceless-external-nodes passed
smart-update passed
split-horizon passed
storage passed
tls-issue-cert-manager passed
upgrade passed
upgrade-consistency passed
upgrade-consistency-sharded-tls passed
upgrade-sharded passed
users passed
version-service passed
We run 48 out of 48

commit: 95c1888
image: perconalab/percona-server-mongodb-operator:PR-1360-95c1888e

@hors hors merged commit 028e8c3 into percona:main Apr 24, 2024
@hors
Collaborator

hors commented Apr 24, 2024

@sergelogvinov thank you for your contribution

@sergelogvinov sergelogvinov deleted the mongo-geo-tag branch April 27, 2024 08:01
Labels

community size/L 100-499 lines
