This repository was archived by the owner on May 26, 2023. It is now read-only.

Commit 7ae7333

Option for post-deploy sleep delay to stagger tservers competing with one another for DDL version locks (#268)
* post-deploy jitter on tservers
* fixup post-deploy-delay arithmetic
* increase delay factor to 10, which might be too high, but it should help
1 parent ed29560 commit 7ae7333

2 files changed: 29 additions, 1 deletion

jobs/yb-tserver/spec

Lines changed: 18 additions & 0 deletions
@@ -66,6 +66,24 @@ properties:
       until then, the default will be the recommended 60 seconds.
       see also: https://docs.yugabyte.com/latest/manage/upgrade-deployment
     default: 60
+  post_deploy_delay_factor:
+    description: |
+      must be a whole number.
+      TODO: consider removing this property altogether.
+      multiplication factor used in the post-deploy hook to calculate how long each node should
+      sleep before attempting to run itself. this number is multiplied by the node index to
+      calculate how long, in seconds, to sleep.
+      for example, given a delay factor of 5, on node index 0,
+      the node will sleep 0 seconds before running its post-deploy. meanwhile on node index 1,
+      the node will sleep for 5 seconds before running its post-deploy. and on node index 2, the
+      node will sleep for 10 seconds before running its post-deploy.
+      and for example, given a delay factor of 8, node index 0 sleeps for 0 seconds, index 1
+      sleeps for 8 seconds, index 2 sleeps for 16 seconds, and so on.
+      this is not the most incredible solution in the world, but it is a relatively okay
+      way to prevent nodes from stumbling over each other.
+      in the future it may be preferable to have only a subset of nodes try to perform
+      post-deploy at all, but this should be an acceptable solution for the time being.
+    default: 10
 
   tls.allow_insecure_connections:
     description: |

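The arithmetic described in the property text (delay in seconds = delay factor times node index) can be checked with a small shell sketch. The function name `post_deploy_delay` is hypothetical and is not part of this commit; it only mirrors the multiplication the description walks through.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper (not in the commit): delay in seconds for a
# given delay factor and node index, i.e. factor * index.
post_deploy_delay() {
  local factor="$1" index="$2"
  echo "$(( factor * index ))"
}

# Print the stagger schedule for the first three nodes with a factor of 5.
for index in 0 1 2; do
  echo "index ${index}: $(post_deploy_delay 5 "${index}")s"
done
```

With a factor of 5 this prints delays of 0, 5, and 10 seconds for indexes 0, 1, and 2, matching the first example in the description. The delay grows linearly with the index, so with the default factor of 10 the node at index 3 would wait 30 seconds.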
jobs/yb-tserver/templates/bin/post-deploy.erb.sh

Lines changed: 11 additions & 1 deletion
@@ -2,10 +2,20 @@
 
 set -eu
 
-echo "running post-deploy..."
+echo "$(date --rfc-3339=seconds) entering post-deploy..."
 
 source /var/vcap/packages/python*/bosh/runtime.env
 
+INDEX=<%= spec.index %>
+POST_DEPLOY_DELAY_FACTOR=<%= p('post_deploy_delay_factor') %>
+POST_DEPLOY_DELAY_SECONDS="$((${POST_DEPLOY_DELAY_FACTOR} * ${INDEX}))"
+
+echo "$(date --rfc-3339=seconds) post-deploy on node index ${INDEX} sleeping for ${POST_DEPLOY_DELAY_SECONDS} seconds before beginning post-deploy..."
+sleep "${POST_DEPLOY_DELAY_SECONDS}"
+echo "$(date --rfc-3339=seconds) post-deploy delay complete, running..."
+
+####################################
+
 echo "running post-deploy ycql rotate admin password check..."
 /var/vcap/jobs/yb-tserver/bin/ycql-rotate-default-admin-password.sh

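The spec says the delay factor "must be a whole number", but the script as committed passes the rendered value straight into shell arithmetic. One possible guard, sketched here as an assumption and not as something this commit does, is to validate the inputs before multiplying. The names `is_whole_number` and `checked_post_deploy_delay` are hypothetical.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical guard (not in the commit): succeed only for a non-negative
# whole number without leading zeros, which shell arithmetic could
# otherwise try to read as octal.
is_whole_number() {
  case "$1" in
    ''|*[!0-9]*|0?*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical wrapper: validate both inputs, then print factor * index.
checked_post_deploy_delay() {
  local factor="$1" index="$2"
  if ! is_whole_number "${factor}" || ! is_whole_number "${index}"; then
    echo "delay factor and node index must be whole numbers" >&2
    return 1
  fi
  echo "$(( factor * index ))"
}
```

In a rendered script the result could be assigned with something like `POST_DEPLOY_DELAY_SECONDS="$(checked_post_deploy_delay "${POST_DEPLOY_DELAY_FACTOR}" "${INDEX}")"`. Because the script runs under `set -eu`, a value such as `1.5` would then stop the hook with a readable message before `sleep` is reached.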