Description
Migrator fails on upgrade
Issue reported in v4.4
During an AMI upgrade performed via the standard upgrade procedure, some drift may be introduced if the instance is not rebooted as instructed in step 10.
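The reproduction below reads drift from the migrator job logs after each step. As a rough sketch (job names are instance-specific and the target version here is a placeholder), the check can also be invoked manually through the migrator chart, mirroring the helm command used later in this report:

# Read the drift output of the migrator job created during the upgrade (job name is instance-specific)
kubectl logs job/migrator-<job-id>
# Or invoke the drift check explicitly for a chosen target release via the migrator chart
helm upgrade --install --set "migrator.args={drift,--db=frontend,--version=<target-version>}" sourcegraph-migrator sourcegraph/sourcegraph-migrator --version 5.1.4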
Reproduction
An instance is initialized in v4.4

Upgrade to v4.5.1 - reboot

Observe failing migrator pods after startup and attachment of the old volume:
NAME READY STATUS RESTARTS AGE
migrator-ivmtf-qdbmk 0/1 Error 0 45m
migrator-ivmtf-dkkmg 0/1 Error 0 45m
migrator-ivmtf-n7g4d 0/1 Error 0 44m
migrator-ivmtf-k86r6 0/1 Error 0 44m
migrator-ivmtf-ccnkt 0/1 Error 0 43m
otel-collector-754d7c6c4f-rlktn 0/1 Pending 0 6m48s
migrator-ivmtf-gwkbg 0/1 Error 0 7m13s
Reboot the EC2 machine and check for drift
[ec2-user@ip-172-31-50-4 ~]$ kubectl logs job/migrator-o10ui
✱ Sourcegraph migrator 5.1.4
ℹ️ Connection DSNs used: frontend => postgres://sg:password@pgsql:5432/sg
Attempting connection to postgres://sg:password@pgsql:5432/sg...
✅ Connection to "postgres://sg:password@pgsql:5432/sg" succeeded
ℹ️ Locating schema description
✅ Schema found in Local file (/schema-descriptions/v4.5.1-internal_database_schema.json).
✅ No drift detected
Upgrade to v5.0.6
[ec2-user@ip-172-31-57-33 ~]$ kubectl logs job/migrator-vu1bz
✱ Sourcegraph migrator 5.1.4
ℹ️ Connection DSNs used: frontend => postgres://sg:password@pgsql:5432/sg
Attempting connection to postgres://sg:password@pgsql:5432/sg...
✅ Connection to "postgres://sg:password@pgsql:5432/sg" succeeded
💡 Parsed "v5.0.6" from version flag value "5.0.6"
ℹ️ Locating schema description
ℹ️ Reading schema definition in Local file (/schema-descriptions/v5.0.6-internal_database_schema.json)... Schema not found (open /schema-descriptions/v5.0.6-internal_database_schema.json: no such file or directory). Will attempt a fallback source.
✅ Schema found in GitHub (https://raw.githubusercontent.com/sourcegraph/sourcegraph/v5.0.6/internal/database/schema.json).
✅ No drift detected

Upgrade to v5.1.4
Manual drift check
[ec2-user@ip-172-31-55-120 ~]$ k logs job/migrator-kzvjh
Found 2 pods, using pod/migrator-kzvjh-m68ww
✱ Sourcegraph migrator 5.1.4
ℹ️ Connection DSNs used: frontend => postgres://sg:password@pgsql:5432/sg
Attempting connection to postgres://sg:password@pgsql:5432/sg...
✅ Connection to "postgres://sg:password@pgsql:5432/sg" succeeded
{"SeverityText":"FATAL","Timestamp":1689936589304208831,"InstrumentationScope":"migrator","Caller":"migrator/main.go:29","Function":"main.main","Body":"version assertion failed: \"5.0\" != \"v5.1.4\". Re-invoke with --skip-version-check to ignore this check","Resource":{"service.name":"migrator","service.version":"5.1.4","service.instance.id":"8404e9ce-9ce4-47e1-be7f-d6c00e765e04"},"Attributes":{}}
Database version (the versions table still reports 5.0.6, which explains the version assertion failure above)
sg=# SELECT * FROM versions;
service | version | updated_at | first_version | auto_upgrade
----------+---------+-------------------------------+---------------+--------------
frontend | 5.0.6 | 2023-07-21 10:44:57.253021+00 | 4.4.0 | f
(1 row)
Reboot in version 5.1.4
NAME READY STATUS RESTARTS AGE
otel-collector-64d9c9b6d6-zvbqr 0/1 Pending 0 18m
migrator-kzvjh-m68ww 0/1 Error 0 8m5s
migrator-kzvjh-kc7bf 0/1 Error 0 8m1s
migrator-kzvjh-mhzsj 0/1 Error 0 7m47s
migrator-kzvjh-f7w5j 0/1 Error 0 7m24s
migrator-kzvjh-twmwz 0/1 Error 0 6m41s
migrator-kzvjh-xtl42 0/1 Error 0 5m17s
[ec2-user@ip-172-31-55-120 ~]$ k logs migrator-kzvjh-m68ww
unable to retrieve container logs for containerd://784ba2c7dd1081b77f4f314b4ddfe12922a0f3fc1ad3b488b9da885d2a71ba34
[ec2-user@ip-172-31-55-120 ~]$ k describe pod migrator-kzvjh-m68ww
Name: migrator-kzvjh-m68ww
Namespace: default
Priority: 0
Service Account: default
Node: sourcegraph-0/172.31.55.120
Start Time: Fri, 21 Jul 2023 10:49:48 +0000
Labels: app.kubernetes.io/instance=sourcegraph-migrator
app.kubernetes.io/name=sourcegraph-migrator
controller-uid=b7655183-4018-4bbe-a81a-93c71ccb9488
deploy=sourcegraph
job=migrator
job-name=migrator-kzvjh
Annotations: kubectl.kubernetes.io/default-container: migrator
Status: Failed
IP: 10.10.0.99
IPs:
IP: 10.10.0.99
Controlled By: Job/migrator-kzvjh
Containers:
migrator:
Container ID: containerd://784ba2c7dd1081b77f4f314b4ddfe12922a0f3fc1ad3b488b9da885d2a71ba34
Image: index.docker.io/sourcegraph/migrator:5.1.4@sha256:b871f4d32dee8ae757e3a66e5e0b75b0f2d6e04d6c598f1f0540a8e93648715b
Image ID: docker.io/sourcegraph/migrator@sha256:b871f4d32dee8ae757e3a66e5e0b75b0f2d6e04d6c598f1f0540a8e93648715b
Port: <none>
Host Port: <none>
Args:
drift
--db=frontend
--version=v5.1.4
State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 21 Jul 2023 10:49:48 +0000
Finished: Fri, 21 Jul 2023 10:49:49 +0000
Ready: False
Restart Count: 0
Limits:
cpu: 500m
memory: 100M
Requests:
cpu: 100m
memory: 50M
Environment:
PGDATABASE: <set to the key 'database' in secret 'pgsql-auth'> Optional: false
PGHOST: <set to the key 'host' in secret 'pgsql-auth'> Optional: false
PGPASSWORD: <set to the key 'password' in secret 'pgsql-auth'> Optional: false
PGPORT: <set to the key 'port' in secret 'pgsql-auth'> Optional: false
PGUSER: <set to the key 'user' in secret 'pgsql-auth'> Optional: false
CODEINTEL_PGDATABASE: <set to the key 'database' in secret 'codeintel-db-auth'> Optional: false
CODEINTEL_PGHOST: <set to the key 'host' in secret 'codeintel-db-auth'> Optional: false
CODEINTEL_PGPASSWORD: <set to the key 'password' in secret 'codeintel-db-auth'> Optional: false
CODEINTEL_PGPORT: <set to the key 'port' in secret 'codeintel-db-auth'> Optional: false
CODEINTEL_PGUSER: <set to the key 'user' in secret 'codeintel-db-auth'> Optional: false
CODEINSIGHTS_PGDATABASE: <set to the key 'database' in secret 'codeinsights-db-auth'> Optional: false
CODEINSIGHTS_PGHOST: <set to the key 'host' in secret 'codeinsights-db-auth'> Optional: false
CODEINSIGHTS_PGPASSWORD: <set to the key 'password' in secret 'codeinsights-db-auth'> Optional: false
CODEINSIGHTS_PGPORT: <set to the key 'port' in secret 'codeinsights-db-auth'> Optional: false
CODEINSIGHTS_PGUSER: <set to the key 'user' in secret 'codeinsights-db-auth'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vwwrt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-vwwrt:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 10m default-scheduler Successfully assigned default/migrator-kzvjh-m68ww to sourcegraph-0
Normal Pulled 10m kubelet Container image "index.docker.io/sourcegraph/migrator:5.1.4@sha256:b871f4d32dee8ae757e3a66e5e0b75b0f2d6e04d6c598f1f0540a8e93648715b" already present on machine
Normal Created 10m kubelet Created container migrator
Normal Started 10m kubelet Started container migrator
Checking drift against the v5.0.6 version
helm upgrade --install --set "migrator.args={drift,--db=frontend,--version=v5.0.6,--skip-version-check}" sourcegraph-migrator sourcegraph/sourcegraph-migrator --version 5.1.4
Actual logs omitted, but this drift output is the same as what is registered on the Updates page.
Summary
On upgrade from v5.0.6 to v5.1.x the migrator isn't correctly initializing and setting the database state to the new version, although it is likely still running schema migrations. Either the migrations are being applied correctly and the schema drift shown on the Updates page is the result of a bad versions table entry, or the schema migrations aren't being run by the up command.
Given these conditions, once the root cause of the up operation's failure is identified, this can likely be resolved manually by correct use of the upgrade command. A hypothesis as to the root cause is the tagging of a 5.0.6 image set in the sourcegraph/deploy repo: the migrator image definitions in the sourcegraph/sourcegraph repo may not correctly handle the extra/missing version.
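A possible manual remediation path, sketched below, is to invoke the migrator's upgrade command explicitly for the range recorded in the database. The helm invocation mirrors the drift check above; the --from/--to flags assume the standard multi-version upgrade usage of migrator and should be confirmed against the root cause before running:

# Re-run the upgrade explicitly, pinning the range recorded in the versions table
helm upgrade --install --set "migrator.args={upgrade,--from=v5.0.6,--to=v5.1.4}" sourcegraph-migrator sourcegraph/sourcegraph-migrator --version 5.1.4

Afterwards, the SELECT * FROM versions; query shown above should report the frontend row advanced to 5.1.4 before the frontend pods are restarted.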