AMIs: v5.0.6 -> v5.1.x Migrator fails on upgrade if not booted, introducing schema drift #56

@DaedalusG

Description

Migrator fails on upgrade

Issue reported in v4.4

During an AMI upgrade, if an instance is upgraded via the standard upgrade procedure, schema drift may be introduced if the instance is not rebooted as instructed in step 10 of that procedure.
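
Throughout the reproduction below, migrator state is checked with plain kubectl. For reference, a check looks like this (job names vary per run, so the name here is illustrative):

kubectl get pods -l job=migrator
kubectl logs job/migrator-<job-id>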

Reproduction

An instance is initialized in v4.4

[Screenshot 2023-07-21 at 1:04:41 AM]

Upgrade to v4.5.1 - reboot

[Screenshot 2023-07-21 at 2:27:38 AM]

Observe failing migrator pods after startup and attachment of old volume

migrator-ivmtf-qdbmk                         0/1     Error     0               45m
migrator-ivmtf-dkkmg                         0/1     Error     0               45m
migrator-ivmtf-n7g4d                         0/1     Error     0               44m
migrator-ivmtf-k86r6                         0/1     Error     0               44m
migrator-ivmtf-ccnkt                         0/1     Error     0               43m
otel-collector-754d7c6c4f-rlktn              0/1     Pending   0               6m48s
migrator-ivmtf-gwkbg                         0/1     Error     0               7m13s

Reboot the EC2 machine and check for drift

[ec2-user@ip-172-31-50-4 ~]$ kubectl logs job/migrator-o10ui
✱ Sourcegraph migrator 5.1.4
ℹ️ Connection DSNs used: frontend => postgres://sg:password@pgsql:5432/sg
Attempting connection to postgres://sg:password@pgsql:5432/sg...
✅ Connection to "postgres://sg:password@pgsql:5432/sg" succeeded
ℹ️ Locating schema description
✅ Schema found in Local file (/schema-descriptions/v4.5.1-internal_database_schema.json).
✅ No drift detected

Upgrade to v5.0.6

[ec2-user@ip-172-31-57-33 ~]$ kubectl logs job/migrator-vu1bz 
✱ Sourcegraph migrator 5.1.4
ℹ️ Connection DSNs used: frontend => postgres://sg:password@pgsql:5432/sg
Attempting connection to postgres://sg:password@pgsql:5432/sg...
✅ Connection to "postgres://sg:password@pgsql:5432/sg" succeeded
💡 Parsed "v5.0.6" from version flag value "5.0.6"
ℹ️ Locating schema description
ℹ️ Reading schema definition in Local file (/schema-descriptions/v5.0.6-internal_database_schema.json)... Schema not found (open /schema-descriptions/v5.0.6-internal_database_schema.json: no such file or directory). Will attempt a fallback source.
✅ Schema found in GitHub (https://raw.githubusercontent.com/sourcegraph/sourcegraph/v5.0.6/internal/database/schema.json).
✅ No drift detected
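
The GitHub fallback used above can be sanity-checked by hand; this fetches the same schema description the migrator compares against (URL copied from the log line, assuming outbound network access from the instance):

[ec2-user@ip-172-31-57-33 ~]$ curl -fsSL https://raw.githubusercontent.com/sourcegraph/sourcegraph/v5.0.6/internal/database/schema.json | head -c 200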

Version after upgrade to v5.0.6

[Screenshot 2023-07-21 at 3:31:38 AM]

Upgrade to v5.1.4

Version
[Screenshot 2023-07-21 at 3:49:18 AM]

Drift in UI
[Screenshot 2023-07-21 at 3:49:12 AM]

Manual drift check

[ec2-user@ip-172-31-55-120 ~]$ k logs job/migrator-kzvjh
Found 2 pods, using pod/migrator-kzvjh-m68ww
✱ Sourcegraph migrator 5.1.4
ℹ️ Connection DSNs used: frontend => postgres://sg:password@pgsql:5432/sg
Attempting connection to postgres://sg:password@pgsql:5432/sg...
✅ Connection to "postgres://sg:password@pgsql:5432/sg" succeeded
{"SeverityText":"FATAL","Timestamp":1689936589304208831,"InstrumentationScope":"migrator","Caller":"migrator/main.go:29","Function":"main.main","Body":"version assertion failed: \"5.0\" != \"v5.1.4\". Re-invoke with --skip-version-check to ignore this check","Resource":{"service.name":"migrator","service.version":"5.1.4","service.instance.id":"8404e9ce-9ce4-47e1-be7f-d6c00e765e04"},"Attributes":{}}

Database version

sg=# SELECT * FROM versions;
 service  | version |          updated_at           | first_version | auto_upgrade 
----------+---------+-------------------------------+---------------+--------------
 frontend | 5.0.6   | 2023-07-21 10:44:57.253021+00 | 4.4.0         | f
(1 row)
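
To distinguish the two failure modes discussed in the summary below (migrations applied but version row stale, vs. migrations never run), the migration history can be inspected directly in the same database. A sketch, assuming the migration_logs table present in recent Sourcegraph versions:

sg=# SELECT * FROM migration_logs ORDER BY started_at DESC LIMIT 10;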

Reboot in version 5.1.4

NAME                                          READY   STATUS    RESTARTS      AGE
otel-collector-64d9c9b6d6-zvbqr               0/1     Pending   0             18m
migrator-kzvjh-m68ww                          0/1     Error     0             8m5s
migrator-kzvjh-kc7bf                          0/1     Error     0             8m1s
migrator-kzvjh-mhzsj                          0/1     Error     0             7m47s
migrator-kzvjh-f7w5j                          0/1     Error     0             7m24s
migrator-kzvjh-twmwz                          0/1     Error     0             6m41s
migrator-kzvjh-xtl42                          0/1     Error     0             5m17s
[ec2-user@ip-172-31-55-120 ~]$ k logs migrator-kzvjh-m68ww 
unable to retrieve container logs for containerd://784ba2c7dd1081b77f4f314b4ddfe12922a0f3fc1ad3b488b9da885d2a71ba34
[ec2-user@ip-172-31-55-120 ~]$ k describe pod migrator-kzvjh-m68ww 
Name:             migrator-kzvjh-m68ww
Namespace:        default
Priority:         0
Service Account:  default
Node:             sourcegraph-0/172.31.55.120
Start Time:       Fri, 21 Jul 2023 10:49:48 +0000
Labels:           app.kubernetes.io/instance=sourcegraph-migrator
                  app.kubernetes.io/name=sourcegraph-migrator
                  controller-uid=b7655183-4018-4bbe-a81a-93c71ccb9488
                  deploy=sourcegraph
                  job=migrator
                  job-name=migrator-kzvjh
Annotations:      kubectl.kubernetes.io/default-container: migrator
Status:           Failed
IP:               10.10.0.99
IPs:
  IP:           10.10.0.99
Controlled By:  Job/migrator-kzvjh
Containers:
  migrator:
    Container ID:  containerd://784ba2c7dd1081b77f4f314b4ddfe12922a0f3fc1ad3b488b9da885d2a71ba34
    Image:         index.docker.io/sourcegraph/migrator:5.1.4@sha256:b871f4d32dee8ae757e3a66e5e0b75b0f2d6e04d6c598f1f0540a8e93648715b
    Image ID:      docker.io/sourcegraph/migrator@sha256:b871f4d32dee8ae757e3a66e5e0b75b0f2d6e04d6c598f1f0540a8e93648715b
    Port:          <none>
    Host Port:     <none>
    Args:
      drift
      --db=frontend
      --version=v5.1.4
    State:          Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 21 Jul 2023 10:49:48 +0000
      Finished:     Fri, 21 Jul 2023 10:49:49 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  100M
    Requests:
      cpu:     100m
      memory:  50M
    Environment:
      PGDATABASE:               <set to the key 'database' in secret 'pgsql-auth'>            Optional: false
      PGHOST:                   <set to the key 'host' in secret 'pgsql-auth'>                Optional: false
      PGPASSWORD:               <set to the key 'password' in secret 'pgsql-auth'>            Optional: false
      PGPORT:                   <set to the key 'port' in secret 'pgsql-auth'>                Optional: false
      PGUSER:                   <set to the key 'user' in secret 'pgsql-auth'>                Optional: false
      CODEINTEL_PGDATABASE:     <set to the key 'database' in secret 'codeintel-db-auth'>     Optional: false
      CODEINTEL_PGHOST:         <set to the key 'host' in secret 'codeintel-db-auth'>         Optional: false
      CODEINTEL_PGPASSWORD:     <set to the key 'password' in secret 'codeintel-db-auth'>     Optional: false
      CODEINTEL_PGPORT:         <set to the key 'port' in secret 'codeintel-db-auth'>         Optional: false
      CODEINTEL_PGUSER:         <set to the key 'user' in secret 'codeintel-db-auth'>         Optional: false
      CODEINSIGHTS_PGDATABASE:  <set to the key 'database' in secret 'codeinsights-db-auth'>  Optional: false
      CODEINSIGHTS_PGHOST:      <set to the key 'host' in secret 'codeinsights-db-auth'>      Optional: false
      CODEINSIGHTS_PGPASSWORD:  <set to the key 'password' in secret 'codeinsights-db-auth'>  Optional: false
      CODEINSIGHTS_PGPORT:      <set to the key 'port' in secret 'codeinsights-db-auth'>      Optional: false
      CODEINSIGHTS_PGUSER:      <set to the key 'user' in secret 'codeinsights-db-auth'>      Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vwwrt (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-vwwrt:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  10m   default-scheduler  Successfully assigned default/migrator-kzvjh-m68ww to sourcegraph-0
  Normal  Pulled     10m   kubelet            Container image "index.docker.io/sourcegraph/migrator:5.1.4@sha256:b871f4d32dee8ae757e3a66e5e0b75b0f2d6e04d6c598f1f0540a8e93648715b" already present on machine
  Normal  Created    10m   kubelet            Created container migrator
  Normal  Started    10m   kubelet            Started container migrator

Checking drift against the v5.0.6 schema

helm upgrade --install --set "migrator.args={drift,--db=frontend,--version=v5.0.6,--skip-version-check}" sourcegraph-migrator sourcegraph/sourcegraph-migrator --version 5.1.4

Actual logs omitted; the drift output is the same as what is reported on the site-admin Updates page.

Summary

On upgrade from v5.0.6 to v5.1.x, the migrator does not correctly initialize and set the database state to the new version, though it likely does run schema migrations. Either the migrations are applied correctly and the schema drift shown on the Updates page is the result of a stale versions table entry, or the schema migrations are not actually being run by the up command.

Given these conditions, once the actual failure mode of the up operation is identified, this can likely be resolved manually with correct use of the upgrade command. A hypothesis as to the root cause is the tagging of a 5.0.6 image set in the sourcegraph/deploy repo: the migrator image definitions in the sourcegraph/sourcegraph repo may not correctly account for the extra/missing version.
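
As a sketch of that manual remediation, the multi-version upgrade command could be re-invoked through the same Helm chart used above; the --from/--to values here assume the stale versions table row (5.0.6) is the starting point and v5.1.4 is the target:

helm upgrade --install --set "migrator.args={upgrade,--from=v5.0.6,--to=v5.1.4}" sourcegraph-migrator sourcegraph/sourcegraph-migrator --version 5.1.4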
