PBM-1442 PBM-1443: improve pbm diagnostic #1129

GitHub Actions / JUnit Test Report failed Nov 14, 2024 in 0s

41 tests run, 32 passed, 8 skipped, 1 failed.

Annotations

Check failure on line 54 in psmdb-testing/pbm-functional/pytest/test_rename_replicaset.py

@github-actions github-actions / JUnit Test Report

test_rename_replicaset.test_logical_pitr_crud_PBM_T270[replaces]

AssertionError: Backup failed{"Error":"get backup metadata: get: context deadline exceeded"}

2024-11-14T15:11:41Z I [rs1/rs101:27017] pbm-agent:
Version:   2.7.0
Platform:  linux/amd64
GitCommit: 192769fd681964e48871725f596761b8933bdad4
GitBranch: CURRENT_PR
BuildTime: 2024-11-14_14:08_UTC
GoVersion: go1.22.9
2024-11-14T15:11:41Z I [rs1/rs102:27017] pbm-agent:
Version:   2.7.0
Platform:  linux/amd64
GitCommit: 192769fd681964e48871725f596761b8933bdad4
GitBranch: CURRENT_PR
BuildTime: 2024-11-14_14:08_UTC
GoVersion: go1.22.9
2024-11-14T15:11:41Z I [rs1/rs102:27017] starting PITR routine
2024-11-14T15:11:41Z I [rs1/rs101:27017] starting PITR routine
2024-11-14T15:11:41Z I [rs1/rs103:27017] pbm-agent:
Version:   2.7.0
Platform:  linux/amd64
GitCommit: 192769fd681964e48871725f596761b8933bdad4
GitBranch: CURRENT_PR
BuildTime: 2024-11-14_14:08_UTC
GoVersion: go1.22.9
2024-11-14T15:11:41Z I [rs1/rs103:27017] starting PITR routine
2024-11-14T15:11:41Z I [rs1/rs101:27017] node: rs1/rs101:27017
2024-11-14T15:11:41Z I [rs1/rs102:27017] node: rs1/rs102:27017
2024-11-14T15:11:41Z I [rs1/rs103:27017] node: rs1/rs103:27017
2024-11-14T15:11:41Z E [rs1/rs101:27017] [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2024-11-14T15:11:41Z E [rs1/rs102:27017] [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2024-11-14T15:11:41Z I [rs1/rs101:27017] conn level ReadConcern: majority; WriteConcern: majority
2024-11-14T15:11:41Z I [rs1/rs102:27017] conn level ReadConcern: majority; WriteConcern: majority
2024-11-14T15:11:41Z I [rs1/rs103:27017] conn level ReadConcern: majority; WriteConcern: majority
2024-11-14T15:11:41Z E [rs1/rs103:27017] [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2024-11-14T15:11:41Z I [rs1/rs102:27017] listening for the commands
2024-11-14T15:11:41Z I [rs1/rs101:27017] listening for the commands
2024-11-14T15:11:41Z I [rs1/rs103:27017] listening for the commands
2024-11-14T15:11:46Z E [rs1/rs101:27017] [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2024-11-14T15:11:46Z E [rs1/rs102:27017] [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2024-11-14T15:11:46Z E [rs1/rs103:27017] [agentCheckup] check storage connection: unable to get storage: get config: get: mongo: no documents in result
2024-11-14T15:11:48Z I [rs1/rs102:27017] got command resync <ts: 1731597108>, opid: 6736133461bbacba664157cd
2024-11-14T15:11:48Z I [rs1/rs101:27017] got command resync <ts: 1731597108>, opid: 6736133461bbacba664157cd
2024-11-14T15:11:48Z I [rs1/rs103:27017] got command resync <ts: 1731597108>, opid: 6736133461bbacba664157cd
2024-11-14T15:11:48Z I [rs1/rs102:27017] got epoch {1731597106 6}
2024-11-14T15:11:48Z I [rs1/rs101:27017] got epoch {1731597106 6}
2024-11-14T15:11:48Z I [rs1/rs103:27017] got epoch {1731597106 6}
2024-11-14T15:11:48Z D [rs1/rs101:27017] [resync] lock not acquired
2024-11-14T15:11:48Z I [rs1/rs102:27017] [resync] started
2024-11-14T15:11:48Z D [rs1/rs103:27017] [resync] lock not acquired
2024-11-14T15:11:48Z D [rs1/rs102:27017] [resync] uploading ".pbm.init" [size hint: 5 (5.00B); part size: 10485760 (10.00MB)]
2024-11-14T15:11:48Z D [rs1/rs102:27017] [resync] got backups list: 0
2024-11-14T15:11:48Z D [rs1/rs102:27017] [resync] got physical restores list: 0
2024-11-14T15:11:48Z D [rs1/rs102:27017] [resync] epoch set to {1731597108 19}
2024-11-14T15:11:48Z I [rs1/rs102:27017] [resync] succeed
2024-11-14T15:11:54Z I [rs1/rs102:27017] got command resync <ts: 1731597113>, opid: 67361339d7e79927bf9b1904
2024-11-14T15:11:54Z I [rs1/rs103:27017] got command resync <ts: 1731597113>, opid: 67361339d7e79927bf9b1904
2024-11-14T15:11:54Z I [rs1/rs101:27017] got command resync <ts: 1731597113>, opid: 67361339d7e79927bf9b1904
2024-11-14T15:11:54Z I [rs1/rs103:27017] got epoch {1731597108 19}
2024-11-14T15:11:54Z I [rs1/rs102:27017] got epoch {1731597108 19}
2024-11-14T15:11:54Z I [rs1/rs101:27017] got epoch {1731597108 19}
2024-11-14T15:11:54Z D [rs1/rs102:27017] [resync] lock not acquired
2024-11-14T15:11:54Z I [rs1/rs103:27017] [resync] started
2024-11-14T15:11:54Z I [rs1/rs102:27017] got command backup [name: 2024-11-14T15:11:53Z, compression: none (level: default)] <ts: 1731597113>, opid: 6736133965f2960bc0250fbc
2024-11-14T15:11:54Z D [rs1/rs101:27017] [resync] lock not acquired
2024-11-14T15:11:54Z I [rs1/rs101:27017] got command backup [name: 2024-11-14T15:11:53Z, compression: none (level: default)] <ts: 1731597113>, opid: 6736133965f2960bc0250fbc
2024-11-14T15:11:54Z I [rs1/rs102:27017] got epoch {1731597108 19}
2024-11-14T15:11:54Z I [rs1/rs101:27017] got epoch {1731597108 19}
2024-11-14T15:11:54Z D [rs1/rs103:27017] [resync] got backups list: 0
2024-11-14T15:11:54Z E [rs1/rs101:27017] [backup/2024-11-14T15:11:53Z] unable to proceed with the backup, active lock is present
2024-11-14T15:11:54Z D [rs1/rs103:27017] [resync] got physical restores list: 0
2024-11-14T15:11:54Z D [rs1/rs103:27017] [resync] epoch set to {1731597114 20}
2024-11-14T15:11:54Z I [rs1/rs103:27017] [resync] succeed
2024-11-14T15:11:54Z I [rs1/rs103:27017] got command backup [name: 2024-11-14T15:11:53Z, compression: none (level: default)] <ts: 1731597113>, opid: 6736133965f2960bc0250fbc
2024-11-14T15:11:54Z I [rs1/rs103:27017] got epoch {1731597114 20}
2024-11-14T15:12:09Z D [rs1/rs102:27017] [backup/2024-11-14T15:11:53Z] nomination timeout
2024-11-14T15:12:09Z D [rs1/rs102:27017] [backup/2024-11-14T15:11:53Z] skip after nomination, probably started by another node
2024-11-14T15:12:09Z D [rs1/rs103:27017] [backup/2024-11-14T15:11:53Z] nomination timeout
2024-11-14T15:12:09Z D [rs1/rs103:27017] [backup/2024-11-14T15:11:53Z] skip after nomination, probably started by another node
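
The agent log above interleaves three nodes, so the error-level entries are easy to miss. A minimal sketch of grouping them per node follows. It assumes each entry has the shape `<timestamp> <level letter> [<node>] <message>` seen in this report; the function name `errors_by_node` is hypothetical and not part of the test suite or of PBM.

```python
import re
from collections import defaultdict

# Entry shape assumed from the log in this report:
#   2024-11-14T15:11:54Z E [rs1/rs101:27017] [backup/...] message text
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]) \[(?P<node>[^\]]+)\] (?P<msg>.*)$"
)


def errors_by_node(log_text):
    """Return {node: [message, ...]} for entries logged at level E."""
    errors = defaultdict(list)
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        # Continuation lines such as "Version:   2.7.0" do not match and are skipped.
        if match and match.group("level") == "E":
            errors[match.group("node")].append(match.group("msg"))
    return dict(errors)
```

Applied to this run, such a grouping would show the `agentCheckup` storage errors on all three nodes before the first resync, and the single `active lock is present` error on rs1/rs101:27017 at 15:11:54Z, when the second resync and the backup command arrived together.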
Raw output
start_cluster = True, cluster = <cluster.Cluster object at 0x7fa34b0e6a50>
collection = 'replaces'

    @pytest.mark.timeout(300,func_only=True)
    @pytest.mark.parametrize('collection',['inserts','replaces','updates','deletes','indexes'])
    def test_logical_pitr_crud_PBM_T270(start_cluster,cluster,collection):
        cluster.check_pbm_status()
>       cluster.make_backup("logical")

test_rename_replicaset.py:54: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <cluster.Cluster object at 0x7fa34b0e6a50>, type = 'logical'

    def make_backup(self, type):
        n = testinfra.get_host("docker://" + self.pbm_cli)
        timeout = time.time() + 120
        while True:
            running = self.get_status()['running']
            Cluster.log("Current operation: " + str(running))
            if not running:
                if type:
                    start = n.run(
                        'pbm backup --out=json --type=' + type)
                else:
                    start = n.run('pbm backup --out=json')
                if start.rc == 0:
                    name = json.loads(start.stdout)['name']
                    Cluster.log("Backup started")
                    break
                elif "resync" in start.stdout:
                    Cluster.log("Resync in progress, retrying: " + start.stdout)
                else:
                    logs = n.check_output("pbm logs -sD -t0")
>                   assert False, "Backup failed" + start.stdout + start.stderr + '\n' + logs
E                   AssertionError: Backup failed{"Error":"get backup metadata: get: context deadline exceeded"}

cluster.py:393: AssertionError
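
The `make_backup` loop in the traceback retries only when the word "resync" appears in the `pbm backup` output. In this run the CLI instead returned `context deadline exceeded` while a resync was still holding the lock, so the helper asserted immediately. The sketch below shows one way the outcome could be classified. The function `classify_backup_start` is hypothetical, and treating `context deadline exceeded` as retriable is an assumption that the source does not settle; it may equally be a genuine failure worth reporting.

```python
import json


def classify_backup_start(rc, stdout):
    """Classify a `pbm backup --out=json` result as 'started', 'retry' or 'failed'.

    Hypothetical helper: rc and stdout correspond to start.rc and start.stdout
    in the make_backup loop shown in the traceback.
    """
    if rc == 0:
        return "started"
    if "resync" in stdout:
        # Same condition the existing loop already retries on.
        return "retry"
    try:
        error = json.loads(stdout).get("Error", "")
    except (ValueError, AttributeError):
        # Output was not a JSON object; fall back to the raw text.
        error = stdout
    # Assumption: a deadline error right after a resync is transient.
    if "context deadline exceeded" in error:
        return "retry"
    return "failed"
```

If a helper like this were used, the loop would also need to honour the `timeout` value it already computes, so that a persistent deadline error still ends in a failure rather than an endless retry.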