Commit 804c1e3

[#27420] docdb: Fix flaky test MasterPathHandlersItest.TestClusterBalancerWarnings
Summary:
This one is the same issue as D44361. After D43208, the advisory lock table and transactions table are now created by CatalogManagerBgTasks. In slower builds (e.g., TSAN or ASAN), these tables might be created while only 2 out of 3 tservers are live. This can lead to them initially being created with only 2 replicas, making them under-replicated. The Load Balancer will then later handle the under replicated tablet.

Thus, we need more wait time before checking the Warnings Summary table to account for the time required by the Load Balancer to process the under-replicated advisory lock and transaction tablets.

The test was previously flaky due to insufficient wait time, as seen in the failure below:
```
/share/jenkins/workspace/github-yugabyte-db-alma8-master-clang19-tsan/yugabyte-db/src/yb/integration-tests/master_path_handlers-itest.cc:1751
Expected equality of these values:
  rows.size()
    Which is: 0
  1
```
Jira: DB-16959

Test Plan:
./yb_build.sh asan --cxx-test integration-tests_master_path_handlers-itest --gtest_filter MasterPathHandlersItest.TestClusterBalancerWarnings

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D44377
1 parent b986bb4 commit 804c1e3

1 file changed, +15 -7 lines changed

src/yb/integration-tests/master_path_handlers-itest.cc

Lines changed: 15 additions & 7 deletions
@@ -1746,13 +1746,21 @@ TEST_F(MasterPathHandlersItest, TestClusterBalancerWarnings) {
   auto hp = HostPort::FromBoundEndpoint(cluster_->mini_tablet_server(0)->bound_rpc_addr());
   ASSERT_OK(yb_admin_client_->ChangeBlacklist({hp}, true /* add */, false /* blacklist_leader */));
 
-  SleepFor(FLAGS_catalog_manager_bg_task_wait_ms * 2ms);  // Let the load balancer run once
-  auto rows = ASSERT_RESULT(GetHtmlTableRows("/load-distribution", "Warnings Summary"));
-  ASSERT_EQ(rows.size(), 1);
-  ASSERT_EQ(rows[0].size(), 2);
-  ASSERT_STR_CONTAINS(rows[0][0], "Could not find a valid tserver to host tablet");
-  // 3 user tablets + system tablets
-  auto tablet_count = std::stoi(rows[0][1]);
+  ANNOTATE_UNPROTECTED_WRITE(FLAGS_TEST_sleep_before_reporting_lb_ui_ms) = 500;
+  std::vector<std::string> row;
+  ASSERT_OK(WaitFor([&]() -> Result<bool> {
+    auto rows = VERIFY_RESULT(GetHtmlTableRows("/load-distribution", "Warnings Summary"));
+    if (rows.empty()) {
+      return false;
+    }
+    SCHECK_EQ(rows.size(), 1, IllegalState, "Expected one row");
+    row = rows[0];
+    return true;
+  }, 10s /* timeout */, "Waiting for warnings to show up in the Warnings Summary table"));
+
+  ASSERT_EQ(row.size(), 2);
+  ASSERT_STR_CONTAINS(row[0], "Could not find a valid tserver to host tablet");
+  auto tablet_count = std::stoi(row[1]);
   ASSERT_GT(tablet_count, 3);
 }
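
The change above swaps a fixed sleep for a poll-until-ready wait. As a rough, standalone illustration of that pattern (not YugabyteDB's actual `WaitFor` implementation), the sketch below assumes a hypothetical `PollUntil` helper and a hypothetical `FakeWarningsTable` that returns no rows for its first few polls, standing in for a warnings table that is populated only after the load balancer has run.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper for illustration only; this is NOT YugabyteDB's WaitFor.
// Calls `condition` repeatedly until it returns true or `timeout` elapses.
// Returns true if the condition was met, false on timeout.
bool PollUntil(const std::function<bool()>& condition,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (true) {
    if (condition()) {
      return true;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }
    std::this_thread::sleep_for(interval);
  }
}

// Hypothetical stand-in for the "Warnings Summary" table. It yields no rows
// for the first `empty_polls` calls, then a single two-column row.
// The row contents here are made up for the example.
class FakeWarningsTable {
 public:
  explicit FakeWarningsTable(int empty_polls) : remaining_empty_polls_(empty_polls) {}

  std::vector<std::vector<std::string>> Rows() {
    if (remaining_empty_polls_ > 0) {
      --remaining_empty_polls_;
      return {};
    }
    return {{"Could not find a valid tserver to host tablet", "5"}};
  }

 private:
  int remaining_empty_polls_;
};
```

A caller would pass a lambda that reads the table, returns `false` while it is empty, and captures the first row once it appears. Unlike a single fixed sleep, the wait ends as soon as the row shows up and only fails after the full timeout, which is why this style tends to be less sensitive to slow builds.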
