Skip to content

Commit

Permalink
[#24960] YSQL: Fix pg shutdown handling issue due to incorrect signal…
Browse files Browse the repository at this point in the history
… blocking

Summary:
Postgres runs each process with a single thread (mostly), but in yb, each process process starts several yb threads. These threads are meant to block the typical pg signals like `SIGINT`, `SIGTERM` and `SIGQUIT`. However in mac release builds the threads under `ev_run` are unblocking `SIGTERM` causing incorrect handling of the process shutdown.

Also in production scenarios pg is never gracefully shutdown. Instead yb-tserver is killed which causes PG to receive `YB_PG_PDEATHSIG` which is `SIGQUIT` (Immediate Shutdown). Whereas in the tests we use `SIGINT` (Fast Shutdown). This means the tests are not actually validating the pg shutdown behavior used in production.

This change addresses both issues by switching the graceful yb shutdown to also use `SIGQUIT`.

PG also uses `SIGALRM`, which yb threads should block. This has also been addressed.

`ProcessSupervisor::Stop` was unnecessarily issuing `SIGINT` several time when the process fails to exit, and only after 1 min uses `SIGQUIT`. With the above changes this logic no longer makes sense, so switching to log warning every 10s and force killing the process with `SIGKILL` after 1min.

Changed the `DFATAL` in `YBCInterruptPgGate` and ` `YBCDestroyPgGate`  to `FATAL` since if we continue execution there is a high probability of hitting SIGSEGV.

Fixes #24960
Jira: DB-14096

Test Plan: Jenkins

Reviewers: smishra, telgersma

Reviewed By: telgersma

Subscribers: yql, svc_phabricator, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D40676
  • Loading branch information
hari90 committed Dec 24, 2024
1 parent ed4b4d2 commit ac52997
Show file tree
Hide file tree
Showing 3 changed files with 17 additions and 26 deletions.
3 changes: 2 additions & 1 deletion src/yb/util/signal_util.cc
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,8 @@ const std::vector<int> kYsqlHandledSignals{
SIGUSR1, // procsignal_sigusr1_handler
SIGFPE, // FloatExceptionHandler
SIGTERM, // bgworker_die
SIGQUIT // bgworker_quickdie
SIGQUIT, // bgworker_quickdie
SIGALRM // handle_sig_alarm
};

Result<sigset_t> ThreadYsqlSignalMaskBlock() {
Expand Down
8 changes: 4 additions & 4 deletions src/yb/yql/pggate/ybc_pggate.cc
Original file line number Diff line number Diff line change
Expand Up @@ -522,8 +522,8 @@ void YBCInitPgGate(const YBCPgTypeEntity *data_type_table, int count, PgCallback
}

void YBCDestroyPgGate() {
LOG_IF(DFATAL, !is_main_thread())
<< __PRETTY_FUNCTION__ << " should only be invoked from the main thread";
LOG_IF(FATAL, !is_main_thread())
<< __PRETTY_FUNCTION__ << " should only be invoked from the main thread";

if (pgapi_shutdown_done.exchange(true)) {
LOG(DFATAL) << __PRETTY_FUNCTION__ << " should only be called once";
Expand All @@ -537,8 +537,8 @@ void YBCDestroyPgGate() {
}

void YBCInterruptPgGate() {
LOG_IF(DFATAL, !is_main_thread())
<< __PRETTY_FUNCTION__ << " should only be invoked from the main thread";
LOG_IF(FATAL, !is_main_thread())
<< __PRETTY_FUNCTION__ << " should only be invoked from the main thread";

pgapi->Interrupt();
}
Expand Down
32 changes: 11 additions & 21 deletions src/yb/yql/process_wrapper/process_wrapper.cc
Original file line number Diff line number Diff line change
Expand Up @@ -44,14 +44,7 @@ Result<int> ProcessWrapper::Wait() {
return proc_->Wait();
}

void ProcessWrapper::Kill() {
// TODO(fizaa): Use SIGQUIT in asan build until GH #15168 is fixed.
#ifdef ADDRESS_SANITIZER
Kill(SIGQUIT);
#else
Kill(SIGINT);
#endif
}
void ProcessWrapper::Kill() { Kill(SIGQUIT); }

void ProcessWrapper::Kill(int signal) {
WARN_NOT_OK(proc_->Kill(signal), "Kill process failed");
Expand Down Expand Up @@ -163,20 +156,17 @@ void ProcessSupervisor::Stop() {
if (thread_finished_latch_.WaitFor(10s)) {
break;
}
auto passed = MonoDelta(CoarseMonoClock::now() - start);
bool force_kill = passed >= 1min;
auto message = Format(
"$0 did not gracefully exist after $1. $2.",
GetProcessName(), passed,
force_kill ? "Force killing it" : "Retry");
if (force_kill) {
LOG(DFATAL) << message;
const auto passed = MonoDelta(CoarseMonoClock::now() - start);
if (passed >= 1min) {
LOG(DFATAL) << GetProcessName() << " did not gracefully exit after " << passed
<< ". Force killing it with SIGKILL";
std::lock_guard lock(mtx_);
if (process_wrapper_) {
process_wrapper_->Kill(SIGKILL);
}
break;
} else {
LOG(WARNING) << message;
}
std::lock_guard lock(mtx_);
if (process_wrapper_) {
process_wrapper_->Kill(force_kill ? SIGQUIT : SIGINT);
LOG(WARNING) << GetProcessName() << " did not gracefully exist after " << passed;
}
}
supervisor_thread_->Join();
Expand Down

0 comments on commit ac52997

Please sign in to comment.