Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invoke-IcingaCheckMSSQLHealth : Incorrect determination of service status in Active-Passive SQL Server Cluster #56

Open
Aleksey-Maksimov opened this issue Dec 16, 2024 · 0 comments

Comments

@Aleksey-Maksimov
Copy link

Hello.

We have several two-node SQL Server Clusters running in Active-Passive mode.
In this mode, the primary service of the SQL Server instance has a startup type of Manual on both cluster nodes.
On the active cluster node, the service should be in the Running state, and on the passive cluster node, the service should be in the Stopped state. This is the normal behavior of a service in a cluster.

However, the Invoke-IcingaCheckMSSQLHealth cmdlet does not take advantage of this specificity of the service in the cluster and returns an incorrect critical service status.

Current result on the active cluster node (hostname SRV-DB21):

Invoke-IcingaCheckMSSQLHealth -IntegratedSecurity -SqlHost DBCL22 -Verbosity 2
[OK] MSSQL Health (SCVMM)
\_ [OK] Connection Time: 0.003s
\_ [OK] Service "SQL Server (SCVMM) (MSSQL$SCVMM)": Running
| 'scvmm::ifw_mssqlhealth::connectiontime'=0.003s;; 'mssqlscvmm::ifw_mssqlhealth::state'=4;;4
0

Current result on the passive cluster node (hostname SRV-DB22)::

Invoke-IcingaCheckMSSQLHealth -IntegratedSecurity -SqlHost DBCL22 -Verbosity 2

[CRITICAL] MSSQL Health (SCVMM) [CRITICAL] Service "SQL Server (SCVMM) (MSSQL$SCVMM)" (Stopped)
\_ [OK] Connection Time: 0.098s
\_ [CRITICAL] Service "SQL Server (SCVMM) (MSSQL$SCVMM)": Stopped is not matching threshold Running
| 'scvmm::ifw_mssqlhealth::connectiontime'=0.098s;; 'mssqlscvmm::ifw_mssqlhealth::state'=1;;4
2

To determine whether a SQL Server instance is clustered and which cluster node is currently active, we can use the document:
https://learn.microsoft.com/en-us/sql/t-sql/functions/serverproperty-transact-sql?view=sql-server-ver16&redirectedfrom=MSDN

First, we can check whether the SQL Server instance is clustered using a SQL query:

Select ServerProperty('IsClustered')

A value of "1" indicates that the instance is clustered.
A value of "0" indicates that it is a simple non-clustered instance.

This SQL query to get the ComputerNamePhysicalNetBIOS server property can return the name of the active node of the SQL Server cluster:

Select ServerProperty('ComputerNamePhysicalNetBIOS')

Example of returned data:

SRV-DB21

Then in PowerShell, using $env:computername, we can find out the name of the host on which the Invoke-IcingaCheckMSSQLHealth cmdlet is currently running.

Example of returned data:

SRV-DB21

If the host name obtained from the environment variable and the name of the active SQL Server node obtained from the SQL query match, then the cmdlet is running on the active cluster node and the SQL Server service should have the Running state.

Otherwise, if the host name obtained from the environment variable and the name of the active SQL Server node obtained from the SQL query do not match, then the cmdlet is running on the passive cluster node and the SQL Server service may have the Stopping state.

Please change the logic of checking the state of the SQL Server service using this feature of working in a cluster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant