Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Server requests query is inconsistent regarding blocking #10741

Closed
spaghettidba opened this issue Feb 25, 2022 · 0 comments · Fixed by #10553
Closed

SQL Server requests query is inconsistent regarding blocking #10741

spaghettidba opened this issue Feb 25, 2022 · 0 comments · Fixed by #10553
Labels
area/sqlserver bug unexpected problem or unintended behavior

Comments

@spaghettidba
Copy link
Contributor

spaghettidba commented Feb 25, 2022

Relevant telegraf.conf

[[inputs.sqlserver]]
  servers = [
    'Server=localhost;app name=telegraf;log=1;connection timeout=8'
  ]

  database_type = "SQLServer"

  include_query = [
    'SQLServerRequests'
  ]

Logs from Telegraf

No relevant output from Telegraf, the bug is in the SQL code

System info

Telegraf 1.21.4

Docker

No response

Steps to reproduce

Current query to extract sessions and request is generated by dynamic SQL and the generated query looks like this on SQL Server 2019:

SELECT [blocking_session_id] into #blockingSessions FROM sys.dm_exec_requests WHERE [blocking_session_id] != 0
CREATE INDEX ix_blockingSessions_1 ON #blockingSessions ([blocking_session_id])

-- GAP --

SELECT
	 'sqlserver_requests' AS [measurement]
	,REPLACE(@@SERVERNAME,'\',':') AS [sql_instance]
	,s.session_id
	,ISNULL(r.[request_id], 0) AS [request_id]
	,COALESCE(r.[status], s.[status]) AS [status]
	,COALESCE(r.[cpu_time], s.[cpu_time]) AS [cpu_time_ms]
	,COALESCE(r.[total_elapsed_time], s.total_elapsed_time) AS [total_elapsed_time_ms]
	,COALESCE(r.[logical_reads], s.[logical_reads]) AS [logical_reads]
	,COALESCE(r.[writes], s.[writes]) AS [writes]
	,r.[command]
	,r.[wait_time] AS [wait_time_ms]
	,r.[wait_type]
	,r.[wait_resource]
	,r.[blocking_session_id]
	,s.[program_name]
	,s.[host_name]
	,s.[nt_user_name]
	,s.[login_name]
	,LEFT (CASE COALESCE(r.[transaction_isolation_level], s.[transaction_isolation_level])
		WHEN 0 THEN '0-Read Committed'
		WHEN 1 THEN '1-Read Uncommitted (NOLOCK)'
		WHEN 2 THEN '2-Read Committed'
		WHEN 3 THEN '3-Repeatable Read'
		WHEN 4 THEN '4-Serializable'
		WHEN 5 THEN '5-Snapshot'
		ELSE CONVERT (varchar(30), r.[transaction_isolation_level]) + '-UNKNOWN'
	END, 30) AS [transaction_isolation_level]
	,r.[granted_query_memory] AS [granted_query_memory_pages]
	,r.[percent_complete]
	,SUBSTRING(
		qt.[text],
		r.[statement_start_offset] / 2 + 1,
		(CASE WHEN r.[statement_end_offset] = -1
			  THEN DATALENGTH(qt.[text])
			  ELSE r.[statement_end_offset]
		 END - r.[statement_start_offset]) / 2 + 1
	) AS [statement_text]
	,qt.[objectid]
	,QUOTENAME(OBJECT_SCHEMA_NAME(qt.[objectid], qt.[dbid])) + '.' +  QUOTENAME(OBJECT_NAME(qt.[objectid], qt.[dbid])) AS [stmt_object_name]
	,DB_NAME(qt.dbid) AS [stmt_db_name]
	,CONVERT(varchar(20),[query_hash],1) AS [query_hash]
	,CONVERT(varchar(20),[query_plan_hash],1) AS [query_plan_hash]
	,DB_NAME(COALESCE(r.[database_id], s.[database_id])) AS [session_db_name]
	,COALESCE(r.[open_transaction_count], s.[open_transaction_count]) AS [open_transaction]
FROM sys.dm_exec_sessions AS s
LEFT OUTER JOIN sys.dm_exec_requests AS r
	ON s.[session_id] = r.[session_id]
OUTER APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS qt
WHERE
	(s.[session_id] IN (SELECT blocking_session_id FROM #blockingSessions))
	OR (
		r.[session_id] IS NOT NULL
		AND (
			s.is_user_process = 1 
			OR r.[status] COLLATE Latin1_General_BIN NOT IN ('background', 'sleeping')
		)
	)
OPTION(MAXDOP 1)

The capture is performed by two statements: the first saves blocking queries to a temporary table, the second query captures running sessions and requests, including those that are blocking or blocked (saved in the #blockingSessions table).

If one of the sessions in #blockingSessions is closed between the execution of the first and the second query (where I marked with --GAP --), the second query will still include a session which is not blocking.
It could also happen the other way around, that a session is not included even if it is blocking, because the first query did not capture it, but the second did. This one is even worse, because the second query includes the column r.[blocking_session_id], so it might show a blocking session which is not captured, because it's not in the temp table, thus the inconsistency.

Expected behavior

When a session is blocked, I want to see its blocker in the output.

+------------+---------------------+
| session_id | blocking_session_id |
+------------+---------------------+
|        100 | 110                 |
|        110 | NULL                |
+------------+---------------------+

Actual behavior

+------------+---------------------+
| session_id | blocking_session_id |
+------------+---------------------+
|        100 |                 110 |
+------------+---------------------+

If session 110 starts blocking between the two queries, I might get session 100 as blocked, with no sign of session 110 in the output, because it was not captured by the first query.

Additional info

Proposed query:

SELECT
	 [measurement],[sql_instance],[session_id]
	,ISNULL([request_id],0) AS [request_id]
	,[blocking_session_id],[status],[cpu_time_ms]
	,[total_elapsed_time_ms],[logical_reads],[writes]
	,[command],[wait_time_ms],[wait_type]
	,[wait_resource],[program_name]
	,[host_name],[nt_user_name],[login_name]
	,[transaction_isolation_level],[granted_query_memory_pages],[percent_complete]
	,[statement_text],[objectid],[stmt_object_name]
	,[stmt_db_name],[query_hash],[query_plan_hash]
	,[session_db_name],[open_transaction]
FROM (
	SELECT
		 'sqlserver_requests' AS [measurement]
		,REPLACE(@@SERVERNAME,'\',':') AS [sql_instance]
		,s.[session_id]
		,r.[request_id]
		,COALESCE(r.[status], s.[status]) AS [status]
		,COALESCE(r.[cpu_time], s.[cpu_time]) AS [cpu_time_ms]
		,COALESCE(r.[total_elapsed_time], s.total_elapsed_time) AS [total_elapsed_time_ms]
		,COALESCE(r.[logical_reads], s.[logical_reads]) AS [logical_reads]
		,COALESCE(r.[writes], s.[writes]) AS [writes]
		,r.[command]
		,r.[wait_time] AS [wait_time_ms]
		,r.[wait_type]
		,r.[wait_resource]
		,NULLIF(r.[blocking_session_id],0) AS [blocking_session_id]
		,s.[program_name]
		,s.[host_name]
		,s.[nt_user_name]
		,s.[login_name]
		,LEFT (CASE COALESCE(r.[transaction_isolation_level], s.[transaction_isolation_level])
			WHEN 0 THEN '0-Read Committed'
			WHEN 1 THEN '1-Read Uncommitted (NOLOCK)'
			WHEN 2 THEN '2-Read Committed'
			WHEN 3 THEN '3-Repeatable Read'
			WHEN 4 THEN '4-Serializable'
			WHEN 5 THEN '5-Snapshot'
			ELSE CONVERT (varchar(30), r.[transaction_isolation_level]) + '-UNKNOWN'
		END, 30) AS [transaction_isolation_level]
		,r.[granted_query_memory] AS [granted_query_memory_pages]
		,r.[percent_complete]
		,SUBSTRING(
			qt.[text],
			r.[statement_start_offset] / 2 + 1,
			(CASE WHEN r.[statement_end_offset] = -1
				THEN DATALENGTH(qt.[text])
				ELSE r.[statement_end_offset]
			END - r.[statement_start_offset]) / 2 + 1
		) AS [statement_text]
		,qt.[objectid]
		,QUOTENAME(OBJECT_SCHEMA_NAME(qt.[objectid], qt.[dbid])) + '.' +  QUOTENAME(OBJECT_NAME(qt.[objectid], qt.[dbid])) AS [stmt_object_name]
		,DB_NAME(qt.dbid) AS [stmt_db_name]
		,CONVERT(varchar(20),r.[query_hash],1) AS [query_hash]
		,CONVERT(varchar(20),r.[query_plan_hash],1) AS [query_plan_hash]
		,s.[is_user_process]
		,[blocking_or_blocked] = COUNT(*) OVER(PARTITION BY ISNULL(NULLIF(r.[blocking_session_id], 0),s.[session_id]))
	,DB_NAME(COALESCE(r.[database_id], s.[database_id])) AS [session_db_name]
	,COALESCE(r.[open_transaction_count], s.[open_transaction_count]) AS [open_transaction]
	FROM sys.dm_exec_sessions AS s
	LEFT OUTER JOIN sys.dm_exec_requests AS r
		ON s.[session_id] = r.[session_id]
	OUTER APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS qt
) AS data
WHERE
	[blocking_or_blocked] > 1	--Always include blocking or blocked sessions/requests
	OR (
		[request_id] IS NOT NULL	--A request must exists
		AND (	--Always fetch user process (in any state), fetch system process only if active
			[is_user_process] = 1
			OR [status] COLLATE Latin1_General_BIN NOT IN ('background', 'sleeping')
		)
	)
OPTION(MAXDOP 1);
  • Everything is performed in a single query.
  • The blocking information is retrieved efficiently, whithout a subquery (that's probably the reason why blocking was moved to a temp table, because performance was poor).
  • The relevant bit of code is [blocking_or_blocked] = COUNT(*) OVER(PARTITION BY ISNULL(NULLIF(r.[blocking_session_id], 0),s.[session_id]))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/sqlserver bug unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant