Description
Hi, the recent change made to MSSQL merge statements via pull request #4811 adds significant computational overhead to merge statements and does not work in all scenarios.
Adding the EXISTS statement on every column causes every row of source data to be compared against the entire target table, leading to inefficient table scans. While this potentially reduces transaction log writes by limiting the amount of data updated from the source, it significantly increases disk reads on the target and requires significant CPU usage for value comparison. As the table continues to grow, merge statements will have overhead added in proportion to target table size.
WHEN MATCHED AND EXISTS(
--Reads the entire dataset from target (table scan)
SELECT [__MERGE_TARGET__].[Column1],
[__MERGE_TARGET__].[Column2]
--Compared against every row/column of new data (CPU intensive)
EXCEPT SELECT [__MERGE_SOURCE__].[Column1],
[__MERGE_SOURCE__].[Column2]
)
Also, for large LOB column types (TEXT, NTEXT, XML, GEOMETRY, GEOGRAPHY), MSSQL is unable to compare values in an EXCEPT statement, breaking incremental model types. On execution of a model update containing a GEOMETRY column type, the server returns the following error:
OperationalError:
(5335, b'The data type geometry cannot be used as an operand to the UNION, INTERSECT or EXCEPT operators because it is not comparable.DB-Lib error message 20018, severity 16')
While the updated merge language may work in certain scenarios, I do not believe it is an appropriate default for most incremental MSSQL models.