Tuesday 4 October 2011

T-SQL: Helping the optimiser can hinder

I've been meaning to blog about this for a while and noticing the (earlier than usual) T-SQL Tuesday #23 hosted by Stuart Ainsworth (blog | twitter) on the topic of Joins, I thought i'd cobble something together.

Essentially, this post is about reading the results of SHOWPLAN_TEXT, being aware of the different physical join operators and also how the optimiser can be influenced (poorly) by a bad WHERE clause.

I have the following query on a view which abstracts the 4 tables joined:
 
SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwData

The tables behind the view aren't really that important, as I just want to concentrate on the plans generated but basically, there is a lookup table for RateID and then a maindata table which joins to this table (via other intermediate tables). The key thing to note in the query though is the CAST - sometimes the column Val (from maindata) has a value which can't be converted - indeed running this query as is gives the error:

Msg 8115, Level 16, State 8, Line 2
Arithmetic overflow error converting numeric to data type numeric.

I know the data for RateID does have Vals which are CASTable to the decimal precision/scope and changing the query to only return those records where RateID = 1, returns a result set without errors:

SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData WHERE RateId = 1

However, if I "improve" the query to filter out these records further to only return data from the last month using a non-sargable clause, i get that pesky overflow error again:

SELECT *, CAST(Val AS NUMERIC(22,16)) FROM DDL.vwRateData WHERE RateId = 1 AND DATEDIFF(D,GETDATE(),[Date])< 365

So whats different? First glance suggests something is wrong because if the whole dataset for RateId=1 returns without error, then choosing a further subset of this data should also work.

I delved into the query plans to find out and the clue was there:

The Good query generated a plan that was using a Nested Loops inner join. In a nutshell, the first thing this query does is filter out the correct rows from the lookup table (RateID = 1) and then iterates through the main table where there is a match on that RateId. In other words, it doesn't matter whether or not the Val column for other RateIDs is CASTable as this plan only touches RateID = 1.



On the other hand, the Bad query generated a plan that was using a Hash Match join. This differs in that it will read all the rows in both the tables, and thus attempting to run the CAST function over every record. Only later in the query plan does it do the filtering on the RateID.



The different types of  JOINs (both logical and physical) are explained really well in this article but the "takeaway" from this post is that you need to be careful with your query construction as the optimiser may choose a plan that can influence both performance and even robustness. Logic may tell you that adding extra filtering criteria will help the optimiser choose a more efficient plan, but in this example it has chosen a plan that has caused the query to fail.

No comments:

Post a Comment

/* add this crazy stuff in so i can use syntax highlighter