Sometimes nothing can beat the awesome-like feeling one gets from helping someone learn something new that excites them and really helps them with their job. In this case, I had the opportunity to show and explain the basics of execution plans and statistics IO to one of our developers using a complicated view she was working on. As a result, she seemed awed by the significant difference a few “minor” changes could make in performance – in a good way.
In this case, performing a select count(1) on this view before making any changes returned a count of over 200,000 records in 20-30 seconds on average. I don’t know about you but 20-30 seconds for 200,000 records usually isn’t a good thing. By the way, this system is running on SQL Server 2005 x86, 4 processors, yada yada yada.
Note: Normally I would clear the cache for performance testing purposes since we were running this on a development server. However, several other people are using the same server. So I figured clearing the cache on the server probably wouldn’t be a good idea at this time.
I then showed and explained to her about setting “STATISTICS IO” on, reading the output from it, and how to read the execution plan (right to left, top to bottom). After running the select on the view, we found the logical reads from the “SET STATISTICS IO” weren’t great but not horrible either. Each table had around 3000 to 5000 logical reads. Meaning, the number of pages read from cache ranged from 3000 to 5000.
We then looked at the execution plan. The plan was large enough that we needed to scroll up/down, right to left a few times to see it in its entirety. The cost per operator wasn’t horrible. For most of them it was around 1-2%. Most of the operators shown were scans or seeks on either the tables or the indices. I was happy to see at least the indexes were being used. The largest cost was actually on the Sort Distinct operator. So I asked her if the DISTINCT was really necessary. We ran a few queries, found it was not needed, and removed it.
We also removed several “LIKE” operators that were being used in a case statement. Not only were there “LIKE” operators but the wildcard was the first character such as “%value”. This is not a good thing because having the wildcard on the left side means the optimizer will not use an index and will do a scan. In this case, it turned out the “LIKE” wasn’t necessary either. So we replaced them with “=” without wildcards.
After making those few “minor” changes, we re-created the view and ran the same select statement on it. Guess what? Instead of 20-30 seconds, the view now ran in 1 second! Holy freakin’ T-SQL! In looking at the logical reads from the statistics IO output, the values were reduced drastically by several thousand reads! So instead of 3000-5000 logical reads, we were now seeing around 1000 or less. What about the execution plan? It was smaller and much simpler than the original one.
In hindsight, we probably should have made one change then re-run the view to see what effect that particular change had. In this case, I think it was mostly the distinct. I did mention that she could copy the database to her local instance and re-run the original versus the changes there. That way she can clear the cache without affecting anyone else that’s on that particular development server.
It just goes to show that sometimes even just a few “minor” changes can make a significant impact on performance – good or bad. In this case, removing a DISTINCT and replacing some “LIKE” operators seemed to make a huge difference.
To be completely honest, it really made my day knowing I helped someone learn something new that they are really excited about and can actually use. It may be cliché or sound cheesy (mmm… cheese… I miss cheese), but today I felt like I actually made a difference and that, my friends, is what I love most about my job.