To Seek or To Scan: That is My Question

This last week I’ve been pretty busy at work attempting to tune a stored procedure for performance. It all started when my manager asked me if I wanted a challenge. I said “sure!”  Then it all went down hill from there. Kidding! It’s really not that bad. I thought about writing a blog post for my experience tuning this procedure. However, it’s quite involved and I felt there was way too much to write about. Fortunately, my manager gave me the idea of writing about a small part of what I learned this last week in regards to tuning this procedure. So here we are!

Can you make this thing go faster?

The procedure itself returns four (yes, four) data sets all with different columns which really isn’t all that bad. For input, it uses seven parameters which are all optional. It contains one temporary table that is used as the basis for all the select statements. The WHERE clause on the SELECT for the INSERT into the temporary table is basically where I started. I thought it would be a good place as any to start. It contained two left outer joins, around eight “OR” operators, a couple of CASTs, one LIKE  (the wildcards are passed from the application), two inequality checks, and a partridge in a pear tree. Each table also had only a clustered index which is the primary key. No other indexes were created.  Note: I won’t get into the rest of the procedure in this post since it’s a bit lengthy. By the way, this procedure lives in a SQL Server 2005 database. 

I said “Statistics, Not Sadistics!”

One of the biggest things I learned in the last year or so is that performance doesn’t just mean how fast a query runs. It’s also about the resources it uses such as disk input/output (I/O) and CPU time. It’s very possible and likely that running the same query on different servers can result in different performance including time and resources. Which is why it’s good to look at the I/O and not just the time it takes for a query to run. How do you do that? Simple. There’s this really cool option you turn on called “SET STATISTICS IO”. Before you run it, though, it’s generally a good idea to clear your cache first. Otherwise, you’re not really seeing the true disk I/O activity. Here’s how you do that. Warning: Do NOT perform this in production without seriously considering the ramifications! Clearing the cache could (um, probably will) cause performance issues for everything and everybody else using the server since the other procedures and such need to recompile.

DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE

SET STATISTICS IO ON

The “DBCC DROPCLEANBUFFERS” removes buffers from the buffer pool while “DBCC FREEPROCCACHE” basically clears the plan cache. If you’re using SQL 2008+ you can actually remove specific plans. I’m pretty sure you can’t do this in 2005, unfortunately. 

As I mentioned earlier, the “SET STATISTICS IO ON” option will report to you the disk I/O activity on the query(ies) you run for the duration of your transaction unless you turn it off. You can also use the “SET STATISTICS TIME” option to display the time it takes to parse, compile, and execute the query in milliseconds. I’m still learning how to read the output from this one so I’m not going to write about it  yet.

Here is an example of what the output can look like from “SET STATISTICS IO”:

Table ‘Table2’. Scan count 3, logical reads 6251, physical reads 13, read-ahead reads 5654, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table ‘Table1’. Scan count 3, logical reads 1953, physical reads 5, read-ahead reads 1800, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

What does this mean?

  • Scan count is the number of times the table is accessed.
  • Logical reads indicate how many pages are read from the data cache.
  • Physical reads show the number of pages that are read from disk.
  • Read-ahead reads indicate how many pages are put into cache.
  • The rest are numbers for LOBs such as ntext and so on.

The numbers above in this case don’t seem that bad. I then wondered what the execution plan for the insert looked like. So I used SQL Sentry‘s oh-so-cool Plan Explorer (it’s free!).  The plan showed clustered index scans had the highest cost. Scans are usually not a good thing because they basically scan or read the table which isn’t usually efficient. It would be like reading the phone book from beginning to end looking for one person’s phone number. So I did some major reading and played with this query a bit to see if I could get it to do an index seek instead.

To make a very long story way shorter, I had read that “OR” operators can cause index scans. So I basically stripped the query down to it’s basics by removing the “OR”s and “CAST”s and rewrote a few other things. After stripping it down and adding one covering index for each table, I cleared the cache again and ran the new query. The logical reads jumped up astronomcially. So I then looked at the query via Plan Explorer and saw that it was now doing index seeks. Cool! I thought. I then wondered why in heck the logical reads were so much higher than with the scan! I thought seeks were better because they actually use the index to look up the values instead of reading the whole table like a scan does. Hmm… I felt really confused.

My “Ah Ha!” Moment

I then scoured the Internet for more information. It turns out that each seek that is done on a page is counted as one logical I/O. However, when a scan is performed each page that is read counts as one logical I/O. Ah ha! Okay. That makes sense. So now I wondered which one is better in this case? How do I know if I should do a scan versus a seek since the logical reads are calculated differently with each one?

I then turned to my fellow SQL peeps on Twitter. I received great explanations and help from Amit Banerjee and Aaron Bertrand!  You guys so rock! Amit helped clear up my confusion on what scan counts really mean and provided a really cool link that really helped. Aaron explained that I should look at the overall picture instead of focusing on just one aspect of the plan. So guess what?  I found that when I ran the query that uses the index scan the overall performance was actually a bit better than with the seek. Go figure!

Hindsight

After contemplating this for a bit, I had another “ah ha!” moment. Instead of focusing on the one insert statement, I probably should actually focus on the procedure as a whole to see if one aspect is less efficient than the other. I took a few moments to chastise myself that I should have thought of this earlier in the week instead of spending several days on one statement. Oh well. The good news is that I am learning some good stuff and it will be worth it when I finally figure it out. I’m contemplating working on figuring out the tipping point which is something super cool that I learned about at Kimberly Tripp’s sessions at SQLConn April 2010.  Time to dig out my notes and revisit her blog! So guess what I’ll be doing next week? 😉

Note: If I wrote something that’s not correct, please let me know and I’ll fix it! Please be kind! I’m still learning this stuff! Thanks! 🙂

Advertisements

2 thoughts on “To Seek or To Scan: That is My Question

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s