This week my boss asked who wanted to install SQL Server on a new server. Seeing as it had been awhile since I did one, or at least it seemed like it, I happily volunteered. I needed some time away from my desk anyway so I was pretty excited. Much to my dismay, it turned out that 1) the server had already been released to the users and they started putting documents and stuff on it and 2) it didn’t have the separate drives for the data and log files, which is part of our standard install. So now I get to wait until they add or partition those two drives. No biggie. Good things come to those who wait, right? At least that’s what I keep telling myself and my soon-to-be 5 year old daughter… her theme song should be “I Want It All” by Queen. Seriously, it is SO her! 🙂
In the meantime, I decided it would be a good idea to update the status on the open help desk tickets assigned to me and close the ones where I managed to actually fix the issues. All excited, I was finally able to close enough help desk tickets to where I had less than 30 tickets to my name. Finally! *happy dance* Well… that was short lived. My DBA buddy, who was manning the unassigned ticket queue, assigned me a few more. So now I’m back up to 30. *sigh* Maybe I shouldn’t have put the giant spider under her desk while she was away. That’s okay. The tickets should keep me busy and out of trouble… hopefully… for now…
Yesterday, my DBA replacement from my last job sent me an email asking me if I just so happened to remember the password to a certain DR document. Funny thing… even though it’s been around 1.5 years since I worked there I actually remembered it! Probably because it made the senior DBA laugh when I told her what it was. No, I’m not posting the password here. Nice try! 😛 So I told her what it was and then promptly realized I should have asked for “payment” up front. Drat! Too late! Or was it? I thought I’d try anyway and promptly emailed her specifying that she needed to send me dark chocolate as payment for the correct password. So she sends me a picture of dark chocolate truffles… LMAO Yeah, she’s awesome like that! Lesson learned? Be more specific! *sigh… smack head on desk*
Anyway, on to this week’s feature. I just know you all were anxiously waiting for this week’s script! Don’t deny it! You know who you are! Anywho, for those who missed them, a couple of posts ago I had this crazy idea to post parts of my daily SQL Server checking script every week. Week one was all about SQL and agent error logs. Last week was about server and database level triggers. Since I haven’t been flogged (neither publicly or privately)… yet… *crossing fingers and toes* I shall continue my mini quest to impart at least some of my knowledge… This week is all about failures. No, not my personal failures. At least not this week… and not without several strong mojitos or beer… or both… well, there was that time at code camp… oh, sorry. Where was I? Oh yes! Failures!
So why did I feel the need to include job failures in my script when I could just use the GUI via Management Studio? When you have approximately 30+ database servers to check in the morning, like I had at my last job, time is of the essence, usually. Besides, I would rather know ahead of time and quickly if a job failed. I didn’t want to hear it from a client first. Plus we didn’t have any kind of third-party monitoring system. Besides, I liked the idea of them thinking I was on top of things and semi-omniscient when it came to the databases. The following script is to check of any of the backup jobs failed. How do I know if it’s a backup job? Our backup jobs usually have the word “backup” in the name. 😉 You can use whatever you want, though. Disclaimer: I wrote these queries years ago. There are probably better ways of doing the same thing. This is what I’ve done. Feel free to experiment.
--Check backup jobs... select 'Check if backup jobs failed...' select 'job_name'=left(j.name,40), jh.run_date, jh.run_time, jh.step_id, 'step_name'=left(rtrim(jh.step_name),30), jh.sql_message_id, jh.sql_severity, 'message'=left(rtrim(jh.message),100), jh.run_status, jh.run_duration, 'server'=left(rtrim(jh.server),10), 'job_descr'=left(rtrim(j.description),30), j.date_created, j.date_modified from msdb..sysjobhistory jh join msdb..sysjobs j on jh.job_id = j.job_id where jh.run_status<>1 and jh.step_id > 0 and substring(cast(jh.run_date as varchar), 1, 4) = year(getdate()) and substring(cast(jh.run_date as varchar), 5, 2) = month(getdate()) and substring(cast(jh.run_date as varchar), 7, 2) >= day(getdate())-7 and j.name like '%backup%' order by jh.run_date desc
The main part of the above query is looking in the msdb.dbo.sysjobhistory table joined with sysjobs table for any jobs with the word “backup” in the name. I also check where the run_status is not 1 (note: 1 is successful) and the step_id is greater than 0. The records where step_id = 0 are for the entire jobs themselves. Personally, I want to see which step(s) failed. Next I check all other job failures (see below). I separated out backup job failures because it was my preference, but you don’t have to.
--Check jobs... select 'jobs that failed...' select 'job_name'=left(rtrim(j.name),30),jh.instance_id, jh.run_date, jh.run_time, jh.step_id ,'step_name'=left(rtrim(jh.step_name),30), jh.sql_message_id, jh.sql_severity, 'message'=left(rtrim(jh.message),100), jh.run_status, jh.run_duration ,'server'=left(rtrim(jh.server),10), 'job_name'=left(rtrim(j.name),30), 'job_descr'=left(rtrim(j.description),30), j.date_created, j.date_modified from msdb..sysjobhistory jh join msdb..sysjobs j on jh.job_id = j.job_id where jh.run_status<>1 and jh.run_status<>4 and substring(cast(jh.run_date as varchar), 1, 4) = year(getdate()) and substring(cast(jh.run_date as varchar), 5, 2) = month(getdate()) order by jh.run_date desc
Since I’m on the topic of jobs, depending on your environment, you may want to check for job owners that aren’t what they should be. Our jobs should be running under the same generic account we have setup. So I like to check that it hasn’t changed. It may not be a big deal but in some cases, it could be.
--Check for jobs owners...
select ‘jobs whose owner is not what is should be…’
select left(SUSER_SNAME(j.owner_sid), 30) as ‘owner’, ‘job_name’=left(rtrim(j.name),40), j.date_created, j.date_modified
from msdb..sysjobs j
where SUSER_SNAME(j.owner_sid) <> ‘MySQLServerOwnerName’
order by j.name
In addition, it may be a good idea to check for jobs that are still running and could be stuck for whatever reason. Have you ever come in to work after a nice long weekend only to discover that one of the jobs that should only take 15 minutes to run actually ran all weekend and is still running? Since this has happened to me, I like to check for long running jobs just in case. The code is below:
exec msdb.dbo.sp_get_composite_job_info @execution_status = 1 –1:Executing
Okie dokie. I’m almost done. Next is just a quick check to see if database mail has failed. Database mail is available in SQL Server 2005 and on. The script below just checks the sysmail_allitems table in msdb to see if the sent_status is anything other than “sent.” In our case, if database mail isn’t being sent it’s usually because somehow the port became blocked. The default port for database mail is 25, in case you were wondering.
–Check db mail for failures…
select ‘Check db mail for failures…’
,left(recipients,100) as recipients
,left(subject,100) as [subject]
,left(body,50) as body
order by send_request_date desc