Once upon a time not so long ago in a cubicle far far away, I started a series of posts about the script I run daily to check my database servers. I wrote this script when I first became a DBA which was some time ago. Some days it seems like another lifetime ago. During this last year, I started updating the script, which it so desperately needed, and have been posting my findings including what I’ve learned since then.
Today’s post is about the final item I currently check for. Finally! However, I’ll be adding to the script as I learn of new things I need to be on the lookout for. I’ll post about my new findings as well. One of these days we’ll have the time to implement Policy Based Management (PBM) which will help out a ton! So without further ado, today it’s all about jobs – job failures, job owners, and jobs that have been running all weekend because the procedure that’s running is stuck in a never ending loop because of a rare occurring condition that wasn’t tested for.
Failure Is an Option?
Below is the original query I wrote for SQL Server 2000 to list jobs that were not successful. It will still work in later versions of SQL Server too. Imagine that!
select sysjobs.name as 'job_name' , sysjobs.[description] as 'job_description' , sysjobs.date_created , sysjobs.date_modified , sysjobhistory.step_id , sysjobhistory.step_name , sysjobhistory.run_date , sysjobhistory.run_time , sysjobhistory.run_status , sysjobhistory.run_duration , sysjobhistory.sql_message_id , sysjobhistory.sql_severity , sysjobhistory.[server] , sysjobhistory.[message] from msdb.dbo.sysjobhistory inner join msdb.dbo.sysjobs on sysjobhistory.job_id = sysjobs.job_id where sysjobhistory.run_status <> 1 and substring(cast(sysjobhistory.run_date as varchar), 1, 4) = year(getdate()) and substring(cast(sysjobhistory.run_date as varchar), 5, 2) = month(getdate()) order by sysjobhistory.run_date desc
For an example of what the results can look like when the above query is executed, click here.
Hopefully it’s fairly straight forward. Here are some items to note. The record where the step_id value is 0 refers to the actual job itself. The rest of the step_id values (in this case, 1) refer to the actual steps within the job. To return records for jobs that flat out failed, you can include run_status = 0 in the where clause. The other status values are 1 for succeeded, 2 for retry, and 3 for cancelled. Personally, I want to know if the job was cancelled or failed which is why I included run_status <> 1.
If you want more information on the job step(s), you can include the sysjobsteps system table in the query. Personally, when I see a job step failed, I then manually look at it from the Job Activity Monitor and go from there. On a slightly side note, one really cool thing about the sysjobsteps system table is that it includes the command that is to be executed. So if you need to find a job that runs a stored procedure called “getItem”, you can query this table to find it! Neat, huh? Yeah, I know. It doesn’t take much to get me excited.
Getting back to the run_status on the sysjobhistory system table, I’ve read in forums that there is another status whose value of 4 is used to indicate jobs that are still running. I’m fairly sure this is not listed in BOL. Someone please correct me if I’m wrong, but as I understand it, this is because the records are written to the system tables after the job is finished running whether it succeeds, fails, or is cancelled. So the system table will never show that the job is still running. However, there is a way to find out if a job is still running via T-SQL.
Run Away Jobs
When I originally wrote this script, I included an undocumented procedure called sp_get_composite_job_info to find jobs that were still running. The good news is that it works great! The bad news is that it’s generally recommended to not use undocumented procedures. Why? They may not be included in future releases of SQL Server. Having said that, it still seems to work in SQL Server 2008 R2.
execute msdb.dbo.sp_get_composite_job_info @execution_status = 1;
So instead of using this procedure, we can use sp_help_job instead. In this case we can use the @execution_status parameter like this:
execute msdb.dbo.sp_help_job @execution_status = 1;
In executing both sp_get_composite_job_info and sp_help_job, the results returned are the same. Click here if you’re curious what the results look like. So it’s probably safer to use sp_help_job since it’s documented in BOL.
So far, I haven’t found an alternate way of doing this in SQL Server 2005 and later. If I find one, I’ll post it.
You Don’t Own Me!
I don’t know what it’s like where you work, but at my current place of employment we actually try to ensure all jobs are running under the same login, unless deemed otherwise. It’s usually a domain login. It’s also good to make sure these jobs aren’t running under your personal login (or anyone else’s) to make it easier in case you (or those other people) leave for whatever reason. Otherwise, someone is going to have a lot of “fun” changing it when that someone leaves.
Below is my original query written for SQL Server 2000, of course. Yes, it will work in later versions as well. This query returns all the jobs that are NOT owned by the specified login.
select suser_sname (sysjobs.owner_sid) as 'owner' , sysjobs.name as 'job_name' , sysjobs.date_created , sysjobs.date_modified from msdb.sys.sysjobs where suser_sname (sysjobs.owner_sid) <> ‘domain\mainLogin’ order by sysjobs.name;
Running the above query will return results like this.
I haven’t found an alternate way of doing this in later versions of SQL yet. If I find one, I’ll write a post about it.
But Wait! There’s More!
Having written all that stuff about queries and so forth, if you really want a lot of information back about jobs or you don’t want to query the system tables (note: most recommend NOT to query system tables), you can execute the sp_help_job stored procedure. If you execute it without any parameters, it will return one record set about all the jobs. To narrow the results down, you can execute it with parameters such as the job_id, job_name, and so on. There are quite a few parameters. If you execute it for one particular job setting the job_name parameter, it will return record sets for the job steps, schedules, and target servers. It’s quite a bit of information unless you also set the job_aspect parameter. The available values are ‘ALL’, ‘JOB’, ‘SCHEDULES’, ‘STEPS’, and ‘TARGETS’.
So for the purposes of keeping the results down to a minimum in my script, I like to query the system tables just to give me a heads-up that there is an issue with at least one job. I also like the flexibility of returning only the columns I’m interested in. I probably should be a good girl and change my script to execute the stored procedures into a temp table or table variable and query off that instead. Even better, to possibly dump the stored procedure results in a real table for trend analysis or future troubleshooting.
In SQL Server 2005 and later, you can also execute a stored procedure called sp_help_jobactivity. It can be executed using parameters to indicate a specific SQL agent session, a specific job by id or name, or without parameters to return all job activity.
So if I execute it for my “Wait For It” job while it’s running, like this:
execute msdb.dbo.sp_help_jobactivity @job_name='Wait For It'
The results returned look like this.
So there you go! Some more food for thought in regards to monitoring jobs. As with all my posts, if I missed something or did something screwy, please let me know! I would really appreciate it.
So what’s next? Chances are fairly good I’ll start blogging about the scary-cool SQL stuff I’m learning in regards to a ScarePoint, err… I mean SharePoint project I’m involved in. As one of my co-workers recently informed me, I should have lots of good material for my blog. We’re fairly new to this SharePoint stuff so it should be interesting, and I’m sure it’ll be an adventure.
Our next SQL Server user group meeting (March 16th) should be fun. Hey, every meeting is fun! We are planning to do a lightning talk. For those who are scratching their heads right now, a lightning talk is one where each participant speaks on a topic for about 5 minutes using only pictures (or at least that’s what I’ve been told). I’m slated to speak in this one. Yikes! I think I have a topic but will run it by my fellow board members first. If they approve, then I may actually do a blog post on it in the near future as well.
Updated: To view the prior post in this series, please refer to “Growing, Growing, Gone!” regarding disk space, logs, and so on.