The first thing I do in the morning at work is to scan my email looking for nightly failed SQL Agent jobs. Not only does my inbox have all the SQL Agent Job failures, but it also has lots of successful job notifications and alert email messages. Finding the nightly job failures is just one of my morning DBA rituals. It is fairly easy to scan all the email with a number of creative search criteria. But I would like a more automated approach.
In looking for a more automated approach I developed a script that will find and report all job failures. I can use my scripted solution to run against the set of servers in our Central Management Server environment to get a single report across all of our SQL Server instances. The script I developed reads SQL Agent job history information from the msdb database.
By joining a set of job history tables together I was able to quickly produce a single failure report. The report not only listed which jobs failed, but in addition also includes the step failure information. Having this information allows for a quick triage of the job failures. Having this triage information lays the ground work for morning firefighting of nightly job failures.
Here is the script I cobbled together to help with my morning SQL Agent job failure triage task:
DECLARE @OneDayAgo INT = CONVERT(INT,DATEADD(dd,-1,getdate()),112); SELECT j.name, s.step_name, h.step_id, h.step_name, CONVERT(date,cast(h.run_date as varchar(8)),111) run_date, STUFF(STUFF(RIGHT(CAST(h.run_time + 1000000 as varchar(8)),6),3,0,':'),6,0,':') AS run_time, h.message, h.server FROM msdb.dbo.sysjobhistory h INNER JOIN msdb.dbo.sysjobs j ON h.job_id = j.job_id INNER JOIN msdb.dbo.sysjobsteps s ON j.job_id = s.job_id AND h.step_id = s.step_id WHERE h.run_status = 0 -- did the job fail AND h.run_date > @OneDayAgo ORDER BY h.instance_id DESC;