Monitoring execution plan changes

Times being a bit busy for me at the moment, unfortunately I have to postpone any “larger” projects, blogging-wise 😉 … but not wanting to lose the habit, I’ve thought I’d publish a little script I’ve written to track changes in execution plans.

The intention is to alert DBAs / developers to SQL statements that due to plan changes have experienced large variation in execution time.
When run, as I’d suggest, every day, the script quickly points out statements whose elapsed time at the most recent point of measurement is much different (in either direction) from elapsed times encountered in earlier snapshots that form part of the comparison moving window.

The moving window size, as well as the threshold (what kind of difference should be considered a big difference?) are parameterized. Different environments / applications will have different optimal values here, plus you will have to find a configuration where the amount of information (number of changed plans) can be handled by the DBAs 🙂

So, here’s the script.

set sqlblanklines on
set lines 800
set pages 1000

/* statements captured during last  days */
with samples as 
 (select *
  from dba_hist_sqlstat st
  join dba_hist_snapshot sn
  using (snap_id, instance_number) 
  where parsing_schema_name = '&schema_name'
  and module  'DBMS_SCHEDULER' -- no sql tuning task
  and executions_delta > 0
  and begin_interval_time between sysdate - '&num_days' and sysdate),

/* just statements that had at least 2 different plans during that time */
  sql_ids as 
   (select sql_id,
    count(distinct plan_hash_value) plancount
    from samples
    group by sql_id
    having count(distinct plan_hash_value) > 2),

/* per combination of sql_id and plan_hash_value, elapsed times per execution */
    plan_stats as 
     (select sql_id,
      plan_hash_value,
      count(snap_id) snap_count,
      max(end_interval_time) last_seen,
      sum(executions_delta) total_execs,
      sum(elapsed_time_delta) / sum(executions_delta) elapsed_per_exec_thisplan
      from sql_ids
      join samples
      using (sql_id)
      group by sql_id, plan_hash_value),

/* how much different is the elapsed time most recently encountered from other elapsed times in the measurement interval? */
      elapsed_time_diffs as 
       (select p.*,
        elapsed_per_exec_thisplan - first_value(elapsed_per_exec_thisplan)
          over(partition by sql_id order by last_seen desc) elapsed_per_exec_diff,
        (elapsed_per_exec_thisplan - first_value(elapsed_per_exec_thisplan)
          over(partition by sql_id order by last_seen desc)) / elapsed_per_exec_thisplan elapsed_per_exec_diff_ratio
        from plan_stats p),

/* consider just statements for which the difference is bigger than our configured threshold */
        impacted_sql_ids as 
         (select *
          from elapsed_time_diffs
          where abs(elapsed_per_exec_diff_ratio) > &threshold),

/* for those statements, get all required information */
          all_info as
           (select sql_id,
            plan_hash_value,
            snap_count,
            last_seen,
            round(elapsed_per_exec_thisplan / 1e6, 2) elapsed_per_exec_thisplan,
            round(elapsed_per_exec_diff / 1e6, 2) elapsed_per_exec_diff,
            round(100 * elapsed_per_exec_diff_ratio, 2) elapsed_per_exec_diff_pct,
            round(max(abs(elapsed_per_exec_diff_ratio))
              over(partition by sql_id), 2) * 100 max_abs_diff,
            round(max(elapsed_per_exec_diff_ratio) over(partition by sql_id), 2) * 100 max_diff,
            'select * from table(dbms_xplan.display_awr(sql_id=>''' || sql_id ||
            ''', plan_hash_value=>' || plan_hash_value || '));' xplan
            from elapsed_time_diffs
            where sql_id in (select sql_id from impacted_sql_ids))

/* format the output */
            select case sign(elapsed_per_exec_diff)
              when 0 then max_abs_diff ||
                case when max_abs_diff  max_diff then
                  '% worse than another plan found during the last 7 days'
                else
                  '% better than the next best plan during the last 7 days'
                end
             end status,
             a.sql_id,
             a.snap_count,
             to_char(last_seen, 'yy-mm-dd hh24:mi:ss') last_seen,
             to_char(a.elapsed_per_exec_thisplan, '999999.99') elapsed_per_exec_thisplan,
             to_char(a.elapsed_per_exec_diff, '999999.99') elapsed_per_exec_diff,
             to_char(a.elapsed_per_exec_diff_pct, '999999.99') elapsed_per_exec_diff_pct,
             xplan
             from all_info a
             order by sql_id, last_seen desc;

And this is how it might look (“pivoting” the output to make it readable here, and showing one statement only):

status                       	9073% worse than another plan found during the last 7 days 
sql_id   			698s47bt259sk
snap_count 			5
last_seen       		14-09-22  06:30:07
elapsed_per_exec_thisplan	315.74 
elapsed_per_exec_diff 		.00
elapsed_per_exec_diff_pct  	.00
xplan       			select * from table (dbms_xplan.dislay_awr(sql_id=>'698s47bt259sk', plan_hash_value=>3533318143));
                                                                                        
sql_id   			698s47bt259sk
snap_count 			5
last_seen       		14-09-22  06:30:07
elapsed_per_exec_thisplan	3.44 
elapsed_per_exec_diff 		-312.29
elapsed_per_exec_diff_pct  	-9072.77
xplan       			select * from table (dbms_xplan.dislay_awr(sql_id=>'698s47bt259sk', plan_hash_value=>3837578073));

Of course, the script would have to be adapted if you had to deal with statements using different literals on each execution (which was not the case for my usecase this time).

Advertisements