Mining SQL Trace Files Using External Tables Part 1: Getting the Timeline

Why external tables?

SQL trace files and external tables – at first glance, you wouldn’t think they’re a good match (or so I guess).
Given their structure, SQL trace files seem best suited to being processed serially, line by line, by a procedural or functional programming language. However, once you choose a profiler – be it tkprof or any of the much more sophisticated ones – you’re stuck with the output it produces; you cannot ask it just any question.
This, together with the fascination (I admit) of the preprocessor option, made me want to try out how far I’d get with this “set approach” ;-).

Of course, even though the idea itself might be intriguing, the whole thing must be practical. That means a one-time setup of the infrastructure (shell scripts, table definitions), with no work required when analysing a different trace file apart from changing the tables’ location. So no data loads: anything that works in sqlldr but not in external tables cannot be used. Secondly, keep it simple; sophistication can always be added later, in SQL, when querying the data.

Base tables

For starters, I’ll restrict myself to extracting four kinds of information contained in a trace file:
– the cursor information itself, starting with PARSING IN CURSOR #
– cursor close information, starting with CLOSE #
– the database calls (PARSE/EXEC/FETCH), and
– waits, identified by WAIT #

Each of these entities will end up in its own table; each of these tables will have the trace file’s line number (see below for why), the 15-digit cursor number, and the timestamp (tim) from the trace file.

As we can’t use sqlldr’s amazing transformation capabilities, data extraction will have to be done by short shell scripts (perl scripts, actually, in order to avoid long pipes of unix commands). For simplicity, here I’m putting scripts and trace files in the same directory, trace_dir, which I have set to /home/oracle/trace.

Cursors

Let’s start, then! First, the table CURSOR will store sql_id, sql_text etc. information:

drop table cursor;
create table cursor (
  lineno number,
  crsr number,
  crsr_length number,
  call_depth number,
  parsing_user_id number,
  command_type number,
  parsing_schema_id number,
  tim number,
  hash_value number,
  address varchar2(8),
  sqlid varchar2(13),
  sql_text varchar2(4000)
  )
  organization external
    (
    type oracle_loader
    default directory trace_dir
    access parameters
      (
        records delimited by newline
        preprocessor trace_dir:'cursor.sh'
        logfile trace_dir:'cursor.log'
        badfile trace_dir:'cursor.bad'
        fields terminated by '!' optionally enclosed by "'"
        (
          lineno char,          
          crsr char,
          crsr_length char,
          call_depth char,
          parsing_user_id char,
          command_type char,
          parsing_schema_id char,
          tim char,
          hash_value char,
          address char,
          sqlid char,
          sql_text char(4000)
        )
      )
    location (trace_dir:'')
  );  

The shell script that extracts and prepares the required information looks like this:

[oracle@node1 trace]$ cat cursor.pl
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (/^PARSING IN CURSOR/../END OF STMT/) {
    if (/^PARSING IN CURSOR/) {
      s/^PARSING IN CURSOR \#//;
      s/ [a-z]+=/!/g;
      s/\n$/!/;
      $_ = "$.!$_";
    }
    unless (/^END OF STMT/) {
      print;
    }
  }
}
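To see what the loader actually receives, here is the same transformation sketched in Python on a single PARSING IN CURSOR line. The cursor number, tim and sqlid are taken from the timeline output further below; the len, dep, uid, oct, lid, hv and ad values are invented for illustration:

```python
import re

# A PARSING IN CURSOR line as it might appear in a raw trace file
# (metadata values after the cursor number are made up):
line = ("PARSING IN CURSOR #140411772197800 len=45 dep=0 uid=0 oct=3 "
        "lid=0 tim=4624635824 hv=123456789 ad='7f8b2c3d' sqlid='31wf1v9f4ags6'")
lineno = 27

# The same substitutions cursor.pl applies:
s = re.sub(r'^PARSING IN CURSOR #', '', line)   # drop the record prefix
s = re.sub(r' [a-z]+=', '!', s)                 # turn ' key=' into the '!' field separator
record = f"{lineno}!{s}!"                       # prepend the line number, terminate with '!'

print(record)
# 27!140411772197800!45!0!0!3!0!4624635824!123456789!'7f8b2c3d'!'31wf1v9f4ags6'!
```

Note the trailing `!`: because the perl script replaces the line’s newline with the field terminator, the statement text printed on the next line ends up in the same loader record, filling the sql_text column.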

Database calls

In the same way, information about database calls (restricted to parses, executes and fetches, for now) is extracted:

drop table call;
create table call
  (
    lineno number,
    call_type varchar2(5),
    crsr number,
    cpu number,
    elapsed number,
    phys_reads number,
    cons_reads number,
    cur_reads number,
    misses number,
    num_rows number,
    call_depth number,
    opt_goal number,
    ph_value number,
    tim number
  )
  organization external
  (
    type oracle_loader
    default directory trace_dir
    access parameters
      (
        records delimited by newline
        preprocessor trace_dir:'call.sh'
        logfile trace_dir:'call.log'
        badfile trace_dir:'call.bad'
        fields terminated by ','
        (
          lineno char,
          call_type char,
          crsr char,
          cpu char,
          elapsed char,
          phys_reads char,
          cons_reads char,
          cur_reads char,
          misses char,
          num_rows char,
          call_depth char,
          opt_goal char,
          ph_value char,
          tim char 
        )
    )
    location (trace_dir:'')
  );
[oracle@node1 trace]$ cat call.pl
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (/^(PARSE|EXEC|FETCH) \#/) {
      s/^(PARSE|EXEC|FETCH) \#(.+)/$1,$2/;
      s/[a-z]+=//g;
      s/:/,/g;
      print "$.,$_";
    }
}
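The effect of the three substitutions can again be sketched in Python. The PARSE line below is invented (the metric values are arbitrary), but it has the shape a call line takes in the trace file:

```python
import re

# A PARSE/EXEC/FETCH line as it might appear in a trace file:
line = ("PARSE #140411772197800:c=0,e=123,p=0,cr=0,cu=0,mis=1,r=0,"
        "dep=0,og=1,plh=3452538079,tim=4678446272")
lineno = 40

s = re.sub(r'^(PARSE|EXEC|FETCH) #(.+)', r'\1,\2', line)  # call type becomes the first field
s = re.sub(r'[a-z]+=', '', s)                             # strip the 'c=', 'e=', ... keys
s = s.replace(':', ',')                                   # cursor number joins the CSV fields
record = f"{lineno},{s}"

print(record)
# 40,PARSE,140411772197800,0,123,0,0,0,1,0,0,1,3452538079,4678446272
```

The resulting 14 comma-separated values line up with the 14 columns of the call table.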

Waits

Next, for the waits. Parameters end up nicely in their own columns:

drop table wait;
create table wait (
  lineno number,
  crsr number,
  event varchar2(64),
  elapsed number,
  p1name varchar2(64),
  p1value number,
  p2name varchar2(64),
  p2value number,
  p3name varchar2(64),
  p3value number,
  obj number,
  tim number
  )
  organization external
  (
    type oracle_loader
    default directory trace_dir
    access parameters
      (
        records delimited by newline
        preprocessor  trace_dir:'wait.sh'
        logfile trace_dir:'wait.log'
        badfile trace_dir:'wait.bad'
        fields terminated by ',' optionally enclosed by "'"
         (
          lineno char,
          crsr char,
          event char,
          elapsed char,
          p1name char,
          p1value char,
          p2name char,
          p2value char,
          p3name char,
          p3value char,
          obj_ char,
          obj char,
          tim char 
         )
       )
    location ('')
  );  
[oracle@node1 trace]$ cat wait.pl 
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (/^WAIT \#/) {
      s/^WAIT \#//;
      s/(\d+): nam=(.+) ela= (\d+) ([^=]+)=(\d+) ([^=]+)=(\d+) ([^=]+)=(\d+) ([^=]+)=(-?\d+) tim=(\d+)/$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12/;
      print "$.,$_";
    }
}
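Since the WAIT regex is the gnarliest of the bunch, here is a Python sketch of what it captures, on an invented but realistically shaped wait line (only the cursor number is taken from the output below):

```python
import re

# A WAIT line as it might appear in the trace; obj# may be negative,
# which is why the pattern allows an optional minus sign there:
line = ("WAIT #140411772197800: nam='db file sequential read' ela= 1166 "
        "file#=6 block#=170 blocks=1 obj#=12345 tim=4750303279")
lineno = 80

s = re.sub(r'^WAIT #', '', line)
# The same capture groups wait.pl uses: cursor number, event name, elapsed
# time, three name/value parameter pairs, the obj# pair, and tim:
m = re.match(r"(\d+): nam=(.+) ela= (\d+) ([^=]+)=(\d+) ([^=]+)=(\d+) "
             r"([^=]+)=(\d+) ([^=]+)=(-?\d+) tim=(\d+)", s)
record = f"{lineno}," + ",".join(m.groups())

print(record)
# 80,140411772197800,'db file sequential read',1166,file#,6,block#,170,blocks,1,obj#,12345,4750303279
```

The quotes around the event name survive into the output, which is why the table definition specifies optionally enclosed by "'". The field holding the literal name "obj#" maps to the extra obj_ field in the access parameter list, which has no matching table column and is therefore discarded.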

Cursor closes

Finally (for today), we have the table storing cursor close information:

drop table cursor_close;
create table cursor_close (
  lineno number,
  crsr number,
  cpu number,
  elapsed number,
  call_depth number,
  type number,
  tim number )
  organization external
  (
    type oracle_loader
    default directory trace_dir
    access parameters
      (
        records delimited by newline
        preprocessor  trace_dir:'cursor_close.sh'
        logfile trace_dir:'cursor_close.log'
        badfile trace_dir:'cursor_close.bad'
        fields terminated by ',' 
         (
           lineno char,
           crsr char,
           cpu char,
           elapsed char,
           call_depth char,
           type char,
           tim char
         )  
       )
    location ('')
  ); 
[oracle@node1 trace]$ cat cursor_close.pl 
#!/usr/bin/perl
use strict;
use warnings;
while (<>) {
  if (/^CLOSE \#/) {
      s/^CLOSE \#//;
      s/[a-z]+=//g;
      s/:/,/g;
      print "$.,$_";
    }
}

To actually select a trace file for analysis, we just alter the external tables’ locations, which can easily be accomplished with the following procedure:

create or replace procedure changeloc (newloc in varchar2)
is
  tablist sys.odcivarchar2list := sys.odcivarchar2list('cursor','cursor_close','call','wait');
begin
  for i in tablist.first .. tablist.last loop
    execute immediate 'alter table ' || tablist(i) || ' location (''' || newloc || ''')';
  end loop;
end;
/

The Timeline – Step 1

With the raw data extracted, the essential step is organizing it so it can be queried without too much thinking!
First, let’s get an overview of how cursors are being opened and closed:

create or replace view timeline_raw as
  select * from (
    select lineno, 'OPEN' call_type, sqlid, crsr, tim, call_depth, null close_type from cursor
    union all 
    select lineno, 'PARSE', null, crsr, tim, call_depth, null from call where call_type='PARSE'
    union all
    select lineno, 'CLOSE', null, crsr, tim, call_depth, type from cursor_close)
  order by lineno;

For example:

SQL> col crsr for 999999999999999
SQL> col tim for 99999999999
SQL> select * from timeline_raw fetch next 30 rows only;

    LINENO CALL_ SQLID			   CRSR        TIM CALL_DEPTH CLOSE_TYPE
---------- ----- ------------- ---------------- ---------- ---------- ----------
	27 OPEN  31wf1v9f4ags6	140411772197800 4624635824	    0
	35 CLOSE		140411772197800 4678444836	    0	       0
	37 OPEN  g9p1d48c5dsuq	140411772197800 4678446274	    0
	40 PARSE		140411772197800 4678446272	    0
	51 CLOSE		140411772197800 4696606742	    0	       0
	53 OPEN  2hy64qgq10kb8	140411772197800 4696607776	    0
	56 PARSE		140411772197800 4696607775	    0
	67 CLOSE		140411772197800 4750298461	    0	       0
	69 OPEN  5dqx2k1st1uxa	140411772197800 4750298879	    0
	72 PARSE		140411772197800 4750298878	    0
	76 OPEN  1p5grz1gs7fjq	140411771983304 4750303279	    1
	99 CLOSE		140411771983304 4750343695	    1	       3
       121 CLOSE		140411771983304 4750369440	    1	       3
       142 CLOSE		140411771983304 4750386565	    1	       3
       144 OPEN  1mjd9xp80vuqa	140411772838712 4750386665	    1
       158 CLOSE		140411772838712 4750427684	    1	       3
       160 OPEN  3un99a0zwp4vd	140411772481520 4750428600	    1
       163 PARSE		140411772481520 4750428599	    1
       187 CLOSE		140411772481520 4750472154	    1	       0
       189 OPEN  8swypbbr0m372	140411772481520 4750472732	    1
       192 PARSE		140411772481520 4750472732	    1
       205 CLOSE		140411772481520 4750556542	    1	       0
       215 OPEN  616m6uhpa2usu	140411772181736 4750564487	    2
       218 PARSE		140411772181736 4750564485	    2
       223 OPEN  96g93hntrzjtr	140411772237656 4750570678	    3
       226 PARSE		140411772237656 4750570676	    3
       243 CLOSE		140411772237656 4750598219	    3	       3
       245 OPEN  7u49y06aqxg1s	140411766762152 4750599062	    3
       248 PARSE		140411766762152 4750599061	    3
       273 CLOSE		140411766762152 4750615749	    3	       3

30 rows selected.

The important thing here is having the line numbers, which were added by the preprocessor scripts – the tim values alone are not enough! (When you write a trace file parser in a procedural or functional language and process the file line by line anyway, you probably don’t give this much thought – but using SQL makes you reflect on it.)
Compare, for example, these lines from the output:

160 OPEN  3un99a0zwp4vd	140411772481520 4750428600	    1
163 PARSE		140411772481520 4750428599	    1

If we had just the tim values (next-to-last column), the PARSE line would appear before the OPEN, which would make it impossible to infer the correct sql_id for it (see below).
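A quick Python sketch, using exactly the two rows above, shows the difference between the two orderings:

```python
# The two rows above, with their line numbers and tim values
# (taken from the timeline_raw output):
rows = [
    {'lineno': 160, 'tim': 4750428600, 'call_type': 'OPEN'},
    {'lineno': 163, 'tim': 4750428599, 'call_type': 'PARSE'},
]

by_tim    = [r['call_type'] for r in sorted(rows, key=lambda r: r['tim'])]
by_lineno = [r['call_type'] for r in sorted(rows, key=lambda r: r['lineno'])]

print(by_tim)      # ['PARSE', 'OPEN']  -- the PARSE seems to precede its OPEN
print(by_lineno)   # ['OPEN', 'PARSE']  -- trace-file order preserved
```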

The Timeline – Step 2

Now, while this reflects the parse flow, it doesn’t show for which sqlid a cursor is being closed – there simply is no sqlid in the cursor close lines. To fill in the missing information, we have to match cursor open and cursor close lines – which is not completely straightforward at first glance, because the 15-digit cursor numbers may be reused as soon as the corresponding cursors are closed.
Analytic functions (and the line number) to the rescue ;-):

create or replace view timeline (lineno, call_type, sqlid, crsr,tim, call_depth, close_type) 
 as select
  lineno,
  call_type,
  coalesce(sqlid, lag(sqlid) ignore nulls over (partition by crsr order by lineno)) sqlid,
  crsr,
  tim,
  call_depth,
  close_type
 from timeline_raw order by lineno;
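If the lag(...) ignore nulls construct looks opaque, here is a procedural sketch in Python of what it computes: rows are processed in lineno order, and each row inherits the most recent non-null sqlid recorded for its cursor number. The rows are abbreviated from the timeline_raw output above; cursor-number reuse is handled naturally because a new OPEN simply overwrites the mapping.

```python
# (lineno, call_type, sqlid, crsr) rows, abbreviated from timeline_raw:
rows = [
    (160, 'OPEN',  '3un99a0zwp4vd', 140411772481520),
    (163, 'PARSE', None,            140411772481520),
    (187, 'CLOSE', None,            140411772481520),
    (189, 'OPEN',  '8swypbbr0m372', 140411772481520),  # cursor number reused
    (192, 'PARSE', None,            140411772481520),
]

last_sqlid = {}   # crsr -> most recently seen sqlid
timeline = []
for lineno, call_type, sqlid, crsr in sorted(rows):   # lineno order
    if sqlid is not None:
        last_sqlid[crsr] = sqlid    # a new OPEN overwrites the old mapping
    timeline.append((lineno, call_type, last_sqlid.get(crsr), crsr))

for row in timeline:
    print(row)
# the PARSE at line 192 is attributed to 8swypbbr0m372, not 3un99a0zwp4vd
```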

Which gives us the complete timeline of when cursors were opened and closed:

SQL> select * from timeline fetch next 30 rows only;


    LINENO CALL_ SQLID			   CRSR        TIM CALL_DEPTH CLOSE_TYPE
---------- ----- ------------- ---------------- ---------- ---------- ----------
	27 OPEN  31wf1v9f4ags6	140411772197800 4624635824	    0
	35 CLOSE 31wf1v9f4ags6	140411772197800 4678444836	    0	       0
	37 OPEN  g9p1d48c5dsuq	140411772197800 4678446274	    0
	40 PARSE g9p1d48c5dsuq	140411772197800 4678446272	    0
	51 CLOSE g9p1d48c5dsuq	140411772197800 4696606742	    0	       0
	53 OPEN  2hy64qgq10kb8	140411772197800 4696607776	    0
	56 PARSE 2hy64qgq10kb8	140411772197800 4696607775	    0
	67 CLOSE 2hy64qgq10kb8	140411772197800 4750298461	    0	       0
	69 OPEN  5dqx2k1st1uxa	140411772197800 4750298879	    0
	72 PARSE 5dqx2k1st1uxa	140411772197800 4750298878	    0
	76 OPEN  1p5grz1gs7fjq	140411771983304 4750303279	    1
	99 CLOSE 1p5grz1gs7fjq	140411771983304 4750343695	    1	       3
       121 CLOSE 1p5grz1gs7fjq	140411771983304 4750369440	    1	       3
       142 CLOSE 1p5grz1gs7fjq	140411771983304 4750386565	    1	       3
       144 OPEN  1mjd9xp80vuqa	140411772838712 4750386665	    1
       158 CLOSE 1mjd9xp80vuqa	140411772838712 4750427684	    1	       3
       160 OPEN  3un99a0zwp4vd	140411772481520 4750428600	    1
       163 PARSE 3un99a0zwp4vd	140411772481520 4750428599	    1
       187 CLOSE 3un99a0zwp4vd	140411772481520 4750472154	    1	       0
       189 OPEN  8swypbbr0m372	140411772481520 4750472732	    1
       192 PARSE 8swypbbr0m372	140411772481520 4750472732	    1
       205 CLOSE 8swypbbr0m372	140411772481520 4750556542	    1	       0
       215 OPEN  616m6uhpa2usu	140411772181736 4750564487	    2
       218 PARSE 616m6uhpa2usu	140411772181736 4750564485	    2
       223 OPEN  96g93hntrzjtr	140411772237656 4750570678	    3
       226 PARSE 96g93hntrzjtr	140411772237656 4750570676	    3
       243 CLOSE 96g93hntrzjtr	140411772237656 4750598219	    3	       3
       245 OPEN  7u49y06aqxg1s	140411766762152 4750599062	    3
       248 PARSE 7u49y06aqxg1s	140411766762152 4750599061	    3
       273 CLOSE 7u49y06aqxg1s	140411766762152 4750615749	    3	       3

Sanity Checks

Let’s do some sanity checks. The example trace file was recorded for issuing the statement
select owner, object_name from dba_objects where object_id between :low and :high
three times in a row, with different binds each time.
This is how that statement appears in the view:

SQL> select * from timeline where sqlid='5dqx2k1st1uxa';

    LINENO CALL_ SQLID			   CRSR        TIM CALL_DEPTH CLOSE_TYPE
---------- ----- ------------- ---------------- ---------- ---------- ----------
	69 OPEN  5dqx2k1st1uxa	140411772197800 4750298879	    0
	72 PARSE 5dqx2k1st1uxa	140411772197800 4750298878	    0
     10593 CLOSE 5dqx2k1st1uxa	140411772197800 4765329851	    0	       0
     10635 OPEN  5dqx2k1st1uxa	140411772197800 4771858794	    0
     10638 PARSE 5dqx2k1st1uxa	140411772197800 4771858792	    0
     10698 CLOSE 5dqx2k1st1uxa	140411772197800 4777876691	    0	       0
     10734 OPEN  5dqx2k1st1uxa	140411772197800 4784680278	    0
     10737 PARSE 5dqx2k1st1uxa	140411772197800 4784680276	    0
     10797 CLOSE 5dqx2k1st1uxa	140411772197800 4796212667	    0	       1

So we see the cursor was parsed three times; twice it was hard closed (close_type 0) and the third time it was put into the cursor cache (close_type 1), which fits the official documentation.

How about the statement that, according to tkprof, was parsed most frequently in the trace file?
tkprof tells us it was parsed, executed and fetched 47 times each:

SQL ID: 04kug40zbu4dm Plan Hash: 4006480256

select policy#, action# 
from
 aud_object_opt$ where object# = :1 and type = 2


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse       47      0.00       0.00          0          0          0           0
Execute     47      0.00       0.01          0          0          0           0
Fetch       47      0.00       0.00          0        423          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total      141      0.01       0.02          0        423          0           0

Misses in library cache during parse: 2
Misses in library cache during execute: 2
SQL> select call_type, close_type, count(1) from timeline where sqlid='04kug40zbu4dm' group by call_type, close_type order by decode(call_type, 'OPEN', 1, 'PARSE', 2, 'CLOSE', 3), 2;

CALL_ CLOSE_TYPE   COUNT(1)
----- ---------- ----------
OPEN			  6
PARSE			 47
CLOSE	       0	  4
CLOSE	       1	  2
CLOSE	       3	 41

So we see the 47 parses confirmed; in addition, the cursor was hard closed 4 times (close_type 0), put “afresh” into the cursor cache 2 times (close_type 1), and put back there the remaining 41 times (close_type 3) 😉

As a last check, let’s look at a statement for which tkprof reports 0 parses:

SQL ID: 1p5grz1gs7fjq Plan Hash: 813480514

select obj#,type#,ctime,mtime,stime, status, dataobj#, flags, oid$, spare1, 
  spare2, spare3, signature, spare7, spare8, spare9 
from
 obj$ where owner#=:1 and name=:2 and namespace=:3 and remoteowner is null 
  and linkname is null and subname is null


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        0      0.00       0.00          0          0          0           0
Execute      9      0.00       0.00          0          0          0           0
Fetch        9      0.00       0.11          6         35          0           8
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       18      0.00       0.12          6         35          0           8

Misses in library cache during parse: 0
Misses in library cache during execute: 2

Looking at the timeline:

SQL> select * from timeline where sqlid='1p5grz1gs7fjq';

    LINENO CALL_ SQLID			   CRSR        TIM CALL_DEPTH CLOSE_TYPE
---------- ----- ------------- ---------------- ---------- ---------- ----------
	76 OPEN  1p5grz1gs7fjq	140411771983304 4750303279	    1
	99 CLOSE 1p5grz1gs7fjq	140411771983304 4750343695	    1	       3
       121 CLOSE 1p5grz1gs7fjq	140411771983304 4750369440	    1	       3
       142 CLOSE 1p5grz1gs7fjq	140411771983304 4750386565	    1	       3
      5975 CLOSE 1p5grz1gs7fjq	140411771983304 4752345230	    2	       3
      6167 CLOSE 1p5grz1gs7fjq	140411771983304 4752374262	    2	       3
      6353 CLOSE 1p5grz1gs7fjq	140411771983304 4752400269	    3	       3
      7134 CLOSE 1p5grz1gs7fjq	140411771983304 4753627638	    3	       3
      7505 CLOSE 1p5grz1gs7fjq	140411771983304 4753804698	    3	       3
      7969 CLOSE 1p5grz1gs7fjq	140411771983304 4754168121	    3	       3

… we see that in this case, the cursor was put back into the cursor cache the very first time it was closed.
Still, we saw above that even a “softer soft parse” – one that already finds the cursor in the cache – should make a PARSE line appear in the trace file. I suspect that with the sql trace information alone, it might not be possible to give a satisfying explanation here.

Coming next

Anyway, we’ve seen how to get the flow of the sql statements contained in the trace file, which opens up an enormous space for further explorations :-).
In the next installments, we’ll see how to make use of the various types of information contained in a trace file.

