Deadlock parser – parsing lmd0 trace files

Does “your” application produce deadlocks often? Or to phrase it more diplomatically, a little bit … well, too often, from time to time at least?
If so, you might want to provide developers with useful debug information – what sql was executed, where from, etc.
However, in RAC, the lmd0 trace files, which log the deadlocks, don’t give away information lightly – in 11g at least (things are different in 12c, and to be precise, I am talking about 11.2.0.3 here only, as I do not currently have access to an 11.2.0.4 instance).

Basically, what you need is the lmd0 trace files from all instances, and then, to combine information from three sections of data:

(1) the deadlock graph (“Wait-For-Graph”) , which is written in one instance only (per deadlock), looks like this …

Global Wait-For-Graph(WFG) at ddTS[0.b650] :
BLOCKED 0x74a6d83f0 5 wq 2 cvtops x1 TX 0x3e0003.0x7b225(ext 0x6,0x0)[9A000-0004-0000001C] inst 4 
BLOCKER 0x74ede07b8 5 wq 1 cvtops x28 TX 0x3e0003.0x7b225(ext 0x6,0x0)[AC000-0003-000001B4] inst 3 
BLOCKED 0x74edebdf8 5 wq 2 cvtops x1 TX 0xd00020.0x41ea(ext 0x39,0x0)[AC000-0003-000001B4] inst 3 
BLOCKER 0x6fbc1f388 5 wq 1 cvtops x28 TX 0xd00020.0x41ea(ext 0x39,0x0)[9A000-0004-0000001C] inst 4 

*** 2014-03-12 10:36:49.760
* Cancel deadlock victim lockp 0x74a6d83f0 

… and allows linking to – via the resource name (here, e.g., “TX 0x3e0003.0x7b225″) – (2) a section containing resource information (including the granted queue and the convert queue, at the bottom) …

Global blockers dump start:---------------------------------
DUMP LOCAL BLOCKER/HOLDER: block level 5 res [0x3e0003][0x7b225],[TX][ext 0x6,0x0]
----------resource 0x75dddead8----------------------
resname       : [0x3e0003][0x7b225],[TX][ext 0x6,0x0]
hash mask     : x3
Local inst    : 4
dir_inst      : 3
master_inst   : 3
hv idx        : 124
hv last r.inc : 65
current inc   : 65
hv status     : 0
hv master     : 3
open options  : dd 
Held mode     : KJUSERNL
Cvt mode      : KJUSEREX
Next Cvt mode : KJUSERNL
msg_seq       : 0x1
res_seq       : 23
grant_bits    : KJUSERNL 
grant mode    : KJUSERNL  KJUSERCR  KJUSERCW  KJUSERPR  KJUSERPW  KJUSEREX
count         : 1         0         0         0         0         0
val_state     : KJUSERVS_NOVALUE
valblk        : 0xf0655697ff7f00000000000000000000 .eV
access_inst   : 3
vbreq_state   : 0
state         : x8
resp          : 0x75dddead8
On Scan_q?    : N
Total accesses: 509
Imm.  accesses: 440
Granted_locks : 0 
Cvting_locks  : 1 
value_block:  f0 65 56 97 ff 7f 00 00 00 00 00 00 00 00 00 00
GRANTED_Q :
CONVERT_Q: 
lp 0x74a6d83f0 gl KJUSERNL rl KJUSEREX rp 0x75dddead8 [0x3e0003][0x7b225],[TX][ext 0x6,0x0]
  master 3 gl owner 0x75d451a00 possible pid 11039 xid 9A000-0004-0000001C bast 0 rseq 23 mseq 0 history 0x495149da
  convert opt KJUSERGETVALUE  

… and, via the lock address (here, e.g., “0x74a6d83f0″), to (3) a section containing lock details, such as grant and request level, the sid, the OS user and oracle username, as well as the client machine and, most interestingly for the application developer, the SQL:

----------enqueue 0x74a6d83f0------------------------
lock version     : 14509
Owner inst       : 4
grant_level      : KJUSERNL
req_level        : KJUSEREX
bast_level       : KJUSERNL
notify_func      : (nil)
resp             : 0x75dddead8
procp            : 0x75d9c7eb8
pid              : 11039
proc version     : 36
oprocp           : (nil)
opid             : 11039
group lock owner : 0x75d451a00
possible pid     : 11039
xid              : 9A000-0004-0000001C
dd_time          : 10.0 secs
dd_count         : 1
timeout          : 0.0 secs
On_timer_q?      : N
On_dd_q?         : Y
lock_state       : OPENING CONVERTING 
ast_flag         : 0x0
Open Options     : KJUSERDEADLOCK 
Convert options  : KJUSERGETVALUE 
History          : 0x495149da
Msg_Seq          : 0x0
res_seq          : 23
valblk           : 0x566a560a000000000000646100000000 VjVda
user session for deadlock lock 0x74a6d83f0
  sid: 1686 ser: 50887 audsid: 5210592 user: 110/APPUSER
    flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x40009) -/-/INC
  pid: 154 O/S info: user: oracle, term: UNKNOWN, ospid: 11039
    image: oracle@inst3
  client details:
    O/S info: user: someosuser, term: unknown, ospid: 1234
    machine: server1.our-dmz.de program: JDBC Thin Client
  current SQL:
  update tab1 set the_user='somename' where the_type='sometype' and the_date=TO_TIMESTAMP ('2014-02-27 00:00:00.0', 'YYYY-MM-DD HH24:MI:SS.F
F')
DUMP LOCAL BLOCKER: initiate state dump for DEADLOCK
  possible owner[154.11039] on resource TX-003E0003-0007B225

Now surely you can’t just hand the developers these lmd0 traces, so it seemed to make sense to write a parser for them, which outputs the type of information a developer would like to see (I guess ;-)). (I’m stressing the word developer here, because a DBA might be interested in additional information, like the grant and convert queues, information that is currently parsed, but not printed by the parser.)

The output from the parser looks like this:

------------------------------------------
***              Deadlocks             ***
------------------------------------------
First in tracefiles:   2014-03-10 12:16:17
Last in tracefiles:    2014-04-01 03:18:34
Deadlocks encountered: 22
------------------------------------------

Deadlock at: 2014-03-12 10:36:49
--------------------------------
Blockers:
  Address: 74ede07b8    [Resource: 0x3e0003-0x7b225 TX]
  Session id: 677
  User: appuser
  Machine: server1.our-dmz.de
  Current SQL: update tab1 set the_user='somename' where the_type='sometype' and the_date=TO_TIMESTAMP ('2014-02-27 00:00:00.0', 'YYYY-MM-DD HH24:MI:SS.F
F')

  Address: 6fbc1f388    [Resource: 0xd00020-0x41ea TX]
  Session id: 1686
  User: appuser
  Machine: server1.our-dmz.de
  Current SQL: update tab2 set the_info='someinfo' where the_id=5763836

Blocked:
  Address: 74a6d83f0    [Resource: 0x3e0003-0x7b225 TX]
  Session id: 1686
  User: appuser
  Machine: server2.our-dmz.de
  Current SQL: update tab2 set the_info='someinfo' where the_id=5763836

  Address: 74edebdf8    [Resource: 0xd00020-0x41ea TX]
  Session id: 677
  User: appuser
  Machine: server1.our-dmz.de
  Current SQL: update tab1 set the_user='somename' where the_type='sometype' and the_date=TO_TIMESTAMP ('2014-02-27 00:00:00.0', 'YYYY-MM-DD HH24:MI:SS.F
F')

To be honest, this is an uncommonly neat example, at least judging from the “real life” tracefiles I’ve been using for testing purposes. For other deadlocks, the output can be much more difficult to understand.
For one thing, detail information (like SQL, machine …) might be missing for one or several enqueues. Second, especially when TM locks are involved, it might be difficult to understand what’s really happened.
This is where knowledge about the application should help, and the current output generated by the parser is, in fact, intended to be used by developers who would match it to the application server logs.

On the other hand, the parser could be extended to display information that is relevant to the DBA, or in fact produce different output dependent on command line flags. (Currently I am, for example, parsing the grant and convert queues, but not displaying that to the user.)

If anyone would like to try the parser, which is written in Haskell, the source is on GitHub (https://github.com/skeydan/deadlockparser/blob/master/src/Main.hs). I can also provide a compiled version – so you don’t need to be running Haskell, which most probably you aren’t ;-) – if you’re using 64-bit Linux.

For more information on deadlocks, I think the must-reads (as always, of course) are the “deadlock series” by Jonathan Lewis, especially the classic

http://jonathanlewis.wordpress.com/2013/02/22/deadlock-detection/

that points out the importance of how the application handles the situation (so as a DBA, when you see all those deadlocks occurring, you might well ask the developers how the application handles the ORA-00060, instead of just leaning back knowing this does not cause a database problem … ;-) )

About these ads

2 thoughts on “Deadlock parser – parsing lmd0 trace files

  1. I would think this parse quite useful. Tracing deadlock on RAC when compared to single instance Oracle, is pretty much broken.
    I will be requesting a compiled version of this. :)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s