Finding the blocking SQL in a lock wait

July 25th, 2014

One of my pet peeves on Oracle is the inability to find out what SQL took out a lock that another user is waiting. It’s easy to find the waiting user and their SQL with v$session by looking at v$session.event where the event is an “enqueue” (v8 and v9) or “enq: TX – row lock contention” and then looking up their SQL via the v$session.sql_hash_value which joins to v$sql.hash_value for the v$sql.sql_text.

So far so good and easy.
Second step of finding the blocker is really easy starting in 10g because Oracle has a new field v$session.blocking_session which can be joined back to v$session.sid to find information on that user.
The rub is that there is no way to find the SQL text that the blocking session ran that took out the original blocking lock.
For the 2 day course I teach on Active Session History (ASH) and Oracle wait events, I wanted to show students how to actually get the blocking SQL text if they really had to.
I went as far as looking at log miner to try and get the blocking SQL text and this works sometimes and sometimes it doesn’t. At that point I gave up, knowing the next step was dumping the redo logs which was more research than I felt like doing at the time.Luckily someone has picked up the torch – Doug Burns!
On the Oaktable email list I shared my research with Doug and Doug took it even farther and posted it on his blog:
Long story short, the best way to try and see what changed (when there was a change and not a “select for update”) to cause the lock is to use flashback information. For example if or contention table was TEST_TAB and our field that we knew was modified “VAL1″ then we could try to find what it was changed from:

Session 1

update test_tab set val1=’aa’ where id=1;

Session 2

update test_tab set val1=’aaa’ where id=1;

Blocking info from ASH where wait is enq: TX – row lock contention

select
      distinct
      lpad(ltrim(to_char(p2,'XXXXXX')),6,'0')||'00'||
      ltrim(to_char(mod(p3,256),'XX'))||
      ltrim(to_char(trunc(p3/256),'XX'))||'0000'
      block_xid,
      to_char(p2,'XXXXXXXX') p2hex,
      to_char(p3,'XXXXXXXX') p3hex,
      trunc(p2/65536) usn,
      mod(p2,65536) slot,
      p3 sqn, xid wait_xid
from v$active_session_history
where event like 'enq: T%'
and sample_time > sysdate - &v_minutes/(60*24)
;

BLOCK_XID	      P2HEX     P3HEX	    USN         SLOT      SQN  WAIT_XID
----------------  --------- --------- ---------- ---------- ---------- ----------------
0A0001007264000       A0001      6472	      10          1      25714

Data from flashback, after session 1 commits (before the commit there is no data returned)

SELECT        VERSIONS_XID
       ,      VERSIONS_STARTTIME
       ,      VERSIONS_ENDTIME
       ,      VERSIONS_STARTSCN
       ,      VERSIONS_ENDSCN
       ,      VERSIONS_OPERATION
       ,      id
       ,      val1
       FROM   TEST_TAB 
              VERSIONS BETWEEN
              TIMESTAMP MINVALUE AND MAXVALUE
     where VERSIONS_XID=HEXTORAW('0A0001007264000')
      ORDER  BY  VERSIONS_STARTTIME
/
VERSIONS_XID	 VERSIONS_STARTTIME    ENDTIME STARTSCN  ENDSCN  V          VAL1
---------------- --------------------- ------- -------- -------  - --------------
0A00010072640000 15-OCT-13 06.46.30 PM         17042888	         U            aa


Now that’s not the blocking SQL but at least you can see what the value of the field was that the blocker changed it to, so you can guess to some degree what the actual SQL was. Not great, but better than nothing.


Uncategorized

  1. Trackbacks

  2. No trackbacks yet.
  1. Comments

  2. Noons
    July 26th, 2014 at 09:06 | #1

    I find the view dba_waiters very useful for this.
    It tells me which sessions, then with OEM it’s childplay to get the SQL.

  3. July 27th, 2014 at 10:10 | #2

    Noons,

    It’s only easy if you’re lucky and the application makes it obvious.

    It’s fairly trivial to show that your easy solution can give you no answer at all ( http://jonathanlewis.wordpress.com/2009/04/19/locking-sql/ ), and only slightly less trivial to give the wrong answer.

    Kyle missed one last posting from Doug which quotes Graham Wood’s explanation why it s not inherently possible to give a guarantee that you’ve found the blocking statement: http://oracledoug.com/serendipity/index.php?/archives/1495-Diagnosing-Locking-Problems-using-ASHLogMiner-The-End.html

  4. Noons
  5. July 29th, 2014 at 08:52 | #4

    Noons,

    I have no problem with your approach – and if your initial response to Kyle’s post had been more like: “I don’t bother looking for the specific SQL I just kill the session so all I need is dba_waiters” then I wouldn’t have felt the need to comment.

  6. Noons
    July 30th, 2014 at 03:06 | #5

    Actually, I do get the sql via OEM. The “Blocking sessions” screen in my case lets me track it. Mostly because in the particular case of the application in question, it was designed and written in java by someone who doesn’t understand set-at-a-time processing. With the result that 999 times out of 1000 the SQL that causes the lock is the same as the one waiting andwhich is shown by OEM: same jdbc program, different connections.
    Had this been our DW where most of the stuff is written in-house in a real language (not java), it’d be slightly more difficult as you point out.
    Good thing though that the guys writing our DW PL/SQL know what they are doing! And if they don’t, I can just stand-up and look at them with a stern face! :)

  7. July 30th, 2014 at 10:26 | #6

    Noons,

    Thanks for the follow-up – it emphasises, but makes more concrete, my comment from earlier on in the thread:
    “It’s only easy if you’re lucky and the application makes it obvious.”

    (Possibly the “and” should be an “or”, or even an “and/or”; and I suspect that many people get lucky enough most of the time – possibly without realising it – because it is often sufficient to be just a little bit lucky.)

  8. July 31st, 2014 at 02:46 | #7

    Absolutely. However, what I wish should happen is Oracle should pull their heads off the “aaS” marketing nonsense – which achieves noting for current customers – and actually deliver features to stop users having to rely on luck so many times.

    Using this example (I’m likely using the wrong terminology, but this springs to mind): an inbuilt, reliable mechanism to display which SQL is taking part in a lock, both locking and locked side?

    Seems to me this would be eminently more useful and sensible than expecting folks to pour through 3rd party blogs to find scripts to do what should be inbuilt into the product in the first place. And less “bad dba tut-tuting” when they find that the vast majority of said “bad dbas” are actually VERY busy and are not paid to browse blogs, forums, or “communities”.

    The list of small enhancements like this that would truly be useful and which Oracle has totally ignored over many, many years is incredibly long. I mentioned some of them in my blog many years ago and to my surprise a few have been answered in recent releases. But they are a very small portion of the sum total I know of.

    Unfortunately instead of truly useful improvements we keep getting even more of the “aaS” nonsense. And now there is even religion thrown in…

    But that’s got nothing to do with this!
    Thanks heaps for the (as usual) immensely useful script pointers.

  9. khailey
    July 31st, 2014 at 19:58 | #8

    @Noons, @Jonathan : thanks for all the lively discussion. It’s frustrating all around that technology advances that make sense at Oracle don’t get implemented. On the other hand Oracle is far ahead of every other database in the realm of diagnostics. Oracle’s wait events started way before other databases and are still much richer than any other database. I don’t know of any other database that has active session history (ASH) despite the implementation of ASH being quite feasible on say SQL Server, Sybase and DB2. We actually implemented it ourselves in DB Optimizer for all 4 databases.
    The market is ultimately revenue driven and thus improving the analysis of small performance issues for a DBAs is pretty far down the totem pole. It’s much more effective for all to push for changes that have huge impact on the results for a company and despite my interest in tactical tuning issues, that’s why I’ve instead concentrated these past few years on how to improve IT performance at the enterprise level.

  10. August 1st, 2014 at 06:37 | #9

    I agree with the general principle, but in practice a closer look at the problem often shows that it is much harder that it first seems.

    In this case you would need to store the sql_id and rowid of every row modified – do you do that in memory, or write the SQL_ID to the row in the data block ? (You can’t write it to the ITL because a single transaction may use different statements to modify different rows in the same block.) Then what do you do if a transaction uses 3 different statements to update the same row – bet Noons has seen ETL programs that do that – do you keep the first, the last, or all three SQL_IDs ?

  11. August 1st, 2014 at 06:39 | #10

    I agree with the general principle, but in practice a closer look at the problem often shows that it is much harder that it first seems.

    In this case you would need to store the sql_id and rowid of every row modified – do you do that in memory, or write the SQL_ID to the row in the data block ? (You can’t write it to the ITL because a single transaction may use different statements to modify different rows in the same block.) Then what do you do if a transaction uses 3 different statements to update the same row – bet Noons has seen ETL programs that do that – do you keep the first, the last, or all three SQL_IDs ?


8 − = six