
Preventing Deadlocks

Transaction-Specific Threads

Each transaction coordinator has a number of threads that act on it:

  • LogAdministrator GUI threads: If you have any log administrators then these will inspect the coordinator(s) in a GUI-spawned thread.
  • The TransactionService thread: The thread that starts and stops the transaction service (usually the main thread of the application).
  • The application thread: The thread that starts the transaction, does work within its scope and then terminates the transaction.
  • The timer thread: The thread responsible for triggering rollback due to timeout.
  • The two-phase commit thread: Two-phase termination is done by a separate thread (unless you explicitly disable this). Imported transactions will have a subordinate coordinator in the transaction service, and this subordinate coordinator will be called by the commit threads of the remote parent transaction. In the case of recursive calls between virtual machines, this will even lead to re-entrant commits (the parent coordinator is in turn a subordinate of its own subordinates).

What Causes Deadlock?

In Java, deadlocks are most commonly caused by synchronized blocks of code. In particular, deadlock can happen if all of the following hold:

  • One thread, say threadA, holds a lock on some object called objectA (i.e., is in a synchronized block of code).
  • At least one other thread, say threadB, holds a lock on another object, say objectB.
  • Thread threadA wants to call a synchronized method on objectB.
  • Thread threadB wants to call a synchronized method on objectA.

This leads to an endless wait, since each thread can only complete once it gets the lock it is waiting for. But the locks they are waiting for are held by another waiting thread, so these locks are never released.
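
The conditions above can be reproduced in a small sketch. The class DeadlockDemo and its methods are invented for illustration and are not part of the transaction service; only the names threadA, threadB, objectA and objectB follow the description above. The threads are daemon threads so the JVM can still exit, and the deadlock is confirmed with the standard ThreadMXBean.findMonitorDeadlockedThreads() call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockDemo {

    /** Starts two threads that lock objectA and objectB in opposite order; returns true once the JVM reports a monitor deadlock. */
    static boolean provokeDeadlock() {
        final Object objectA = new Object();
        final Object objectB = new Object();
        // Makes sure both threads hold their first lock before either asks for the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread threadA = new Thread(() -> lockInOrder(objectA, objectB, bothHoldFirstLock), "threadA");
        Thread threadB = new Thread(() -> lockInOrder(objectB, objectA, bothHoldFirstLock), "threadB");
        threadA.setDaemon(true); // daemon threads do not keep the JVM alive
        threadB.setDaemon(true);
        threadA.start();
        threadB.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] deadlocked = threads.findMonitorDeadlockedThreads();
            if (deadlocked != null && deadlocked.length >= 2) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch bothHoldFirstLock) {
        synchronized (first) {
            bothHoldFirstLock.countDown();
            try {
                bothHoldFirstLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // Never reached: the other thread holds 'second' and is waiting for 'first'.
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + provokeDeadlock());
    }
}
```

Each thread holds one lock and waits for the other, which is exactly the cycle described above.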

Preventing Deadlocks

There are some guidelines that, if enforced consistently, can avoid deadlocks.

General Rules

In general, deadlocks are impossible if the following techniques are used throughout the code:

  • Within a synchronized block of code, never call another synchronized block (directly or indirectly).
  • Alternatively (if you must): make sure each call stack locks objects in the same predefined order.
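
As an illustration of the second rule, here is a minimal sketch of one common way to fix a lock order: give every lockable object a rank and always lock the lower rank first. The class OrderedLocking, the Resource type and its rank field are assumptions made for this example; the text above does not prescribe how the predefined order is chosen.

```java
import java.util.concurrent.atomic.AtomicLong;

final class OrderedLocking {

    /** Hypothetical lockable object; 'rank' defines the global lock order. */
    static final class Resource {
        private static final AtomicLong NEXT_RANK = new AtomicLong();
        final long rank = NEXT_RANK.getAndIncrement();
        int value;
    }

    /** Locks both resources, lower rank first, whatever the argument order. */
    static void withBoth(Resource x, Resource y, Runnable action) {
        Resource first = (x.rank <= y.rank) ? x : y;
        Resource second = (first == x) ? y : x;
        synchronized (first) {
            synchronized (second) {
                action.run();
            }
        }
    }

    /** Two threads request the locks in opposite argument order; returns the final sum of both values. */
    static int runOpposingThreads(int iterations) {
        Resource a = new Resource();
        Resource b = new Resource();
        Runnable increment = () -> { a.value++; b.value++; };

        Thread t1 = new Thread(() -> { for (int i = 0; i < iterations; i++) withBoth(a, b, increment); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < iterations; i++) withBoth(b, a, increment); });
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(10_000);
            t2.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return a.value + b.value;
    }

    public static void main(String[] args) {
        System.out.println("sum = " + runOpposingThreads(10_000));
    }
}
```

Because both threads end up locking the same object first, neither can hold one lock while waiting for the other, so the cycle needed for a deadlock cannot form.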

Merely living up to the first rule will help, but is hardly practical in realistic applications. So the second rule is bound to be relevant as well. However, there is a problem...

The Problem: FSM (Pre)EnterListeners

The FSM observers (listeners) in the coordinator are problematic: the FSM is by definition a state holder, and its methods require synchronization. Also, the pre-enter mechanism (via listeners) was designed to prevent illegal state transitions, so pre-enter events are dispatched within the synchronized block(s) of code. Consequently, the first rule is violated, and we must absolutely make sure that the second rule holds at all times.

Let's look at this in more detail. The FSM callbacks (via the listeners) will call other classes unknown at design time, and within a synchronized block of code. This implies that the FSM object will be locked at the time when another object (the listener) is called. So we are in the (possible) situation where a locked object calls another object, possibly violating the second rule. How can we make sure that this does not give deadlocks?

The answer is not so hard: according to the second rule, the order of locking must always be the same for all threads that hold locks in several objects. In the case of the FSM, this means that we should assume that the order of locking is always going to be of the form:

  • Lock the FSM object
  • Lock another object (the listener)

Note that the other object is more or less unknown at design time. It could be any object that implements the listener interface. Deadlocks can happen if that other object directly or indirectly calls the FSM again (in another thread). This would violate the second rule. In order to avoid that, we need some guidelines for our code, as explained next.
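
The locking order described above can be made concrete with a simplified sketch. FsmSketch and PreEnterListener are hypothetical stand-ins written for this page, not the actual coordinator classes; the sketch only assumes what the text states, namely that pre-enter events are dispatched inside the synchronized block of the state holder.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class FsmSketch {

    /** Hypothetical listener interface, called before a new state is entered. */
    interface PreEnterListener {
        void preEnter(String newState);
    }

    private final List<PreEnterListener> listeners = new CopyOnWriteArrayList<>();
    private String state = "ACTIVE";

    void addPreEnterListener(PreEnterListener listener) {
        listeners.add(listener);
    }

    synchronized String getState() {
        return state;
    }

    synchronized void setState(String newState) {
        // The FSM lock is held here, so every listener runs while the FSM is locked.
        for (PreEnterListener listener : listeners) {
            listener.preEnter(newState);
        }
        state = newState;
    }

    /** Returns true if a listener observed the FSM lock being held during the callback. */
    static boolean listenerRunsUnderFsmLock() {
        FsmSketch fsm = new FsmSketch();
        boolean[] lockHeld = new boolean[1];
        fsm.addPreEnterListener(newState -> lockHeld[0] = Thread.holdsLock(fsm));
        fsm.setState("COMMITTING");
        return lockHeld[0];
    }

    public static void main(String[] args) {
        System.out.println("listener called under FSM lock: " + listenerRunsUnderFsmLock());
    }
}
```

Any lock the listener takes inside preEnter is therefore acquired after the FSM lock, which is the order the rest of the code has to respect.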

Rules of Thumb for Synchronization

We can't control whether or not other classes call back into the FSM. In fact, this is very likely to happen. However, we can avoid deadlocks if we respect the following rule at all times:

Synchronization Principle

Transaction-scoped classes should never call the FSM from within a synchronized block of code.

Because this applies to direct and indirect calls, we can restate this as follows:

Synchronization Principle Corollary

Transaction-scoped classes should never call another class from within a block of code that synchronizes on anything else but the FSM.

Coding Convention

To enforce this principle and make it clear in the code, please follow this convention:

  • Define synchronized methods that span local attributes only. In particular, make sure that your synchronized blocks of code do not call the coordinator (directly or indirectly).
  • Prefix the names of synchronized methods with local, to stress this fact.
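
A minimal sketch of how this convention might look in practice follows. The Coordinator interface, the Participant class and the method names are hypothetical and chosen only for illustration. The synchronized method carries the local prefix and touches local attributes only, while the call to the coordinator happens after the lock has been released.

```java
final class ConventionSketch {

    /** Hypothetical stand-in for the coordinator (and its FSM). */
    interface Coordinator {
        void setState(String newState);
    }

    /** Hypothetical transaction-scoped class that follows the convention. */
    static final class Participant {
        private final Coordinator coordinator;
        private int pendingReplies;

        Participant(Coordinator coordinator, int expectedReplies) {
            this.coordinator = coordinator;
            this.pendingReplies = expectedReplies;
        }

        /** Synchronized, but spans local attributes only: no outgoing calls. */
        private synchronized boolean localReplyReceived() {
            pendingReplies--;
            return pendingReplies == 0;
        }

        void replyReceived() {
            boolean allRepliesIn = localReplyReceived(); // own lock is released on return
            if (allRepliesIn) {
                coordinator.setState("COMMITTING"); // called without holding our own lock
            }
        }
    }

    /** Returns true if the coordinator was called while the participant's lock was NOT held. */
    static boolean callsCoordinatorWithoutOwnLock() {
        Participant[] holder = new Participant[1];
        boolean[] heldDuringCall = { true };
        Coordinator recorder = newState -> heldDuringCall[0] = Thread.holdsLock(holder[0]);
        holder[0] = new Participant(recorder, 2);
        holder[0].replyReceived();
        holder[0].replyReceived();
        return !heldDuringCall[0];
    }

    public static void main(String[] args) {
        System.out.println("coordinator called without own lock: " + callsCoordinatorWithoutOwnLock());
    }
}
```

The decision is taken under the local lock, but acting on it is deferred until the lock is gone, so the class never holds its own lock while asking for the FSM lock.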

Known Deadlocks in the Past

Some deadlocks have occurred in past releases, caused by violations of these basic rules:

  • Case 21705
  • Case 21806
  • Case 26976


Copyright © 2014 Atomikos BVBA. Transaction Management for Extreme Transaction Processing and SOA Environments serving ISV, Commercial, OEM and Open Source Markets