
Fault tolerant EJB interceptor: a solution to optimistic locking errors and other transient faults

By Stephane Carrez

Fault tolerance is often necessary in application servers. The Java EE standard (EJB 3.0) defines an interceptor mechanism that can be used to implement a first level of fault tolerance. The pattern presented in this article is the solution that I implemented for the Planzone service, where it has been used successfully for the last two years.

Identify the Faults to Recover

The first step is to identify which faults can be recovered from and which cannot. Our application uses MySQL and Hibernate, and we have identified the following three transient (or recoverable) faults.

StaleObjectStateException (Optimistic Locking)

Optimistic locking is a pattern used to optimize database transactions. Instead of locking the database tables and rows when values are updated, we allow other transactions to access these values. Concurrent writes are therefore possible, and they must be detected. For this, optimistic locking uses a version counter, a timestamp, or a state comparison.

When a concurrent write is detected, Hibernate raises a StaleObjectStateException. When such an exception occurs, the state of the objects associated with the current Hibernate session is unknown (see Transactions and Concurrency).
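To make the version counter idea concrete, here is a minimal in-memory sketch. It is not how Hibernate is implemented: VersionedStore, Row and StaleVersionException are hypothetical names invented for this illustration, and the store simply refuses a write when the version read earlier no longer matches the current one.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store illustrating version-based optimistic locking. */
class VersionedStore {
    /** Immutable snapshot of a row: a value plus the version it was read at. */
    static final class Row {
        final String value;
        final long version;

        Row(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    /** Hypothetical stand-in for Hibernate's StaleObjectStateException. */
    static final class StaleVersionException extends RuntimeException {
        StaleVersionException(String message) {
            super(message);
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    synchronized void insert(String id, String value) {
        rows.put(id, new Row(value, 0L));
    }

    synchronized Row read(String id) {
        return rows.get(id);
    }

    /** Writes only if the row still has the version the caller read earlier. */
    synchronized void update(String id, long expectedVersion, String newValue) {
        Row current = rows.get(id);
        if (current == null || current.version != expectedVersion) {
            // Somebody else committed in between: report the concurrent write.
            throw new StaleVersionException("row " + id + " was modified concurrently");
        }
        rows.put(id, new Row(newValue, expectedVersion + 1));
    }
}
```

Two callers that read the same row both hold version 0. The first update succeeds and bumps the version to 1, so the second update, still carrying version 0, is rejected instead of silently overwriting the first one.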

As far as Planzone is concerned, we get 3 exceptions per 10000 calls.

LockAcquisitionException (Database deadlocks)

On the database side, the server can detect a deadlock situation and report an error. When a deadlock is detected between two clients, the server generates an error for one client and the other one can proceed. When such an error is reported, the client can retry the operation (see InnoDB Lock Modes).

As far as Planzone is concerned, we get 1 or 2 exceptions per 10000 calls.

JDBCConnectionException (Connection failure)

Sometimes the connection to the database is lost, either because the database server crashed or because it was restarted for maintenance. Server crashes are rare, but they do occur. For Planzone, we had 3 crashes during the last 2 years (roughly one crash every 240 days). During the same period we also had to stop and restart the server twice for a server upgrade.

Restarting the call after a database connection failure is a little more complex: the database needs time to come back, so it is necessary to sleep for some time before retrying.
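One simple way to choose the pause is a linear backoff, where each new attempt waits a little longer than the previous one. The sketch below uses the same constants as the interceptor in this article (500 ms plus one second per retry); the Backoff class and its method names are hypothetical and only serve the illustration.

```java
/** Hypothetical helper: linear backoff for connection-failure retries. */
final class Backoff {
    private Backoff() {
    }

    /** Pause before retry number {@code retry} (0 for the first retry), in milliseconds. */
    static long delayMillis(int retry) {
        return 500L + retry * 1000L;
    }

    /** Total time spent sleeping if {@code retries} consecutive attempts fail. */
    static long totalMillis(int retries) {
        long total = 0L;
        for (int retry = 0; retry < retries; retry++) {
            total += delayMillis(retry);
        }
        return total;
    }
}
```

With these constants the pauses are 0.5 s, 1.5 s, 2.5 s, and so on. Whether the total is long enough to cover a server restart depends on the installation, so the constants should be treated as tunable.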

EJB Interceptor

To create our fault-tolerant mechanism, we use an EJB interceptor, which is invoked for each EJB method call. For this, the interceptor defines a method marked with the @AroundInvoke annotation. Its role is to catch the transient faults and retry the call. The example below retries the call at most 10 times.

The EJB interceptor method receives an InvocationContext parameter, which gives access to the target object, the parameters, and the method to invoke. The proceed method transfers control to the next interceptor and, ultimately, to the EJB method. The real implementation is a little more complex because of logging, but the overall idea is here.

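import javax.interceptor.AroundInvoke;
import javax.interceptor.InvocationContext;

import org.hibernate.StaleObjectStateException;
import org.hibernate.exception.JDBCConnectionException;
import org.hibernate.exception.LockAcquisitionException;

public class RetryInterceptor {
  private static final int MAX_RETRIES = 10;

  @AroundInvoke
  public Object retry(InvocationContext context) throws Exception {
    for (int retry = 0; ; retry++) {
      try {
        // Pass control to the next interceptor and then to the EJB method.
        return context.proceed();

      } catch (LockAcquisitionException ex) {
        // Database deadlock: retry immediately.
        if (retry >= MAX_RETRIES) {
          throw ex;
        }

      } catch (StaleObjectStateException ex) {
        // Optimistic locking failure: retry immediately.
        if (retry >= MAX_RETRIES) {
          throw ex;
        }

      } catch (JDBCConnectionException ex) {
        // Connection lost: wait a little longer before each new attempt.
        if (retry >= MAX_RETRIES) {
          throw ex;
        }
        Thread.sleep(500L + retry * 1000L);
      }
    }
  }
}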
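The retry loop itself does not depend on EJB, so it can be tried out without a container. The following is a minimal sketch under that assumption: Retry, TransientFault and the parameter names are hypothetical, and TransientFault stands in for the three Hibernate exceptions handled by the interceptor.

```java
import java.util.concurrent.Callable;

/** Hypothetical container-free version of the retry loop. */
final class Retry {
    /** Hypothetical marker for faults that are worth retrying. */
    static class TransientFault extends RuntimeException {
        TransientFault(String message) {
            super(message);
        }
    }

    private Retry() {
    }

    /** Runs the action, retrying up to maxRetries times when a TransientFault is raised. */
    static <T> T call(Callable<T> action, int maxRetries, long pauseMillis) throws Exception {
        for (int retry = 0; ; retry++) {
            try {
                return action.call();
            } catch (TransientFault ex) {
                if (retry >= maxRetries) {
                    // Give up: the fault persisted across every allowed retry.
                    throw ex;
                }
                if (pauseMillis > 0) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
    }
}
```

Any exception other than TransientFault propagates immediately, which mirrors the interceptor: only faults known to be recoverable are retried, and everything else is reported to the caller on the first failure.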

EJB Interface

For the purpose of this article, the EJB interface is declared as follows. Our choice was to define an ILocal and an IRemote interface to allow the creation of local and remote services.

public interface Service {
    ...
    @Local
    interface ILocal extends Service {
    }

    @Remote
    interface IRemote extends Service {
    }
}

EJB Declaration

The interceptor is associated with the EJB implementation class by using the @Interceptors annotation. The same interceptor class can be associated with several EJBs.

@Stateless(name = "Service")
@Interceptors(RetryInterceptor.class)
public class ServiceBean
  implements Service.ILocal, Service.IRemote {
  ...
}

Testing

To test the solution, I recommend writing a unit test. The unit test I wrote did the following:

  • A first thread executes the EJB method call.
  • The transaction commit operation is overridden by the unit test.
  • When the commit is called, a second thread is activated to simulate the concurrent call before committing.
  • The second thread performs the EJB method call in such a way that it will trigger the StaleObjectStateException when the first thread resumes.
  • When the second thread has finished, the first thread performs the real commit, and Hibernate raises the StaleObjectStateException because the object was modified.
  • The interceptor catches the exception and retries the call, which then succeeds.

The full design of such a test is outside the scope of this article. It is also specific to each application.
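The sequence of steps can nevertheless be simulated without a container or a database. The sketch below is only an illustration of the test's control flow, not the Planzone test: ConcurrentWriteDemo, Record, StaleException and the beforeCommit hook are hypothetical names, the versioned counter replaces the Hibernate entity, and the hook plays the role of the overridden commit.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, container-free simulation of the concurrent-write test. */
final class ConcurrentWriteDemo {
    /** Stand-in for Hibernate's StaleObjectStateException. */
    static class StaleException extends RuntimeException {
    }

    /** A versioned counter whose commit step can be intercepted by a test. */
    static class Record {
        private int value;
        private long version;
        Runnable beforeCommit = () -> { };

        synchronized int value() {
            return value;
        }

        void increment() {
            long readVersion;
            int readValue;
            synchronized (this) {            // "load" the object
                readVersion = version;
                readValue = value;
            }
            beforeCommit.run();              // the test injects the concurrent call here
            synchronized (this) {            // "commit" with a version check
                if (version != readVersion) {
                    throw new StaleException();
                }
                value = readValue + 1;
                version++;
            }
        }
    }

    /** Returns {attempts made by the first thread, final counter value}. */
    static int[] run() throws InterruptedException {
        Record record = new Record();
        AtomicInteger attempts = new AtomicInteger();

        // Overridden commit: run a complete concurrent call in a second thread, once.
        record.beforeCommit = () -> {
            record.beforeCommit = () -> { };
            Thread second = new Thread(record::increment);
            second.start();
            try {
                second.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        // First thread: the call wrapped in the same retry loop as the interceptor.
        Thread first = new Thread(() -> {
            for (int retry = 0; ; retry++) {
                attempts.incrementAndGet();
                try {
                    record.increment();
                    return;
                } catch (StaleException ex) {
                    if (retry >= 10) {
                        throw ex;
                    }
                }
            }
        });
        first.start();
        first.join();
        return new int[] { attempts.get(), record.value() };
    }
}
```

In this simulation the first attempt fails because the second thread commits in between, and the retry then succeeds, so both increments end up applied. A real test would replace Record with the EJB call and the hook with the application's own way of intercepting the transaction commit.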
