A Retry Framework for JPA

Databases are sometimes the bane of developers’ lives. Aside from all the fun of managing objects, queries, etc., databases are notorious for occasionally (and non-repeatably) hitting your application with an exception. All sorts of things can cause this:

Your connection to the database may be momentarily interrupted. Even “high availability” approaches aren’t proof against this – most of them rely on some form of failover, which means that old connections get broken and new connections have to be established. Unfortunately, your database connection pool is probably holding onto those old connections.
In some applications it is possible for two simultaneously-running transactions to deadlock due to issues with the way that locks are obtained. This usually causes the database to reject one of the transactions.
There are cases where the design of the application itself might be susceptible to constraint violations. As one example, suppose that you identify users via email addresses, which are supposed to be unique. Before adding someone, you query the database to ensure that the address doesn’t already exist. Because of transaction isolation, however, it may still be possible for two simultaneous attempts to add the same email address to result in a constraint violation exception – if the second attempt’s transaction begins before the first one commits, the second attempt’s “get” will report that the address doesn’t exist.

Having an exception like this cause your application to barf in the user’s face is rarely good design. A much better design is to detect situations in which a repeat attempt is likely to succeed, and to retry the operation. A colleague and I evolved such a framework for Hibernate 3 as part of a series of projects on which we were working. It’s part of a project on SourceForge. It was, however, somewhat overcomplicated because, at the time, we were trying to “hide” Hibernate behind the framework’s facade.

Since we first started on that, the Java Persistence API (JPA) has been nicely standardized. JPA has its advantages – it’s a standard and is supported by JEE servers – and its disadvantages – its query API is, in my opinion, much less readable than Hibernate’s – but it is a standard. A similar framework for JPA would therefore be useful, so, without further ado, here is a simple one.

Here are the basic requirements I had in mind:

There are certain exceptions that indicate fundamental issues with talking to the database, such as a broken connection. The framework should be capable of handling these for any operation.
Certain operations may have additional exceptions that indicate a retry is appropriate. Our email-address example above might warrant a retry on a constraint violation exception, but for other operations this may indicate a programming error, and a retry is not appropriate. Thus, the list of “retryable” exceptions should be operation-specific.
Retry counts are fragile. As one example, if a bunch of connections in the pool have been broken, it may be necessary to retry enough times to empty those out of the pool. Increase the number of connections in the pool, and you’ll have to remember to change the retry count as well. Instead, I opted for a retry period – retry until a certain amount of time has elapsed. If you’re in a failover situation, you can probably figure out how long that will take.
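To make the retry-period idea concrete before getting into JPA, here is a minimal, generic sketch. The class DeadlineRetry, the method runWithDeadline and the pauseMillis parameter are my own illustrative names and are not part of the framework described in this post; the short pause between attempts is likewise an assumption, added so that a failing database is not hammered in a tight loop.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical helper, for illustration only.
final class DeadlineRetry
{
	private DeadlineRetry() {}

	// Runs the action until it succeeds, the exception is not retryable,
	// or maxRetryMillis has elapsed since the first attempt.
	// Returns the number of attempts that were made.
	static int runWithDeadline(Runnable action, Predicate<RuntimeException> retryable,
			long maxRetryMillis, long pauseMillis)
	{
		long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxRetryMillis);
		int attempts = 0;
		for (;;)
		{
			attempts++;
			try
			{
				action.run();
				return attempts;
			}
			catch (RuntimeException e)
			{
				// Give up on non-retryable failures or once the retry period is over.
				if (!retryable.test(e) || System.nanoTime() - deadline >= 0)
				{
					throw e;
				}
				sleepQuietly(pauseMillis);
			}
		}
	}

	private static void sleepQuietly(long millis)
	{
		try
		{
			Thread.sleep(millis);
		}
		catch (InterruptedException ie)
		{
			// Restore the interrupt flag and stop retrying.
			Thread.currentThread().interrupt();
			throw new IllegalStateException("Interrupted while waiting to retry", ie);
		}
	}
}
```

Note that nothing here counts attempts in order to decide when to stop; the attempt count is only returned for information. The only tuning knob is the period, which is the point of the requirement above.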

(Side note: yes, I know that you can typically set up pools to test connections. This isn’t appropriate in all situations, however.)

Because retries may be necessary, the framework uses the command pattern. Individual database operations are defined by the following interface:

import javax.persistence.EntityManager;

public interface DatabaseOperation
{
	public void execute(EntityManager entityManager);
}

For some operations, this is sufficient. Operations using this interface will only retry if the exception is of the “system” type. If an operation needs to retry on specific non-“system” exceptions, it should implement this interface:

public interface RetryableDatabaseOperation extends DatabaseOperation
{
	public boolean isRetryable(RuntimeException e);
}
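For the email-address example, an isRetryable implementation has to recognize a constraint violation. One possible way to do that, sketched below, is to look through the exception’s cause chain for a java.sql.SQLException whose SQLState starts with "23", the standard class for integrity constraint violations. The helper class ConstraintViolations is hypothetical, and the approach assumes that your JDBC driver reports standard SQLState codes and that your JPA provider keeps the SQLException in the cause chain.

```java
import java.sql.SQLException;

// Hypothetical helper that an isRetryable implementation could delegate to.
final class ConstraintViolations
{
	private ConstraintViolations() {}

	// Returns true if the exception or any of its causes is a SQLException
	// whose SQLState starts with "23" (integrity constraint violation).
	static boolean isConstraintViolation(Throwable e)
	{
		int depth = 0; // guard against cyclic cause chains
		for (Throwable t = e; t != null && depth < 32; t = t.getCause(), depth++)
		{
			if (t instanceof SQLException)
			{
				String sqlState = ((SQLException) t).getSQLState();
				if (sqlState != null && sqlState.startsWith("23"))
				{
					return true;
				}
			}
		}
		return false;
	}
}
```

An “add user” operation could then implement isRetryable as a one-liner that returns ConstraintViolations.isConstraintViolation(e). On the retry, its “does this address exist?” query would see the row committed by the competing transaction.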

The isRetryable method allows the framework to ask the individual command whether a particular exception should trigger a retry. Then, we provide an interface:

public interface DatabaseOperationExecutor
{
	public void executeInTransaction(DatabaseOperation...databaseOperations);
}

which is the contract for the class that will perform the operations. The entire list of operations will be executed within a single database transaction. Finally, EntityManagers are handled via this interface:

import javax.persistence.EntityManager;

public interface EntityManagerHolder
{
	public EntityManager getEntityManager();
	public void returnEntityManager(EntityManager entityManager);
	public void forceCloseEntityManager(EntityManager entityManager);
}

The idea here is to decouple (to the extent possible) the lifecycle of the EntityManager from our framework. In a web application, it may be desirable to implement an “open EntityManager in view” approach, in which the EntityManager will persist for the remainder of the request. Thus, the contract for this interface is as follows:

getEntityManager will return an EntityManager that is ready for use.
returnEntityManager informs the holder that the framework we’re writing is done with the EntityManager, and that (as far as the framework is concerned) the EntityManager can continue to exist. The EntityManagerHolder may, however, choose to close the EntityManager at this point if necessary.
forceCloseEntityManager informs the holder that the EntityManager must be closed at this point. This is used in the case of an exception – the internal state of an EntityManager is typically undefined after a database exception is thrown.

The simplest implementation of this interface would be as follows:

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class SimpleEntityManagerHolder implements EntityManagerHolder
{
	private EntityManagerFactory entityManagerFactory;
	
	public SimpleEntityManagerHolder(EntityManagerFactory entityManagerFactory)
	{
		this.entityManagerFactory = entityManagerFactory;
	}

	@Override
	public EntityManager getEntityManager()
	{
		return entityManagerFactory.createEntityManager();
	}

	@Override
	public void returnEntityManager(EntityManager entityManager)
	{
		entityManager.close();
	}

	@Override
	public void forceCloseEntityManager(EntityManager entityManager)
	{
		if (entityManager != null)
		{
			entityManager.close();
		}
	}
}

An “open EntityManager in view” implementation could use a ThreadLocal variable to keep a single EntityManager instance around for multiple uses. You’d obviously need an additional procedure and additional method(s) to then close the EntityManager at the end of the request.
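As a rough illustration of the ThreadLocal idea, here is a generic sketch. It is written against a type parameter T rather than EntityManager so that it compiles without a JPA provider on the classpath; in real use T would be EntityManager, the factory would be entityManagerFactory::createEntityManager and the closer would be EntityManager::close. The class and method names are hypothetical and simply mirror the three methods of EntityManagerHolder, plus the extra end-of-request method mentioned above.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical per-thread holder, for illustration only.
final class ThreadLocalResourceHolder<T>
{
	private final ThreadLocal<T> current = new ThreadLocal<T>();
	private final Supplier<T> factory;
	private final Consumer<T> closer;

	ThreadLocalResourceHolder(Supplier<T> factory, Consumer<T> closer)
	{
		this.factory = factory;
		this.closer = closer;
	}

	// Counterpart of getEntityManager: reuse this thread's instance or create one.
	T get()
	{
		T resource = current.get();
		if (resource == null)
		{
			resource = factory.get();
			current.set(resource);
		}
		return resource;
	}

	// Counterpart of returnEntityManager: keep the instance alive for the rest of the request.
	void giveBack(T resource)
	{
		// Intentionally does nothing.
	}

	// Counterpart of forceCloseEntityManager: discard the instance after a failure.
	void forceClose(T resource)
	{
		if (resource == null)
		{
			return;
		}
		if (current.get() == resource)
		{
			current.remove();
		}
		closer.accept(resource);
	}

	// The additional method: call this at the end of the request, e.g. from a servlet filter.
	void closeAtEndOfRequest()
	{
		forceClose(current.get());
	}
}
```

Because forceClose clears the thread’s slot, the next call to get after a failed attempt hands out a fresh instance, which is what the retry loop needs.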

With all that background, here’s the implementation:

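import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.TimeUnit;

import javax.persistence.EntityManager;
import javax.persistence.EntityTransaction;

// Hibernate-specific exception classes used in the static initializer below
import org.hibernate.TransactionException;
import org.hibernate.exception.JDBCConnectionException;
import org.hibernate.exception.LockAcquisitionException;

public class DatabaseOperationExecutorImpl implements DatabaseOperationExecutor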
{

	private EntityManagerHolder entityManagerHolder;
	private long maxRetryMillis;

	public DatabaseOperationExecutorImpl(EntityManagerHolder entityManagerHolder)
	{
		this.entityManagerHolder = entityManagerHolder;
		maxRetryMillis = DEFAULT_MAX_RETRY_MILLIS;
	}

	public void setMaxRetryTime(long interval, TimeUnit unit)
	{
		maxRetryMillis = TimeUnit.MILLISECONDS.convert(interval, unit);
	}

	@Override
	public void executeInTransaction(DatabaseOperation... databaseOperations)
	{
		long start = System.currentTimeMillis();

		for (;;)
		{
			EntityManager entityManager = null;
			try
			{
				entityManager = entityManagerHolder.getEntityManager();
				executeTransaction(entityManager, databaseOperations);
				entityManagerHolder.returnEntityManager(entityManager);
				return;
			}
			catch (RuntimeException e)
			{
				entityManagerHolder.forceCloseEntityManager(entityManager);

				if (!isRetryable(e, databaseOperations))
				{
					throw e;
				}

				if (System.currentTimeMillis() > start + maxRetryMillis)
				{
					throw e;
				}
			}
		}
	}

	private void executeTransaction(EntityManager entityManager, DatabaseOperation... databaseOperations)
	{
		EntityTransaction transaction = null;

		try
		{
			transaction = entityManager.getTransaction();
			transaction.begin();
			for (DatabaseOperation operation : databaseOperations)
			{
				operation.execute(entityManager);
			}
			transaction.commit();
		}
		catch (RuntimeException e)
		{
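			// Only roll back a transaction that is still active (begin() itself may have failed);
			// otherwise rollback() would throw and mask the original exception.
			if (transaction != null && transaction.isActive())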
			{
				transaction.rollback();
			}

			throw e;
		}
	}

	private boolean isRetryable(RuntimeException e, DatabaseOperation... databaseOperations)
	{
		for (Class<?> clazz : FRAMEWORK_EXCEPTION_CLASSES)
		{
			if (clazz.isAssignableFrom(e.getClass()))
			{
				return true;
			}
		}

		for (DatabaseOperation operation : databaseOperations)
		{
			if (operation instanceof RetryableDatabaseOperation)
			{
				if (((RetryableDatabaseOperation) operation).isRetryable(e))
				{
					return true;
				}
			}
		}

		return false;
	}

	private static Set<Class<?>> FRAMEWORK_EXCEPTION_CLASSES = new HashSet<Class<?>>();
	public static long DEFAULT_MAX_RETRY_MILLIS = 10000;

	public static void addFrameworkExceptionClasses(Class<?>... classes)
	{
		for (Class<?> clazz : classes)
		{
			FRAMEWORK_EXCEPTION_CLASSES.add(clazz);
		}
	}
	
	public static void setDefaultMaxRetryTime(long interval, TimeUnit unit)
	{
		DEFAULT_MAX_RETRY_MILLIS = TimeUnit.MILLISECONDS.convert(interval, unit);
	}

	static
	{
		DatabaseOperationExecutorImpl.addFrameworkExceptionClasses(	TransactionException.class,
																	JDBCConnectionException.class,
																	LockAcquisitionException.class);
	}
}

addFrameworkExceptionClasses allows one to globally configure exception classes that will always trigger a retry. In this particular case I’ve initialized the class with some Hibernate exceptions that would be typical. If you’re not using Hibernate, you can get rid of the static initializer and set it up your own way.
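One thing worth checking with your own provider and version: the exception that reaches the executor may be a wrapper (a PersistenceException, for instance) with the class you registered sitting somewhere in its cause chain, in which case a check on e.getClass() alone will not match. A possible variant of the class check, sketched here with a hypothetical helper named CauseChainMatcher, walks the causes as well.

```java
import java.util.Set;

// Hypothetical helper; a variant of the class check inside isRetryable.
final class CauseChainMatcher
{
	private CauseChainMatcher() {}

	// Returns true if the exception, or any of its causes, is an instance
	// of one of the given classes.
	static boolean matchesAny(Throwable e, Set<Class<?>> classes)
	{
		int depth = 0; // guard against cyclic cause chains
		for (Throwable t = e; t != null && depth < 32; t = t.getCause(), depth++)
		{
			for (Class<?> clazz : classes)
			{
				if (clazz.isInstance(t))
				{
					return true;
				}
			}
		}
		return false;
	}
}
```

Inside isRetryable, the first loop could then be replaced with a single call such as CauseChainMatcher.matchesAny(e, FRAMEWORK_EXCEPTION_CLASSES). Whether you want this depends on how much you trust every wrapper of a retryable cause to be retryable itself.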

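setDefaultMaxRetryTime similarly allows one to globally configure the maximum retry period. Because each executor copies the default in its constructor, a new default applies only to instances created afterwards. It can be overridden on an instance-by-instance basis with setMaxRetryTime if required.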

A possible improvement to this would be to allow RetryableDatabaseOperations to specify whether a particular exception should trigger only one retry – an operation-specific exception that occurs more than once is not likely to clear itself up. I don’t need that at the moment, however, so on the YAGNI principle, I haven’t coded it. “Left as an exercise for the student,” as my professors used to say…
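For anyone who wants a head start on that exercise, here is one simplified way it might look. The class RetryBudget is hypothetical, and it takes a shortcut compared with the idea above: instead of letting each operation say which exceptions deserve only one retry, it treats every operation-specific exception that way, while framework-level exceptions remain retryable until the retry period runs out.

```java
// Hypothetical sketch: create one instance per executeInTransaction call.
final class RetryBudget
{
	private boolean operationRetryUsed;

	// frameworkLevel: the exception matched the global list, so keep retrying
	//                 (the time limit is enforced elsewhere).
	// operationLevel: some operation's isRetryable returned true, so retry at most once.
	boolean shouldRetry(boolean frameworkLevel, boolean operationLevel)
	{
		if (frameworkLevel)
		{
			return true;
		}
		if (operationLevel && !operationRetryUsed)
		{
			operationRetryUsed = true;
			return true;
		}
		return false;
	}
}
```

The executor’s isRetryable would need to report separately which of its two loops matched, and the catch block would then consult the budget before checking the elapsed time.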