I recently published an article about how to bind non-transactional resources like web services / microservices into global distributed transactions so that recovery is handled automatically. Over the years I have often had to integrate “non-transactional” systems into Java EE application servers and data consistency was often a topic of discussion or even a non-functional requirement. I’ve put “non-transactional” into quotes because often the systems contain ways of ensuring data consistency, for example using calls to compensate, but the systems aren’t what you might traditionally call transactional. There is certainly no way of configuring a Java EE application server to automatically handle recovery for such resources.
The following is a list of patterns that we compiled, showing different ways to maintain consistency when faced with the task of integrating a non-transactional system.
- Write job to database – A common scenario is that you want to send, say, an email confirmation after a sale is made. You cannot send the email and then attempt to commit the sales transaction to your database, because if the commit fails, the customer receives an email stating that they have bought something and you have no record of it. Nor can you send the email after the sales transaction is committed to your database, because if the sending of the email fails (e.g. the mail server is temporarily down), the customer won’t get their confirmation, perhaps with a link to the tickets that they bought. One solution is to write the fact that an email needs to be sent into the database, in the same transaction that persists the sale. A batch or @Schedule EJB can then periodically check to see if it should send an email. Once it successfully sends an email, it changes the state of the record so that the email is not sent again. The same problem applies here: you might manage to send the email but then fail to update the database. But if you were able to read the database, you are likely to be able to update it, and sending the same email twice because of a database failure isn’t as bad as never sending it, as could be the case if you didn’t handle sending email asynchronously. One disadvantage of integrating like this is that you cannot integrate a system whose result you need in order to continue processing your business logic before replying to the user. You must handle the integration asynchronously.
- JMS – In a similar scenario to the previous solution, instead of writing a job to the database, you can send a JMS message containing the job. JMS is transactional but asynchronous, so this solution suffers from the same disadvantage as the solution above. Instead of changing the state of the work to be done, if you cannot process the work at that time, you either send the message back into the queue with a property so that it is only processed after a certain amount of time, or you send the message to a dead letter queue for manual handling.
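- Generic Connector (JCA Adapter) – I recently published a blog article describing a generic connector, implemented as a JCA adapter, which binds non-transactional resources like web services / microservices into global distributed transactions so that recovery is handled automatically.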
- CDI Events – Put @Inject @Qualifier Event<T> on a field and call field.fire(t); when you want to fire an event. If you then put @Observes(during=TransactionPhase.AFTER_FAILURE) @Qualifier T on a method parameter, that method will be called for each fired event after the transaction fails. This way you can implement some compensation for when the transaction fails. Equally, you can use different transaction phases to do different things, like AFTER_SUCCESS to perform a call to confirm an initial reservation. We have even used these mechanisms to delay the call to the remote system, for example to post work to a workflow engine just before the commit, so that we are sure that all validation logic in the complex process has completed before the remote system call is made. See “Non-transactional resource last” below.
- Custom Solution – If you can really, really justify it, then you could build complex code with timeouts etc. involving batches and scripts which handle committing, rolling back and recovering transactions using the remote resource. The question you need to ask yourself is whether you are an expert in writing business code, or an expert in effectively writing transaction managers.
- Business Process Engine – Modern engines can integrate all kinds of remote resources into business processes and they tend to handle things like failure recovery. They typically retry failed calls and they can durably store process state during the time it takes for remote systems to come back online, so that the process can be resumed. Rather than commit and rollback, BPEL supports compensation to guarantee consistency across the entire landscape.
- Atomikos & TCC – A product which is capable of binding web services into JTA transactions. So far as I can tell, it is a standalone transaction manager which can run outside of a Java EE application server, but I have no experience with this product.
- WS-AT – Using proprietary configuration (and/or annotations) you can set up two application servers to do their work within a global transaction. While this sounds promising, I have yet to come across a production system which implements WS-AT. It really only supports SOAP web services, although JBoss has something in the pipeline for supporting REST services.
- EJB – Remote EJBs: Java EE application servers have been able to propagate transaction contexts from one server to another for a relatively long time. If you need to call a service that happens to be implemented using the Java EE stack, why not call it using remote EJB rather than, say, over a web service, so that you get the service bound into a global transaction for free?
– Local EJBs: If the service you are calling happens to be written in Java using, say, EJB technology, why not just deploy it locally instead of going to the extra effort of calling it remotely, say via a SOAP web service? You might get brownie points with the enterprise architects, but have scalability and composability been weighed against performance, consistency and simplicity? Sure, modern architectures with trends like microservices mean that deploying lots of remote services is good, but there’s always a trade-off being made and you need to really understand it when deciding which parts of the landscape need to be accessed remotely.
- Transaction Callbacks – Like the CDI Events solution, but using the transaction synchronisation API to register callbacks which are called at the relevant stage of the transaction. The problem here, unlike with CDI events, is that you don’t know the context in which the transaction is being committed or rolled back, because the callback is not passed the relevant data, unlike the object which is passed into an observing method in CDI. So if you need to compensate the transaction and call, say, a web service to cancel what you did during the transaction, where do you get the data that you need to do so?
- Enlist XA Resource into Transaction – Add a custom implementation of the XAResource interface, which you enlist into the transaction using the enlistResource method. Unfortunately the commit/rollback methods are only called once, and if they should fail, they won’t be called again during recovery.
- Non-transactional resource last – If no other pattern can be implemented, and you don’t need to call the resource at a specific time during the process (e.g. you need to send an email as part of the transaction, but it doesn’t matter if you do it as the first or last process step), then always call it right at the end of the process, shortly before the transaction is committed. The chance of the transaction not being able to commit is relatively small (especially if all the SQL has been flushed to the database) compared to the chance of your remote system call failing. If the call fails, then roll back the transaction. If the call succeeds, then commit the transaction. If the transaction then fails during commit, and it is important to you to compensate the non-transactional resource, you will need to use one of the patterns described above to add some compensation to the system.
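To make the first pattern in the list (write job to database) a little more concrete, here is a minimal sketch in plain Java. It is only an in-memory illustration under stated assumptions: the two lists stand in for database tables, and the names EmailJobSketch, Mailer, EmailJob, recordSale and sendPendingEmails are made up for this example. In a real application the two writes in recordSale would be inserts inside one database transaction, and sendPendingEmails would be called by the batch or timer.

```java
import java.util.ArrayList;
import java.util.List;

/** In-memory stand-in for the "write job to database" pattern (all names are hypothetical). */
public class EmailJobSketch {

    /** Hypothetical mail gateway: returns normally on success, throws on failure. */
    interface Mailer {
        void send(String recipient, String body) throws Exception;
    }

    /** One row of an assumed EMAIL_JOB table. */
    static final class EmailJob {
        final String recipient;
        final String body;
        boolean sent; // flipped only after the mail has gone out

        EmailJob(String recipient, String body) {
            this.recipient = recipient;
            this.body = body;
        }
    }

    // These two lists stand in for the sales table and the email job table.
    final List<String> sales = new ArrayList<>();
    final List<EmailJob> jobs = new ArrayList<>();

    /**
     * Persists the sale and the email job together. With a real database both
     * inserts would run in the same transaction, so either both rows exist or neither.
     */
    synchronized void recordSale(String customerEmail, String item) {
        sales.add(item);
        jobs.add(new EmailJob(customerEmail, "You bought: " + item));
    }

    /**
     * What the batch or timer would do on each run: try every unsent job and
     * mark it as sent only after the mailer succeeded. Returns the number sent.
     */
    synchronized int sendPendingEmails(Mailer mailer) {
        int sentNow = 0;
        for (EmailJob job : jobs) {
            if (job.sent) {
                continue;
            }
            try {
                mailer.send(job.recipient, job.body);
                job.sent = true; // if this update were lost, the mail would go out twice
                sentNow++;
            } catch (Exception e) {
                // Leave the job unsent so that the next run retries it.
            }
        }
        return sentNow;
    }

    public static void main(String[] args) {
        EmailJobSketch shop = new EmailJobSketch();
        shop.recordSale("alice@example.com", "concert ticket");

        // Mail server down: nothing is sent and the job stays pending.
        int first = shop.sendPendingEmails((to, body) -> { throw new Exception("mail server down"); });
        // Mail server back: the pending job is sent once, then there is nothing left to do.
        int second = shop.sendPendingEmails((to, body) -> { });
        int third = shop.sendPendingEmails((to, body) -> { });
        System.out.println(first + " " + second + " " + third);
    }
}
```

Running main prints 0 1 0: the failed run leaves the job untouched, the next run sends it, and later runs skip it. The sketch deliberately marks the job as sent only after the mail went out, which matches the trade-off described above: a duplicate email is preferred over a lost one.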
The following table sums up the solutions. The recovery column indicates the level of automated recovery which this solution supports. The synchronicity column indicates whether you can use the solution if you need the response in order to continue processing, in which case you need a synchronous solution. Synchronicity here has nothing to do with blocking vs. non-blocking, rather it has to do with timing and whether you need a response in order to finish processing an activity.
| Solution | Synchronicity | Recovery |
|---|---|---|
| 1) Write job to database | | |
| 3) Generic Connector (JCA Adapter) | | |
| 4) CDI Events | | |
| 5) Custom Solution | Depends on your implementation | Depends on your implementation |
| 6) Business Process Engine | | |
| 7) Atomikos & TCC | No experience, presumably synchronous | No experience, presumably supported |
| 8) WS-AT (Configuration) | No experience, presumably synchronous | No experience, presumably supported |
| 10) Transaction Callbacks | | |
| 11) Enlist XA Resource into Transaction | | |
| 12) Non-transactional resource last | Asynchronous because it must be called last | |
- Manual Recovery – you need to program what to do if handling fails, i.e. how often a retry should be attempted before putting work on a “dead letter queue”.
- JMS will automatically attempt to resend messages if you configure the queue to be durable. But what you do with a failed attempt to handle a message is up to you, the programmer.
- The transaction manager will continuously attempt to commit/rollback incomplete transactions until an administrator steps in to handle long running failures.
- Callbacks are only called once, so you have just one shot.
- A business process engine will repeatedly attempt to re-call failed web service calls. The same is true for compensation. The behaviour is typically configurable.
- Remote EJBs: The JTA transaction is propagated across to other app servers and as such the coordinating transaction manager will propagate transaction recovery to the other app servers bound into the transaction.
Local EJBs: Using local EJBs means that any calls that they make to the database will be handled in the same transaction as your application code. If the local EJB is using a different database, you should use XA drivers for all databases, message queues, etc., so that the transaction manager can use two phase commit to ensure system-wide consistency.
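The manual recovery mentioned in the first note above (deciding how often to retry before putting work on a “dead letter queue”) could look roughly like the following. This is a plain Java sketch with in-memory collections rather than a real JMS queue or database table; RetrySketch, Handler, Job, runOnce and the limit of three attempts are all assumptions made for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Sketch of manual recovery: retry a few times, then park the work (all names are hypothetical). */
public class RetrySketch {

    /** Hypothetical handler for one unit of work; throws if the remote system fails. */
    interface Handler {
        void handle(String payload) throws Exception;
    }

    /** A unit of work plus the number of failed attempts so far. */
    static final class Job {
        final String payload;
        int failedAttempts;

        Job(String payload) {
            this.payload = payload;
        }
    }

    static final int MAX_ATTEMPTS = 3; // assumed limit, pick whatever suits the remote system

    final Deque<Job> queue = new ArrayDeque<>();    // stands in for the work queue
    final List<Job> deadLetters = new ArrayList<>(); // stands in for the dead letter queue

    void submit(String payload) {
        queue.add(new Job(payload));
    }

    /** One polling run: every job that is queued at the start gets exactly one attempt. */
    void runOnce(Handler handler) {
        int jobsThisRun = queue.size();
        for (int i = 0; i < jobsThisRun; i++) {
            Job job = queue.poll();
            try {
                handler.handle(job.payload);
            } catch (Exception e) {
                job.failedAttempts++;
                if (job.failedAttempts >= MAX_ATTEMPTS) {
                    deadLetters.add(job); // give up and leave it for manual handling
                } else {
                    queue.add(job);       // try again on a later run
                }
            }
        }
    }

    public static void main(String[] args) {
        RetrySketch worker = new RetrySketch();
        worker.submit("ok");
        worker.submit("bad");
        Handler handler = payload -> {
            if (payload.equals("bad")) {
                throw new Exception("remote system rejected the call");
            }
        };
        for (int run = 0; run < 3; run++) {
            worker.runOnce(handler);
        }
        System.out.println(worker.queue.size() + " " + worker.deadLetters.size());
    }
}
```

Running main prints 0 1: the job that keeps failing is attempted once per run and lands in the dead letter list after the third failure, while the successful job is simply removed. A real implementation would also need to persist the attempt counter, which is exactly the kind of boilerplate that the generic connector is meant to take off your hands.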
Of all of these, my current favourite is the generic connector. It supports calls from which a response is required, as well as recovery being fully automatic. That means that I can concentrate on writing business code, rather than boilerplatey code that really belongs in a framework.
If you know of further ways, please contact me or leave a comment so that I can add them to the list.
Copyright ©2015, Ant Kutschera
UPDATE: See this new article about how to use asynchronous remote calls to guarantee global data consistency.