When you use the default (file-based) transaction logging and recovery mode, each JVM that embeds Atomikos will manage recovery for its transactions.
For smaller deployments this is fine, but large clustered applications with one or more shared backends can quickly run into performance issues, because each shared backend receives the combined recovery calls of every single node. The more nodes there are, the bigger this problem becomes.
There are two solutions to this:
From release 5.0.102 onwards, you can lower the recovery load by explicitly triggering transaction recovery at a time of your choosing, like this:
    boolean lax = true; // or false: see below for how to choose
    com.atomikos.icatch.config.Configuration.getRecoveryService().performRecovery(lax);

What should you choose for lax? If you want recovery to happen only when the running JVM has detected recovery risks since its last startup, choose true: recovery is then performed only when the core thinks it is needed. If, on the other hand, you want recovery to happen no matter what, choose false.

Note that full recovery is a two-pass process that takes at least com.atomikos.icatch.max_timeout to return, so you may want to tune your timeout settings accordingly. Also, don't call recovery from inside a critical section of your code!
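Because a full recovery pass blocks for at least com.atomikos.icatch.max_timeout, one way to keep it out of critical sections is to run it on its own background thread at a time you control. The sketch below assumes a hypothetical RecoveryTrigger interface and RecoveryScheduler class (neither is part of Atomikos) so that the scheduling logic compiles and runs without Atomikos on the classpath; in a real application you would pass in the call shown above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical abstraction over the recovery call (not an Atomikos type).
 * In a real application, pass:
 *   lax -> com.atomikos.icatch.config.Configuration.getRecoveryService().performRecovery(lax)
 */
interface RecoveryTrigger {
    void performRecovery(boolean lax);
}

/** Hypothetical helper: runs recovery on a dedicated background thread at a fixed delay. */
final class RecoveryScheduler {
    private final RecoveryTrigger trigger;
    private final boolean lax;
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "tx-recovery");
                t.setDaemon(true); // don't keep the JVM alive just for recovery
                return t;
            });

    RecoveryScheduler(RecoveryTrigger trigger, boolean lax) {
        this.trigger = trigger;
        this.lax = lax;
    }

    /**
     * Schedules recurring recovery. A fixed delay (rather than a fixed rate) means the
     * next pass only starts after the previous one has returned, so slow passes never overlap.
     */
    void start(long initialDelay, long delay, TimeUnit unit) {
        executor.scheduleWithFixedDelay(() -> {
            try {
                trigger.performRecovery(lax);
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all future runs, so report and continue.
                System.err.println("Recovery failed: " + e);
            }
        }, initialDelay, delay, unit);
    }

    /** Stops scheduling further passes and interrupts a pass that is still running. */
    void stop() {
        executor.shutdownNow();
    }
}
```

For example, `new RecoveryScheduler(lax -> Configuration.getRecoveryService().performRecovery(lax), true).start(initialDelay, 24, TimeUnit.HOURS)` would run lax recovery once a day, with `initialDelay` computed so the first pass lands in an off-peak window. Giving each cluster node a different initial delay is one possible way to avoid every node hitting a shared backend at the same moment.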