- Feature189601 Allow disabling retry on heuristic hazard participants
- Feature189603 API extension to allow triggering recovery by the application
- Bug189602 Don't call XA recovery on the XAResource on HeuristicHazard
- Issue189886 Avoid using 0 for the maximum transaction timeout
Feature189601 Allow disabling retry on heuristic hazard participants
You can now choose to disable retrying commit or rollback for heuristic hazard transactions.
Technical details
Heuristic hazard transactions can arise from network connectivity issues during the commit phase: if a resource receives a prepare request and then becomes unreachable during commit or rollback, the transaction goes into "heuristic hazard" mode. This essentially means that commit will be retried a number of times, even if com.atomikos.icatch.oltp_max_retries is set to zero. The rationale is that it is better to terminate pending in-doubt transactions sooner rather than later, because of the locks they may be holding.
If you don't want this behaviour then you can now disable this, and rely on the recovery process in the background to take care of it (which also works, but will happen only periodically). To disable, just set this new property to false:
com.atomikos.icatch.retry_on_heuristic_hazard=false
Changes impacting client API
A new startup property that can optionally be set. If not present, it will default to true to preserve compatibility with existing behaviour.
Feature189603 API extension to allow triggering recovery by the application
You can now explicitly trigger recovery in your application, via our API.
Technical details
Recovery already happens periodically, in the background. For bigger clusters that connect to the same database (or other shared resource) this can cause a high load on the backend, because of many such background threads hitting the backend at the same time. This is especially true if most cluster nodes start up at the same time with the same configuration for recovery, and are NOT using LogCloud. To alleviate this, you can now have a bit more control over when recovery happens, like this:
import com.atomikos.icatch.RecoveryService;
import com.atomikos.icatch.config.Configuration;

boolean lax = true; // false to force recovery, true to allow intelligent mode
RecoveryService rs = Configuration.getRecoveryService();
rs.performRecovery(lax);
In order for this to work, make sure to set (in jta.properties):
# set to Long.MAX_VALUE so background recovery is disabled
com.atomikos.icatch.recovery_delay=9223372036854775807L
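One way an application might use this hook is to schedule recovery itself with a random per-node offset, so that cluster nodes that boot together do not all hit the backend at the same moment. The sketch below is only an illustration under that assumption: `JitteredRecoveryScheduler`, its method names and the period are hypothetical, and the recovery call is passed in as a plain `Runnable` so the example compiles without Atomikos on the classpath.

```java
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for application-triggered recovery; not part of the Atomikos API. */
final class JitteredRecoveryScheduler {

    private JitteredRecoveryScheduler() {}

    /** Picks a start offset in [0, periodMillis) so that nodes spread out over one period. */
    static long initialDelayMillis(long periodMillis, Random random) {
        if (periodMillis <= 0) {
            throw new IllegalArgumentException("periodMillis must be positive");
        }
        return Math.floorMod(random.nextLong(), periodMillis);
    }

    /** Runs the trigger every periodMillis, after a random per-node offset. */
    static ScheduledExecutorService start(Runnable recoveryTrigger, long periodMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "app-triggered-recovery");
            t.setDaemon(true); // do not keep the JVM alive just for recovery
            return t;
        });
        Runnable guarded = () -> {
            try {
                recoveryTrigger.run();
            } catch (RuntimeException e) {
                // Swallow (or log) so that one failed pass does not cancel the schedule.
            }
        };
        long delay = initialDelayMillis(periodMillis, new Random());
        executor.scheduleWithFixedDelay(guarded, delay, periodMillis, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

With Atomikos on the classpath, the trigger could be `() -> Configuration.getRecoveryService().performRecovery(true)`, using the calls shown above. A fixed delay (rather than a fixed rate) is used here so that a slow recovery pass does not cause passes to pile up.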
Changes impacting client API
We have added methods on an existing API interface, which does not break existing clients.
Bug189602 Don't call XA recovery on the XAResource on HeuristicHazard
| Severity: | 2 |
|---|---|
| Affected version(s): | 5.0.x |
Description
From now on we no longer systematically call XAResource.recover() when failures happen during the regular commit or rollback, so the overhead for the backend is reduced.
Technical details
For historical reasons we used to call the XA recovery routine on the backend whenever commit or rollback failed. The most common cause is network glitches, meaning that big clusters with a short network problem would suddenly hit the backends with recovery for all active transactions. Since recovery can be an expensive operation, this would result in needless load on the backends.
The rationale behind this was to avoid needless commit retries (based on the value of com.atomikos.icatch.oltp_max_retries), but the overhead does not justify the possible benefit.
From now on we no longer do this, since it is either the recovery process (in the background) or the application (via our API) that controls when recovery happens.
Worst case, this can lead to needless commit retries, in which case the backend should respond with error code XAER_NOTA and our code will handle this gracefully. However, we have historical records where some older version of ActiveMQ did not behave like this. This would result in errors in the ActiveMQ log files, in turn leading to alerts for the operations team.
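To illustrate what treating XAER_NOTA gracefully on a commit retry can look like, here is a minimal sketch. It is not the Atomikos implementation: `CommitRetryOutcome` and `classify` are hypothetical names, and the mapping of the other error codes is an assumption made for the example. Only `XAException` and its constants come from the standard `javax.transaction.xa` package.

```java
import javax.transaction.xa.XAException;

/** Hypothetical illustration of classifying a failed commit retry; not Atomikos code. */
final class CommitRetryOutcome {

    enum Outcome { ALREADY_TERMINATED, RETRY_LATER, FAILED }

    private CommitRetryOutcome() {}

    static Outcome classify(XAException e) {
        switch (e.errorCode) {
            case XAException.XAER_NOTA:
                // The backend does not know this Xid. On a retry, the usual reading is
                // that the transaction was already terminated, so there is nothing left to do.
                return Outcome.ALREADY_TERMINATED;
            case XAException.XAER_RMFAIL:
            case XAException.XA_RETRY:
                // Resource unavailable or asking to be called again: try again later.
                return Outcome.RETRY_LATER;
            default:
                // Anything else is reported as a failure in this sketch.
                return Outcome.FAILED;
        }
    }
}
```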
Changes impacting client API
If you experience issues with this, then it suffices to set com.atomikos.icatch.oltp_max_retries to zero. That will disable regular commit retries and delegate to the recovery background process.
Issue189886 Avoid using 0 for the maximum transaction timeout
| Severity: | 2 |
|---|---|
| Affected version(s): | 5.0.x |
Description
For releases 5.0 or higher, the maximum timeout should not be set to 0 or recovery will interfere with regular application-level commits.
Technical details
The 5.0 release has a new recovery workflow that is incompatible with com.atomikos.icatch.max_timeout being zero. That is because recovery depends on the maximum timeout to perform rollback of pending (orphaned) prepared transactions in the backends. If the maximum timeout is zero then recovery (in the background) will roll back prepared transactions that are concurrently being committed in your application. This will result in heuristic exceptions and inconsistent transaction outcomes.
Keep in mind that the maximum timeout is also indicative of maximum lock duration in your databases, so choose it wisely! If you are / were depending on an unlimited maximum timeout then you are also allowing unlimited lock times.
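If you want to catch a zero maximum timeout before it reaches production, a startup check along the following lines may help. This is a sketch under assumptions: `MaxTimeoutCheck` is a hypothetical helper, and it assumes your application has the effective properties available as a `java.util.Properties` object. An absent property is accepted here and left to the Atomikos default.

```java
import java.util.Properties;

/** Hypothetical startup guard; not part of the Atomikos API. */
final class MaxTimeoutCheck {

    static final String KEY = "com.atomikos.icatch.max_timeout";

    private MaxTimeoutCheck() {}

    /** Throws if the maximum timeout is explicitly set to a non-numeric or non-positive value. */
    static void requirePositive(Properties props) {
        String raw = props.getProperty(KEY);
        if (raw == null) {
            return; // not set explicitly: nothing to validate in this sketch
        }
        long value;
        try {
            value = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalStateException(KEY + " is not a number: " + raw, e);
        }
        if (value <= 0) {
            throw new IllegalStateException(
                KEY + " must be greater than 0 for releases 5.0 or higher, but was " + value);
        }
    }
}
```

Failing fast at startup is a design choice for this example; logging a warning instead would also work if you prefer not to block the application from starting.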
