ExtremeTransactions 5.0.103
Release notes for 5.0.103

11 March 2021 | Guy Pardon | ExtremeTransactions

Bug189921
Prevent exceptions while writing a checkpoint from needlessly corrupting the transaction log

Severity:	2
Affected version(s):	5.0.x, 4.0.x

Description

You now no longer get "Log corrupted - restart JVM" exceptions after you interrupt a thread that is writing to the transaction log file, or after any other exception that makes a log checkpoint fail.

Technical details

Any exceptions during a checkpoint (such as when a thread was interrupted during transaction log file I/O) would lead to a generic exception handling block in our com.atomikos.recovery.fs.CachedRepository class, leaving the instance in an invalid state:

2021-03-01 16:15:56.662 ERROR 41669 --- [pool-1-thread-1] c.a.recovery.fs.FileSystemRepository     : Failed to write checkpoint

java.nio.channels.ClosedByInterruptException: null
   at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202) ~[na:1.8.0_192]
   at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:392) ~[na:1.8.0_192]
   at com.atomikos.recovery.fs.FileSystemRepository.writeCheckpoint(FileSystemRepository.java:196) ~[transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.recovery.fs.CachedRepository.performCheckpoint(CachedRepository.java:84) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.recovery.fs.CachedRepository.put(CachedRepository.java:77) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.recovery.fs.OltpLogImp.write(OltpLogImp.java:46) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.persistence.imp.StateRecoveryManagerImp.preEnter(StateRecoveryManagerImp.java:51) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.finitestates.FSMImp.notifyListeners(FSMImp.java:164) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.finitestates.FSMImp.setState(FSMImp.java:251) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.CoordinatorImp.setState(CoordinatorImp.java:284) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.CoordinatorStateHandler.commitFromWithinCallback(CoordinatorStateHandler.java:346) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.ActiveStateHandler$6.doCommit(ActiveStateHandler.java:273) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.CoordinatorStateHandler.commitWithAfterCompletionNotification(CoordinatorStateHandler.java:587) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.ActiveStateHandler.commit(ActiveStateHandler.java:268) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.CoordinatorImp.commit(CoordinatorImp.java:550) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.CoordinatorImp.terminate(CoordinatorImp.java:682) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.CompositeTransactionImp.commit(CompositeTransactionImp.java:279) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.jta.TransactionImp.commit(TransactionImp.java:168) [transactions-jta-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.jta.TransactionManagerImp.commit(TransactionManagerImp.java:428) [transactions-jta-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.jta.UserTransactionManager.commit(UserTransactionManager.java:160) [transactions-jta-5.0.9-SNAPSHOT.jar:na]
   at org.springframework.transaction.jta.JtaTransactionManager.doCommit(JtaTransactionManager.java:1035) [spring-tx-5.2.5.RELEASE.jar:5.2.5.RELEASE]
   at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:743) [spring-tx-5.2.5.RELEASE.jar:5.2.5.RELEASE]
   at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:711) [spring-tx-5.2.5.RELEASE.jar:5.2.5.RELEASE]
   at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:152) [spring-tx-5.2.5.RELEASE.jar:5.2.5.RELEASE]
   at com.example.atomikos.AtomikosApplicationTests.lambda$4(AtomikosApplicationTests.java:78) [test-classes/:na]
   at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_192]
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_192]
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_192]
   at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_192]

Later requests trying to read from the transaction logs would get systematic corruption errors like this:

com.atomikos.recovery.LogReadException: Log corrupted - restart JVM
   at com.atomikos.recovery.fs.CachedRepository.assertNotCorrupted(CachedRepository.java:137) ~[transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.recovery.fs.CachedRepository.findAllCommittingCoordinatorLogEntries(CachedRepository.java:145) ~[transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.recovery.fs.RecoveryLogImp.getExpiredPendingCommittingTransactionRecordsAt(RecoveryLogImp.java:52) ~[transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.RecoveryDomainService.performRecovery(RecoveryDomainService.java:76) ~[transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.icatch.imp.RecoveryDomainService$1.alarm(RecoveryDomainService.java:55) [transactions-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.timing.PooledAlarmTimer.notifyListeners(PooledAlarmTimer.java:101) [atomikos-util-5.0.9-SNAPSHOT.jar:na]
   at com.atomikos.timing.PooledAlarmTimer.run(PooledAlarmTimer.java:88) [atomikos-util-5.0.9-SNAPSHOT.jar:na]
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_192]
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_192]
   at java.lang.Thread.run(Thread.java:748) [na:1.8.0_192]

This has now been fixed.
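
The ClosedByInterruptException in the first stack trace is standard JDK behaviour for interruptible channels and can be reproduced without our code. The following is a minimal sketch, assuming a scratch temp file and a hypothetical class name InterruptedForceDemo; it is not the code of FileSystemRepository:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of the product.
final class InterruptedForceDemo {

    /** Returns true if force() failed with ClosedByInterruptException and left the channel closed. */
    public static boolean forceFailsWhenInterrupted() {
        try {
            Path file = Files.createTempFile("checkpoint-demo", ".log");
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
                channel.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
                // Simulate the application interrupting the thread that writes the log.
                Thread.currentThread().interrupt();
                try {
                    channel.force(true);
                    return false;
                } catch (ClosedByInterruptException e) {
                    // The JDK closes the channel as a side effect of the interrupt,
                    // so any later write on the same channel fails as well.
                    return !channel.isOpen();
                } finally {
                    // Clear the interrupt flag so that the cleanup below is not affected.
                    Thread.interrupted();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the channel is closed by the JDK, the component that owns it has to reopen the file rather than mark itself as permanently corrupted.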

Changes impacting client API

None.

Bug190034
Spring Boot JDBC metadata: improve getActive method

Severity:	3
Affected version(s):	5.0.x

Description

The method getActive() in the DataSourceBeanMetadata classes of module transactions-springboot2 now no longer returns the total number of open connections, but rather the number of connections that are currently being used by the application.

Technical details

Due to a misunderstanding of Spring Boot's semantics, this method returned the wrong result: the total number of open connections in the pool, rather than the number of connections being used. This has now been fixed.
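
The difference between the two numbers can be sketched as follows. PoolUsage is a hypothetical class used only for illustration, assuming the pool can report its total number of open connections and the number of idle ones; it is not the actual metadata class:

```java
// Hypothetical illustration class, not part of transactions-springboot2.
final class PoolUsage {

    private final int totalOpen;  // connections currently open in the pool
    private final int available;  // open connections that are idle and can be borrowed

    public PoolUsage(int totalOpen, int available) {
        if (available < 0 || available > totalOpen) {
            throw new IllegalArgumentException("available must be between 0 and totalOpen");
        }
        this.totalOpen = totalOpen;
        this.available = available;
    }

    /** What the method used to report: every open connection, idle or not. */
    public int totalOpen() {
        return totalOpen;
    }

    /** What "active" means for Spring Boot: connections currently in use by the application. */
    public int active() {
        return totalOpen - available;
    }
}
```

For a pool with 10 open connections of which 7 are idle, active() yields 3 where the old behaviour would have reported 10.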

Changes impacting client API

None.

Bug190035
Spring Boot JDBC metadata: support wrapped datasources

Severity:	3
Affected version(s):	5.0.x

Description

You can now retrieve meaningful DataSourcePoolMetadata in Spring Boot, even if one of our datasources is used in wrapped or proxied mode in your Spring Boot runtime.

Technical details

We used to return metadata in the following style:

if (dataSource instanceof AtomikosDataSourceBean) {
    return new AtomikosDataSourceBeanMetadata((AtomikosDataSourceBean) dataSource);
}

(and similar for our AtomikosNonXADataSourceBean class)

This would not work if the dataSource presented is wrapped or proxied. So we now use the built-in Spring Boot DataSourceUnwrapper.unwrap to handle those cases.
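
Why the instanceof check fails for wrapped datasources can be shown with the JDK's own java.sql.Wrapper contract. The sketch below uses only the JDK; PooledSource and DelegatingSource are hypothetical stand-ins for our datasource bean and for a framework-added wrapper. Spring Boot's DataSourceUnwrapper.unwrap covers more cases (such as proxies) than this simplified version:

```java
import java.sql.SQLException;
import java.sql.Wrapper;

// Hypothetical illustration, using only the JDK's java.sql.Wrapper contract.
final class UnwrapSketch {

    /** Stand-in for a pool-backed datasource bean. */
    static class PooledSource implements Wrapper {
        @Override
        public <T> T unwrap(Class<T> iface) throws SQLException {
            if (iface.isInstance(this)) {
                return iface.cast(this);
            }
            throw new SQLException("Not a wrapper for " + iface.getName());
        }

        @Override
        public boolean isWrapperFor(Class<?> iface) {
            return iface.isInstance(this);
        }
    }

    /** Stand-in for a decorator that a framework puts around the real datasource. */
    static class DelegatingSource implements Wrapper {
        private final Wrapper target;

        DelegatingSource(Wrapper target) {
            this.target = target;
        }

        @Override
        public <T> T unwrap(Class<T> iface) throws SQLException {
            return iface.isInstance(this) ? iface.cast(this) : target.unwrap(iface);
        }

        @Override
        public boolean isWrapperFor(Class<?> iface) throws SQLException {
            return iface.isInstance(this) || target.isWrapperFor(iface);
        }
    }

    /** Returns the underlying PooledSource, or null if the candidate does not wrap one. */
    public static PooledSource findPooledSource(Wrapper candidate) {
        try {
            return candidate.isWrapperFor(PooledSource.class)
                    ? candidate.unwrap(PooledSource.class)
                    : null;
        } catch (SQLException e) {
            return null;
        }
    }
}
```

A DelegatingSource is not an instance of PooledSource, so a plain instanceof test skips it, whereas findPooledSource still reaches the wrapped object.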

Changes impacting client API

None.

Issue
PostgreSQL: XAResource ignores transaction timeout

Severity:	2
Affected version(s):	5.0.x, 4.0.x, 3.9.x

Description

The XA implementation of PostgreSQL ignores the transaction timeout, which means that you may have long-lived orphaned SQL sessions in your database server.

Technical details

The XA specification allows a transaction manager to inform the XAResource backend of transaction timeouts, so this information can be used to terminate (roll back) pending or long-lived transactions. However, PostgreSQL seems to ignore this information (see the source code on GitHub), which sometimes leads to pending SQL sessions that exceed the transaction timeout.

Possible workarounds

The following workarounds are available:

Set the queryTimeout on your JDBC Statement objects, or try setting a server-level timeout like this:

SET SESSION idle_in_transaction_session_timeout = '5min';
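
The first workaround, setting the query timeout on JDBC statements, could look roughly like the sketch below. QueryTimeouts and its methods are hypothetical helpers, and the transaction deadline is assumed to be known to the application as a timestamp in milliseconds. Only Statement.setQueryTimeout is standard JDBC; note that it takes seconds and treats 0 as "no limit":

```java
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not part of the product.
final class QueryTimeouts {

    /** Whole seconds left until the deadline, rounded up and never below 1 (0 means "no limit" in JDBC). */
    public static int secondsUntil(long deadlineMillis, long nowMillis) {
        long remainingMillis = Math.max(0L, deadlineMillis - nowMillis);
        long seconds = (remainingMillis + 999L) / 1000L; // round up to whole seconds
        return (int) Math.min(Integer.MAX_VALUE, Math.max(1L, seconds));
    }

    /** Limits the statement to the time that is left before the given deadline. */
    public static void applyDeadline(Statement statement, long deadlineMillis) throws SQLException {
        statement.setQueryTimeout(secondsUntil(deadlineMillis, System.currentTimeMillis()));
    }
}
```

An expired deadline is clamped to one second here, so the statement still gets a limit instead of silently running unbounded.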

If you have any other solution then please let us know - thanks!

ExtremeTransactions 5.0.102
Release notes for 5.0.102

25 February 2021 | Guy Pardon | ExtremeTransactions

Feature189601
Allow disabling retry on heuristic hazard participants

You can now choose to disable retrying commit or rollback for heuristic hazard transactions.

Technical details

Heuristic hazard transactions can arise out of network connectivity issues during the commit phase: if a resource gets a prepare request and subsequently becomes unreachable during commit or rollback then the transaction will go into "heuristic hazard" mode. This essentially means that commit will be retried a number of times, even if com.atomikos.icatch.oltp_max_retries is set to zero. The rationale is that it is better to terminate pending in-doubt transactions sooner rather than later, because of the locks they may be holding on to.

If you don't want this behaviour then you can now disable this, and rely on the recovery process in the background to take care of it (which also works, but will happen only periodically). To disable, just set this new property to false:

com.atomikos.icatch.retry_on_heuristic_hazard=false

Changes impacting client API

A new startup property that can optionally be set. If not present, it will default to true to preserve compatibility with existing behaviour.

Feature189603
API extension to allow triggering recovery by the application

You can now explicitly trigger recovery in your application, via our API.

Technical details

Recovery already happens periodically, in the background. For bigger clusters that connect to the same database (or other shared resource) this can cause a high load on the backend, because many such background threads hit the backend at the same time. This is especially true if most cluster nodes start up at the same time with the same configuration for recovery, and are NOT using LogCloud. To alleviate this, you can now have a bit more control over when recovery happens, like this:

import com.atomikos.icatch.RecoveryService;
import com.atomikos.icatch.config.Configuration;

boolean lax = true; //false to force recovery, true to allow intelligent mode
RecoveryService rs = Configuration.getRecoveryService();
rs.performRecovery(lax);

In order for this to work, make sure to set (in jta.properties):

# set to Long.MAX_VALUE so background recovery is disabled
com.atomikos.icatch.recovery_delay=9223372036854775807
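
One way to use this control is to give each node its own randomised schedule, so that nodes started together do not all hit the backend at the same moment. The following is a sketch under assumptions: StaggeredRecoveryTrigger is a hypothetical class, the delay values are arbitrary, and the recovery call itself is passed in as a Runnable:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical scheduling helper, not part of the product API.
final class StaggeredRecoveryTrigger {

    /** Picks a delay in the range [baseMillis, baseMillis + jitterMillis). */
    public static long nextDelayMillis(long baseMillis, long jitterMillis) {
        if (baseMillis < 0 || jitterMillis <= 0) {
            throw new IllegalArgumentException("baseMillis must be >= 0 and jitterMillis > 0");
        }
        return baseMillis + ThreadLocalRandom.current().nextLong(jitterMillis);
    }

    /** Runs the task repeatedly, with a fresh random delay before every run. */
    public static ScheduledExecutorService start(Runnable recoveryTask, long baseMillis, long jitterMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduleNext(scheduler, recoveryTask, baseMillis, jitterMillis);
        return scheduler;
    }

    private static void scheduleNext(ScheduledExecutorService scheduler, Runnable recoveryTask,
                                     long baseMillis, long jitterMillis) {
        scheduler.schedule(() -> {
            try {
                recoveryTask.run();
            } finally {
                // Keep the cycle going even if one run failed, until shutdown is requested.
                if (!scheduler.isShutdown()) {
                    scheduleNext(scheduler, recoveryTask, baseMillis, jitterMillis);
                }
            }
        }, nextDelayMillis(baseMillis, jitterMillis), TimeUnit.MILLISECONDS);
    }
}
```

With the rs variable from the snippet above, a call such as StaggeredRecoveryTrigger.start(() -> rs.performRecovery(true), 60_000, 30_000) would run recovery every 60 to 90 seconds on that node. Call shutdown() on the returned scheduler when the application stops.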

Changes impacting client API

We have added methods on an existing API interface, which does not break existing clients.

Bug189602
Don't call XA recovery on the XAResource on HeuristicHazard

Severity:	2
Affected version(s):	5.0.x

Description

From now on we no longer systematically call XAResource.recover() when failures happen during the regular commit or rollback, so the overhead for the backend is reduced.

Technical details

For historical reasons we used to call the XA recovery routine on the backend whenever commit or rollback failed. The most common cause is network glitches, meaning that big clusters with a short network problem would suddenly hit the backends with recovery for all active transactions. Since recovery can be an expensive operation, this would result in needless load on the backends.

The rationale behind this was to avoid needless commit retries (based on the value of com.atomikos.icatch.oltp_max_retries), but the overhead does not justify the possible benefit.

From now on we no longer do this, since it is either the recovery process (in the background) or the application (via our API) that controls when recovery happens.

Worst case, this can lead to needless commit retries, in which case the backend should respond with error code XAER_NOTA and our code will handle this gracefully. However, we have historical records where some older version of ActiveMQ did not behave like this. This would result in errors in the ActiveMQ log files, in turn leading to alerts for the operations team.
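
What handling XAER_NOTA gracefully can mean for a retried commit is sketched below. This is an illustration with the standard javax.transaction.xa types only, not our actual implementation; CommitRetrySketch and its Outcome enum are hypothetical names:

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical illustration, not the product's commit logic.
final class CommitRetrySketch {

    enum Outcome { COMMITTED, ALREADY_TERMINATED }

    /** True if the backend reports that it does not know the transaction branch (any more). */
    public static boolean isUnknownXid(XAException e) {
        return e.errorCode == XAException.XAER_NOTA;
    }

    /** Retries commit of a prepared branch and treats an unknown Xid as "nothing left to do". */
    public static Outcome retryCommit(XAResource resource, Xid xid) throws XAException {
        try {
            resource.commit(xid, false); // false: two-phase commit, the branch was prepared before
            return Outcome.COMMITTED;
        } catch (XAException e) {
            if (isUnknownXid(e)) {
                // An earlier attempt got through, so the retry was needless but harmless.
                return Outcome.ALREADY_TERMINATED;
            }
            throw e;
        }
    }
}
```

Any other error code is rethrown, so real failures remain visible to the caller.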

Changes impacting client API

If you experience issues with this, then it suffices to set com.atomikos.icatch.oltp_max_retries to zero. That will disable regular commit retries and delegate to the recovery background process.

Issue189886
Avoid using 0 for the maximum transaction timeout

Severity:	2
Affected version(s):	5.0.x

Description

For releases 5.0 or higher, the maximum timeout should not be set to 0 or recovery will interfere with regular application-level commits.

Technical details

The 5.0 release has a new recovery workflow that is incompatible with com.atomikos.icatch.max_timeout being zero. That is because recovery depends on the maximum timeout to perform rollback of pending (orphaned) prepared transactions in the backends. If the maximum timeout is zero then recovery (in the background) will roll back prepared transactions that are concurrently being committed in your application. This will result in heuristic exceptions and inconsistent transaction outcomes.

Keep in mind that the maximum timeout is also indicative of maximum lock duration in your databases, so choose it wisely! If you are / were depending on an unlimited maximum timeout then you are also allowing unlimited lock times.

ExtremeTransactions 5.0.101

24 February 2021 | Guy Pardon | ExtremeTransactions

This release will be superseded by the upcoming release 5.0.102. Please ignore this one.

ExtremeTransactions 5.0.100
Release notes for 5.0.100

24 February 2021 | Guy Pardon | ExtremeTransactions

Bug184060
Collect thread name when reaping a pooled connection

Severity:	4
Affected version(s):	5.0.x

Description

You can now more easily determine when connections are reaped because of another connection timing out on network I/O or DB locks.

Technical details

We already used to collect the stack trace of the thread that acquired a reaped connection. However, we now also collect the thread name to correlate reap situations with timeouts, for instance like this:

1. a JMS connection is gotten
2. attempt to get a JDBC connection / times out while blocking on the testQuery
3. reaping of the JMS connection from step 1 by the pool's maintenance thread

Before this fix, you would see a timeout + application's thread name + stack trace for step 2, and a stack trace for 3. The stack trace would show where in your application the connection was gotten in step 1, but not by which thread. Indeed, step 3 would log the stack trace within the context of the pool maintenance thread, not the original application thread in step 1.

With this fix you will now also see the application's thread name (i.e., the thread of step 2) in step 3 so you can easily correlate 1-2-3 and determine the timeout in 2 as the root cause for the reap.
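
The idea of recording the thread name when a connection is handed out, so that a different thread can report it later, can be sketched as follows. BorrowInfo is a hypothetical class for illustration and does not mirror our pool internals:

```java
// Hypothetical illustration, not the pool's actual bookkeeping.
final class BorrowInfo {

    private final String threadName;
    private final StackTraceElement[] stackTrace;

    private BorrowInfo(String threadName, StackTraceElement[] stackTrace) {
        this.threadName = threadName;
        this.stackTrace = stackTrace;
    }

    /** Call on the application thread at the moment the connection is handed out. */
    public static BorrowInfo capture() {
        Thread current = Thread.currentThread();
        return new BorrowInfo(current.getName(), current.getStackTrace());
    }

    public String threadName() {
        return threadName;
    }

    /** Message that a maintenance thread could log when it reaps the connection. */
    public String describeReap() {
        StringBuilder message = new StringBuilder("Reaping connection acquired by thread '")
                .append(threadName)
                .append("' at:");
        for (StackTraceElement frame : stackTrace) {
            message.append(System.lineSeparator()).append("   at ").append(frame);
        }
        return message.toString();
    }
}
```

Since the name is stored at acquisition time, describeReap() reports the application thread even when it is called from the maintenance thread.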

Changes impacting client API

None.

ExtremeTransactions 5.0.99
Release notes for 5.0.99

17 February 2021 | Guy Pardon | ExtremeTransactions
