Feature189601
Allow disabling retry on heuristic hazard participants

You can now choose to disable retrying commit or rollback for heuristic hazard transactions.

Technical details

Heuristic hazard transactions can arise from network connectivity issues during the commit phase. If a resource receives a prepare request and then becomes unreachable during commit or rollback, the transaction goes into "heuristic hazard" mode. In that mode, commit or rollback is retried a number of times, even if com.atomikos.icatch.oltp_max_retries is set to zero. The rationale is that it is better to terminate pending in-doubt transactions sooner rather than later, because of the locks they may still be holding.

If you don't want this behaviour, you can now disable it. The background recovery process then takes care of these transactions instead (this also works, but only happens periodically). To disable retries, set this new property to false:

com.atomikos.icatch.retry_on_heuristic_hazard=false
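The following is a minimal sketch of the retry decision described above. It shows how the two properties interact. This is a hypothetical model, not Atomikos code. The class HazardRetryPolicy and its parameter hazardRetryLimit are invented for illustration. hazardRetryLimit stands in for the unspecified "number of times" that heuristic hazard transactions are retried.

```java
// Hypothetical model of the retry decision; NOT Atomikos's implementation.
final class HazardRetryPolicy {
    private final int oltpMaxRetries;              // com.atomikos.icatch.oltp_max_retries
    private final boolean retryOnHeuristicHazard;  // com.atomikos.icatch.retry_on_heuristic_hazard
    private final int hazardRetryLimit;            // hypothetical: the unspecified hazard retry count

    HazardRetryPolicy(int oltpMaxRetries, boolean retryOnHeuristicHazard, int hazardRetryLimit) {
        this.oltpMaxRetries = oltpMaxRetries;
        this.retryOnHeuristicHazard = retryOnHeuristicHazard;
        this.hazardRetryLimit = hazardRetryLimit;
    }

    /** Returns true if another commit or rollback attempt should be made right away. */
    boolean shouldRetry(boolean heuristicHazard, int retriesSoFar) {
        if (heuristicHazard) {
            // Hazard retries ignore oltp_max_retries (they happen even if it is 0),
            // unless the new property disables them; then background recovery takes over.
            return retryOnHeuristicHazard && retriesSoFar < hazardRetryLimit;
        }
        // Regular failures are retried at most oltp_max_retries times.
        return retriesSoFar < oltpMaxRetries;
    }
}
```

With the property set to false, a heuristic hazard transaction gets no immediate retries in this model. It is left entirely to periodic background recovery.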

Changes impacting client API

There is a new startup property that can optionally be set. If it is not present, it defaults to true to preserve compatibility with existing behaviour.
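The default described above can be sketched with java.util.Properties: when the key is absent, the value falls back to true. This is an illustration only, not Atomikos's own configuration parser. The class HazardRetryConfig is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch of the documented default; NOT Atomikos's configuration code.
final class HazardRetryConfig {
    static final String KEY = "com.atomikos.icatch.retry_on_heuristic_hazard";

    /** An absent property yields true, preserving the pre-existing retry behaviour. */
    static boolean retryOnHeuristicHazard(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "true").trim());
    }

    /** Parses properties text, e.g. the contents of jta.properties. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory reader
        }
        return props;
    }
}
```

Boolean.parseBoolean returns false for anything other than "true" (ignoring case). In this sketch, a misspelled value would therefore disable the retries.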

Feature189603
API extension to allow triggering recovery by the application

You can now explicitly trigger recovery from your application via our API.

Technical details

Recovery already happens periodically in the background. For larger clusters that connect to the same database (or other shared resource), this can cause high load on the backend, because many such background threads hit the backend at the same time. This is especially true if most cluster nodes start up at the same time with the same recovery configuration and are NOT using LogCloud. To alleviate this, you now have more control over when recovery happens, like this:

import com.atomikos.icatch.RecoveryService;
import com.atomikos.icatch.config.Configuration;

boolean lax = true; //false to force recovery, true to allow intelligent mode
RecoveryService rs = Configuration.getRecoveryService();
rs.performRecovery(lax);

For this to work, make sure to set the following in jta.properties:

# Long.MAX_VALUE: effectively disables periodic background recovery
com.atomikos.icatch.recovery_delay=9223372036854775807
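Once background recovery is disabled this way, each node decides when to call performRecovery. Nodes that start together can still hit the shared backend at the same moment. One way to avoid this is to give each node a random start offset.

The sketch below assumes recoveryCall wraps the call from the snippet above. The class JitteredRecoveryScheduler and its parameters are hypothetical. Random jitter is a general scheduling technique, not something Atomikos prescribes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: application-driven recovery with a random per-node start offset,
// so nodes that start at the same time do not all hit the backend together.
final class JitteredRecoveryScheduler {
    /** Random start offset in [0, maxJitterMillis]. */
    static long initialDelayMillis(long maxJitterMillis) {
        if (maxJitterMillis < 0) {
            throw new IllegalArgumentException("maxJitterMillis must be >= 0");
        }
        return ThreadLocalRandom.current().nextLong(maxJitterMillis + 1);
    }

    /**
     * Runs recoveryCall with a fixed delay of periodMillis between runs, after a random offset.
     * Example (assumes the Atomikos API shown above):
     *   start(() -> Configuration.getRecoveryService().performRecovery(true), 60_000, 30_000);
     */
    static ScheduledExecutorService start(Runnable recoveryCall, long periodMillis, long maxJitterMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        Runnable guarded = () -> {
            try {
                recoveryCall.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs of a periodic task.
                System.err.println("Recovery attempt failed: " + e);
            }
        };
        ses.scheduleWithFixedDelay(guarded, initialDelayMillis(maxJitterMillis), periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

Because each node picks its own offset, nodes that start together begin recovery at different moments. Call shutdown() on the returned executor when the application stops.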

Changes impacting client API

We have added methods to an existing API interface; this does not break existing clients.

Bug189602
Don't call XA recovery on the XAResource on HeuristicHazard

Severity: 2
Affected version(s): 5.0.x

Description

From now on, we no longer systematically call XAResource.recover() when failures happen during regular commit or rollback. This reduces the overhead for the backend.

Technical details

For historical reasons, we used to call the XA recovery routine on the backend whenever commit or rollback failed. The most common cause of such failures is a network glitch. As a result, a big cluster with a short network problem would suddenly hit the backends with recovery for all active transactions. Since recovery can be an expensive operation, this resulted in needless load on the backends.

The rationale behind this was to avoid needless commit retries (based on the value of com.atomikos.icatch.oltp_max_retries), but the possible benefit does not justify the overhead.

From now on, we no longer do this. Either the background recovery process or the application (via our API) controls when recovery happens.

In the worst case, this can lead to needless commit retries. The backend should then respond with error code XAER_NOTA, which our code handles gracefully. However, we have historical records of some older ActiveMQ versions that did not behave like this. This resulted in errors in the ActiveMQ log files, which in turn led to alerts for the operations team.
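The graceful handling of XAER_NOTA mentioned above can be pictured as follows. This is a hypothetical sketch, not Atomikos's internal code, and it rests on three assumptions:

- CommitCall stands in for XAResource.commit(xid, false).
- XAER_RMFAIL is the only retryable error.
- XAER_NOTA on a retry means the branch was already terminated, for example by recovery.

Setting maxRetries to 0 (the oltp_max_retries workaround suggested below) means a single attempt. After that, the transaction is left to background recovery.

```java
import javax.transaction.xa.XAException;

// Hypothetical sketch of commit retries with graceful XAER_NOTA handling; NOT Atomikos internals.
final class CommitRetrier {
    enum Outcome { COMMITTED, ALREADY_TERMINATED, LEFT_TO_RECOVERY }

    /** Stand-in for xaResource.commit(xid, false), so no real resource manager is needed. */
    interface CommitCall {
        void commit() throws XAException;
    }

    static Outcome commitWithRetries(CommitCall call, int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            try {
                call.commit();
                return Outcome.COMMITTED;
            } catch (XAException e) {
                if (e.errorCode == XAException.XAER_NOTA) {
                    // The backend no longer knows this Xid: assume the branch was already
                    // terminated (e.g. by recovery), so this retry was needless but harmless.
                    return Outcome.ALREADY_TERMINATED;
                }
                if (e.errorCode != XAException.XAER_RMFAIL) {
                    // Assumption: only "resource manager unavailable" is worth retrying here.
                    throw new IllegalStateException("Non-retryable XA error code " + e.errorCode, e);
                }
                if (attempt >= maxRetries) {
                    // Retries exhausted (or disabled with maxRetries = 0): background recovery takes over.
                    return Outcome.LEFT_TO_RECOVERY;
                }
            }
        }
    }
}
```

A backend that answers a needless retry with XAER_NOTA ends quietly in ALREADY_TERMINATED. A backend that answers differently would surface as an error, which matches the ActiveMQ case described above.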

Changes impacting client API

If you experience issues with this, it suffices to set com.atomikos.icatch.oltp_max_retries to zero. That disables regular commit retries and delegates to the background recovery process.

Issue189886
Avoid using 0 for the maximum transaction timeout

Severity: 2
Affected version(s): 5.0.x

Description

For release 5.0 and higher, the maximum timeout should not be set to 0; otherwise recovery will interfere with regular application-level commits.

Technical details

The 5.0 release has a new recovery workflow that is incompatible with com.atomikos.icatch.max_timeout being zero. Recovery depends on the maximum timeout to roll back pending (orphaned) prepared transactions in the backends. If the maximum timeout is zero, background recovery will roll back prepared transactions that your application is concurrently committing. This results in heuristic exceptions and inconsistent transaction outcomes.
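A simplified model shows why zero is problematic. Assume that recovery treats a prepared transaction as orphaned once it has been pending for longer than the maximum timeout. This is an assumption, not Atomikos's exact algorithm.

With a positive maximum timeout, a transaction that respects the timeout should normally have finished well before that point. With a zero timeout, practically every prepared transaction qualifies at once, including ones your application is committing right now. The class OrphanCheck is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Simplified, hypothetical model; NOT the actual Atomikos 5.0 recovery workflow.
final class OrphanCheck {
    /**
     * Recovery may roll back a prepared transaction only if it has been pending
     * for longer than the maximum timeout (com.atomikos.icatch.max_timeout).
     */
    static boolean eligibleForRollback(Instant preparedAt, Instant now, Duration maxTimeout) {
        return Duration.between(preparedAt, now).compareTo(maxTimeout) > 0;
    }
}
```

In this model, with Duration.ZERO, a transaction prepared 5 ms ago is already eligible for rollback. With a 300-second maximum timeout, it only becomes eligible after 300 seconds.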

Keep in mind that the maximum timeout is also an indication of the maximum lock duration in your databases, so choose it wisely! If you are (or were) relying on an unlimited maximum timeout, you are also allowing unlimited lock times.

About Severity

The severity levels we use are defined in our support terms and conditions.

Available to customers only.

Copyright 2026 Atomikos BVBA