You are here: Blog » Vision

Vision

A lot of the hype in "Ex­treme Trans­ac­tion Pro­cess­ing (XTP)" is fail-over. When Or­a­cle bought Co­her­ence (a Tan­gosol prod­uct), they es­sen­tial­ly got an XTP so­lu­tion for data­base ac­cess.

As Cameron Pur­dy notes here, this now al­lows Or­a­cle to pro­vide a de­gree of XTP failover.

Now guess what: with Atomikos Trans­ac­tion­sEssen­tials you get:
  • Trans­ac­tion­al ro­bust­ness for noth­ing, and
  • failover for free
How? Just do the fol­low­ing:
  1. queue re­quests in JMS
  2. process them by a clus­ter of com­pet­ing con­sumer process­es
  3. use Atomikos Trans­ac­tion­sEssen­tials to en­sure that each mes­sage is processed ex­act­ly once, with­out du­pli­cates or mes­sage loss
By the se­man­tics of queues, this ar­chi­tec­ture will give you failover. By the se­man­tics of trans­ac­tions, this will give you ex­act­ly once. Since the re­quests can be queued by any source, this is mul­ti­chan­nel. Every­thing is com­mod­i­ty in­fra­struc­ture. This is very easy to scale: just add an­oth­er process.

In sum­ma­ry, this is XTP of the high­est de­gree:-)

Work­ing for Atomikos, I use two-phase com­mit a lot. While I don't want to claim that it is a so­lu­tion to all prob­lems, I do find it frus­trat­ing to hear peo­ple pro­claim­ing that they don't use it be­cause it doesn't scale (or some oth­er rea­son).

Take, for in­stance, Wern­er Vo­gel's talk about the Ama­zon ar­chi­tec­ture. Once again, two-phase com­mit is re­ject­ed as a vi­able so­lu­tion/tech­nol­o­gy. Once again, I dis­agree.

Let me il­lus­trate my point with an ex­am­ple of what re­al­ly hap­pened to me re­cent­ly - af­ter or­der­ing a book at Ama­zon (iron­i­cal­ly;-). I can give sim­i­lar ex­am­ples with air­line tick­et reser­va­tions but those will have to wait un­til lat­er...

So what hap­pened re­al­ly? Well, I or­dered a book that I re­al­ly want­ed to have. I or­dered it on­line at Ama­zon... All went well, I checked out and paid by VISA. How­ev­er, that is where things start­ed to go wrong: while wait­ing for the book to be de­liv­ered, I sud­den­ly get an email from Ama­zon say­ing that... my or­der has been can­celed!

Canceled? Yes, but not in a way you would think: I still had to pay for the de­liv­ery by DHL (sor­ry, what is that?!). Yes sir, DHL claimed they had found no­body present at the de­liv­ery ad­dress. The de­liv­ery was at our of­fice ad­dress, so it is very un­like­ly that no­body be there in the first place. More­over, any couri­er ser­vice I know will leave a note that they passed by and at least set­tle for an al­ter­na­tive de­liv­ery. Not this time.

My con­clu­sion? DHL did not ar­rive at my place. On the Ama­zon or­der track­ing page, my or­der had not even left Ger­many (to be de­liv­ered where I live, in Bel­gium).

Now what will I re­mem­ber? I will re­mem­ber that Ama­zon let me down, ei­ther di­rect­ly or via DHL. I will also re­mem­ber to be very sus­pi­cious about peo­ple who say they don't need two-phase com­mit. Two-phase com­mit comes down to en­sur­ing agree­ment be­tween the dif­fer­ent par­ties in­volved in a trans­ac­tion. Clear­ly, there was no such thing in my case.

One of the lat­est chal­lenges in com­put­er sci­ence seems to be the CAP the­o­rem. It ad­dress­es a per­ceived im­pos­si­bil­i­ty of build­ing large-scale and clus­tered (web) ser­vice ar­chi­tec­tures. The fact that it (sup­pos­ed­ly) has been proven to be true makes what I am go­ing to write here all the more un­like­ly. Still, read on be­cause I will show that I am right and CAP is not an im­pos­si­bil­i­ty af­ter all... While the im­pos­si­bil­i­ty proof of CAP is math­e­mat­i­cal­ly cor­rect, it is based on as­sump­tions that are too strict. By re­lax­ing these as­sump­tions, I found the so­lu­tion pre­sent­ed here.

What is CAP?

The CAP the­o­rem (short for con­sis­ten­cy, avail­abil­i­ty, par­ti­tion-tol­er­ant) es­sen­tial­ly states that you can­not have a clus­tered sys­tem that sup­ports all of the fol­low­ing three qual­i­ties:

Con­sis­ten­cy
Con­sis­ten­cy is a qual­i­ty mean­ing (in­for­mal­ly speak­ing) that reads and writes hap­pen cor­rect­ly. In oth­er words, the over­all ef­fect of ex­e­cut­ing thou­sands or mil­lions of trans­ac­tions con­cur­rent­ly is the same as if they had been ex­e­cut­ed one-at-a-time. Usu­al­ly, this is done with the help of a trans­ac­tion man­ag­er of some sort.
Avail­abil­i­ty
Avail­abil­i­ty es­sen­tial­ly means that every op­er­a­tion (that makes it to a non-fail­ing node) even­tu­al­ly re­turns a re­sult.
Par­ti­tion-tol­er­ant
This qual­i­ty refers to the pos­si­bil­i­ty of tol­er­at­ing par­ti­tions on the net­work. Note that we sup­pose a clus­ter ar­chi­tec­ture (which is where the net­work comes in).

CAP is a con­jec­ture orig­i­nal­ly for­mu­lat­ed by Eric Brew­er (Ink­to­mi) and has in­flu­enced many of to­day's larg­er-scale web­sites like . In oth­er words, the im­pact of CAP is very large. To make it worse, the per­ceived im­pos­si­bil­i­ty of a CAP sys­tem (one that has all three de­sir­able prop­er­ties) has lead peo­ple to ad­vo­cate some­thing called BASE (Ba­si­cal­ly Avail­able, Soft-state and Even­tu­al­ly Con­sis­tent) - see this talk by Wern­er Vo­gels (CTO at Ama­zon).

As far as I know (but I could be wrong), a the­o­ret­i­cal foun­da­tion of BASE does not ex­ist yet (it seems more of an in­for­mal ap­proach which to me rais­es se­ri­ous ques­tions con­cern­ing cor­rect­ness). In this post I will present:
  • a CAP so­lu­tion
  • how this con­forms to what BASE wants to achieve
  • a "de­sign pat­tern" for build­ing cor­rect sys­tems that (in a way) of­fer both CAP and BASE qual­i­ties
Be­cause CAP is per­ceived as im­pos­si­ble and be­cause BASE lacks for­mal treat­ment, I con­sid­er this to be a sig­ni­fi­ca­tion con­tri­bu­tion to the state of to­day's en­gi­neer­ing;-)

What about the proof of Brew­er's the­o­rem?

Brew­er's proof has been pub­lished by Nan­cy Lynch et al and dis­cussed by me (see my ear­li­er post and also this one).

While the the­o­ret­i­cal proof of the im­pos­si­bil­i­ty of CAP is valid, it has a big lim­i­ta­tion: it as­sumes that all three CAP prop­er­ties have to be sup­plied at the same mo­ment in time. If you drop this as­sump­tion, then all of a sud­den you get into a new spec­trum of pos­si­bil­i­ties. This is what I will do here.

A CAP so­lu­tion

Enough talk, let's get to the core of the mat­ter. Here is my so­lu­tion to CAP. To make it con­crete, I will use the con­cept of a web-shop like Ama­zon. Here are the rules that are suf­fi­cient to en­sure CAP:
  1. Pro­cess reads from the data­base if pos­si­ble, or use a cached val­ue if need­ed for avail­abil­i­ty (if the DB is un­reach­able).
  2. All reads use ver­sion­ing or an­oth­er mech­a­nism that al­lows op­ti­mistic lock­ing.
  3. Up­dates sup­plied by clients (or­ders in case of Ama­zon) are queued for ex­e­cu­tion, and in­clude the ver­sion­ing in­for­ma­tion of the reads that lead to the up­date.
  4. Queued up­dates are processed when the num­ber of par­ti­tions is low enough to do so. The eas­i­est way to do this is with a clus­ter-wide dis­trib­uted trans­ac­tion across all repli­cas (more on scal­a­bil­i­ty lat­er), but oth­er more re­fined ways are pos­si­ble (such as quo­rum-based repli­ca­tion or any oth­er smart way of repli­cat­ing). The ver­sion in­for­ma­tion in the up­date is used to val­i­date it: if the data in the data­base has been mod­i­fied since the orig­i­nal read(s) that lead to the up­date, the up­date is re­ject­ed and a can­cel­la­tion is re­port­ed back to the client. Other­wise the or­der is processed and a con­fir­ma­tion is re­port­ed back to the client.
  5. The re­sults (con­fir­ma­tion or can­cel­la­tion) are sent asyn­chro­nous­ly to the clients. This can be ei­ther email, mes­sage queu­ing, or any oth­er asyn­chro­nous de­liv­ery method.
That's it. Ad­here to these guide­lines, and you have a CAP ar­chi­tec­ture. I will not pro­vide a for­mal proof here (I in­tend to do that else­where, in a re­search pa­per), but in­tu­itive­ly the proof is as fol­lows:
  • This sys­tem is con­sis­tent be­cause reads are based on snap­shots and in­cor­rect up­dates are re­ject­ed be­fore they are ap­plied. In oth­er words: there are no in­cor­rect ex­e­cu­tions.
  • This sys­tem is avail­able since reads al­ways re­turn a val­ue, and so do writes (even though they are queued and it may take a while).
  • This sys­tem is par­ti­tion-tol­er­ant be­cause it al­lows net­work and node fail­ures.
Grant­ed, this sys­tem does not pro­vide all three at the same mo­ment in time (which is how we go around the im­pos­si­bil­i­ty), but nev­er­the­less the re­sult is quite strong IMHO.

The lim­i­ta­tions

There are some lim­i­ta­tions to this so­lu­tion - all of which seem rea­son­able:
  1. Read-only re­quests may be pre­sent­ed with stale in­for­ma­tion (due to up­dates that have yet-to-be-ap­plied). In that sense, their re­sults could be "in­con­sis­tent": for in­stance, the avail­abil­i­ty of an Ama­zon item can change be­tween two page views. I do not see this as a ma­jor re­stric­tion, since no web­site that I know of will of­fer read con­sis­ten­cy for the du­ra­tion of a user ses­sion. It all de­pends on what you con­sid­er to be with­in the scope of one trans­ac­tion;-) Note that this al­most cor­re­sponds to snap­shot iso­la­tion found in Or­a­cle.
  2. Par­ti­tions should not last for­ev­er: in or­der for this to work, par­ti­tions should be re­solved with­in a rea­son­able time (rea­son­able be­ing: with­in the ex­pect­ed con­fir­ma­tion time for up­dates). The du­ra­tion of any par­ti­tions also af­fects the time win­dow in which reads can pro­duce stale data.
  3. The up­dates have to be ap­plied in the same rel­a­tive or­der at all clus­ter nodes. This puts some re­stric­tions on the al­go­rithm used to do this.
Note that up­dates are al­ways based on cor­rect reads thanks to the ver­sion­ing check be­fore they are ap­plied. So up­date trans­ac­tions are al­ways con­sis­tent.

How does this re­late to BASE?

You could see this as a de­sign pat­tern for BASE if you like. The so­lu­tion ad­heres to BASE in the sense that it uses cached reads (if need­ed) and that the up­dates are de­layed (so you could say they are "even­tu­al­ly" ap­plied and the sys­tem be­comes "con­sis­tent").

Re­flec­tions in scal­a­bil­i­ty

So far the CAP fo­cus was on pos­si­bil­i­ty. I think my so­lu­tion shows that it is pos­si­ble. Now how about scal­ing up?

The naive so­lu­tion (a huge dis­trib­uted trans­ac­tion to up­date all clus­ter nodes in-sync) is un­like­ly to scale: as you add more nodes, more up­dates are need­ed. Now I am a big fan of trans­ac­tions, but not to use them in an ar­bi­trary mat­ter. So how to prop­a­gate these up­dates through the clus­ter?

While smarter so­lu­tions for this ex­ist (such as the work by Bet­ti­na Kemme), a triv­ial first try would be to push up­dates (lazi­ly) to all nodes in the clus­ter. This can be done with a smart queu­ing mech­a­nism. The dis­ad­van­tage is that up­dates are not ap­plied every­where at once (rather, the all-or-noth­ing qual­i­ty just "rip­ples" through the sys­tem). So you get into the "even­tu­al­ly" style again.

Note that this lat­ter sug­ges­tion makes the sys­tem be­have much like the READ COMMITTED iso­la­tion lev­el (which, by the way, is the de­fault in Or­a­cle). So this ap­proach sac­ri­fices con­sis­ten­cy/iso­la­tion a bit in fa­vor of scal­a­bil­i­ty.

Fu­ture work

Ad­di­tion­al re­search could/should be done in the fol­low­ing ar­eas:
  • Im­prov­ing read con­sis­ten­cy through ses­sion affin­i­ty
  • The best way to push the up­dates through the clus­ter
  • Per­for­mance eval­u­a­tion in real life im­ple­men­ta­tions
Fi­nal note and dis­claimer

I did not see Brew­er's orig­i­nal pre­sen­ta­tion of the CAP the­o­rem - so it could be that what he meant with con­sis­ten­cy also in­volved all reads (see the lim­i­ta­tions of the so­lu­tion I pre­sent­ed here). In that case I did not find a so­lu­tion for CAP but at least it is a frame­work and proof out­line for BASE winking face

UPDATE 15/3/2012:

It seems like Greg Young and Udi Da­han have been work­ing along sim­i­lar lines and gave this pat­tern/so­lu­tion a name: CQRS.

Forté/UDS is an end-of-life tech­nol­o­gy that used to be in Sun's prod­uct port­fo­lio. When talk­ing to peo­ple who have been do­ing a lot with Forté in the past, it seems that Forté can be con­sid­ered an an­ces­tor of Java:

  • It has an ob­ject-ori­ent­ed (4GL) de­vel­op­ment lan­guage.
  • Like Java's JMX, Forte also has in­stru­men­ta­tion (the agent is even called icon­sole - like jcon­sole for Java's built-in JMX agent these days!).
  • It has dis­trib­uted trans­ac­tions.
  • It has a strong no­tion of events as first-class cit­i­zens in the lan­guage.

The only thing that Forté does not have is En­ter­prise JavaBeans (EJB), nor XML con­fig­u­ra­tion is­sues for the ap­pli­ca­tion serv­er. This means that Forté de­vel­op­ers who mi­grate to Java (be­cause they are left lit­tle choice) get con­front­ed with com­plex­i­ties that they did not have to both­er with in their 4GL en­vi­ron­ment.

Thanks to Atomikos and the J2EE with­out ap­pli­ca­tion serv­er method­ol­o­gy, teams who used to work in Forté can eas­i­ly do Java/J2EE with­out hav­ing to both­er about the clut­ter of EJB nor about the ap­pli­ca­tion serv­er's XML hell. What's more, in com­bi­na­tion with Spring, Hiber­nate and JMS there is an equiv­a­lent, light-weight Java stack that (thanks to Atomikos) can still do all the con­nec­tion pool­ing, event-dri­ven and trans­ac­tion­al pro­cess­ing that is need­ed.

What makes it even bet­ter is that this method­ol­o­gy seems to achieve equal pro­duc­tiv­i­ty as with the 4GL en­vi­ron­ment in Forté, which is pret­ty good giv­en that Java is a 3GL and is not wide­ly known as a pro­duc­tiv­i­ty mir­a­cle.

In my last post I dis­cussed the the­o­ret­i­cal proof of the CAP the­o­rem. Both the the­o­rem and the proof have a lim­i­ta­tion that might very well ren­der them not-so-uni­ver­sal as as­sumed.

The lim­i­ta­tion of the CAP proof

The lim­i­ta­tion of the CAP proof (as for­mu­lat­ed by Lynch et al) is the fol­low­ing: it as­sumes that - for the pur­pose of avail­abil­i­ty - re­quests are to be served even when there is a par­ti­tion in the clus­ter.

A way around the lim­i­ta­tion

There is a way around this lim­i­ta­tion - al­though it may sound ex­ot­ic: just make sure that there are no par­ti­tions when re­quests are served.

How? By sim­ply do­ing the fol­low­ing:

  • Queue re­quests (e.g., in JMS).
  • Only process re­quests when there is no par­ti­tion prob­lem.
  • Send re­spons­es asyn­chro­nous­ly, for in­stance via email.

Since no par­ti­tion (hope­ful­ly) lasts for­ev­er, this so­lu­tion does not lead to live­lock.

Also, note that quo­rum so­lu­tions ex­ist to avoid that the com­plete clus­ter has to be up at the same time.

Is this the ca­pit­u­la­tion of CAP? Who knows...

Corporate Information

Atomikos Corporate Headquarters
Hoveniersstraat, 39/1, 2800
Mechelen, Belgium

Contact Us

Copyright 2026 Atomikos BVBA | Our Privacy Policy
By using this site you agree to our cookies. More info. That's Fine