The Atomikos Blog

Archive

You are here: Blog
One of the lat­est chal­lenges in com­put­er sci­ence seems to be the CAP the­o­rem. It ad­dress­es a per­ceived im­pos­si­bil­i­ty of build­ing large-scale and clus­tered (web) ser­vice ar­chi­tec­tures. The fact that it (sup­pos­ed­ly) has been proven to be true makes what I am go­ing to write here all the more un­like­ly. Still, read on be­cause I will show that I am right and CAP is not an im­pos­si­bil­i­ty af­ter all... While the im­pos­si­bil­i­ty proof of CAP is math­e­mat­i­cal­ly cor­rect, it is based on as­sump­tions that are too strict. By re­lax­ing these as­sump­tions, I found the so­lu­tion pre­sent­ed here.

What is CAP?

The CAP the­o­rem (short for con­sis­ten­cy, avail­abil­i­ty, par­ti­tion-tol­er­ant) es­sen­tial­ly states that you can­not have a clus­tered sys­tem that sup­ports all of the fol­low­ing three qual­i­ties:

Con­sis­ten­cy
Con­sis­ten­cy is a qual­i­ty mean­ing (in­for­mal­ly speak­ing) that reads and writes hap­pen cor­rect­ly. In oth­er words, the over­all ef­fect of ex­e­cut­ing thou­sands or mil­lions of trans­ac­tions con­cur­rent­ly is the same as if they had been ex­e­cut­ed one-at-a-time. Usu­al­ly, this is done with the help of a trans­ac­tion man­ag­er of some sort.
Avail­abil­i­ty
Avail­abil­i­ty es­sen­tial­ly means that every op­er­a­tion (that makes it to a non-fail­ing node) even­tu­al­ly re­turns a re­sult.
Par­ti­tion-tol­er­ant
This qual­i­ty refers to the pos­si­bil­i­ty of tol­er­at­ing par­ti­tions on the net­work. Note that we sup­pose a clus­ter ar­chi­tec­ture (which is where the net­work comes in).

CAP is a con­jec­ture orig­i­nal­ly for­mu­lat­ed by Eric Brew­er (Ink­to­mi) and has in­flu­enced many of to­day's larg­er-scale web­sites like . In oth­er words, the im­pact of CAP is very large. To make it worse, the per­ceived im­pos­si­bil­i­ty of a CAP sys­tem (one that has all three de­sir­able prop­er­ties) has lead peo­ple to ad­vo­cate some­thing called BASE (Ba­si­cal­ly Avail­able, Soft-state and Even­tu­al­ly Con­sis­tent) - see this talk by Wern­er Vo­gels (CTO at Ama­zon).

As far as I know (but I could be wrong), a the­o­ret­i­cal foun­da­tion of BASE does not ex­ist yet (it seems more of an in­for­mal ap­proach which to me rais­es se­ri­ous ques­tions con­cern­ing cor­rect­ness). In this post I will present:
  • a CAP so­lu­tion
  • how this con­forms to what BASE wants to achieve
  • a "de­sign pat­tern" for build­ing cor­rect sys­tems that (in a way) of­fer both CAP and BASE qual­i­ties
Be­cause CAP is per­ceived as im­pos­si­ble and be­cause BASE lacks for­mal treat­ment, I con­sid­er this to be a sig­ni­fi­ca­tion con­tri­bu­tion to the state of to­day's en­gi­neer­ing;-)

What about the proof of Brew­er's the­o­rem?

Brew­er's proof has been pub­lished by Nan­cy Lynch et al and dis­cussed by me (see my ear­li­er post and also this one).

While the the­o­ret­i­cal proof of the im­pos­si­bil­i­ty of CAP is valid, it has a big lim­i­ta­tion: it as­sumes that all three CAP prop­er­ties have to be sup­plied at the same mo­ment in time. If you drop this as­sump­tion, then all of a sud­den you get into a new spec­trum of pos­si­bil­i­ties. This is what I will do here.

A CAP so­lu­tion

Enough talk, let's get to the core of the mat­ter. Here is my so­lu­tion to CAP. To make it con­crete, I will use the con­cept of a web-shop like Ama­zon. Here are the rules that are suf­fi­cient to en­sure CAP:
  1. Pro­cess reads from the data­base if pos­si­ble, or use a cached val­ue if need­ed for avail­abil­i­ty (if the DB is un­reach­able).
  2. All reads use ver­sion­ing or an­oth­er mech­a­nism that al­lows op­ti­mistic lock­ing.
  3. Up­dates sup­plied by clients (or­ders in case of Ama­zon) are queued for ex­e­cu­tion, and in­clude the ver­sion­ing in­for­ma­tion of the reads that lead to the up­date.
  4. Queued up­dates are processed when the num­ber of par­ti­tions is low enough to do so. The eas­i­est way to do this is with a clus­ter-wide dis­trib­uted trans­ac­tion across all repli­cas (more on scal­a­bil­i­ty lat­er), but oth­er more re­fined ways are pos­si­ble (such as quo­rum-based repli­ca­tion or any oth­er smart way of repli­cat­ing). The ver­sion in­for­ma­tion in the up­date is used to val­i­date it: if the data in the data­base has been mod­i­fied since the orig­i­nal read(s) that lead to the up­date, the up­date is re­ject­ed and a can­cel­la­tion is re­port­ed back to the client. Other­wise the or­der is processed and a con­fir­ma­tion is re­port­ed back to the client.
  5. The re­sults (con­fir­ma­tion or can­cel­la­tion) are sent asyn­chro­nous­ly to the clients. This can be ei­ther email, mes­sage queu­ing, or any oth­er asyn­chro­nous de­liv­ery method.
That's it. Ad­here to these guide­lines, and you have a CAP ar­chi­tec­ture. I will not pro­vide a for­mal proof here (I in­tend to do that else­where, in a re­search pa­per), but in­tu­itive­ly the proof is as fol­lows:
  • This sys­tem is con­sis­tent be­cause reads are based on snap­shots and in­cor­rect up­dates are re­ject­ed be­fore they are ap­plied. In oth­er words: there are no in­cor­rect ex­e­cu­tions.
  • This sys­tem is avail­able since reads al­ways re­turn a val­ue, and so do writes (even though they are queued and it may take a while).
  • This sys­tem is par­ti­tion-tol­er­ant be­cause it al­lows net­work and node fail­ures.
Grant­ed, this sys­tem does not pro­vide all three at the same mo­ment in time (which is how we go around the im­pos­si­bil­i­ty), but nev­er­the­less the re­sult is quite strong IMHO.

The lim­i­ta­tions

There are some lim­i­ta­tions to this so­lu­tion - all of which seem rea­son­able:
  1. Read-only re­quests may be pre­sent­ed with stale in­for­ma­tion (due to up­dates that have yet-to-be-ap­plied). In that sense, their re­sults could be "in­con­sis­tent": for in­stance, the avail­abil­i­ty of an Ama­zon item can change be­tween two page views. I do not see this as a ma­jor re­stric­tion, since no web­site that I know of will of­fer read con­sis­ten­cy for the du­ra­tion of a user ses­sion. It all de­pends on what you con­sid­er to be with­in the scope of one trans­ac­tion;-) Note that this al­most cor­re­sponds to snap­shot iso­la­tion found in Or­a­cle.
  2. Par­ti­tions should not last for­ev­er: in or­der for this to work, par­ti­tions should be re­solved with­in a rea­son­able time (rea­son­able be­ing: with­in the ex­pect­ed con­fir­ma­tion time for up­dates). The du­ra­tion of any par­ti­tions also af­fects the time win­dow in which reads can pro­duce stale data.
  3. The up­dates have to be ap­plied in the same rel­a­tive or­der at all clus­ter nodes. This puts some re­stric­tions on the al­go­rithm used to do this.
Note that up­dates are al­ways based on cor­rect reads thanks to the ver­sion­ing check be­fore they are ap­plied. So up­date trans­ac­tions are al­ways con­sis­tent.

How does this re­late to BASE?

You could see this as a de­sign pat­tern for BASE if you like. The so­lu­tion ad­heres to BASE in the sense that it uses cached reads (if need­ed) and that the up­dates are de­layed (so you could say they are "even­tu­al­ly" ap­plied and the sys­tem be­comes "con­sis­tent").

Re­flec­tions in scal­a­bil­i­ty

So far the CAP fo­cus was on pos­si­bil­i­ty. I think my so­lu­tion shows that it is pos­si­ble. Now how about scal­ing up?

The naive so­lu­tion (a huge dis­trib­uted trans­ac­tion to up­date all clus­ter nodes in-sync) is un­like­ly to scale: as you add more nodes, more up­dates are need­ed. Now I am a big fan of trans­ac­tions, but not to use them in an ar­bi­trary mat­ter. So how to prop­a­gate these up­dates through the clus­ter?

While smarter so­lu­tions for this ex­ist (such as the work by Bet­ti­na Kemme), a triv­ial first try would be to push up­dates (lazi­ly) to all nodes in the clus­ter. This can be done with a smart queu­ing mech­a­nism. The dis­ad­van­tage is that up­dates are not ap­plied every­where at once (rather, the all-or-noth­ing qual­i­ty just "rip­ples" through the sys­tem). So you get into the "even­tu­al­ly" style again.

Note that this lat­ter sug­ges­tion makes the sys­tem be­have much like the READ COMMITTED iso­la­tion lev­el (which, by the way, is the de­fault in Or­a­cle). So this ap­proach sac­ri­fices con­sis­ten­cy/iso­la­tion a bit in fa­vor of scal­a­bil­i­ty.

Fu­ture work

Ad­di­tion­al re­search could/should be done in the fol­low­ing ar­eas:
  • Im­prov­ing read con­sis­ten­cy through ses­sion affin­i­ty
  • The best way to push the up­dates through the clus­ter
  • Per­for­mance eval­u­a­tion in real life im­ple­men­ta­tions
Fi­nal note and dis­claimer

I did not see Brew­er's orig­i­nal pre­sen­ta­tion of the CAP the­o­rem - so it could be that what he meant with con­sis­ten­cy also in­volved all reads (see the lim­i­ta­tions of the so­lu­tion I pre­sent­ed here). In that case I did not find a so­lu­tion for CAP but at least it is a frame­work and proof out­line for BASE winking face

UPDATE 15/3/2012:

It seems like Greg Young and Udi Da­han have been work­ing along sim­i­lar lines and gave this pat­tern/so­lu­tion a name: CQRS.

Forté/UDS is an end-of-life tech­nol­o­gy that used to be in Sun's prod­uct port­fo­lio. When talk­ing to peo­ple who have been do­ing a lot with Forté in the past, it seems that Forté can be con­sid­ered an an­ces­tor of Java:

  • It has an ob­ject-ori­ent­ed (4GL) de­vel­op­ment lan­guage.
  • Like Java's JMX, Forte also has in­stru­men­ta­tion (the agent is even called icon­sole - like jcon­sole for Java's built-in JMX agent these days!).
  • It has dis­trib­uted trans­ac­tions.
  • It has a strong no­tion of events as first-class cit­i­zens in the lan­guage.

The only thing that Forté does not have is En­ter­prise JavaBeans (EJB), nor XML con­fig­u­ra­tion is­sues for the ap­pli­ca­tion serv­er. This means that Forté de­vel­op­ers who mi­grate to Java (be­cause they are left lit­tle choice) get con­front­ed with com­plex­i­ties that they did not have to both­er with in their 4GL en­vi­ron­ment.

Thanks to Atomikos and the J2EE with­out ap­pli­ca­tion serv­er method­ol­o­gy, teams who used to work in Forté can eas­i­ly do Java/J2EE with­out hav­ing to both­er about the clut­ter of EJB nor about the ap­pli­ca­tion serv­er's XML hell. What's more, in com­bi­na­tion with Spring, Hiber­nate and JMS there is an equiv­a­lent, light-weight Java stack that (thanks to Atomikos) can still do all the con­nec­tion pool­ing, event-dri­ven and trans­ac­tion­al pro­cess­ing that is need­ed.

What makes it even bet­ter is that this method­ol­o­gy seems to achieve equal pro­duc­tiv­i­ty as with the 4GL en­vi­ron­ment in Forté, which is pret­ty good giv­en that Java is a 3GL and is not wide­ly known as a pro­duc­tiv­i­ty mir­a­cle.

In my last post I dis­cussed the the­o­ret­i­cal proof of the CAP the­o­rem. Both the the­o­rem and the proof have a lim­i­ta­tion that might very well ren­der them not-so-uni­ver­sal as as­sumed.

The lim­i­ta­tion of the CAP proof

The lim­i­ta­tion of the CAP proof (as for­mu­lat­ed by Lynch et al) is the fol­low­ing: it as­sumes that - for the pur­pose of avail­abil­i­ty - re­quests are to be served even when there is a par­ti­tion in the clus­ter.

A way around the lim­i­ta­tion

There is a way around this lim­i­ta­tion - al­though it may sound ex­ot­ic: just make sure that there are no par­ti­tions when re­quests are served.

How? By sim­ply do­ing the fol­low­ing:

  • Queue re­quests (e.g., in JMS).
  • Only process re­quests when there is no par­ti­tion prob­lem.
  • Send re­spons­es asyn­chro­nous­ly, for in­stance via email.

Since no par­ti­tion (hope­ful­ly) lasts for­ev­er, this so­lu­tion does not lead to live­lock.

Also, note that quo­rum so­lu­tions ex­ist to avoid that the com­plete clus­ter has to be up at the same time.

Is this the ca­pit­u­la­tion of CAP? Who knows...

My take on CAP

03 September 2008 | Guy Pardon | Vision
The CAP the­o­rem (Con­sis­ten­cy, Avail­abil­i­ty, Par­ti­tion­ing) has been re­ceiv­ing quite a lot of in­ter­est late­ly, just to men­tion one of the many ref­er­ences.

What is CAP about?

First let me give cred­its here: I am de­riv­ing my in­spi­ra­tion from the the­o­ret­i­cal in­sights found in this pa­per co-au­thored by one of my fa­vorite woman sci­en­tists, Nan­cy Lynch from MIT. If you get a chance to read this pa­per, go ahead it will bring you some very use­ful fun­da­men­tal un­der­stand­ing...

The CAP the­o­rem is es­sen­tial­ly a lim­i­ta­tion on what you can do with clus­tered (web) ser­vices in the fash­ion­able con­text of SOA.

The word 'clus­ter' is im­por­tant here since that is what it is all about. In par­tic­u­lar, the the­o­rem states that you can't have all three prop­er­ties (Con­sis­ten­cy, Avail­abil­i­ty, Par­ti­tion­ing) in one and the same sys­tem (read: ser­vice). This im­plies that there is no per­fect so­lu­tion to build­ing a high-through­put pop­u­lar ser­vice, or is there? Let's first ex­plore what each thing means...

Con­sis­ten­cy

By con­sis­ten­cy, the the­o­rem refers to the prop­er­ty that changes (up­dates) to the ser­vice back-end are vis­i­ble to lat­er queries. Sim­pli­fy­ing: if you add some­thing to your shop­ping bas­ket then it will ap­pear there next time you re­trieve your bas­ket sta­tus. That sounds triv­ial, but it is not if the bas­ket is spread over mul­ti­ple phys­i­cal serv­er process­es... Con­sis­ten­cy is com­mon­ly en­sured (be­tween process­es) by hav­ing some sort of dis­trib­uted trans­ac­tion co­or­di­na­tor, or (as­sum­ing a cen­tral back-end) a sin­gle cen­tral­ized data­base.

Avail­abil­i­ty

The Lynch pa­per uses a very sim­ple but suf­fi­cient de­f­i­n­i­tion of "avail­abil­i­ty": a sys­tem is avail­able if every re­quest to it re­turns. In oth­er words: there is no in­fi­nite block­ing.

Par­ti­tion­ing

Par­ti­tion­ing means the cut-off be­tween two seg­ments of the clus­ter. In oth­er words, one or more nodes be­come un­reach­able for at least some time.

What is the The­o­rem say­ing?

You can't have all three of the above qual­i­ties, pe­ri­od. How­ev­er, you can com­bine any two of them if you like. This is proven in the pa­per by Lynch et al. Also (and this is im­por­tant) you can ap­ply dif­fer­ent com­bi­na­tions of qual­i­ties to parts of your sys­tem. Mean­ing: you can stress con­sis­ten­cy in one part, avail­abil­i­ty in an­oth­er part, and so on. For in­stance, or­der pro­cess­ing or pay­ment pro­cess­ing can be done con­sis­tent­ly and avail­able (sac­ri­fic­ing par­ti­tion tol­er­ance) where­as query­ing the prod­uct cat­a­log can be done dif­fer­ent­ly (stress­ing par­ti­tion tol­er­ance in fa­vor of con­sis­ten­cy).

Does this con­tra­dict or in­val­i­date Atomikos?

Not at all, quite the con­trary: it makes Atomikos (and its third gen­er­a­tion of TP mon­i­tors) all the more rel­e­vant. Why? Be­cause Atomikos prod­ucts can help you in mak­ing those parts con­sis­tent when you want them to be.

Vir­tu­al­ly achiev­ing all three qual­i­ties

If you em­brace asyn­chro­nous mes­sag­ing (a la JMS or email) and ex­treme trans­ac­tion pro­cess­ing (XTP) then it is pos­si­ble to as­ymp­tot­i­cal­ly re­al­ize all three qual­i­ties (con­sis­ten­cy, avail­abil­i­ty, par­ti­tion-tol­er­ance) pro­vid­ed that you do use a call­back mech­a­nism to com­mu­ni­cate re­sults (e.g., by send­ing a con­fir­ma­tion email). Here is how:

  • Queue re­quests in JMS.
  • Pro­cess each re­quest trans­ac­tion­al­ly (so fail­ures will leave the re­quest queued for re­tries).
  • The process that di­gests each re­quest can be ar­bi­trar­i­ly com­plex and use trans­ac­tions (con­sis­ten­cy) and re­turn when­ev­er it likes (thanks to the queu­ing, no re­ply is ex­pect­ed with­in a pre­set time frame).
  • Any lack of avail­abil­i­ty of the pro­cess­ing is re­cov­ered by the queues: failed re­quests will stay queued un­til the process in the back-end is in fact avail­able again.

Now did I just break the CAP im­pos­si­bil­i­ty? More on this in a next post...

Sup­pose you want to de­vel­op a high-vol­ume trans­ac­tion pro­cess­ing sys­tem in Java/J2EE. How would you do it? Most peo­ple would say: don't use JTA/XA trans­ac­tions be­cause they kill per­for­mance. Wrong. And they would also say: use an appserv­er to scale. Again, they couldn't be more wrong.

Here is the mag­ic recipe on how we build sys­tems with vir­tu­al­ly un­lim­it­ed scal­a­bil­i­ty at Atomikos:

  • Kick out your appserv­er as soon as you can, as ex­plained here. J2EE is not lim­it­ed to an appserv­er. J2EE is a set of APIs. The appserv­er ties these APIs to a pro­gram­ming mod­el that al­most no­body needs. Con­clu­sion: drop the lat­ter.
  • Use a per­sis­tent JMS queue to store trans­ac­tion re­quests. This al­lows easy load-bal­anc­ing and pro­vides crash re­silience for on­go­ing re­quests. It also de-cou­ples the clients from the trans­ac­tion pro­cess­ing sys­tem.
  • Use Ex­tremeTrans­ac­tions to process the re­quests (stored in JMS). This al­lows for re­li­able, ex­act­ly-once mes­sage pro­cess­ing as out­lined here. Make sure to use the sup­plied JMS and JDBC dri­vers!
  • To add more pow­er, just add a sec­ond VM (process) on a sep­a­rate CPU.
  • Re­peat un­til per­for­mance is high enough.

You will reach the re­quired per­for­mance be­cause of the in­tra-VM na­ture of each process you add. The only po­ten­tial bot­tle­necks are your own data­base or JMS back­end. So scal­ing comes down to scal­ing your back­ends, which is much sim­pler than scal­ing your ap­pli­ca­tion it­self (which has al­ready been done in a nat­ur­al way as out­lined above).

So don't let any­body fool you: trans­ac­tions do scale - even with­out lim­its!.

Ex­tra: Elas­tic scal­ing

Our com­mer­cial prod­uct Ex­tremeTrans­ac­tions in­cludes elas­tic scal­ing fea­tures for cloud de­ploy­ments. You can ap­ply for a free tri­al be­low...

Try Ex­tremeTrans­ac­tions For Free

Corporate Information

Atomikos Corporate Headquarters
Hoveniersstraat, 39/1, 2800
Mechelen, Belgium

Contact Us

Copyright 2026 Atomikos BVBA | Our Privacy Policy
This page was cached on 27 May 2026 - 07:16.
By using this site you agree to our cookies. More info. That's Fine