Disaster recovery and XA part 2: a cheap and easy hot backup / disaster recovery architecture with XA

An easy and cheap way to achieve hot backup and disaster recovery with XA...

This article is the second part of this series. Whereas the first part introduced the problem, this second part presents a basic, cheap and easy solution that maintains a hot backup for zero data loss in case of disaster.

The basic idea: synchronous DBMS replication with XA
Dealing with disaster
Wrapping up
What's next?
Can't wait?
Your take

The basic idea: synchronous DBMS replication with XA

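All solutions presented in this article are built upon the same basic notion of synchronous DBMS replication with XA: the application updates a primary and a secondary DBMS within the same XA transaction, so that either both commit or both roll back.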

Assuming that the primary and secondary DBMS are in different data centers, this ensures that we always have a copy of the data. It gives us a hot backup at all times, without the need for expensive DBMS replication technology. It's also ideal for cloud environments since you don't need heavy enterprise DBMS tools to set up the replication. Instead, our cloud-native transaction technology is all you need.
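To make the idea concrete, here is a minimal sketch of one update applied to both DBMS with a two-phase commit. It is an illustration only: ToyDbms and DualWrite are hypothetical in-memory stand-ins invented for this example, not a real JTA/XA API and not our product code. A real application would delegate these steps to a transaction manager.

```java
/** Hypothetical in-memory stand-in for an XA-capable DBMS (not a real driver API). */
class ToyDbms {
    final java.util.Map<String, String> committed = new java.util.HashMap<>();
    private final java.util.Map<String, String> pending = new java.util.HashMap<>();
    boolean failOnPrepare = false; // hook to simulate a failure during prepare

    /** Buffers a write inside the current (single) transaction. */
    void write(String key, String value) { pending.put(key, value); }

    /** Phase 1: vote on whether the pending work can be committed. */
    boolean prepare() { return !failOnPrepare; }

    /** Phase 2, success: make the pending work visible. */
    void commit() { committed.putAll(pending); pending.clear(); }

    /** Phase 2, failure: discard the pending work. */
    void rollback() { pending.clear(); }
}

/** Hypothetical coordinator: applies one update to both DBMS atomically. */
final class DualWrite {
    static boolean update(ToyDbms primary, ToyDbms secondary, String key, String value) {
        ToyDbms[] participants = {primary, secondary};
        for (ToyDbms db : participants) db.write(key, value);

        // Phase 1: every participant must vote yes.
        boolean allPrepared = true;
        for (ToyDbms db : participants) {
            if (!db.prepare()) { allPrepared = false; break; }
        }

        // Phase 2: commit everywhere or roll back everywhere.
        for (ToyDbms db : participants) {
            if (allPrepared) db.commit(); else db.rollback();
        }
        // A positive result is only returned after both DBMS have committed.
        return allPrepared;
    }
}
```

Calling DualWrite.update(primary, secondary, "k", "v1") leaves the same value in both stores; if either store refuses to prepare, neither store changes. The sketch deliberately leaves out the transaction log and crash recovery, which are what make the real protocol safe.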

The application could be a web application, a micro-service or any other server-based application.

Note that cloud platforms already include backup / failover solutions - but these are typically not hot backups, plus they depend on vendor-specific mechanisms (making your cloud applications less portable across platforms). Our customers (mostly in financial services) often prefer XA because they already use it and have a lot of experience with it.

Dealing with disaster

Let's briefly go over how you can deal with disaster scenarios...

Sudden and permanent loss of incoming requests

We assume that requests can get lost; it is up to the client (not shown) to retry failed requests. This means that we assume that the client can consult the primary and / or secondary DBMS to determine whether a retry is needed. Clients can assume that requests are dealt with atomically, i.e. both primary and secondary are updated, or both have rolled back (how this works should become clear below).
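One way a client could decide whether to retry is sketched below. It rests on an assumption that is not part of the architecture described here: each request carries a unique id, and the update records that id in the DBMS as part of the same transaction. RetryingClient, Server and committedIds are hypothetical names for this illustration; committedIds stands in for a query against the primary or secondary DBMS.

```java
import java.util.Set;

/** Hypothetical client-side retry loop (names invented for this sketch). */
final class RetryingClient {
    /** Hypothetical server call; throws if the request or its response is lost. */
    interface Server {
        void submit(String requestId) throws Exception;
    }

    /**
     * Sends the request until it is known to be applied.
     * Returns the number of attempts that were needed.
     */
    static int sendWithRetry(Server server, Set<String> committedIds,
                             String requestId, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                server.submit(requestId);
                return attempt; // positive response: both DBMS have committed
            } catch (Exception lost) {
                // Outcome unknown: consult the DBMS before sending again.
                if (committedIds.contains(requestId)) {
                    return attempt; // the update was applied, only the response was lost
                }
            }
        }
        throw new IllegalStateException(
                "request " + requestId + " not applied after " + maxAttempts + " attempts");
    }
}
```

Because updates are atomic across both DBMS, finding the id in either one is enough to know the request was applied everywhere, so the client does not apply the same update twice.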

Sudden and permanent loss of the primary DBMS

The primary DBMS can be restored from the secondary, since they are always kept in sync.

Before reconstructing it, the following steps are needed:

The former primary's pending transactions have to be purged from the transaction logs since they no longer have any use (the reconstructed primary will not remember any pending transactions) - this will soon be available as part of The LogCloud technology.
The system is temporarily put into read-only mode (meaning client requests for updates will temporarily fail, unless we apply the technique explained in the next part).
Distributed transaction recovery for the secondary is allowed to terminate before the primary is reconstructed, so all pending transactions are terminated and a clean, stable database view is available without any pending locks. Again, this will soon be part of The LogCloud technology.

The last step is necessary because at the time of disaster, almost by definition there will be pending transactions in both the (lost) primary and the secondary DBMS. This step ensures that the secondary DBMS is in a quiet state (no pending updates) when a new primary is created from it. Otherwise, we would risk running into locks because of remaining in-doubt transactions - which will effectively be cleaned up by the transaction recovery.

This works correctly, because at the time of a disaster:

All positive responses previously returned to the client are still taken into account in the secondary DBMS state (because a positive return value is only sent after successful commit, meaning after both DBMS have committed) - so there is no data loss
All pending transactions are terminated correctly by the transaction recovery and the results copied to the new primary, and
These pending transactions become visible in both DBMS after the restore is done - i.e., "eventual consistency"
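The sequence for a lost primary can be sketched as follows. This is a toy model under stated assumptions, not how The LogCloud technology is implemented: PrimaryLossRecovery, PendingBranch and Secondary are hypothetical names, the transaction log is modeled as a simple list, and we assume the log records for each pending transaction whether commit had been decided.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of rebuilding a lost primary from the secondary. */
final class PrimaryLossRecovery {
    /** Hypothetical log record: one pending branch of a distributed transaction. */
    static final class PendingBranch {
        final String txId;
        final String resource;       // "primary" or "secondary"
        final boolean commitDecided; // assumption: the log records the decision
        PendingBranch(String txId, String resource, boolean commitDecided) {
            this.txId = txId;
            this.resource = resource;
            this.commitDecided = commitDecided;
        }
    }

    /** Toy secondary DBMS: committed data plus prepared (in-doubt) writes per transaction. */
    static final class Secondary {
        final Map<String, String> committed = new HashMap<>();
        final Map<String, Map<String, String>> inDoubt = new HashMap<>();
    }

    /** Set during recovery; the caller lifts it once the new primary is in service. */
    static boolean readOnly = false;

    static Map<String, String> rebuildPrimary(List<PendingBranch> log, Secondary secondary) {
        // Step 1: purge the lost primary's pending transactions from the log.
        log.removeIf(b -> b.resource.equals("primary"));
        // Step 2: stop accepting updates.
        readOnly = true;
        // Step 3: let recovery terminate every pending transaction on the secondary.
        for (PendingBranch b : log) {
            Map<String, String> writes = secondary.inDoubt.remove(b.txId);
            if (writes != null && b.commitDecided) {
                secondary.committed.putAll(writes); // commit was decided: apply
            }                                       // otherwise: roll back (discard)
        }
        log.clear();
        // The secondary is now quiet, so it can be copied into a new primary.
        return new HashMap<>(secondary.committed);
    }
}
```

The point of the ordering is that the copy happens only after no in-doubt work remains on the secondary, so the new primary starts from a stable view without pending locks.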

Sudden and permanent loss of the secondary DBMS

In a similar way, the secondary DBMS can be restored from the primary.

Sudden and permanent loss of the transaction logs

For now, there is not much that can be done if the transaction logs are lost - so a mirrored disk approach or replicated disks of some form are highly recommended. Disk replication is presumably cheaper than full DBMS replication technology, so we think this is acceptable. In the future, we may be able to eliminate the logs - but for now this is what has to be done. With our LogCloud technology, only the dedicated logging and recovery service needs these mirrored disks. This can be done in a private cloud datacenter, for instance.

Combined losses

Dealing with combined losses means we have to cope with two or more systems failing together. While this is certainly harder to deal with, the whole assumption behind a primary and secondary DBMS is that it is extremely unlikely that two systems will fail at the same time. So by definition, combined losses are beyond the scope of this architecture because they make the idea of primary / secondary hot backups pointless in the first place.

Of course, to prevent combined losses all of the resources (primary DBMS, secondary DBMS and transaction logs) should be hosted in different data centers.

Wrapping up

That's it: we've outlined a cheap and easy way to set up a hot-backup architecture with zero data loss in case disaster strikes one of the two DBMS systems. What required very expensive enterprise software in the past can now be done much more cheaply and easily thanks to our cloud-native transaction processing software!

Note that while XA seemed to be a drawback in part 1 (making the problem a bit more complex), it actually turned out to be an advantage for the solution.

What's next?

Stay tuned for the next part in this series, where we will show another cheap and easy trick to scale things up horizontally - and even avoid failed client update requests when the system is doing failover.

Can't wait?

Do you prefer to get started and try things on your own?

Download our FREE JTA/XA here

Your take

So what is your experience with disaster recovery? Feel free to share any comments below…
