The Problem
There is a lot of enthusiasm about Kafka as a high-speed broker for your messages. However, Kafka does not support transactions that span Kafka and an external datastore. Despite what the documentation may suggest, this implies the following:
Messages sent to Kafka can be lost if there is a crash or a bug
Suppose you have your data stored in some regular DBMS (not Kafka). When you update the data, you want to publish an event to Kafka. You can do this as follows:
1. First update the DBMS (i.e., commit the changes)
2. Then publish the event to Kafka
However, if a crash happens between steps 1 and 2, you will have updated the DBMS, but the message will never reach Kafka. You may not want this.
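This failure window can be simulated with simple in-memory stand-ins for the DBMS and the broker. The `Database` and `Broker` classes below are purely illustrative, not real Kafka client APIs:

```python
# In-memory stand-ins for a DBMS and for Kafka (illustrative only).
class Database:
    def __init__(self):
        self.rows = {}

    def commit(self, key, value):
        self.rows[key] = value

class Broker:
    def __init__(self):
        self.messages = []

    def publish(self, msg):
        self.messages.append(msg)

def update_then_publish(db, broker, crash_in_between=False):
    db.commit("order-1", "shipped")            # step 1: DBMS commit succeeds
    if crash_in_between:
        raise RuntimeError("process crashed")  # crash between steps 1 and 2
    broker.publish({"order-1": "shipped"})     # step 2: never reached

db, broker = Database(), Broker()
try:
    update_then_publish(db, broker, crash_in_between=True)
except RuntimeError:
    pass

# The DBMS holds the new state, but the event was never published:
assert db.rows == {"order-1": "shipped"}
assert broker.messages == []
```

No retry logic helps here, because the crashed process has no record that a publish was ever pending.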
Alternatively, let's try the other way around:
1. Publish a message to Kafka
2. Then update (commit) the DBMS changes
This does not help: if a crash happens between steps 1 and 2, you will have published a message, but the associated data will never have been committed. Consumers now see an event describing a change that does not exist in the DBMS.
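The reversed ordering can be sketched the same way, again with illustrative in-memory stand-ins rather than real Kafka APIs:

```python
# In-memory stand-ins, with the publish happening before the DBMS commit.
class Database:
    def __init__(self):
        self.rows = {}

class Broker:
    def __init__(self):
        self.messages = []

def publish_then_update(db, broker, crash_in_between=False):
    broker.messages.append({"order-1": "shipped"})  # step 1: publish succeeds
    if crash_in_between:
        raise RuntimeError("process crashed")       # crash between steps 1 and 2
    db.rows["order-1"] = "shipped"                  # step 2: commit never happens

db, broker = Database(), Broker()
try:
    publish_then_update(db, broker, crash_in_between=True)
except RuntimeError:
    pass

# The event is out in the world, but the DBMS never recorded the change:
assert broker.messages == [{"order-1": "shipped"}]
assert "order-1" not in db.rows
```

Either ordering leaves a window in which one side of the dual write succeeds and the other is lost.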
Messages received from Kafka can be lost if there is a crash or a bug
On the other end, there is likely some consumer of the messages you publish. Assuming that the consumers use their own DBMS storage (not Kafka), how do you consume messages reliably? Let's try:
1. Take a message from Kafka
2. Update the DBMS
3. Save the offset of the Kafka messages read so far, so you don't get the same message again
If your developers don't think hard about failure scenarios (it's hard to think of everything when you are under pressure to deliver working code), they might get the order wrong, and the following can happen instead:
1. Take a message from Kafka
2. Save the offset of the Kafka messages read so far, so you don't get the same message again
3. Update the DBMS
In this case, a crash between steps 2 and 3 will again lose messages: the saved offset says the message was processed, so Kafka will not redeliver it, yet the DBMS update never happened.
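The consumer-side failure can be sketched the same way. The `OffsetStore` and `Database` classes below are invented stand-ins, not real Kafka consumer APIs:

```python
# Illustrative consumer-side stand-ins (not real Kafka consumer APIs).
kafka_messages = ["event-0", "event-1"]

class OffsetStore:
    def __init__(self):
        self.offset = 0

class Database:
    def __init__(self):
        self.applied = []

def consume_wrong_order(offsets, db, crash_after_offset=False):
    msg = kafka_messages[offsets.offset]       # step 1: take a message
    offsets.offset += 1                        # step 2: save the offset first
    if crash_after_offset:
        raise RuntimeError("process crashed")  # crash between steps 2 and 3
    db.applied.append(msg)                     # step 3: never reached

offsets, db = OffsetStore(), Database()
try:
    consume_wrong_order(offsets, db, crash_after_offset=True)
except RuntimeError:
    pass

# After a restart, consumption resumes from the saved offset, so "event-0"
# is never redelivered even though the DBMS never processed it:
assert offsets.offset == 1
assert db.applied == []
```

Note that even the correct ordering (DBMS first, offset second) is not free: a crash after the DBMS update but before the offset save causes the same message to be redelivered, so the consumer must tolerate duplicates.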
The Solution
So how do you prevent data inconsistency with Kafka? Is it at all possible?
Yes, it is possible! You need to store everything inside Kafka. Kafka is built on a log-structured architecture, so it can "log" everything, and its logs can be used to restore the latest state of your domain objects via "event replay". In that sense, the Kafka team's article on exactly-once in Kafka is correct. Be aware, however, that it is insufficient if you also use other storage technologies, as outlined above.
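The "event replay" idea mentioned above can be sketched in a few lines: the latest state of each domain object is reconstructed by folding over an append-only event log. The event shapes here are invented for illustration:

```python
# A minimal sketch of event replay: rebuild current state from a log.
event_log = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "shipped"),
]

def replay(events):
    state = {}
    for key, status in events:
        state[key] = status   # the last event per key determines the state
    return state

assert replay(event_log) == {"order-1": "shipped", "order-2": "created"}
```

Because the log itself is the source of truth, there is no second datastore to fall out of sync with, which is exactly why keeping everything in Kafka sidesteps the dual-write problem.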