Sunday, April 6, 2014

A plead to use tuple versioning in a database properly

I've seen tuple versioning been used in many databases and in almost all of them the end time of version is implemented in an error-prone way.

There are two ways to implement the end time:

Way 1. End time of version is the same time than start time of next version. 

From wikipedia:
For example, if a person's job changes from Engineer to Manager, there would be two tuples in an Employee table, one with the value Engineer for job and the other with the value Manager for job. The end time for the Engineer tuple would be equal to the start time for the Manager tuple.
When you select specific version you use query like this:
sysdate >= start_time and sysdate < end_time

Way 2. End time of the version is the last time the version was valid. 

If resolution of time is one day, then end time of version for instance would be 30.9.2013 if the start time of the next version  is 1.10.2013.

The query then looks almost the same than the other one: 
sysdate >= start_time and sysdate <= end_time

Do not use Way 2.  

Problem with Way 2 is you have to know what the resolution of time is. If the start time of the next version is 1.10.2013, then the end time of version could be 30.9.2013 or 30.9 23:59:00 or 30.9 23:59:59 or 30.9 23:59:59.999 depending of the resolution of the time used. Remember you normally hard code this to your queries or program code everywhere where you have to make a new version.

If you want to change the resolution of time later, you have to change your program AND your data. If you change your resolution of time from day to seconds, your have to update every row in your database and add that 23:59:59.

If you have connections to other databases that use different resolutions of time, you have to change end times on-the-fly. If you use Way 1 you don't have to do anything.

When you are searching for the version before another row you have to make same calculation depending of the resolution of time in your system. If you use Way 1 you can simple use query earlier_version.end_time = next_version.start_time.

This is completely unnecessary. I can't find any decent arguments to use Way 2 instead of Way 1. It's not a big problem but like I said it's an unnecessary one. If you simply use Way 1 there will not be any of the problems I previously mentioned. The queries to find specific version are nearly identical.

If you are using Oracle 11G there's also now an effective way to index null values of the end times:

No comments:

Post a Comment