com_sync - Starting the project
Like all good open source projects, this one was born out of frustration. I wanted to be able to write entries for my personal site while offline, but the only good software I could find for that, Blogg-X, refused to work for me. The installer for the latest version wouldn’t run on Linux, and the previous version failed to log in to a current Joomla site.
What to do? Write your own, of course. I thought for a while and realized that the approach taken by Blogg-X is wrong: there is no benefit in tying Joomla to Java, especially for content editing. What you really want is a local copy of your site, so that you can make modifications on either copy and then merge the two into a synchronized site.
So I looked for something that could synchronize Joomla sites, and of course there wasn’t anything. On the other hand, the problem is a widespread one, and it would make sense to have some form of software that helps at least with getting a new site started, or with merging two sites.
What do you need to do to synchronize sites? Well, there are two main approaches:
- Synchronize only selected sources, like content and section/category hierarchy
- Synchronize the entire database and file system, using an approach you could call “backup and merge”
I found the second approach much more appealing. If you solve the problem of synchronization, you can solve it for everything, not just for content. And if your global synchronizer is smart enough to omit a set of tables and files, then you get the benefits of both approaches.
To synchronize two sites we need a list of current files and a current snapshot of the database. We can then check the differences between the databases and file systems, and use those to determine the current merged state. Things get complicated by the need to merge linked items.
A current DB snapshot consists of a series of table definitions followed by a series of INSERT statements. The art of synchronizing consists in finding out which entries changed and how: was there a modification, an addition, or a deletion?
If we simply compare the two databases, we lose track of what actually happened. For example, if we add an entry on both the origin and destination DBs, the two new entries will typically receive the same key, so together they look like a single modified entry. So, to get to the root of it, we need to detect how each DB changed over time BEFORE we compare it with the other DB.
Since pretty much all tables have unique autoincrement keys, comparing a snapshot of the DB with the previous one will tell us how it changed from one synchronization to the next. A deletion shows up as a missing key, an addition as a new key, and a modification as an INSERT for an existing key that doesn’t match the INSERT in the previous snapshot.
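The key-based comparison described above can be sketched in a few lines of Python. This is only an illustration, not com_sync code: it assumes each table snapshot has already been parsed into a dictionary that maps the autoincrement key to the row's values, and the name `diff_snapshots` is made up for the example.

```python
from typing import Any, Dict, Tuple

Row = Tuple[Any, ...]
Snapshot = Dict[int, Row]  # assumed shape: autoincrement key -> row values


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, Snapshot]:
    """Classify the changes between two snapshots of the same table."""
    # A key that only exists in the current snapshot is an addition.
    added = {k: current[k] for k in current.keys() - previous.keys()}
    # A key that disappeared since the previous snapshot is a deletion.
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    # A key present in both snapshots with different values is a modification.
    modified = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"added": added, "deleted": deleted, "modified": modified}


# Hypothetical content table at the last sync and now.
previous = {1: ("Hello",), 2: ("Draft",), 3: ("Old",)}
current = {1: ("Hello",), 2: ("Final",), 4: ("New",)}
changes = diff_snapshots(previous, current)
```

Here `changes` reports key 4 as added, key 3 as deleted, and key 2 as modified. Running the same comparison on the other site gives the second list of changes.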
Once we have the list of changes for both DBs, we can start merging them. The last synchronization event made the DBs identical, so all the changes made since then have to be replayed on the other side. In particular, after the merge the keys are going to be the same on both ends.
To put this in perspective, here is a graph of the merge events:
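As a rough sketch of what replaying both change lists could look like, here is a small Python example. It is not the com_sync implementation. The conflict rules are assumptions chosen to keep the example short: a deletion wins over a modification, the remote side wins when both sides modified the same row, and a remote addition whose key collides with an existing row gets a fresh key. The function name `merge` and the data layout are hypothetical, and linked items that point at re-keyed rows are not handled.

```python
from typing import Any, Dict, Tuple

Row = Tuple[Any, ...]
Snapshot = Dict[int, Row]        # assumed shape: autoincrement key -> row values
Changes = Dict[str, Snapshot]    # "added", "deleted", "modified"


def merge(base: Snapshot, local: Changes, remote: Changes) -> Snapshot:
    """Replay both change lists on the state shared at the last sync."""
    merged = dict(base)
    # Deletions refer to keys both sides knew at the last sync.
    for changes in (local, remote):
        for key in changes["deleted"]:
            merged.pop(key, None)
    # Assumption: a deleted row stays deleted; remote wins a double edit.
    for changes in (local, remote):
        for key, row in changes["modified"].items():
            if key in merged:
                merged[key] = row
    # Local additions keep the keys they were given.
    for key, row in local["added"].items():
        merged[key] = row
    # Assumption: a remote addition that collides with an existing key
    # is moved to the next free key.
    next_key = max(merged, default=0) + 1
    for key, row in sorted(remote["added"].items()):
        if key in merged:
            key = next_key
        merged[key] = row
        next_key = max(next_key, key + 1)
    return merged


# Both sites added a row with key 3 since the last sync.
base = {1: ("Home",), 2: ("About",)}
local = {"added": {3: ("Offline post",)}, "deleted": {}, "modified": {2: ("About me",)}}
remote = {"added": {3: ("Online post",)}, "deleted": {1: ("Home",)}, "modified": {}}
merged = merge(base, local, remote)
```

In this example `merged` keeps the local post under key 3 and moves the remote post to key 4. Writing the merged table back to both sites is what makes the keys identical on both ends.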