Thursday, June 07, 2007 11:53 AM/EST

Synchronization: The future of the web?

While taking a break from putting together my first Silverlight article (I decided to make it a series of articles, as it was getting too long for a blog!), I went to see what my colleague Joe Wilcox at Microsoft-Watch was saying today. One of his latest blog entries is about synchronization. Synchronization, in this case, refers to synchronizing your local data with data stored somewhere out on the Web. This is important to apps such as the Google word processor and spreadsheet applications, where you can work on your documents from any computer, since they're stored on Google's computers. If you're on an airplane, for example, it would be nice to be able to work on a local copy of your documents stored on your laptop's own hard drive. Then, when you reconnect to the Internet, the changes to the local data would be synchronized with the documents stored on Google's computers, so that both copies end up identical.

In Joe's blog, he says:

The natural place for synchronization services is as part of the operating system.

Unfortunately, I think Joe's blog is missing something here, and I'm speaking as a developer. Synchronizing data is neither a trivial nor a generic task: how it works and what it does depend on the particular application, it often requires substantial user input, and it is heavily tied to file formats and to how the data is organized inside the file.

We programmers run into this all the time with our source code version control. When two people need to work on the same source code file, synchronizing the final code is anything but trivial. Even with code control applications such as CVS or SourceSafe, the synchronization usually requires a careful human eye going through a visual merge program, where the two copies are shown side by side with matching lines lined up and differing lines highlighted. The programmer must carefully decide which lines to keep and which to throw away in the final "merged" version. And sometimes that's not easy; what if two programmers make different changes to the same ten lines inside a function? Or what if in one version a programmer removes a member variable because it's no longer needed, and another programmer writes new code that requires that member variable? The list of similar problems could go on and on, but any way you cut it, it requires a programmer to analyze and think through the changes to figure out which ones to keep.
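To make the "same ten lines" problem concrete, here is a minimal sketch in Python of how a tool might flag textual conflicts between two edited copies of a common original. This is only an illustration, not how CVS or SourceSafe actually work; the function names `changed_ranges` and `find_conflicts` are my own inventions, and I'm assuming each file is available as a list of lines.

```python
import difflib

def changed_ranges(base, other):
    """Return (start, end) index ranges of base lines that `other` changed."""
    matcher = difflib.SequenceMatcher(a=base, b=other, autojunk=False)
    return [(i1, i2)
            for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
            if tag != "equal"]

def find_conflicts(base, mine, theirs):
    """List pairs of edits that touch or overlap the same base lines."""
    found = []
    for m_start, m_end in changed_ranges(base, mine):
        for t_start, t_end in changed_ranges(base, theirs):
            # Conservative rule: edits that merely sit next to each other
            # are also reported, since a human should look at them.
            if m_start <= t_end and t_start <= m_end:
                found.append(((m_start, m_end), (t_start, t_end)))
    return found
```

Notice what this cannot do. It only reports that two edits collide; it has no idea which one to keep. And it says nothing at all about the member-variable case, where the two changes sit in different parts of the file and never overlap textually, yet the merged code no longer compiles.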

Even if you reduce the problem to only one person, with a local copy and a remote copy, the problem is still non-trivial, as it still depends on the particular application and file formats, as well as the user's requirements and desires. And other problems can creep in: What if you have a local copy on your computer at home, and you get to work and realize you need to work on the document? Will there be a "lock" on the document (something source code control systems usually allow)? Then what do you do?
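Here is a rough sketch, again in Python, of the simplest decision a one-person sync tool has to make. It assumes the tool remembers a hash of the document as it stood at the last successful sync; the names `digest` and `sync_action` are hypothetical, and a real tool would need far more than this.

```python
import hashlib

def digest(data: bytes) -> str:
    """Fingerprint a document's contents."""
    return hashlib.sha256(data).hexdigest()

def sync_action(last_synced: str, local: str, remote: str) -> str:
    """Decide what to do, given three content digests."""
    if local == remote:
        return "in_sync"   # nothing to do (or both sides made the same edit)
    if local == last_synced:
        return "pull"      # only the remote copy changed
    if remote == last_synced:
        return "push"      # only the local copy changed
    return "conflict"      # both changed, differently: someone must decide
```

The first three cases are the easy ones. Everything I'm worried about lives in that last `"conflict"` branch, and what belongs there depends entirely on the application and its file format.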

Source code, of course, usually exists as plain text files, which are easy to work with. What about more complex file formats and more complex applications? Suppose I'm writing an app that stores binary data, and it tracks who makes what changes to the data, and further, where the changes were made (e.g. while working on the web, or while working offline at home, or while working offline at work on a different computer). Next, I want to give the user the ability to review all changes before finalizing them. However, even after any changes are made final or rejected, I want the file to maintain a log of all the changes that ever took place, including names of who made the changes. (Thus this problem involves multiple "virtual" people -- user at home, user at work, user online.)

Certainly I could easily define a file format that could do all this. And if I'm willing to ditch the binary aspect, I could come up with some pretty good XML that could store all such information. But either way, the file format would be built around my own requirements. Somebody else's requirements for a different app would probably result in a very different file format. And after building the file format, it would be up to me to carefully write the code that manages all the synchronization requirements, as no generic library could possibly already know about my requirements or the requirements for another app somebody else is building. (And don't forget, I would probably also need such synchronization to handle high-level operations, such as creating documents offline and then having them added to the server when I connect.)
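Just to show how quickly my own requirements shape the format, here is one possible sketch of that change log in Python, written out as XML. Every name in it (`Change`, `Document`, the `changelog` and `change` elements, the location strings) is something I made up for this example; another developer with other requirements would end up with something quite different.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

@dataclass
class Change:
    author: str
    location: str             # e.g. "web", "offline-home", "offline-work"
    description: str
    status: str = "pending"   # "pending", "accepted", or "rejected"

@dataclass
class Document:
    log: list = field(default_factory=list)

    def record(self, author, location, description):
        """Append a new change; it stays pending until reviewed."""
        change = Change(author, location, description)
        self.log.append(change)
        return change

    def review(self, change, accept):
        """Finalize or reject a change. The log entry itself is never removed."""
        change.status = "accepted" if accept else "rejected"

    def pending(self):
        """Changes the user has not reviewed yet."""
        return [c for c in self.log if c.status == "pending"]

    def to_xml(self):
        """Serialize the full history, including rejected changes."""
        root = ET.Element("changelog")
        for c in self.log:
            node = ET.SubElement(root, "change", author=c.author,
                                 location=c.location, status=c.status)
            node.text = c.description
        return ET.tostring(root, encoding="unicode")
```

Even this toy version bakes in decisions (who counts as an author, what a "location" is, that rejected changes are kept forever) that an OS-level service would have no way of knowing.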

I hate to make predictions (people 100 years from now will laugh at me, although I suppose it won't matter since I'll be dead). But I would almost dare predict that synchronization will never be trivial and generic, at least not in the next 20 years. It's just too application-dependent.


Comments (2)

rbloom00 :

Jeff Cogswell's point that such synchronization will never be trivial -- that is certainly true.

But development in that direction does not have to be a wasted effort.

For example, note that following his examples, already certain aspects of synchronization have been solved regarding availability-control functionality (e.g. ability for access by one modifier to lock out other modifiers until initial modifier's changes have been completed).

A pre-synch tool might also add temporary or even permanent branches in the modification tree -- so that would-be concurrent modifiers can at least record their intended changes, either to be resolved later or as base for alternate or additional product.

Interesting and ongoing tasks will include evaluating and integrating developed tools (and writing or publishing requests for other desirable tools to integrate), producing valuable new product tool sets.

The integrated tool sets would benefit highly from user- and administrative- oriented languages that (like SQL for relational databases) survive the multilinear growth in the tool sets that must certainly occur.

If not impeded too much by proprietary restrictions by developers and their organizations, at some point the synchronization tool sets are likely to, like those for current major databases, do major parts of the process, making the synchronization almost automatic.

rbloom00, you make some excellent points.

I should probably clarify that although I don't think data-level synchronization can really work at the OS level (unless it includes an extremely specific yet extensive file format that apps would have to use, and even then I'm not sure it's realistic), I do agree that there's a need for a library to provide a starting point. Why, after all, re-invent the wheel?

As I type that, I feel like I'm somewhat contradicting myself, but in fact I think a generic starting point such as Google Gears belongs in a reusable framework, not in the OS. (The same is true for a lot of starting points like this.)

Having fought with the synchronization issue myself on many projects prior to joining DevSource, I welcome any libraries that will help out in the issue.
