Data Synchronization and the Compact Framework
posted on 04/27/09 at 12:13:58 am by Joel Ross
Over the past few weeks, I've had a chance to really dig into certain parts of the Compact Framework. We're taking on a major mobile development project, and one of the key parts will be how we move data from the website to the device and back.
So far, we've looked at a few different options, and I've come up with a few plusses and minuses for each. I figured I'd share that information, and see if anyone knew of other options and/or other benefits of the methods I'll mention.
One thing that I feel I should mention: In our current situation, we don't envision the need to have true data synchronization capabilities. We have a handful of reference tables that come down from the server. On the device, we create new entries (in another table), and send those back to the server. Any changes to the data on the client would be thrown away on the next sync with the server. That affects my view of the options, because collision detection and resolution might be stronger in one versus the other.
Sql Server Replication
I've typically heard replication used in terms of server to server communication, but it can also be used to move data from a server to a client, and synchronize changes back to the server. You can specify what tables (and columns) you want sent to the client, and you can limit the data sent to the client based on a few parameters sent by the client.
The most compelling argument for using replication is schema management. Any changes you make at the server level to the publication are automatically reflected in the client's database. Not having to worry about a database migration strategy for our devices sounds very appealing. Oh, and it doesn't hurt that it's extremely efficient at replicating the data. In an admittedly unscientific test, we saw replication from a server take less time than running through a series of inserts where the data was already loaded in the device's memory. That's impressive!
But, unless we're missing something, because of the immediate nature of schema change replication, we'd most likely have to miss out on the benefits it provides. We don't roll out new versions of our mobile application all at the same time, or at the same time we roll out web updates. Having the server publication change for everyone could be troublesome, so we'd likely be maintaining a publication per version of our software. Not horrible, but ideal.
There's a few other things I didn't like about this solution either: It's seems like the client needs to know a bit too much about the server - basically, enough info to create the server's connection string. I guess this is logical, because the server doesn't have to know about clients ahead of time, but with the possibility of devices getting lost or stolen, this seems risky. One more downside. Every table gets modified to add tracking information to the table for changes. Not a huge issue, but depending on what sync options you specify, the client database seems to grow beyond what I would have expected.
The setup is a bit touchy as well. Once you have it working, it's fine, and a lot of the problems I ran into were probably one time things or a lack on knowledge. Still, it seemed like it could be a whole lot cleaner.
Remote Data Access
Remote Data Access (RDA) has three different options for moving data between the server and the client: Push, Pull and SubmitSQL. Pull gets data from the server and puts it on the client. Push is a way to get data from a pulled table back to the server, which means we didn't look at it that closely. SubmitSQL is a way to create new records on the server with data from the client.
If we accept that we basically lose schema management with replication because of our requirements, then RDA seems a much simpler solution. We now have full use of SQL stored procedures, UDFs, etc., to help with how we select data, and we can pass back multiple identifiers from the client.
The downsides are roughly the same as replication - too much server side knowledge by the client. I understand why, but that doesn't mean I'm comfortable with it! Schema management is also a bit tougher - essentially, a table that's managed via Pull is not allowed to change. It's not horrible though - you can just drop the table with a new build, and have it re-added on first sync. Again, not ideal, but it works.
Unlike replication, setup for this is a breeze - no SQL server steps involved, and minimal IIS work (in fact, once replication was set up, there were no extra steps involved).
Oh yeah. One more downside for RDA: It's going to be removed from future versions.
Microsoft Sync Framework
But Microsoft never removes something without adding something similar, right? For the most part, yes, and in this case, there is an alternative as well: Synchronization Services. We've just started digging into this one, since I originally read it only supports Windows Mobile 5 & 6, where as we require CE support as well.
The upside is high for this one. We can use an alternate database on the device if we want - it supports any database that supports ADO.NET. That's nice, because we've contemplated looking into Sqlite as our store, since in our initial testing, it's quite a bit faster than Sql CE.
The downsides here are roughly the same as RDA, which isn't surprising given it's meant as the replacement for RDA - we have to manage schema changes ourselves. Oh, and the documentation and samples leave something to be desired.
Roll Our Own
What can I say? I've yet to see a wheel where I haven't thought, "I could re-invent that!" Ok, that's not true. I'd much rather find something that does what I want so I can focus on other problems. But there are times where you can't find a solution that fits your needs well enough to actually be called a solution. If the above solutions don't end up fitting our needs, we always have the option of hand crafting a solution that does.
The upsides to this are obvious. We'd get to choose exactly how we do everything. We can solve all of the downsides of the above solutions, and avoid all of the negatives. All of the above solution
The downsides are also obvious. We have to do it all. We have to maintain it. There's no new version coming that has a killer feature (unless we add it ourselves). It increases the timeline, because we're doing the work, and we're not benefiting from other's experiences.
After writing this, I found this chart to be helpful to review and verify what I said. It's a comparison between the different sync options (minus roll your own).
I don't think we've landed on a solution yet, so I'm seeking feedback. Are there other reasons to go with one of the above solutions (or not to!)? Am I missing a solution? What solutions have you used in the past? Would you go the same way if you had to do it again?
Tags: CF.NET | Sql Server | Replication | Synchronization
Categories: ASP.NET