March 24, 2009

NoSQL databases break all the old rules

Amazon SimpleDB, CouchDB, Google App Engine, and Persevere may have a better way of storing data for your Web app


So you've got some data to store. In the past, the answer was simple: Hook up an official database, pour the data into it, and let the machine sort everything out for you while you spend your time writing big checks to the database manufacturer. Now things aren't so cut and dried. A fresh round of exciting new tools is tacking the two letters "db" onto a pile of code that breaks with the traditional relational model. Old database administrators call them "toys" and hint at terrible dangers to come from the follies of these young whippersnappers. The whippersnappers just tune out the warnings because the new tools are good enough and fast enough for what they need.

The non-relational upstarts are grabbing attention because they're willfully ignoring many of the rules that codify the hard lessons learned by the old database masters. The problem is that these belts-and-suspenders strictures often make it hard to create really, really big databases that suck up all of the cycles of a room full of machines. Because all Web application designers dream of building a startup that needs a really big room filled with machines to hold all of the data of all of the users, the rules need to be bent or even broken.


The first thing to go is the venerable old JOIN. College students used to dutifully work through exercises that taught them how to normalize the data by breaking the tables up into as many parts as practical. Disk space was expensive then, and a good normalization expert could really pack in the data. The problem is that JOINs are really, really slow when the data is spread out over several machines. Now that disk space is so cheap and many of the data models don't benefit as much from normalization, JOINs are easy to leave behind.
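To make the trade-off concrete, here is a minimal sketch in Python that uses plain in-memory dictionaries rather than any of the reviewed products. The customer and order records, and the function names, are invented for illustration. The first layout keeps the data normalized and has to emulate a JOIN on every read; the second copies the customer name into each order so that a read is a single key lookup.

```python
# Normalized layout: two "tables" linked by a key, as a relational schema would hold them.
# All records here are hypothetical sample data.
customers = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
orders = [
    {"order_id": 100, "customer_id": 1, "item": "keyboard"},
    {"order_id": 101, "customer_id": 2, "item": "monitor"},
    {"order_id": 102, "customer_id": 1, "item": "mouse"},
]


def orders_with_names_joined():
    """Emulate a JOIN: every order row needs a second lookup in the customers table."""
    return [
        {
            "order_id": o["order_id"],
            "name": customers[o["customer_id"]]["name"],
            "item": o["item"],
        }
        for o in orders
    ]


# Denormalized layout: each order carries its own copy of the customer name.
# This repeats data (and uses more disk), but a read touches only one record.
denormalized_orders = {
    100: {"name": "Ada", "item": "keyboard"},
    101: {"name": "Grace", "item": "monitor"},
    102: {"name": "Ada", "item": "mouse"},
}


def order_denormalized(order_id):
    """Fetch one self-contained record by key; no second table is consulted."""
    return denormalized_orders[order_id]


joined = orders_with_names_joined()
single = order_denormalized(102)
```

On one machine the extra lookup is cheap. When the two tables live on different machines, each of those lookups can become a network round trip, which is the cost the denormalized layout avoids.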

The next trick is to start using phrases like "eventual consistency." Amazon's documentation for SimpleDB includes this inexact promise: "Consistency is usually reached within seconds, but a high system load or network partition might increase this time." The new twerps really get those codgers steamed when they talk about how all of the computers in the cluster will get around to replicating the data and giving consistent answers when the machines are good and ready. For the kids, consistency is akin to cod liver oil or making your bed in the morning.
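A toy model can show what that promise means in practice. The sketch below is not SimpleDB's API; the class `EventuallyConsistentStore` and its methods are hypothetical names for an in-memory simulation. A write lands on one replica immediately and reaches the others only when `replicate()` runs, so a read from another replica in the meantime returns stale data.

```python
class EventuallyConsistentStore:
    """Toy in-memory model of replicas that converge only after replication runs."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # queued (replica_index, key, value) updates

    def put(self, key, value):
        # The write is acknowledged as soon as the first replica has it.
        self.replicas[0][key] = value
        # The other replicas are updated later.
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica=0):
        # A read is served by whichever replica the caller happens to reach.
        return self.replicas[replica].get(key)

    def replicate(self):
        # Stand-in for the background process that brings replicas up to date.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("cart", ["book"])

stale = store.get("cart", replica=2)  # replica 2 has not seen the write yet
store.replicate()
fresh = store.get("cart", replica=2)  # now every replica agrees
```

In a real cluster the replication step is not something the application calls; it happens on the system's schedule, which is why the window of inconsistency can stretch under load or during a network partition.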

Test Center Scorecard

Product	25%	25%	20%	20%	10%	Overall score	Rating
Amazon SimpleDB	8	8	8	9	8	8.2	Very Good
Apache CouchDB	7	7	8	7	9	7.4	Good
Google App Engine	8	8	8	9	8	8.2	Very Good
Persevere Server	8	7	8	7	9	7.7	Good
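
The overall scores in the scorecard are consistent with a weighted average of the five criterion scores, using the 25/25/20/20/10 percent weights shown. The Python sketch below reproduces them under that assumption; the function name `overall_score` and the round-half-up rule are illustrative guesses, not a documented scoring method.

```python
# Criterion weights in percent, in the column order of the scorecard.
WEIGHTS = (25, 25, 20, 20, 10)


def overall_score(scores, weights=WEIGHTS):
    """Weighted average rounded half up to one decimal place.

    Integer arithmetic is used so that a value such as 7.65 rounds to 7.7
    without floating-point surprises.
    """
    total = sum(s * w for s, w in zip(scores, weights))  # in hundredths of a point
    tenths = (total + 5) // 10  # round half up to tenths of a point
    return tenths / 10


scorecard = {
    "Amazon SimpleDB": (8, 8, 8, 9, 8),
    "Apache CouchDB": (7, 7, 8, 7, 9),
    "Google App Engine": (8, 8, 8, 9, 8),
    "Persevere Server": (8, 7, 8, 7, 9),
}

overall = {name: overall_score(scores) for name, scores in scorecard.items()}
```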