CouchDB – Thinking Outside the Relational Database Box

Schema-Free Is The New Drug-Free

Lately I’ve been thinking a lot about this new-fangled contraption, Document Oriented Databases. Ok, so maybe these aren’t that new-fangled – Lotus Notes is a document oriented database and it’s been around for 20 years or so. But, they’re new-fangled to me; my experience to date has solely been with relational databases.

So what is a Document Oriented Database? Actually, just what it sounds like – a data store whose primary unit is a document. A document can be anything, usually a collection of key-value pairs that describe something, anything… it’s really that simple, really no different than a Microsoft Word document…

Ok, so if a document in this context is similar to a Microsoft Word document, how is a document oriented database any different than my file system?

Good question, and really you could think of your file system as a document oriented database. But you quickly run into problems with a file system as your data store. What happens when you try to open a document that someone else has open? How do you draw a relationship between two documents on your hard drive? These are the problems that DOBs (warning: not an official abbreviation!) alleviate through things like versioning and Map-Reduce.
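To make the versioning idea a bit more concrete, here is a rough Python sketch of revision-checked updates. It is a toy in-memory store, not how CouchDB is implemented; `TinyStore`, `ConflictError`, and the integer revision counter are all made up for illustration.

```python
class ConflictError(Exception):
    """Raised when a writer holds a stale revision of a document."""


class TinyStore:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # return a copy so callers can't mutate the store

    def put(self, doc_id, body, rev=None):
        current_rev = self._docs[doc_id][0] if doc_id in self._docs else None
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {rev}")
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = TinyStore()
store.put("post-1", {"title": "Draft"})

# Two writers read the same revision...
rev_a, doc_a = store.get("post-1")
rev_b, doc_b = store.get("post-1")

# ...the first write goes through and bumps the revision...
doc_a["title"] = "Final"
new_rev = store.put("post-1", doc_a, rev=rev_a)

# ...and the second writer is told it is out of date instead of silently clobbering.
doc_b["title"] = "Other"
try:
    store.put("post-1", doc_b, rev=rev_b)
    conflict = False
except ConflictError:
    conflict = True
```

The point is that nobody "locks" the document the way a file system might. A writer with a stale revision simply gets rejected and has to re-read before trying again.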

CouchDB

One document oriented database project that has been getting a lot of attention lately, and that I’m getting excited about, is CouchDB. It’s a solution that appeals to me for a lot of reasons – top among those being that it is written in Erlang (hot-swap code, anyone?). Also, I like the fact that it is (recently) built from the ground up using JSON requests/responses over HTTP because god knows we don’t need another XML based anything out there.
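Here is roughly what "JSON over HTTP" looks like from the client side, sketched in Python with only the standard library. The server address, the database name `blog`, and the document id `post-1` are assumptions for the example, and the request is only built, not sent, so it runs without a server.

```python
import json
import urllib.request

# Assumptions: a CouchDB server on localhost:5984 and a database named "blog".
BASE_URL = "http://localhost:5984"


def build_put_request(db, doc_id, doc):
    """Build (but do not send) an HTTP PUT that carries a document as JSON."""
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{db}/{doc_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_put_request("blog", "post-1", {"title": "Hello", "tags": ["couchdb"]})

# To actually send it, with a server running:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```

No driver, no connection pool, no XML envelope: anything that can speak HTTP and produce JSON can talk to the database.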

Hey! I’m not the only one who thinks XML is a little too ubiquitous.

Anyways, I’m getting off topic. So why use a solution like CouchDB over the tried and true MySQL/PostgreSQL/SQL Server, etc.? For me, there are a couple of reasons:

  • no schema
  • built with scalability and replication in mind

Schema-Free Is The New Drug-Free

Ok, so we probably won’t be seeing any “This is your brain on schemas” commercials any time soon. But anyone who’s had to build/manage/maintain a relational database before understands how detrimental to your health it can be.

With DOBs there are no schemas! Remember, their primary unit of storage is the document. A document can have any information in it you want, with no need to concern yourself with whether it matches field for field with another document – just insert it and forget it! Ok, I should work on my phraseology there, but you get the point.
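As a quick illustration, here are two differently shaped documents living side by side, sketched in Python. The plain dict stands in for the database, and the ids and field names are invented for the example.

```python
import json

# A plain dict stands in for the database; ids and fields are made up.
database = {}

database["post-1"] = {"type": "post", "title": "Hello", "tags": ["couchdb"]}
database["comment-1"] = {
    "type": "comment",
    "post_id": "post-1",
    "author": "sam",
    "text": "First!",
}

# Nothing had to be altered or migrated to accept the second shape.
field_sets = {doc_id: sorted(doc) for doc_id, doc in database.items()}

# Each document still serializes to JSON on its own terms.
as_json = {doc_id: json.dumps(doc) for doc_id, doc in database.items()}
```

In a relational database, the comment's extra fields would have meant a new table or an ALTER TABLE before the first insert could succeed.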

Wait, so if there are no schemas how do you draw relationships between your data?

In CouchDB, there’s a layer that sits on top of the base documents layer that uses server-side JavaScript to define relationships on your data. These relationships are defined by you, the database manager, but they don’t need to be defined completely before you start collecting data. Simply define a function in JavaScript (CouchDB calls these Views), and CouchDB will do the data crunching to maintain your views as you insert new data.
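Real CouchDB views are written in JavaScript, but the idea can be mimicked in a few lines of Python. In this sketch a map function is run over every document and emits key/value rows, and the rows sorted by key are the "view". The function names and sample documents are invented, and unlike CouchDB this toy rebuilds the whole view each time instead of updating it as new documents arrive.

```python
def map_comments_by_post(doc):
    """Map function: yield (key, value) rows for the documents we care about."""
    if doc.get("type") == "comment":
        yield doc["post_id"], doc["text"]


def build_view(docs, map_fn):
    """Run map_fn over every document and return the rows sorted by key."""
    rows = [row for doc in docs for row in map_fn(doc)]
    return sorted(rows, key=lambda row: row[0])


docs = [
    {"_id": "post-1", "type": "post", "title": "Hello"},
    {"_id": "c-2", "type": "comment", "post_id": "post-2", "text": "Nice"},
    {"_id": "c-1", "type": "comment", "post_id": "post-1", "text": "First!"},
]

view = build_view(docs, map_comments_by_post)

# Finding one post's comments is then a lookup by key.
comments_for_post_1 = [value for key, value in view if key == "post-1"]
```

The relationship between posts and comments lives entirely in the map function, so it can be added or changed long after the documents were stored.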

As of this writing, the view engine is pretty slow in CouchDB – certainly not what you get from a nicely indexed relational database – but remember, CouchDB is still in alpha. I don’t believe the team has even started focusing on performance.

The Future Is In The Cloud

I’m a firm believer that in the not-so-distant future everyone and anyone who wants to will have access to the same massive-computing infrastructure that mostly large, well-funded companies have today. We’re already seeing this movement with Amazon EC2, SimpleDB (another document oriented database), and Google App Engine.

So, if we have a massively distributed infrastructure at our disposal, wouldn’t it make sense to use technology that is built from the ground up to take advantage of distribution? Hence my interest in Erlang (that’s a topic for another post) and CouchDB.

Ask anyone who has had to scale a MySQL database from 7 to 7 million users what they think on the matter.

You Still Reading This?

Anyways, this has been mostly a brain dump as I sit here enjoying a 10 dollar bottle of Cabernet Sauvignon, thinking about what kind of technology I want to use on some of my passion projects. I know for sure that statforge is going to be built from the ground up using CouchDB and Erlang. That is, if I ever get off my butt to work on it!