Two Reasons You Shouldn't Use MongoDB

14 September 2010

ADDENDUM
As a couple of people have pointed out, the title of this post is pretty flamey. A bit more flamey than I really intended it to be. The underlying message of this post is that, as a developer, you should be aware of the idiosyncrasies of your potential data store, so that you can make an informed decision and pick one that fits your problem space. Don’t let the title derail that message.

1. Your application is awkwardly write heavy

In Mongo, a single mongod process can only process one write at a time, and it takes a server-level read/write lock while it’s doing so. Yup, that’s right, when a write is in progress, nothing else can happen.

Since Mongo has some wicked fast writes, this normally isn’t a problem. However, if a write hangs, if your application has large batch inserts, or if you’re inserting a lot of really large documents, this could quickly become an issue. Ideally, the lock will eventually become more granular, down to the collection level for instance*.
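To make the effect of a single server-wide lock concrete, here is a toy model in Python. It is a sketch, not Mongo's implementation: ToyServer, demo, and the 0.2 second delay are made up for illustration, and a plain mutex stands in for the read/write lock (so it ignores the fact that readers can share a read/write lock with each other).

```python
import threading
import time


class ToyServer:
    """Toy model of one process-wide lock; not MongoDB's actual code."""

    def __init__(self):
        self._lock = threading.Lock()  # a single lock guards everything
        self.data = {}
        self.log = []  # order in which operations actually ran

    def write(self, key, value, duration=0.0, started=None):
        with self._lock:
            self.log.append("write-start")
            if started is not None:
                started.set()  # signal that the lock is now held
            time.sleep(duration)  # stands in for a slow or very large write
            self.data[key] = value
            self.log.append("write-end")

    def read(self, key):
        with self._lock:  # blocks while any write holds the lock
            self.log.append("read")
            return self.data.get(key)


def demo():
    server = ToyServer()
    started = threading.Event()
    writer = threading.Thread(
        target=server.write, args=("doc", "big payload", 0.2, started)
    )
    writer.start()
    started.wait()  # make sure the write is in progress first
    value = server.read("doc")  # cannot run until the write finishes
    writer.join()
    return value, server.log
```

Calling demo() shows the read being logged only after the slow write has finished, even though the read was issued while the write was still running.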

Thankfully, until that happens, there are a couple of other ways to mitigate this. The first, and probably easiest, option is to set up a Replica Set and perform all reads on the slave(s). However, this doesn’t stop writes from queueing up. The second option is to set up a sharded environment. This option allows writes to be split up and sent to their respective shards, so each shard only has to handle its own portion of the writes.
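As a rough illustration of how sharding spreads writes out, the sketch below routes documents to shards by key range. Everything in it is hypothetical: the user_id shard key, the split points, and the shard names are invented for the example, and a real cluster does this routing for you.

```python
import bisect

# Hypothetical split points on a "user_id" shard key.
# Keys below 1000 go to shard0, 1000-1999 to shard1, the rest to shard2.
SPLIT_POINTS = [1000, 2000]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(user_id):
    """Return the shard whose key range contains user_id."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, user_id)]


def route_writes(docs):
    """Group documents by target shard so each shard queues only its own writes."""
    queues = {name: [] for name in SHARDS}
    for doc in docs:
        queues[shard_for(doc["user_id"])].append(doc)
    return queues
```

With a shard key that spreads writes evenly, each shard ends up with a fraction of the write queue. A poorly chosen key that sends most writes to one range would leave that shard with the same bottleneck as before.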

2. You don’t understand how Mongo handles durability

Mongo is fast, very fast. However, it achieves those speeds by doing things that are pretty out of the ordinary. These things can potentially be catastrophic if you don’t understand what’s going on.

Any developer looking to use Mongo needs to take a look at its current** stance on single server durability. To sum it up, a single server is not durable. Instead, developers should be using Replica Sets and sharding to achieve durability. These are things you should be looking at regardless of your data store, but it becomes all the more important to have a proper cluster when you’re working with Mongo.

Another key thing to look at is the insertion path. By default, Mongo’s drivers do not wait for a response when issuing a write. There is no guarantee that the write successfully updated the memory-mapped file, that it was fsynced to datafiles, or that it was replicated across the cluster. Luckily, there are a couple of commands available to alleviate this.

All of the drivers implement the getLastError command, commonly known as safe mode. Safe mode will wait for a return code from the database, confirming that the write was successful. Safe mode also has options for ensuring fsync and replication.
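For a feel of what safe mode sends, here is a minimal sketch that builds a getLastError command document as a plain Python dict. The helper name get_last_error_command and its parameter names are made up for this example; the w, wtimeout, and fsync fields are the command's options for replication and flushing. It assumes you would hand the result to your driver's generic command call (in PyMongo, Database.command) right after the write on the same connection, though normally the driver's safe mode does this for you.

```python
def get_last_error_command(w=None, wtimeout_ms=None, fsync=False):
    """Build a getLastError command document.

    The command name must be the first key; Python dicts keep insertion order.
    """
    command = {"getlasterror": 1}
    if w is not None:
        command["w"] = w  # number of servers that must have the write
    if wtimeout_ms is not None:
        command["wtimeout"] = wtimeout_ms  # stop waiting after this many ms
    if fsync:
        command["fsync"] = True  # wait for a flush to the datafiles
    return command
```

For example, get_last_error_command(w=2, wtimeout_ms=5000) asks for confirmation that two servers have the write, giving up after five seconds. Each extra guarantee adds latency to the write, which is the trade-off this section is about.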

There is also a general fsync command that can be used to flush everything to datafiles. The flush interval can be configured at the server level; by default, a flush is executed every 60 seconds, or when the kernel forces one, whichever comes first.
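A quick back-of-the-envelope calculation shows why that 60 second window matters. The sketch below is only an upper bound under stated assumptions: a single server with no replication, the kernel never flushing early, and a steady write rate. The function name and the 500 writes per second figure are hypothetical.

```python
def writes_at_risk(writes_per_second, sync_interval_seconds=60):
    """Upper bound on writes not yet flushed to datafiles if the server dies."""
    return writes_per_second * sync_interval_seconds


# A hypothetical application doing 500 writes per second
unflushed = writes_at_risk(500)
```

Under those assumptions, a crash just before a flush could lose up to 30,000 writes that the application believed had gone through.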

In the end, your problem space will dictate whether the performance cost is acceptable in exchange for better durability.

* Apparently this is being addressed in 1.8 or 2.0
** True single server durability will be added in 1.8

Full Disclosure:
I love Mongo. So much so that I’m using it in my latest venture, gathers.us, and am giving several presentations on it, one of them being at Mongo Chicago.

While these quirks don’t even come close to outweighing the benefits of using Mongo, they are things that I believe tend to bite developers new to Mongo, and they should be given some attention.