Sunday, March 30, 2008

Relational Database, Why Bother?

I'm sure there are actually very good answers to the "Why Bother?" portion of this post's title. But this post is more or less a response to Scaling out MySQL from Nati Shalom's blog. The argument is essentially that you should augment the relational database layer with an IMDG (In-Memory Data Grid) for transactional activities and use the relational DB as a back-end persistent data store. It is also a nice rundown of the various things one might have to do to enable a MySQL relational database layer to scale and keep performing as load increases to insane levels, where vertical scaling becomes impossible or cost prohibitive.
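To make that arrangement concrete, here is a minimal single-process sketch of the write-behind idea: writes are acknowledged from memory and flushed to the relational store later in a batch. Everything in it is an assumption for illustration. The `WriteBehindStore` class and the `kv` table are hypothetical names, and SQLite stands in for MySQL. A real data grid such as Coherence also replicates the in-memory copy across nodes so that unflushed writes survive a machine failure; this sketch does not.

```python
import sqlite3
from typing import Dict, Optional, Set


class WriteBehindStore:
    """Hypothetical in-memory front end with a relational store behind it."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._memory: Dict[str, str] = {}  # the "grid": serves reads and writes
        self._dirty: Set[str] = set()      # keys changed since the last flush

    def put(self, key: str, value: str) -> None:
        # The write completes in memory; the database is not touched here.
        self._memory[key] = value
        self._dirty.add(key)

    def get(self, key: str) -> Optional[str]:
        if key in self._memory:
            return self._memory[key]
        # Fall back to the persistent store and keep the value in memory.
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self._memory[key] = row[0]
        return row[0]

    def flush(self) -> int:
        """Persist all pending writes in one transaction; return how many."""
        batch = [(key, self._memory[key]) for key in sorted(self._dirty)]
        with self._conn:  # commits on success, rolls back on error
            self._conn.executemany(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", batch
            )
        self._dirty.clear()
        return len(batch)
```

Repeated writes to the same key collapse into a single row write at flush time, which is where much of the relief for the database comes from. The price is that anything not yet flushed is lost if the process dies.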

In reading that post I just could not stop thinking about all the hoops we all jump through to get around the fact that current implementations of Relational Databases just do not seem to be able to provide the performance and scale that successful modern web applications demand.  

Using in memory data grids like Coherence or in memory distributed cache technology like memcached gives me the scalability and performance I need to handle modern web application transaction loads on the systems I design. I use them for a couple of reasons.

1. Protect the database from meltdown
2. Enable shared access to data across a horizontally scalable cluster of machines
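The first of those reasons is usually implemented as a cache-aside read. Below is a rough sketch of that pattern, not code from any of the systems I run. `TTLCache` is a tiny in-process stand-in for a shared cache like memcached, and `CountingDatabase`, `get_user`, and the `user:<id>` key format are hypothetical names made up for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a shared cache such as memcached (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


class CountingDatabase:
    """Hypothetical database wrapper that counts the queries it receives."""

    def __init__(self, rows: Dict[int, str]) -> None:
        self._rows = dict(rows)
        self.query_count = 0

    def fetch_user(self, user_id: int) -> Optional[str]:
        self.query_count += 1
        return self._rows.get(user_id)


def get_user(
    user_id: int,
    cache: TTLCache,
    db: CountingDatabase,
    ttl_seconds: float = 60.0,
) -> Optional[str]:
    """Cache-aside read: try the cache first, query the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db.fetch_user(user_id)
    if value is not None:
        cache.set(key, value, ttl_seconds)  # later reads skip the database
    return value
```

With a real memcached cluster, every web server talks to the same cache, so one server's miss warms the cache for all the others. That covers the second reason as well. Note that this sketch does not cache "not found" results, so repeated lookups of a missing row still reach the database.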

I have considered that the work being done on columnar databases like Vertica might be interesting to apply to web applications but I have not had a chance to really dig into that idea.

So, because of the limitations of my primary permanent relational data store, I am forced to take the transactions out of the database. That makes me ask, over and over again, why I need the relational database at all when I often don't use or need referential integrity (I see DBAs shivering everywhere when I say that). I really think that things like Mnesia, CouchDB, SimpleDB, HBase, Bigtable, and other technologies along those lines are coming in fast and furious to replace the relational database in its entirety as the persistent data store anyway. This is especially true if you need to do heavy-lifting data mining of the data store or fancy things like Rackspace's log parsing with Hadoop or the NYT creating 11 million PDFs in 24 hours.

Resources:

Scaling Out MySQL by Nati Shalom
http://natishalom.typepad.com/nati_shaloms_blog/2008/03/scaling-out-mys.html

Vertica
http://www.vertica.com/

Memcached
http://www.productionscale.com/display/Search?searchQuery=memcached&moduleId=1481658
http://www.danga.com/memcached/

Oracle Coherence
http://www.oracle.com/technology/products/coherence/index.html


Reader Comments (4)

For the record, Vertica and other column-oriented systems are great for analytics but horrible for row-by-row operations. So unless your particular web site happens to calculate and display reports, then Vertica isn't a good option to consider.

You make a very interesting point about how we generally use data on the web and how little value an RDBMS provides for you though. I think that the HStore project is based on that same idea, so you're in good company... :)

April 1, 2008 | Unregistered Commenter Tom Briggs

Hi Tom, thanks for your comment. I was thinking in part of this quote from the Wikipedia entry on Google's BigTable when I wrote that.

"BigTable is a fast and extremely large-scale column-oriented database system, with a focus on quick reads from columns, not rows. It's designed to scale into the petabyte range across hundreds or thousands of machines"
http://en.wikipedia.org/wiki/BigTable

Column Oriented on the web... being used in a map-reduce environment I suppose. I need to research more...

April 7, 2008 | Registered Commenter Kent Langley

Hi Kent,

I went to the BSDCan 2008 conference last week, and Ivan Voras (FreeBSD contributor) announced his new mdcached project that aims to perform even better than memcached. I thought you might want to have a look at it: http://ivoras.sharanet.org/projects/mdcached.html

I haven't had a chance to try it out yet, but if anyone else has, please post a followup comment with your impressions.

Cheers,
Greg

May 22, 2008 | Unregistered Commenter Greg Larkin

Thanks Greg. I did read about that one a few days ago as well. I also haven't tried it yet or had a chance to review it much. There's definitely some room for improvement w/ memcached, but one of its real strengths is its simplicity and clarity in setup and use. People often say to me, "What? Huh? That's all it does?" when I first introduce it to them. Eventually they see the light. Although, there are some hidden complexities in implementation for sure!

May 30, 2008 | Registered Commenter Kent Langley