Saturday, 11 Aug

Getting Rid of the Relational Database

In this article, written in the style I hope to follow for many of my posts here at ProductionScale, I will discuss getting rid of the relational database. Afterwards, I will follow up with a brief analysis, an executive summary if you will, of what this infobit might mean for businesses. The summary is written to be accessible to the business managers out there who just need to know what it all means. Where appropriate, I have also tried to include references to source materials in an accessible way.

There seems to be a growing belief in scalability circles that the relational database model is the proverbial ball and chain in the relationship between scalable applications and the underlying infrastructure. The quest for seamless linear growth in technology applications is being hindered by the “elephant database.”[1]

What would Amazon do? In a recent talk[2] at QCon London, Werner Vogels, the CTO of Amazon.com, noted clearly that the relational database model is essentially outdated as a primary data storage medium for modern applications. In other words, it is simply too slow and cumbersome.

Additionally, Mr. Vogels makes the critical point that in many, many cases relational databases are simply not necessary. Simple key/value pairs (hashes) are all you need.

Recently a developer I work with, once given memcached to play with, had begun storing much more in the cache than I had originally intended. I found out because he was dismayed that memcached wouldn't accept anything over 2MB, and my first reaction was to ask why he was putting “big” files there at all. The answer to that question doesn't matter in this context. What does matter is the opposite question: why not? So I thought, well, if you can just put everything in hashes in memcached, then the DB is only a backup of the cache state in case you have to restart the thing. Interesting. Who needs a DB anyway? But, say you need to run more complex queries.
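To make that idea concrete, here is a minimal sketch using only Python's standard library. An in-memory hash answers every read, and a database is written through on every update purely so that the hash can be rebuilt after a restart. A plain `dict` stands in for memcached and SQLite stands in for the backing DB. The class name `KeyValueStore`, the `kv` table and the `max_value_bytes` limit (set to the roughly 2MB mentioned above) are hypothetical choices for illustration. None of this is memcached's API.

```python
import json
import sqlite3


class KeyValueStore:
    """Hypothetical sketch: a dict is the primary store, SQLite is only a backup.

    The dict plays the role memcached plays in the post; the SQLite table is
    read only when the process starts, to rebuild the in-memory hash.
    """

    def __init__(self, db_path=":memory:", max_value_bytes=2 * 1024 * 1024):
        self.max_value_bytes = max_value_bytes  # assumed per-item size limit
        self.cache = {}
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._reload()

    def _reload(self):
        # Rebuild the in-memory hash from the backup, e.g. after a restart.
        for key, value in self.db.execute("SELECT key, value FROM kv"):
            self.cache[key] = json.loads(value)

    def set(self, key, value):
        encoded = json.dumps(value)
        if len(encoded.encode("utf-8")) > self.max_value_bytes:
            raise ValueError(f"value for {key!r} exceeds {self.max_value_bytes} bytes")
        self.cache[key] = value
        # Write-through: persist every update so the hash can be rebuilt later.
        self.db.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, encoded)
        )
        self.db.commit()

    def get(self, key, default=None):
        # Reads are served from memory and never touch the database.
        return self.cache.get(key, default)
```

This arrangement has a cost. Anything beyond lookup by key, such as ranges, filters or joins, now has to be written by hand. That is exactly where the question of more complex queries comes in.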

A recent architecture article I read by Todd Hoff on the website High Scalability[3] discusses just this, to a point. It says, “Move cpu-intensive work moved out of the database layer to applications applications layer: referential integrity, joins, sorting done in the application layer! Reasoning: app servers are cheap, databases are the bottleneck.” eBay chose to move traditional relational DB work right up into the application layer. How interesting!
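The eBay write-up does not show code. What follows is only a hypothetical sketch of what “referential integrity, joins, sorting done in the application layer” can look like. Two result sets, assumed to have been fetched with simple key lookups, are checked for orphaned foreign keys, hash-joined and then sorted in application memory. The `users` and `orders` records and their field names are invented for the example.

```python
def find_orphans(orders, users_by_id):
    """Application-level foreign-key check: orders whose user_id has no user."""
    return [o for o in orders if o["user_id"] not in users_by_id]


def join_orders_with_users(orders, users):
    """Hash join plus ORDER BY, done in the application instead of in SQL."""
    # Build side: index one table by its key.
    users_by_id = {u["id"]: u for u in users}

    # Referential integrity is enforced here rather than by the database.
    orphans = find_orphans(orders, users_by_id)
    if orphans:
        raise ValueError(f"orders reference unknown users: {[o['id'] for o in orphans]}")

    # Probe side: look up each order's user in the hash.
    joined = [
        {"order_id": o["id"], "user": users_by_id[o["user_id"]]["name"], "total": o["total"]}
        for o in orders
    ]

    # Equivalent of ORDER BY total DESC, performed in application memory.
    joined.sort(key=lambda row: row["total"], reverse=True)
    return joined
```

In this arrangement the database only serves cheap primary-key reads. The CPU cost of the join and the sort lands on the application servers, and more of those can be added horizontally. The price is that the application now owns correctness rules the database used to enforce.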

In a conversation between Margo Seltzer and Michael Stonebraker[1] we begin to get an idea of why the relational database model is overly cumbersome. It boils down to a single word: latency. Using bond arbitrage as an example, Stonebraker notes quite earnestly that it is a “latency arms race.” The arbitrager with the least latency in their system wins. What do they win? Money! So, the stakes are high. Stonebraker goes on to explain what I think is the most important part. What matters is not the latency of any individual component but the latency of the entire architecture, end to end. Seltzer picks up on this when she says, “So, it’s not the latency of the instruction execution; it’s the latency of the architecture?”
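Stonebraker's end-to-end point can be illustrated with a little arithmetic. In this hypothetical sketch, a request passes through a sequence of stages and waits for each one in turn, so their latencies add up. The stage names and millisecond values are made-up numbers, not measurements from the interview. Speeding up one component only shrinks that component's share of the total.

```python
# Hypothetical per-stage latencies (milliseconds) for one request that passes
# through every stage in sequence. These numbers are invented for illustration.
STAGES_MS = {
    "network_in": 2.0,
    "app_logic": 1.0,
    "db_query": 3.0,
    "disk_commit": 4.0,
    "network_out": 2.0,
}


def end_to_end_latency(stages_ms):
    """Sequential stages: the request waits for each one, so latencies add."""
    return sum(stages_ms.values())


def latency_after_speedup(stages_ms, stage, factor):
    """End-to-end latency after making a single stage `factor` times faster."""
    improved = dict(stages_ms)  # copy so the baseline is left untouched
    improved[stage] = improved[stage] / factor
    return end_to_end_latency(improved)


if __name__ == "__main__":
    print(f"baseline: {end_to_end_latency(STAGES_MS):.1f} ms")
    print(f"10x faster db_query: {latency_after_speedup(STAGES_MS, 'db_query', 10):.1f} ms")
```

With these made-up numbers the request takes 12 ms end to end. Making the database query ten times faster only brings that down to about 9.3 ms, because the disk commit and the network hops still dominate.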

So, is this conclusive evidence of the pending death of the relational database? Of course not. But it is trend spotting. People are again noticing that there are other ways, and those other ways just might be quite a bit faster for modern applications.

So, to paraphrase Varnish[4] software architect Poul-Henning Kamp, let’s stop doing things like it’s 1975 and get with the program.

What does this mean for Business?

This means you should be paying attention to code quality and optimization. You should break out of one-size-fits-all thinking when it comes to databases, data storage, and scalable systems. Vertical scaling, throwing bigger hardware at the problem, is no longer sufficient for modern web-scale applications. Clear and proven underlying mechanisms impose built-in limitations that prevent current database and application technology from scaling much further. This is not only about money. It’s about finesse and the application of core scalability design theory from the forefront of technology. In summary, if you intend to run modern applications in truly scalable ways, you must break out of the mold we’ve been in for 30+ years and think about new ways to design and build your applications. This article and its supporting sources are a good place to start.

Addendum (8/12/2007)

I just found this on a new site launched by GigaOM. It is a little more along the same lines. I haven't read it in depth yet, but I wanted to post it.
http://future.gigaom.com/2007/08/10/data-20-how-the-web-disrupts-our-relational-database-world/

 

  1. A Conversation with Margo Seltzer and Michael Stonebraker. Source URL - http://delivery.acm.org/10.1145/1260000/1255430/p16-stanik.htm?key1=1255430&key2=3880943811&coll=&dl=ACM&CFID=15151515&CFTOKEN=6184618
  2. Werner Vogels: Scalability and Consistency. Source URL - http://www.infoq.com/presentations/availability-consistency
  3. eBay’s Architecture. Source URL - http://highscalability.com/ebay-architecture
  4. Varnish Project. Source URL - http://varnish.projects.linpro.no/



Reader Comments (9)

Again, it's the same speech about the death of the relational database. You are only viewing the problem from a single perspective: the enterprise ecosystem is much richer than just web apps. Relational databases are still dominant because there is nothing better to replace them as the common store for all kinds of enterprise data. You can read my post about normalized and denormalized data models.

Regards
Diego

September 10, 2007 | Unregistered Commenter Diego Parrilla

Anyway... digged!

September 10, 2007 | Unregistered Commenter Diego Parrilla

Who places process-intensive business logic inside the database these days? That's right: the people who blindly followed what the large database vendors have said for the last 10 years.

Any dev worth his salt should know that placing CPU-intensive tasks inside a database is poor design from a support and maintenance point of view.

September 10, 2007 | Unregistered Commenter Ollie

Ollie, from time to time I still hear this kind of thing. If your business process is very data intensive and manages huge amounts of data (data batches, for example), then the database is THE BEST place to code your processes.
If your process needs some kind of quick user interaction or fast transaction, then the application layer is better. The rule of thumb is:
- Fast and light transaction or synchronous user process: application layer.
- Long and heavy transaction or asynchronous user process: the closer to the data the better. Reduce the round trips to your data and reduce latency. Use some kind of MOM (message-oriented middleware) to manage the communication with your application layer. Period.
Thinking that an application in the middle layer can perform better than a process in the database manager is naive.

September 10, 2007 | Unregistered Commenter Diego Parrilla

I will give you a real world example:

A large examination board HAD to spend 2 million dollars a year on hardware just to keep their database running at 90% CPU utilisation across multiple CPUs. This was all because they had decided to place so much business logic and data processing inside the database. So much unnecessary processing was being done there that they even had to build a queuing mechanism to schedule when certain tasks were submitted to the database.

Now imagine if they had spent that 2 million dollars on application layer hardware (2 dual core, 10 GB memory, etc.). That is a lot of hardware, and therefore a lot of application layer processing.

It is not about individual task performance. Obviously a database is going to process data faster than anything in an application layer. It is about concurrency and throughput.

September 11, 2007 | Unregistered Commenter Ollie

I will give another real world example:
A very large staffing company spent more than 20 million dollars and five years just because a team of very smart Microsoft architects (and later on, very '.Neat' architects) told them that business logic in databases is evil and ALL business logic must be in the middle layer. Our proposal was a more balanced approach, with business logic in both layers depending on the very exigent performance requirements. In those days J2EE was the best option because there were enough caching technologies to accelerate access to read-only and read-mostly data in the application layer.
What happened? The system was as slow as a turtle and needed to be fully rearchitected. The Microsoft architects were fired, several extra million dollars were burned, and the CFO's patience ran out. The CIO had a stroke because of the stress, the Worldwide Services Team was sold to a large consulting firm, the Technical Managers and Project Managers were fired, and the project was cancelled, of course.

September 11, 2007 | Unregistered Commenter Diego Parrilla

The last name of the Amazon CTO is "Vogels", not "Vogel".

January 21, 2008 | Unregistered Commenter Neil

For some domains there is definitely a trend toward decentralized "schemaless" databases that can use far more computing power (which is a good reason to start paying less attention to single-database performance).

January 21, 2008 | Unregistered Commenter Yurii Rashkovskii

@Neil, thank you for pointing out my misspelling of Mr. Vogels' name. I have corrected that mistake. My apologies to Mr. Vogels as well.

January 22, 2008 | Registered Commenter Kent Langley
