Wednesday
Dec212011

How NOT to Sell NoSQL Database

This is a description of my first experience with a newer NoSQL database that we'll just call NoSQL Database #9999.  I was told about it and asked what I thought about it overall.  I hadn't heard of it before, but I wanted to see what the deal was since I work with several others.  I'm always up to see if something is actually the new hotness.

I found marketecture diagrams everywhere.  Development of both the server and the client is closed and opaque.  There is a 30 day "free trial" signup wall to maybe get to the download screen.  I'm not sure, since I didn't fill it out and I really don't feel like spending my time navigating a sales channel just to do so.  The license agreement was really fun.  The short version is that it is a non-exclusive license with no ability to use or test the software in production to see if it really works.  The choice parts basically say that I can't use the software in production, and that if it doesn't work, that's not their problem and they never said it would.  There is actually a warranty clause that says they don't warrant anything at all; it is just "as-is" without warranty!  I am not feeling the love at this point.  Then, I wanted to see the pricing.  I couldn't, of course.  The minimum contract term beyond the first 30 days is 12 months with, you guessed it, unspecified pricing unless I contact sales directly.

So, now I know why I've never heard of this software and why nothing meaningful has been written about it that is not PR or marketing driven.  There is really no way that I would even consider adopting this software at this point.  It's most likely that it is not real.

So, NoSQL database #9999, there are many other equally usable solutions that are far more transparent in the way they do business and foster community around their products.  This isn't about paying money.  This is about trust.  So, sorry NoSQL #9999, but I'll not be entering your sales cycle in this fashion or evaluating your product at this time.  Moving along now...

Happy Wednesday Everyone!

Sunday
Nov202011

Building an Application upon Riak - Part 1

For the past few months some of my colleagues and I have been developing an application with Riak as the primary persistent data store.  This has been a very interesting journey from beginning to now.  I wanted to take a few minutes and write a quick "off the top of my head" post about some of the things we learned along the way.  In writing this I realized that our journey breaks down into a handful of categories:
  • Making the Decision
  • Learning
  • Operating
  • Scaling
  • Mistakes
We made the decision to use Riak around January of 2011 for our application.  We looked at HBase, Cassandra, Riak, MySQL, Postgres, MongoDB, Oracle, and a few others.  There were a lot of things we didn’t know about our application back then.  This is a very important point.

In any event, I’ll not bore you with all the details but we chose Riak.  We originally chose it because we felt it would be easy to manage as our data volume grew and because the published benchmarks looked very promising.  We also wanted something based on the Dynamo model with adjustable CAP properties per “bucket”, and it was a good fit for our speed needs, our “schema”, our data volume capacity plan, our data model, and a few other things.
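To make the "adjustable CAP properties per bucket" point a bit more concrete, here is a minimal sketch in Python.  The field names n_val, r, and w mirror Riak's bucket properties, but the BucketProps class and the helper functions are made up for illustration and are not part of any Riak client.  It only encodes the classic Dynamo-style rule of thumb that reads and writes overlap on at least one replica when R + W > N; a real cluster with sloppy quorums and handoff is messier than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketProps:
    """Hypothetical holder for per-bucket tuning knobs (not a Riak client type)."""
    n_val: int  # number of replicas kept for each key
    r: int      # replicas that must answer a read
    w: int      # replicas that must acknowledge a write

    def __post_init__(self):
        if not (1 <= self.r <= self.n_val and 1 <= self.w <= self.n_val):
            raise ValueError("r and w must be between 1 and n_val")


def reads_overlap_writes(props):
    """Classic quorum rule of thumb: R + W > N means read and write sets intersect."""
    return props.r + props.w > props.n_val


def tolerated_failures(props):
    """How many replicas can be unavailable before reads or writes start failing."""
    return {"read": props.n_val - props.r, "write": props.n_val - props.w}


# Two illustrative buckets: one leaning toward consistency, one toward speed.
strict = BucketProps(n_val=3, r=2, w=2)
fast = BucketProps(n_val=3, r=1, w=1)
```

The appeal for us was that this trade-off can be made bucket by bucket instead of once for the whole database.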

Some of the Stack Details

The primary programming language for our project is Scala.  There is no reasonable Scala client for Riak at the moment that is kept up to date, so we use the Java client.

We are running our application (a rather interesting business analytics platform if I do say so myself) on AWS using Ubuntu images.

We do all of our configuration management, cloud instance management, monitoring harnesses, maintenance, EC2 instance management, and much more with Opscode Chef.  But, that’s a whole other story.

We are currently running Riak 1.0.1 and will get to 1.0.2 soon.  We started on 0.12.0 I think it was... maybe 0.13.0.  I’ll have to go back and check.

On to some of the learning (and mistakes)

Up and Running - Getting started with Riak is very easy, very affordable, and covered well in the documentation.  Honestly, it couldn't be much easier.  But then... things get a bit more interesting.

REST ye not - Riak allows you to use a REST API over HTTP to interact with the data store.  This is really nice for getting started.  It’s really slow for actually building your applications.  This was one of the first easy buttons we de-commissioned.  We had to move to the protocol buffers interface for everything.  In hindsight this makes sense but we really did originally expect to get more out of the REST interface.  It was completely not usable in our case.

Balancing the Load - Riak doesn’t do much for you when it comes to load balancing your various types of requests.  We settled, courtesy of our crafty operations team, on running haproxy on each application node to shuttle requests to and from the various Riak nodes.  Let me warn you.  This has worked for us but there be demons here!  The configuration details of running haproxy in front of Riak are about as clear as mud and there isn’t much help to be found at the moment.  This was one of those moments over time that I really wished for the client to be a bit smarter.

Now, when nodes start dying, getting too busy, or whatever might come up, you’ll be relying on your proxy (haproxy or otherwise) to handle this for you.  We don’t consider ourselves done at all on this point but we’ll get there.
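For what it's worth, here is a rough sketch in Python of what I mean by a "smarter" client: rotate requests across nodes and skip one for a while after it fails.  This is not how haproxy or any Riak client is implemented.  NodePool, fake_request, the node names, and the cooldown value are all hypothetical and only there to show the idea.

```python
import itertools
import time


class NodePool:
    """Round-robin over nodes, skipping any node that failed recently."""

    def __init__(self, nodes, cooldown=5.0, clock=time.monotonic):
        self._nodes = list(nodes)
        self._cycle = itertools.cycle(self._nodes)
        self._down_until = {}      # node -> time at which we retry it
        self._cooldown = cooldown  # seconds to leave a failed node alone
        self._clock = clock

    def call(self, request):
        """Run request(node) on the next healthy node; try each node at most once."""
        last_error = None
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if self._down_until.get(node, 0.0) > self._clock():
                continue  # still cooling down, move on to the next node
            try:
                return request(node)
            except ConnectionError as exc:
                self._down_until[node] = self._clock() + self._cooldown
                last_error = exc
        raise ConnectionError("no healthy nodes available") from last_error


def fake_request(node):
    """Stand-in for a real call; pretends that riak-2 has died."""
    if node == "riak-2":
        raise ConnectionError(node)
    return f"ok from {node}"
```

A pool built with NodePool(["riak-1", "riak-2", "riak-3"]) would keep answering from riak-1 and riak-3 while riak-2 sits out its cooldown.  A real proxy also does health checks, timeouts, and connection reuse, which is where most of the demons live.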

Link Walking (err.. Ambling) - We modeled much of our early data relationships using link walking.  The learning?  S-L-O-W.  Had to remove it completely.  Play with it but don’t plan on using this in production out of the gate.  I think there is much potential here and we’ll perhaps be returning to this feature for some less latency sensitive work.  Time will tell...

Watchoo Lookin’ for?! Riak Search - When we started, search was a separate project.  But, we knew we would have a use for search in our application.  So, we did everything we could to plan ahead for that fact.  But, by the time we were really getting all hot and heavy (post 1.0.0 deployment) we were finding out a few very interesting things about search.  It's VERY slow when you have a large result set.  It's just the nature of the way it's implemented.  If you think your search will return > 2000 items then think long and hard about using Riak's search functions for your primary search. This is, again, one of those things we’ve pulled back on quite a bit. But, the most important bits of learning were to:
  • Keep Result Sets small
  • Use Inline fields (this helped us a lot)
  • Realize that searches run on ONE physical node and one vnode and WILL block (we didn’t really feel this until data really started growing from 100’s of 1000’s of “facets” to millions).
At this point, we are doing everything that we can to minimize the use of search in our application and where we do use it we’re limiting the result sets in various ways and using inline fields pretty successfully.  In any event, just remember that Riak Search (stand alone or bundled post 1.0.0) is NOT a high performance search engine.  Again, this seems obvious now but we did design around it a bit and had higher hopes.
 
OMG It’s broken what’s wrong - The error codes in the early version of Riak we used were useless to us and because we did not start w/ an enterprise support contract it was difficult sometimes to get help.  Thankfully, this has improved a lot over time.

Mailing List / IRC dosey-do - Dust off your IRC client and sub to the mailing list.  They are great and the Basho Team takes responding there very seriously.  We got help countless times this way.  Thanks team Basho!

I/O - It’s not easy to run Riak on AWS.  It loves I/O.  To be fair, they say this loud and clear so that’s my problem.   We originally tried a fancy EBS setup to speed it up and make it persistent.  In the end we ditched all that and went ephemeral.  It was dramatically more stable for us overall.

Search Indexes (aka Pain) - Want to re-index?  Dump your data and reload.  Ouch.  Enough said.  We are working around this in a variety of ways but I have to believe this will change.

Basho Enterprise Support - Awesome.  These guys know their shit.  Once you become an enterprise customer they work very hard to help you.  For a real world production application you want Enterprise support via the licensing model.  Thanks again Basho!

The learning curve - It is a significant change for people to think in an eventually consistent distributed key value or distributed async application terms.  Having Riak under the hood means you NEED to think this way.  It requires a shifted mindset that, frankly, not a lot of people have today.  Build this fact into your dev cycle time or prepare to spend a lot of late nights.

Epiphany - One of the developers at work recently had an epiphany (or maybe we all had a group epiphany).  Riak is a distributed key value data store.  It is a VERY good one.  It’s not a search engine.  It’s not a relational database.  It’s not a graph database.  Etc.. etc..  Let me repeat.   Riak is an EXCELLENT distributed key value data store.  Use it as such.  Since we all had this revelation and adjusted things to take advantage of that fact, life has been increasingly nice day by day.  Performance is up.  Throughput is up.  Things are scaling as expected.

In Summary - Reading back through this I felt it came off a bit negative.  That's not really fair though.  We're talking about nearly a year of learning.  I love Riak overall and I would definitely use it again.  It's not easy and you really need to make sure the context is correct (as with any database).  I think team Basho is just getting started but they are off to a very strong start indeed.  I still believe Riak will really show its stripes as we start to scale the application.  We have an excellent foundation upon which to build and our application is currently humming along and growing nicely.

I could not have even come close to getting where we are right now with the app we are working on without a good team as well.  You need a good devops-like team to build complex distributed web applications.

Lastly and this is the real summary, Riak is a very good key value data store.  The rest it can do is neat but for now, I'd recommend using it as a KV datastore.

I'm pretty open to the fact that even with several months of intense development and a near ready product under our belt we are still only scratching the surface.

What I'll talk about next is the stack, the choices we've made for developing a distributed Scala-based app, and how those choices have played out.

Thursday
Oct062011

The SaaS Aggregation Benefit Mirage

In this service oriented on-demand world I’ve been running into something again and again lately that I’ve found interesting and a bit annoying.

To start, imagine I’m going to build an application that uses two 3rd party services on-demand.  We’ll just call them service A and service B and say each has two features.  For this example it does not really matter what the services do.

Service A
  Feature A-1
  Feature A-2
Service B
   Feature B-1
   Feature B-2

So, I create my application and it first uses service A to do something, using Feature A-1 and A-2.  Then, with the output of that, it uses service B to do something else using feature B-2.

Now, a few months down the line when things are going great I get a call from my account manager at Service A telling me I can now get all the features of service B directly from them included.  So, what they are telling me is that my service structure now looks like this:

Service A
  Feature A-1
  Feature A-2
  Feature B-1
  Feature B-2
Service B
   Feature B-1
   Feature B-2

On the surface this looks really good.  It’s the same thing with less hassle right?  Maybe not.

This is where my annoyance surfaces.  Dig in and dig in well.  What I find again and again is that it’s simply not true because of what I’ll just call the filter effect.  What you really are getting with this new and improved service A is more like:

Service A
   Feature A-1
   Feature A-2
   Feature B-1

Notice that Feature B-2 is missing and that probably nobody mentioned it.  Or, it’s more like:

Service A
   Feature A-1
   Feature A-2
   Feature C-1
   Feature C-2
   Feature C-3
   Feature C-n-OMG
Service B
   Feature B-1
   Feature B-2

And you don’t care because C isn’t B and all you need is A-1, A-2, and B-2.  While they say it’s equal, it is not, and the app uses feature B-2 if you’ll recall.  How much time did you just spend?
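The check I end up doing by hand each time is really just set arithmetic.  Here is a tiny sketch in Python using the made-up feature names from the example above; missing_features is a hypothetical helper, not something any provider gives you.

```python
def missing_features(required, offered):
    """Return the features the app needs that the offering does not include."""
    return set(required) - set(offered)


# What the application actually uses.
required = {"A-1", "A-2", "B-2"}

# Today: Service A and Service B bought separately.
current = {"A-1", "A-2"} | {"B-1", "B-2"}

# The "new and improved" Service A, in the two flavors described above.
bundle_filtered = {"A-1", "A-2", "B-1"}
bundle_substituted = {"A-1", "A-2", "C-1", "C-2", "C-3"}
```

Running missing_features(required, ...) against the current setup comes back empty, while both bundle variants come back missing B-2.  It's worth writing down the required set before the sales call rather than after.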

So, by the time you get through all this and figure out that the new improved Service A + B is pretty useless and all you really want is what you already have, you will have wasted a lot of time.  There are fewer features, more complexity, less control, and likely much worse service and support for the aggregated services since you have no direct relationship to the end point provider.

So, rambling aside, the point is that these service provider mashup aggregators are not what they often seem on the surface.  I’m frequently finding that the best deal is going right to the source and that any “savings” on the surface likely get eaten up later in a variety of ways that are difficult to predict.  In most cases, it’s best to go to the source to get what you want.

Friday
Sep092011

Brick and Mortal Retail Doomed, Doomed I Say

Not my usual blog topic, but hey, it’s Friday and I had a brutal week.  But, I just had to relay a retail experience I had a week or so ago.  I went into a local hardware store.  It’s a pretty good one but I’ve always thought they were a bit pricey.  But, I needed a new vanity mirror and I needed it now.  On the display they had one I immediately liked and I was willing to pay the premium.  Once I found a salesperson I actually started w/ a question about some sconce lights I liked also.  He looks at me.  Looks me up and down and says, “you’re not going to like it.”  I said, “hit me.”  He did, I didn’t.  Whatever.  Bad start for sure.  In any event, I say, I’d like to get this mirror you have over here on the wall.  I took him to it, I said I’d like one of these please.  Here are the two things he said to me:

1. We don’t have those in stock.
2. But, I bought one and it’s still in the box at home.  Want to make me an offer I might sell it.

Ohhhhkay... I said, alright, I think I’ll stick to the store here.  How long to get one.... Turns out it’s 7-10 business days.  Keep in mind this is a premium price.  Apparently there are six with their nearby suppliers.  At this point I had already snapped a picture with my phone and sent it to my wife for approval.  She says... LOVE IT... in response.  So, I take the product #.  I tap it into a google search.  I find it on Amazon.com from an affiliate for 30% less, 5-7 days delivered, and a bit of tax.  Net savings over store of around 20%.  I decide to just show the guy and I said the following things:

1. I’d prefer to buy local. Can you sell it to me for this price and get it here in about the same time frame?
2. I can just order it from here and it’ll come to my doorstep by pushing this buy button now.

The response floored me.  No.  He mentioned the one he had at home again.  I said okay.  I hit the buy button and went home.

Here’s the crazy part.  This is an employee owned store!

I’ve tested the “I’d like to buy this” approach at several other stores recently.  Most of them simply have NO stock.  They are just display stores. You cannot buy what they have and go home w/ it.

Astounding.  Physical retail has absolutely no chance with me using this kind of approach.  I’ll just order from home.  Shame, I wanted to buy local.  Either way, the mirror will be here in a couple of days along w/ the sconce lights, towel rack, and TP holder to match.  *sigh*

Sunday
Sep042011

Stop Staring at my Polyglot!

I received an interesting comment/question via my blog recently.  It went a bit like this... 

I’m developing a distributed cloud application but my developers are pushing back on me for having a polyglot database strategy.  What should I do?

I won’t get into exactly what it is they are doing since that would take several more pages.  This is something of a stream of thought post so apologies if it is a little rough around the edges.  The easiest way to answer is in the context of an application I’ve been working on for a while that has some similarities to what this person wants to build.  Everything I’m describing is part of an app I’ve been building with a client since earlier this year.

Typically you'll need a few layers of “data storage” for any distributed batch or real time application (cloud native application), which is what I understand you are trying to build.

I consider anything that holds data that is for presentation, computation, or transformation part of the data storage architecture and I like to break it down by time in storage from least amount of time to most. 

Short or Very Short Term: Single node caches (like memcached) or volatile computer node memory
Mid Term: Queues and IMDGs
Long Term: Durable storage (Dynamo and BigTable derivatives abound)

There are numerous database products that live in or even between those tiers these days; more than ever before.  By no means is what follows even close to an exhaustive list.  A quick list of the ones I have worked with in the last few months personally looks like: 

Short Term: Memcached, Redis, RabbitMQ, ZeroMQ, DRAM, APC, MongoDB
Mid-Term: Redis, GridGain, RabbitMQ, ZeroMQ, MongoDB
Long-Term: Riak, MongoDB, S3, Ceph, Swift, CloudFiles, EBS, HBase

Short-Term storage is ALL in memory, not persisted to disk, and not intended to be used for long periods of time.  Your application also has to be able to deal with the fact that this type of storage is essentially ephemeral.  If the node gets a KILL signal from some source or another your app needs to know how to deal with this gracefully.  In other words, storage here is not durable at all.

Mid-Term storage is used for longer running processes.  It benefits greatly from being distributed and having a higher degree of durability.  This is generally still where most of the work is done in main system memory (no disk I/O) but also where you might do complex calculations or data transformations on your way to your goal.  You do it here because it’s fast.  You do things here because they can be shared amongst lots of workers (like queue subscribers).

Long-Term storage is used for exactly that, long term durable storage of important data that provides sufficient and reasonable interfaces from which to retrieve that data again when needed.  Preferably it’s possible to do things like map-reduce jobs so that you can iterate and retrieve what is necessary which you may then operate on at one of the higher levels up this stack.
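Here is a minimal sketch in Python of how the short-term and long-term tiers can cooperate, assuming a plain dict stands in for both the cache and the durable store.  TieredStore and lose_cache are hypothetical names for illustration; in a real system the cache might be memcached and the durable tier something like Riak.  The point is only that losing the short-term tier should cost you speed, not data.

```python
class TieredStore:
    """Read-through cache in front of a durable store (both dict-like stand-ins)."""

    def __init__(self, durable):
        self._cache = {}         # short-term: in memory, may vanish at any time
        self._durable = durable  # long-term: the source of truth

    def put(self, key, value):
        # Write to the durable tier first so a crash cannot lose the value.
        self._durable[key] = value
        self._cache[key] = value

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: fall back to durable storage and repopulate the cache.
        value = self._durable[key]  # raises KeyError if the key never existed
        self._cache[key] = value
        return value

    def lose_cache(self):
        """Simulate the node holding the cache getting a KILL signal."""
        self._cache.clear()
```

After a put, a call to lose_cache, and then a get, the value still comes back from the durable tier, which is the graceful handling described above.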

You’ll see that I’ve put some of them in all or multiple categories which might seem odd until you understand how they work and match the technology to whatever you are trying to achieve from a business perspective.

I have a tendency to avoid things that bring overly complex operational management when starting up projects because I like to get my TCO (Total Cost of Ownership) over time (3-5 years) as low as possible while achieving the project goals and SLA’s.  There are a couple of exceptions on the list above that do have more operational overhead (MongoDB and HBase) but they are good enough in the right context that you might want to learn and use them anyway.

Now, back to the question at hand.  Should I use one type of DB or many for the needs at hand?  In this case, I’ve told them that they should use as few as possible, possibly only one.  The reason for this was that in their case they will value speed, consistency, and lower cost of operations at this early stage of their project.  They are developing an interesting distributed system for cool reasons.  I recommended a choice to them that I think will help them get to their goals fast and cost effectively while allowing them to break off pieces of the application later as and if needed.

Parting words are that it will, over time, be nearly unavoidable that this (and most) applications of a distributed nature end up being database polyglotoumous.  However, I do think it adds a lot of complexity and overhead and in the early stages of a project it's not usually necessary unless what you are doing is of great complexity in which case you might want to break that down anyway to something more manageable.

Sunday
Aug072011

Can New Clouds Teach Old Apps New Tricks?

Cramming the same old code, CMS, application, etc into the cloud (any cloud) doesn't make the most of the capabilities of cloud computing in all its various forms.  I expect to be discussing this subject more in the near future.  But, I'll start by giving two examples and labeling them cloud native application design pattern and anti-pattern.

A Cloud Native Application Design Anti-Pattern

I'll pick on Drupal a bit (but with love).  If one installs Drupal at a cloud IaaS or PaaS provider then that does not make Drupal a cloud native application.  To me, this seems obvious but I am not so sure it is obvious in general.  The Drupal CMS is not a Cloud Native Application.  Putting Drupal, WordPress, or CMS XYZ of your choice on the cloud computing IaaS or even PaaS provider of your choice essentially means you end up with a virtualized n-tier application running in the cloud with many of the same limitations of a hardware based deployment and only some of the benefits of being a cloud native application running on a cloud computer.  Yes, of course, and admirably (see billions of pageviews per month) Drupal can run IN the cloud.  But, that does not make it OF the cloud.  But, I will say that based on personal experience, even considering all of this, it's still likely the right choice in a great many cases to run it in the cloud.

A Cloud Native Application Design Pattern

If you want to see what a CMS can look like as a cloud native application then check out the Lily CMS project. I personally might not choose this specific architecture and systems design to achieve the same goals.  However, there is more than one way to build a CNA.  They have done some great work there and are clearly on the right track!  It's excellent work and I have respect for what the Outerthought team has created with their platform.  It's actually potentially quite a lot more than just a CMS as well.  In any event, I think that with the exception of the default HBase high availability limitations (which will be addressed soon by the HBase project I suspect) this can be considered a cloud native application.  Coupled with the appropriate monitoring, automation, and even cloud environment awareness it would be a very powerful cloud native application.

All of this summarizes to me as one very simple fact.  There is a tremendous opportunity ahead!  Exciting times.