Facebook works its data magic at scales others only dream of, and it does so for over 600,000,000 people, in real time (see: Facebook's New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month). Cade Metz just wrote a piece diving deeper into this at The Register, titled "HBase: Shops swap MySQL for open source Google mimic." This great reporting underscores something we already knew: Facebook is a pioneer in fast, real-time read/write access to big data. It also underscores that when Facebook makes a move like swapping MySQL for HBase, it is worth studying what is going on. This is especially important since Facebook is not the only firm swapping out MySQL for HBase.
So here is a bit more on HBase:
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase includes:
- Convenient base classes for backing Hadoop MapReduce jobs with HBase tables, including Cascading, Hive, and Pig source and sink modules
- Query predicate push down via server side scan and get filters
- Optimizations for real time queries
- A Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options
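To make "versioned, column-oriented" and "server side scan and get filters" a little more concrete, here is a toy in-memory sketch in Python. It is not the HBase API and says nothing about how HBase is implemented; the class `ToyTable`, the row key format `user#msg`, and the column name `msg:body` are all hypothetical, chosen only for illustration. It shows rows kept in sorted key order, each cell holding several timestamped versions, and a scan that applies a filter where the data lives instead of returning every row to the caller.

```python
import time
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a Bigtable-style table (illustrative only, not HBase)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, ts=None):
        """Write a new version of a cell, keeping at most max_versions."""
        ts = time.time_ns() if ts is None else ts
        cells = self._rows[row][column]
        cells.append((ts, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)
        del cells[self.max_versions:]

    def get(self, row, column, versions=1):
        """Return up to `versions` (timestamp, value) pairs, newest first."""
        if row not in self._rows or column not in self._rows[row]:
            return []
        return self._rows[row][column][:versions]

    def scan(self, start_row="", stop_row=None, row_filter=None):
        """Yield (row, latest values) in key order over [start_row, stop_row).

        row_filter is evaluated inside the table, mimicking the idea of
        pushing a predicate down to where the data is stored.
        """
        for row in sorted(self._rows):
            if row < start_row:
                continue
            if stop_row is not None and row >= stop_row:
                break
            latest = {col: cells[0][1] for col, cells in self._rows[row].items()}
            if row_filter is None or row_filter(row, latest):
                yield row, latest


# Hypothetical message data; explicit timestamps keep the example deterministic.
table = ToyTable(max_versions=2)
table.put("user1#msg001", "msg:body", "hello", ts=1)
table.put("user1#msg001", "msg:body", "hello (edited)", ts=2)
table.put("user1#msg002", "msg:body", "lunch?", ts=3)
table.put("user2#msg001", "msg:body", "hi", ts=4)

# Both stored versions of one cell, newest first.
newest = table.get("user1#msg001", "msg:body", versions=2)

# Range scan over user1's rows only, with the predicate applied in the table.
user1_lunch = list(
    table.scan(
        start_row="user1#",
        stop_row="user1$",  # "$" sorts immediately after "#"
        row_filter=lambda row, cols: "lunch" in cols["msg:body"],
    )
)
```

In this toy the filter runs in the same process as the caller, so it only mimics the shape of the idea. In a distributed store the point of pushing the predicate down is that non-matching rows never cross the network.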
Here is more from the Cade Metz article:
HBase is part of the Apache Hadoop project, a sweeping effort to mimic Google's proprietary infrastructure with open source code. It dovetails with HDFS, the Hadoop distributed file system, and Hadoop MapReduce, the distributed number-crunching platform. HBase is essentially a low-latency layer that sits atop HDFS, letting you rapidly store and retrieve data. It's fashioned after Google's BigTable platform, which Mountain View publicly described in a 2006 research paper.
Now back to the title of this post. How do you know if you should pick MySQL or HBase for a solution?
If you are designing systems to operate at huge scale, or integrating with other Hadoop-related projects, select HBase. HBase, with Hadoop and related capabilities, is also there to support analysis at scale. So, if you are designing to make sense of big data, pick HBase.
MySQL is not going away, but it will stay optimized for traditional RDBMS workloads. If you can comfortably store your data in tables with rows and columns, and you don't have that much of it or don't need fast analysis over it, MySQL may be your best pick. MySQL is widely used, highly reliable, and well understood. So if your future growth and business model indicate you will never run into scale problems or have trouble analyzing your data, MySQL may well be the best choice.
- Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month (highscalability.com)
- New Technology Behind 20 Billion Daily Facebook Messages (allfacebook.com)
- HBase 0.90.0 Released: Over 1000 Fixes and Improvements (nosql.mypopescu.com)
- Facebook: Why our 'next-gen' comms ditched MySQL (go.theregister.com)
- Tynt CTO Quoted in The Register: HBase: Shops swap MySQL for open source Google mimic (tynt.com)
- Cloudera (ctolabs.com)
- Jorge Escobar: Apple going Hadoop. Question is: HBase Or Cassandra? (nosql.mypopescu.com)
- NoSQL: Hive and HBase in Toad for Cloud Demo (themindstorms.blogspot.com)
- Facebook: Why our 'next-gen' comms ditched MySQL (channelregister.co.uk)
- HBase: Shops swap MySQL for open source Google mimic (go.theregister.com)