Cloudera Day in DC: Cloudera Manager and Enterprise

Editor's note: In this post,  provides more context on Big Data gleaned from the 26 Jan 2012 Cloudera Day in DC.-bg

Another valuable panel at the DC Cloudera Day was Todd Lipcon's look into Cloudera Manager, the Hadoop management software available through Cloudera Enterprise. Cloudera is in the business of making Hadoop, the open source Big Data storage and analysis platform, easier for enterprises to adopt. The first step is their Cloudera Distribution Including Apache Hadoop (CDH), but enterprise deployments will likely need additional help managing their clusters. While Google can ship a bus of computer science PhDs in from Stanford whenever they have a problem, most businesses and government agencies don’t have those kinds of resources available. Cloudera Enterprise and Manager allow the rest of us to build Hadoop systems up, predict issues, solve problems, and make improvements.

Cloudera Manager is the first end-to-end management tool for Apache Hadoop. A Free Edition, available for download from Cloudera’s website, allows users to install, configure, and perform basic management for Hadoop clusters of up to 50 nodes. The Enterprise Edition, with a number of more advanced features, is available through the Cloudera Enterprise subscription service, which also includes Cloudera Support.

Manager greatly reduces the chance of operator error. It runs checks and validations on your code and creates a complete audit trail of changes to the system. Manager annotates changes and correlates them with performance to measure results and uncover mistakes. If you do harm your cluster’s performance, Manager can automatically roll back the changes.

Manager also provides insight into the performance of your Hadoop cluster by tracking trends and alerting the user if a job is running slower than usual, and by how much. If Hadoop fails, it can tell you what events occurred and what was going on with the data when it happened. Like Splunk, Manager also tracks and allows searches on log data. It performs all of these functions with minimal overhead, requiring at most 1% CPU and often much less, and continues to perform well even in thousand-node clusters.
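To make the "slower than usual, and by how much" idea concrete, here is a minimal Python sketch. It is not how Cloudera Manager works internally, which the panel did not describe. The function name `slowdown_report`, the 1.5x threshold, and the sample runtimes are all hypothetical, chosen only for illustration.

```python
from statistics import mean

def slowdown_report(past_durations, current_duration, threshold=1.5):
    """Compare a job's runtime with the average of its earlier runs.

    Returns None when the job is within the threshold, otherwise the
    slowdown as a ratio (2.0 means twice as slow as usual).
    The threshold of 1.5 is an assumption, not a Cloudera default.
    """
    if not past_durations:
        return None  # no history yet, so nothing to compare against
    baseline = mean(past_durations)
    ratio = current_duration / baseline
    return ratio if ratio > threshold else None

# Hypothetical runtimes in seconds for earlier runs of the same job.
history = [100, 110, 90]
print(slowdown_report(history, 105))  # within the normal range
print(slowdown_report(history, 200))  # flagged as slower than usual
```

A real monitoring tool would likely use a more robust baseline than a plain average, but the comparison against a job's own history is the core of the idea.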

At the end of his panel, Lipcon offered some insight into what comes next for Cloudera Enterprise and Manager. New capabilities are being developed to make the most of the upcoming CDH4. CDH4 will have a standby NameNode that can take over if the active one fails, so the next version of Manager will provide failover management and multiple-namespace management. CDH4 will also implement an updated version of MapReduce, so Manager will include MapReduce2 service and configuration tools.

Cloudera, along with numerous other key players in Big Data, was also present at yesterday's Carahsoft Government Big Data Forum. Check back in the coming days and weeks for recaps of panels, speakers, and technology.