Government Big Data Forum 2011

Carahsoft has just announced more details on the 2011 Government Big Data Forum. This should be a great venue for discussing the data-volume challenges facing government IT. Both civilian and military agencies require more data storage every day. But the reason to have data is not to store […]

Background on Lucene, Nutch and Hadoop

Doug Cutting is the creator of Lucene, Nutch and Hadoop. Doug and these projects are increasingly being mentioned in enterprise environments. So with this post I’ll provide a bit more context on each. Doug Cutting has earned a reputation as an innovative, community-supporting developer. He is on the board of directors of the Apache […]

The Quickest Guide to Hadoop You'll Ever Read

What is Hadoop? Hadoop is a collection of software originally spawned from the Apache Nutch project (read a little more of its history HERE) that is now its own project within the Apache Software Foundation. Its goal is to provide a highly redundant, self-repairing cloud of computers that can fail out and still be robust, fast, […]
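The "highly redundant, self-repairing" property mentioned above comes from replicating each block of data across several machines, so that losing any one machine leaves the data intact. A minimal sketch of that idea in Python (the node names, replication factor, and helper functions here are illustrative assumptions, not Hadoop's actual implementation):

```python
import random

REPLICATION = 3  # HDFS-style default replication factor (an assumption for this sketch)

def place_block(block_id, nodes, replication=REPLICATION):
    """Assign one data block to `replication` distinct nodes."""
    return {block_id: random.sample(nodes, replication)}

def surviving_replicas(placement, failed):
    """Return the replicas still reachable after the nodes in `failed` go down."""
    return {block: [n for n in holders if n not in failed]
            for block, holders in placement.items()}

nodes = [f"node{i}" for i in range(5)]
placement = place_block("block-0", nodes)

# Fail one of the nodes that actually holds a replica of block-0.
after = surviving_replicas(placement, failed={placement["block-0"][0]})
# With 3 replicas, losing any single node still leaves 2 copies of the block.
```

In a real cluster, Hadoop goes one step further: when it notices a block has fallen below its replication target, it re-copies the block to a healthy node, which is the "self-repairing" part.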

When do you pick HBase instead of MySQL?

Facebook works its data magic at scales others only dream of. And they do this for over 600,000,000 people, in real time!  (see: Facebook’s New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month). Cade Metz just wrote a piece diving deeper into this at The Register. His article is titled “HBase: Shops […]

Recap of the Government Big Data Forum of 26 Jan 2011

On 26 January 2011, government IT professionals, federally-focused systems integrators and IT vendors met at the Government Big Data Forum to discuss shared challenges around overwhelming volumes of data. This event, sponsored by Carahsoft, included presentations, discussions, panels and a tech expo all focused on the issue of “Big Data” in […]

Common Hadoopable Problems

If you’re reading this, you probably already know about Apache Hadoop, a popular data storage and analysis platform. Hadoop can inexpensively store any type of information from any source on commodity hardware and allow for fast, distributed analysis run in parallel on multiple servers in a Hadoop cluster. It’s powerful, agile, scalable, and, due to […]
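The "distributed analysis run in parallel" that Hadoop performs follows the MapReduce model: a map step emits key/value pairs, a shuffle step groups values by key, and a reduce step aggregates each group. A small single-process sketch of that model in Python, using the classic word-count example (this simulates the dataflow only; it is not Hadoop's actual API):

```python
from collections import defaultdict
from itertools import chain

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle phase: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce phase: aggregate the grouped values for one key.
    return key, sum(values)

lines = ["Hadoop stores data", "Hadoop analyzes data"]
mapped = chain.from_iterable(mapper(line) for line in lines)
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
# counts["hadoop"] == 2 and counts["data"] == 2
```

On a real cluster the map and reduce functions run on many servers at once over blocks of a large file, which is what makes the approach scale to problems a single machine cannot handle.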