Background on Lucene, Nutch and Hadoop

Doug Cutting is the creator of Lucene, Nutch and Hadoop. Doug and these projects are increasingly being mentioned in enterprise environments, so with this post I’ll provide a bit more context on each. Doug Cutting has earned a reputation as an innovative, community-supporting developer. He is on the board of directors of the Apache […]

The Quickest Guide to Hadoop You'll Ever Read

What is Hadoop? Hadoop is a collection of software originally spawned from the Apache Nutch project (read a little more of its history HERE) that is now its own project within the Apache Software Foundation. Its goal is to provide a highly redundant, self-repairing cloud of computers that can fail out and still be robust, fast, […]

When do you pick HBase instead of MySQL?

Facebook works its data magic at scales others only dream of, and it does this for over 600,000,000 people, in real time! (See: Facebook’s New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month.) Cade Metz just wrote a piece diving deeper into this at The Register. His article is titled “HBase: Shops […]

Common Hadoopable Problems

If you’re reading this, you probably already know about Apache’s Hadoop, a popular data storage and analysis platform. Hadoop can inexpensively store any type of information from any source on commodity hardware and allow for fast, distributed analysis run in parallel on multiple servers in a Hadoop Cluster. It’s powerful, agile, scalable, and, due to […]

Hadoop for Bioinformatics

Bioinformatics is the application of computer science in the form of statistics and analytics to molecular biology. This exciting field is bringing about great breakthroughs, especially in genetics, where computers and algorithms are being used to map genomes. Advances in this field show promise in helping us understand life and advance science. The field is […]

The Future of Hadoop in Bioinformatics

Earlier, I wrote on the use of Hadoop in the exciting, evolving field of bioinformatics. I have since had the pleasure of speaking with Dr. Ron Taylor of Pacific Northwest National Laboratory, the author of “An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics”, on what’s changed in the half-year since its […]