Doug Cutting is the creator of Lucene, Nutch and Hadoop. Doug and these projects are increasingly being mentioned in enterprise environments, so with this post I'll provide a bit more context on each. Doug Cutting has earned a reputation as an innovative, community-supporting developer. He is on the board of directors of the Apache Software Foundation and is now its … [Read more...] about Background on Lucene, Nutch and Hadoop
What is Hadoop? Hadoop is a collection of software originally spawned from the Apache Nutch project (read a little more of its history HERE) that is now its own project within the Apache Foundation. Its goal is to provide a highly redundant, self-repairing cloud of computers that stays robust, fast, and efficient even when individual machines fail. To accomplish this goal, it leverages … [Read more...] about The Quickest Guide to Hadoop You'll Ever Read
Facebook works its data magic at scales others only dream of. And it does this for over 600,000,000 people, in real time! (see: Facebook's New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month). Cade Metz just wrote a piece diving deeper into this at The Register. His article is titled "HBase: Shops swap MySQL for open source Google mimic." This great … [Read more...] about When do you pick HBase instead of MySQL?
If you're reading this, you probably already know about Apache's Hadoop, a popular data storage and analysis platform. Hadoop can inexpensively store any type of information from any source on commodity hardware and allow for fast, distributed analysis run in parallel on multiple servers in a Hadoop Cluster. It's powerful, agile, scalable, and, due to replication, resilient to … [Read more...] about Common Hadoopable Problems
Bioinformatics is the application of computer science in the form of statistics and analytics to molecular biology. This exciting field is bringing about great breakthroughs, especially in genetics, where computers and algorithms are being used to map genomes. Advances in this field show promise in helping us understand life and advance science. The field is producing knowledge … [Read more...] about Hadoop for Bioinformatics
Earlier, I wrote on the use of Hadoop in the exciting, evolving field of Bioinformatics. I have since had the pleasure of speaking with Dr. Ron Taylor of Pacific Northwest National Laboratory, the author of "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics", on what's changed in the half-year since its publication and what's to … [Read more...] about The Future of Hadoop in Bioinformatics
Microsoft has shown once again that it can act decisively and, in doing so, improve its offerings. It has "disrupted" itself in a way that will be positive for the community and probably the Microsoft bottom line. In a November 11 blog post, Microsoft announced that it would not be moving forward with a production release of LINQ to HPC, something that was an alternative … [Read more...] about Microsoft Focuses Big Data Efforts on Hadoop