Thoughts on analytics, data management, visualization and collaboration

Archive for June, 2011

Get Started with Hadoop: from Evaluation to a Production Server

Posted by Brett Sheppard on June 7, 2011

Hadoop is growing up. Apache Software Foundation (ASF) Hadoop and its related projects and sub-projects are maturing as an integrated, loosely coupled stack to store, process and analyze huge volumes of varied semi-structured, unstructured and raw data. This piece provides tips, cautions and best practices for an organization that would like to evaluate Hadoop and deploy an initial cluster. It focuses on the Hadoop Distributed File System (HDFS) and MapReduce. If you are looking for details on Hive, Pig or related projects and tools, you will not find them in this article, but I do provide links to where you can find more information. You can also refer to the live or archived presentations at the Yahoo Developer Network Hadoop Summit 2011 on June 29, 2011 in Santa Clara, Calif., and Hadoop World 2011, sponsored by Cloudera, in New York City on November 8-9, 2011.

This article is available on the O’Reilly Media sites at http://oreil.ly/lEPwQL

Posted in Big Data