Thoughts on analytics, data management, visualization and collaboration

–White House: “Big Data is indeed a Big Deal”

Posted by Brett Sheppard on April 2, 2012

The U.S. federal government announced over US$200 million in funding for a wide range of both new and already-in-progress initiatives related to Big Data. Director of the White House Office of Science and Technology Policy Dr. John Holdren (pictured at left) spoke on Thursday, March 29, 2012, in a panel presentation and webcast at the American Association for the Advancement of Science (AAAS) auditorium in Washington, D.C. Read the rest of this entry »

Posted in Big Data | Leave a Comment »

–Parse and Visualize Unstructured Data in Hadoop

Posted by Brett Sheppard on November 5, 2011

In October 2011, I recorded a video chat with Ronen Schwartz, Vice President of B2B Products at Informatica, and Karl Van den Bergh, Vice President, Product and Alliances at Jaspersoft. The video runs a little under 10 minutes and discusses approaches by Informatica and Jaspersoft to parse and visualize big data in Hadoop. (Disclaimer: Informatica is a Zettaforce client.)

These approaches extend Jaspersoft business intelligence and Informatica data integration investments to leverage data stored in Hadoop. They also support an Informatica drag-and-drop visual studio approach for parsing data in Hadoop, which reduces the need for developers to write MapReduce scripts by hand. For a broader discussion of data integration with Hadoop beyond the specific topics of parsing and visualization covered in this video and article, please refer to a Hadoop article series I wrote this summer for the Informatica Perspectives Blog.

Video chat with Brett Sheppard (left-hand side, wearing a white shirt), Karl Van den Bergh (center, in a blue shirt), and Ronen Schwartz (right-hand side, in a black shirt).

Read the rest of this entry »

Posted in Big Data | 4 Comments »

–Hadoop Blog Series at Informatica Perspectives

Posted by Brett Sheppard on August 27, 2011

Enterprises use Hadoop in data-science applications that improve operational efficiency, grow revenues or reduce risk. Many commercial, open source or internally developed data-science applications have to tackle a lot of semi-structured, unstructured or raw data. Common use cases include log analysis, data mining, machine learning and image processing.

Organizations benefit from Hadoop’s combination of storage and processing in each data node spread across a cluster of cost-effective commodity hardware. Hadoop’s lack of fixed-schema works particularly well for answering ad-hoc queries and exploratory “what if” scenarios.
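As a rough illustration of the log-analysis use case and the lack of a fixed schema, here is a minimal sketch in plain Python rather than Hadoop itself. The log format, the sample lines and the function names are all assumptions made up for this example. It shows only the map and reduce steps, with each line interpreted when it is read instead of being validated on load.

```python
from collections import Counter
from itertools import chain

# Hypothetical raw log lines; nothing forces them to share a schema.
RAW_LINES = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "malformed line",
    "10.0.0.1 POST /login 200",
]


def map_status(line):
    """Map step: emit (status, 1) for lines that fit the assumed format."""
    fields = line.split()
    # The structure is applied here, at read time; other lines are skipped.
    if len(fields) == 4 and fields[3].isdigit():
        yield fields[3], 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


status_counts = reduce_counts(
    chain.from_iterable(map_status(line) for line in RAW_LINES)
)
print(status_counts)
```

On a real cluster the map and reduce steps would run in parallel across data nodes; the sketch only mirrors their shape on a single machine.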

For more, visit a multi-part guest blog series on Informatica Perspectives.

Posted in Big Data | Leave a Comment »

–Hadoop Examples

Posted by Brett Sheppard on July 31, 2011

Organizations in multiple industries and the public sector are using Hadoop as one part of their integrated data architectures to obtain the highest value from their data. Hadoop is an increasingly popular option to process, store and analyze huge volumes of semi-structured, unstructured or raw data, often from disparate data sources.

The following are 10 public-domain examples of Hadoop production clusters, with URL links if you would like additional information: Read the rest of this entry »

Posted in Big Data, Hadoop | 5 Comments »

–Get Started with Hadoop: from Evaluation to a Production Server

Posted by Brett Sheppard on June 7, 2011

Hadoop is growing up. Apache Software Foundation (ASF) Hadoop and its related projects and sub-projects are maturing as an integrated, loosely coupled stack to store, process and analyze huge volumes of varied semi-structured, unstructured and raw data. This piece provides tips, cautions and best practices for an organization that would like to evaluate Hadoop and deploy an initial cluster. It focuses on the Hadoop Distributed File System (HDFS) and MapReduce. If you are looking for details on Hive, Pig or related projects and tools, you will be disappointed in this specific article, but I do provide links for where you can find more information. You can also refer to the live or archived presentations at the Yahoo Developer Network Hadoop Summit 2011 on June 29, 2011 in Santa Clara, Calif., and Hadoop World 2011, sponsored by Cloudera, in New York City on November 8-9, 2011.

This article is available on the O’Reilly Media sites at http://oreil.ly/lEPwQL

Posted in Big Data | 2 Comments »

–Outliers and Coexistence are the New Normal for Big Data

Posted by Brett Sheppard on March 24, 2011

Many enterprise architectures have evolved into coexistence environments to manage and benefit from advanced analytics, with help from technology and cloud service providers that are ramping up integration capabilities. Letting data speak for itself through analysis of entire data sets is eclipsing modeling from sub-sets. In the past, all too often what was disregarded as an “outlier” in a data model turned out to be the telltale sign of a micro-trend that became a major event. The combination of (1) coexistence approaches to manage big data volumes, complexity and speed together with (2) analysis of complete data sets is driving operational efficiency, revenue growth and enablement of new business models. Outliers and coexistence are the new normal for big data.

This article is available on the O’Reilly Media sites at http://oreil.ly/fT8T1E

Posted in Big Data, Clouds, Hadoop | 1 Comment »

–Big Data 2011 Preview

Posted by Brett Sheppard on January 31, 2011

During the 2011 National Football League (NFL) playoff TV broadcasts — amid commercials with Anheuser-Busch Clydesdales and auto racing driver Danica Patrick — an ad appeared with an IBM researcher talking about data analytics. In the IBM TV ad, Dr. David Ferrucci discusses how an IBM Watson supercomputer competes in a Jeopardy! game by integrating analytics, natural language capabilities and rapid search of disparate data.

While at first glance NFL TV broadcasts may seem an unusual forum for a discussion of data analytics, Big Data offers important tools for enterprises of all sizes to improve operational efficiencies, grow revenues, and empower new business models.

Read the rest of this entry »

Posted in Big Data, Clouds, Hadoop | 4 Comments »

–DataRush and DataCloud

Posted by Brett Sheppard on August 2, 2010

Pervasive Software Incubates Startups in Cloud Computing and Big Data Analytics

I had the opportunity at the GigaOM Structure 2010 conference to meet with Pervasive Software Chief Technology Officer (CTO) and Executive Vice President Mike Hoskins and Director for Worldwide Marketing and Channel Development David Inbar about two 15-person teams that Pervasive has incubated, named DataCloud and DataRush. Read the rest of this entry »

Posted in Big Data, Clouds | 1 Comment »

–Dr. Strangelove and Excel

Posted by Brett Sheppard on July 29, 2010

Dr. Strangelove or: How I Learned to Stop Worrying and Love Excel

With Microsoft’s new PowerPivot capabilities, and spreadsheet user interfaces to data sets in Hadoop by IBM BigSheets and Datameer, spreadsheets are more viable than ever as an analytics and business intelligence (BI) tool, to the consternation of some BI program managers.

At Gartner BI Summit 2010, held in April in Las Vegas, Gartner advised BI advocates to give up trying to wean business users off Excel, and instead accept that Excel is here to stay. Sri Vemparala, manager of reporting and BI at Stanford University, told Craig Stedman at TechTarget, “No matter what we try to do, I don’t think we can get away from Excel.” Gartner analyst John Hagerty advises IT departments to follow a rapid-iteration model to create and update reports, and allow business users to decide how to deploy data, whether in BI software interfaces, dashboards, Excel, SharePoint or other collaboration tools.

Read the rest of this entry »

Posted in Big Data | 4 Comments »

–Next LAMP Stack

Posted by Brett Sheppard on July 27, 2010

The Next LAMP Stack: Hadoop Platform for Big Data Analytics

Editor’s note: a shorter version of this article appeared on GigaOM.

Many Fortune 500 and mid-size enterprises are intrigued by Hadoop for Big Data analytics and are funding Hadoop test/dev projects, but would like to see Hadoop evolve into a more fully integrated analytics platform, similar to what the LAMP (Linux, Apache HTTP Server, MySQL and PHP) stack has enabled for web applications. For example, Joe Cunningham, head of technology strategy and innovation at credit card giant Visa, told the audience at last year’s Hadoop World that he would like to see Visa’s use of Hadoop evolve from an alpha/beta environment into mainstream use for transaction analysis, but that he has concerns about integration and operations management.

Read the rest of this entry »

Posted in Big Data, Hadoop | 10 Comments »