Dr. Strangelove and Excel
Posted by Brett Sheppard on July 29, 2010
Dr. Strangelove or: How I Learned to Stop Worrying and Love Excel
With Microsoft’s new PowerPivot capabilities, and spreadsheet user interfaces to data sets in Hadoop by IBM BigSheets and Datameer, spreadsheets are more viable than ever as an analytics and business intelligence (BI) tool, to the consternation of some BI program managers.
At Gartner BI Summit 2010, held in April in Las Vegas, Gartner advised BI advocates to give up trying to wean business users off Excel, and instead accept that Excel is here to stay. Sri Vemparala, manager of reporting and BI at Stanford University, told Craig Stedman at TechTarget, “No matter what we try to do, I don’t think we can get away from Excel.” Gartner analyst John Hagerty advises IT departments to follow a rapid-iteration model to create and update reports, and to allow business users to decide how to deploy data, whether in BI software interfaces, dashboards, Excel, SharePoint or other collaboration tools.
Enabling Excel users to access business intelligence data can present governance and privacy problems. Stanford University requires users who export healthcare or other non-public data into Excel to sign and certify reports produced with that data, and to retain the data in the same format as it existed in the original SAP BusinessObjects, Oracle BI or Oracle Hyperion databases.
Microsoft SharePoint is one option for enabling collaboration while enforcing role-based security, compliance policies, workflows, and versioning. With Microsoft Office 2010 and SQL Server 2008 R2, business users can import 100 million rows of data or more from traditional data warehouses and data marts, conduct complex queries, create graphs and charts using pre-built templates, and publish results on SharePoint. For more on PowerPivot, visit the Microsoft PowerPivot site.
Datameer and IBM Add Spreadsheet Interfaces to Hadoop
Both Datameer and IBM are offering spreadsheet front-ends for Hadoop datasets. Neither uses Excel directly, but both have a similar spreadsheet look and feel. This helps hide the complexity of Hadoop and of integrating disparate data sources behind an easy-to-use spreadsheet interface, while enabling use of Hadoop for the very large data sets that could be problematic to manage in Excel PowerPivot.
Based in San Mateo, CA, Datameer completed a US$2.5 million round of financing on March 31, 2010 with Redpoint Ventures. McAfee uses the Datameer Analytics Solution (DAS) to enable non-technical users to understand data from its Global Threat Intelligence platform. Datameer announced the DAS beta program in April 2010 and plans a general availability release this September. The company has pilots with prospects in financial services, mobile services, online advertising and other industries.
I had the opportunity to attend Datameer’s presentation at GigaOM Structure LaunchPad, and talk by phone today with Datameer co-founder and CEO Ajay Anand and VP of Marketing Teresa Wingfield. Datameer works with multiple Hadoop distributions, including Apache Hadoop, Amazon Elastic MapReduce (EMR), Cloudera and Yahoo!.
Prior to co-founding Datameer with Stefan Groschupf, Ajay was the Director of Product Management for Hadoop and Cloud Computing at Yahoo! During Ajay’s tenure, Yahoo! became a major Hadoop contributor and adopter, with over 25,000 servers running Hadoop. Yet there were still no native tools for business analysts to access Hadoop data, which was the inspiration to start Datameer.
Datameer provides a spreadsheet interface that lets business analysts and other users who are not Hadoop specialists import data into Hadoop, run queries and report results with built-in charting, graphing and other visualization tools. Users can also build a dashboard and personalize it, or export aggregates as a delimiter-separated values file.
In addition to the spreadsheet user interface and reporting tools, Datameer DAS offers data integration capabilities to import data into Hadoop from disparate sources ranging from MySQL, Oracle and DB2 to Apache log servers and Twitter. For example, with the Datameer plug-in for log files, communications carriers can import call data records for network analytics or lawful intercept.
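To make the import, aggregate and export flow described above more concrete, here is a minimal Python sketch that runs on a handful of in-memory log lines. It does not use Datameer or Hadoop, and it is not Datameer's API. The function names, the sample log lines and the choice of aggregating by HTTP status code are all assumptions made for illustration; the log pattern assumes the Apache Common Log Format.

```python
import csv
import io
import re
from collections import Counter

# Assumed input format: Apache Common Log Format
# (host, identity, user, [timestamp], "request line", status, size).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_lines(lines):
    """Yield one dict per line that matches the pattern; skip the rest."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()


def aggregate_by_status(records):
    """Count requests per HTTP status code (a hypothetical aggregate)."""
    return Counter(record["status"] for record in records)


def export_csv(counts):
    """Return the aggregate as comma-separated text, sorted by status code."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["status", "requests"])
    for status, count in sorted(counts.items()):
        writer.writerow([status, count])
    return buffer.getvalue()


# Made-up sample data, for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [29/Jul/2010:10:00:00 -0700] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [29/Jul/2010:10:00:01 -0700] "GET /missing HTTP/1.1" 404 512',
    '10.0.0.1 - - [29/Jul/2010:10:00:02 -0700] "POST /login HTTP/1.1" 200 2048',
]

if __name__ == "__main__":
    print(export_csv(aggregate_by_status(parse_log_lines(SAMPLE_LINES))))
```

On a single machine this is trivial. The appeal of a Hadoop-backed spreadsheet is that the same three steps can be applied to data sets far too large for a desktop tool, without the analyst writing any code.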
IBM BigSheets organizes information in a very large spreadsheet, where users can analyze it using the sort of tools and macros found in desktop spreadsheet software. BigSheets is an extension of the mashup paradigm that integrates large sets of unstructured data, enriches that data using semantic logic structure tools such as LanguageWare or OpenCalais, and lets you explore and visualize enriched data in tools such as IBM Many Eyes.
IBM alphaWorks shows sample visualizations, including a look at Doctor Who villains. IBM offers its own collaboration software as part of Lotus, but also supports Microsoft SharePoint.
Figure: IBM Many Eyes Visualization Example, Doctor Who Villains
You can learn more about BigSheets at the IBM jStart site. jStart is part of IBM Software Group’s Emerging Technologies division, led by Rod Smith, Vice President of Emerging Technologies.