Posted by Brett Sheppard on July 31, 2011
Organizations in multiple industries and the public sector are using Hadoop as one part of their integrated data architectures to obtain the highest value from their data. Hadoop is an increasingly popular option to process, store and analyze huge volumes of semi-structured, unstructured or raw data, often from disparate data sources.
The following are 10 public-domain examples of Hadoop production clusters, with URL links if you would like additional information:
- AOL Advertising pairs Hadoop’s capability for handling large, complex data volumes with Membase’s support for sub-millisecond latency to make optimized decisions for real-time ad placement. (Blog at Cloudera.com by Couchbase co-founder James Phillips, February 16, 2011).
- Hadoop helps quantitative analytic staff for Bank of America’s cross-enterprise lines of business – such as Home Loans and Insurance, Consumer and Small Businesses, Online Banking, Risk Management and Card Products – study entire data sets with billions of records versus the limitations of looking only at sample data, to assess the market, credit and/or operational risk, and revenue lift, of new and existing financial products (Author interview and 2010 Hadoop Summit talk by Tresata co-founder Abhishek Mehta).
- Hadoop powers DNA sequencing analysis and other bioinformatics research, starting with the CloudBurst application at the University of Maryland and continuing with the Crossbow, Contrail and Myrna genomic research projects (Ronald Taylor, BMC Bioinformatics, December 2010).
- At Disney, the Technology Shared Services Group uses Hadoop for an integration mashup for diverse departmental data to analyze patterns across different but connected customer activities, such as attendance at a theme park, purchases from Disney stores, and viewership of Disney’s cable television programming (PwC Technology Forecast case study, 2010).
- eBay extended a Teradata enterprise data warehouse with Hadoop for image processing and deep data mining along with a Teradata offshoot named Singularity for behavioral analysis and clickstream semi-relational data (author interview 2011 and video presentation by Anil Madan).
- In addition to using Hadoop for web analytics and to back up MySQL databases, Facebook uses HBase as a back-end for materialized views to support real-time analytics (Facebook Engineering Notes on Facebook).
- LinkedIn uses Hadoop together with a Project Voldemort key-value store, Kafka distributed messaging system and Azkaban work-flow tool to power analytics-driven features such as “People You May Know” and “Jobs You May Be Interested In” (LinkedIn Engineering).
- Hadoop forms part of the Motorola architecture to process and understand ever-growing volumes of mobile-device usage data. (Big Data Cloud meetup, August 2011).
- Twitter stores log data in Hadoop to run analytics and hit its FlockDB graph database in parallel to assemble social-graph aggregates (Twitter Engineering).
- Yahoo! runs 42,000 Hadoop nodes storing about 200 petabytes of data as an integral part of the Yahoo! cloud infrastructure, and joined Benchmark Capital to fund a Hadoop startup, Hortonworks (Yahoo! Developer Network, Hadoop Summit 2011).