Big Data 2011 Preview
Posted by Brett Sheppard on January 31, 2011
During the 2011 National Football League (NFL) playoff TV broadcasts — amid commercials with Anheuser-Busch Clydesdales and auto racing driver Danica Patrick — an ad appeared with an IBM researcher talking about data analytics. In the IBM TV ad, Dr. David Ferrucci discusses how an IBM Watson supercomputer competes in a Jeopardy! game by integrating analytics, natural language capabilities and rapid search of disparate data.
While at first glance NFL TV broadcasts may seem an unusual forum for a discussion of data analytics, Big Data offers important tools for enterprises of all sizes to improve operational efficiencies, grow revenues, and empower new business models.
What constitutes Big Data varies by organization: Large enterprises are beginning to grapple with multiple petabytes, while for a small or mid-size enterprise, growth to tens of terabytes or more can create challenges for data management and analytics. There’s more complexity too, with the proliferation of disparate data sources including machine-to-machine data, social media and electronic healthcare records.
For healthcare provider Kaiser Permanente and its more than 8 million members, Big Data is about improving the quality of care and reducing costs. Using Kaiser’s HealthConnect electronic healthcare records and decision-support software, doctors and nurses can view the patient’s complete history including lab test results, prescriptions, diagnosis, treatment, demographics, medical plan and payment records. Further, patients can avoid unnecessary trips to the hospital by exchanging emails with their doctor and ordering prescription refills online.
The simplicity of Kaiser’s HealthConnect web interface masks the complexity of an extensive Big Data infrastructure. Inpatient, outpatient, pharmacy, finance, cost management and other groups at Kaiser all access patients’ electronic healthcare records, with appropriate role- and group-based security controls.
Figure: Multiple Departments and Functions at Kaiser Rely on Access to Big Data in Patient Electronic Healthcare Records (source: Kaiser Permanente)
Kaiser’s data architecture includes:
- electronic healthcare records software from Epic Systems;
- SAP Business Objects and Crystal Enterprise reporting;
- application and service development through a service oriented architecture (SOA);
- Oracle 9i/10g, SQL Server and Teradata databases;
- Informatica PowerCenter for data integration; and
- data center outsourcing services from IBM.
For most enterprises and public sector organizations, the focus is the “right tool for the job”, which can mean any combination of business intelligence software; R and other open source analytics tools; spreadsheets; relational databases; Hadoop; operational data stores; column stores; and document-oriented databases.
Hadoop/MapReduce, together with other related Apache open source projects, has moved past test/development to become a viable extension or alternative to traditional relational databases. For example, LinkedIn combines Hadoop for massive batch workloads, Project Voldemort as a NoSQL key/value storage engine, and the open-source Azkaban workflow system to support large-scale computations over more than 100 billion relationships a day as well as low-latency site serving.
Figure: LinkedIn Data Infrastructure (source: LinkedIn presentation at Hadoop Summit 2010)
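To make the batch-plus-serving pattern described above more concrete, here is a minimal sketch in plain Python. It is not LinkedIn's code and does not use the Hadoop or Project Voldemort APIs: the map, shuffle and reduce functions, the `KeyValueStore` class and the sample member names are all hypothetical stand-ins. The idea it illustrates is that an offline MapReduce-style job precomputes a result (here, connection counts per member), and the result is then bulk-loaded into a key/value store that answers site requests with a simple lookup.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[str, str]  # an undirected connection between two members


def map_phase(edges: Iterable[Edge]) -> Iterator[Tuple[str, str]]:
    """Emit (member, connection) in both directions for each edge."""
    for a, b in edges:
        yield a, b
        yield b, a


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group emitted values by key, as a MapReduce framework would."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for member, connection in pairs:
        grouped[member].append(connection)
    return grouped


def reduce_phase(grouped: Dict[str, List[str]]) -> Dict[str, int]:
    """Count distinct connections per member."""
    return {member: len(set(conns)) for member, conns in grouped.items()}


def run_batch(edges: Iterable[Edge]) -> Dict[str, int]:
    """Run the whole batch job in-process (stand-in for a cluster job)."""
    return reduce_phase(shuffle(map_phase(edges)))


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key/value serving store."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def bulk_load(self, records: Dict[str, int]) -> None:
        # Replace the served data set with the latest batch output.
        self._data = dict(records)

    def get(self, key: str, default: int = 0) -> int:
        # Low-latency read path: a single dictionary lookup.
        return self._data.get(key, default)


# Made-up sample data for illustration only.
edges = [("ann", "bob"), ("ann", "cy"), ("bob", "cy"), ("cy", "dee")]
store = KeyValueStore()
store.bulk_load(run_batch(edges))
```

After the load, a call such as `store.get("cy")` is answered from the precomputed result rather than by scanning the relationship data at request time. In a real deployment a workflow scheduler would typically chain the batch steps and trigger the load.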
Using cloud-computing technologies, organizations are experimenting with distributed data stores, cloud compute capacity for data analytics, hosted data integration and even operational databases in the cloud. For organizations with existing investments in data warehouses and data marts, technologies such as in-memory systems, flash-based accelerators, and memcached servers can help alleviate performance bottlenecks and push back hardware retirement dates.
With all of the information available today on the public Internet and within internal corporate sites, it’s easy to feel overwhelmed. Visualization and collaboration tools are important to help business users overcome a feeling of data overload, identify patterns and take actionable steps. For example, LinkedIn Maps enables users to map professional networks and understand relationships among connections. Your map is color-coded to represent different affiliations or groups from your professional career, such as your previous employer, college classmates or industries you’ve worked in.
Figure: LinkedIn Maps Your Professional Network
This will be an exciting year for benefiting from and managing Big Data. For more, read “Big Data 2011 Preview” on GigaOM Pro. It introduces Brett Sheppard’s upcoming “Big Data 2011” report, which will be published in conjunction with GigaOM Structure Big Data: Get real-time insights on Big Data, March 23 in NYC.