Thoughts on analytics, data management, visualization and collaboration

Big Data 2011 Preview

Posted by Brett Sheppard on January 31, 2011

During the 2011 National Football League (NFL) playoff TV broadcasts, amid commercials with Anheuser-Busch Clydesdales and auto racing driver Danica Patrick, an ad appeared with an IBM researcher talking about data analytics. In the ad, Dr. David Ferrucci discusses how the IBM Watson supercomputer competes in a game of Jeopardy! by integrating analytics, natural language capabilities and rapid search of disparate data.

While at first glance NFL TV broadcasts may seem an unusual forum for a discussion of data analytics, Big Data offers important tools for enterprises of all sizes to improve operational efficiencies, grow revenues, and empower new business models.

What constitutes Big Data varies by organization: Large enterprises are beginning to grapple with multiple petabytes, while for a small or mid-size enterprise, growth to tens of terabytes or more can create challenges for data management and analytics. There’s more complexity too, with the proliferation of disparate data sources including machine-to-machine data, social media and electronic healthcare records.

For healthcare provider Kaiser Permanente and its more than 8 million members, Big Data is about improving the quality of care and reducing costs. Using Kaiser’s HealthConnect electronic healthcare records and decision-support software, doctors and nurses can view a patient’s complete history, including lab test results, prescriptions, diagnoses, treatments, demographics, medical plan and payment records. Further, patients can avoid unnecessary trips to the hospital by exchanging emails with their doctor and ordering prescription refills online.

The simplicity of Kaiser’s HealthConnect web interface masks the complexity of an extensive Big Data infrastructure. Inpatient, outpatient, pharmacy, finance, cost management and other groups at Kaiser all access patients’ electronic healthcare records, with appropriate role- and group-based security controls.

Figure: Multiple Departments and Functions at Kaiser Rely on Access to Big Data in Patient Electronic Healthcare Records (source: Kaiser Permanente)

Kaiser’s data architecture includes:

  • electronic healthcare records software from Epic Systems;
  • SAP Business Objects and Crystal Enterprise reporting;
  • application and service development through a service-oriented architecture (SOA);
  • Oracle 9i/10g, SQL Server and Teradata databases;
  • Informatica PowerCenter for data integration; and
  • data center outsourcing services from IBM.

For most enterprises and public sector organizations, the focus is the “right tool for the job”, which can include any number of different combinations among business intelligence software; R and other open source analytics tools; spreadsheets; relational databases; Hadoop; operational data stores; column stores; and document-oriented databases.

Hadoop/MapReduce, together with other related Apache open source projects, has moved past test/development to become a viable extension or alternative to traditional relational databases. For example, LinkedIn combines Hadoop for massive batch workloads, Project Voldemort as a NoSQL key/value storage engine, and the Azkaban open-source workflow system to support large-scale data computations of more than 100 billion relationships a day as well as low-latency site serving.
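To give a feel for the batch-processing style that Hadoop/MapReduce supports, here is a minimal Python sketch that counts connections per member using a map phase, a shuffle and a reduce phase. It runs in a single process and does not use Hadoop itself; the sample data and the function names (map_phase, shuffle, reduce_phase, count_connections) are hypothetical and are not taken from LinkedIn’s systems.

```python
from collections import defaultdict

# Hypothetical sample data: each pair is one connection between two members.
CONNECTIONS = [("ann", "bob"), ("ann", "cara"), ("bob", "cara"), ("cara", "dev")]


def map_phase(records):
    """Emit a (member, 1) pair for each endpoint of every connection."""
    for left, right in records:
        yield left, 1
        yield right, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped values to get a connection count per member."""
    return {key: sum(values) for key, values in grouped.items()}


def count_connections(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    print(count_connections(CONNECTIONS))
```

In a real cluster the map and reduce steps would run in parallel across many machines and the shuffle would move data over the network; the sketch only shows the shape of the computation.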

Figure: LinkedIn Data Infrastructure (source: LinkedIn presentation at Hadoop Summit 2010)


Using cloud-computing technologies, organizations are experimenting with distributed data stores, cloud compute capacity for data analytics, hosted data integration and even operational databases in the cloud. For organizations with existing investments in data warehouses and data marts, technologies such as in-memory systems, flash-based accelerators, and memcached servers can help alleviate performance bottlenecks and push back hardware retirement dates.
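One common way a cache such as memcached relieves pressure on a data warehouse is the cache-aside pattern: check the cache first, and query the slower store only on a miss. The Python sketch below illustrates the idea under simplifying assumptions. An in-process dictionary stands in for a memcached server, and the CacheAside class and slow_warehouse_query function are hypothetical names invented for this example.

```python
import time


class CacheAside:
    """Serve repeated reads from memory; fall back to the slow store on a miss."""

    def __init__(self, load_from_store, ttl_seconds=60.0):
        self._load = load_from_store   # function that queries the slow store
        self._ttl = ttl_seconds        # how long a cached value stays valid
        self._entries = {}             # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: query the store and remember the result.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_warehouse_query(region):
    """Hypothetical stand-in for an expensive warehouse query."""
    time.sleep(0.01)
    return {"region": region, "total_sales": len(region) * 1000}


if __name__ == "__main__":
    cache = CacheAside(slow_warehouse_query)
    cache.get("west")   # miss: runs the slow query
    cache.get("west")   # hit: served from memory
    print(cache.hits, cache.misses)
```

The time-to-live is the main design choice: a longer value takes more load off the warehouse but lets readers see staler results.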

With all of the information available today on the public Internet and within internal corporate sites, it’s easy to feel overwhelmed. Visualization and collaboration tools are important to help business users overcome a feeling of data overload, identify patterns and take action. For example, LinkedIn Maps enables users to map their professional networks and understand relationships among connections. Your map is color-coded to represent different affiliations or groups from your professional career, such as your previous employer, college classmates or industries you’ve worked in.

Figure: LinkedIn Maps Your Professional Network

This will be an exciting year for benefiting from and managing Big Data. For more, read “Big Data 2011 Preview” on GigaOM Pro, an introduction to Brett Sheppard’s upcoming “Big Data 2011” report, which will be published in conjunction with GigaOM Structure Big Data: Get real-time insights on Big Data, March 23 in NYC.

