DataRush and DataCloud
Posted by Brett Sheppard on August 2, 2010
Pervasive Software Incubates Startups in Cloud Computing and Big Data Analytics
I had the opportunity at the GigaOM Structure 2010 conference to meet with Pervasive Software Chief Technology Officer (CTO) and Executive Vice President Mike Hoskins and Director for Worldwide Marketing and Channel Development David Inbar about two 15-person teams that Pervasive has incubated, named DataCloud and DataRush.
Editor’s note: this is a reprint of an article published in early July on Big Data News, with a few updates.
Pervasive Software is perhaps best known for its extensive data integration capabilities and its PSQL database engine. Headquartered in Austin, Pervasive marked its 38th consecutive quarter of profitability in the quarter ended June 30, 2010, with $11.7 million in revenue and $0.8 million in net income that quarter.
In addition to his role as CTO and head of Pervasive Innovation Labs, Mike Hoskins serves as General Manager for Pervasive’s integration products. Mike joined Pervasive through its December 2003 acquisition of data integration and exchange provider Data Junction Corporation, where he served as President for 15 years.
Pervasive DataCloud, now in its second version, adds a data services layer to Amazon Web Services. Amazon’s Elastic Compute Cloud (EC2) provides compute capacity in the cloud, but not much in terms of out-of-the-box data integration and transformation capabilities. That’s where Pervasive DataCloud comes in.
Enterprises and service providers can use DataCloud2 to integrate data from multiple applications, including software as a service (SaaS) to SaaS, SaaS to on-premise, or on-premise to on-premise, with the ability to add or decrease capacity as needed over time. Software and service developers can use DataCloud2 as a Platform as a Service (PaaS) to create and deliver on-demand data services, drawing from Pervasive’s extensive library of connectors and data maps with over 500 business rules and APIs.
For example, an enterprise with multiple CRM systems can synchronize application data from Oracle/Siebel, Salesforce.com and Force.com partner applications within a Pervasive DataCloud2 process, and then use feeds from that DataCloud process to power executive dashboards or business analytics. Likewise, an enterprise with Salesforce.com data can use DataCloud2 to synch with an on-premise relational database, or synch data between Salesforce.com and Intuit QuickBooks accounting software.
A large publisher has an interesting application where they take data from multiple silos through XML application programming interfaces (APIs), create a data mash-up on DataCloud, and sell that data as a service (DaaS), creating additional monetization from their content stores.
Pervasive issued a Summer 2010 update for DataCloud, with additional security options and an enhanced management console.
With DataRush, Pervasive has adopted an original approach to scaling analytics workloads. Pervasive DataRush is a dataflow platform that parallelizes workloads and reduces coarse-grained latencies by spreading the work over multiple cores within the same server.
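To make the dataflow idea concrete, here is a minimal sketch in Python. It is not the DataRush API, and every name in it (`run_operator`, `run_pipeline`, `queue_size`) is hypothetical. It only illustrates the general pattern: operators are chained by bounded queues and each runs on its own thread, so a downstream step can start on early records while upstream steps are still producing. Python threads share one interpreter lock, so a real engine aimed at multiple cores would presumably use native threads or processes instead.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the record stream


def run_operator(func, inbox, outbox):
    """Apply func to every record from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # forward the end-of-stream marker downstream
            return
        outbox.put(func(item))


def run_pipeline(records, operators, queue_size=64):
    """Push records through a chain of operators, one thread per operator.

    Illustrative only: there is no error handling, so an operator that
    raises would stall the pipeline.
    """
    # One queue in front of each operator, plus one for the final output.
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(operators) + 1)]
    workers = [
        threading.Thread(target=run_operator, args=(op, queues[i], queues[i + 1]))
        for i, op in enumerate(operators)
    ]
    for worker in workers:
        worker.start()

    def feed():
        # Feed from a separate thread so bounded queues cannot deadlock
        # while the main thread is draining the output.
        for record in records:
            queues[0].put(record)
        queues[0].put(_DONE)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results


# A two-step "data preparation" flow: trim whitespace, then lowercase.
cleaned = run_pipeline(["  Acme ", "GLOBEX"], [str.strip, str.lower])
```

Because each operator here is a single thread reading from a first-in, first-out queue, records come out in the order they went in.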
DataRush has an optional temporal events store but not a built-in data warehouse, although it can feed external data warehouse systems. Currently, DataRush is set up to handle batch processing, but the batch intervals can be defined in short time segments (such as each hour). DataRush has some similarities to complex event processing (CEP) software used in financial services and other industries, and could be used as part of a CEP solution, but is not designed to compete with CEP.
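As a rough illustration of what short batch intervals mean in practice, the sketch below groups timestamped events into hourly batches. It is plain Python and does not reflect how DataRush itself is configured; the function name `hourly_batches` and the sample events are made up for the example.

```python
from collections import defaultdict
from datetime import datetime


def hourly_batches(events):
    """Group (timestamp, payload) pairs by the hour in which they occurred."""
    batches = defaultdict(list)
    for timestamp, payload in events:
        # Truncate the timestamp to the start of its hour to get the batch key.
        window_start = timestamp.replace(minute=0, second=0, microsecond=0)
        batches[window_start].append(payload)
    # Return the batches in chronological order.
    return dict(sorted(batches.items()))


# Hypothetical sample events.
events = [
    (datetime(2010, 8, 2, 9, 15), "order-1"),
    (datetime(2010, 8, 2, 9, 59), "order-2"),
    (datetime(2010, 8, 2, 10, 0), "order-3"),
]
batches = hourly_batches(events)
```

Each batch can then be processed as a unit once its hour has closed, which is the difference between this style of processing and the per-event handling typical of CEP systems.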
Instead, among the key applications for DataRush are data preparation and advanced analytics. Pervasive DataRush for Data Preparation helps organizations remove performance bottlenecks in the cleansing and preparation of large data sets. The data preparation offering fits Pervasive’s strength in data integration. Pervasive DataRush for Analytics is designed to enable users to analyze entire large-scale data sets (instead of just sampling data) and perform data mining operations. DataRush for Analytics may be a more difficult sales proposition for Pervasive, compared to the tremendous industry interest in Hadoop and MapReduce for advanced analytics.
Will DataRush’s multi-threaded parallel engine revitalize SMP architectures? It’s too soon to tell. In addition to the DataRush and Intel Xeon 7500 performance benchmarks that Pervasive announced at GigaOM Structure, DataRush will benefit from a marquee customer or two that evangelizes why they selected DataRush and how their business has benefited.
Part of Pervasive’s challenge is that enterprises, public sector organizations, and influencers such as industry analysts have been on the massively parallel processing (MPP) bandwagon for more than a decade. Within enterprise accounts, while Teradata, Oracle/Sun, IBM and others continue to sell symmetric multi-processing (SMP) systems, these are largely positioned either as a stepping stone to an MPP upgrade or as a departmental data mart.
On the plus side for DataRush, in addition to applications like data preparation that fit Pervasive’s strengths in data integration, there are some advantages to being “different” instead of only being “better”. As previously proprietary hardware platforms such as Teradata and Netezza use more commodity hardware, hardware performance offers less and less differentiation. Taking a “different” approach, while also delivering as good or better performance and features, can offer more opportunities for long-term differentiation than trying to always stay one step ahead of competitors on the same or similar path.
Has your organization used DataCloud or DataRush? If so, what pros or cons of these offerings have you found?