IBM just announced a major new push around Apache Spark. You know Spark: the free, open source complement to Apache Hadoop that lets enterprises field fast, unified applications combining multiple workloads, including streaming, over all their data. For years it seemed that IBM was paying lip service to Spark while emphasizing capabilities it had placed big bets on (especially IBM’s flagship stream processing product, InfoSphere Streams).
I have seen solutions based around InfoSphere Streams that are clearly the right approach for high-performance systems where reliability is critical for mission support. Spark is still new, and frankly there have been performance issues from time to time. But Spark is also open source, with large teams of developers contributing to continuously improve it. Spark today is production ready for many workloads.
But I never would have imagined IBM would endorse Spark as strongly as they did today. I was really surprised. Their endorsement is so strong that I believe it is time for all InfoSphere Streams users to start considering the future of their designs.
The endorsement came in the form of a $300 million investment and the assignment of 3,500 people to help develop Spark. They also launched a plan to train over a million data scientists and data engineers on Spark. IBM called Spark potentially “the most significant open source project of the next decade.”
Full press release follows:
IBM Announces Major Commitment to Advance Apache®Spark™, Calling it Potentially the Most Significant Open Source Project of the Next Decade
IBM Joins Spark Community, Plans to Educate More Than 1 Million Data Scientists
ARMONK, NY – 15 Jun 2015: IBM (NYSE:IBM) today announced a major commitment to Apache®Spark™, potentially the most important new open source project in a decade that is being defined by data. At the core of this commitment, IBM plans to embed Spark into its industry-leading Analytics and Commerce platforms, and to offer Spark as a service on IBM Cloud. IBM will also put more than 3,500 IBM researchers and developers to work on Spark-related projects at more than a dozen labs worldwide; donate its breakthrough IBM SystemML machine learning technology to the Spark open source ecosystem; and educate more than one million data scientists and data engineers on Spark.
As data and analytics are embedded into the fabric of business and society – from popular apps to the Internet of Things (IoT) – Spark brings essential advances to large-scale data processing. First, it dramatically improves the performance of data dependent apps. Second, it radically simplifies the process of developing intelligent apps, which are fueled by data.
To further accelerate open source innovation for the Spark ecosystem, IBM is taking the following actions:
- IBM will build Spark into the core of the company’s analytics and commerce platforms.
- IBM’s Watson Health Cloud will leverage Spark as a key underpinning for its insight platform, helping to deliver faster time to value for medical providers and researchers as they access new analytics around population health data.
- IBM will open source its breakthrough IBM SystemML machine learning technology and collaborate with Databricks to advance Spark’s machine learning capabilities.
- IBM will offer Spark as a Cloud service on IBM Bluemix to make it possible for app developers to quickly load data, model it, and derive the predictive artifact to use in their app.
- IBM will commit more than 3,500 researchers and developers to work on Spark-related projects at more than a dozen labs worldwide, and open a Spark Technology Center in San Francisco for the Data Science and Developer community to foster design-led innovation in intelligent applications.
- IBM will educate more than 1 million data scientists and data engineers on Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC.
“IBM has been a decades long leader in open source innovation. We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, General Manager, Analytics Platform, IBM Analytics. “Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”
Spark Drives Business Transformation for IBM Clients
Spark has grown quickly in popularity among developers and data scientists as an essential platform for helping organizations more easily integrate Big Data into applications, and is quickly gaining momentum with IBM clients looking to transform business decision-making:
- Real-time transportation planning software from Optibus is changing the way public transport is organized. “Spark, together with IBM, provides a highly scalable platform for Optibus, making it easy for us to expand our software as a service offering into new markets, and helps us simplify deployment, maintenance and application development for transportation companies worldwide,” said Amos Haggiag, Optibus CTO and Co-Founder.
- Findability Sciences, a global consulting and contextual data technology solutions company, is using IBM Analytics and Spark to help clients tap into the power of Big Data. “Apache Spark with IBM BigInsights has given us tremendous capacity for our implementations for small and medium businesses, where MapReduce was not efficient. With Spark, the performance has improved multi fold. We’re now able to process streaming data from IoT devices and offer analytics for data in motion for things like traffic, commuters and parking,” said Anand Mahurkar, CEO of Findability Sciences.
- Independence Blue Cross (IBC) is the largest health insurer in the Philadelphia area, serving more than two million people in the region and seven million nationwide. It’s using Spark to help drive product innovation and develop new services. “Apache Spark is quickly maturing into a power tool for development of machine-learning analytic applications. It allows our IBC researchers and academic partners to work together more seamlessly, which means we can get new claims and benefits apps up and out to customers much faster,” said Darwin Leung, Director of Informatics at Independence Blue Cross.
- IBM, NASA, and the SETI Institute are collaborating to analyze terabytes of complex deep space radio signals using Spark’s machine learning capabilities in a hunt for patterns that might betray the presence of intelligent extraterrestrial life. “With Spark as a Service on Bluemix, we’ll be able to work with IBM to develop promising new ways to analyze signal data as we hunt for evidence of intelligence elsewhere in the cosmos. This is an exciting example of synergy in the service of science,” said Dr. Seth Shostak, Senior Astronomer and Director of the Center for SETI Research.
IBM is one of four founding members of the UC Berkeley AMPLab, where Spark was first invented in 2009, and as a result participates in multi-day research retreats, provides advice and real-world insight, and interacts closely with AMPLab researchers on projects of mutual interest. “As a sponsor of the AMPLab, IBM contributes to the greater Spark community and provides guidance for the continued evolution and improvement of the Berkeley Data Analytics Stack, the open source platform of which Spark is a key component,” said Professor Michael Franklin, Director of the UC Berkeley AMPLab.
Spark is agile, fast and easy to use. And because it is open source, it is improved continuously by a worldwide community. Over the course of the next few months, IBM scientists and engineers will work with the Apache Spark open community to rapidly accelerate access to advanced machine learning capabilities and help drive speed-to-innovation in the development of smart business apps. By contributing SystemML, IBM will help data scientists iterate faster to address the changing needs of business and to enable a growing ecosystem of app developers to apply deep intelligence into everything.