3 ways open source software is transforming analytics

Abhishek Jain
Vice President, Data & Analytics

February 27, 2020

Open source software has created a new world of opportunity for data science. See how it’s helping us do our best work – and changing the industry as a whole.

When most of our data scientists started out in the world of analytics, the software they learned to use was SAS. Quite simply, that was the tool, in the same way writers use MS Word and accountants cut their teeth on Excel.

SAS has been the undisputed market leader in the analytics space for years. It has a huge array of functions, robust technical support, and the level of assurance that lets heavily regulated industries use it without fear of non-compliance. In fact, for this reason the banking industry is still a SAS purist to this day.

There is a downside to SAS though – it’s expensive. And, increasingly, it’s falling behind competitors in terms of its capabilities.

Today, most data scientists will tell you they prefer open source alternatives that offer more functionality for little or no cost.

The (relatively) new kids on the block

Over the last eight to ten years, the analytics software market has been significantly disrupted by open source tools like Python and R, along with frameworks like Apache Hadoop for processing large data sets. Ten years ago, we used R or Python for less than 20% of our analysis work; the rest was done in SAS. Today that 20-80 split has flipped to 80-20.

‘Open source’ is a term used to describe software that is free for anyone to use and modify – and that brings some significant benefits.

Where SAS is relatively expensive and rigid, these newer, open source solutions are free to use, packed with features and, in theory, endlessly flexible.

The comparison between the two is a bit like the old PC vs Mac adverts – where PCs were portrayed (perhaps unfairly) as the stuffy, old-school option, and Macs were painted as their slick, young replacements.

R is an open source statistical programming language and the most direct alternative to SAS. It delivers fast access to the latest capabilities and has a dynamic user community that provides support and regularly releases documentation online.
Python is an open source general-purpose language whose libraries provide functions for almost any statistical operation you can imagine.

The advantages of open source

We’re now at the point where open source is no longer seen as a sort of indie alternative to the mainstream. It’s a headline act in its own right. In fact, as far back as 2016, Gartner found that 95% of IT organisations ‘used nontrivial open source software assets within their mission-critical IT portfolios’.

This includes large organisations like Ford Motor Company, Macy’s and Netflix, who have been using open source technology for their big data projects for a few years now.

This shift could provide a significant advantage for data science, especially as we approach a future where more data, and more types of data, need to be analysed in real time.

So, what does open source offer data scientists that traditional tools don’t?

Reducing the cost of data discovery
As open source software is most often free to use, it can result in dramatic savings for data science projects. That means budgets go further and discoveries are made sooner.

Case in point: In 2017, Forbes reported that Centrica had priced a solution from an IT supplier that would cost £5 million to deploy only 12 computing nodes. By switching to open source options, Centrica instead spent £750,000 for 250 nodes.
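
To put those figures on a like-for-like footing, the short sketch below works out the cost per node for each option. It uses only the numbers quoted above; the variable names are illustrative, and it assumes the nodes in the two options are broadly comparable, which the reported figures don't confirm.

```python
# Figures as quoted in the article, in GBP. Variable names are illustrative.
vendor_cost, vendor_nodes = 5_000_000, 12
open_source_cost, open_source_nodes = 750_000, 250

# Cost per node for each option (assumes nodes are broadly comparable).
vendor_per_node = vendor_cost / vendor_nodes
open_source_per_node = open_source_cost / open_source_nodes

# How many times more expensive the vendor quote is, per node.
ratio = vendor_per_node / open_source_per_node

print(f"Vendor quote: £{vendor_per_node:,.0f} per node")
print(f"Open source:  £{open_source_per_node:,.0f} per node")
print(f"Per-node cost ratio: about {ratio:.0f}x")
```

On these numbers the vendor quote comes to roughly £417,000 per node against £3,000 per node for the open source route, a difference of about 139 times.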

In this instance, the operational savings alone meant the new systems paid for themselves.

Unleashing potential
With open source technology, our data scientists aren’t restricted by the limits of their software. If a particular capability isn’t available from a vendor, they can access the source code and create it themselves.

Data scientists thrive on finding new and unique ways to solve problems, so in this way, open source presents the perfect opportunity for them to do their very best work – and that means our clients can achieve things that otherwise wouldn’t have been possible.

Accelerating the rate of discovery
Perhaps the biggest advantage of open source is that once you’ve created a new capability it can be used by everyone else too, and vice versa.

This collaborative model has numerous benefits – not least that the rate of discovery for the entire industry is improved thanks to the democratisation of tools and ideas. This means that the boundaries of what’s possible are constantly being stretched. And that makes this an exciting time to work in data science.

At The Smart Cube, we combine advanced analytics, data science and technology to solve our customers’ most pressing problems, helping them to thrive in today’s competitive environment. Find out more about our analytics solutions. To learn about the latest data science trends and the analytics projects we’re working on, explore more of our blog posts.

Abhishek Jain

Abhishek is a Vice President, Advanced Analytics Solutions for The Smart Cube. He is passionate about developing and implementing analytical solutions for Fortune 500 companies, helping them understand customers and make better business decisions. Abhishek specialises in predictive analytics and visual storytelling around consumers and operations across the Retail, CPG and Life Sciences domains, focusing on data science and stakeholder management.
