Data science vs. machine learning: What's the difference?

Machine learning is a branch of artificial intelligence (AI), while data science is the discipline of data cleansing, preparation, and analysis. Here's how each works – and how they work together.

Machine learning (ML) and data science are often mentioned in the same breath – and for good reason: the two complement each other. Still, it is worth understanding how each one works and how they work together.

Machine learning is a branch of artificial intelligence (AI) that empowers computers to self-learn from data and apply that learning without human intervention. Data science, on the other hand, is the discipline of data cleansing, preparation, and analysis.

“At its core, data science is a field of practice and machine learning is a set of tools and methodologies,” says JP Baritugo, director at business transformation and outsourcing consultancy Pace Harmon. “Data science uses a broad array of expertise, business knowledge, tools, and methodologies to process big data to generate meaningful insights that drive actions and enable impactful business outcomes.”

Why machine learning demands data science

Effective ML demands good data science. “A data scientist’s expertise is absolutely required to ensure machine learning is used and deployed properly,” Baritugo says. The data scientist makes sure the model is trained on enough cleaned and normalized data, and that the algorithms chosen suit both the datasets and the business question to be addressed.
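
To make "cleaned and normalized" concrete, here is a minimal Python sketch, not a method described by anyone quoted here. It assumes two common choices: missing values are filled with the column median, and each column is min-max scaled to the range 0 to 1. The helper name `clean_and_normalize` and the sample records are hypothetical.

```python
from statistics import median

def clean_and_normalize(rows, columns):
    """Fill missing values with the column median, then min-max scale to [0, 1].

    Hypothetical helper for illustration; real pipelines often use other
    imputation and scaling strategies.
    """
    scaled = {}
    for col in columns:
        # Cleaning step: collect the values that are actually present.
        present = [row[col] for row in rows if row.get(col) is not None]
        fill = median(present)
        values = [row[col] if row.get(col) is not None else fill for row in rows]
        # Normalization step: rescale so the smallest value is 0 and the largest is 1.
        lo, hi = min(values), max(values)
        span = hi - lo
        scaled[col] = [(v - lo) / span if span else 0.0 for v in values]
    # Reassemble one dict per original row.
    return [dict(zip(columns, vals)) for vals in zip(*scaled.values())]

# Made-up sample records; the second one is missing an age.
rows = [
    {"age": 20, "income": 30000},
    {"age": None, "income": 50000},
    {"age": 40, "income": 70000},
]
normalized = clean_and_normalize(rows, ["age", "income"])
```

After this step every column is on the same scale, so no single feature dominates a model simply because its raw numbers are larger.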

However, data science can be applied outside the realm of machine learning. “Data science is the practical application of artificial intelligence, machine learning, and deep learning – along with data preparation – in a business context,” says Ingo Mierswa, founder and president of data science platform RapidMiner.

“While the goal of data science is to extract insights from data, predict future developments, and suggest actions – sometimes even performing those actions automatically – this is achieved with tools like AI and ML,” Mierswa says.

At a basic level, a data scientist gathers and prepares data sets from multiple sources and then applies some capability to extract insight from them. In some cases, they may reach for machine learning. In others, a more basic analysis may make sense. “In my view at least, data science is just the manipulation of data,” says Wayne Butterfield, director of cognitive automation and innovation at ISG.

When to use machine learning

When facing a situation in which a solution is hidden in a large volume of data, machine learning is your friend. “ML excels at processing that data, extracting patterns from it in a fraction of the time a human would take, and producing otherwise inaccessible insight,” Mierswa says.

For example, machine learning (informed by data science) powers risk analysis, fraud detection, and portfolio management in financial services; GPS-based predictions in travel; and product and content recommendations for Amazon and Netflix.

Machine learning is best matched with problems for which large amounts of well-labeled historical data already exist, or for which data can be simulated very quickly. “There isn’t much mileage in using ML if you don’t have enough existing data that it can train on,” says Butterfield.

ML models are only as good as the data they learn from. “Luckily, there are many types of problems for which lots of data exist,” says Timothy Havens, the William and Gloria Jackson Associate Professor of Computer Systems in the College of Computing at Michigan Technological University and director of the Institute of Computing and Cybersystems.

“Certain problems lend themselves to ML very well,” Butterfield explains. Data science (minus machine learning) has been applied to forecasting and planning for years with limited accuracy, for example. “However, because you can now build complex algorithms that can take into account multiple data sources – such as weather, historic sickness patterns, external events, past demand – you get a much more accurate forecast,” Butterfield says. “And this isn’t just on a daily basis, it can be hourly too.”
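
As a rough illustration of a forecast that draws on several data sources at once, the Python sketch below uses k-nearest neighbors: it predicts demand by averaging the most similar past days. This is a stand-in chosen for brevity, not the approach Butterfield describes, and the function name `knn_forecast`, the feature choices, and all of the numbers are made up.

```python
import math

def knn_forecast(history, query, k=2):
    """Average the demand of the k past days whose features are closest to `query`."""
    nearest = sorted(
        history,
        key=lambda day: math.dist(day["features"], query),  # Euclidean distance
    )[:k]
    return sum(day["demand"] for day in nearest) / k

# Hypothetical history. Each feature tuple combines three sources, pre-scaled
# to comparable ranges: (temperature, external event flag, previous demand).
history = [
    {"features": (1.0, 0, 1.0), "demand": 100},
    {"features": (1.2, 0, 1.1), "demand": 110},
    {"features": (3.0, 1, 2.0), "demand": 210},
    {"features": (2.8, 1, 1.9), "demand": 190},
]

# Forecast for a warm day with an external event and high recent demand.
hot_event_day = knn_forecast(history, (2.9, 1, 2.0))
```

The same function works for hourly forecasts if each history entry describes an hour rather than a day. With only four records the result is a toy, which echoes the point above: ML needs enough historical data to be worth using.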

In financial services, ML and data science can power solutions for underwriting and fraud prevention. In IT, they can improve network management. Healthcare organizations can apply them to improve diagnostic accuracy, determine optimal price and volume mix, or predict patient outcomes. In customer experience management, they come together to improve customer interactions, predict customer lifetime value, and estimate churn. Retailers can tap them to forecast demand, optimize pricing, and segment customers. In manufacturing, data science and ML can be used to automate the supply chain and improve planning.

When data science is enough

With all the excitement around AI, every business problem can start to look like an opportunity to throw a little machine learning at it. However, there remain many instances in which ML may not work – or could be overkill.

Sometimes, data science is enough. “It is difficult to distinctly separate data science and ML since boundaries are blurry. At the end of the day, most of the algorithms use statistical techniques,” says Anil Vijayan, vice president at Everest Group. “Not every problem requires AI to solve though. In many cases, using ‘traditional’ data science may not just suffice, but also be more efficient.”

Traditional data science techniques (no ML required) may be more effective in scenarios in which the output is dependent on relatively few parameters, and those parameters do not change much over time, Vijayan says. In addition, when the relationship between the output and the parameters is well understood and not likely to shift, Vijayan adds, data science alone will suffice.
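
A case like this can often be handled with a classic statistical fit. The Python sketch below applies ordinary least squares to a single parameter, assuming the relationship is a stable straight line. The function name `fit_line` and the data points are hypothetical, chosen only to show the idea.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up observations of one parameter (xs) and the output it drives (ys).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
forecast = slope * 5 + intercept  # predicted output when the parameter is 5
```

The result is two numbers that anyone can inspect and explain, with no training pipeline to maintain, which is the efficiency Vijayan is pointing to.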

Stephanie Overby is an award-winning reporter and editor with more than twenty years of professional journalism experience. For the last decade, her work has focused on the intersection of business and technology. She lives in Boston, Mass.