85% of data science projects fail – here’s how to avoid it
Here are a few common traps data scientists can avoid so their projects don’t end up among the 85% that fail.
By Sparkbeyond.
85% of data science projects fail. So how do you avoid being part of that statistic? Here are a few common traps that data scientists can avoid.
1. Move beyond predictions
There’s no doubt that predictive modeling is one of data science’s biggest strengths, especially in the frequent cases where the outcome is out of our control and predicting it is all we can do. But why limit data science to predictions?
For example, should we simply accept that customers will churn and make retention offers to those most at risk? Or should we understand why people are likely to churn and make them happier customers in the first place?
We need to move beyond just building predictive models to uncovering the underlying drivers. This is easier said than done: finding the root causes of an open-ended problem is far more complex than building a model. But if you want to shape the future instead of being shaped by it, you need to discover what drives your problem.
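As a minimal sketch of that shift, the snippet below first trains an ordinary churn classifier and then uses permutation importance as a rough proxy for which features drive churn. The customer table, column names, and scikit-learn model choice are illustrative assumptions, and importance scores are only a starting point for investigating drivers, not proof of root causes.

```python
# A rough sketch: predict churn, then look at which features drive it.
# Data, column names, and thresholds here are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

customers = pd.DataFrame({
    "tenure_months":   [1, 24, 3, 36, 2, 48, 5, 60, 4, 12],
    "support_tickets": [5, 0, 4, 1, 6, 0, 3, 1, 5, 2],
    "monthly_spend":   [20, 80, 25, 90, 15, 120, 30, 150, 22, 60],
    "churned":         [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
})
X = customers.drop(columns="churned")
y = customers["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Step 1: the usual predictive model (who is likely to churn?).
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# Step 2: a first look at drivers (why are they likely to churn?).
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: mean importance {score:.3f}")
```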
2. Do you know what you want to know?
When turning a business problem into a data science use case, the first question is often, “What is my target variable?” This isn’t as trivial a question as you may think.
Common analytics use cases often have multiple angles. Take insurance claims, for example. We’d want to know which claims are low risk overall and can be fast-tracked, which ones need to go through triage with another insurer, and which ones are likely not covered at all. Each of these objectives typically has different drivers, and with traditional methods, exploring five use cases requires five times the effort. That overhead makes it hard to create sustainable business impact.
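As an illustration, the sketch below derives three separate target variables from one hypothetical claims table, one per business question. The column names and labelling rules are made-up placeholders; in a traditional workflow, each of these targets would then need its own modeling and driver analysis.

```python
# A sketch of one claims table yielding several target variables.
# Column names and the labelling rules are hypothetical placeholders.
import pandas as pd

claims = pd.DataFrame({
    "claim_amount":     [500, 12000, 800, 45000, 300],
    "prior_claims":     [0, 3, 1, 5, 0],
    "other_insurer":    [False, True, False, True, False],
    "policy_exclusion": [False, False, True, False, False],
})

targets = pd.DataFrame({
    # Which claims are low risk and can be fast-tracked?
    "fast_track": (claims["claim_amount"] < 1000) & (claims["prior_claims"] == 0),
    # Which claims need triage with another insurer?
    "needs_triage": claims["other_insurer"],
    # Which claims are likely not covered by the policy?
    "likely_uncovered": claims["policy_exclusion"],
})

# Each column is a distinct modeling objective with its own drivers.
print(targets)
```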
3. What holds true today won’t be relevant tomorrow
The volatility of the pandemic exposed the limits of simply recalibrating models on up-to-date data. Recalibration only lets your models correctly reinterpret the information already presented to them, that is, the information encoded in the features the data scientist chose to provide. But what about the information that was discarded or ignored because, in the past, it did not matter?
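To make the limitation concrete, the sketch below recalibrates a model on fresh data using only the feature list chosen in the past; a column that was dropped back then stays invisible no matter how often the model is refit. The data, column names, and model are hypothetical.

```python
# Sketch: recalibration refits the same features on fresh data,
# so signal that lives only in previously discarded columns is never seen.
# All data and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

KEPT_FEATURES = ["price", "tenure_months"]  # chosen when the model was first built
# "works_from_home" was discarded back then because it "did not matter".

def recalibrate(model, fresh: pd.DataFrame, target: str):
    """Refit the model on up-to-date data using the original feature list only."""
    return model.fit(fresh[KEPT_FEATURES], fresh[target])

fresh = pd.DataFrame({
    "price":           [10, 12, 9, 15, 11, 14],
    "tenure_months":   [3, 24, 6, 36, 12, 18],
    "works_from_home": [1, 0, 1, 0, 1, 0],  # now highly informative, still unused
    "churned":         [1, 0, 1, 0, 1, 0],
})
model = recalibrate(LogisticRegression(), fresh, target="churned")
```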
As different as the above problems may seem, one approach does offer a solution to them all: leveraging AI to generate hypotheses at scale.