7 Deadly Sins of Data Practice
June 16, 2017
The fear of missing out, or FOMO, has driven plenty of companies into setting up their own data practices. Which of these seven sins is your company guilty of?
1. Discarding Data
Cloud storage is now so cheap that deleting data is like scrapping a Ferrari to free up garage space. Some may think it’s OK to discard everything that’s left after extracting ‘all useful information’ from the data but, more often than not, they will be wrong. You don’t know what’s useful until you know what you want to use it for, and once you’ve deleted the raw data, there is no way back!
2. Asking Non-Data Scientists to do Data Science
Many data scientists don’t have formal data science training, but that doesn’t mean that anyone can do data science. An abundance of plug-and-play tools makes it easy to quickly plot a graph, create a predictive model or set up a recommendation engine, but the mere fact that you get some sort of output does not make it correct or valid. While this sounds self-evident, mistakes like this happen more often than you might imagine.
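To make the point about output that looks right but isn't, here is a minimal Python sketch. The churn data and the "model" are entirely hypothetical and invented for illustration; the idea is simply that a headline accuracy figure means little until it is compared with a trivial baseline.

```python
from collections import Counter


def accuracy(predictions, actuals):
    """Fraction of predictions that match the observed labels."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


def majority_baseline_accuracy(actuals):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(actuals).most_common(1)[0][1]
    return most_common_count / len(actuals)


# Hypothetical churn data: 95 customers stayed (0), 5 churned (1).
actuals = [0] * 95 + [1] * 5
# A hypothetical "model" that predicts nobody ever churns.
predictions = [0] * 100

model_acc = accuracy(predictions, actuals)
baseline_acc = majority_baseline_accuracy(actuals)
# How many of the actual churners did the model identify?
churners_found = sum(p == 1 and a == 1 for p, a in zip(predictions, actuals))

print(f"model accuracy:    {model_acc:.2f}")
print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"churners found:    {churners_found}")
```

In this made-up case the model scores the same as the do-nothing baseline and finds none of the customers it was built to find, which is the kind of check a plug-and-play tool will not run for you.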
3. Locking Data Scientists in the Basement
If you’ve had to work with data scientists, this might sound like a good idea. But simply hiring data scientists, throwing data at them, and expecting groundbreaking results is not the most productive way to use them. Depending on the problem, they should be teamed up with subject matter experts, business analysts and software developers, and supported by the organisation’s infrastructure team.
4. Metric Madness
Having too many performance metrics is arguably just as bad as having none at all. A hundred different metrics describing a hundred different data items don’t help anyone. It is important to define the metrics and get buy-in from stakeholders, but it is just as important to keep evaluating and revising them from time to time.
5. Assuming Assumption Adherence
Unfortunately, too many managers don’t pay enough attention to the ‘small print’ that comes with the models. There are always limitations and assumptions. It is the duty of the data scientists to be upfront and clear about them. If these are ignored, even great work can be misinterpreted and misused.
6. Models that are Eternally Applicable
It’s hard enough to create useful and robust models that are accurate for today; expecting them to stay accurate forever is just unrealistic. Data science is a continuous process of upgrading and updating models. The world around us keeps changing, so why shouldn’t the models we have designed change with it?! If reinforcement learning takes too much effort, at least make sure you evaluate your results and incorporate learnings from the new data from time to time.
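As a rough illustration of what evaluating your results from time to time could look like, here is a minimal Python sketch. It compares a model's accuracy at launch with its accuracy on a recent batch and flags it for retraining when the drop exceeds a tolerance. The labels, the function names and the 0.05 tolerance are all assumptions made up for this example, not a recommended standard.

```python
from statistics import mean


def accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    return mean(1.0 if p == a else 0.0 for p, a in zip(predictions, actuals))


def needs_retraining(baseline_accuracy, recent_accuracy, tolerance=0.05):
    """True when recent accuracy is more than `tolerance` below the baseline."""
    return (baseline_accuracy - recent_accuracy) > tolerance


# Hypothetical labels: the same model scored at launch and on the latest batch.
launch_preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
launch_actuals = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
recent_preds = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
recent_actuals = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]

baseline = accuracy(launch_preds, launch_actuals)
recent = accuracy(recent_preds, recent_actuals)

print(f"accuracy at launch: {baseline:.2f}")
print(f"accuracy recently:  {recent:.2f}")
print(f"retrain?            {needs_retraining(baseline, recent)}")
```

Running a check like this on a schedule is far cheaper than rebuilding the model, and it tells you when the rebuild is actually due.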
7. The Conspiracy of Optimism
There’s nothing wrong with having moonshot projects that challenge and push your teams to the limit, but be honest about what you’re good at. Not everything has to be done using internal resources. Netflix famously stated “We have to be great at a number of things … operating data centers is not one of those”.