The author is a professor of the practice at MIT Sloan. He specializes in data science and machine learning. He was a data science entrepreneur and tech executive for more than 20 years, most recently as senior vice president at Salesforce and chief data scientist for Salesforce Commerce Cloud.
An increasing number of organizations are bringing data scientists on board as executives and managers recognize the potential of data science and artificial intelligence to boost performance. But hiring talented data scientists is one thing; harnessing their capabilities for the benefit of the organization is another.
Supporting and getting the best out of data science teams requires a particular set of practices, including clearly identifying problems, setting metrics to evaluate success, and taking a close look at results. These steps don’t require technical knowledge and instead place a premium on clear business thinking, including understanding the business and how to achieve impact for the organization.
Data science teams can be a great source of value to the business, but without proper guidance they are unlikely to deliver it. Following these steps will help data science teams realize their full potential, to the benefit of your organization.
1. Point data science teams toward the right problem.
Business leaders should take extraordinary care in defining the problem they want their data science teams to solve. Data scientists, especially new ones, often want to get going with preparing data and building models. And at least initially, they may not have the confidence to question a senior business executive, especially if that individual is the project sponsor.
It is up to leaders to make sure the team focuses on the right problem. For example, a company building a model to decide which customers to target in a marketing campaign has to decide whether the model should identify customers with a high propensity to transact, or if it should identify customers who are likely to transact if campaigned to but not otherwise.
Depending on the answer, the path taken by the data science team, including the training data, modeling approach, and level of effort, will likely be quite different, as will the impact on the business.
More generally, to maximize the chance of identifying the right problem, look at what other companies in your industry are doing, especially early adopters of data science. Pay less attention to how they are solving problems, since there are usually many ways to solve any data science problem, and more attention to which problems they are solving.
2. Decide on a clear evaluation metric up front.
To solve a problem, data science teams typically build lots of models and then select the one that seems best. To make this selection they need a metric. Given multiple models, they can use this metric to rank them and pick the best one.
Leaders need to use business judgment to determine what that metric should be, which is trickier than it sounds. In any complex business situation, there’s no single perfect metric. There are usually many relevant metrics and they often conflict with one another.
For example, a data science team might be asked to use historical contact data to build a model to help the sales team prioritize which customers to contact. Out of the many models the team will build, what metric will indicate the best one?
One option is the error rate, or the percentage of data points for which the model’s predictions are incorrect. This is a reasonable metric, but it is an average of two things: the rate of false negatives — prospects predicted to be a loss (not worth contacting) that would have actually been a win — and the rate of false positives, or prospects predicted to be a win that turn out to be a loss.
A model with the lowest error rate may have a combination of false positives and false negatives that is not ideal for your business, since these two types of errors can have very different impacts. A model with a slightly higher overall error rate may even be preferable if it balances false positives and false negatives in a way that better serves the business. How to balance these errors should be part of the model selection process and requires guidance from the business team up front.
If you are not sure what metric to use, ask your data science team to educate you on the metrics typically used in the industry to evaluate models for similar problems. You can select one that reflects what’s important for the business, and if none of them are a good match, you can work with your data science team to create a custom metric.
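To make the tradeoff concrete, here is a minimal sketch in Python of how a cost-weighted metric can rank two models differently than plain error rate. The labels, models, and cost values are all hypothetical, chosen only for illustration; in practice the costs would come from your business team.

```python
# Sketch: plain error rate vs. a cost-weighted metric for a
# win/loss sales-prioritization model. Labels: 1 = win, 0 = loss.
# All data and costs below are hypothetical.

def error_rate(y_true, y_pred):
    """Fraction of predictions that are wrong."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_cost(y_true, y_pred, fn_cost=5.0, fp_cost=1.0):
    """Average cost per prediction, assuming a missed win (false
    negative) hurts the business more than a wasted contact
    (false positive)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += fn_cost  # missed a prospect that would have converted
        elif t == 0 and p == 1:
            total += fp_cost  # contacted a prospect that didn't convert
    return total / len(y_true)

y_true  = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0, 1, 1, 0, 0, 0, 0, 0]  # one false negative
model_b = [1, 1, 1, 1, 1, 0, 0, 0]  # two false positives

# Model A wins on error rate (0.125 vs. 0.25), but Model B wins on
# weighted cost (0.25 vs. 0.625): the "best" model depends entirely
# on which metric the business chooses.
```

The point is not the specific numbers but the reversal: the metric you choose up front determines which model your team will ship.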
3. Create a common-sense baseline first.
Once you’ve decided on a relevant, important problem and defined a clear evaluation metric that reflects business priorities, you need to create a common-sense baseline, which is how your team would solve the problem if they didn’t know any data science. For example, if your data science team is building a personalized recommendation algorithm for your e-commerce site, a simple baseline would be tracking what product categories visitors look at, and recommending best-selling products from those categories.
Building a common-sense baseline will force the team to get the end-to-end data and evaluation pipeline working and uncover any issues, such as with data access, cleanliness, and timeliness. It will also surface any tactical obstacles with actually calculating the evaluation metric.
Knowing how well the baseline does on the evaluation metric will give a quick ballpark idea of how much benefit to expect from the project. Experienced practitioners know all too well that common-sense baselines are often hard to beat. And even when data science models beat these baselines, they may do so by slim margins.
Finally, this also leads the data science team to spend some time thinking about the data and the problem from first principles, rather than just diving in and throwing powerful machine learning models at the problem. They will develop valuable intuition for what will make a proposed solution do well on the evaluation metric, and think about what to avoid. This will also naturally lead them to talk to business end users who may have been solving the problem manually. Perhaps most importantly, they will begin to build relationships with non-technical colleagues who understand the business, which will pay off for your organization in the long term.
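The e-commerce baseline described above can be sketched in a few lines of Python. The data structures and product names are hypothetical; the point is how little machinery a common-sense baseline needs before it can be scored on the evaluation metric.

```python
# Sketch of a common-sense baseline recommender: for each visitor,
# recommend the best-selling product in each category they browsed.
# Purchase data below is hypothetical.
from collections import Counter

# (product, category) pairs from past purchases
purchases = [
    ("p1", "shoes"), ("p1", "shoes"), ("p2", "shoes"),
    ("p3", "hats"), ("p3", "hats"), ("p4", "hats"), ("p3", "hats"),
]

# Find the best-selling product per category by purchase count.
sales = Counter(purchases)
best_seller = {}
for (product, category), count in sales.items():
    if category not in best_seller or count > sales[(best_seller[category], category)]:
        best_seller[category] = product

def baseline_recommend(viewed_categories):
    """Recommend each viewed category's best-selling product."""
    return [best_seller[c] for c in viewed_categories if c in best_seller]

# A visitor who browsed shoes and hats gets the top seller from each
# category -- no model training required.
```

Running this end to end, on real data, exercises the same pipelines a sophisticated model would need, which is exactly why it surfaces data and evaluation problems early.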
4. Manage data science projects more like research than like engineering.
It is natural for well-meaning executives to ask data science teams to commit to a clear timeline and hold them accountable. After all, this is routinely done in project management. But in this case, it is a mistake.
There’s a strong element of research in most data science work, which means a fair amount of time spent on dead ends with nothing to show for the effort. This trial and error makes it hard to predict when a breakthrough will occur.
For example, data scientists may be able to quickly produce a solution that’s 6% better. But they can’t predict how long it will take them to get from 6% to 10% better. It may happen tomorrow, it may happen next month, it may never happen.
In 2006, Netflix invited data scientists from all over the world to beat their in-house movie recommendation system. The first team to show a 10% improvement would be awarded a $1 million grand prize, and 41,305 teams from 186 countries jumped into the fray. Even so, it took three years for the 10% barrier to be breached.
If data scientists keep missing deadlines, don’t assume that they are incompetent. The problem they are working on may be hard and nobody can predict when it will be solved to your satisfaction.
What leaders can do is meet regularly with data science teams to understand the ups and downs of data science work, which is itself a valuable habit to develop. If the team's improvements are plateauing and gains are inching up slowly, it may be time to pause, ask whether the current result is good enough, and consider stopping the project. This won't be an easy decision, but the alternative might be a long and uncertain wait with no guarantee of success.
5. Check for ‘truth and consequences.’
It is important to subject results to intense scrutiny to make sure the benefits are real and there are no unintended negative consequences. The most basic check is making sure the results are calculated on data that was not used to build the models.
Assuming the results are real, also check that there are no adverse side effects. When a model improves performance on a selected metric, it may do so at the expense of other important metrics. For example, an e-commerce business may focus on improving revenue per visitor with an improved recommendation algorithm. Revenue per visitor is the product of conversion rate and revenue per conversion.
If the algorithm achieves its objective by increasing revenue per conversion, but decreases the conversion rate, it may hurt the organization’s strategic goal of having more visitors become customers. By just getting existing customers to spend more, it may end up exhausting their budgets, so to speak, and lead to diminished growth in the future.
There’s always need for judgment about the tradeoff between one metric and another, and business leaders should be involved in making those decisions.
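The revenue decomposition above is simple arithmetic, and a quick sketch shows how a headline improvement can hide a side effect. All figures below are hypothetical.

```python
# Sketch: decomposing revenue per visitor to check for side effects.
# Revenue per visitor = conversion rate x revenue per conversion.
# All figures are hypothetical.

def decompose(visitors, conversions, revenue):
    conversion_rate = conversions / visitors
    revenue_per_conversion = revenue / conversions
    revenue_per_visitor = revenue / visitors
    return conversion_rate, revenue_per_conversion, revenue_per_visitor

# Before the new algorithm: 10,000 visitors, 500 conversions, $25,000
before = decompose(10_000, 500, 25_000)
# After: the headline metric, revenue per visitor, is up...
after = decompose(10_000, 400, 28_000)

# ...but the conversion rate fell from 5% to 4%. The model "achieved
# its objective" while working against the strategic goal of turning
# more visitors into customers.
```

A check like this takes minutes, and it is exactly the kind of scrutiny business leaders should insist on before declaring a model a success.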
6. Log everything, and retrain periodically.
No amount of testing before launch can completely protect models from producing unexpected or incorrect predictions with certain kinds of input data. But if every input and output is logged in as much detail as possible, investigating and fixing problems will be easier and faster. This is particularly important for consumer-facing applications.
And over time, the nature of the data being fed to the model will start to drift away from the data used to build the model. If no action is taken, this will reduce the efficacy of the model, so it is important to make sure that data science teams have automated processes in place to track model performance over time and retrain as necessary.
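An automated tracking process of the kind described above can be as simple as logging every prediction with its eventual outcome and watching a rolling error rate. A minimal sketch, with a hypothetical window size and threshold:

```python
# Sketch of automated drift monitoring: track the model's rolling
# error rate on logged predictions and flag when it degrades past a
# threshold. The window size and threshold are hypothetical choices
# that a real team would tune to its own problem.
from collections import deque

class DriftMonitor:
    def __init__(self, window=1000, max_error=0.15):
        self.outcomes = deque(maxlen=window)  # 1 = wrong, 0 = right
        self.max_error = max_error

    def log(self, prediction, actual):
        """Record every prediction alongside its eventual outcome."""
        self.outcomes.append(int(prediction != actual))

    def needs_retraining(self):
        """True when the rolling error rate exceeds the threshold."""
        if not self.outcomes:
            return False
        return sum(self.outcomes) / len(self.outcomes) > self.max_error
```

In production this logging would feed the same pipeline used for retraining, so the team can respond to drift with fresh data rather than guesswork.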
Data science models, like software in general, tend to require a great deal of future effort because of the need for maintenance and upgrades. They have an additional layer of effort and complexity because of their extraordinary dependence on data and the resulting need for retraining. Furthermore, lack of knowledge about how models work can make identifying and resolving issues challenging. Logging everything and retraining models periodically are proven ways to address these challenges.