Why So Many Data Science Projects Fail to Deliver



The Research

This article is based on an in-depth study of the data science efforts in three large-scale, private-sector Indian banks with collective assets exceeding $200 million.

The study included onsite observations; semistructured interviews with 57 managers, administrators, and data scientists; and the examination of archival records.

The five mistakes and the solutions to them emerged from an inductive analytical process based on the qualitative data.

More and more companies are embracing data science as a function and a capability. But many of them have not been able to consistently derive business value from their investments in big data, neural networks, and machine learning.1 Moreover, evidence suggests that the gap is widening between organizations successfully gaining value from data science and those struggling to do so.2

To better understand the mistakes that companies make when implementing data science projects, and to discover how to avoid them, we conducted in-depth studies of the data science activities in three of India’s top 10 private-sector banks with well-established analytics departments. We identified five common mistakes, as exemplified by the cases we encountered, and below we recommend corresponding solutions to address them.

Mistake 1: The Hammer in Search of a Nail

Hiren, a recently hired data scientist in one of the banks we studied, is the kind of analytics wizard that organizations covet.3 He is especially taken with the k-nearest neighbors algorithm, which is useful for identifying and classifying clusters of data. “I have applied k-nearest neighbors to several simulated data sets during my studies,” he told us, “and I can’t wait to apply it to real data soon.”

Hiren did exactly that months later, when he used the k-nearest neighbors algorithm to identify especially profitable industry segments within the bank’s portfolio of business checking accounts. His recommendation to the business checking accounts team: Target two of the portfolio’s 33 industry segments.

This conclusion underwhelmed the business team representatives. They already knew about these segments and had been able to gauge segment profitability with simple back-of-the-envelope calculations. Using the k-nearest neighbors algorithm for this task was like deploying a guided missile when a pellet gun would have sufficed.

In this case and others we examined in all three banks, the failure to achieve business value resulted from an infatuation with data science solutions. This failure can play out in several ways. In Hiren’s case, the problem did not require such an elaborate solution. In other situations, we saw the successful use of a data science solution in one domain become the rationale for its use in another domain in which it wasn’t as appropriate or effective. In short, this mistake does not arise from the technical execution of the analytical technique; it arises from its misapplication.

After Hiren developed a deeper understanding of the business, he returned to the team with a new recommendation: Again, he proposed applying the k-nearest neighbors algorithm, but this time at the customer level instead of the industry level. This proved to be a much better fit, and it resulted in new insights that allowed the team to target as-yet untapped customer segments. The same algorithm in a more appropriate context offered a much greater potential for realizing business value.
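To make the customer-level application concrete, here is a minimal sketch in Python of the kind of analysis described above. It is illustrative only, not the bank’s actual code: the features, segment labels, and data are hypothetical stand-ins, and the point is simply to show how k-nearest neighbors assigns new customers to segments based on their most similar labeled neighbors.

```python
# Minimal, hypothetical sketch of customer-level segmentation with k-NN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical customer features: average balance, monthly transactions, account age.
X_labeled = rng.normal(size=(500, 3))     # customers whose segment is already known
y_labeled = rng.integers(0, 4, size=500)  # four illustrative segments
X_new = rng.normal(size=(100, 3))         # customers to classify

# Scale features so no single feature dominates the distance metric.
scaler = StandardScaler().fit(X_labeled)
knn = KNeighborsClassifier(n_neighbors=15).fit(scaler.transform(X_labeled), y_labeled)

predicted_segments = knn.predict(scaler.transform(X_new))
print(np.bincount(predicted_segments))  # how many new customers fall in each segment
```

Scaling the features first matters because k-nearest neighbors relies on distances; without it, a feature measured in large units (such as account balance) would dominate the segmentation.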

It’s not exactly rocket science to observe that analytical solutions are likely to work best when they are developed and applied in a way that is sensitive to the business context. But we found that data science does seem like rocket science to many managers. Dazzled by the high-tech aura of analytics, they can lose sight of context. This was more likely, we discovered, when managers had seen a solution work well elsewhere, or when the solution was accompanied by an intriguing label, such as “AI” or “machine learning.” Data scientists, who were typically focused on the analytical methods, often could not, or at any rate did not, provide a more holistic perspective.

To combat this problem, senior managers at the banks in our study often turned to training. At one bank, data science recruits were required to take product training courses taught by domain experts alongside relationship manager trainees. This bank also offered data science training tailored for business managers at all levels and taught by the head of the data science unit. The curriculum included basic analytics concepts, with an emphasis on the questions to ask about specific solution techniques and when a given method should or should not be used. In general, the training interventions designed to address this problem aimed to facilitate the cross-fertilization of knowledge among data scientists, business managers, and domain experts and to help them develop a better understanding of one another’s work.

In related fieldwork, we have also seen process-based safeguards for avoiding the mistake of jumping too quickly to a favored solution. One large U.S.-based aerospace company uses an approach it calls the Seven Ways, which requires that teams identify and compare at least seven possible solution approaches and then explicitly justify their final selection.

Mistake 2: Unrecognized Sources of Bias

Pranav, a data scientist with expertise in statistical modeling, was developing an algorithm aimed at producing recommendations for the underwriters responsible for approving secured loans to small and medium-sized firms. Using the credit approval memorandums (CAMs) for all loan applications handled over the previous 10 years, he compared the borrowers’ financial health at the time of their application with their current financial status. Within a couple of months, Pranav had a software tool built around a highly accurate model, which the underwriting team implemented.

Unfortunately, after six months it became clear that the delinquency rates on the loans were higher after the tool was implemented than before. Perplexed, senior managers assigned an experienced underwriter to work with Pranav to figure out what had gone wrong.

The epiphany came when the underwriter discovered that the input data came from CAMs. What the underwriter knew, but Pranav hadn’t, was that CAMs were prepared only for applications that had already been prescreened by experienced relationship managers and were very likely to be approved. Data from loan applications rejected at the prescreening stage was not used in the development of the model, which produced a huge selection bias. This bias led Pranav to miss the importance of a critical decision parameter: bounced checks. Unsurprisingly, there were very few instances of bounced checks among the borrowers whom relationship managers had prescreened.

The technical fix in this case was easy: Pranav added data on loan applications rejected in prescreening, and the “bounced checks” parameter became an important element in his model. The tool began to work as intended.
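The selection bias Pranav encountered is easy to reproduce on synthetic data. The sketch below (Python with scikit-learn; the variable names, probabilities, and prescreening rule are all hypothetical) trains one model only on “prescreened” applicants, among whom bounced checks are absent, and another on the full applicant pool. Only the second model can learn how strongly bounced checks predict delinquency, so the first badly underestimates risk for exactly the applicants the tool is later asked to score.

```python
# Hypothetical illustration of selection bias from training only on prescreened loans.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 20_000

# Synthetic applicant pool: a bounced-checks flag strongly predicts default.
bounced_checks = rng.binomial(1, 0.3, size=n)
income_score = rng.normal(size=n)
p_default = 1 / (1 + np.exp(-(2.5 * bounced_checks - income_score - 1)))
default = rng.binomial(1, p_default)

X = np.column_stack([bounced_checks, income_score])

# Prescreening (like the CAM-only data) filters out bounced-check applicants,
# so that variable carries no signal in the biased training set.
prescreened = bounced_checks == 0

biased_model = LogisticRegression().fit(X[prescreened], default[prescreened])
full_model = LogisticRegression().fit(X, default)

# Estimated default risk for an applicant with bounced checks and average income.
applicant = np.array([[1, 0.0]])
print("model trained on prescreened data:", biased_model.predict_proba(applicant)[0, 1])
print("model trained on full pool:       ", full_model.predict_proba(applicant)[0, 1])
```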

The bigger problem for companies seeking to achieve business value from data science is how to identify such sources of bias upfront and ensure that they do not creep into models in the first place. This is challenging because laypeople — and sometimes analytics experts themselves — can’t readily tell how the “black box” of analytics produces output. And analytics professionals who do understand the black box often do not recognize the biases embedded in the raw data they use.

The banks in our study avoided unrecognized bias by requiring that data scientists become more familiar with the sources of the data they use in their models. For instance, we observed one data scientist spend a month in a branch shadowing a relationship manager to identify the data needed to ensure that a model produced accurate results.

We also observed a project team composed of data scientists and business experts use a formal bias-avoidance process, in which they identified possible predictor variables and their data sources and then analyzed each for possible biases. The objective of this process was to question assumptions and otherwise “deodorize” the data — thus avoiding problems that can arise from the circumstances in which the data was created or gathered.4

Mistake 3: Right Solution, Wrong Time

Kartik, a data scientist with expertise in machine learning, spent a month developing a sophisticated model for analyzing savings account attrition, and he then spent three more months fine-tuning it to improve its accuracy. When he shared the final product with the savings account product team, they were impressed, but they could not sponsor its implementation because their annual budget had already been spent.

Eager to avoid the same result the following year, Kartik presented his model to the product team before the budgeting cycle began. But now the team’s mandate from senior management had shifted from account retention to account acquisition. Again, the team was unable to sponsor a project based on Kartik’s model.

In his third year of trying, Kartik finally got approval for the project, but he had little to celebrate. “Now they want to implement it,” he told us, with evident frustration, “but the model has decayed and I will need to build it again!”

The mistake that prevents banks from achieving value in cases like this is a lack of synchronization between data science and the priorities and processes of the business. To avoid it, better alignment between data science and the strategies and systems of the business is needed.

Senior executives can ensure the alignment of data science activities with organizational strategies and systems by more tightly integrating data science practices and data scientists with the business in physical, structural, and process terms. For example, one bank embedded data scientists in business teams on a project basis. In this way, the data scientists rubbed elbows with the business team day to day, becoming more aware of its priorities and deadlines — and in some cases even anticipating unarticulated business needs. We have also encountered data science teams colocated with business teams, as well as the use of process mandates, such as requiring that project activities be conducted at the business team’s location or that data scientists be included in business team meetings and activities.

Generally speaking, data scientists ought to concentrate their efforts on the problems regarded as most important by business leaders.5 But there is a caveat: Sometimes data science produces unexpected insights that should be brought to the attention of senior leaders, regardless of whether they align with current priorities.6 So, there is a line to be walked here. If an insight emerges that does not fit current priorities and plans but nonetheless could deliver substantial value to the company, it is incumbent upon data scientists to communicate this to management.

We found that to facilitate exploratory work, bank managers sometimes assigned additional data scientists to project teams. These data scientists did not colocate and were instructed not to concern themselves with team priorities. Instead, they were tasked with building alternative solutions related to the project. If these solutions turned out to be viable, the head of the data science unit pitched them to senior executives. This dual approach recognizes the epistemic interdependence between data science and business professionals — a scenario in which data science seeks to address today’s business needs as well as spot opportunities to innovate and transform current business practices.7 Both roles are important if data science is to realize as much business value as possible.

Mistake 4: Right Tool, Wrong User

Sophia, a business analyst, worked with her team to develop a recommendation engine capable of offering accurately targeted new products and services to the bank’s customers. With assistance from the marketing team, the recommender was added to the bank’s mobile wallet app, internet banking website, and emails. But the anticipated new business never materialized: Customer uptake of the product suggestions was much lower than expected.

To discover why, the bank’s telemarketers surveyed a sample of customers who did not purchase the new products. The mystery was quickly solved: Many customers doubted the credibility of recommendations delivered by apps, websites, and emails.

Still looking for answers, Sophia visited several of the bank’s branches, where she was surprised to discover the high degree of trust customers appeared to place in the advice of relationship managers (RMs). A few informal experiments persuaded her that customers would be much more likely to accept the recommendation engine’s suggestions when they were delivered in person by an RM. Realizing that the problem wasn’t the recommender’s model but the delivery mode of the recommendations, Sophia met with the senior leaders in branch banking and proposed relaunching the recommendation engine as a tool to support product sales through the RMs. The redesigned initiative was a huge success.

The problems Sophia encountered highlight the need to pay attention to how the outputs of analytical tools are communicated and used. To deliver full value for customers and the business, user experience analysis should be included in the data science design process. At a minimum, user testing should be an explicit part of the data science project life cycle. Better yet, the data science practice could be situated within a human-centered design framework. In addition to user testing, such a framework could mandate user research on the front end of the data science process.

While we did not see instances of data science embedded within design thinking or other human-centered design practices in this study, we did find that the shadowing practices described above sometimes functioned as a kind of user experience analysis. As data scientists shadowed other employees to understand the sources of data, they also gained an understanding of users and the channels through which solutions could be delivered. In short, the use of shadowing in data science projects contributes to a better understanding of the processes that generate data, as well as of solution users and delivery channels.

Mistake 5: The Rocky Last Mile

The bank’s “win-back” initiative, which was aimed at recovering lost customers, had made no progress for months. And that day’s meeting between the data scientists and the product managers, which was supposed to get the initiative back on track, was not going well either.

Data scientists Dhara and Viral were focused on how to identify which lost customers were most likely to return to the bank, but product managers Anish and Jalpa wanted to discuss the details of the campaign and were pushing the data scientists to take responsibility for its implementation immediately. After the meeting adjourned without a breakthrough, Viral vented his frustration to Dhara: “If data scientists do everything, why does the bank need product managers? Our job is to develop an analytical solution; it’s their job to execute.”

By the next meeting, though, Viral seemed to have changed his mind. He had made a concerted effort to understand why the product managers kept insisting that the data scientists take responsibility for implementation. He discovered that on several occasions in the past, the information systems department had given the bank’s product managers lists of customers to target for win-back that had not resulted in a successful campaign. It turned out that using the lists had been extremely challenging, partly due to an inability to track customer contacts — so the product managers felt that being given another list of target customers was simply setting them up for another failure.

With this newfound understanding of the problem from the point of view of the product managers, Viral and Dhara added to their project plan the development of a front-end software application for the bank’s telemarketers, email handling teams, branch banking staff, and asset teams. This provided them with a tool in which they could enter information from their interactions with customers and make better use of the lists provided by the data science team. Finally, the project moved ahead.

Viral and Dhara’s actions required an unusual measure of empathy and initiative. They stepped out of their roles as data scientists and behaved more like project managers. But companies probably should not depend on data scientists in this way, and they may not want to — after all, the technical expertise of data scientists is a scarce and expensive resource.

Instead, companies can involve data scientists in the implementation of solutions. One bank in our study achieved this by adding estimates of the business value delivered by data scientists’ solutions to their performance evaluations. This motivated data scientists to ensure the successful implementation of their solutions. The bank’s executives acknowledged that this sometimes caused data scientists to operate far outside their assigned responsibilities. However, they believed that ensuring value delivery justified the redeployment of data science resources, and that it could be corrected on a case-by-case basis if the negative impact on the core responsibilities of data scientists became excessive.

The mistakes we identified invariably occurred at the boundaries between the data science function and the business at large. This suggests that leaders should adopt and promote a broader conception of the role of data science within their companies — one that includes a higher degree of coordination between data scientists and the employees responsible for problem diagnosis, process management, and solution implementation. This tighter linkage can be achieved through a variety of means, including training, shadowing, colocating, and providing formal incentives. Its payoffs are likely to be fewer solution failures, shorter project cycle times, and, ultimately, the attainment of greater business value.

Read more: sloanreview.mit.edu