Why do most ML models never see the light of day?

When I graduated and started my Data Science job in 2019, Data Science was (and still is) a buzzword that everyone was excited about. Every other person you met wanted to be a Data Scientist, and every company wanted one. The trend is still going up, but after three years in the field I am more skeptical about the hype than ever before.

When I hear someone say they want to be a Data Scientist, I ask them whether they really know what a Data Scientist does, apart from the glamorous bits. When I see a company adopting Data Science, I question whether, at this point in its journey, it even needs a Data Science team, or whether its money would be better invested somewhere else.

Even if you get past these hurdles, are you really going to be successful in using Data Science in the real world? Do the models you build get deployed and used?

In my experience, most ML models don't get deployed. There is a huge gap between the models that Data Scientists build in POCs and the ones that actually get to the finish line. But why is this gap so huge?

I did my own research to answer this. After talking to a few colleagues and other connections in the industry, here is what I found:

Companies that get into AI because of the hype

Even if you are a well-intentioned person trying to solve a real-world problem with Data Science, you end up doing no good if you work for a company that hired a Data Scientist or two because of the hype. Companies do this to attract clients, and they attach terms such as "AI first" or "powered by AI" to themselves. Since they are not really serious about it, putting an ML model into production is never a priority for them.

Dependency on Tech to integrate and deploy these models

As Data Scientists, we are concerned with writing pipelines to collect and prepare data, running experiments and, if a model works well in validation, writing a Flask API to be consumed by the Software Engineering team. Now, no matter how good a Data Scientist you are, you won't be able to integrate this API into the main product or service used by the end user. That is something we are just not experts in.
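To make that handoff concrete, here is a minimal sketch of the request and response contract such an API might expose. It uses only the standard library so it runs anywhere; in a Flask app, the body of `handle_predict` would sit inside a route function. The feature names, the `dummy_model` stand-in and the response fields are all assumptions made up for illustration, not a real schema.

```python
import json

# Hypothetical feature names; a real model defines its own input schema.
REQUIRED_FEATURES = ("age", "income")


def dummy_model(features):
    """Stand-in for a trained model: returns a score of 0.0 or 1.0."""
    return 1.0 if features["income"] > 50000 else 0.0


def handle_predict(raw_body):
    """Turn a JSON request body into a (status_code, JSON response body) pair."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body is not valid JSON"})

    if not isinstance(payload, dict):
        return 400, json.dumps({"error": "body must be a JSON object"})

    # Report every missing or non-numeric feature so the caller can fix it in one go.
    bad_fields = [
        name
        for name in REQUIRED_FEATURES
        if not isinstance(payload.get(name), (int, float))
    ]
    if bad_fields:
        return 400, json.dumps({"error": "missing or non-numeric features", "fields": bad_fields})

    return 200, json.dumps({"score": dummy_model(payload)})


if __name__ == "__main__":
    print(handle_predict('{"age": 30, "income": 60000}'))
    print(handle_predict('{"age": 30}'))
```

Agreeing on a contract like this early (inputs, outputs, error codes) gives the engineering team something stable to integrate against, even while the model behind it keeps changing.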

In small to mid-sized companies, Tech is usually focussed on keeping things available and stable. They want to maximise uptime. As a result, putting DS models into production isn't really a priority for them.

Unrealistic Business Expectations

Sometimes the business itself becomes a hurdle for ML models going into production by setting unrealistic expectations. It is okay to expect 99% accuracy from a loan eligibility prediction model, but the same is not needed from a text classifier on a news website. It is important to find the sweet spot between what DS models can realistically achieve and what the business can manage with. A model with 70% accuracy is still better than having no model in place and putting 100 hours of manual labor into a task.

This is where Product Managers come in. They coordinate with both Business and Data Science to arrive at requirements that both parties agree on, and they prepare a Product Requirements Document (PRD) that people can refer to in case of confusion. This also saves many of the iterations that Data Scientists go through to get a model approved for production by leadership and business.

Complexity and Latency Issues

Now, not every reason an ML model doesn't go into production is somebody else's fault. We as Data Scientists fail a lot more than we would like to admit. We are so starstruck by every cool state-of-the-art model that comes into the limelight, be it GPT (and all its variants), BERT or GANs, that we forget something as basic as time and space complexity. Yes, this is not just a theoretical concept. The performance of the model in validation and testing is not the only metric by which we should judge it. If it will eventually be used for near real-time predictions, space and time constraints are really important. Therefore, every Data Scientist should have some basic Software Engineering skills in their toolbox.
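One basic habit that helps here is measuring prediction latency before proposing a model for production. The sketch below times repeated calls to a prediction function and reports the median and 95th-percentile latency. The `slow_model` function and the idea of a fixed latency budget are hypothetical placeholders; the real budget should come from whoever owns the product.

```python
import statistics
import time


def measure_latency(predict, sample, runs=200, warmup=10):
    """Return median and 95th-percentile latency of predict(sample) in milliseconds."""
    # Warm-up calls are discarded so one-off setup costs do not skew the numbers.
    for _ in range(warmup):
        predict(sample)

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(timings_ms, n=20)[-1]
    return {"p50_ms": statistics.median(timings_ms), "p95_ms": p95}


def within_budget(stats, budget_ms):
    """Check the tail latency, not the average, against the agreed budget."""
    return stats["p95_ms"] <= budget_ms


def slow_model(text):
    """Hypothetical stand-in for an expensive model call."""
    return sum(ord(ch) for ch in text * 1000) % 2


if __name__ == "__main__":
    stats = measure_latency(slow_model, "some input text")
    print(stats, "within 50 ms budget:", within_budget(stats, budget_ms=50.0))
```

Looking at the 95th percentile rather than the mean is a deliberate choice: a few slow requests are usually what users and on-call engineers notice first.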

Communication gap

I have observed that Data Scientists, and people in tech in general (myself included), have such poor communication skills that they are often unable to convince stakeholders to use their models. Good communication skills are so underrated in the industry that people hardly focus on improving them. Online courses on Data Science won't teach you this, nor will anyone in college. It is something you just have to learn on your own (or maybe from people in marketing).

What happens after these models are deployed in production?

Even if you are able to overcome the problems mentioned above, deploying an ML model is not the end of the line. Generally, the performance of a model goes down with time, in some cases more rapidly than in others. Both the model and the data have to be monitored for feature or data drift. If the distribution of the features you trained your model on changes, or the relationship between them changes, it is called data drift. This could happen because the way the data is collected has changed, because there is a bug in the process, or because of a natural drift, such as a sentiment classifier seeing more and more polarised tweets before an election.

Now, continuously checking for drift by hand is a tedious process. Therefore, companies set up pipelines to detect data drift and retrain models.
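As a rough illustration of what one step of such a pipeline might compute, the sketch below implements the Population Stability Index (PSI), one common way to compare the distribution of a numeric feature at training time with its distribution in production. This is my choice of metric for the example, not the only option. The bin count and the 0.2 alert threshold are rule-of-thumb assumptions and should be tuned per feature.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant reference feature; avoid dividing by zero

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def needs_retraining(psi_value, threshold=0.2):
    """Assumed rule of thumb: treat a PSI above the threshold as significant drift."""
    return psi_value > threshold


if __name__ == "__main__":
    random.seed(0)
    training = [random.gauss(0.0, 1.0) for _ in range(5000)]
    production = [random.gauss(1.5, 1.0) for _ in range(5000)]  # simulated shift
    value = psi(training, production)
    print("PSI:", round(value, 3), "retrain:", needs_retraining(value))
```

A scheduled job could run a check like this per feature on each new batch of production data and trigger an alert or a retraining run when the threshold is crossed.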

Conclusion

I hope I have fairly highlighted the barriers that Data Scientists face, not only in putting models into production but also in making sure they remain useful after being deployed. Data Science is an emerging field and, as in any other science, a majority of the experiments fail. As Data Scientists, we should be prepared for that. However, there are also some institutional and industry-wide problems that need to be solved at a larger scale.

While the field is glamorised a lot, it's also important to look at it through a realistic lens, especially if you are thinking of starting your journey in Data Science. That is what I hoped to do in this article. If there are any other barriers you have faced, please highlight them in the comments below.