Written by Zarreen Reza
Women Who Code Talks Tech 10 | Spotify — iTunes — Google — YouTube — Text
Zarreen Reza, AI Research Scientist at Volta Charging and Leadership Fellow at Women Who Code, shares Mistakes to Avoid as an AI Practitioner in the Industry. She discusses the importance of knowing when AI is actually the appropriate solution, the value of domain expertise on a project, and other key factors in successful AI applications.
I’m going to tell you about mistakes to avoid if you want to be an AI practitioner in the industry, especially if you are coming from an academic mindset. Around 90% of the machine learning models we build in companies and research labs don’t make it to production. Only about one in ten data scientists’ AI solutions ends up being part of a product; the other nine get discarded, discontinued, or have to pivot.
I will highlight twelve mistakes that are crucial to avoid if you want to successfully deploy an AI-based solution to production. The first mistake is assuming that every data problem requires an AI/ML solution. In today’s market, AI is a buzzword, and everyone is hyped to try new algorithms on every problem. A common saying in the AI world is that when you have a hammer, everything looks like a nail. When you have AI skills and the resources on your team, you feel like every problem has to be solved with an AI algorithm. That’s not necessarily true. The goal should be to find the most practical and efficient solution for your business use case, and AI algorithms are not always practical for real-world use cases.
Ask yourself and your team this question: are you solving your business problem with the most effective method, while also maximizing savings and profits? Decide on the approach case by case, and don’t buy into all the hype around AI. AI solutions are really good for certain types of problems, but for others, traditional rule-based algorithms still work better.
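One simple habit that makes this case-by-case decision concrete is to score a hand-written business rule on the same validation data you would use for a model. This is a minimal sketch; the fraud rule, the threshold, and the validation set are all hypothetical illustrations, not part of the talk:

```python
# Score a rule-based baseline before reaching for an ML model.
# If a model cannot clearly beat this number, the rule may be the
# more practical solution to ship.

def rule_based_predict(transaction_amount, threshold=1000.0):
    """Flag a transaction as suspicious with a hand-written business rule."""
    return transaction_amount > threshold

def accuracy(predict_fn, labeled_examples):
    """Fraction of examples where the predictor matches the label."""
    correct = sum(predict_fn(x) == y for x, y in labeled_examples)
    return correct / len(labeled_examples)

# Hypothetical validation set: (amount, is_fraud)
validation = [(50.0, False), (1200.0, True), (30.0, False), (5000.0, True)]

baseline_accuracy = accuracy(rule_based_predict, validation)
```

Whatever metric matters for the business (accuracy here, but it could be cost saved or recall) should be computed identically for the rule and for any candidate model, so the comparison is fair.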
Mistake number two is thinking that domain knowledge can always be compensated for by data science skills. This is a misconception. It’s tempting to think we can let the machine learning algorithm figure everything out by itself, but domain expertise is crucial in data science, more so than in many other technical roles. Domain knowledge must be pervasive throughout the data science methodology: from data collection and pre-processing to model development and validation, plugging in domain expertise saves time and resources. Otherwise, we end up spending hours cleaning and processing the data.
Domain experts save a lot of time and resources. They can help navigate the problem definition as well: whether it’s a medical, finance, or legal domain, the domain experts have a better understanding of what problems exist and what type of solutions people are looking for. They can help enormously in cleaning and preprocessing messy data, so you can overcome challenges quickly. A domain expert can also help validate the results your model is producing; run frequent check-ins with them to make sure the output makes sense. They can also help you understand the client’s needs better. One important point: data science skills and domain expertise don’t have to reside in the same person. It doesn’t have to be a data scientist who comes with domain expertise.
Mistake number three is ignoring AI ethics, bias, and privacy considerations. It’s crucial to address ethical and privacy concerns as early as possible. Concerns about privacy, bias, inequality, safety, and security surround AI solutions these days, especially in fields like medicine or finance, sensitive areas where people don’t trust AI or algorithmic solutions as much as they trust a human decision maker. It is important to think through ethical and privacy concerns precisely because of these trust issues. AI models should not favor or disadvantage a certain group. If you neglect these aspects before starting model development, they can lead to significant financial, brand, reputational, and human risk.
When we are doing the data collection, it’s really important that the data is diverse enough to reflect the data the model will see in the real world. You should also think through the possible ways the model could be exploited or used for or against a subgroup. This type of analysis is important. Many experiments show that even if you only release a trained model to the public, there are ways to reverse-engineer it: you can backtrack from the model’s weights to identifiable information hidden in the training data. That can be pretty scary in certain fields. There are multiple tools to validate and diagnose these problems.
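A lightweight starting point for this kind of subgroup analysis is to break model performance down by group and look at the gap. A minimal pure-Python sketch, where the group labels, predictions, and the size of an "acceptable" gap are hypothetical:

```python
# Compute per-subgroup accuracy from (group, prediction, label) records.
# A large gap between groups is a signal to investigate bias further.

from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, prediction, label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, prediction, label in records:
        total[group] += 1
        correct[group] += int(prediction == label)
    return {g: correct[g] / total[g] for g in total}

# Hypothetical evaluation records
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]
per_group = accuracy_by_group(records)
gap = max(per_group.values()) - min(per_group.values())
```

In practice you would run the same breakdown for precision, recall, and error rates, since accuracy alone can hide disparities.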
Mistake number four is thinking that complex models are better than simpler models. That’s not necessarily true, for several reasons. William of Ockham’s famous principle, known as Ockham’s Razor, suggests that simpler models with fewer coefficients should be preferred over more complex ones. This very much holds in today’s machine learning development, where state-of-the-art models are not always feasible for practical use cases. If your product needs to run a big trained model on a small device, that’s not a practical solution. Simpler models are also easier to debug and let you run experiments faster. When you have a very large, complex model where one training run takes five or six days, it’s really difficult to try out many combinations; you lose time. Simpler models are also much cheaper to deploy and scale, while complex models require more computational resources and cost more.
Have a baseline model to compare against. Your first, simplest model can serve as the baseline; when you move on to more complex models, you can easily measure the improvement you have obtained by adding the extra complexity. Also, try to avoid ensemble models. Research papers often present solutions based on ensembling: when you ensemble multiple models and average their predictions, they tend to perform better than a single model, since multiple models are at work on the same problem. These are good for academic papers, because you can surpass the state-of-the-art result and set a new benchmark. However, if you want to deploy them to production, this is not an efficient or practical approach. Instead of one model, you may need to deploy five different models; those five models have to make their predictions in parallel, then you have to aggregate them in some way before providing the inference. There is a lot of time and memory complexity there.
You need to register all five models on your device before it can start making predictions. So try to avoid ensemble models whenever possible, and consider the memory and time complexity requirements for inference. Training can be done on a more resource-intensive platform: you have GPUs in your infrastructure, you train your model there, then you package the model and ship it to another platform for inference. It’s really important to remember that inference needs to be faster, and it might need to be CPU-compatible. Think of training and inference separately: what are the infrastructure requirements for training a model versus making inferences with it? They can be different. A model can be slow to train but fast at inference.
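One concrete way to treat the two separately is to benchmark the inference path on its own, on hardware similar to the deployment target. A minimal sketch, where `predict` is a hypothetical stand-in for a packaged model’s forward pass:

```python
# Benchmark average per-prediction latency of an inference function.
# Compare the result against the product's latency budget before deploying.

import time

def predict(features):
    """Hypothetical stand-in for a packaged model's inference step."""
    return sum(f * 0.5 for f in features)

def mean_inference_latency(predict_fn, batch, repeats=100):
    """Average wall-clock seconds per prediction over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        for features in batch:
            predict_fn(features)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(batch))

batch = [[1.0, 2.0, 3.0]] * 10
latency = mean_inference_latency(predict, batch)
```

The same harness works whether `predict` wraps an ensemble of five models or a single distilled model, which makes the cost of ensembling directly visible.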
Mistake number five is treating your AI solution as a black box. This is a very common scenario: it’s difficult to interpret the results and understand what’s going on inside. When you transfer the model to people who are not as well-versed in AI, transparency, explainability, and interpretability become crucial to earning their trust. Our mindset needs to transition from black-box to glass-box modeling. By glass box I mean we have to peek inside the model and see what’s going on as much as possible; there will always be some things that are difficult to interpret. Explainability goes a long way toward building trust in AI-made decisions, and interpretability of results helps with debugging and identifying potential harms. With a black-box model, all you know is that inputs go in and outputs come out; when you are not getting the expected outputs, it’s difficult to debug and understand why. If you have some interpretability analysis of the results, it becomes much easier to debug and fix the model. It also supports ethical decision-making.
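One widely used glass-box technique is permutation importance: shuffle one feature and measure how much accuracy drops. The model below is a hypothetical stand-in (it only looks at feature 0), but the procedure is applied the same way to any predictor:

```python
# Permutation importance: shuffle a feature column and measure the drop
# in accuracy. Features the model ignores get an importance near zero.

import random

def model(x):
    """Hypothetical stand-in model that only uses feature 0."""
    return 1 if x[0] > 0.5 else 0

def accuracy(model_fn, X, y):
    return sum(model_fn(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(model_fn, X, y, feature_index, seed=0):
    rng = random.Random(seed)
    shuffled_column = [x[feature_index] for x in X]
    rng.shuffle(shuffled_column)
    X_shuffled = [
        x[:feature_index] + [v] + x[feature_index + 1:]
        for x, v in zip(X, shuffled_column)
    ]
    return accuracy(model_fn, X, y) - accuracy(model_fn, X_shuffled, y)

X = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.3], [0.1, 0.9]]
y = [1, 0, 1, 0]
# Feature 1 is ignored by the model, so shuffling it changes nothing.
importance_f1 = permutation_importance(model, X, y, feature_index=1)
```

Because it treats the model purely as a function, this check works on a deep network just as well as on a linear model, which is what makes it useful for peeking inside otherwise opaque solutions.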
Decompose a complex model into simpler segments. Having simpler components comes in handy for these explainability analyses: if you have a big AI model, try to decompose it into simpler pieces so it’s easier to debug, analyze, and interpret its results. Put constraints on the model to add a new level of insight. By restricting the model at first so that it doesn’t go in multiple directions, you narrow the search space: you don’t have to try every combination to analyze the predictions, and you have fewer parameters to deal with. Run both quantitative and qualitative analyses of the predictions; quantitative analysis means things like computing a precision-recall curve.
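For the quantitative side, precision and recall at a single threshold are the building blocks of that precision-recall curve. A minimal pure-Python sketch with hypothetical scores and labels:

```python
# Precision and recall at one decision threshold. Sweeping the threshold
# over the sorted scores traces out the full precision-recall curve.

def precision_recall(scores, labels, threshold):
    predictions = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(predictions, labels))
    fp = sum(p and not l for p, l in zip(predictions, labels))
    fn = sum((not p) and l for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical model scores and ground-truth labels
scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, False, True, False]
p, r = precision_recall(scores, labels, threshold=0.5)
```

The qualitative counterpart is just as important: manually inspecting a sample of the predictions that fall near the threshold often reveals failure modes the curve cannot show.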
Mistake number six is emphasizing model-centric AI over data-centric AI. Data-centric AI puts a heavier focus on tuning the data rather than tuning the model. We spend a lot of time on trial and error and hyperparameter tuning to come up with the best model and make it learn the most useful things, but it’s not always worth the effort to put a lot of resources or time into inventing a new algorithm. We don’t always need to reinvent the wheel; sometimes off-the-shelf AI models work just fine for your problem.
Mistake number seven is treating AI model development projects the same as software development projects. I have seen this mistake first-hand, along with its repercussions. Software engineering and machine learning differ in their execution: software developers automate tasks by writing programs, whereas machine learning engineers and data scientists try to make the computer find a program that fits the data. Machine learning entails exploration, experimentation, and uncertainty, and it needs to be robust to variability in dynamic conditions. While planning the model development, keep a buffer for uncertainty. Allocate time and resources for research. Be prepared for failed experiments and inconclusive outcomes, and be prepared to pivot. In machine learning there are a lot of uncertainties and variabilities; things can go wrong, or go in ways we’re not expecting.
Mistake number eight is not adopting test-driven development in data science when required. Test-driven data science is a concept that is getting quite popular these days. The TDD cycle has three stages: red, green, and refactor. In red, you write a test case that fails. In green, you write the code that passes that test. In refactor, you get rid of redundancies, modularize, and clean your code to make it reusable. Writing these test cases forces you to think about what scenarios users might create later on and what variations you can expect in the input data. It also helps modularize your code base into functions and prepares it for future expansion: if your model might later need to be extended to new types of features or data sets, a TDD-based workflow will help you do that. Write test cases for modules covering data variation, exception handling, and anomaly detection. Have your code reviewed by fellow data scientists, and encourage collaborative coding and the sharing of ideas among them. It helps data scientists think outside the box and think more clearly.
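The red-green-refactor loop maps naturally onto data-cleaning code. A minimal sketch, where `clean_ages` and its edge cases are hypothetical; in the red stage you would write `test_clean_ages` first and watch it fail before implementing the function:

```python
# A data-cleaning function developed test-first, with tests covering
# data variation (numeric strings), anomalies (None, "x"), and
# exception handling (out-of-range values).

def clean_ages(raw_ages):
    """Keep plausible ages, coerce numeric strings, drop everything else."""
    cleaned = []
    for value in raw_ages:
        try:
            age = float(value)
        except (TypeError, ValueError):
            continue  # anomaly: non-numeric entry, skip it
        if 0 <= age <= 120:  # data-variation guard: reject impossible ages
            cleaned.append(age)
    return cleaned

def test_clean_ages():
    assert clean_ages(["42", 7, None]) == [42.0, 7.0]  # coercion + anomaly
    assert clean_ages([-3, 200, 65]) == [65.0]         # out-of-range guard
    assert clean_ages([]) == []                        # empty input

test_clean_ages()
```

Once the tests pass (green), the refactor stage can restructure the function freely, because the tests pin down the behavior that must not break.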
We have already covered eight of the mistakes; there are four more, which I won’t go into in too much detail. Mistake number nine is skipping the proof-of-concept and MVP stages and jumping directly to production. Mistake number ten is not defining the infrastructure requirements before and during model development. Number eleven is moving to production before a pilot run and expert evaluation. Number twelve is having no plan for systematic model monitoring. This is very important: if you don’t monitor your model, data drift will occur while the model is in production, and if there is no mechanism for continuous monitoring, you will not know when the model has started deviating from its original performance.
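One simple, widely used statistic for that monitoring plan is the population stability index (PSI), which compares a production distribution of a feature or model score against the training-time baseline. A minimal sketch; the bins below and the 0.2 alert threshold are common conventions, not hard rules:

```python
# Population stability index (PSI) over pre-binned distribution fractions.
# PSI near 0 means the distributions match; larger values indicate drift.

import math

def psi(expected_fractions, actual_fractions, eps=1e-6):
    """PSI between a baseline and an observed binned distribution."""
    total = 0.0
    for e, a in zip(expected_fractions, actual_fractions):
        e = max(e, eps)  # floor to avoid log(0) on empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total

# Hypothetical binned score distributions
baseline = [0.25, 0.25, 0.25, 0.25]  # at training time
current = [0.10, 0.15, 0.25, 0.50]   # observed in production
drift_score = psi(baseline, current)
# Rule of thumb: PSI above roughly 0.2 suggests drift worth investigating.
```

Computing this on a schedule for the model’s inputs and output scores, and alerting when it crosses a threshold, is a minimal version of the continuous monitoring the talk calls for.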