{"id":82535,"date":"2023-03-31T15:55:21","date_gmt":"2023-03-31T15:55:21","guid":{"rendered":"https:\/\/www.globallogic.com\/uk\/?post_type=insightsection&p=82535"},"modified":"2023-03-31T15:55:21","modified_gmt":"2023-03-31T15:55:21","slug":"mlops-principles-part-one-model-monitoring","status":"publish","type":"insightsection","link":"https:\/\/www.globallogic.com\/uki\/insights\/blogs\/mlops-principles-part-one-model-monitoring\/","title":{"rendered":"MLOps Principles Part One: Model Monitoring"},"content":{"rendered":"
In this two-part blog series, we’ll explore some of the common problems organisations face when trying to productionise ML models. Each blog will define the relevant concepts and discuss popular open-source tools to address them.

This blog will explore the various aspects of model monitoring – why you should implement it in your pipeline and the tools available.
Model monitoring in production is a critical aspect of MLOps which enables organisations to ensure their deployed models are performing as expected and delivering accurate, reliable results. The ability to monitor models in production is crucial for identifying issues (which we’ll cover below), debugging errors, and enabling fast iteration and improvement.
If an ML model is not properly monitored, it may go unchecked in production and produce incorrect results, become outdated and no longer provide value to the business, or develop subtle bugs over time that go undetected. Unlike traditional software applications, ML systems tend to fail silently as the accuracy of the model degrades over time. For example, an ML model designed to predict house prices before the 2008 financial crisis would have produced poor quality predictions during the crisis.
In industries where ML plays a central role, failing to catch these types of issues can have serious consequences – for example, in workflows where important decisions depend on the model’s outputs. These decisions can have a high impact on customers, especially in regulated industries such as banking.
When discussing model monitoring, the first thing that comes to mind is monitoring the performance of a deployed ML model in production by comparing the predictions made by the model against the ground truth. However, this is only the tip of the iceberg.
Broadly speaking, you can monitor your ML models at two levels:

• Functional level – monitoring input and output data and model evaluation performance.
• Operational level – monitoring the resources used by the deployed model and the pipelines involved in creating the model.
In this blog, we’ll be focusing on functional level monitoring and the potential problems that can be detected and remedied by utilising it.

Typically, a Data Scientist or ML Engineer who is familiar with the deployed model and the underlying datasets used for training is responsible for monitoring at the functional level.
<em>Figure 1 – Types of functional level monitoring</em>

The deterioration of an ML model’s performance over time can be attributed to two main factors: <strong>data drift</strong> and <strong>concept drift</strong>. <strong>Data drift</strong> occurs when the distribution of the input data deviates from the data the model was trained on, resulting in poor quality predictions. It can be detected by monitoring the input data being fed into the model and using statistical tests such as the Kolmogorov–Smirnov test, or by using metrics to measure the difference between two distributions.

<strong>Concept drift</strong>, on the other hand, is a change in the relationship between the target variable and the input data – for example, the sudden surge in online shopping sales during the pandemic lockdowns. Concept drift is detected by continuously monitoring the model’s performance over time and the distribution of the model’s prediction confidence scores (only applicable for classification models).

<em>Figure 2 – Illustration of concept and data drift. Original image source: Iguazio</em>
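To make drift detection more concrete, the sketch below compares a reference (training-time) feature sample against a production sample with the Kolmogorov–Smirnov test, and applies the same idea to prediction confidence scores. It is a minimal illustration using synthetic data and an assumed 0.05 significance threshold, not the approach of any specific tool.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Feature values seen at training time (reference) vs. in production (shifted on purpose).
reference_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
production_feature = rng.normal(loc=0.4, scale=1.0, size=5_000)

# Kolmogorov-Smirnov test: a small p-value suggests the two samples come
# from different distributions, i.e. possible data drift on this feature.
statistic, p_value = stats.ks_2samp(reference_feature, production_feature)
if p_value < 0.05:  # illustrative significance threshold
    print(f"Possible data drift (KS statistic={statistic:.3f}, p={p_value:.4f})")

# For classifiers, a shift in the distribution of prediction confidence
# scores can be an early hint of concept drift, even before labels arrive.
reference_confidence = rng.beta(8, 2, size=5_000)  # stand-in for past confidence scores
current_confidence = rng.beta(4, 3, size=5_000)    # stand-in for recent confidence scores
statistic, p_value = stats.ks_2samp(reference_confidence, current_confidence)
if p_value < 0.05:
    print("Prediction confidence distribution has shifted - investigate for concept drift")
```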
<strong>Data quality</strong> is another important factor to consider when discussing model performance. Unvalidated data can potentially result in misleading predictions or cause the model to break as unexpected inputs are given to it. To prevent this, data validation tools can be employed to ensure that incoming data adheres to a data schema and passes quality checks before reaching the model.
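As a minimal sketch of this kind of validation, assuming a hypothetical house-price model with made-up column names, dtypes, and value ranges, a schema check run before each batch is sent to the model could look something like this:

```python
import pandas as pd

# Hypothetical schema: expected dtype and plausible value range per column.
EXPECTED_SCHEMA = {
    "num_rooms": ("int64", 1, 20),
    "floor_area_sqm": ("float64", 10.0, 1000.0),
    "year_built": ("int64", 1800, 2023),
}

def validate_batch(df: pd.DataFrame) -> list:
    """Return a list of data quality issues found in an incoming batch."""
    issues = []
    for column, (dtype, lower, upper) in EXPECTED_SCHEMA.items():
        if column not in df.columns:
            issues.append(f"missing column: {column}")
            continue
        if str(df[column].dtype) != dtype:
            issues.append(f"{column}: expected dtype {dtype}, got {df[column].dtype}")
        if df[column].isna().any():
            issues.append(f"{column}: contains missing values")
        elif ((df[column] < lower) | (df[column] > upper)).any():
            issues.append(f"{column}: values outside expected range [{lower}, {upper}]")
    return issues

# Example: the second row has an implausible number of rooms and gets flagged.
batch = pd.DataFrame({"num_rooms": [3, 42], "floor_area_sqm": [75.0, 120.5], "year_built": [1990, 2005]})
problems = validate_batch(batch)
if problems:
    print("Batch rejected:", problems)
```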
<strong>Inference speed</strong> may also be monitored; this tells us the time it takes for an ML model to make a prediction. Some use cases may require fast inference times due to time-sensitive applications or high volumes of requests.
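A simple way to capture this is to time each prediction call and summarise the latency distribution. The sketch below uses a dummy stand-in for the deployed model and prints percentiles; in practice these measurements would usually be exported to a monitoring system rather than kept in a list.

```python
import time
import numpy as np

class DummyModel:
    """Stand-in for a deployed model; replace with the real predictor."""
    def predict(self, features):
        return sum(features)

def timed_predict(model, features, latencies_ms):
    """Call the model and record how long the prediction took, in milliseconds."""
    start = time.perf_counter()
    prediction = model.predict(features)
    latencies_ms.append((time.perf_counter() - start) * 1000)
    return prediction

model = DummyModel()
latencies_ms = []
for request in ([1.0, 2.0, 3.0] for _ in range(1000)):  # simulated incoming requests
    timed_predict(model, request, latencies_ms)

# Percentiles are usually more informative than the mean for latency monitoring.
p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"p50={p50:.3f}ms  p95={p95:.3f}ms  p99={p99:.3f}ms")
```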
<h4>Tooling</h4>

There are many tools available for model monitoring, including those that are exclusive to AWS (SageMaker Model Monitor), Azure (Azure Monitor), and GCP (Vertex AI Model Monitoring). We’ve selected two open-source Python packages which have stood out to us as feature-rich and actively developed – Evidently AI and NannyML.

<h6>Evidently AI</h6>

Evidently AI evaluates, tests, and monitors the performance of ML models and data quality throughout the ML pipeline. At a high level, there are three core aspects of the package:

1 – Tests. These are performed on structured data and model quality checks, and typically involve comparing a reference and a current dataset. Evidently AI has created several pre-built test suites which contain a set of tests relevant to a particular task. These include data quality, data drift, regression and classification model performance, and other presets.

2 – Interactive reports. These help with visual exploration, debugging, and documentation of the data and model performance. In the same fashion as test suites, Evidently AI has created pre-built reports for specific aspects. If none of the pre-built test suites or reports are suitable for your use case, you can build a custom test suite or report (see the sketch at the end of this section). All pre-built suites and reports can be found on their presets documentation page.

<em>Figure 3 – Example of a data drift report. Image source: Evidently AI</em>

3 – Near-real-time ML monitoring functionality that collects data and model metrics from a deployed ML service. In this aspect, Evidently AI is deployed as a monitoring service that calculates metrics over streaming data and outputs them in the Prometheus format, which can then be visualised using a live dashboarding tool such as Grafana – this functionality is in early development and may be subject to major changes.

On top of all this, Evidently AI provides examples of integrating with other tools in the ML pipeline such as Airflow, MLflow, Metaflow, and Grafana.
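As an illustration of the reports mentioned above, the sketch below generates a data drift report by comparing a reference dataset against a recent production batch. It assumes the `Report`/`DataDriftPreset` API from Evidently releases around the time of writing and hypothetical file paths; module paths and class names may differ in other versions, so check against the version you install.

```python
import pandas as pd
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# reference: data the model was trained/validated on.
# current: a recent production batch to compare against it.
reference = pd.read_csv("reference_data.csv")    # hypothetical file paths
current = pd.read_csv("production_batch.csv")

# Build a report from the pre-built data drift preset and render it as HTML.
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current)
report.save_html("data_drift_report.html")
```

Swapping the metric preset for one of the test presets (for example, building a `TestSuite` with `DataDriftTestPreset`) gives pass/fail checks suitable for automated pipelines instead of a visual report.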
<h6>NannyML</h6>

NannyML is a tool that enables you to estimate post-deployment model performance in the absence of ground truth values, detect univariate and multivariate data drift, and link data drift alerts back to changes in model performance. In use cases where there is a delayed feedback loop (e.g., when estimating delivery ETAs, you’ll need to wait until the delivery has finished to know how accurate the predicted ETA was), this tool can provide immediate feedback on the deployed model’s performance.
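As a rough sketch of how this estimation might be wired up for a binary classification model, loosely following NannyML’s documented CBPE (confidence-based performance estimation) workflow: the column names and file paths below are hypothetical, and exact parameter names can vary between NannyML versions.

```python
import pandas as pd
import nannyml as nml

# reference: a period where ground truth is already known.
# analysis: recent production data whose labels have not arrived yet.
reference = pd.read_csv("reference_period.csv")   # hypothetical file paths
analysis = pd.read_csv("analysis_period.csv")

# Estimate classification performance without ground truth using CBPE.
estimator = nml.CBPE(
    y_pred_proba="predicted_probability",  # hypothetical column names
    y_pred="prediction",
    y_true="actual",
    timestamp_column_name="timestamp",
    metrics=["roc_auc"],
    chunk_size=5000,
    problem_type="classification_binary",
)
estimator.fit(reference)
estimated_performance = estimator.estimate(analysis)

# Inspect or visualise the estimates; plotting helpers vary by version.
# estimated_performance.plot().show()
```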