Computer vision (CV) is one of the most mature domains in artificial intelligence (AI). However, building and operationalizing robust, end-to-end computer vision pipelines is challenging in the modern AI ecosystem.
Why? Because production-deployed computer vision models show arbitrary behavior due to several factors related to data, model design, architecture, and deployment.
In the ML ecosystem, MLOps (Machine Learning Operations) is a set of practices to track and monitor the performance of the entire ML pipeline using processes such as CI/CD (Continuous Integration/Continuous Delivery) and CT (Continuous Training).
Mainly, MLOps automates data validation, model validation, and model retraining cycle in a production environment. Moreover, it offers a monitoring mechanism that alerts the development team if the model’s performance deteriorates.
However, in the computer vision ecosystem, regular MLOps practices are not optimal due to the following reasons:
- Data is unstructured, and data manipulation is done differently.
- Edge cases or under-represented classes can impact the model's performance more.
- CV is more data and hardware intensive than other ML domains.
Computer vision pipelines require a set of processes that are exclusive to Computer Vision. This is where CVOps–a fusion of MLOps and computer vision is more suited.
In this article, we’ll have a detailed discussion about why classical MLOps tools are unsuitable for computer vision tasks. We’ll also discuss how a CVOps pipeline is better equipped to streamline your entire CV stack.
How Is Data Manipulation Different in Computer Vision?
Data comes in all shapes and sizes, including structured, unstructured, and semi-structured. However, most of the enterprise data is unstructured.
In fact, a report suggests that more than 80% of data is unstructured, which includes images, videos, emails, web server logs, and more. But only 18% of organizations leverage this data–a Deloitte report suggests.
In computer vision, the majority of data is unstructured in the form of images and videos. Hence, exploring and visualizing data is difficult compared to structured data. Imagine browsing through one million unstructured images compared to one million rows of structured textual data stored in a .csv file or relational database. Moreover, images and videos require more storage capacity and processing power than structured data.
Furthermore, collecting and preparing labeled image and video datasets often require time-intensive manual intervention. Often there is not enough visual data to support CV model training. Other times the data requires hand annotation, which is labor-intensive and costly.
Similarly, filtering images and videos is a challenge. Just imagine creating a dataset of seasonal flowers from a large collection of plant and flower images without any labels or tags. Enterprises employ metadata management tools to tag and categorize images to make them searchable and accessible. This, again, requires pricey human intervention.
Rather than an MLOps tool, a robust CVOps tool solves all these problems by providing a centralized and integrated solution that covers data collection, storage, manipulation, filtering, and more.
Model Validation Needs to Be Focused on Edge Case Visualization
Model validation determines if the model is generalized enough to cater to all input scenarios. However, in the real-world, unlikely situations often occur. For instance, an autonomous car is trained for regular road scenarios. How would the vehicle respond if there was an unexpected event? Like a high-speed car chase by five police cars, sudden rain, and thunderstorm that changes the driving conditions. Or what if a person comes in front of the car by accident? These are known as edge cases which are unlikely and unusual but have a non-zero probability of occurring.
Besides the unpredictability of input scenarios, edge cases occur due to bias and variance in the model. The model is either too simple or too inexperienced to handle the incoming real-world data.
In computer vision, edge cases have a significant impact on the performance of the model. Edge cases in computer vision are more subtle as they’re often related to ambiguous visual aspects. Human annotators can sometimes fail to identify edge cases resulting in mislabeling. In images, edge cases can also occur if two objects are entirely different in their nature and properties but look visually similar. Take a look at the image below.
The only similarity between a dog and fried chicken is that humans love them–one as a pet and the other as food. These kinds of visually similar images jeopardize the performance of computer vision models. In mission-critical applications, such similarities can prove fatal. Even one in ten misspredictions could result in loss of revenue or mission failure.
Complex CV tasks require tools to visualize image data to effectively identify edge cases in order to avoid any real damage. A robust CVOps platform can provide the necessary tools to handle such edge cases.
Monitoring Needs to Be Focused on Data and Hardware
Computer vision tasks are data and hardware-intensive. Classical MLOps strategy does not offer detailed monitoring of production-deployed CV models because it is not equipped to monitor them.
In particular, computer vision monitoring includes tracking hardware devices like CCTV cameras, medical scanning and imaging instruments, and other edge devices. Each device can have specific configurations that might affect the performance of the CV model. For instance, the orientation of cameras installed on a roadside for traffic analytics might change due to wind or other environmental factors, causing the model to capture inaccurate input images for which the model was not trained.
To monitor such changes, you might require out-of-the-box or vision-specific metrics that differ from classical ML performance metrics. Some of these CV monitoring metrics include:
- Input image width and height
- Image ratio distribution
- Inference time
- AE outlier score
- KS drift
Besides these, the development team should be able to monitor various distribution drifts such as data drift, concept drift, domain drift, prediction drift, and upstream drift. A robust CVOps model offers monitoring capabilities that are often missing in regular MLOps tools.
We have discussed various reasons why regular MLOps pipelines are not suitable for computer vision applications. And lastly, we introduced CVOps as an appropriate solution. Now, let’s demystify CVOps in detail.
In Comes CVOps–Specialized MLOps for Computer Vision
If you are wondering, what is CVOps, exactly? Simply put, it is a combination of MLOps and computer vision, similar to how MLOps is a combination of DevOps practices tailored for machine learning tasks.
CVOps curates enterprise best practices, business processes, and tools and employs CV-specialized experts for building production-ready computer vision products. The product environment usually includes a camera or edge device installed in a real-world setting–for instance, cameras installed on the roadside for road safety and traffic monitoring systems.
In addition, CVOps tools specialize in tracking the behavior of data and models over time. They can track metrics and KPIs to identify distribution drift and device calibration issues. A CVOps tool can notify your development team if the threshold values of these metrics are reached, enabling them to deliver timely data and model updates.
Moreover, a robust CVOps strategy includes factors such as data and model scalability and the costs associated with it. It includes managing multiple edge devices and implementing solutions for each device simultaneously.
An enterprise CVOps strategy is also concerned with tracking and maintaining customer satisfaction over time. If the model performance deteriorates, the CVOps strategy should provide quick resolution of any related issues to ensure customer success.
Build Robust CVOps Pipelines With Picsellia
General MLOps tools are suitable for most machine learning and deep learning tasks. However, CVOps tools are more capable of solving the unique challenges of vision-powered systems. They can manage, manipulate, analyze, annotate, and validate image and video data. They can train, validate, retrain, monitor, and release CV pipelines in a production environment.
Depending on business requirements, CV teams can use packaged enterprise CVOps tools or build their custom CVOps strategy. If you opt for a pre-built platform, look no further than Picsellia.
- Centralized dataset management for unstructured data like images and videos
- Integrated image labeling and model training tools
- Experiment tracking fine-tuning and obtaining the best performing model
- Production-ready serverless model deployment
- Real-time CV model monitoring with edge case identification
- ML automated workflows covering each component of the CV pipeline.
Moreover, Picsellia offers supervised and unsupervised metrics to identify any drift and monitor overall CV workflows.
Start using Picsellia’s CVOps platform for free. Book your trial today!