How to Integrate Picsellia into a Hugging Face Training Workflow

Training computer vision models at scale requires a robust computer vision operations (CVOps) workflow. As training scales, custom workflows tend to bottleneck on traceability and repeatability, and building those capabilities into a custom training workflow yourself can be costly in time and resources.

A CVOps platform like Picsellia provides the tools to manage the entire CV development lifecycle, including large-scale model training. You have two options when leveraging Picsellia for training. You can re-develop your entire training workflow to use Picsellia's workflows and infrastructure, or you can integrate Picsellia into your existing training workflow and keep using your own infrastructure. The integration option gives you the best of both worlds: no platform lock-in and no obligation to migrate your existing training workflows to the platform. The trade-off is that it creates two points of failure for the training workflow and multiple data locations, since the training information generated in your local training environment (on your infrastructure) also exists on Picsellia's platform.

Integrating a program with a platform generally requires a carefully designed pipeline, because the program depends on specific requirements. In this case, you need to account for metrics logging, training artifacts, model parameters, and the training framework.

Prerequisites

To follow along with this article's demo comfortably, you must have the following:

An active Picsellia account.
The Picsellia SDK installed on your machine; see how to install it here.
HuggingFace installed on your machine; see how to install it here.
Picsellia's requirements for an integration; see requirements here.
A basic-to-intermediate understanding of experiment tracking.

This tutorial will explain the integration of Picsellia into a DETR transformer training script that uses the HuggingFace framework.

In this tutorial, you will:

Initiate an experiment in Picsellia.
Pull the datasets associated with the experiment from Picsellia.
Train the model and log the results to Picsellia.
Send evaluations to Picsellia's evaluation platform.
Store artifacts in Picsellia's artifact store.
Store the retrained model in the Picsellia model registry.
Dockerize the code.

You can find all the code for this tutorial here.

Before getting into the details of the integration, here is a brief overview of the model and framework used for the task.

DETR (DEtection TRansformer) is an object detection model with a transformer-based architecture, developed using PyTorch. It comprises an image processor and an object detection model. The image processor encodes the data by converting annotations to the DETR format, then resizes and normalizes both the images and annotations. The object detection model then carries out the detection on the encoded data. DETR can be trained (fine-tuned) using native PyTorch, the HuggingFace Trainer 🤗 API, HuggingFace Accelerate, or any other framework you prefer. For this tutorial, you will use the DETR model directly from the HuggingFace Transformers library and then retrain it using the HuggingFace Trainer API.

It's time to retrain the DETR model and integrate Picsellia into your training workflow.

Initiate a training experiment in Picsellia

To get started, create an experiment on Picsellia to manage your training run. You can create an experiment directly on the Picsellia platform through the user interface or with the SDK. To use the user interface, go to Projects and select a project. Within that project, open the Experiment tab and click the New Experiment button in the top right corner. A new window containing four information sections will open. In the General Information section, give the experiment a name (train_dataV_experiment) and a description. The other three sections are optional. In the Datasets Versions section, add only the datasets you want to use and give them aliases. There is no need to fill out the Base Architecture and Hyper-parameters sections since the model you will train isn't available on Picsellia. Click the Create button to create the experiment.


Pull the datasets associated with the experiment from Picsellia

In your local environment, initialize a connection to Picsellia from the training script using the Picsellia SDK and get the Picsellia experiment you created.
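
As a minimal sketch of this step, the helper below connects to Picsellia and fetches the experiment by name. It assumes the SDK's `Client`, `get_project`, and `get_experiment` calls behave as described in the SDK documentation; the `client_factory` parameter and the project name in the usage comment are hypothetical additions for illustration.

```python
def get_experiment(project_name, experiment_name, api_token,
                   organization_name=None, client_factory=None):
    """Connect to Picsellia and return the (client, experiment) pair."""
    if client_factory is None:
        # Deferred import: the helper can be read and tested without the SDK.
        from picsellia import Client
        client_factory = Client
    client = client_factory(api_token=api_token, organization_name=organization_name)
    project = client.get_project(project_name)
    experiment = project.get_experiment(experiment_name)
    return client, experiment


# Hypothetical usage (project name and token source are placeholders):
# client, experiment = get_experiment(
#     "my-project", "train_dataV_experiment", api_token=os.environ["PICSELLIA_TOKEN"]
# )
```

Keeping the token out of the script (for example, in an environment variable) also makes the same code reusable once it is packaged in a container.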

The dataset versions (training and evaluation datasets) you attached earlier to the experiment are part of sample datasets on Picsellia. Since they are object detection datasets, they are well suited for the task. Download the training dataset versions attached to your experiment.
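
The download could look roughly like the sketch below. It assumes the dataset versions were attached with the aliases `train` and `eval` (hypothetical names; use the aliases you chose) and that `experiment.get_dataset` and `DatasetVersion.download` are available in your SDK version.

```python
from pathlib import Path


def download_dataset_versions(experiment, aliases=("train", "eval"), root="data"):
    """Download each attached dataset version into root/<alias>."""
    versions, folders = {}, {}
    for alias in aliases:
        # The alias is the one given when the version was attached to the experiment.
        version = experiment.get_dataset(alias)
        folder = Path(root) / alias
        folder.mkdir(parents=True, exist_ok=True)
        version.download(str(folder))
        versions[alias], folders[alias] = version, folder
    return versions, folders
```

Returning the local folders alongside the dataset version objects makes it easier to match local images to Picsellia assets later, during evaluation.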


Get the label names for the training dataset version from its label objects.


You will also need the annotations for the images in the training dataset version. For object detection datasets, Picsellia lets you download dataset annotations in the three most commonly used annotation formats: COCO, PASCAL VOC, and YOLO.

Build the annotations for the training images in COCO format using the training dataset version object and the extracted label names. Write the annotations to a JSON file in your local directory, then read that JSON file as a COCO file using the COCO library.
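
One possible shape for this step is sketched below. It assumes the SDK's `list_labels` and `build_coco_file_locally(enforced_ordered_categories=...)` calls, and that the returned object is a pydantic model exposing `dict()` or `model_dump()` depending on the installed version; treat those details as assumptions to check against your SDK. The function name and default file name are hypothetical.

```python
import json


def export_coco_annotations(dataset_version, json_path="annotations.json"):
    """Build COCO annotations for a dataset version and write them to json_path."""
    label_names = [label.name for label in dataset_version.list_labels()]
    coco_file = dataset_version.build_coco_file_locally(
        enforced_ordered_categories=label_names
    )
    # Prefer model_dump() when the object offers it, otherwise fall back to dict().
    to_dict = getattr(coco_file, "model_dump", None) or coco_file.dict
    coco_dict = to_dict()
    with open(json_path, "w") as f:
        json.dump(coco_dict, f)
    return label_names, coco_dict


# Reading the file back with the COCO library (requires pycocotools):
# from pycocotools.coco import COCO
# coco = COCO("annotations.json")
```

Passing the label names as the enforced category order keeps category ids stable between runs, which matters once the label map is logged.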

Create a label map of the labels and their respective “ids” from the COCO annotation categories.


Log the label maps to your Picsellia experiment.
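
The two steps above can be sketched as follows. The input is the `categories` list of the COCO annotations; the log name `labelmap` and the `table_type` argument (expected to be `LogType.TABLE` from `picsellia.types.enums`) are assumptions about how the experiment expects a label map to be logged.

```python
def build_label_maps(coco_categories):
    """Return (id2label, label2id) from a list of COCO category dicts."""
    id2label = {int(cat["id"]): cat["name"] for cat in coco_categories}
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def log_label_map(experiment, id2label, table_type):
    """Log the label map as a table, with string keys so it serializes as JSON."""
    experiment.log("labelmap", {str(k): v for k, v in id2label.items()}, table_type)


# Hypothetical usage:
# from picsellia.types.enums import LogType
# id2label, label2id = build_label_maps(coco_dict["categories"])
# log_label_map(experiment, id2label, LogType.TABLE)
```

The same `id2label` and `label2id` dictionaries can later be passed to the model when it is loaded, so the model's class names match the dataset's.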


Train the model and log the results to Picsellia

Since you now have all the images, labels, and annotations you need for training in your local environment, you can then train the model on your infrastructure and log the training results to Picsellia for traceability.

DETR's image processor expects the dataset annotations to be in this format: {'image_id': int, 'annotations': List[Dict]}, where each dictionary is a COCO object annotation. Therefore, reformat the annotations for each image before passing them to the image processor.
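
A minimal sketch of that reformatting, assuming `coco_dict` is the COCO annotation dictionary written to disk earlier (with its standard `images` and `annotations` keys); the function name is hypothetical.

```python
from collections import defaultdict


def group_annotations_by_image(coco_dict):
    """Return {image_id: {'image_id': int, 'annotations': [COCO annotation dicts]}}."""
    grouped = defaultdict(list)
    for annotation in coco_dict["annotations"]:
        grouped[annotation["image_id"]].append(annotation)
    # Images without annotations still get an entry, with an empty list.
    return {
        image["id"]: {"image_id": image["id"], "annotations": grouped[image["id"]]}
        for image in coco_dict["images"]
    }
```

Each value in the returned dictionary is one target in the format the image processor expects.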

Load DETR’s image processor, then encode the images and reformat the annotations with the image processor.
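
The encoding of a single example might look like the sketch below. It assumes a `DetrImageProcessor` from the Transformers library; the checkpoint name in the usage comment (`facebook/detr-resnet-50`) is an assumption, not necessarily the one used in this tutorial.

```python
def encode_example(processor, image, target):
    """Encode one image and its DETR-format target, dropping the batch dimension."""
    encoding = processor(images=image, annotations=target, return_tensors="pt")
    return {
        "pixel_values": encoding["pixel_values"][0],  # first (only) image in the batch
        "labels": encoding["labels"][0],              # its processed annotations
    }


# Hypothetical usage:
# from transformers import DetrImageProcessor
# processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
# example = encode_example(processor, pil_image, {"image_id": 1, "annotations": [...]})
```

Because the processor resizes images to different sizes, batches usually need a collate function that pads `pixel_values` to a common shape before they reach the model.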

Load DETR's object detection model and specify your hyperparameters for the model's training.

Log the training hyperparameters to your Picsellia experiment.
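
As an illustration, the hyperparameters can live in one dictionary that is both logged to Picsellia and passed to the Trainer. The values below are placeholders, not the settings used in this tutorial, and the sketch assumes the SDK's `experiment.log_parameters` accepts a plain dictionary.

```python
# Placeholder values for illustration only; tune them for your dataset.
HYPERPARAMETERS = {
    "num_train_epochs": 10,
    "learning_rate": 1e-5,
    "weight_decay": 1e-4,
    "per_device_train_batch_size": 8,
}


def log_hyperparameters(experiment, hyperparameters):
    """Send a copy of the hyperparameters to the Picsellia experiment."""
    experiment.log_parameters(dict(hyperparameters))
    return hyperparameters


# Hypothetical usage with the Trainer API (keys match TrainingArguments fields):
# from transformers import TrainingArguments
# training_args = TrainingArguments(
#     output_dir="finetuned_model", remove_unused_columns=False, **HYPERPARAMETERS
# )
```

Using a single dictionary for both purposes keeps what is logged in Picsellia identical to what the training run actually used.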


Create a custom training callback, based on the Trainer API's callback class, to log training activity to your Picsellia experiment. The callback class gives access to everything that happens during training. Within the callback, the loss and learning rate values recorded during training are logged to your Picsellia experiment at the end of training.
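
A callback along these lines is sketched below. It subclasses `TrainerCallback` and reads the Trainer's `state.log_history`; the class name and the `log_line` parameter are hypothetical, and the callback receives a logging function rather than the experiment itself so that it stays independent of the SDK.

```python
try:
    from transformers import TrainerCallback
except ImportError:  # fallback so the sketch can be read and tested without transformers
    TrainerCallback = object


class PicselliaLogCallback(TrainerCallback):
    """Send the loss and learning-rate curves to Picsellia once training ends."""

    def __init__(self, log_line):
        # log_line(name, values) is expected to forward to experiment.log(...).
        self.log_line = log_line

    def on_train_end(self, args, state, control, **kwargs):
        for key in ("loss", "learning_rate"):
            # Keep only the log entries that actually recorded this metric.
            values = [entry[key] for entry in state.log_history if key in entry]
            if values:
                self.log_line(key, values)


# Hypothetical wiring, assuming LogType.LINE from picsellia.types.enums:
# callback = PicselliaLogCallback(
#     lambda name, values: experiment.log(name, values, LogType.LINE)
# )
# trainer = Trainer(model=model, args=training_args, callbacks=[callback], ...)
```

Logging once in `on_train_end` matches the behaviour described above; logging from `on_log` instead would stream values during training at the cost of more API calls.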

Initialize the training callback and train the model.

After the training, save the model to your local directory.

Store all the model files in your Picsellia experiment artifact.
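
A rough sketch of the upload, assuming the model was saved to a local folder (for example with `trainer.save_model(output_dir)`) and that the SDK's `experiment.store(name, path)` uploads one file as a named artifact. Naming each artifact after its file is a choice made here for illustration.

```python
from pathlib import Path


def store_model_files(experiment, model_dir):
    """Upload every file in model_dir as an artifact named after the file."""
    stored = []
    for path in sorted(Path(model_dir).iterdir()):
        if path.is_file():
            experiment.store(path.name, str(path))
            stored.append(path.name)
    return stored
```

The returned list of names is a convenient record of what was uploaded, for example to print at the end of the run.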


Send evaluations to the Picsellia evaluation platform

Before making predictions with the fine-tuned model, you must load its image processor (encoder) and object detector (decoder).

Then, fetch images and labels for the evaluation dataset version from your Picsellia experiment.

Note: Fetching the evaluation dataset version doesn't download the images again; it is only a Picsellia object that acts as a placeholder to enable you to match the exact images you have locally to the ones in Picsellia.

To evaluate the model, make predictions on all the evaluation images that you downloaded to your local directory from your Picsellia experiment when you pulled the dataset versions initially. Match the local copy of the images to the ones in Picsellia and push the predictions of the corresponding images to your experiment evaluation dashboard.
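
The matching and pushing step could be sketched as below. It assumes predictions come from the image processor's `post_process_object_detection` output (boxes as `x_min, y_min, x_max, y_max` in pixels), that Picsellia expects rectangles as `(x, y, w, h, label, confidence)` tuples, and that `find_asset(filename=...)` and `add_evaluation(asset, rectangles=...)` exist in your SDK version. The function names are hypothetical.

```python
def to_rectangles(boxes_xyxy, scores, label_ids, labels_by_id):
    """Convert corner-format boxes to (x, y, w, h, label, score) tuples."""
    rectangles = []
    for (x_min, y_min, x_max, y_max), score, label_id in zip(boxes_xyxy, scores, label_ids):
        x, y = int(round(x_min)), int(round(y_min))
        w, h = int(round(x_max - x_min)), int(round(y_max - y_min))
        rectangles.append((x, y, w, h, labels_by_id[int(label_id)], float(score)))
    return rectangles


def push_evaluation(experiment, eval_version, filename, rectangles):
    """Match a local image to its Picsellia asset by filename and attach predictions."""
    asset = eval_version.find_asset(filename=filename)
    experiment.add_evaluation(asset, rectangles=rectangles)
    return asset


# Hypothetical usage, where `result` is one item returned by
# processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=[(h, w)]):
# rects = to_rectangles(result["boxes"].tolist(), result["scores"].tolist(),
#                       result["labels"].tolist(), labels_by_id)
# push_evaluation(experiment, eval_version, "image_001.jpg", rects)
```

Here `labels_by_id` maps the model's class ids to the Picsellia label objects of the evaluation dataset version, so each prediction is attached to a label the platform already knows.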

Being able to see the progressive performance of your fine-tuned model over time is essential. So, each time you make predictions after training and send the evaluations to Picsellia, you get a visual and empirical evaluation of your model.

The annotations in red are predictions, and the ones in green are ground truths. Finally, spin up a job on Picsellia to compute the evaluation metrics for all the predictions you just added.

Store the retrained model in the Picsellia model registry

To complete the integration pipeline for traceability, you must stash all fine-tuned model versions from your experiment in the Picsellia model registry and package your code in a Docker container.

Create a custom model folder for the different versions of your model in the Picsellia model registry. It will enable you to store the version of the fine-tuned model from your experiment run.


Create a model version in the custom model folder to store the fine-tuned model from your training experiments in the model registry.
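
One way these two registry steps might look with the SDK is sketched below. It assumes `client.create_model(name=..., type=...)` and `experiment.export_in_existing_model(model)` are available in your SDK version; verify both against the SDK reference before relying on them. The function name and the model name in the usage comment are hypothetical.

```python
def export_to_registry(client, experiment, model_name, inference_type):
    """Create a model entry in the registry and export the experiment into it."""
    model = client.create_model(name=model_name, type=inference_type)
    # Exporting the experiment is assumed to create a new model version
    # that carries the experiment's stored artifacts.
    model_version = experiment.export_in_existing_model(model)
    return model, model_version


# Hypothetical usage:
# from picsellia.types.enums import InferenceType
# model, version = export_to_registry(
#     client, experiment, "detr-finetuned", InferenceType.OBJECT_DETECTION
# )
```

If the model folder already exists from an earlier run, fetch it instead of creating it again and export the new experiment into that existing model.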

Dockerize the code

Package the workflow so that you can scale training and repeat your experiments with the model reliably. Define a custom Docker image with a Dockerfile.

Build the image, and your mission is complete:


docker build . -f Dockerfile -t picsellia-hf-train_image:1.0

You can deploy this Docker container image on your infrastructure and have a CVOps integrated training pipeline with Picsellia. Think of each Docker container you run as a training pipeline to spin up an experiment with the DETR model.

To spin up an experiment run, specify your environment variables for the Docker container, including each run's experiment name and retrained model output directory.
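
Inside the container, the training script can read those settings at startup. The sketch below uses hypothetical variable names (`PICSELLIA_TOKEN`, `EXPERIMENT_NAME`, `OUTPUT_DIR`) and a hypothetical default output directory; adapt them to whatever your Dockerfile and script agree on.

```python
import os


def load_run_config(env=None):
    """Read the per-run settings from environment variables."""
    env = os.environ if env is None else env
    missing = [key for key in ("PICSELLIA_TOKEN", "EXPERIMENT_NAME") if not env.get(key)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "api_token": env["PICSELLIA_TOKEN"],
        "experiment_name": env["EXPERIMENT_NAME"],
        # Hypothetical default; override it per run if needed.
        "output_dir": env.get("OUTPUT_DIR", "finetuned_model"),
    }
```

With this in place, each run is started by passing the variables to the container with Docker's `-e` flag, for example `-e EXPERIMENT_NAME=<experiment name>`, so the same image can serve every experiment.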

---------------------------------------------------------------------------------------------------------------------

Conclusion

In this article, you learned about training with a CVOps workflow at scale and how to integrate Picsellia into training a HuggingFace object detection transformer to implement a robust CVOps training workflow. There is so much more to learn about Picsellia, HuggingFace, and scaling training; check out the following resources:
