Your datasets deserve version control
Git for your computer vision data. Track changes, compare versions, and ensure every experiment is reproducible.
Track every change, reproduce any result
Your datasets evolve constantly: new images, corrected labels, filtered samples. Without version control, you're flying blind.
Immutable snapshots
Every dataset version is a permanent snapshot. Reference exact data states in experiments.
Label management
Create, rename, and merge labels across your dataset. Keep your taxonomy clean and consistent.
Fork for experiments
Fork dataset versions to test hypotheses without affecting production data.
Audit-ready history
Complete changelog with who changed what, when, and why. Perfect for compliance.
Version Timeline
Everything you need to manage datasets
Version, organize, and share your datasets. Everything connects to your experiments.
Git-like Version Control
Track every change to your datasets. Compare versions, roll back mistakes, and branch for experiments. Full lineage from raw data to trained models.
Smart Data Organization
Tag, filter, and slice your data in seconds. Create custom views, save queries, and share collections. No more hunting through folders.
Team Collaboration
Share datasets across teams with fine-grained permissions. Track who changed what, when, and why. Comments and reviews are built in.
Full Data Lineage
Trace any prediction back to its training data. Audit-ready lineage for compliance. Understand model behavior through data.
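To make "compare versions" concrete, here is a minimal sketch of what a version diff boils down to. It is plain Python and does not use the Picsellia SDK: the manifest shape (asset id mapped to a set of label names), `VersionDiff`, and `diff_versions` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionDiff:
    """Result of comparing two dataset version manifests (hypothetical type)."""
    added: set[str]
    removed: set[str]
    relabeled: set[str]


def diff_versions(
    old: dict[str, frozenset[str]],
    new: dict[str, frozenset[str]],
) -> VersionDiff:
    """Compare two manifests that map asset id -> set of label names."""
    old_ids, new_ids = set(old), set(new)
    return VersionDiff(
        added=new_ids - old_ids,
        removed=old_ids - new_ids,
        # Assets present in both versions whose label set changed.
        relabeled={a for a in old_ids & new_ids if old[a] != new[a]},
    )


# Illustrative manifests, not real data.
v2 = {
    "img_001.jpg": frozenset({"defect"}),
    "img_002.jpg": frozenset({"scratch"}),
}
v3 = {
    "img_001.jpg": frozenset({"surface_defect"}),
    "img_003.jpg": frozenset({"scratch"}),
}
changes = diff_versions(v2, v3)
print(changes)
```

Because each manifest is treated as an immutable snapshot, the same pair of versions always produces the same diff, which is what makes a changelog auditable.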
Programmatic dataset management
Full Python SDK with type hints, auto-completion, and comprehensive documentation. Integrate datasets directly into your ML pipelines.
from picsellia import Client
from picsellia.types.enums import AnnotationFileType  # enum used by the export call below

client = Client()
datalake = client.get_datalake()

# Get the dataset by name
dataset = client.get_dataset("defect-detection")

# Create a new version
version = dataset.create_version(
    version="v3",
    description="Added edge cases"
)

# Add data from datalake
data = datalake.list_data(
    tags=["edge-case", "validated"]
)
version.add_data(data)

# Label manipulation
labels = version.list_labels()
version.create_label("scratch")

# Rename a label
label = version.get_label("defect")
label.update(name="surface_defect")

# Export annotations in COCO format
version.export_annotation_file(
    AnnotationFileType.COCO,
    "./training_data"
)
Dataset Browser
Structure your data the right way
Proper data splits are crucial for model performance. Create reproducible train/val/test splits, stratify by class, and ensure no data leakage.
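As a rough illustration of a reproducible, class-stratified split, here is a minimal sketch in plain Python. It does not use the Picsellia SDK; `stratified_split`, the 80/10/10 ratios, and the sample labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split asset ids into train/val/test while preserving class proportions.

    labels: dict mapping asset id -> class name.
    Returns a dict mapping split name -> list of asset ids.
    """
    by_class = defaultdict(list)
    for asset_id, cls in labels.items():
        by_class[cls].append(asset_id)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for cls in sorted(by_class):
        # Sort before shuffling so the result does not depend on input order.
        ids = sorted(by_class[cls])
        rng.shuffle(ids)
        n_train = int(len(ids) * ratios[0])
        n_val = int(len(ids) * ratios[1])
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits


# Illustrative labels: 10 "scratch" images and 30 "ok" images.
labels = {
    f"img_{i:03d}.jpg": ("scratch" if i % 4 == 0 else "ok")
    for i in range(40)
}
splits = stratified_split(labels)
print({name: len(ids) for name, ids in splits.items()})
```

This sketch only guarantees that no asset id appears in more than one split. Avoiding leakage from near-duplicates or frames taken from the same source would require grouping those assets before splitting.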
Fits into your existing workflow
Datasets connect directly to annotations, experiments, and deployments. No manual handoffs.
Ready to version your datasets?
Free trial, no credit card. Start versioning your datasets today.