“Data Models for Dataset Drift Controls in Machine Learning With Optical Images” paper summary

Paper summary “Understanding and Controlling Dataset Drift in Machine Learning with Physical Optics”

This white paper summarises the findings of the paper "Data Models for Dataset Drift Controls in Machine Learning With Optical Images". The paper addresses a critical gap in existing methods for managing the robustness of ML models to dataset drift: these methods often do not consider the actual data models.

This paper, a collaboration between Dotphoton and academic partners from the University of Glasgow, Fraunhofer HHI, HEPIA/HES-SO, Klinikum rechts der Isar, and Helmholtz Zentrum Munich, was published in Transactions on Machine Learning Research and discussed at ICML and NeurIPS workshops.

We hope this paper highlights overlooked challenges in ML and ways to address them. It is relevant to researchers working on ML/AI vision applications, especially in biomedical imaging, remote sensing, and autonomous vehicles.

Below we outline how advanced data drift management in ML/AI can be achieved by combining classical machine learning with differentiable, physical models of the data acquisition process. The paper sheds light on three primary use cases spanning drift synthesis, drift forensics, and drift optimization. These data-centric methods allow users to run physically faithful tests for their machine learning models, identify and characterize unfavorable deployment settings and optimize data acquisition for specific ML workloads.


In machine learning, optical images are pivotal, especially in sectors such as healthcare, Earth observation, and automotive. However, the application of machine learning models in these areas has been restricted by concerns about robustness. A significant challenge is the drop in performance that results from discrepancies between training and deployment data. This article examines the limitations of current methods for addressing dataset drift and introduces a novel approach that combines traditional machine learning with physical optics. This fusion enables the creation of explicit and differentiable data models, which can be used to control machine learning model performance under dataset drift.


Camera images have been foundational in machine learning research, propelling advancements from the early days of neural networks to the recent breakthroughs in deep learning. These images are not just academic tools; they are integral to delivering high-impact public and commercial services. The potential of deep supervised learning has sparked innovations across various domains, from medicine to geospatial modeling.

However, the enthusiasm for machine learning has been tempered by its inherent vulnerabilities. Machine learning models, especially those trained with supervised learning, are highly sensitive to changes in input data. Such changes, often referred to as dataset drift, affect a model's generalization capabilities, which makes them a focal point of research across various machine learning sub-disciplines.

In this article, we explore the concept of dataset drift, its implications, and the current methods employed to address it. We then introduce our contributions, which combine raw sensor data, differentiable data models, and the standard machine learning pipeline to offer a more robust solution to the challenges posed by dataset drift.

The Current Landscape of Dataset Drift Controls

The primary methods for validating a machine learning model's performance under image dataset drift are augmentation testing and catalogue testing. Augmentation testing offers flexibility by applying perturbations to processed images, but it often results in unfaithful drift artefacts. Catalogue testing, on the other hand, relies on datasets from different cameras. This ensures physically faithful test samples but lacks the flexibility of in-silico simulations.

Despite the widespread use of these methods, the data model of images, which is the root cause of input data variations, has been largely overlooked in machine learning robustness research. This oversight is surprising given the emphasis on data models in other scientific communities and advanced industry applications.

Our Approach and Contributions

Our research bridges this gap by integrating conventional machine learning with physical optics. This integration allows for the creation of explicit, differentiable models of the data generating process, enabling more advanced dataset drift controls.

Our primary contributions include:

  1. Drift Synthesis: This enables the controlled generation of physically accurate drift test cases, aiding in model selection and targeted generalization.
  2. Drift Forensics: By connecting the data model with the task model, we can specify the acceptable data environments for a given task model.
  3. Drift Optimisation: This novel approach optimizes the data generating process itself, enhancing the machine vision task's learning capabilities.

Additionally, we have released two raw image datasets, Raw-Microscopy and Raw-Drone, which are publicly available. These datasets, combined with our modular PyTorch code for explicit and differentiable data models, provide a comprehensive toolkit for researchers and practitioners.

Practical Implications

The methods we propose are tailored to current imaging infrastructure, which predominantly relies on image signal processors (ISPs) that can introduce drift, but which also allows access to raw sensor readouts. Our data models can save time and money by avoiding additional acquisitions and offer new applications for integrated data-model quality management. However, it's essential to note that our current data models are limited to the ISP scope and require further extensions to capture other sources of data drift.

Related Work

Although physically sound data models haven't been extensively explored in machine learning, they have been studied in physical optics and metrology.

Data Models for Images: Deep convolutional neural networks have been used to model raw image data processing. In contrast, we propose using a parametric data model with tunable parameters.

  • Differentiable Image Processing: Some work has been done on creating a differentiable image processing pipeline for camera lens manufacturing. However, this work focuses on optimising a physical component and doesn't provide public resources.
  • Software Packages: Packages like Halide, Kornia, and rawpy offer low-level image processing operations and can be integrated with Python and PyTorch.
  • Inverse Problems: Areas outside optical imaging, such as MRI or computed tomography, use known operator learning to incorporate forward models in optimization.

Drift Synthesis: Realistic drift test cases for computer vision tasks are often created by applying augmentations to processed images. However, adding noise to a processed image might not be physically accurate. Generative models like GANs have limitations in test data generation due to their tendency to produce artefacts.

Drift Forensics: Some work uses a differentiable raw processing pipeline to propagate gradient information back to the raw image for adversarial search. This article's approach, however, aims to modify the data model parameters to identify harmful configurations.

Drift Optimisation: A differentiable image processing data model allows for joint optimization with the task model. This has been explored in radiology image data.

Raw Image Data: Raw files from cameras contain minimally processed data, and the remaining processing can differ based on the camera manufacturer. This can contribute to dataset drift. The article distinguishes between datasets treated as raw data and those that are genuinely raw.

Preliminaries: A Data Model for Images

Image Acquisition: Image acquisition has traditionally been optimized for human perception, and most research has worked with processed RGB image representations. The raw sensor image from a camera differs significantly from the processed image used in machine learning.

Image Transformation: The raw sensor image undergoes a series of transformations to produce the final RGB image. These transformations can result in variations contributing to dataset drift.
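A minimal sketch of such a chain of transformations is shown below. It is not the paper's implementation: the step order, the 2x2 RGGB binning used as a crude demosaicing stand-in, and all function names and parameter values are assumptions chosen for illustration. Black level correction is mentioned later in this summary; demosaicing, white balance, and gamma correction are typical ISP steps assumed here.

```python
import numpy as np

def black_level_correction(raw, black_level):
    """Subtract the sensor's dark offset and clip negative values."""
    return np.clip(raw - black_level, 0.0, None)

def demosaic_rggb_binning(raw):
    """Crude demosaicing stand-in: collapse each 2x2 RGGB cell into one RGB pixel."""
    r = raw[0::2, 0::2]
    g = 0.5 * (raw[0::2, 1::2] + raw[1::2, 0::2])
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)

def white_balance(rgb, gains):
    """Scale each colour channel by its own gain."""
    return rgb * np.asarray(gains)

def gamma_correction(rgb, gamma):
    """Apply y = x ** (1 / gamma) after clipping to [0, 1]."""
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)

def static_data_model(raw, black_level=0.05, gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Hypothetical static data model: raw mosaic in, processed RGB image out."""
    x = black_level_correction(raw, black_level)
    x = demosaic_rggb_binning(x)
    x = white_balance(x, gains)
    return gamma_correction(x, gamma)

rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 1.0, size=(8, 8))  # synthetic stand-in for a raw sensor readout
rgb = static_data_model(raw)
print(rgb.shape)  # (4, 4, 3)
```

Changing any argument of `static_data_model` yields a different processed view of the same raw readout, which is the kind of variation that can show up as dataset drift.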


Raw Dataset Acquisition: Raw sensor data is essential for advanced data models. The article introduces two datasets: Raw-Microscopy (blood smear microscope images) and Raw-Drone (drone images with car annotations). The motivation behind these datasets includes ensuring coverage of machine learning tasks, potential positive welfare impact, and contexts where errors can be costly.

Data Models: A distinction should be made between a static data model and a parametrized data model. The static model allows for controlled synthesis of different views from the same raw sensor data. The parametrized model is differentiable, enabling backpropagation of the gradient for drift forensics and adjustments.
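To illustrate what differentiability buys, the sketch below treats a single processing step, gamma correction, as a parametrized data model and recovers its parameter by gradient descent. The paper's released code relies on PyTorch; this NumPy version substitutes a hand-derived gradient for automatic differentiation, and the function names, learning rate, and target value of 2.2 are assumptions made for the example.

```python
import numpy as np

def gamma_forward(x, gamma):
    """Gamma correction y = x ** (1 / gamma) for x in (0, 1]."""
    return x ** (1.0 / gamma)

def gamma_grad(x, gamma):
    """Analytic derivative dy/dgamma = -y * ln(x) / gamma ** 2."""
    return -gamma_forward(x, gamma) * np.log(x) / gamma ** 2

def loss_and_grad(x, target, gamma):
    """Mean squared error to a target view and its gradient w.r.t. gamma."""
    y = gamma_forward(x, gamma)
    loss = np.mean((y - target) ** 2)
    dloss_dgamma = np.mean(2.0 * (y - target) * gamma_grad(x, gamma))
    return loss, dloss_dgamma

rng = np.random.default_rng(0)
x = rng.uniform(0.05, 1.0, size=(16, 16))  # synthetic linear intensities
target = gamma_forward(x, 2.2)             # view produced by an assumed gamma of 2.2

gamma = 1.5                                # initial guess for the data model parameter
for _ in range(500):
    loss, grad = loss_and_grad(x, target, gamma)
    gamma -= 2.0 * grad                    # plain gradient descent step
print(f"recovered gamma: {gamma:.3f}, loss: {loss:.2e}")
```

A static data model would fix `gamma` once and only allow views to be synthesized; the parametrized version exposes the gradient that drift forensics and drift optimization rely on.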

Task Models: Two task models are used in the experiments: ResNet18 for classification on the Raw-Microscopy dataset and U-Net for segmentation on the Raw-Drone dataset.


With data models, raw data, and task models in place, we can now demonstrate the advanced dataset drift controls: (1) drift synthesis, (2) modular drift forensics, and (3) drift optimization.

1. Drift Synthesis

The static data model allows for the creation of drift test cases that are physically accurate. This means that components of the data model can be replaced to generate different views from a single raw reference dataset.

A typical use case is for machine learning researchers to validate their models against drifts from different devices, like microscopes in various labs, without collecting data from each device.

There are twelve example data models provided. For each, task models were trained and then evaluated on test data from all twelve data models. The results showed that the leukocyte classification model was robust to most drifts except for specific configurations. The segmentation task model showed a more varied pattern.
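The idea of swapping components to obtain several views of one raw dataset can be sketched as follows. This summary does not list which components vary across the twelve data models, so the 3 x 2 x 2 grid below (three demosaicing variants, two denoisers, two sharpening strengths) is only one hypothetical way to arrive at twelve configurations. All component implementations are toy stand-ins.

```python
import itertools
import numpy as np

def neighbours(img):
    """Stack the edge-padded 3x3 neighbourhood of every pixel along axis 0."""
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    return np.stack([p[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def blur(img):
    """3x3 mean filter."""
    return neighbours(img).mean(axis=0)

def demosaic(raw, green):
    """Collapse each 2x2 RGGB cell into one RGB pixel; `green` picks the green estimate."""
    g1, g2 = raw[0::2, 1::2], raw[1::2, 0::2]
    g = {"g1": g1, "g2": g2, "avg": 0.5 * (g1 + g2)}[green]
    return np.stack([raw[0::2, 0::2], g, raw[1::2, 1::2]], axis=-1)

# Hypothetical interchangeable components of the data model.
DEMOSAIC = {name: (lambda raw, name=name: demosaic(raw, name)) for name in ("g1", "g2", "avg")}
DENOISE = {"mean": blur, "median": lambda img: np.median(neighbours(img), axis=0)}
SHARPEN = {"weak": lambda img: img + 0.5 * (img - blur(img)),
           "strong": lambda img: img + 1.5 * (img - blur(img))}

def synthesize_views(raw):
    """Process one raw image with every combination of components."""
    views = {}
    for dm, dn, sh in itertools.product(DEMOSAIC, DENOISE, SHARPEN):
        rgb = DEMOSAIC[dm](raw)
        rgb = DENOISE[dn](rgb)
        rgb = SHARPEN[sh](rgb)
        views[(dm, dn, sh)] = np.clip(rgb, 0.0, 1.0)
    return views

rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 1.0, size=(8, 8))  # synthetic stand-in for one raw image
views = synthesize_views(raw)
print(len(views))  # 12
```

In a train-on-one, test-on-all evaluation like the one described above, a task model trained on one entry of `views` would then be scored on every other entry.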

📌 It’s important to model drift in a metrologically accurate way. In contrast, augmentations applied post-hoc to processed data, a common approach to benchmarking robustness of machine learning models, can lead to incorrect model selection and wrong conclusions about model robustness.

Physically Faithful vs. Physically Unfaithful Robustness Validation

There is a difference in results when using physically faithful drift test cases versus unfaithful ones. The former provides more accurate and reliable results.
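One reason post-hoc augmentations can be unfaithful is that sensor noise arises before processing. The sketch below is not taken from the paper: it assumes Poisson shot noise on the raw photon count, a gamma-only processing step, and hypothetical constants (`FULL_WELL`, `GAMMA`, the two photon levels). It compares noise injected in the raw domain with Gaussian noise added to the processed value.

```python
import numpy as np

FULL_WELL = 4000.0  # hypothetical photon count that maps to raw value 1.0
GAMMA = 2.2         # assumed gamma of the toy processing step

def process(raw):
    """Toy processing step: gamma correction only."""
    return np.clip(raw, 0.0, 1.0) ** (1.0 / GAMMA)

def faithful_noise(photons, rng, n):
    """Sample shot noise in the raw domain, then process."""
    counts = rng.poisson(photons, size=n)
    return process(counts / FULL_WELL)

def unfaithful_noise(photons, rng, n, sigma=0.01):
    """Process the clean signal, then add Gaussian noise to the processed value."""
    clean = process(np.full(n, photons / FULL_WELL))
    return clean + rng.normal(0.0, sigma, size=n)

rng = np.random.default_rng(0)
n = 100_000
results = {}
for name, photons in [("dark", 10.0), ("bright", 3000.0)]:
    f = faithful_noise(photons, rng, n).std()
    u = unfaithful_noise(photons, rng, n).std()
    results[name] = (f, u)
    print(f"{name}: faithful std={f:.4f}, unfaithful std={u:.4f}")
```

Under these assumptions the raw-domain noise has a different spread in dark and bright regions after processing, while the post-hoc Gaussian noise has the same spread everywhere, so the two procedures produce statistically different test images.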

Implications for Model Selection

The results from physically faithful data and corruptions differ, emphasizing the importance of using accurate test cases for model selection.

Data Models and Targeted Generalization

With data models, it's possible to specify individual environments and observe how different combinations of environments and task models interact.

Use Cases of Drift Synthesis

Drift synthesis can be used for physically accurate validation without actual measurement. It requires access to raw data and knowledge of the data model specification.

2. Drift Forensics

Precise specification of limitations is essential for products with machine learning components. A differentiable data model paired with raw data offers a solution to this. Drift forensics identifies parameter configurations of the data model that may negatively impact the task model's performance.
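The search for harmful configurations can be sketched as gradient ascent on the task loss with respect to the data model parameters while the task model stays frozen. The example below is a toy under stated assumptions: each image is reduced to its mean raw intensity, the data model has only a black level and a gain, the task model is a fixed logistic classifier, and gradients are derived by hand in NumPy rather than obtained from PyTorch autograd. All names, bounds, and step sizes are hypothetical.

```python
import numpy as np

def data_model(raw_means, black_level, gain):
    """Differentiable toy data model acting on per-image mean raw intensity."""
    return gain * (raw_means - black_level)

def task_loss_and_grads(raw_means, labels, black_level, gain, w=20.0, c=-9.0):
    """Cross-entropy of a frozen logistic task model and its gradients
    with respect to the two data model parameters."""
    feat = data_model(raw_means, black_level, gain)
    p = 1.0 / (1.0 + np.exp(-(w * feat + c)))
    eps = 1e-12
    loss = -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    dz = (p - labels) / len(labels)                 # d loss / d logit
    d_black = np.sum(dz * (-w * gain))              # chain rule through the data model
    d_gain = np.sum(dz * w * (raw_means - black_level))
    return loss, d_black, d_gain

rng = np.random.default_rng(0)
labels = np.repeat([0.0, 1.0], 50)
raw_means = np.where(labels == 1, 0.6, 0.3) + rng.normal(0.0, 0.03, size=100)

black_level, gain = 0.0, 1.0                        # nominal configuration
loss_start, _, _ = task_loss_and_grads(raw_means, labels, black_level, gain)
for _ in range(200):
    _, d_black, d_gain = task_loss_and_grads(raw_means, labels, black_level, gain)
    # Ascend the loss, staying inside an assumed admissible parameter range.
    black_level = float(np.clip(black_level + 0.01 * d_black, 0.0, 0.2))
    gain = float(np.clip(gain + 0.01 * d_gain, 0.5, 2.0))
loss_end, _, _ = task_loss_and_grads(raw_means, labels, black_level, gain)
print(f"loss {loss_start:.3f} -> {loss_end:.3f} at black_level={black_level:.3f}, gain={gain:.3f}")
```

The configuration reached at the end of the loop is a candidate for an operating condition that should be excluded or flagged in the task model's specification.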

Sensitivity to Data Models

The classification task model is sensitive to changes in the data model parameters.

Sensitivity in Relation to Magnitude

A larger change in the resulting RGB images doesn't necessarily lead to a larger performance degradation of the task model.

Use Cases of Drift Forensics

Drift forensics can be used to understand the conditions under which a task model performs well or poorly.

3. Drift Optimization

Raw data and a differentiable data model can be used to optimize the data itself, creating a beneficial drift.
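A minimal sketch of this joint optimization follows, assuming a toy setup in which each image is reduced to its mean raw intensity, the data model has a black level and a gain, and the task model is a logistic classifier. Gradients are written out by hand in NumPy in place of the automatic differentiation a PyTorch implementation would use. The function names, initial values, and learning rate are hypothetical, and the sketch only shows the mechanics; it makes no claim about the results reported in the paper.

```python
import numpy as np

def loss_and_grad(m, y, theta):
    """Cross-entropy and its gradient for theta = (black_level, gain, w, c)."""
    b, g, w, c = theta
    feat = g * (m - b)                       # data model
    p = 1.0 / (1.0 + np.exp(-(w * feat + c)))  # task model
    eps = 1e-12
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    dz = (p - y) / len(y)                    # d loss / d logit
    grad = np.array([
        np.sum(dz * (-w * g)),               # black level (data model)
        np.sum(dz * w * (m - b)),            # gain (data model)
        np.sum(dz * feat),                   # weight (task model)
        np.sum(dz),                          # bias (task model)
    ])
    return loss, grad

def train(m, y, learn_data_model, steps=2000, lr=0.1):
    """Gradient descent; the data model is frozen unless learn_data_model is True."""
    theta = np.array([0.0, 1.0, 1.0, 0.0])
    mask = np.array([1.0, 1.0, 1.0, 1.0]) if learn_data_model else np.array([0.0, 0.0, 1.0, 1.0])
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(m, y, theta)
        history.append(float(loss))
        theta = theta - lr * mask * grad
    return theta, history

rng = np.random.default_rng(0)
y = np.repeat([0.0, 1.0], 50)
m = np.where(y == 1, 0.6, 0.3) + rng.normal(0.0, 0.03, size=100)

theta_frozen, hist_frozen = train(m, y, learn_data_model=False)
theta_learned, hist_learned = train(m, y, learn_data_model=True)
print(f"frozen data model:  loss {hist_frozen[0]:.3f} -> {hist_frozen[-1]:.3f}")
print(f"learned data model: loss {hist_learned[0]:.3f} -> {hist_learned[-1]:.3f}")
```

The only difference between the two runs is the update mask: with a learned data model, the black level and gain receive gradient updates alongside the task model parameters.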

Convergence and Stability

The learned data model creates a drift that improves the stability of the learning trajectory compared to the frozen data model.

Helpful Artefacts

Processed images from a learned data model can contain visible artefacts that aid stability and generalization.

Raw and Data Models

Training directly on raw data offers the possibility of machine-optimized optical data processing free of existing data model constraints.

Use Cases of Drift Optimization

Drift optimization can be used to enhance the performance of a task model by creating beneficial drift. It's especially useful for adjusting imaging pipelines optimized for human users.


Black-box data models for images shouldn't be the standard in machine learning research or engineering. Using knowledge from physical optics can enhance machine learning by focusing on the data itself. This approach can help control dataset drift, a prevalent issue in many machine learning areas.

  1. Drift Synthesis: This method creates physically accurate drift test cases. Unlike augmentation testing, these test cases result in less severe performance drops. This changes how models are selected and offers new perspectives on generalization. A practical use of drift synthesis is to validate task models against drift from different devices, like microscopes or autonomous vehicles, without collecting data from each device.
  2. Drift Forensics: This allows for the precise identification of data model limitations for a given machine learning task. By using gradient search, models that shouldn't be operated under certain conditions can be identified. The changes in black level configuration and denoising parameters posed the most significant risks. This method is crucial for products with machine learning components, like medical devices or autonomous vehicles, to meet regulatory requirements.
  3. Differentiable Data Models for Drift Optimization: These models can be used to optimize the data generating process along with the task model parameters. This leads to better stability in learning trajectories. Interestingly, images from learned data models might have visible artifacts but can still offer better stability and generalization. This approach can be beneficial for learning problems where training is expensive or time-consuming.
  4. Raw Data Accessibility: It is crucial to make raw data, commonly used in optical industries, more accessible for machine learning tasks. While many optical imaging devices can extract raw data, machine learning research needs to catch up. We have released two raw image datasets used in the current research to promote this idea.


Luis Oala - Fraunhofer HHI and Dotphoton AG

Marco Aversa - Dotphoton AG and University of Glasgow

Gabriel Nobis - Fraunhofer HHI

Kurt Willis - Fraunhofer HHI

Yoan Neuenschwander - HEPIA/HES-SO

Michèle Buck - Klinikum rechts der Isar

Christian Matek - Helmholtz Zentrum Munich

Jérôme Extermann - HEPIA/HES-SO

Enrico Pomarico - HEPIA/HES-SO

Wojciech Samek - Fraunhofer HHI

Roderick Murray-Smith - University of Glasgow

Christoph Clausen - Dotphoton AG

Bruno Sanguinetti - Dotphoton AG


Oala, L., Aversa, M., Nobis, G., Willis, K., Neuenschwander, Y., Buck, M., Matek, C., Extermann, J., Pomarico, E., Samek, W., Murray-Smith, R., Clausen, C., & Sanguinetti, B. (2023). Data Models for Dataset Drift Controls in Machine Learning With Optical Images. Transactions on Machine Learning Research (TMLR). Retrieved from https://openreview.net/forum?id=I4IkGmgFJz