We generalize the continuous-time framework for score-based generative models from an underlying Brownian motion (BM) to an approximation of fractional Brownian motion (FBM). By representing FBM as a stochastic integral over a family of Ornstein-Uhlenbeck processes, we derive a continuous reparameterization trick and the reverse-time model. This defines generative fractional diffusion models (GFDM), whose driving noise converges to a non-Markovian process of infinite quadratic variation. The Hurst index H∈(0,1) of FBM controls the roughness of the distribution-transforming path. To the best of our knowledge, this is the first attempt to build a generative model upon a stochastic process with infinite quadratic variation.
In NeurIPS 2023 Workshop on Diffusion Models, 2023
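The representation of FBM through a family of Ornstein-Uhlenbeck (OU) processes mentioned in the abstract above can be illustrated with a finite weighted sum. The following is a minimal sketch, not the paper's method: it assumes that all OU processes share a single driving Brownian motion, uses a plain Euler-Maruyama step, and takes the mean-reversion speeds `gammas` and the weights `weights` as hypothetical illustrative inputs rather than the quadrature derived in the paper.

```python
import math
import random


def simulate_ou_superposition(gammas, weights, n_steps, dt, seed=0):
    """Simulate X_t = sum_k weights[k] * Y^k_t with dY^k = -gammas[k] * Y^k dt + dW.

    All OU processes Y^k are driven by the same Brownian increments dW
    (an assumption of this sketch). Returns the path of X on the time grid.
    """
    rng = random.Random(seed)
    states = [0.0] * len(gammas)  # every OU process starts at zero
    path = [0.0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # shared Brownian increment
        # Euler-Maruyama update of each OU process
        states = [y - g * y * dt + dw for y, g in zip(states, gammas)]
        path.append(sum(w * y for w, y in zip(weights, states)))
    return path


# Illustrative values only: three speeds on a geometric grid, equal weights.
example_path = simulate_ou_superposition(
    gammas=[0.1, 1.0, 10.0], weights=[1 / 3, 1 / 3, 1 / 3], n_steps=1000, dt=0.01
)
```

With a single speed of zero and a unit weight, the sum reduces to the driving Brownian motion itself, which is a convenient sanity check. How the weights must be chosen to approach FBM with a given Hurst index is outside the scope of this sketch.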
DiffInfinite is a hierarchical diffusion model that generates arbitrarily large histological images while preserving long-range structural correlations. Our approach first generates synthetic segmentation masks, which are then used as conditions for the high-fidelity generative diffusion process. The proposed sampling method can be scaled up to any desired image size while requiring only small patches for fast training. Moreover, it can be parallelized more efficiently than previous large-content generation methods while avoiding tiling artefacts.
Generative Models, Medical Imaging, Histopathology, Diffusion Models
NeurIPS 2023 spotlight paper
While there are methods to prospectively validate the robustness of machine learning models to such dataset drifts, existing approaches do not account for explicit models of machine learning’s primary object of interest: the data. This limits our ability to study and understand the relationship between data generation and downstream machine learning model performance in a physically accurate manner. In this study, we demonstrate how to overcome this limitation by pairing traditional machine learning with physical optics to obtain explicit and differentiable data models.
Machine Learning, Artificial Intelligence, Computer Vision, Pattern Recognition, Data Drift
TMLR (Transactions on Machine Learning Research), 2023. Presented as workshop paper at: ICML Spurious Correlations, Invariance, and Stability Workshop, 2023 • ICML Differentiable Almost Everything Workshop, 2023
Good practices for health applications of machine learning: Considerations for manufacturers and regulators
The goal of this report by the Focus Group on Artificial Intelligence for Health (FG-AI4H) is to assist in understanding the expectations of the regulatory bodies, to promote the step-by-step implementation of the safety and effectiveness of AI/ML-based software as a medical device, and to fill the current gap in international AI/ML-based medical device standards to the greatest extent possible.
AI/ML in healthcare, AI/ML standards in healthcare, AI/ML-based medical devices, AI checklist, regulatory framework, software-as-a-medical device
Available from ITU website, 2023
Once the raw data is collected, it is processed through a complex image signal processing (ISP) pipeline to produce an image compatible with human perception. However, this processing is rarely considered in machine learning modelling because available benchmark data sets are generally not in raw format. This study shows how to embed the forward acquisition process into the machine learning model.
machine learning, image signal processing, ISP, physical data model
Machine Learning and the Physical Sciences workshop, NeurIPS 2022, selected for a contributed talk
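To make the idea of an ISP pipeline between raw sensor values and a displayable image concrete, here is a toy sketch. It is not the pipeline used in the study: the stages (black-level subtraction, gain, clipping, gamma encoding), the function name `toy_isp`, and all default parameter values are assumptions chosen for illustration. Each stage is an explicit function of a few parameters, which is what allows such a pipeline to be treated as part of a model rather than as fixed preprocessing.

```python
def toy_isp(raw_pixels, black_level=64.0, white_level=1023.0, gain=1.0, gamma=2.2):
    """Map raw sensor counts to display values in [0, 1].

    Hypothetical stages of this sketch:
      1. subtract the black level and normalise by the usable range,
      2. apply a scalar gain (stand-in for white balance / exposure),
      3. clip to [0, 1],
      4. gamma-encode for display.
    """
    processed = []
    for value in raw_pixels:
        x = (value - black_level) / (white_level - black_level)
        x = min(max(x * gain, 0.0), 1.0)
        processed.append(x ** (1.0 / gamma))
    return processed


# Example: a few 10-bit raw counts (illustrative values only).
display_values = toy_isp([64.0, 200.0, 600.0, 1023.0])
```

In a learning setting, parameters such as `gain` and `gamma` could be exposed to the optimiser alongside the model weights; that would require an automatic-differentiation framework, which this plain-Python sketch deliberately leaves out.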
Jetraw images and functions may be used in end-to-end models to generate synthetic data with statistics matching those of genuine raw images, and they play an important role in data-centric AI methodologies. Here we show how these features are used for a machine-learning task: the segmentation of cars in urban, suburban and rural environments. Starting from a drone and airship image dataset in the Jetraw format (with calibrated sensor and optics), we use an end-to-end model to emulate realistic satellite raw images with on-demand parameters.
synthetic data, machine learning, AI, data-centric AI, satellite, drones, compression
8th International Workshop on On-Board Payload, Athens, 26 September 2022
Statistical distortion of supervised learning predictions in optical microscopy induced by image compression
Interestingly, a recent metrologically accurate algorithm, offering up to a 10:1 compression ratio, provides a prediction spread equivalent to that stemming from raw noise. The method described here makes it possible to set a lower bound on the predictive uncertainty of an SL task and can be generalized to determine the statistical distortions originating from a variety of processing pipelines in AI-assisted fields.
Artificial Intelligence (AI), Supervised Learning (SL) models, Deep Learning (DL) algorithms
Scientific Reports (2022) 12:3464
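The comparison of prediction spreads described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the paper's protocol: the spread is taken to be the standard deviation of predictions over repeated perturbations of one input, the "model" is a hypothetical mean-intensity predictor, sensor noise is assumed Gaussian, and uniform quantization stands in for a lossy compressor (it is not the algorithm evaluated in the paper).

```python
import random
import statistics


def prediction_spread(predict, image, perturb, n_replicates=200, seed=0):
    """Standard deviation of predictions over perturbed replicates of one image."""
    rng = random.Random(seed)
    predictions = [predict(perturb(image, rng)) for _ in range(n_replicates)]
    return statistics.pstdev(predictions)


def add_sensor_noise(image, rng, sigma=2.0):
    """Hypothetical raw-noise model: independent Gaussian noise per pixel."""
    return [p + rng.gauss(0.0, sigma) for p in image]


def noise_then_quantize(image, rng, step=8):
    """Stand-in for lossy compression: quantize the noisy image to a coarse grid."""
    return [step * round(p / step) for p in add_sensor_noise(image, rng)]


def mean_intensity(image):
    """Toy predictor used in place of a trained supervised model."""
    return sum(image) / len(image)


image = [100.0] * 64
spread_raw = prediction_spread(mean_intensity, image, add_sensor_noise)
spread_lossy = prediction_spread(mean_intensity, image, noise_then_quantize)
```

Comparing `spread_lossy` with `spread_raw` indicates whether the processing step adds variability beyond what raw noise alone produces; the raw-noise spread plays the role of the lower bound mentioned above.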
In this work, we target the paper-to-practice gap by applying an ML4H audit framework proposed by the ITU/WHO Focus Group on Artificial Intelligence for Health (FG-AI4H) to three use cases: diagnostic prediction of diabetic retinopathy, diagnostic prediction of Alzheimer’s disease, and cytomorphologic classification for leukemia diagnostics.
Machine Learning, Health, Testing
Proceedings of the Machine Learning for Health, PMLR 136:280-317, 2020
The current movement towards increased use of lossy compression is highly risky, because even careful and tedious parameter tuning cannot guarantee that no applications are compromised. We implemented and validated a compression method that simultaneously provides a strong data reduction and preserves analysis results for all possible applications.
hyperspectral imaging, machine learning, Earth Observation, satellites, compression
Proceedings of ATTRACT Online Conference "Igniting the Deep Tech Revolution", 22 September 2020, online
In this paper, we discuss requirements for compression tuned for machine vision and demonstrate an implementation achieving a compression ratio in the range 5:1–10:1 at a rate of 200 MB/s/core in software and 400 MB/s on a VHDL FPGA simulation with a 5k-LUT footprint. We also show that adding a machine-learning component to our compressor increases the compression ratio by 10% and allows for easy portability of an otherwise complex algorithm to heterogeneous architectures.
compression, satellites, machine learning, AI, Earth Observation, ESA
7th International Workshop on On-Board Payload Data Compression by ESA and CNES, virtual online workshop, 2020