Optimal Transport for Unsupervised Learning Tutorial

Optimal transport (OT) has a long history in mathematics, and recent advances in OT theory have paved the way for its use in the ML/AI community. This tutorial introduces pivotal computational and practical aspects of OT, as well as applications of OT to unsupervised learning problems. We will provide a selected, compact, yet comprehensive background of optimal transport that is useful in machine learning research. Moreover, we will elaborate on the application of optimal transport to several popular and important machine learning problems, including deep generative models, clustering, and topic modelling. This tutorial targets a wide range of machine learning practitioners who are interested in leveraging optimal transport in their research across domains such as computer vision, natural language processing, and data mining. The audience is expected to have foundations in machine learning and deep learning, basic knowledge of probability and statistics, and familiarity with the Python programming language.

Instructors

  • Viet Huynh (Department of Data Science and AI, Monash University)
  • He Zhao (Department of Data Science and AI, Monash University)
  • Nhat Ho (Department of Statistics and Data Sciences, UT Austin)
  • Dinh Phung (Monash University & VinAI Research)

Tutorial Outline

I. Introduction
II. Optimal Transport Background (40 mins) [Nhat Ho, Slides]
  1. Theory: Monge’s OT formulation, Kantorovich’s OT problem
  2. Entropic regularized OT
  3. Extensions such as the Sliced Wasserstein distance and Wasserstein barycenters
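As a quick taste of the entropic regularized OT covered in this part, the sketch below implements Sinkhorn's matrix-scaling iterations in NumPy. The function name, toy histograms, and grid are illustrative assumptions for this tutorial page, not code from the slides:

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=200):
    """Entropic regularized OT between histograms a, b with cost matrix C."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                 # scale columns to match b
        u = a / (K @ v)                   # scale rows to match a
    P = u[:, None] * K * v[None, :]       # (approximate) transport plan
    return P, float(np.sum(P * C))        # plan and transport cost

# Toy example: two histograms on a 1-D grid (illustrative data).
x = np.linspace(0.0, 1.0, 5)
a = np.full(5, 0.2)
b = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
C = (x[:, None] - x[None, :]) ** 2        # squared-distance cost matrix
P, cost = sinkhorn(a, b, C)
```

The plan's row and column sums converge to the prescribed marginals a and b as the iterations proceed; smaller eps gives a sharper plan but slower convergence.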
III. Optimal Transport for Deep Generative Models (40 mins)
  1. Background of deep generative models [He Zhao, Slides]
    a. Variational autoencoders
    b. Generative adversarial networks
  2. Deep generative models with optimal transport [Viet Huynh, Slides]
    a. Dual formulations (e.g., WGAN)
    b. Primal formulations (e.g., WAE)
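A reason OT-based losses are attractive for generative models is that one-dimensional OT reduces to sorting, which the sliced Wasserstein distance exploits by averaging over random projections. A minimal Monte-Carlo sketch (the function name, point clouds, and projection count are illustrative assumptions, not the tutorial's code):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=100, seed=0):
    """Monte-Carlo sliced Wasserstein-2 distance between equal-size point clouds."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)    # random unit direction
        xp = np.sort(X @ theta)           # 1-D OT is just sorting
        yp = np.sort(Y @ theta)
        total += np.mean((xp - yp) ** 2)
    return float(np.sqrt(total / n_projections))

# Toy point clouds: Y is X translated by (3, 3), so the distance is about 3.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
Y = X + 3.0
d_same = sliced_wasserstein(X, X)
d_shift = sliced_wasserstein(X, Y)
```

Because each projection only requires sorting, the per-slice cost is O(n log n), which is what makes sliced variants practical as training losses.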
IV. Clustering and Latent Factor Analysis with OT (40 mins)
  1. Multilevel clustering [Viet Huynh, Slides]
    a. Background of clustering
    b. Clustering and multilevel clustering with OT
  2. Latent factor analysis with OT [He Zhao, Slides]
    a. Wasserstein dictionary learning
    b. Topic modelling with geometry-aware OT
    c. Neural topic modelling with OT
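Geometry-aware OT approaches to topic modelling compare documents or topics as histograms over words by solving Kantorovich's linear program with a word-to-word cost matrix. A minimal sketch using `scipy.optimize.linprog` (the helper name and toy histograms are illustrative assumptions, not code from the cited papers):

```python
import numpy as np
from scipy.optimize import linprog

def exact_ot(a, b, C):
    """Solve Kantorovich's LP: min <P, C> s.t. P 1 = a, P^T 1 = b, P >= 0."""
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):                    # row-marginal constraints
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):                    # column-marginal constraints
        A_eq[n + j, j::m] = 1.0
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m), float(res.fun)

# Toy "word" histograms on 3 bins with cost |i - j| between bin positions.
a = np.array([0.5, 0.5, 0.0])
b = np.array([0.0, 0.5, 0.5])
C = np.abs(np.arange(3)[:, None] - np.abs(np.arange(3)[None, :])).astype(float)
P, cost = exact_ot(a, b, C)               # optimal cost is 1.0 here
```

In practice the cost matrix would come from word-embedding distances, and dedicated solvers (e.g., in the POT library) replace the generic LP for speed; the LP form above is just the cleanest statement of the problem.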
V. Concluding Remarks

Resources

II. Optimal Transport Background
References
[1] Cédric Villani. Optimal transport: Old and new, volume 338. Springer, 2009.
[2] Filippo Santambrogio. Optimal transport for applied mathematicians, volume 55. Springer, 2015.
III. Optimal Transport for Deep Generative Models
References
[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. International Conference on Machine Learning, pages 214–223, 2017.
[2] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schölkopf. Wasserstein auto-encoders. International Conference on Learning Representations, 2018.
[3] Viet Huynh, Dinh Phung, and He Zhao. Optimal transport for deep generative models: State of the art and research challenges. International Joint Conference on Artificial Intelligence, Survey Track, 2021.
[4] Nhan Dam, Quan Hoang, Trung Le, Tu Dinh Nguyen, Hung Bui, and Dinh Phung. Three-player Wasserstein GAN via amortised duality. International Joint Conference on Artificial Intelligence, 2019.
[5] Tim Salimans, Han Zhang, Alec Radford, and Dimitris Metaxas. Improving GANs using optimal transport. International Conference on Learning Representations, 2018.
IV. Clustering and Latent Factor Analysis with OT
References
[1] Viet Huynh, Nhat Ho, Nhan Dam, XuanLong Nguyen, Mikhail Yurochkin, Hung Bui, et al. On efficient multilevel clustering via Wasserstein distances. Journal of Machine Learning Research, 2021.
[2] Viet Huynh, He Zhao, and Dinh Phung. OTLDA: A geometry-aware optimal transport approach for topic modeling. Advances in Neural Information Processing Systems, 2020.
[3] He Zhao, Dinh Phung, Viet Huynh, Trung Le, and Wray Buntine. Neural topic model via optimal transport. International Conference on Learning Representations, 2021.
[4] He Zhao, Dinh Phung, Viet Huynh, Yuan Jin, Lan Du, and Wray Buntine. Topic modelling meets deep neural networks: A survey. International Joint Conference on Artificial Intelligence, Survey Track, 2021.
[5] Nhat Ho, XuanLong Nguyen, Mikhail Yurochkin, Hung Hai Bui, Viet Huynh, and Dinh Phung. Multilevel clustering via Wasserstein means. International Conference on Machine Learning, 2017.
[6] Nhat Ho, Viet Huynh, Dinh Phung, and Michael Jordan. Probabilistic multilevel clustering via composite transportation distance. International Conference on Artificial Intelligence and Statistics, 2019.
[7] Morgan A. Schmitz, Matthieu Heitz, Nicolas Bonneel, Fred Ngolè, David Coeurjolly, Marco Cuturi, Gabriel Peyré, and Jean-Luc Starck. Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning. SIAM Journal on Imaging Sciences, 2018.
Others
References
[1] Lénaïc Chizat, Gabriel Peyré, Bernhard Schmitzer, and François-Xavier Vialard. Unbalanced optimal transport: Dynamic and Kantorovich formulations. Journal of Functional Analysis, 2018.
[2] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 2015.