In this talk I will present recent joint work with Guillaume P.
Archambault, Harry Guo, and Richong Zhang. In this work, we explain the
working mechanism of MixUp in terms of adversarial training. We introduce
a new class of adversarial training schemes, which we refer to as
directional adversarial training, or DAT. In a nutshell, a DAT scheme
perturbs a training example in the direction of another example but keeps
its original label as the training target. We prove that MixUp is
equivalent to a special subclass of DAT, in the sense that the two induce
the same expected loss function and therefore, asymptotically, correspond
to the same optimization problem. This understanding not only serves to explain the
effectiveness of MixUp, but also reveals a more general family of MixUp
schemes, which we call Untied MixUp. We prove that the family of Untied
MixUp schemes is equivalent to the entire class of DAT schemes. We
establish empirically the existence of Untied MixUp schemes that improve
upon MixUp.
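
To make the contrast concrete, here is a minimal sketch (in Python with NumPy, and not the authors' code) of how MixUp, a DAT-style perturbation, and an Untied MixUp variant act on a single pair of training examples. The Beta(alpha, alpha) sampling and the illustrative label-weight function `g` are assumptions chosen purely for exposition.

```python
# Minimal illustrative sketch, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

def mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0):
    """Standard MixUp: mix inputs and labels with the same coefficient lam."""
    lam = rng.beta(alpha, alpha)
    x = lam * x_i + (1 - lam) * x_j
    y = lam * y_i + (1 - lam) * y_j
    return x, y

def dat_pair(x_i, y_i, x_j, alpha=1.0):
    """Directional adversarial training: perturb x_i in the direction of x_j,
    but keep the original label y_i as the training target."""
    lam = rng.beta(alpha, alpha)        # perturbation strength (assumed policy)
    x = lam * x_i + (1 - lam) * x_j     # move x_i toward x_j
    return x, y_i                       # label is left untouched

def untied_mixup_pair(x_i, y_i, x_j, y_j, alpha=1.0, g=lambda lam: lam ** 2):
    """Untied MixUp: the label-mixing weight g(lam) is decoupled ("untied")
    from the input-mixing weight lam; g here is an arbitrary illustrative choice."""
    lam = rng.beta(alpha, alpha)
    x = lam * x_i + (1 - lam) * x_j
    y = g(lam) * y_i + (1 - g(lam)) * y_j
    return x, y
```

In this sketch, setting `g` to the identity recovers standard MixUp, while DAT discards label mixing altogether; Untied MixUp sits in between by letting the label weight be an arbitrary function of the input-mixing coefficient.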