Marginalized Corrupted Features

 
 

Marginalized Corrupted Features (MCF) is a new approach to combating overfitting in supervised learning. The key idea behind MCF is that you can regularize a model by training it on corrupted copies of the data, where the corruption is drawn from a pre-specified corrupting distribution. Rather than sampling corrupted copies explicitly, MCF marginalizes out the corruption: it minimizes the expected loss under the corrupting distribution. This corresponds to training linear models on infinitely many corrupted samples, without increasing the computational complexity.
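To make the marginalization concrete, here is a short worked equation for the quadratic loss. The notation is an assumption introduced for illustration: training pairs (x_n, y_n), weight vector w, and corrupted copy x̃_n drawn from the corrupting distribution p(x̃_n | x_n). Please treat it as a sketch and refer to the papers below for the authoritative derivation.

```latex
\mathcal{L}(\mathbf{w})
  = \sum_{n=1}^{N} \mathbb{E}_{p(\tilde{\mathbf{x}}_n \mid \mathbf{x}_n)}
      \left[ \left( \mathbf{w}^\top \tilde{\mathbf{x}}_n - y_n \right)^2 \right]
  = \sum_{n=1}^{N} \left[
      \left( \mathbf{w}^\top \mathbb{E}[\tilde{\mathbf{x}}_n] - y_n \right)^2
      + \mathbf{w}^\top \operatorname{Cov}[\tilde{\mathbf{x}}_n] \, \mathbf{w}
    \right]
```

Only the mean and covariance of the corrupting distribution appear in the result, so the expected loss can be evaluated at roughly the cost of the ordinary loss. The covariance term acts as a data-dependent regularizer on w.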


For more details on MCF, please refer to the following papers:

  1. L.J.P. van der Maaten, M. Chen, S. Tyree, and K.Q. Weinberger. Marginalizing Corrupted Features. arXiv:1402.7001, 2014.

  2. L.J.P. van der Maaten, M. Chen, S. Tyree, and K.Q. Weinberger. Learning with Marginalized Corrupted Features. In Proceedings of the International Conference on Machine Learning (ICML), JMLR W&CP 28:410-418, 2013.

 

Introduction

Matlab code implementing MCF can be obtained from here (tar.gz; 4MB). The package includes MCF implementations for three loss functions (quadratic, exponential, and logistic) and three corrupting distributions (blankout / dropout, Gaussian corruption, and Poisson corruption). A script that demonstrates the use of MCF in the “nightmare at test time” scenario is also included.
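As an illustration of one loss/corruption pair, the following Python sketch evaluates the expected quadratic loss under blankout corruption for a single example. It is not taken from the Matlab package; the function names are hypothetical. It assumes unbiased blankout, in which each feature is independently set to zero with probability q and otherwise rescaled by 1/(1-q). The closed form is checked against an exact enumeration over all corruption masks.

```python
from itertools import product


def expected_quadratic_loss_blankout(w, x, y, q):
    """Closed-form E[(w . x_tilde - y)^2] under unbiased blankout with rate q.

    Assumes each feature is zeroed independently with probability q and
    scaled by 1 / (1 - q) otherwise, so that E[x_tilde] = x.
    """
    mean_pred = sum(wd * xd for wd, xd in zip(w, x))  # w . E[x_tilde]
    # Per-feature variance of x_tilde_d is x_d^2 * q / (1 - q).
    var_pred = (q / (1.0 - q)) * sum((wd * xd) ** 2 for wd, xd in zip(w, x))
    return (mean_pred - y) ** 2 + var_pred


def enumerated_quadratic_loss_blankout(w, x, y, q):
    """Same expectation, computed by summing over all 2^D blankout masks."""
    total = 0.0
    for mask in product([0, 1], repeat=len(x)):  # 1 = feature kept
        prob = 1.0
        pred = 0.0
        for wd, xd, keep in zip(w, x, mask):
            prob *= (1.0 - q) if keep else q
            if keep:
                pred += wd * xd / (1.0 - q)
        total += prob * (pred - y) ** 2
    return total


w = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 3.0]
y = 1.0
q = 0.3

closed_form = expected_quadratic_loss_blankout(w, x, y, q)
enumerated = enumerated_quadratic_loss_blankout(w, x, y, q)
print(closed_form, enumerated)
```

The enumeration costs time exponential in the number of features, whereas the closed form is linear. In this special case, that is what training on infinitely many corrupted samples at no extra cost amounts to.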


For more information on installation and usage, please refer to the included Readme. If you think you have found a bug in the code, please contact: lvdmaaten@gmail.com

Matlab code

This code is licensed under a BSD license.


Legal