This dataset contains about 270,000 MNIST-like captcha images. Each image is 84x30 px and shows 5 alphanumeric characters. I found the dataset useful for testing the Uber AI Labs CoordConv method [1]. For interesting results, train on only a fraction of the 270k images. Also remember to split the dataset into a train set and a test set.
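
A minimal sketch of the subsampling and train/test split, assuming you already have a list of the image filenames. The function name `sample_and_split` and the defaults (`fraction=0.1`, `test_ratio=0.2`, `seed=0`) are illustrative choices, not values recommended by the dataset.

```python
import random


def sample_and_split(filenames, fraction=0.1, test_ratio=0.2, seed=0):
    """Keep a random `fraction` of the files, then split them into train/test lists."""
    rng = random.Random(seed)
    files = sorted(filenames)  # sort first so the shuffle is reproducible
    rng.shuffle(files)
    # Use only a fraction of the full dataset.
    subset = files[: round(len(files) * fraction)]
    # Hold out the first part of the subset for testing.
    n_test = round(len(subset) * test_ratio)
    return subset[n_test:], subset[:n_test]
```

Because the split happens on filenames, train and test never share an image, and a fixed seed gives the same split on every run.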

The filenames are structured like this:
captcha content__font.png
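
A minimal sketch for recovering the label and font from a filename. It assumes the captcha content contains no underscore (it is alphanumeric) and that one or more underscores separate it from the font name. The helper `parse_captcha_filename` and the example filename are hypothetical.

```python
from pathlib import Path


def parse_captcha_filename(filename):
    """Return (captcha_content, font) parsed from a name like 'content__font.png'."""
    stem = Path(filename).stem  # drop any directory and the .png extension
    # Everything before the first underscore is the captcha content.
    label, _, font = stem.partition("_")
    # Strip any remaining separator underscores from the font name.
    return label, font.lstrip("_")


label, font = parse_captcha_filename("a1B2c__SomeFont.png")
```

The label can then serve as the 5-character training target, and the font can be used to stratify or filter the data.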




  1. Liu, Rosanne et al. "An Intriguing Failing of Convolutional Neural Networks and the CoordConv Solution." 2018.