First, thanks for your work, which is easy to use and read.
In scaled_mnist.py, a regular CNN model is trained on the original data and then tested on the scaled data. It shows a poor accuracy of about 60% (in my run). Then you fine-tune a deform-conv model on the scaled data, and its accuracy is much better. However, when I instead re-trained the already-trained regular CNN on the scaled data, the result really confused me: it gets 96% on the original test set and 98% on the scaled test set.
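So this experiment cannot prove the effectiveness of the deform-conv layers, because the improvement may come from fine-tuning on the scaled data rather than from the deformable convolutions themselves.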