Improve generalization with augmentation and regularization by moatednorth · Pull Request #1 · moatednorth/test

moatednorth · 2026-02-06T15:13:43Z

Reduce overfitting and improve robustness by adding regularization, augmentation, and label smoothing to the training pipeline.
Ensure the learning rate schedule is robust to small or zero warmup/decay ranges by using safe clamping in the warmup/cosine schedule.
Combine pruning and knowledge distillation while preserving stable optimization with decoupled weight decay.

Added hyperparameters WEIGHT_DECAY, DROPOUT_RATE, and LABEL_SMOOTHING and applied L2 kernel_regularizer to convolutional and classifier layers via tf.keras.regularizers.l2(WEIGHT_DECAY).
Inserted Dropout(DROPOUT_RATE) into both the student build_model and the teacher build_teacher_model feature heads, and enabled label smoothing in the student loss via CategoricalCrossentropy(label_smoothing=LABEL_SMOOTHING).
Reworked the CIFAR-10 input pipeline to apply lightweight 2D augmentation (resize_with_crop_or_pad, random_crop, random_flip_left_right) before expanding each image into a simulated 3D volume. Converted dataset building to tf.data pipelines with map, batch, shuffle, and prefetch.
Switched the optimizer to optimizers.AdamW with weight_decay=WEIGHT_DECAY and ensured warmup_steps is capped at total_steps when computed in train_model.
Hardened the WarmupCosineDecay schedule by using safe_warmup_steps = tf.maximum(warmup_steps, 1.0) and decay_steps = tf.maximum(total_steps - warmup_steps, 1.0) to avoid divide-by-zero or invalid progress ranges.
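
The clamping described in the last item can be illustrated with a minimal plain-Python sketch. This is not the PR's TensorFlow code: the function name warmup_cosine_lr and the base_lr and min_lr parameters are assumptions made for illustration, and Python's max stands in for tf.maximum. Only the two clamped quantities, safe_warmup_steps and decay_steps, are taken from the description above.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Hypothetical scalar version of a warmup + cosine decay schedule."""
    # Clamp both denominators to at least 1 so that zero-length warmup or
    # decay ranges cannot cause a division by zero.
    safe_warmup_steps = max(float(warmup_steps), 1.0)
    decay_steps = max(float(total_steps - warmup_steps), 1.0)

    step = float(step)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / safe_warmup_steps

    # Fraction of the decay phase completed, kept inside [0, 1] so steps
    # past total_steps stay at min_lr instead of oscillating.
    progress = min(max((step - warmup_steps) / decay_steps, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine
```

With warmup_steps=0 and total_steps=0 the sketch skips warmup entirely and returns base_lr at step 0 rather than raising an error, which is the degenerate case the clamping is meant to cover.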

No automated tests were executed for this change.
The edits were committed to the repository after basic sanity checks only; neither training nor unit tests were run in CI as part of this PR.

Improve generalization with augmentation

6ff7e87

moatednorth added the codex label Feb 6, 2026 — with ChatGPT Codex Connector
