Training With The Official Training Script

Scripts

Train, validation, inference, and checkpoint cleaning scripts are included in the GitHub root folder. Scripts are not currently packaged in the pip release.
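
For example, the scripts can be run from a local clone of the repository (a sketch assuming the upstream pytorch-image-models GitHub repository; adjust the URL if you work from a fork or mirror):

git clone https://github.com/huggingface/pytorch-image-models.git
cd pytorch-image-models
python train.py --help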

The training and validation scripts evolved from early versions of the PyTorch Imagenet Examples. I have added significant functionality over time, including CUDA-specific performance enhancements based on NVIDIA's APEX Examples.

Training Script

The variety of training args is large and not all combinations of options (or even single options) have been fully tested. For the training dataset folder, specify the base folder that contains train and validation subfolders.
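
For reference, a minimal sketch of the expected layout (class and file names below are placeholders; any ImageFolder-style class subdirectories work):

/data/imagenet/
    train/
        class_a/
            img_0001.jpeg
        class_b/
            img_0002.jpeg
    validation/
        class_a/
            img_1001.jpeg
        class_b/
            img_1002.jpeg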

To train an SE-ResNet34 on ImageNet, locally distributed across 4 GPUs with one process per GPU, using a cosine schedule and random erasing with a probability of 50% and per-pixel random values:

./distributed_train.sh 4 /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --amp -j 4
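
Under the hood, distributed_train.sh is a thin wrapper that launches one train.py process per GPU through the PyTorch distributed launcher; roughly, the command above amounts to something like the following (a sketch, the exact launcher and flags may vary by PyTorch/timm version):

python -m torch.distributed.launch --nproc_per_node=4 train.py /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --amp -j 4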

It is recommended to use PyTorch 1.9+ with PyTorch native AMP and DDP instead of APEX AMP. --amp defaults to native AMP as of timm version 0.4.3. --apex-amp will force use of APEX components if they are installed.
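
On a single GPU, the same arguments can be passed to train.py directly instead of going through the distributed wrapper (a sketch based on the example above; you will likely want to reduce the LR and/or batch size for one card):

python train.py /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 --amp -j 4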

Validation / Inference Scripts

Validation and inference scripts are similar in usage. One outputs metrics on a validation set and the other outputs top-k class IDs in a CSV. Specify the folder containing validation images, not the base folder as in the training script.

To validate with the model’s pretrained weights (if they exist):

python validate.py /imagenet/validation/ --model seresnext26_32x4d --pretrained
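
To validate a checkpoint you trained yourself rather than pretrained weights, point the script at the checkpoint file instead (the path below is illustrative):

python validate.py /imagenet/validation/ --model seresnext26_32x4d --checkpoint ./output/train/model_best.pth.tar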

To run inference from a checkpoint:

python inference.py /imagenet/validation/ --model mobilenetv3_large_100 --checkpoint ./output/train/model_best.pth.tar
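
The checkpoint cleaning script mentioned above reduces a training checkpoint to just the model weights for sharing or deployment; a minimal sketch, assuming clean_checkpoint.py's --checkpoint and --output arguments (paths are illustrative):

python clean_checkpoint.py --checkpoint ./output/train/model_best.pth.tar --output ./model_clean.pth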

Training Examples

EfficientNet-B2 with RandAugment - 80.4 top-1, 95.1 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

./distributed_train.sh 2 /imagenet/ --model efficientnet_b2 -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-path 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016

MixNet-XL with RandAugment - 80.5 top-1, 94.9 top-5

These params are for dual Titan RTX cards with NVIDIA Apex installed:

./distributed_train.sh 2 /imagenet/ --model mixnet_xl -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .969 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --drop-path 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.3 --amp --lr .016 --dist-bn reduce

SE-ResNeXt-26-D and SE-ResNeXt-26-T

These hparams (or similar) work well for a wide range of ResNet architectures. It's generally a good idea to increase the epoch count as the model size increases, i.e. approx. 180-200 for ResNe(X)t-50 and 220+ for larger models. Increase batch size and LR proportionally for better GPUs or with AMP enabled. These params were for 2 1080Ti cards:

./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.1 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 112
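
As a worked example of the proportional scaling advice above (an illustrative sketch, not a tuned recipe): the command above runs a global batch of 2 x 112 = 224 at lr 0.1, so on cards that fit b 224 per GPU (global batch 448, e.g. with AMP enabled) the linear scaling heuristic would suggest roughly doubling the LR to 0.2:

./distributed_train.sh 2 /imagenet/ --model seresnext26t_32x4d --lr 0.2 --warmup-epochs 5 --epochs 160 --weight-decay 1e-4 --sched cosine --reprob 0.4 --remode pixel -b 224 --amp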

EfficientNet-B3 with RandAugment - 81.5 top-1, 95.7 top-5

The training of this model started with the same command line as EfficientNet-B2 w/ RA above. After almost three weeks of training the process crashed. The results weren't looking amazing, so I resumed the training several times with tweaks to a few params (increased RE prob, decreased rand-aug, increased ema-decay). Nothing looked great. I ended up averaging the best checkpoints from all restarts. The result is mediocre at the default res/crop but oddly performs much better with a full image test crop of 1.0.

EfficientNet-B0 with RandAugment - 77.7 top-1, 95.3 top-5

Michael Klachko achieved these results with the command line for B2 adapted for a larger batch size, with the recommended B0 dropout rate of 0.2.

./distributed_train.sh 2 /imagenet/ --model efficientnet_b0 -b 384 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-path 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .048

ResNet50 with JSD loss and RandAugment (clean + 2x RA augs) - 79.04 top-1, 94.39 top-5

Trained on two older 1080Ti cards, this took a while. The ImageNet validation result is only slightly (and not statistically significantly) better than my first good AugMix training at 78.99. However, these weights are more robust on tests with ImageNetV2, ImageNet-Sketch, etc. Unlike my first AugMix runs, I've enabled SplitBatchNorm, disabled random erasing on the clean split, and cranked up the random erasing prob on the 2 augmented paths.

./distributed_train.sh 2 /imagenet -b 64 --model resnet50 --sched cosine --epochs 200 --lr 0.05 --amp --remode pixel --reprob 0.6 --aug-splits 3 --aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce

EfficientNet-ES (EdgeTPU-Small) with RandAugment - 78.066 top-1, 93.926 top-5

Trained by Andrew Lavin with 8 V100 cards. Model EMA was not used; the final checkpoint is the average of the 8 best checkpoints during training.

./distributed_train.sh 8 /imagenet --model efficientnet_es -b 128 --sched step --epochs 450 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-path 0.2  --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064

MobileNetV3-Large-100 - 75.766 top-1, 92.542 top-5

./distributed_train.sh 2 /imagenet/ --model mobilenetv3_large_100 -b 512 --sched step --epochs 600 --decay-epochs 2.4 --decay-rate .973 --opt rmsproptf --opt-eps .001 -j 7 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.2 --drop-path 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .064 --lr-noise 0.42 0.9

ResNeXt-50 32x4d w/ RandAugment - 79.762 top-1, 94.60 top-5

These params will also work well for SE-ResNeXt-50 and SK-ResNeXt-50, and likely the 101s. I used them for the SK-ResNeXt-50 32x4d that I trained with 2 GPUs using a slightly higher LR per effective batch size (lr=0.18, b=192 per GPU). The command line below is tuned for 8 GPU training.

./distributed_train.sh 8 /imagenet --model resnext50_32x4d --lr 0.6 --warmup-epochs 5 --epochs 240 --weight-decay 1e-4 --sched cosine --reprob 0.4 --recount 3 --remode pixel --aa rand-m7-mstd0.5-inc1 -b 192 -j 6 --amp --dist-bn reduce
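
For reference, working out the effective batch sizes quoted above: the 8 GPU command uses a global batch of 8 x 192 = 1536 at lr 0.6, while the 2 GPU SK-ResNeXt-50 run used 2 x 192 = 384 at lr 0.18, i.e. a slightly higher LR per sample (0.18 / 384 ≈ 4.7e-4 vs 0.6 / 1536 ≈ 3.9e-4), consistent with the note above.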
