Progressive Resizing

Automatic progressive resizing of images during training

ProgressiveResize is inspired by MosaicML’s Progressive Resizing algorithm for Composer, which in turn was inspired by fastai’s manual progressive resizing.

progressive resizing illustrated

Progressive Resizing decreases model training time by training on smaller images, then gradually increasing to the full image size. This allows training on more samples for the same compute budget, often leading to higher performance than training on full-sized images.


source

IncreaseMode

 IncreaseMode (value, names=None, module=None, qualname=None, type=None,
               start=1)

Increase mode for ProgressiveResize


source

ProgressiveResize

 ProgressiveResize (initial_size:float|tuple[int,int]=0.5,
                    start:Numeric=0.5, finish:Numeric=0.75,
                    increase_by:int=4,
                    increase_mode:IncreaseMode=<IncreaseMode.Batch:
                    'batch'>, resize_mode:str='bilinear',
                    resize_valid:bool=True,
                    final_size:tuple[int,int]|None=None,
                    add_resize:bool=False, resize_targ:bool=False,
                    preallocate_bs:int|None=None, preallocate:bool=True,
                    empty_cache:bool=False, verbose:bool=True)

Progressively increase the size of input images during training, starting from initial_size and ending at the valid image size or final_size.

Type Default Details
initial_size float | tuple[int, int] 0.5 Starting size to increase from. Image shape must be square
start Numeric 0.5 Earliest upsizing epoch, as a percent of training time or an epoch number (0-indexed)
finish Numeric 0.75 Last upsizing epoch, as a percent of training time or an epoch number (0-indexed)
increase_by int 4 Number of pixels to increase the image size by per resize, or the minimum increase per upsizing epoch
increase_mode IncreaseMode IncreaseMode.Batch Increase image size anytime during training or only before an epoch starts
resize_mode str bilinear PyTorch interpolate mode string for upsizing. Resets to existing fastai DataLoader mode at final_size
resize_valid bool True Apply progressive resizing to valid dataset
final_size tuple[int, int] | None None Final image size. Set if using non-fastai DataLoaders; automatically detected from a fastai DataLoader with batch_tfms
add_resize bool False Add a separate resize step. Use for non-fastai DataLoaders or fastai DataLoader without batch_tfms
resize_targ bool False Applies the separate resize step to targets
preallocate_bs int | None None Preallocation batch size. Set if the valid DataLoader has a larger batch size than the train DataLoader.
preallocate bool True Preallocate GPU memory with full size image. Can mitigate memory allocation slowdowns during training. If False, set final_size.
empty_cache bool False Call torch.cuda.empty_cache() before a resizing epoch. May prevent CUDA & Magma errors. Don’t use with multiple GPUs
verbose bool True Print a summary of the progressive resizing schedule
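
A minimal sketch of attaching the callback to an existing fastai Learner; dls and model are placeholders here, and full worked examples follow in the example section below.

# Placeholders: dls and model are assumed to be defined as in the examples below.
learn = Learner(dls, model, metrics=Accuracy())

# Train at half the full image size until 50% of training, then progressively
# upsize, reaching full size at 75% of training (these are the defaults,
# written out explicitly here).
learn.fit_flat_cos(20, 8e-3,
                   cbs=ProgressiveResize(initial_size=0.5, start=0.5,
                                         finish=0.75, increase_by=4))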

Progressive Resizing initially trains on downsampled images, then gradually increases the image size to the full size for the remainder of training.

This can significantly reduce training time at the possible expense of lower model performance. However, Progressive Resizing allows training on more samples within the same compute budget, usually leading to increased performance.

The model must be capable of handling variable image sizes.

Tip: Increase DataLoader Throughput

ProgressiveResize should increase GPU throughput, which may cause other parts of the training pipeline to become a bottleneck. You can test for a DataLoader bottleneck using a fastxtend profiler.

An easy way to increase fastai’s DataLoader throughput is by replacing Pillow with Pillow-SIMD.

For best performance, use fastxtend’s FFCV Loader.

When testing Composer’s Progressive Resizing callback, MosaicML found:

In our experiments, Progressive Resizing improves the attainable tradeoffs between training speed and the final quality of the trained model. In some cases, it leads to slightly lower quality than the original model for the same number of training steps. However, Progressive Resizing increases training speed so much (via improved throughput during the early part of training) that it is possible to train for more steps, recover accuracy, and still complete training in less time.

ProgressiveResize modifies the fastai batch augmentation pipeline by changing the batch_tfms size during training. Specifically, it modifies AffineCoordTfm size, which is set by any rotate, warp, or resize batch augmentation, and/or RandomResizedCropGPU size. This modification prevents unnecessarily resizing images a second time on the GPU, speeding up the process. If there are no batch_tfms or if training without a fastai DataLoader or fastxtend Loader, set add_resize=True to resize the batch on the GPU using PyTorch’s interpolate.
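
For instance, with a plain PyTorch DataLoader, or a fastai DataLoader without batch_tfms, there is no AffineCoordTfm or RandomResizedCropGPU for the callback to modify, so the resize must be added explicitly. A rough sketch, assuming a full image size of 224 pixels:

# No batch_tfms to modify, so add a separate GPU resize step and pass the full
# image size, since it cannot be detected automatically.
learn.fit_flat_cos(20, 8e-3,
                   cbs=ProgressiveResize(add_resize=True, final_size=(224, 224)))
# For image targets such as segmentation masks, also set resize_targ=True so
# the added resize step is applied to the targets.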

Progressive Resizing works best when the resize steps are spread out over a significant portion of the dataset.

Tip: Progressive Resizing & Small Datasets

If training on small datasets with ProgressiveResize, such as Imagenette, scale the batch mode increase amount to be larger than the default of 4 by setting increase_by to a custom value.

In the example section, increase_by=16 gives good results for training Imagenette for 20-25 epochs.

Important: Preallocating GPU Memory Uses a Validation Batch

Before training starts, ProgressiveResize performs a dry run to preallocate GPU memory required for training on full images. This can prevent stuttering during training due to memory allocation.

If the validation batch size is larger than the training batch size, set preallocate_bs to the training batch size so ProgressiveResize will preallocate the correct amount of memory.

Validation images must be the same size as the full training image size.
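
As a sketch, assuming a training batch size of 64 and a validation batch size of 128 (the DataBlock setup is the same as in the examples below):

# The valid DataLoader uses a larger batch size than the train DataLoader, so
# pass the train batch size to preallocate_bs for the preallocation dry run.
dls = dblock.dataloaders(imagenette, bs=64, val_bs=128)
learn.fit_flat_cos(20, 8e-3, cbs=ProgressiveResize(preallocate_bs=64))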

ProgressiveResize supports logging to Weights & Biases and TensorBoard via the LogDispatch callback. If either the fastai.callback.wandb.WandbCallback or fastai.callback.tensorboard.TensorBoardCallback are added to Learner, ProgressiveResize will automatically log the current image size as progressive_resize_size.
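
For example, with Weights & Biases (a sketch; the project name is a placeholder and wandb.init must be called before fitting):

import wandb
from fastai.callback.wandb import WandbCallback

wandb.init(project='progressive-resize-example')  # placeholder project name
# With WandbCallback attached, ProgressiveResize automatically logs the current
# image size as progressive_resize_size.
learn.fit_flat_cos(20, 8e-3, cbs=[WandbCallback(), ProgressiveResize()])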

If training on older versions of PyTorch with ProgressiveResize results in CUDA or Magma errors, try setting increase_mode=IncreaseMode.Epoch and empty_cache=True.

This will upsize once per epoch and call torch.cuda.empty_cache() before a resizing epoch. empty_cache=True may interfere with training multiple models on multi-GPU systems.
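
A sketch of this workaround:

# Upsize only at epoch boundaries and empty the CUDA cache before each resizing
# epoch. Avoid empty_cache=True when training multiple models on multi-GPU systems.
learn.fit_flat_cos(20, 8e-3,
                   cbs=ProgressiveResize(increase_mode=IncreaseMode.Epoch,
                                         empty_cache=True))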

ProgressiveResize is fully compatible with CutMixUpAugment.
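
For example, a sketch passing both callbacks with their defaults:

# ProgressiveResize and CutMixUpAugment can be passed to the same fit call.
learn.fit_flat_cos(20, 8e-3, cbs=[ProgressiveResize(), CutMixUpAugment()])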

Example

In this example¹, an xresnext50 is trained for 20 & 25 epochs on Imagenette at an image size of 224 pixels. Due to the short training run and small dataset, ProgressiveResize in batch mode is set to increase_by=16.

ProgressiveResize yields significant training time savings compared to training at full size. At a normalized compute budget of roughly 6.5 minutes, Progressive Resizing results in 92.7% accuracy compared to 92.0% accuracy with full-sized training.

Mode Epochs Time (Mins) Accuracy
Full Size 20 6.5 92.0%
Progressive Batch 20 5.2 92.3%
Progressive Epoch 20 5.2 91.8%
Progressive Batch 25 6.5 92.7%

Due to the regularization effect of training on different-sized images, Progressive Resizing with increase_by=16 outperforms full-sized training by 0.3% in 20 percent less time on the same number of epochs².

Progressive Resizing

There are two Progressive Resizing IncreaseMode types:

  • increase_mode=IncreaseMode.Batch
  • increase_mode=IncreaseMode.Epoch

This example shows both.

Batch Resizing

ProgressiveResize with the default increase_mode=IncreaseMode.Batch.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(20, 8e-3, cbs=ProgressiveResize(increase_by=16))
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
Progressively increase the initial image size of [112, 112] by 16 pixels every 0.8333 epochs for 7 resizes. 
Starting at epoch 10 and finishing at epoch 15 for a final training size of [224, 224].
Total training time: 311.8 s
epoch train_loss valid_loss accuracy time
0 1.670977 1.883999 0.454268 00:13
1 1.403678 1.226364 0.710573 00:13
2 1.251599 1.446574 0.626497 00:13
3 1.136825 1.079901 0.768662 00:13
4 1.062239 1.250891 0.718981 00:13
5 1.006945 0.955187 0.820127 00:13
6 0.957047 1.238453 0.703439 00:13
7 0.910177 0.900485 0.842548 00:13
8 0.889880 0.963289 0.816560 00:13
9 0.860453 0.881689 0.849936 00:13
10 0.839035 0.952776 0.828535 00:14
11 0.815573 0.857969 0.863439 00:14
12 0.791430 0.863359 0.858089 00:15
13 0.770404 0.786469 0.887898 00:16
14 0.772790 0.848503 0.866242 00:17
15 0.745845 0.774377 0.890955 00:19
16 0.704179 0.769880 0.891974 00:19
17 0.640226 0.731178 0.914395 00:19
18 0.604006 0.706209 0.920764 00:19
19 0.584977 0.698062 0.922548 00:19
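
The printed schedule follows from the callback arguments and the training length. A rough reconstruction of the batch-mode arithmetic, inferred from the summary above rather than taken from the callback’s source:

# Inferred batch-mode schedule arithmetic (an assumption, not the callback's source).
epochs, full_size, increase_by = 20, 224, 16
initial = int(full_size * 0.5)                     # initial_size=0.5 -> 112
n_resizes = (full_size - initial) // increase_by   # 7 resize steps
start, finish = 0.5 * epochs, 0.75 * epochs        # epochs 10 and 15
step = (finish - start) / (n_resizes - 1)          # ~0.8333 epochs per resize
print(n_resizes, start, finish, round(step, 4))    # 7 10.0 15.0 0.8333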

Epoch Resizing

ProgressiveResize with increase_mode=IncreaseMode.Epoch.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(20, 8e-3,
                       cbs=ProgressiveResize(increase_mode=IncreaseMode.Epoch))
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
Progressively increase the initial image size of [112, 112] by 28 pixels every 1 epoch for 4 resizes.
Starting at epoch 12 and finishing at epoch 15 for a final training size of [224, 224].
Total training time: 309.3 s
epoch train_loss valid_loss accuracy time
0 1.670977 1.883999 0.454268 00:13
1 1.403678 1.226364 0.710573 00:13
2 1.251599 1.446574 0.626497 00:13
3 1.136825 1.079901 0.768662 00:13
4 1.062239 1.250891 0.718981 00:13
5 1.006945 0.955187 0.820127 00:13
6 0.957047 1.238453 0.703439 00:13
7 0.910177 0.900485 0.842548 00:13
8 0.889880 0.963289 0.816560 00:13
9 0.860453 0.881689 0.849936 00:14
10 0.839285 0.916867 0.835159 00:13
11 0.817720 0.837916 0.866242 00:13
12 0.792356 0.844864 0.869045 00:14
13 0.780980 0.811714 0.878471 00:15
14 0.780541 0.870851 0.853758 00:17
15 0.766215 0.788430 0.888153 00:19
16 0.709244 0.788267 0.887134 00:19
17 0.649643 0.732368 0.915159 00:19
18 0.611495 0.717171 0.915414 00:19
19 0.590308 0.708605 0.918471 00:19

Normal Training

fastai model training without Progressive Resizing.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(20, 8e-3)
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
epoch train_loss valid_loss accuracy time
0 1.693837 1.660484 0.539873 00:19
1 1.425402 1.288508 0.682548 00:19
2 1.249855 1.231204 0.726879 00:19
3 1.107746 1.027718 0.794904 00:19
4 1.046856 1.113385 0.782420 00:19
5 0.974740 1.055205 0.800255 00:19
6 0.933220 1.195850 0.756688 00:19
7 0.880307 0.905752 0.845096 00:19
8 0.854195 1.113956 0.772229 00:19
9 0.839854 0.838828 0.868790 00:19
10 0.815095 0.868798 0.861146 00:19
11 0.786958 0.839955 0.867771 00:19
12 0.763221 0.884713 0.853758 00:19
13 0.751375 0.780010 0.890955 00:19
14 0.740009 0.835440 0.872102 00:19
15 0.710022 0.793270 0.888408 00:19
16 0.683471 0.743397 0.909809 00:19
17 0.630301 0.735851 0.910828 00:19
18 0.596394 0.715150 0.916688 00:19
19 0.577142 0.701738 0.918981 00:19
Total training time: 387.2 s

Progressive Resizing with Normalized Compute Budget

ProgressiveResize with the default increase_mode=IncreaseMode.Batch, trained for 25 epochs to match Normal Training’s compute budget.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(25, 8e-3, cbs=ProgressiveResize(increase_by=16))
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
Progressively increase the initial image size of [112, 112] by 16 pixels every 1.042 epochs for 7 resizes. 
Starting at epoch 12.5 and finishing at epoch 18.75 for a final training size of [224, 224].
Total training time: 390.5 s
epoch train_loss valid_loss accuracy time
0 1.670977 1.883999 0.454268 00:13
1 1.403678 1.226364 0.710573 00:13
2 1.251599 1.446574 0.626497 00:13
3 1.136825 1.079901 0.768662 00:13
4 1.062239 1.250891 0.718981 00:13
5 1.006945 0.955187 0.820127 00:13
6 0.957047 1.238453 0.703439 00:13
7 0.910177 0.900485 0.842548 00:13
8 0.889880 0.963289 0.816560 00:13
9 0.860453 0.881689 0.849936 00:13
10 0.839285 0.916867 0.835159 00:13
11 0.817720 0.837916 0.866242 00:13
12 0.806093 0.869887 0.850701 00:14
13 0.780977 0.805412 0.877962 00:14
14 0.766974 0.899283 0.839490 00:14
15 0.757296 0.811422 0.878726 00:15
16 0.736302 0.855174 0.853758 00:15
17 0.723357 0.769306 0.901401 00:16
18 0.714021 0.765733 0.895287 00:18
19 0.697444 0.736115 0.911847 00:19
20 0.663537 0.790711 0.881783 00:19
21 0.617896 0.712593 0.919745 00:19
22 0.583567 0.710089 0.918471 00:19
23 0.562689 0.685103 0.927643 00:19
24 0.551753 0.686037 0.926879 00:19

Footnotes

  1. All models are trained on a GeForce RTX 3080 Ti using PyTorch 1.13.1 and CUDA 11.7. Results may differ with other datasets, hardware, and across runs.↩︎

  2. While Progressive Resizing can sometimes outperform a full-sized trained model in the same number of epochs, it is just as likely to perform worse, depending on the setup.↩︎