Progressive Resizing

Automatic progressive resizing of images during training

ProgressiveResize is inspired by MosaicML’s Progressive Resizing algorithm for Composer, which in turn was inspired by fastai’s manual progressive resizing.

progressive resizing illustrated

Progressive Resizing decreases model training time by training on smaller images, then gradually increasing to the full image size. This allows training on more samples for the same compute budget, often leading to higher performance than training on full-sized images.


source

IncreaseMode

 IncreaseMode (value, names=None, module=None, qualname=None, type=None,
               start=1)

Increase mode for ProgressiveResize


source

ProgressiveResize

 ProgressiveResize (initial_size:float|tuple[int,int]=0.5,
                    start:Numeric=0.5, finish:Numeric=0.75,
                    increase_by:int=4,
                    increase_mode:IncreaseMode=<IncreaseMode.Batch:
                    'batch'>, resize_mode:str='bilinear',
                    resize_valid:bool=True,
                    final_size:tuple[int,int]|None=None,
                    add_resize:bool=False, resize_targ:bool=False,
                    preallocate_bs:int|None=None, preallocate:bool=True,
                    empty_cache:bool=False, verbose:bool=True)

Progressively increase the size of input images during training, starting from initial_size and ending at the valid image size or final_size.

Type Default Details
initial_size float | tuple[int, int] 0.5 Starting size to increase from. Image shape must be square
start Numeric 0.5 Earliest upsizing epoch, as a percent of training time or an epoch number (0-indexed)
finish Numeric 0.75 Last upsizing epoch, as a percent of training time or an epoch number (0-indexed)
increase_by int 4 Number of pixels to increase the image size by per resize, or the minimum increase per upsizing epoch
increase_mode IncreaseMode IncreaseMode.Batch Increase image size anytime during training or only before an epoch starts
resize_mode str bilinear PyTorch interpolate mode string for upsizing. Resets to existing fastai DataLoader mode at final_size
resize_valid bool True Apply progressive resizing to valid dataset
final_size tuple[int, int] | None None Final image size. Set if using non-fastai DataLoaders; automatically detected from a fastai DataLoader with batch_tfms
add_resize bool False Add a separate resize step. Use for non-fastai DataLoaders or fastai DataLoader without batch_tfms
resize_targ bool False Applies the separate resize step to targets
preallocate_bs int | None None Preallocation batch size. Set if the valid DataLoader has a larger batch size than the train DataLoader.
preallocate bool True Preallocate GPU memory with full size image. Can mitigate memory allocation slowdowns during training. If False, set final_size.
empty_cache bool False Call torch.cuda.empty_cache() before a resizing epoch. May prevent CUDA & Magma errors. Don’t use with multiple GPUs
verbose bool True Print a summary of the progressive resizing schedule
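
A minimal sketch of attaching the callback to an existing fastai Learner; dls and model are placeholders here, and full worked examples follow in the example section below.

# Placeholders: dls and model are assumed to be defined as in the examples below.
learn = Learner(dls, model, metrics=Accuracy())

# Train at half the full image size until 50% of training, then progressively
# upsize, reaching full size at 75% of training (these are the defaults,
# written out explicitly here).
learn.fit_flat_cos(20, 8e-3,
                   cbs=ProgressiveResize(initial_size=0.5, start=0.5,
                                         finish=0.75, increase_by=4))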

Progressive Resizing initially trains on downsampled images, then gradually increases the image size to the full size for the remainder of training.

This can significantly reduce training time at the possible expense of lower model performance. However, Progressive Resizing allows training on more samples within the same compute budget, usually leading to increased performance.

The model must be capable of handling variable image sizes.

Tip: Increase DataLoader Throughput

ProgressiveResize should increase GPU throughput, which may cause other parts of the training pipeline to become a bottleneck. You can test for a DataLoader bottleneck using a fastxtend profiler.

An easy way to increase fastai’s DataLoader throughput is by replacing Pillow with Pillow-SIMD.

For best performance, use fastxtend’s FFCV Loader.

When testing Composer’s Progressive Resizing callback, MosaicML found:

In our experiments, Progressive Resizing improves the attainable tradeoffs between training speed and the final quality of the trained model. In some cases, it leads to slightly lower quality than the original model for the same number of training steps. However, Progressive Resizing increases training speed so much (via improved throughput during the early part of training) that it is possible to train for more steps, recover accuracy, and still complete training in less time.

ProgressiveResize modifies the fastai batch augmentation pipeline by changing the batch_tfms size during training. Specifically, it modifies AffineCoordTfm size, which is set by any rotate, warp, or resize batch augmentation, and/or RandomResizedCropGPU size. This modification prevents unnecessarily resizing images a second time on the GPU, speeding up the process. If there are no batch_tfms or if training without a fastai DataLoader or fastxtend Loader, set add_resize=True to resize the batch on the GPU using PyTorch’s interpolate.
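
For instance, with a plain PyTorch DataLoader, or a fastai DataLoader without batch_tfms, there is no AffineCoordTfm or RandomResizedCropGPU for the callback to modify, so the resize must be added explicitly. A rough sketch, assuming a full image size of 224 pixels:

# No batch_tfms to modify, so add a separate GPU resize step and pass the full
# image size, since it cannot be detected automatically.
learn.fit_flat_cos(20, 8e-3,
                   cbs=ProgressiveResize(add_resize=True, final_size=(224, 224)))
# For image targets such as segmentation masks, also set resize_targ=True so
# the added resize step is applied to the targets.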

Progressive Resizing works best when the resize steps are spread out over a significant portion of the dataset.

Tip: Progressive Resizing & Small Datasets

If training on small datasets with ProgressiveResize, such as Imagenette, scale the batch mode increase amount to be larger than the default of 4 by setting increase_by to a custom value.

In the example section, increase_by=16 gives good results for training Imagenette for 20-25 epochs.

Important: Preallocating GPU Memory Uses a Validation Batch

Before training starts, ProgressiveResize performs a dry run to preallocate GPU memory required for training on full images. This can prevent stuttering during training due to memory allocation.

If the validation batch size is larger than the training batch size, set preallocate_bs to the training batch size so ProgressiveResize will preallocate the correct amount of memory.

Validation images must be the same size as the full training image size.
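
As a sketch, assuming a training batch size of 64 and a validation batch size of 128 (the DataBlock setup is the same as in the examples below):

# The valid DataLoader uses a larger batch size than the train DataLoader, so
# pass the train batch size to preallocate_bs for the preallocation dry run.
dls = dblock.dataloaders(imagenette, bs=64, val_bs=128)
learn.fit_flat_cos(20, 8e-3, cbs=ProgressiveResize(preallocate_bs=64))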

ProgressiveResize supports logging to Weights & Biases and TensorBoard via the LogDispatch callback. If either the fastai.callback.wandb.WandbCallback or fastai.callback.tensorboard.TensorBoardCallback are added to Learner, ProgressiveResize will automatically log the current image size as progressive_resize_size.
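
For example, with Weights & Biases (a sketch; the project name is a placeholder and wandb.init must be called before fitting):

import wandb
from fastai.callback.wandb import WandbCallback

wandb.init(project='progressive-resize-example')  # placeholder project name
# With WandbCallback attached, ProgressiveResize automatically logs the current
# image size as progressive_resize_size.
learn.fit_flat_cos(20, 8e-3, cbs=[WandbCallback(), ProgressiveResize()])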

If training on older versions of PyTorch with ProgressiveResize results in CUDA or Magma errors, try setting increase_mode=IncreaseMode.Epoch and empty_cache=True.

This will upsize once per epoch and call torch.cuda.empty_cache() before a resizing epoch. empty_cache=True may interfere with training multiple models on multi-GPU systems.
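
A sketch of this workaround:

# Upsize only at epoch boundaries and empty the CUDA cache before each resizing
# epoch. Avoid empty_cache=True when training multiple models on multi-GPU systems.
learn.fit_flat_cos(20, 8e-3,
                   cbs=ProgressiveResize(increase_mode=IncreaseMode.Epoch,
                                         empty_cache=True))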

ProgressiveResize is fully compatible with CutMixUpAugment.
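
For example, a sketch passing both callbacks with their defaults:

# ProgressiveResize and CutMixUpAugment can be passed to the same fit call.
learn.fit_flat_cos(20, 8e-3, cbs=[ProgressiveResize(), CutMixUpAugment()])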

Example

In this example¹, an xresnext50 is trained for 20 & 25 epochs on Imagenette at an image size of 224 pixels. Due to the short training run and small dataset, ProgressiveResize in batch mode is set to increase_by=16.

ProgressiveResize yields significant training time savings compared to training at full size. At a normalized compute budget of roughly 6.5 minutes, Progressive Resizing results in 92.7% accuracy compared to 92.0% accuracy with full-sized training.

Mode Epochs Time (Mins) Accuracy
Full Size 20 6.5 92.0%
Progressive Batch 20 5.2 92.3%
Progressive Epoch 20 5.2 91.8%
Progressive Batch 25 6.5 92.7%

Due to the regularization effect of training on different-sized images, Progressive Resizing with increase_by=16 outperforms full-sized training by 0.3% in 20 percent less time on the same number of epochs².

Progressive Resizing

There are two Progressive Resizing IncreaseMode types:

  • increase_mode=IncreaseMode.Batch
  • increase_mode=IncreaseMode.Epoch

This example shows both.

Batch Resizing

ProgressiveResize with the default increase_mode=IncreaseMode.Batch.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(20, 8e-3, cbs=ProgressiveResize(increase_by=16))
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
Progressively increase the initial image size of [112, 112] by 16 pixels every 0.8333 epochs for 7 resizes. 
Starting at epoch 10 and finishing at epoch 15 for a final training size of [224, 224].
Total training time: 311.8 s
epoch train_loss valid_loss accuracy time
0 1.670977 1.883999 0.454268 00:13
1 1.403678 1.226364 0.710573 00:13
2 1.251599 1.446574 0.626497 00:13
3 1.136825 1.079901 0.768662 00:13
4 1.062239 1.250891 0.718981 00:13
5 1.006945 0.955187 0.820127 00:13
6 0.957047 1.238453 0.703439 00:13
7 0.910177 0.900485 0.842548 00:13
8 0.889880 0.963289 0.816560 00:13
9 0.860453 0.881689 0.849936 00:13
10 0.839035 0.952776 0.828535 00:14
11 0.815573 0.857969 0.863439 00:14
12 0.791430 0.863359 0.858089 00:15
13 0.770404 0.786469 0.887898 00:16
14 0.772790 0.848503 0.866242 00:17
15 0.745845 0.774377 0.890955 00:19
16 0.704179 0.769880 0.891974 00:19
17 0.640226 0.731178 0.914395 00:19
18 0.604006 0.706209 0.920764 00:19
19 0.584977 0.698062 0.922548 00:19
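
The printed schedule follows from the callback arguments and the training length. A rough reconstruction of the batch-mode arithmetic, inferred from the summary above rather than taken from the callback’s source:

# Inferred batch-mode schedule arithmetic (an assumption, not the callback's source).
epochs, full_size, increase_by = 20, 224, 16
initial = int(full_size * 0.5)                     # initial_size=0.5 -> 112
n_resizes = (full_size - initial) // increase_by   # 7 resize steps
start, finish = 0.5 * epochs, 0.75 * epochs        # epochs 10 and 15
step = (finish - start) / (n_resizes - 1)          # ~0.8333 epochs per resize
print(n_resizes, start, finish, round(step, 4))    # 7 10.0 15.0 0.8333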

Epoch Resizing

ProgressiveResize with increase_mode=IncreaseMode.Epoch.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(20, 8e-3,
                       cbs=ProgressiveResize(increase_mode=IncreaseMode.Epoch))
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
Progressively increase the initial image size of [112, 112] by 28 pixels every 1 epoch for 4 resizes.
Starting at epoch 12 and finishing at epoch 15 for a final training size of [224, 224].
Total training time: 309.3 s
epoch train_loss valid_loss accuracy time
0 1.670977 1.883999 0.454268 00:13
1 1.403678 1.226364 0.710573 00:13
2 1.251599 1.446574 0.626497 00:13
3 1.136825 1.079901 0.768662 00:13
4 1.062239 1.250891 0.718981 00:13
5 1.006945 0.955187 0.820127 00:13
6 0.957047 1.238453 0.703439 00:13
7 0.910177 0.900485 0.842548 00:13
8 0.889880 0.963289 0.816560 00:13
9 0.860453 0.881689 0.849936 00:14
10 0.839285 0.916867 0.835159 00:13
11 0.817720 0.837916 0.866242 00:13
12 0.792356 0.844864 0.869045 00:14
13 0.780980 0.811714 0.878471 00:15
14 0.780541 0.870851 0.853758 00:17
15 0.766215 0.788430 0.888153 00:19
16 0.709244 0.788267 0.887134 00:19
17 0.649643 0.732368 0.915159 00:19
18 0.611495 0.717171 0.915414 00:19
19 0.590308 0.708605 0.918471 00:19

Normal Training

fastai model training without Progressive Resizing.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(20, 8e-3)
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
epoch train_loss valid_loss accuracy time
0 1.693837 1.660484 0.539873 00:19
1 1.425402 1.288508 0.682548 00:19
2 1.249855 1.231204 0.726879 00:19
3 1.107746 1.027718 0.794904 00:19
4 1.046856 1.113385 0.782420 00:19
5 0.974740 1.055205 0.800255 00:19
6 0.933220 1.195850 0.756688 00:19
7 0.880307 0.905752 0.845096 00:19
8 0.854195 1.113956 0.772229 00:19
9 0.839854 0.838828 0.868790 00:19
10 0.815095 0.868798 0.861146 00:19
11 0.786958 0.839955 0.867771 00:19
12 0.763221 0.884713 0.853758 00:19
13 0.751375 0.780010 0.890955 00:19
14 0.740009 0.835440 0.872102 00:19
15 0.710022 0.793270 0.888408 00:19
16 0.683471 0.743397 0.909809 00:19
17 0.630301 0.735851 0.910828 00:19
18 0.596394 0.715150 0.916688 00:19
19 0.577142 0.701738 0.918981 00:19
Total training time: 387.2 s

Progressive Resizing with Normalized Compute Budget

ProgressiveResize with the default increase_mode=IncreaseMode.Batch, trained for 25 epochs to match Normal Training’s compute budget.

imagenette = untar_data(URLs.IMAGENETTE_320)

with less_random():
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       splitter=GrandparentSplitter(valid_name='val'),
                       get_items=get_image_files, get_y=parent_label,
                       item_tfms=Resize(224),
                       batch_tfms=[*aug_transforms(),
                                   Normalize.from_stats(*imagenet_stats)])

    dls = dblock.dataloaders(imagenette, bs=64,
                             num_workers=num_cpus(), pin_memory=True)

    learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=ranger(foreach=True),
                    loss_func=nn.CrossEntropyLoss(label_smoothing=0.1),
                    metrics=Accuracy()).to_channelslast()

    start = time.perf_counter()
    learn.fit_flat_cos(25, 8e-3, cbs=ProgressiveResize(increase_by=16))
    total = time.perf_counter() - start
    print(f'Total training time: {scale_time(total)}')
Progressively increase the initial image size of [112, 112] by 16 pixels every 1.042 epochs for 7 resizes. 
Starting at epoch 12.5 and finishing at epoch 18.75 for a final training size of [224, 224].
Total training time: 390.5 s
epoch train_loss valid_loss accuracy time
0 1.670977 1.883999 0.454268 00:13
1 1.403678 1.226364 0.710573 00:13
2 1.251599 1.446574 0.626497 00:13
3 1.136825 1.079901 0.768662 00:13
4 1.062239 1.250891 0.718981 00:13
5 1.006945 0.955187 0.820127 00:13
6 0.957047 1.238453 0.703439 00:13
7 0.910177 0.900485 0.842548 00:13
8 0.889880 0.963289 0.816560 00:13
9 0.860453 0.881689 0.849936 00:13
10 0.839285 0.916867 0.835159 00:13
11 0.817720 0.837916 0.866242 00:13
12 0.806093 0.869887 0.850701 00:14
13 0.780977 0.805412 0.877962 00:14
14 0.766974 0.899283 0.839490 00:14
15 0.757296 0.811422 0.878726 00:15
16 0.736302 0.855174 0.853758 00:15
17 0.723357 0.769306 0.901401 00:16
18 0.714021 0.765733 0.895287 00:18
19 0.697444 0.736115 0.911847 00:19
20 0.663537 0.790711 0.881783 00:19
21 0.617896 0.712593 0.919745 00:19
22 0.583567 0.710089 0.918471 00:19
23 0.562689 0.685103 0.927643 00:19
24 0.551753 0.686037 0.926879 00:19

Footnotes

  1. All models are trained on a GeForce RTX 3080 Ti using PyTorch 1.13.1 and CUDA 11.7. Results may differ with other datasets, hardware, and across runs.↩︎

  2. While Progressive Resizing can sometimes outperform a full-sized trained model in the same number of epochs, it is just as likely to perform worse, depending on the setup.↩︎