When trained in Mixed Precision, image models in channels last format on Nvidia Tensor Cores can achieve 8%-35% higher performance than in contiguous format.
Channels last memory format is only implemented for 4D NCHW Tensors. Not all PyTorch operators have been converted to support channels last. See (Beta) Channels Last Memory Format in PyTorch for more details.
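For reference, a minimal sketch of what the channels last conversion looks like in plain PyTorch: only the underlying memory layout (the strides) changes, while the logical NCHW shape stays the same.

```python
import torch

x = torch.randn(8, 3, 224, 224)  # 4D NCHW tensor
x_cl = x.contiguous(memory_format=torch.channels_last)

# logical shape and indexing are unchanged; only the strides differ
print(x_cl.shape)  # torch.Size([8, 3, 224, 224])
print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True

# models are converted the same way, so conv kernels match the input layout
model = torch.nn.Conv2d(3, 16, 3).to(memory_format=torch.channels_last)
```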
Channels Last format can error out if torch.backends.cudnn.benchmark = False is set, e.g. via fast.ai's no_random context manager. If this occurs, use the less_random context manager instead. This allows reproducible training on the same GPU, PyTorch, and CUDA setup, at the expense of reproducibility should any of those change.
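A sketch of swapping in less_random, assuming the fastxtend.utils import path and an existing fastai learn object:

```python
from fastxtend.utils import less_random  # import path is an assumption

# less_random seeds the RNGs like no_random, but leaves
# torch.backends.cudnn.benchmark enabled, so channels last training
# keeps the benefit of cuDNN autotuning
with less_random(seed=42):
    learn.fit_one_cycle(1)
```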
Channels last format requires inputs to be 4D NCHW Tensors, so ChannelsLastTfm only encodes TensorImageBase and TensorMask inputs to channels last, using fastcore's type dispatch.
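A minimal sketch of this dispatch behavior, assuming the fastxtend.callback.channelslast import path: dispatched types are converted, while a type without a registered encodes passes through unchanged.

```python
import torch
from fastai.vision.all import TensorImage
from fastxtend.callback.channelslast import ChannelsLastTfm  # assumed path

tfm = ChannelsLastTfm()

img = TensorImage(torch.randn(8, 3, 224, 224))
# dispatched: TensorImage is a TensorImageBase subclass
print(tfm(img).is_contiguous(memory_format=torch.channels_last))  # True

plain = torch.randn(8, 3, 224, 224)
# no encodes registered for a plain Tensor, so it is returned as-is
print(tfm(plain).is_contiguous(memory_format=torch.channels_last))  # False
```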
To convert another input type to channels last format, patch ChannelsLastTfm.encodes to dispatch on that type, as sketched below.
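One way to do this is fastcore's idiom of decorating a function named encodes with the transform class, which adds it to the existing type dispatch table. A sketch, where TensorCustom is a hypothetical input type and the import path is an assumption:

```python
import torch
from fastai.torch_core import TensorBase
from fastxtend.callback.channelslast import ChannelsLastTfm  # assumed path

class TensorCustom(TensorBase): pass  # hypothetical custom input type

@ChannelsLastTfm
def encodes(self, x: TensorCustom):
    # mirror the existing behavior: convert to channels last memory format
    return x.to(memory_format=torch.channels_last)
```

After this, ChannelsLastTfm will convert TensorCustom inputs alongside TensorImageBase and TensorMask.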