When trained in Mixed Precision, image models in channels last format on Nvidia Tensor Cores can achieve 8%-35% higher performance than in contiguous format.
Channels last memory format is only implemented for 4D NCHW Tensors. Not all PyTorch operators have been converted to support channels last. See (Beta) Channels Last Memory Format in PyTorch for more details.
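For reference, a minimal sketch of what the channels last conversion looks like in plain PyTorch: only the underlying memory layout (the strides) changes, while the logical NCHW shape stays the same.

```python
import torch

x = torch.randn(8, 3, 224, 224)  # 4D NCHW tensor
x_cl = x.contiguous(memory_format=torch.channels_last)

# logical shape and indexing are unchanged; only the strides differ
print(x_cl.shape)  # torch.Size([8, 3, 224, 224])
print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True

# models are converted the same way, so conv kernels match the input layout
model = torch.nn.Conv2d(3, 16, 3).to(memory_format=torch.channels_last)
```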
Channels Last format can error out if torch.backends.cudnn.benchmark = False is set, e.g. via fast.ai's no_random context manager. If this occurs, use the less_random context manager instead. This allows reproducible training on the same GPU, PyTorch, and CUDA setup, at the expense of reproducibility should any of those change.
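A sketch of swapping in less_random, assuming the fastxtend.utils import path and an existing fastai learn object:

```python
from fastxtend.utils import less_random  # import path is an assumption

# less_random seeds the RNGs like no_random, but leaves
# torch.backends.cudnn.benchmark enabled, so channels last training
# keeps the benefit of cuDNN autotuning
with less_random(seed=42):
    learn.fit_one_cycle(1)
```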
Channels last format requires inputs to be 4D NCHW Tensors, so ChannelsLastTfm only encodes TensorImageBase and TensorMask inputs to channels last, using fastcore's type dispatch.
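A minimal sketch of this dispatch behavior, assuming the fastxtend.callback.channelslast import path: dispatched types are converted, while a type without a registered encodes passes through unchanged.

```python
import torch
from fastai.vision.all import TensorImage
from fastxtend.callback.channelslast import ChannelsLastTfm  # assumed path

tfm = ChannelsLastTfm()

img = TensorImage(torch.randn(8, 3, 224, 224))
# dispatched: TensorImage is a TensorImageBase subclass
print(tfm(img).is_contiguous(memory_format=torch.channels_last))  # True

plain = torch.randn(8, 3, 224, 224)
# no encodes registered for a plain Tensor, so it is returned as-is
print(tfm(plain).is_contiguous(memory_format=torch.channels_last))  # False
```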
To convert another input type to channels last format, patch ChannelsLastTfm.encodes to dispatch on that type, as sketched below.
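One way to do this is fastcore's idiom of decorating a function named encodes with the transform class, which adds it to the existing type dispatch table. A sketch, where TensorCustom is a hypothetical input type and the import path is an assumption:

```python
import torch
from fastai.torch_core import TensorBase
from fastxtend.callback.channelslast import ChannelsLastTfm  # assumed path

class TensorCustom(TensorBase): pass  # hypothetical custom input type

@ChannelsLastTfm
def encodes(self, x: TensorCustom):
    # mirror the existing behavior: convert to channels last memory format
    return x.to(memory_format=torch.channels_last)
```

After this, ChannelsLastTfm will convert TensorCustom inputs alongside TensorImageBase and TensorMask.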