Gradient Accumulation

Accumulate multiple mini-batches into one batch

Gradient accumulation allows training with batch sizes that are too large to fit into memory by splitting each batch into smaller micro-batches and delaying the optimizer step until gradients have been accumulated from all of them.
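
Conceptually, the training loop looks like the following pure PyTorch sketch. It is illustrative only, not fastxtend's implementation; model, dls, loss_func, and opt are assumed to already exist.

accum_steps = 4  # micro-batches accumulated per optimizer step

for i, (xb, yb) in enumerate(dls):
    loss = loss_func(model(xb), yb)
    (loss / accum_steps).backward()   # gradients sum into .grad across micro-batches
    if (i + 1) % accum_steps == 0:    # only step after a full accumulated batch
        opt.step()
        opt.zero_grad()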

Because fastxtend gradient accumulation modifies the default fastai gradient accumulation behavior, it is not included in any of the fastxtend all imports. You must import it after your fastai and fastxtend imports:

from fastxtend.callback.gradaccum import *
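
Once imported, the callback can be passed to a Learner like any other fastai callback. A minimal sketch, assuming an existing dls and model and an illustrative loss function:

import torch.nn as nn
from fastai.basics import *
from fastxtend.callback.gradaccum import *

# dls and model are assumed to already exist (fastai DataLoaders and a PyTorch
# model); accum_bs=64 is the accumulated batch size used per optimizer step
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulation(accum_bs=64))
learn.fit(1)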

Differences from fastai

fastai gradient accumulation works by treating each dataloader batch as a micro-batch, accumulating gradients from multiple micro-batches across their forward and backward passes into one larger macro-batch, and then performing a single optimizer step.

Training Loss Logging

By default, GradientAccumulation and GradientAccumulationSchedule record and log accumulated batches instead of micro-batches. This affects training losses and Weights and Biases training steps.

The recorded and logged training loss is the accumulated loss used for the optimizer step, not the last micro-batch loss which fastai records. Weights and Biases training steps will be reduced by the ratio of micro-batches to accumulated batches, while TensorBoard training steps will be unaffected.

To revert to the fastai behavior of recording micro-batches, set log_accum_batch=False.
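
A short hedged example, with dls, model, and loss function as in the sketch above and illustrative batch sizes:

# With a micro-batch size of 16 and accum_bs=64, each logged step covers 4
# micro-batches, so Weights and Biases would record 1/4 as many training steps.
# log_accum_batch=False restores fastai's per-micro-batch recording.
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulation(accum_bs=64, log_accum_batch=False))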

Drop Last Batch

By default, GradientAccumulation and GradientAccumulationSchedule also drop the entire last macro-batch if there are not enough micro-batches left in the epoch. This behavior matches training at the large batch size with the standard PyTorch DataLoader setting of drop_last=True. In contrast, fastai will accumulate micro-batches across epochs to form the full-sized macro-batch.

To revert to the fastai behavior of accumulating across epochs, set drop_last=False.
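
A hedged sketch with illustrative numbers, again assuming dls, model, and loss function as above:

# 1,000 training samples with a micro-batch size of 16 gives 62 full
# micro-batches per epoch. With accum_bs=64 (4 micro-batches per macro-batch),
# 15 macro-batches are complete and the last 2 micro-batches are dropped when
# drop_last=True. drop_last=False instead carries them into the next epoch,
# matching fastai.
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulation(accum_bs=64, drop_last=False))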


source

GradientAccumulation

 GradientAccumulation (accum_bs:int|None, n_acc:int=32,
                       micro_batch_size:int|None=None,
                       log_accum_batch:bool=True, drop_last:bool=True)

Accumulate gradients before updating weights

Type Default Details
accum_bs int | None Accumulation batch size. Defaults to n_acc if not set
n_acc int 32 Default accum_bs value. Used for compatibility with fastai
micro_batch_size int | None None Manually set the micro-batch size when using a non-fastai or non-fastxtend dataloader
log_accum_batch bool True Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior
drop_last bool True Drop the last incomplete macro-batch. If False, a macro-batch can be accumulated across two epochs (fastai default)
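
When the dataloader's batch size cannot be read automatically, micro_batch_size can be set by hand. A minimal sketch with illustrative values:

# A plain (non-fastai, non-fastxtend) dataloader with a batch size of 16,
# accumulating to an effective batch size of 128 (8 micro-batches per step)
cb = GradientAccumulation(accum_bs=128, micro_batch_size=16)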

source

GradientAccumulationSchedule

 GradientAccumulationSchedule (start_accum_bs:int, final_accum_bs:int,
                               start:Numeric=0, finish:Numeric=0.3,
                               schedule:Callable[...,_Annealer]=<function
                               SchedCos>, micro_batch_size:int|None=None,
                               log_accum_batch:bool=True,
                               drop_last:bool=True)

Gradient accumulation with a schedulable batch size

Type Default Details
start_accum_bs int Initial gradient accumulation batch size
final_accum_bs int Final gradient accumulation batch size
start Numeric 0 Start batch size schedule in percent of training steps (float) or epochs (int, index 0)
finish Numeric 0.3 Finish batch size schedule in percent of training steps (float) or epochs (int, index 0)
schedule Callable[…, _Annealer] SchedCos Batch size schedule type
micro_batch_size int | None None Manually set the micro-batch size when using a non-fastai or non-fastxtend dataloader
log_accum_batch bool True Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior
drop_last bool True Drop the last incomplete macro-batch. If False, a macro-batch can be accumulated across two epochs (fastai default)
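
A minimal sketch of scheduling the accumulated batch size, with dls, model, and loss function assumed to exist and all values illustrative:

# Increase the accumulated batch size from 32 to 256 over the first 30% of
# training steps using the default cosine schedule (SchedCos)
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulationSchedule(start_accum_bs=32, final_accum_bs=256,
                                                 start=0, finish=0.3))
learn.fit(5)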