Gradient Accumulation

Accumulate multiple mini-batches into one batch

Gradient accumulation allows training with batch sizes that are too large to fit into memory by splitting each batch into mini-batches and skipping the optimizer step until gradients have been accumulated from all mini-batches.

fastai gradient accumulation works by treating each dataloader batch as a mini-batch, then accumulating mini-batches across multiple forward and backward passes into one larger batch.
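
The mechanics can be sketched in plain PyTorch. This is an illustrative example only, not fastxtend's implementation; the batch sizes, model, and random data are placeholders.

```python
# Minimal PyTorch sketch of the idea only, not fastxtend's implementation.
# Gradients from several micro-batches are summed before a single optimizer step.
import torch

accum_bs, micro_bs = 64, 16              # effective batch size vs. per-iteration batch size
steps_per_update = accum_bs // micro_bs  # 4 micro-batches per optimizer step

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()

for i in range(8):                       # stand-in for iterating a dataloader of micro-batches
    xb, yb = torch.randn(micro_bs, 10), torch.randn(micro_bs, 1)
    loss = loss_fn(model(xb), yb) / steps_per_update  # scale so summed gradients match a full-batch mean
    loss.backward()                      # gradients accumulate in .grad across iterations
    if (i + 1) % steps_per_update == 0:
        opt.step()                       # one weight update per accumulated batch
        opt.zero_grad()
```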

Differences from fastai

By default, GradientAccumulation and GradientAccumulationSchedule record and log accumulated batches instead of micro-batches. This affects training losses and Weights and Biases training steps.

The recorded and logged training loss is the accumulated loss used for the optimizer step, not the last micro-batch loss that fastai records. Weights and Biases training steps will be reduced by the ratio of micro-batches to accumulated batches, while TensorBoard training steps will be unaffected.

To revert to the fastai behavior of recording micro-batches, set log_accum_batch=False.
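
For example, a hedged sketch of switching back to micro-batch logging. The dls and model objects are placeholders, and the import of GradientAccumulation from fastxtend is assumed (e.g. via one of its all modules).

```python
# Illustrative only: `dls` and `model` are placeholders, and fastxtend's
# GradientAccumulation is assumed to already be imported.
learn = Learner(dls, model,
                cbs=GradientAccumulation(accum_bs=64, log_accum_batch=False))
```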


source

GradientAccumulation

 GradientAccumulation (accum_bs:int|None, n_acc:int=32,
                       log_accum_batch:bool=True)

Accumulate gradients before updating weights

|  | Type | Default | Details |
|---|---|---|---|
| accum_bs | int \| None |  | Accumulation batch size. Defaults to n_acc if not set |
| n_acc | int | 32 | Default accum_bs value. Used for compatibility with fastai |
| log_accum_batch | bool | True | Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior |
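
A hedged usage sketch: the dataset path is a placeholder and fastxtend's GradientAccumulation is assumed to be imported alongside fastai (typically via one of fastxtend's all modules).

```python
from fastai.vision.all import *
# Assumption: GradientAccumulation here is fastxtend's version, imported after fastai,
# and `path` is a placeholder dataset with train/valid folders.
dls = ImageDataLoaders.from_folder(path, bs=16)           # dataloader yields micro-batches of 16
learn = vision_learner(dls, resnet18, metrics=accuracy,
                       cbs=GradientAccumulation(accum_bs=64))  # optimizer steps every 64 items
learn.fit_one_cycle(1)
```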

source

GradientAccumulationSchedule

 GradientAccumulationSchedule (start_accum_bs:int, final_accum_bs:int,
                               start:Numeric=0, finish:Numeric=0.3,
                               schedule:Callable[...,_Annealer]=<function
                               SchedCos>, log_accum_batch:bool=True,
                               micro_batch_size:int|None=None)

Gradient accumulation with a schedulable batch size

|  | Type | Default | Details |
|---|---|---|---|
| start_accum_bs | int |  | Initial gradient accumulation batch size |
| final_accum_bs | int |  | Final gradient accumulation batch size |
| start | Numeric | 0 | Start of the batch size schedule in percent of training steps (float) or epochs (int, index 0) |
| finish | Numeric | 0.3 | Finish of the batch size schedule in percent of training steps (float) or epochs (int, index 0) |
| schedule | Callable[..., _Annealer] | SchedCos | Batch size schedule type |
| log_accum_batch | bool | True | Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior |
| micro_batch_size | int \| None | None | Manually set the micro-batch size if using a non-fastai or non-fastxtend dataloader |
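
A hedged sketch of scheduling the accumulation batch size; dls and model are placeholders and the fastxtend import is assumed as above.

```python
# Illustrative only: anneal the effective batch size from 32 to 128 with the default
# cosine schedule over the first 30% of training, then hold at 128 for the rest.
learn = Learner(dls, model,
                cbs=GradientAccumulationSchedule(start_accum_bs=32, final_accum_bs=128,
                                                 start=0, finish=0.3))
learn.fit_one_cycle(5)
```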