Gradient Accumulation

Accumulate multiple mini-batches into one batch

Gradient accumulation allows training with batch sizes that are too large to fit into memory by splitting each batch into smaller micro-batches and delaying the optimizer step until gradients have been accumulated from all of them.
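
Conceptually, the training loop looks like the following pure PyTorch sketch. It is illustrative only, not fastxtend's implementation; model, dls, loss_func, and opt are assumed to already exist.

accum_steps = 4  # micro-batches accumulated per optimizer step

for i, (xb, yb) in enumerate(dls):
    loss = loss_func(model(xb), yb)
    (loss / accum_steps).backward()   # gradients sum into .grad across micro-batches
    if (i + 1) % accum_steps == 0:    # only step after a full accumulated batch
        opt.step()
        opt.zero_grad()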

Because fastxtend gradient accumulation modifies the default fastai gradient accumulation behavior, it is not included in any of the fastxtend all imports. You must import it after your fastai and fastxtend imports:

from fastxtend.callback.gradaccum import *
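
Once imported, the callback can be passed to a Learner like any other fastai callback. A minimal sketch, assuming an existing dls and model and an illustrative loss function:

import torch.nn as nn
from fastai.basics import *
from fastxtend.callback.gradaccum import *

# dls and model are assumed to already exist (fastai DataLoaders and a PyTorch
# model); accum_bs=64 is the accumulated batch size used per optimizer step
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulation(accum_bs=64))
learn.fit(1)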

Differences from fastai

fastai gradient accumulation works by treating each dataloader batch as a micro-batch, accumulating gradients from multiple micro-batches across their forward and backward passes into one larger macro-batch, and then performing a single optimizer step.

Training Loss Logging

By default, GradientAccumulation and GradientAccumulationSchedule record and log accumulated batches instead of micro-batches. This affects training losses and Weights and Biases training steps.

The recorded and logged training loss is the accumulated loss used for the optimizer step, not the last micro-batch loss which fastai records. Weights and Biases training steps will be reduced by the ratio of micro-batches to accumulated batches, while TensorBoard training steps will be unaffected.

To revert to the fastai behavior of recording micro-batches, set log_accum_batch=False.
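
A short hedged example, with dls, model, and loss function as in the sketch above and illustrative batch sizes:

# With a micro-batch size of 16 and accum_bs=64, each logged step covers 4
# micro-batches, so Weights and Biases would record 1/4 as many training steps.
# log_accum_batch=False restores fastai's per-micro-batch recording.
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulation(accum_bs=64, log_accum_batch=False))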

Drop Last Batch

By default, GradientAccumulation and GradientAccumulationSchedule also drop the entire last macro-batch if there are not enough micro-batches left in the epoch. This behavior matches training at the large batch size with the standard PyTorch DataLoader setting of drop_last=True. In contrast, fastai will accumulate micro-batches across epochs to form the full-sized macro-batch.

To revert to the fastai behavior of accumulating across epochs, set drop_last=False.
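
A hedged sketch with illustrative numbers, again assuming dls, model, and loss function as above:

# 1,000 training samples with a micro-batch size of 16 gives 62 full
# micro-batches per epoch. With accum_bs=64 (4 micro-batches per macro-batch),
# 15 macro-batches are complete and the last 2 micro-batches are dropped when
# drop_last=True. drop_last=False instead carries them into the next epoch,
# matching fastai.
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulation(accum_bs=64, drop_last=False))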


source

GradientAccumulation

 GradientAccumulation (accum_bs:int|None, n_acc:int=32,
                       micro_batch_size:int|None=None,
                       log_accum_batch:bool=True, drop_last:bool=True)

Accumulate gradients before updating weights

Type Default Details
accum_bs int | None Accumulation batch size. Defaults to n_acc if not set
n_acc int 32 Default accum_bs value. Used for compatibility with fastai
micro_batch_size int | None None Manually set the micro-batch size when using a non-fastai or non-fastxtend dataloader
log_accum_batch bool True Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior
drop_last bool True Drop the last incomplete macro-batch. If False, a macro-batch can be accumulated across two epochs (fastai default)
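
When the dataloader's batch size cannot be read automatically, micro_batch_size can be set by hand. A minimal sketch with illustrative values:

# A plain (non-fastai, non-fastxtend) dataloader with a batch size of 16,
# accumulating to an effective batch size of 128 (8 micro-batches per step)
cb = GradientAccumulation(accum_bs=128, micro_batch_size=16)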

source

GradientAccumulationSchedule

 GradientAccumulationSchedule (start_accum_bs:int, final_accum_bs:int,
                               start:Numeric=0, finish:Numeric=0.3,
                               schedule:Callable[...,_Annealer]=<function
                               SchedCos>, micro_batch_size:int|None=None,
                               log_accum_batch:bool=True,
                               drop_last:bool=True)

Gradient accumulation with a schedulable batch size

Type Default Details
start_accum_bs int Initial gradient accumulation batch size
final_accum_bs int Final gradient accumulation batch size
start Numeric 0 Start batch size schedule in percent of training steps (float) or epochs (int, index 0)
finish Numeric 0.3 Finish batch size schedule in percent of training steps (float) or epochs (int, index 0)
schedule Callable[…, _Annealer] SchedCos Batch size schedule type
micro_batch_size int | None None Manually set the micro-batch size when using a non-fastai or non-fastxtend dataloader
log_accum_batch bool True Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior
drop_last bool True Drop the last incomplete macro-batch. If False, a macro-batch can be accumulated across two epochs (fastai default)
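
A minimal sketch of scheduling the accumulated batch size, with dls, model, and loss function assumed to exist and all values illustrative:

# Increase the accumulated batch size from 32 to 256 over the first 30% of
# training steps using the default cosine schedule (SchedCos)
learn = Learner(dls, model, loss_func=nn.CrossEntropyLoss(),
                cbs=GradientAccumulationSchedule(start_accum_bs=32, final_accum_bs=256,
                                                 start=0, finish=0.3))
learn.fit(5)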