# Gradient Accumulation
Gradient accumulation enables training with batch sizes too large to fit in memory by splitting each batch into smaller micro-batches and skipping the optimizer step until gradients have been accumulated from all of the micro-batches.

fastai gradient accumulation works by treating each DataLoader batch as a micro-batch, then accumulating the micro-batches across multiple forward and backward passes into one larger effective batch before updating the weights.
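The loop the callback automates can be sketched in plain PyTorch. This is illustrative only, not fastxtend's implementation; `model`, `opt`, `loss_func`, and `dataloader` are assumed to already exist, and `accum_bs` and `micro_bs` stand in for the accumulation and DataLoader batch sizes.

```python
# Illustrative gradient accumulation loop in plain PyTorch, not fastxtend's
# implementation. Assumes `model`, `opt`, `loss_func`, and `dataloader` exist.
accum_bs, micro_bs = 64, 16         # target accumulation batch size, DataLoader batch size
accum_steps = accum_bs // micro_bs  # micro-batches per optimizer step

for i, (xb, yb) in enumerate(dataloader):
    loss = loss_func(model(xb), yb) / accum_steps  # scale so summed gradients match one full batch
    loss.backward()                                # gradients accumulate in .grad across backward calls
    if (i + 1) % accum_steps == 0:
        opt.step()       # one weight update per accumulated batch
        opt.zero_grad()  # reset gradients for the next accumulation window
```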
## Differences from fastai
By default, `GradientAccumulation` and `GradientAccumulationSchedule` record and log accumulated batches instead of micro-batches. This affects training losses and Weights and Biases training steps. Recorded and logged training losses are the accumulated loss used for the optimization step, not the last micro-batch loss that fastai records. Weights and Biases training steps are reduced by the ratio of micro-batches to accumulated batches, while TensorBoard training steps are unaffected. To revert to the fastai behavior of recording micro-batches, set `log_accum_batch=False`.
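For example, a minimal sketch assuming a standard fastai `Learner` setup with `dls` and `model` already defined:

```python
# Hypothetical usage: revert to fastai-style micro-batch recording while
# accumulating to an effective batch size of 64. `dls` and `model` assumed.
learn = Learner(dls, model, cbs=GradientAccumulation(accum_bs=64, log_accum_batch=False))
```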
## GradientAccumulation
```
GradientAccumulation (accum_bs:int|None, n_acc:int=32, log_accum_batch:bool=True)
```
Accumulate gradients before updating weights
|  | Type | Default | Details |
|---|---|---|---|
| accum_bs | int \| None | None | Accumulation batch size. Defaults to `n_acc` if not set |
| n_acc | int | 32 | Default `accum_bs` value. Used for compatibility with fastai |
| log_accum_batch | bool | True | Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior |
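A minimal usage sketch, assuming `dls` and `model` are already defined and the callback is in scope:

```python
# Accumulate gradients until 64 samples have been processed, then step the
# optimizer. With a DataLoader batch size of 16, weights update once every
# 4 micro-batches.
learn = Learner(dls, model, cbs=GradientAccumulation(accum_bs=64))
learn.fit_one_cycle(2, 3e-3)
```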
## GradientAccumulationSchedule
```
GradientAccumulationSchedule (start_accum_bs:int, final_accum_bs:int, start:Numeric=0, finish:Numeric=0.3, schedule:Callable[...,_Annealer]=<function SchedCos>, log_accum_batch:bool=True, micro_batch_size:int|None=None)
```
Gradient accumulation with a schedulable batch size
|  | Type | Default | Details |
|---|---|---|---|
| start_accum_bs | int |  | Initial gradient accumulation batch size |
| final_accum_bs | int |  | Final gradient accumulation batch size |
| start | Numeric | 0 | Start batch size schedule in percent of training steps (float) or epochs (int, index 0) |
| finish | Numeric | 0.3 | Finish batch size schedule in percent of training steps (float) or epochs (int, index 0) |
| schedule | Callable[..., _Annealer] | SchedCos | Batch size schedule type |
| log_accum_batch | bool | True | Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior |
| micro_batch_size | int \| None | None | Manually set micro-batch size if using a non-fastai or non-fastxtend DataLoader |
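A similar sketch for the scheduled version, growing the accumulation batch size with the default cosine schedule over the first 30% of training (assuming `dls` and `model` as before):

```python
# Cosine-anneal the accumulation batch size from 32 to 128 over the first
# 30% of training steps (the start/finish/schedule defaults).
learn = Learner(
    dls, model,
    cbs=GradientAccumulationSchedule(start_accum_bs=32, final_accum_bs=128),
)
learn.fit_one_cycle(5, 3e-3)
```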