Gradient Accumulation
Gradient accumulation allows training with batch sizes too large to fit into memory by splitting each batch into micro-batches and skipping the optimizer step until gradients have been accumulated from all micro-batches.
Because fastxtend gradient accumulation modifies the default fastai gradient accumulation behavior, it is not included in any of the all imports. You must import fastxtend gradient accumulation after the fastai and fastxtend imports:
from fastxtend.callback.gradaccum import *
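For example, a minimal sketch of the import order and of attaching the callback to a Learner, assuming the vision all imports and an existing dls and model (hypothetical names):

```python
from fastai.vision.all import *
from fastxtend.vision.all import *
# import gradient accumulation last so it overrides the fastai callback
from fastxtend.callback.gradaccum import *

# dls is assumed to have a dataloader batch size of 16, so accum_bs=64
# accumulates four micro-batches into one macro-batch per optimizer step
learn = Learner(dls, model, cbs=GradientAccumulation(accum_bs=64))
learn.fit_one_cycle(1, 3e-3)
```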
Differences from fastai
fastai gradient accumulation works by treating each dataloader batch as a micro-batch, and then accumulating the micro-batches across multiple forward and backward steps into one larger macro-batch before performing an optimizer step.
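In plain PyTorch terms, the accumulation loop looks roughly like the sketch below, where dataloader, model, loss_func, and optimizer are assumed to already exist; the callback handles this bookkeeping for you:

```python
accum_bs, micro_bs = 64, 16            # macro-batch and micro-batch sizes
accum_steps = accum_bs // micro_bs     # micro-batches per optimizer step

for step, (xb, yb) in enumerate(dataloader):       # each batch is a micro-batch
    loss = loss_func(model(xb), yb) / accum_steps  # scale so gradients average
    loss.backward()                                # gradients sum into .grad
    if (step + 1) % accum_steps == 0:              # a full macro-batch is ready
        optimizer.step()                           # one update per macro-batch
        optimizer.zero_grad()
```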
Training Loss Logging
By default, GradientAccumulation and GradientAccumulationSchedule record and log accumulated batches instead of micro-batches. This affects training losses and Weights and Biases training steps.
Recorded and logged training losses are the accumulated loss used for the optimization step, not the last micro-batch loss as fastai records. Weights and Biases training steps will be reduced by the ratio of micro-batches to accumulated batches, while TensorBoard training steps will be unaffected.
To revert to the fastai behavior of recording micro-batches, set log_accum_batch=False.
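For example, to keep the rest of the fastxtend behavior but log per micro-batch as fastai does (the accum_bs value and Learner arguments are illustrative):

```python
learn = Learner(dls, model,
                cbs=GradientAccumulation(accum_bs=64, log_accum_batch=False))
```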
Drop Last Batch
By default, GradientAccumulation and GradientAccumulationSchedule also drop the entire last macro-batch if there are not enough micro-batches left in the epoch. This behavior matches training on the large batch size with the standard PyTorch DataLoader setting of drop_last=True. In contrast, fastai will accumulate micro-batches across epochs to form the full-sized macro-batch.
To revert to the fastai behavior of accumulating the macro-batch across epochs, set drop_last=False.
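For example (the accum_bs value and Learner arguments are illustrative):

```python
learn = Learner(dls, model,
                cbs=GradientAccumulation(accum_bs=64, drop_last=False))
```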
GradientAccumulation
GradientAccumulation (accum_bs:int|None, n_acc:int=32, micro_batch_size:int|None=None, log_accum_batch:bool=True, drop_last:bool=True)
Accumulate gradients before updating weights
| | Type | Default | Details |
|---|---|---|---|
| accum_bs | int \| None | None | Accumulation batch size. Defaults to n_acc if not set |
| n_acc | int | 32 | Default accum_bs value. Used for compatibility with fastai |
| micro_batch_size | int \| None | None | Manually set micro-batch size if using a non-fastai or non-fastxtend dataloader |
| log_accum_batch | bool | True | Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior |
| drop_last | bool | True | Drop the last incomplete macro-batch. If False, a macro-batch can be accumulated across two epochs (fastai default) |
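As a hedged sketch, if the training dataloader is a plain PyTorch DataLoader rather than a fastai or fastxtend one, the callback cannot infer the micro-batch size, so pass it via micro_batch_size; train_ds, valid_dl, dls wrapping, and model below are assumed for illustration:

```python
from torch.utils.data import DataLoader

# batch_size=16 is the micro-batch size the callback cannot read on its own
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True, drop_last=True)
learn = Learner(DataLoaders(train_dl, valid_dl), model,
                cbs=GradientAccumulation(accum_bs=128, micro_batch_size=16))
```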
GradientAccumulationSchedule
GradientAccumulationSchedule (start_accum_bs:int, final_accum_bs:int, start:Numeric=0, finish:Numeric=0.3, schedule:Callable[...,_Annealer]=<function SchedCos>, micro_batch_size:int|None=None, log_accum_batch:bool=True, drop_last:bool=True)
Gradient accumulation with a schedulable batch size
| | Type | Default | Details |
|---|---|---|---|
| start_accum_bs | int | | Initial gradient accumulation batch size |
| final_accum_bs | int | | Final gradient accumulation batch size |
| start | Numeric | 0 | Start batch size schedule in percent of training steps (float) or epochs (int, index 0) |
| finish | Numeric | 0.3 | Finish batch size schedule in percent of training steps (float) or epochs (int, index 0) |
| schedule | Callable[..., _Annealer] | SchedCos | Batch size schedule type |
| micro_batch_size | int \| None | None | Manually set micro-batch size if using a non-fastai or non-fastxtend dataloader |
| log_accum_batch | bool | True | Log each accumulated batch (True) or micro-batch (False). False is the default fastai behavior |
| drop_last | bool | True | Drop the last incomplete macro-batch. If False, a macro-batch can be accumulated across two epochs (fastai default) |
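Putting it together, an illustrative sketch of scheduling the accumulation batch size from 32 up to 256 over the first 30 percent of training, assuming an existing dls and model (all values are examples, not recommendations):

```python
learn = Learner(dls, model, cbs=GradientAccumulationSchedule(
    start_accum_bs=32,   # macro-batch size at the start of training
    final_accum_bs=256,  # macro-batch size reached at `finish`
    start=0,             # begin the schedule immediately
    finish=0.3,          # reach final_accum_bs 30% of the way through training
))
learn.fit_one_cycle(5, 3e-3)
```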