Throughput and Simple Profilers for fastai. Inspired by PyTorch Lightning’s SimpleProfiler.
Because fastxtend profilers change the fastai data loading loop, they are not included in any of the fastxtend all imports and must be imported separately:
from fastxtend.callback import profiler
Warning
The Throughput and Simple profilers are untested with distributed training.
fastai callbacks do not have an event which is called directly before drawing a batch. fastxtend profilers add a new callback event called before_draw.
With a fastxtend profiler imported, a callback can implement actions on the following events:
after_create: called after the Learner is created.
before_fit: called before starting training or inference, ideal for initial setup.
before_epoch: called at the beginning of each epoch, useful for any behavior you need to reset at each epoch.
before_train: called at the beginning of the training part of an epoch.
before_draw: called at the beginning of each batch, just before drawing said batch.
before_batch: called at the beginning of each batch, just after drawing said batch. It can be used to do any setup necessary for the batch (like hyper-parameter scheduling) or to change the input/target before it goes in the model (change of the input with techniques like mixup for instance).
after_pred: called after computing the output of the model on the batch. It can be used to change that output before it’s fed to the loss.
after_loss: called after the loss has been computed, but before the backward pass. It can be used to add any penalty to the loss (AR or TAR in RNN training for instance).
before_backward: called after the loss has been computed, but only in training mode (i.e. when the backward pass will be used).
before_step: called after the backward pass, but before the update of the parameters. It can be used to do any change to the gradients before said update (gradient clipping for instance).
after_step: called after the step and before the gradients are zeroed.
after_batch: called at the end of a batch, for any clean-up before the next one.
after_train: called at the end of the training phase of an epoch.
before_validate: called at the beginning of the validation phase of an epoch, useful for any setup needed specifically for validation.
after_validate: called at the end of the validation part of an epoch.
after_epoch: called at the end of an epoch, for any clean-up before the next one.
after_fit: called at the end of training, for final clean-up.
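The ordering described above can be illustrated with a toy loop that only records event names. This is a minimal sketch, not fastai's Learner: toy_fit and one_batch are hypothetical helpers, and the real loop also handles callbacks, cancellation, and exceptions. after_create is left out because it fires when the Learner is created, not during fitting.

```python
def one_batch(events, training):
    """Record the events for a single batch, in the order listed above."""
    events.append("before_draw")    # added by fastxtend profilers: just before drawing
    events.append("before_batch")   # just after the batch has been drawn
    events.append("after_pred")
    events.append("after_loss")
    if training:
        # These three events only fire when a backward pass is run.
        events += ["before_backward", "before_step", "after_step"]
    events.append("after_batch")


def toy_fit(n_epochs=1, n_train=2, n_valid=1):
    """Return the sequence of event names for a toy fit (hypothetical helper)."""
    events = ["before_fit"]
    for _ in range(n_epochs):
        events.append("before_epoch")
        events.append("before_train")
        for _ in range(n_train):
            one_batch(events, training=True)
        events.append("after_train")
        events.append("before_validate")
        for _ in range(n_valid):
            one_batch(events, training=False)
        events.append("after_validate")
        events.append("after_epoch")
    events.append("after_fit")
    return events


events = toy_fit()
print(events[:5])
```

Each batch, whether in the training or validation phase, starts with before_draw, so a callback that hooks this event sees the moment immediately before the data loader is asked for the next batch.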
Run a fastxtend profiler which removes itself when finished training.
| Parameter | Type | Default | Details |
|---|---|---|---|
| mode | ProfileMode | ProfileMode.Throughput | Which profiler to use. Throughput or Simple. |
| show_report | bool | True | Display formatted report post profile |
| plain | bool | False | For Jupyter Notebooks, display plain report |
| markdown | bool | False | Display markdown formatted report |
| save_csv | bool | False | Save raw results to csv |
| csv_name | str | profiler.csv | CSV save location |
| rolling_average | int | 10 | Number of batches to average throughput over |
| drop_first_batch | bool | True | Drop the first batch from profiling |
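To make rolling_average and drop_first_batch concrete, here is a minimal sketch of how a rolling throughput figure could be derived from per-step times. The function rolling_throughput and the sample timings are hypothetical, and whether fastxtend computes its average exactly this way is an assumption.

```python
from collections import deque


def rolling_throughput(step_times, batch_size, rolling_average=10, drop_first_batch=True):
    """Samples per second, averaged over the last `rolling_average` steps.

    Illustrative only: not fastxtend's implementation.
    """
    if drop_first_batch:
        # Assumed rationale: the first batch tends to include one-off setup cost.
        step_times = step_times[1:]
    window = deque(maxlen=rolling_average)  # keeps only the most recent step times
    result = []
    for t in step_times:
        window.append(t)
        # Samples processed in the window divided by the time the window took.
        result.append(batch_size * len(window) / sum(window))
    return result


# Hypothetical step times in seconds; the first step is much slower than the rest.
times = [2.0, 0.5, 0.5, 0.25, 0.25]
print(rolling_throughput(times, batch_size=64, rolling_average=2))
```

With these made-up timings, keeping the first batch would add a 32 samples/second reading at the start, which is why dropping it gives a more representative picture of steady-state throughput.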
Output
The Simple Profiler report contains the following items, divided into three phases (Fit, Train, and Valid):
Fit:
fit: total time taken to fit the model.
epoch: duration of each epoch, covering both its training and validation phases. The total epoch time is often the same as the fit time.
train: duration of each training epoch.
valid: duration of each validation epoch.
Train:
step: total duration of all batch steps including drawing the batch. Measured from before_draw to after_batch.
draw: time spent waiting for a batch to be drawn. Measured from before_draw to before_batch. Ideally this value should be as close to zero as possible.
batch: total duration of all batch steps except drawing the batch. Measured from before_batch to after_batch.
forward: duration of the forward pass and any additional batch modifications. Measured from before_batch to after_pred.
loss: duration of calculating loss. Measured from after_pred to after_loss.
backward: duration of the backward pass. Measured from before_backward to before_step.
opt_step: duration of the optimizer step. Measured from before_step to after_step.
zero_grad: duration of the zero_grad step. Measured from after_step to after_batch.
Valid:
step: total duration of all batch steps including drawing the batch. Measured from before_draw to after_batch.
draw: time spent waiting for a batch to be drawn. Measured from before_draw to before_batch. Ideally this value should be as close to zero as possible.
batch: total duration of all batch steps except drawing the batch. Measured from before_batch to after_batch.
predict: duration of the prediction pass and any additional batch modifications. Measured from before_batch to after_pred.
loss: duration of calculating loss. Measured from after_pred to after_loss.
The Throughput profiler report only contains step, draw, and batch.
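The "Measured from ... to ..." definitions for the Train phase can be written down as a mapping from report item to a pair of events. The sketch below applies that mapping to hypothetical timestamps for a single training batch. The names SPANS and durations and the timestamp values are invented for illustration, and this is not how fastxtend stores its measurements.

```python
# Report item -> (start event, end event), taken from the Train list above.
SPANS = {
    "step": ("before_draw", "after_batch"),
    "draw": ("before_draw", "before_batch"),
    "batch": ("before_batch", "after_batch"),
    "forward": ("before_batch", "after_pred"),
    "loss": ("after_pred", "after_loss"),
    "backward": ("before_backward", "before_step"),
    "opt_step": ("before_step", "after_step"),
    "zero_grad": ("after_step", "after_batch"),
}

# Hypothetical event timestamps in milliseconds for one training batch.
stamps = {
    "before_draw": 0,
    "before_batch": 10,
    "after_pred": 50,
    "after_loss": 55,
    "before_backward": 55,
    "before_step": 115,
    "after_step": 135,
    "after_batch": 140,
}


def durations(stamps, spans=SPANS):
    """Return the elapsed time for each report item from event timestamps."""
    return {name: stamps[end] - stamps[start] for name, (start, end) in spans.items()}


d = durations(stamps)
for name, ms in d.items():
    print(f"{name:>9}: {ms} ms")
```

Because step runs from before_draw to after_batch, it equals draw plus batch. A large draw relative to step suggests the training loop is waiting on the data loader.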
Examples
These examples are trained on Imagenette with an image size of 224 and batch size of 64 on a 3080 Ti.