Profiler Callbacks

Throughput and Simple Profilers for fastai. Inspired by PyTorch Lightning’s SimpleProfiler.

Since fastxtend profilers change the fastai data loading loop, they are not included in any of the fastxtend all imports and must be imported separately:

from fastxtend.callback import profiler

Warning

Throughput and Simple Profiler are untested on distributed training.

Events

fastai callbacks do not have an event which is called directly before drawing a batch. fastxtend profilers add a new callback event called before_draw.

With a fastxtend profiler imported, a callback can implement actions on the following events:

after_create: called after the Learner is created
before_fit: called before starting training or inference, ideal for initial setup.
before_epoch: called at the beginning of each epoch, useful for any behavior you need to reset at each epoch.
before_train: called at the beginning of the training part of an epoch.
before_draw: called at the beginning of each batch, just before drawing said batch.
before_batch: called at the beginning of each batch, just after drawing said batch. It can be used to do any setup necessary for the batch (like hyper-parameter scheduling) or to change the input/target before it goes in the model (change of the input with techniques like mixup for instance).
after_pred: called after computing the output of the model on the batch. It can be used to change that output before it’s fed to the loss.
after_loss: called after the loss has been computed, but before the backward pass. It can be used to add any penalty to the loss (AR or TAR in RNN training for instance).
before_backward: called after the loss has been computed, but only in training mode (i.e. when the backward pass will be used)
before_step: called after the backward pass, but before the update of the parameters. It can be used to do any change to the gradients before said update (gradient clipping for instance).
after_step: called after the step and before the gradients are zeroed.
after_batch: called at the end of a batch, for any clean-up before the next one.
after_train: called at the end of the training phase of an epoch.
before_validate: called at the beginning of the validation phase of an epoch, useful for any setup needed specifically for validation.
after_validate: called at the end of the validation part of an epoch.
after_epoch: called at the end of an epoch, for any clean-up before the next one.
after_fit: called at the end of training, for final clean-up.
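
To make the position of before_draw concrete, here is a minimal, framework-free sketch of a batch loop that fires the batch-level events in the order listed above. EventRecorder and run_toy_epoch are hypothetical names used only for illustration; they are not part of fastai or fastxtend, and the real training loop also handles cancellation events and the actual model work.

```python
# Toy illustration only: no fastai or fastxtend code is used here.

class EventRecorder:
    "Hypothetical stand-in for a callback: records each event name it receives."
    def __init__(self):
        self.events = []

    def __call__(self, event_name):
        self.events.append(event_name)


def run_toy_epoch(batches, callback, training=True):
    "Fire the batch-level events for every batch in `batches`."
    batch_iter = iter(batches)
    for _ in range(len(batches)):
        callback("before_draw")       # fires before the batch is requested
        batch = next(batch_iter)      # drawing the batch (the data loading wait)
        callback("before_batch")      # fires once the batch is available
        callback("after_pred")        # the forward pass would run before this
        callback("after_loss")        # the loss would be computed before this
        if training:
            callback("before_backward")
            callback("before_step")   # the backward pass would run before this
            callback("after_step")    # the optimizer step would run before this
        callback("after_batch")


train_recorder = EventRecorder()
run_toy_epoch([("x0", "y0")], train_recorder, training=True)
train_events = train_recorder.events

valid_recorder = EventRecorder()
run_toy_epoch([("x0", "y0")], valid_recorder, training=False)
valid_events = valid_recorder.events

print(train_events)
print(valid_events)
```

In this sketch the only thing separating before_draw from before_batch is the call that fetches the batch, which is why the time between the two events can be read as data loading wait.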

Throughput

The Throughput profiler measures only step, draw, and batch. To use it, add both ThroughputCallback and ThroughputPostCallback to the Learner. The recommended way to do this is via Learner.profile.

ThroughputCallback

 ThroughputCallback (show_report:bool=True, plain:bool=False,
                     markdown:bool=False, save_csv:bool=False,
                     csv_name:str='throughput.csv',
                     rolling_average:int=10, drop_first_batch:bool=True)

Adds a throughput profiler to the fastai Learner, optionally showing a formatted report or saving the unformatted results as a CSV file.

Pair with ThroughputPostCallback to profile training performance.

After fitting, access the report and results via Learner.profile_report and Learner.profile_results.

	Type	Default	Details
show_report	bool	True	Display formatted report post profile
plain	bool	False	For Jupyter Notebooks, display plain report
markdown	bool	False	Display markdown formatted report
save_csv	bool	False	Save raw results to csv
csv_name	str	throughput.csv	CSV save location
rolling_average	int	10	Number of batches to average throughput over
drop_first_batch	bool	True	Drop the first batch from profiling
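
This page does not spell out how rolling_average and drop_first_batch are applied internally. The sketch below shows one plausible reading, assuming throughput is the number of samples divided by elapsed time over a sliding window of recent batches, and that the first batch is skipped because it often includes one-time setup cost. rolling_throughput is a hypothetical helper, not a fastxtend function, and the durations are invented.

```python
from collections import deque


def rolling_throughput(batch_sizes, durations, rolling_average=10, drop_first_batch=True):
    "Hypothetical helper: samples/second over a sliding window of batches."
    pairs = list(zip(batch_sizes, durations))
    if drop_first_batch:
        pairs = pairs[1:]                    # ignore the (often slow) first batch
    window = deque(maxlen=rolling_average)   # keeps only the most recent batches
    rates = []
    for n_samples, seconds in pairs:
        window.append((n_samples, seconds))
        total_samples = sum(s for s, _ in window)
        total_seconds = sum(d for _, d in window)
        rates.append(total_samples / total_seconds)
    return rates


# Invented timings: the first batch is an outlier that gets dropped.
sizes = [64, 64, 64, 64]
durations = [2.0, 0.1, 0.1, 0.2]
rates = rolling_throughput(sizes, durations, rolling_average=2)
print([round(r, 1) for r in rates])
```

A larger window smooths out per-batch noise at the cost of reacting more slowly to real changes in throughput.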

ThroughputPostCallback

 ThroughputPostCallback ()

Required pair with ThroughputCallback to profile training performance. Removes itself after training is over.

Simple Profiler

To use it, add both SimpleProfilerCallback and SimpleProfilerPostCallback to the Learner. The recommended way to do this is via Learner.profile.

SimpleProfilerCallback

 SimpleProfilerCallback (show_report:bool=True, plain:bool=False,
                         markdown:bool=False, save_csv:bool=False,
                         csv_name:str='simpleprofiler.csv',
                         rolling_average:int=10,
                         drop_first_batch:bool=True)

Adds a simple profiler to the fastai Learner, optionally showing a formatted report or saving the unformatted results as a CSV file.

Pair with SimpleProfilerPostCallback to profile training performance.

After fitting, access the report and results via Learner.profile_report and Learner.profile_results.

	Type	Default	Details
show_report	bool	True	Display formatted report post profile
plain	bool	False	For Jupyter Notebooks, display plain report
markdown	bool	False	Display markdown formatted report
save_csv	bool	False	Save raw results to csv
csv_name	str	simpleprofiler.csv	CSV save location
rolling_average	int	10	Number of batches to average throughput over
drop_first_batch	bool	True	Drop the first batch from profiling

SimpleProfilerPostCallback

 SimpleProfilerPostCallback ()

Required pair with SimpleProfilerCallback to profile training performance. Removes itself after training is over.

Convenience Method

Learner.profile is the easy and recommended way to use a fastxtend profiler.

ProfileMode

 ProfileMode (value, names=None, module=None, qualname=None, type=None,
              start=1)

Profile enum for Learner.profile
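
The signature above is the generic Python Enum constructor, so it says little about the members themselves. As a rough sketch of how such an enum behaves, the stand-in below mirrors the two modes named on this page. ProfileModeSketch is a hypothetical class, not the fastxtend one; the 'throughput' value comes from the Learner.profile signature, while the 'simple' value is an assumption.

```python
from enum import Enum


class ProfileModeSketch(Enum):
    "Hypothetical stand-in for ProfileMode, for illustration only."
    Throughput = "throughput"   # value shown in the Learner.profile signature
    Simple = "simple"           # assumed value, not confirmed by this page


# Members can be referenced by name or looked up by value.
by_name = ProfileModeSketch.Simple
by_value = ProfileModeSketch("throughput")
print(by_name, by_value)
```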

Learner.profile

 Learner.profile (mode:ProfileMode=<ProfileMode.Throughput:
                  'throughput'>, show_report:bool=True, plain:bool=False,
                  markdown:bool=False, save_csv:bool=False,
                  csv_name:str='profiler.csv', rolling_average:int=10,
                  drop_first_batch:bool=True)

Run a fastxtend profiler, which removes itself when training finishes.

	Type	Default	Details
mode	ProfileMode	ProfileMode.Throughput	Which profiler to use. Throughput or Simple.
show_report	bool	True	Display formatted report post profile
plain	bool	False	For Jupyter Notebooks, display plain report
markdown	bool	False	Display markdown formatted report
save_csv	bool	False	Save raw results to csv
csv_name	str	profiler.csv	CSV save location
rolling_average	int	10	Number of batches to average throughput over
drop_first_batch	bool	True	Drop the first batch from profiling

Output

The Simple Profiler report contains the following items, divided into three phases (Fit, Train, and Valid):

Fit:

fit: total time taken to fit the model.
epoch: duration of each epoch, training and validation combined. The epoch total time is often the same as the fit time.
train: duration of each training epoch.
valid: duration of each validation epoch.

Train:

step: total duration of all batch steps including drawing the batch. Measured from before_draw to after_batch.
draw: time spent waiting for a batch to be drawn. Measured from before_draw to before_batch. Ideally this value should be as close to zero as possible.
batch: total duration of all batch steps except drawing the batch. Measured from before_batch to after_batch.
forward: duration of the forward pass and any additional batch modifications. Measured from before_batch to after_pred.
loss: duration of calculating loss. Measured from after_pred to after_loss.
backward: duration of the backward pass. Measured from before_backward to before_step.
opt_step: duration of the optimizer step. Measured from before_step to after_step.
zero_grad: duration of the zero_grad step. Measured from after_step to after_batch.

Valid:

step: total duration of all batch steps including drawing the batch. Measured from before_draw to after_batch.
draw: time spent waiting for a batch to be drawn. Measured from before_draw to before_batch. Ideally this value should be as close to zero as possible.
batch: total duration of all batch steps except drawing the batch. Measured from before_batch to after_batch.
predict: duration of the prediction pass and any additional batch modifications. Measured from before_batch to after_pred.
loss: duration of calculating loss. Measured from after_pred to after_loss.

In the Train and Valid phases, the Throughput profiler report contains only step, draw, and batch.
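
The "Measured from ... to ..." definitions above amount to subtracting event timestamps. The sketch below applies them to one training batch. durations_from_events is a hypothetical helper, not a fastxtend function, and the timestamps are invented integer milliseconds chosen only to keep the arithmetic exact.

```python
def durations_from_events(t):
    "Hypothetical helper: `t` maps event name -> timestamp for one training batch."
    return {
        "step": t["after_batch"] - t["before_draw"],
        "draw": t["before_batch"] - t["before_draw"],
        "batch": t["after_batch"] - t["before_batch"],
        "forward": t["after_pred"] - t["before_batch"],
        "loss": t["after_loss"] - t["after_pred"],
        "backward": t["before_step"] - t["before_backward"],
        "opt_step": t["after_step"] - t["before_step"],
        "zero_grad": t["after_batch"] - t["after_step"],
    }


# Invented timestamps in milliseconds for a single batch.
timestamps_ms = {
    "before_draw": 0,
    "before_batch": 4,
    "after_pred": 21,
    "after_loss": 22,
    "before_backward": 22,
    "before_step": 41,
    "after_step": 86,
    "after_batch": 87,
}

durations_ms = durations_from_events(timestamps_ms)
print(durations_ms)
```

By construction step equals draw plus batch. With these invented timestamps, where before_backward coincides with after_loss, the five training actions also sum to batch.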

Examples

These examples are trained on Imagenette with an image size of 224 and batch size of 64 on a 3080 Ti.

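# Assumes `dls` is an Imagenette DataLoaders and that fastai, fastxtend, and the profiler are already imported
learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=adam(foreach=True),
                metrics=Accuracy()).to_channelslast().profile()
learn.fit_one_cycle(2, 3e-3)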

epoch	train_loss	valid_loss	accuracy	time
0	1.501953	1.734705	0.472357	00:18
1	1.040516	0.913281	0.712866	00:16

Profiling Results
Phase	Action	Mean Duration	Duration Std Dev	Number of Calls	Samples/Second	Total Time	Percent of Total
fit		-	-	1	-	35.63 s	100%
	epoch	17.81 s	838.2ms	2	-	35.63 s	100%
	train	14.24 s	797.1ms	2	678	28.49 s	80%
	valid	3.565 s	39.48ms	2	1,311	7.130 s	20%
train	step	86.62ms	41.67ms	293	739	25.38 s	71%
	draw	4.269ms	37.39ms	293	-38	1.251 s	4%
	batch	82.35ms	4.472ms	293	777	24.13 s	68%
valid	step	43.05ms	63.38ms	123	1,470	5.295 s	15%
	draw	14.46ms	60.89ms	123	-744	1.779 s	5%
	batch	28.59ms	11.42ms	123	2,214	3.516 s	10%

Batch dropped. train and valid phases show 1 less batch than fit.
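
As a consistency check on the report above, the Samples/Second values for the train step and batch rows can be roughly reproduced as total samples divided by total time. This is an inference from the table, not a documented formula, and it assumes every profiled training batch holds the full 64 samples. Under the same reading, the negative draw value looks like the step rate minus the batch rate, that is, the throughput lost to waiting on data.

```python
# Values copied from the train rows of the Throughput report above.
batch_size = 64          # from the example description
calls = 293              # "Number of Calls"
step_total_s = 25.38     # "Total Time" for train step
batch_total_s = 24.13    # "Total Time" for train batch

# Assumption: every profiled training batch is a full batch.
samples = calls * batch_size

step_rate = round(samples / step_total_s)
batch_rate = round(samples / batch_total_s)
draw_rate = step_rate - batch_rate   # inferred relationship, not documented

print(step_rate, batch_rate, draw_rate)
```

These match the 739, 777, and -38 shown for the train step, batch, and draw rows.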

# Same Learner setup, this time selecting the Simple profiler
learn = Learner(dls, xresnext50(n_out=dls.c), opt_func=adam(foreach=True),
                metrics=Accuracy()).to_channelslast().profile(ProfileMode.Simple)
learn.fit_one_cycle(2, 3e-3)

epoch	train_loss	valid_loss	accuracy	time
0	1.497550	2.453694	0.428535	00:17
1	0.997146	0.888791	0.723057	00:17

Profiling Results
Phase	Action	Mean Duration	Duration Std Dev	Number of Calls	Samples/Second	Total Time	Percent of Total
fit		-	-	1	-	34.55 s	100%
	epoch	17.27 s	44.73ms	2	-	34.54 s	100%
	train	13.64 s	4.756ms	2	709	27.28 s	79%
	valid	3.629 s	48.68ms	2	1,291	7.259 s	21%
train	step	87.64ms	44.58ms	293	730	25.68 s	74%
	draw	4.428ms	39.70ms	293	-39	1.297 s	4%
	batch	83.22ms	6.353ms	293	769	24.38 s	71%
	forward	16.65ms	5.732ms	293	3,843	4.880 s	14%
	loss	771.3µs	196.1µs	293	82,977	226.0ms	1%
	backward	19.10ms	5.501ms	293	3,351	5.597 s	16%
	opt_step	45.46ms	5.934ms	293	1,408	13.32 s	39%
	zero_grad	1.106ms	298.9µs	293	-	324.1ms	1%
valid	step	43.94ms	67.12ms	123	1,441	5.404 s	16%
	draw	15.77ms	63.35ms	123	-807	1.940 s	6%
	batch	28.16ms	11.90ms	123	2,248	3.464 s	10%
	predict	26.60ms	11.17ms	123	2,379	3.272 s	9%
	loss	1.353ms	1.795ms	123	46,800	166.4ms	0%

Batch dropped. train and valid phases show 1 less batch than fit.

New Training Loop

The show_training_loop output below shows where the new before_draw event fits into the training loop.

learn = synth_learner()
learn.show_training_loop()

Start Fit
   - before_fit     : [TrainEvalCallback, Recorder, ProgressCallback]
  Start Epoch Loop
     - before_epoch   : [Recorder, ProgressCallback]
    Start Train
       - before_train   : [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - before_draw    : []
         - before_batch   : [CastToTensor]
         - after_pred     : []
         - after_loss     : []
         - before_backward: []
         - before_step    : []
         - after_step     : []
         - after_cancel_batch: []
         - after_batch    : [TrainEvalCallback, Recorder, ProgressCallback]
      End Batch Loop
    End Train
     - after_cancel_train: [Recorder]
     - after_train    : [Recorder, ProgressCallback]
    Start Valid
       - before_validate: [TrainEvalCallback, Recorder, ProgressCallback]
      Start Batch Loop
         - **CBs same as train batch**: []
      End Batch Loop
    End Valid
     - after_cancel_validate: [Recorder]
     - after_validate : [Recorder, ProgressCallback]
  End Epoch Loop
   - after_cancel_epoch: []
   - after_epoch    : [Recorder]
End Fit
 - after_cancel_fit: []
 - after_fit      : [ProgressCallback]

Logging

Profiler callbacks support logging to Weights & Biases and TensorBoard via the LogDispatch callback. If either fastai.callback.wandb.WandbCallback or fastai.callback.tensorboard.TensorBoardCallback is added to the Learner, the profiler automatically logs samples/second for draw, batch, forward, loss, backward, and opt_step.

If Weights & Biases is installed, Simple Profiler also logs two tables to the active wandb run:

profile_report: formatted report from Simple Profiler
profile_results: raw results from Simple Profiler