Compiler [beta]

An experimental callback and patches to integrate torch.compile into fastai

CompilerCallback is an experimental callback that provides an easy-to-use torch.compile integration for fastai.

torch.compile with the default inductor backend can provide speedups of 30% to 2x and memory compression of roughly 10% for both training and inference.

For more information on torch.compile please read PyTorch’s getting started guide. For troubleshooting torch.compile refer to this PyTorch Nightly guide.

This module is not imported by any of the fastxtend all imports. You must import it separately after importing fastai and fastxtend:

from fastxtend.callback import compiler

source

CompileMode

 CompileMode (value, names=None, module=None, qualname=None, type=None,
              start=1)

All valid torch.compile modes for tab-completion and typo-proofing

Currently, the ‘reduce-overhead’ mode does not appear to train: the loss stagnates. The ‘max-autotune’ mode should not be used, per Compile troubleshooting and gotchas.


source

MatMulPrecision

 MatMulPrecision (value, names=None, module=None, qualname=None,
                  type=None, start=1)

All valid matmul_precision modes for tab-completion and typo-proofing


source

CompilerCallback

 CompilerCallback (fullgraph:bool=False, dynamic:bool=False,
                   backend:str|Callable='inductor',
                   mode:str|CompileMode|None=None,
                   options:Dict[str,Union[str,int,bool]]|None=None,
                   matmul_precision:str|MatMulPrecision='high',
                   recompile:bool=False, verbose:bool=True)

An experimental callback for torch.compile (beta) and fastai

Type Default Details
fullgraph bool False Prevent breaking model into subgraphs
dynamic bool False Use dynamic shape tracing
backend str | Callable inductor torch.compile backend to use
mode str | CompileMode | None None torch.compile mode to use
options Dict[str, Union[str, int, bool]] | None None Extra options to pass to compile backend
matmul_precision str | MatMulPrecision high Set Ampere and newer TF32 matmul precision
recompile bool False Force a compiled model to recompile. Use when freezing/unfreezing a compiled model.
verbose bool True Verbose output

Using torch.compile with dynamic shapes and mode='max-autotune' is under active development and might fail. See Compile troubleshooting and gotchas for more details.

By default, CompilerCallback sets matmul ops to use TensorFloat32 on supported GPUs (Ampere and newer), which is the recommended setting for torch.compile. Set matmul_precision='highest' to turn this off, or matmul_precision='medium' to enable bfloat16 mode.
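
For example, a minimal sketch of using the callback directly, assuming an existing DataLoaders dls and a model (both hypothetical placeholders):

from fastai.vision.all import *
from fastxtend.callback import compiler

# dls and model are assumed to already exist
learn = Learner(dls, model, metrics=accuracy,
                cbs=compiler.CompilerCallback(matmul_precision='medium'))
learn.fit_one_cycle(1)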

Convenience Method

fastxtend adds a convenience method to Learner to easily enable torch.compile.


source

Learner.compile

 Learner.compile (fullgraph:bool=False,
                  backend:Union[str,Callable]='inductor',
                  mode:Union[str,CompileMode,None]=None,
                  options:Optional[Dict[str,Union[str,int,bool]]]=None,
                  matmul_precision:Union[str,MatMulPrecision]='high',
                  recompile:bool=False, verbose:bool=True)

Set Learner to compile model using torch.compile.

Type Default Details
fullgraph bool False Prevent breaking model into subgraphs
backend str | Callable inductor torch.compile backend to use
mode str | CompileMode | None None torch.compile mode to use
options Dict[str, Union[str, int, bool]] | None None Extra options to pass to compile backend
matmul_precision str | MatMulPrecision high Set Ampere and newer TF32 matmul precision
recompile bool False Force a compiled model to recompile. Use when freezing/unfreezing a compiled model.
verbose bool True Verbose output

Learner.compile does not expose dynamic, since dynamic shape tracing is not recommended with PyTorch 2.0. If needed, set dynamic directly via CompilerCallback.
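
For example, a minimal sketch assuming an existing DataLoaders dls:

from fastai.vision.all import *
from fastxtend.callback import compiler  # patches Learner.compile

learn = vision_learner(dls, resnet50, metrics=accuracy)
learn.compile()
learn.fit_one_cycle(5, 3e-3)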

Compatibility Patches

These patches integrate torch.compile with fastai saving, loading, freezing, unfreezing, and fine tuning.

Saving and Exporting


source

Learner.save

 Learner.save (file:Union[str,os.PathLike,BinaryIO,IO[bytes]],
               save_compiled:bool=False, with_opt=True, pickle_protocol=2)

Save model and optimizer state (if with_opt) to self.path/self.model_dir/file

Type Default Details
file FILE_LIKE Save file name, path, bytes, or IO
save_compiled bool False Save compiled model
with_opt bool True
pickle_protocol int 2

Saving a compiled model is supported, but for maximum compatibility it is turned off by default. Set save_compiled=True to save a compiled model.
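
For example, a short sketch assuming a trained, compiled learn:

# by default the compiled wrapper is stripped for maximum compatibility
learn.save('model')

# keep the compiled model in the checkpoint instead
learn.save('model-compiled', save_compiled=True)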


source

Learner.export

 Learner.export (fname:Union[str,os.PathLike,BinaryIO,IO[bytes]]='export.pkl',
                 pickle_module:Any=pickle, pickle_protocol:int=2)

Export the content of self without the items and the optimizer state for inference

Type Default Details
fname FILE_LIKE export.pkl Learner export file name, path, bytes, or IO
pickle_module Any pickle Module used for pickling metadata and objects
pickle_protocol int 2 Pickle protocol used

As of PyTorch 2.0 and 2.1 Nightly, compiled models cannot be pickled, so export sets Learner.model to the original, non-compiled model before exporting.


source

load_learner

 load_learner (fname:Union[str,os.PathLike,BinaryIO,IO[bytes]], cpu:bool=True,
               pickle_module=pickle)

Load a Learner object in fname, by default putting it on the cpu

Type Default Details
fname FILE_LIKE File name, path, bytes, or IO
cpu bool True Load model to CPU
pickle_module module pickle Module used for unpickling metadata and objects

By default, load_learner will remove the CompilerCallback.
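
A minimal inference sketch, assuming a trained learn and a test item named item (a hypothetical placeholder):

# export strips the compiled model so the Learner can be pickled
learn.export('export.pkl')

# load_learner removes the CompilerCallback by default
learn_inf = load_learner('export.pkl', cpu=True)
preds = learn_inf.predict(item)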

Freezing and Unfreezing


source

Learner.freeze_to

 Learner.freeze_to (n:int)

Freeze parameter groups up to n

Freezing and unfreezing a compiled model works, but the model needs to be recompiled afterwards. freeze_to will set CompilerCallback to recompile the model, or warn users that they need to manually recompile.
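
For example, a sketch of a transfer-learning loop, assuming a compiled Learner:

learn.freeze_to(-2)   # train only the last two parameter groups
# CompilerCallback will recompile the model, or warn if a manual recompile is needed
learn.fit_one_cycle(1)

learn.unfreeze()      # unfreezing also triggers a recompile
learn.fit_one_cycle(1)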

Training


source

Learner.fine_tune

 Learner.fine_tune (epochs:int, base_lr:float=0.002, freeze_epochs:int=1,
                    lr_mult:Union[int,float]=100, pct_start:float=0.3,
                    div:Union[int,float]=5.0, freeze_compile:bool=False,
                    lr_max=None, div_final=100000.0, wd=None, moms=None,
                    cbs=None, reset_opt=False, start_epoch=0)

Fine tune with Learner.freeze for freeze_epochs, then with Learner.unfreeze for epochs, using discriminative LR.

Type Default Details
epochs int Number of unfrozen epochs to train
base_lr float 0.002 Base learning rate, model head unfrozen learning rate
freeze_epochs int 1 Number of frozen epochs to train
lr_mult Numeric 100 Model stem unfrozen learning rate: base_lr/lr_mult
pct_start float 0.3 Start unfrozen learning rate cosine annealing
div Numeric 5.0 Initial unfrozen learning rate: base_lr/div
freeze_compile bool False Compile model during freeze_epochs
lr_max NoneType None
div_final float 100000.0
wd NoneType None
moms NoneType None
cbs NoneType None
reset_opt bool False
start_epoch int 0

By default, fine_tune will not compile the model during the freeze_epochs, but this can be overridden by passing freeze_compile=True. If the model is already compiled, freeze_compile has no effect.
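
For example, a short sketch assuming a compiled Learner:

# compile during the frozen epochs as well as the unfrozen epochs
learn.fine_tune(5, base_lr=2e-3, freeze_epochs=2, freeze_compile=True)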