fastxtend

Train fastai models faster (and other useful tools)

fastxtend accelerates fastai

Train fastai models faster with fastxtend’s fused optimizers, Progressive Resizing callback, integrated FFCV DataLoader, and integrated PyTorch Compile support.

Feature overview

Train Models Faster

Drop-in fused optimizers, which are 21 to 293 percent faster than fastai native optimizers.
Up to 75% optimizer memory savings with integrated bitsandbytes 8-bit optimizers.
Increase GPU throughput and decrease training time with the Progressive Resizing callback.
Use the highly optimized FFCV DataLoader, fully integrated with fastai.
Integrated support for torch.compile via the Compile callbacks.
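Here is a back-of-the-envelope check on the 75% figure. It assumes an Adam-style optimizer that keeps two state values per parameter. Storing those states in 8 bits instead of 32 bits cuts state memory by three quarters. This sketch ignores the small quantization constants that real 8-bit optimizers also store, and the parameter count is an arbitrary example.

```python
def optimizer_state_bytes(num_params: int, bits_per_state: int, states_per_param: int = 2) -> int:
    """Bytes of optimizer state, assuming Adam-style first and second moments per parameter."""
    return num_params * states_per_param * bits_per_state // 8


num_params = 100_000_000  # arbitrary example: a 100M-parameter model
fp32 = optimizer_state_bytes(num_params, bits_per_state=32)
int8 = optimizer_state_bytes(num_params, bits_per_state=8)
savings = 1 - int8 / fp32

print(f"32-bit state: {fp32 / 2**20:.0f} MiB")
print(f"8-bit state:  {int8 / 2**20:.0f} MiB")
print(f"savings: {savings:.0%}")  # savings: 75%
```

Actual savings depend on the optimizer. Optimizers with a single state per parameter, or parameters kept in 32-bit state, shift the ratio. This is why the claim reads "up to".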

General Features

Fused implementations of modern optimizers, such as Adan, Lion, & StableAdam.
Hugging Face Transformers compatibility with fastai.
Flexible metrics which can log on train, valid, or both. Backwards compatible with fastai metrics.
Easily use multiple losses and log each individual loss on train and valid.
Multiple profilers for profiling training and identifying bottlenecks.
A fast Exponential Moving Average callback for smoother training.
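The Exponential Moving Average idea can be sketched in plain Python. After each optimizer step, a shadow copy of the weights is nudged toward the current weights, and that shadow copy is typically what gets evaluated. This minimal sketch uses plain lists in place of tensors and assumes a fixed decay. It is not fastxtend's callback, which operates on PyTorch model parameters.

```python
class WeightEMA:
    """Exponential moving average of a flat list of float weights."""

    def __init__(self, weights, decay=0.999):
        self.decay = decay
        self.shadow = list(weights)  # start the average at the initial weights

    def update(self, weights):
        # shadow <- decay * shadow + (1 - decay) * current weights
        d = self.decay
        self.shadow = [d * s + (1 - d) * w for s, w in zip(self.shadow, weights)]


# Hypothetical two-parameter "model" after one training step
ema = WeightEMA([0.0, 1.0], decay=0.9)
ema.update([1.0, 1.0])
print(ema.shadow)
```

A decay close to 1 gives a smoother, slower-moving average. A smaller decay tracks the live weights more closely.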

Vision

Apply MixUp, CutMix, or augmentations together with CutMixUp or CutMixUpAugment.
Additional image augmentations.
Support for running fastai batch transforms on CPU.
More attention and pooling modules.
A flexible implementation of fastai’s XResNet.
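MixUp and CutMix both mix labels in proportion to λ, which is drawn here from a Beta(α, α) distribution. MixUp blends two whole images. CutMix pastes a rectangular patch from one image into another. The sketch below works on nested lists and randomly picks one technique per call. How fastxtend's CutMixUp actually chooses between them is an assumption, and all function names here are hypothetical, not fastxtend's API.

```python
import math
import random


def mixup(img_a, img_b, lam):
    """Pixel-wise blend: lam * a + (1 - lam) * b."""
    return [[lam * a + (1 - lam) * b for a, b in zip(ra, rb)] for ra, rb in zip(img_a, img_b)]


def cutmix(img_a, img_b, lam, rng=random):
    """Paste a random box from img_b into img_a; return the image and the adjusted lam."""
    h, w = len(img_a), len(img_a[0])
    # Box side lengths chosen so the box covers about (1 - lam) of the area
    cut_h = int(h * math.sqrt(1 - lam))
    cut_w = int(w * math.sqrt(1 - lam))
    cy, cx = rng.randrange(h), rng.randrange(w)
    y0, y1 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x0, x1 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    out = [row[:] for row in img_a]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = img_b[y][x]
    # The box may be clipped at the border, so recompute lam from the pasted area
    lam_adj = 1 - (y1 - y0) * (x1 - x0) / (h * w)
    return out, lam_adj


def cut_mix_up(img_a, img_b, label_a, label_b, alpha=1.0, rng=random):
    """Apply MixUp or CutMix (chosen at random) to one pair and mix one-hot labels."""
    lam = rng.betavariate(alpha, alpha)
    if rng.random() < 0.5:
        img = mixup(img_a, img_b, lam)
    else:
        img, lam = cutmix(img_a, img_b, lam, rng)
    label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return img, label


zeros = [[0.0] * 4 for _ in range(4)]
ones = [[1.0] * 4 for _ in range(4)]
img, label = cut_mix_up(zeros, ones, [1.0, 0.0], [0.0, 1.0], rng=random.Random(1))
print(label)
```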

Check out the documentation for additional splitters, callbacks, schedulers, utilities, and more.

Install

fastxtend is available on PyPI:

pip install fastxtend

fastxtend can be installed with task-specific dependencies for vision, ffcv, text, audio, or all:

pip install "fastxtend[all]"

To easily install most prerequisites for all fastxtend features, use Conda or Miniconda:

conda create -n fastxtend python=3.11 "pytorch>=2.1" torchvision torchaudio \
pytorch-cuda=12.1 fastai nbdev pkg-config libjpeg-turbo opencv tqdm psutil \
terminaltables numpy "numba>=0.57" librosa timm kornia rich typer wandb \
"transformers>=4.34" "tokenizers>=0.14" "datasets>=2.14" ipykernel ipywidgets \
"matplotlib<3.8" -c pytorch -c nvidia -c fastai -c huggingface -c conda-forge

conda activate fastxtend

pip install "fastxtend[all]"

Replace pytorch-cuda=12.1 with your preferred supported CUDA version.

To create an editable development install:

git clone https://github.com/warner-benjamin/fastxtend.git
cd fastxtend
pip install -e ".[dev]"

Usage

Like fastai, fastxtend provides safe wildcard imports using Python’s __all__.

from fastai.vision.all import *
from fastxtend.vision.all import *
from fastxtend.ffcv.all import *

In general, import fastxtend after all fastai imports, as fastxtend modifies fastai. Any method modified by fastxtend is backwards compatible with the original fastai code.
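A wildcard import only pulls in the names a module lists in __all__, which keeps helpers out of your namespace. This is what makes from ... import * safe here. The self-contained sketch below builds a hypothetical in-memory module, demo_mod, to show the effect. It is a general Python illustration, not fastxtend code.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name "demo_mod")
demo = types.ModuleType("demo_mod")
exec(
    "__all__ = ['public_fn']\n"
    "def public_fn():\n    return 'public'\n"
    "def helper():\n    return 'not exported'\n",
    demo.__dict__,
)
sys.modules["demo_mod"] = demo  # make it importable by name

namespace = {}
exec("from demo_mod import *", namespace)

print("public_fn" in namespace)  # True: listed in __all__
print("helper" in namespace)     # False: not listed, so not imported
```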

Examples

Use a fused ForEach optimizer:

Learner(..., opt_func=adam(foreach=True))

Or a bitsandbytes 8-bit optimizer:

Learner(..., opt_func=adam(eightbit=True))

Speed up image training using Progressive Resizing:

Learner(..., cbs=ProgressiveResize())
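Progressive resizing trains early epochs on smaller images and grows them toward full resolution, so the early epochs run faster. The sketch below computes one possible per-epoch schedule. The start size, final size, step increment, and the choice to finish growing at 75% of training are all assumptions for illustration, not ProgressiveResize's actual defaults or algorithm.

```python
def resize_schedule(epochs, start_size, final_size, increase_by=16, finish_pct=0.75):
    """Return the image size to use for each epoch (hypothetical helper).

    Sizes grow linearly from start_size toward final_size over the first
    finish_pct of training, snapped down to a multiple of increase_by,
    then stay at final_size.
    """
    grow_epochs = max(int(epochs * finish_pct), 1)
    sizes = []
    for epoch in range(epochs):
        if epoch >= grow_epochs:
            sizes.append(final_size)
            continue
        # Integer arithmetic keeps the schedule exact
        size = start_size + (final_size - start_size) * epoch // grow_epochs
        size = size // increase_by * increase_by  # change size in discrete steps
        sizes.append(max(size, start_size))
    return sizes


print(resize_schedule(epochs=8, start_size=128, final_size=224))
# [128, 144, 160, 176, 192, 208, 224, 224]
```

Finishing the growth before the last epochs leaves time for the model to train at full resolution.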

Log an accuracy metric on the training set as a smoothed metric and validation set like normal:

Learner(..., metrics=[Accuracy(log_metric=LogMetric.Train, metric_type=MetricType.Smooth),
                      Accuracy()])
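A smoothed training metric reports an exponential moving average of per-batch values instead of the raw, noisy value of each batch. The sketch below assumes a debiased average with beta=0.98, the same scheme fastai uses for its smoothed training loss. Whether MetricType.Smooth uses exactly these constants is an assumption, and SmoothedMetric is a hypothetical name.

```python
class SmoothedMetric:
    """Debiased exponential moving average of per-batch metric values."""

    def __init__(self, beta=0.98):
        self.beta = beta
        self.avg = 0.0
        self.count = 0

    def accumulate(self, batch_value):
        self.count += 1
        # Moving average that starts from zero, so it is biased low at first
        self.avg = self.beta * self.avg + (1 - self.beta) * batch_value

    @property
    def value(self):
        # Divide out the bias introduced by the zero initialization
        return self.avg / (1 - self.beta ** self.count)


metric = SmoothedMetric()
for acc in [0.50, 0.60, 0.70]:  # hypothetical per-batch training accuracies
    metric.accumulate(acc)
print(f"{metric.value:.4f}")
```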

Log multiple losses as individual metrics on train and valid:

mloss = MultiLoss(loss_funcs=[nn.MSELoss, nn.L1Loss],
                  weights=[1, 3.5], loss_names=['mse_loss', 'l1_loss'])

Learner(..., loss_func=mloss, metrics=RMSE(), cbs=MultiLossCallback)
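Conceptually, a multi-loss combines its component losses into one weighted total for backpropagation and keeps each component so it can be logged separately. The plain-Python sketch below uses hypothetical helpers and the same weights as the example above. Whether MultiLoss logs weighted or unweighted components is an assumption; this sketch logs the unweighted values.

```python
def multi_loss(preds, targets, loss_funcs, weights, loss_names):
    """Weighted sum of several losses, keeping each component for logging."""
    components = {}
    total = 0.0
    for fn, w, name in zip(loss_funcs, weights, loss_names):
        value = fn(preds, targets)
        components[name] = value  # unweighted value, logged individually
        total += w * value        # weighted value, used for the backward pass
    return total, components


def mse(p, t):
    return sum((a - b) ** 2 for a, b in zip(p, t)) / len(p)


def l1(p, t):
    return sum(abs(a - b) for a, b in zip(p, t)) / len(p)


total, parts = multi_loss([1.0, 2.0], [0.0, 2.0], [mse, l1], [1, 3.5], ["mse_loss", "l1_loss"])
print(total, parts)  # 2.25 {'mse_loss': 0.5, 'l1_loss': 0.5}
```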

Compile a model with torch.compile:

from fastxtend.callback import compiler

learn = Learner(...).compile()

Profile a fastai training loop:

from fastxtend.callback import simpleprofiler

learn = Learner(...).profile()
learn.fit_one_cycle(2, 3e-3)
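A simple profiler records how long each phase of the training loop takes, which helps locate the bottleneck (for example, data loading versus the optimizer step). The sketch below is a generic wall-clock timer with hypothetical phase names, not fastxtend's simpleprofiler. Timing real GPU work this way would also require synchronizing with the device before reading the clock, because CUDA kernels run asynchronously.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time spent in named phases of a training loop."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Slowest phase first
        for name, total in sorted(self.totals.items(), key=lambda kv: -kv[1]):
            mean_ms = 1000 * total / self.counts[name]
            print(f"{name:<6} total {total:.4f}s  mean {mean_ms:.2f}ms  n={self.counts[name]}")


timer = PhaseTimer()
for batch in range(3):
    with timer.phase("data"):
        time.sleep(0.002)  # stand-in for loading a batch
    with timer.phase("step"):
        time.sleep(0.005)  # stand-in for forward, backward, and optimizer step
timer.report()
```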

Benchmark

To run the benchmark on your own machine, see the example scripts, which describe how to replicate it.