Train fastai models faster (and other useful tools)

fastxtend accelerates fastai

Train fastai models faster with fastxtend’s fused optimizers, Progressive Resizing callback, integrated FFCV DataLoader, and integrated PyTorch Compile support.

Feature overview

Train Models Faster

General Features


Check out the documentation for additional splitters, callbacks, schedulers, utilities, and more.


fastxtend is available on PyPI:

pip install fastxtend

fastxtend can be installed with task-specific dependencies for vision, ffcv, text, audio, or all:

pip install "fastxtend[all]"

To easily install most prerequisites for all fastxtend features, use Conda or Miniconda:

conda create -n fastxtend python=3.11 "pytorch>=2.1" torchvision torchaudio \
pytorch-cuda=12.1 fastai nbdev pkg-config libjpeg-turbo opencv tqdm psutil \
terminaltables numpy "numba>=0.57" librosa timm kornia rich typer wandb \
"transformers>=4.34" "tokenizers>=0.14" "datasets>=2.14" ipykernel ipywidgets \
"matplotlib<3.8" -c pytorch -c nvidia -c fastai -c huggingface -c conda-forge

conda activate fastxtend

pip install "fastxtend[all]"

Replace pytorch-cuda=12.1 with your preferred supported version of CUDA.

To create an editable development install:

git clone
cd fastxtend
pip install -e ".[dev]"


Like fastai, fastxtend provides safe wildcard imports using python’s __all__.

from fastai.vision.all import *
from fastxtend.vision.all import *
from fastxtend.ffcv.all import *

In general, import fastxtend after all fastai imports, as fastxtend modifies fastai. Any method modified by fastxtend is backwards compatible with the original fastai code.


Use a fused ForEach optimizer:

Learner(..., opt_func=adam(foreach=True))
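A ForEach optimizer replaces the per-parameter Python loop in the optimizer step with batched operations over all parameters at once (PyTorch implements these as torch._foreach_* kernels). A toy plain-Python sketch of the idea, not fastxtend's actual implementation:

```python
def sgd_loop(params, grads, lr):
    # Standard optimizer: iterate and update one parameter at a time.
    return [p - lr * g for p, g in zip(params, grads)]

def sgd_foreach(params, grads, lr):
    # ForEach-style optimizer: apply each operation across the whole
    # list of parameters in one batched call, reducing per-tensor
    # Python and kernel-launch overhead on real tensors.
    scaled = [lr * g for g in grads]                # stands in for torch._foreach_mul
    return [p - s for p, s in zip(params, scaled)]  # stands in for torch._foreach_sub

params, grads = [1.0, 2.0], [0.5, 0.25]
print(sgd_loop(params, grads, 0.1) == sgd_foreach(params, grads, 0.1))
```

Both produce identical updates; the speedup on real models comes from issuing one fused kernel per operation instead of one per parameter tensor.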

Or a bitsandbytes 8-bit optimizer:

Learner(..., opt_func=adam(eightbit=True))

Speed up image training using Progressive Resizing:

Learner(..., cbs=ProgressiveResize())
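Progressive resizing trains on smaller images first and steps the size up to the full resolution as training progresses. A toy schedule illustrating the idea (the increment size and step timing here are assumptions, not fastxtend's exact defaults):

```python
def resize_schedule(start_size, final_size, n_steps, increment=32):
    # Step the image size up from start_size toward final_size in
    # fixed increments, holding at final_size once it is reached.
    sizes, size = [], start_size
    for _ in range(n_steps):
        sizes.append(size)
        if size < final_size:
            size = min(size + increment, final_size)
    return sizes

print(resize_schedule(128, 224, 5))  # → [128, 160, 192, 224, 224]
```

Early steps at small sizes are much cheaper, which is where the training speedup comes from.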

Log an accuracy metric as a smoothed metric on the training set and as normal on the validation set:

Learner(..., metrics=[Accuracy(),
                      Accuracy(log_metric=LogMetric.Train, metric_type=MetricType.Smooth)])
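A smoothed train metric is an exponentially weighted moving average of the per-batch values, while the validation metric remains the usual epoch average. A minimal sketch of the smoothing (the beta value is an assumption, not necessarily fastxtend's default):

```python
def smooth_metric(batch_values, beta=0.98):
    # Debiased exponential moving average, in the style fastai uses
    # for the smoothed training loss.
    avg = 0.0
    for count, v in enumerate(batch_values, start=1):
        avg = beta * avg + (1 - beta) * v
        yield avg / (1 - beta ** count)  # debias the running average

print(list(smooth_metric([1.0, 1.0, 1.0])))  # → [1.0, 1.0, 1.0]
```

For a constant input the debiased average recovers the value exactly; for noisy batch values it damps fluctuations, giving a more readable training curve.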

Log multiple losses as individual metrics on train and valid:

mloss = MultiLoss(loss_funcs=[nn.MSELoss, nn.L1Loss],
                  weights=[1, 3.5], loss_names=['mse_loss', 'l1_loss'])

Learner(..., loss_func=mloss, metrics=RMSE(), cbs=MultiLossCallback)
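Conceptually, MultiLoss evaluates each loss separately (so each can be logged under its own name) and combines them with the given weights for backpropagation. A plain-Python sketch of that combination, not the actual implementation:

```python
def multi_loss(preds, targets, loss_funcs, weights):
    # Compute each loss individually for logging, then return the
    # weighted sum that would be used for the backward pass.
    individual = [f(preds, targets) for f in loss_funcs]
    total = sum(w * l for w, l in zip(weights, individual))
    return total, individual

# Toy stand-ins for nn.MSELoss and nn.L1Loss on plain lists.
mse = lambda p, t: sum((pi - ti) ** 2 for pi, ti in zip(p, t)) / len(p)
l1  = lambda p, t: sum(abs(pi - ti) for pi, ti in zip(p, t)) / len(p)

total, parts = multi_loss([2.0, 4.0], [1.0, 2.0], [mse, l1], [1, 3.5])
print(parts, total)  # → [2.5, 1.5] 7.75
```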

Compile a model with torch.compile:

from fastxtend.callback import compiler

learn = Learner(...).compile()

Profile a fastai training loop:

from fastxtend.callback import simpleprofiler

learn = Learner(...).profile()
learn.fit_one_cycle(2, 3e-3)


To replicate the benchmark on your own machine, see the example scripts.