FFCV Loader

fastxtend’s fastai+FFCV Integrated DataLoader

fastxtend’s Loader adds fastai features to FFCV’s Loader, including one_batch, show_batch, show_results, and support for GPU batch transforms, to name a few.



 Loader (fname:str|Path, batch_size:int, num_workers:int=-1,
         os_cache:bool=True, order:ORDER_TYPE=<OrderOption.SEQUENTIAL: 1>,
         distributed:bool=False, seed:int|None=None,
         custom_fields:Mapping[str,Field]={}, drop_last:bool|None=None,
         batches_ahead:int=2, recompile:bool=False,
         device:str|int|torch.device|None=None, async_tfms:bool=False,
         n_inp:int|None=None, split_idx:int|None=None, do_setup:bool=True,

FFCV Loader with fastai Transformed DataLoader TfmdDL batch transforms

| Parameter | Type | Default | Details |
|---|---|---|---|
| fname | str \| Path | | Path to the location of the dataset (FFCV beton format) |
| batch_size | int | | Batch size |
| num_workers | int | -1 | Number of CPU cores to use in parallel (default: all available, up to 16) |
| os_cache | bool | True | Leverage the OS for caching. Beneficial when there is enough memory to cache the dataset |
| order | ORDER_TYPE | OrderOption.SEQUENTIAL | Dataset traversal order, one of: SEQUENTIAL, RANDOM, QUASI_RANDOM |
| distributed | bool | False | Emulates the behavior of PyTorch’s DistributedSampler for distributed training |
| seed | int \| None | None | Random seed for batch ordering |
| indices | Sequence[int] \| None | None | Subset dataset by returning only these indices |
| pipelines | Mapping[str, Sequence[Operation \| nn.Module]] | {} | Dictionary defining, for each field, the sequence of Decoders and transforms to apply |
| custom_fields | Mapping[str, Field] | {} | Dictionary informing Loader of the types associated with fields that use a custom type |
| drop_last | bool \| None | None | Drop non-full batch in each epoch. Defaults to False if order is SEQUENTIAL, True otherwise |
| batches_ahead | int | 2 | Number of batches prepared in advance; balances latency and memory |
| recompile | bool | False | Recompile at every epoch. Required if FFCV augmentations change during training |
| device | str \| int \| torch.device \| None | None | Device to place batch. Defaults to fastai’s default_device |
| async_tfms | bool | False | Asynchronously run batch_tfms before the batch is drawn |
| n_inp | int \| None | None | Number of inputs to the model. Defaults to pipelines length minus 1 |
| split_idx | int \| None | None | Apply batch transform(s) to training (0) or validation (1) set. Defaults to valid if order is SEQUENTIAL |
| do_setup | bool | True | Run setup() for batch transform(s) |
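
The order-dependent defaults for drop_last, split_idx, n_inp, and num_workers can be restated as a small sketch. The resolve_defaults helper, its plain-string order names, and the cpu_count argument are hypothetical and exist only for illustration; this is not fastxtend’s implementation, only the documented rules written as code.

```python
from __future__ import annotations

import os


def resolve_defaults(
    order: str,
    n_pipelines: int,
    drop_last: bool | None = None,
    split_idx: int | None = None,
    n_inp: int | None = None,
    num_workers: int = -1,
    cpu_count: int | None = None,
) -> dict:
    """Hypothetical helper restating the documented Loader defaults."""
    if order not in {"SEQUENTIAL", "RANDOM", "QUASI_RANDOM"}:
        raise ValueError(f"unknown order: {order}")
    shuffled = order != "SEQUENTIAL"
    if drop_last is None:
        # Shuffled (training) orders drop the last partial batch.
        drop_last = shuffled
    if split_idx is None:
        # 0 = train for shuffled orders, 1 = valid for SEQUENTIAL.
        split_idx = 0 if shuffled else 1
    if n_inp is None:
        # All pipelines except the last are treated as model inputs.
        n_inp = n_pipelines - 1
    if num_workers < 1:
        # Assumption: "all cores up to 16" approximated with os.cpu_count().
        cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        num_workers = min(cores, 16)
    return {
        "drop_last": drop_last,
        "split_idx": split_idx,
        "n_inp": n_inp,
        "num_workers": num_workers,
    }
```

For example, an image/label dataset with two pipelines and order RANDOM resolves to a training configuration with one model input, while SEQUENTIAL resolves to a validation configuration that keeps the final partial batch.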

Important Loader arguments:

  • order: Controls how much memory is used for dataset caching and whether the dataset is randomly shuffled. Can be one of RANDOM, QUASI_RANDOM, or SEQUENTIAL. See the note below for more details. Defaults to SEQUENTIAL, which is unrandomized.

  • os_cache: By default, FFCV will attempt to cache the entire dataset into RAM using the operating system’s caching. This can be changed by setting os_cache=False or by setting the environment variable ‘FFCV_DEFAULT_CACHE_PROCESS’ to “True” or “1”. If os_cache=False, then order must be set to QUASI_RANDOM for the training Loader.

  • num_workers: If not set, will use all CPU cores up to 16 by default.

  • batches_ahead: Controls how many batches ahead the Loader works. Increasing it uses more memory on both the CPU and GPU. Defaults to 2.

  • n_inp: Controls which inputs to pass to the model. By default, set to number of pipelines minus 1.

  • drop_last: Whether to drop the last partial batch. By default, will set to True if order is RANDOM or QUASI_RANDOM, False if SEQUENTIAL.

  • device: The device to place the processed batches of data on. Defaults to fastai.torch_core.default_device if not set.

  • async_tfms: Asynchronously apply batch_tfms before the batch is drawn. Can accelerate training if GPU compute isn’t fully saturated (95% or less) or if only using IntToFloatTensor and Normalize.

  • split_idx: This tells the fastai batch transforms what dataset they are operating on. By default will use 0 (train) if order is RANDOM or QUASI_RANDOM, 1 (valid) if SEQUENTIAL.

  • distributed: For distributed training on multiple GPUs. Emulates the behavior of PyTorch’s DistributedSampler. QUASI_RANDOM is unavailable with distributed training.
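
To make the distributed option more concrete, the sketch below reproduces the index partitioning that PyTorch’s DistributedSampler uses when its own drop_last is False: pad the index list so it divides evenly, then give each rank every num_replicas-th index. The shard_indices function is a hypothetical name, and it shuffles with Python’s random module rather than a torch generator, so shuffled orderings will not match PyTorch’s. How the Loader implements this internally is not shown here.

```python
import math
import random


def shard_indices(
    n: int,
    num_replicas: int,
    rank: int,
    shuffle: bool = False,
    seed: int = 0,
) -> list:
    """Hypothetical sketch of DistributedSampler-style index sharding."""
    if not 0 <= rank < num_replicas:
        raise ValueError("rank must be in [0, num_replicas)")
    indices = list(range(n))
    if shuffle:
        # Every rank must use the same seed so the shards stay disjoint.
        random.Random(seed).shuffle(indices)
    num_samples = math.ceil(n / num_replicas)
    total_size = num_samples * num_replicas
    padding = total_size - n
    if padding > 0:
        # Repeat samples from the start so every rank gets num_samples items.
        repeats = math.ceil(padding / n)
        indices += (indices * repeats)[:padding]
    # Rank r takes positions r, r + num_replicas, r + 2 * num_replicas, ...
    return indices[rank:total_size:num_replicas]
```

With 10 samples and 4 replicas, each rank receives 3 indices and the first two samples are repeated to fill the padding.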

Note: Order Memory Usage

Each order option requires differing amounts of system memory.

  • RANDOM caches the entire dataset in memory for fast random sampling. RANDOM uses the most memory.

  • QUASI_RANDOM caches a subset of the dataset at a time in memory and randomly samples from the subset. Use when the entire dataset cannot fit into memory.

  • SEQUENTIAL requires the least memory. It loads a few samples ahead of time. As the name suggests, it is not random, and is primarily for validation.
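
The difference between the three orders can be illustrated with a simplified model. In this sketch, QUASI_RANDOM is approximated by shuffling fixed-size chunks and then shuffling within each chunk, so only one chunk needs to be resident at a time. The traversal_order function and its chunk_size parameter are hypothetical, and FFCV’s actual QUASI_RANDOM sampling is more elaborate than this.

```python
import random


def traversal_order(n: int, order: str, seed: int = 0, chunk_size: int = 4) -> list:
    """Hypothetical, simplified model of the three traversal orders."""
    rng = random.Random(seed)
    indices = list(range(n))
    if order == "SEQUENTIAL":
        # No shuffling: samples are read in storage order.
        return indices
    if order == "RANDOM":
        # Any sample may follow any other, so the whole dataset must be reachable.
        rng.shuffle(indices)
        return indices
    if order == "QUASI_RANDOM":
        # Shuffle the chunk order, then shuffle within each chunk,
        # so only one chunk is needed in memory at a time.
        chunks = [indices[i:i + chunk_size] for i in range(0, n, chunk_size)]
        rng.shuffle(chunks)
        result = []
        for chunk in chunks:
            rng.shuffle(chunk)
            result.extend(chunk)
        return result
    raise ValueError(f"unknown order: {order}")
```

Every order visits each sample exactly once per epoch; what changes is how far apart in storage two consecutive samples can be, which is what drives the memory requirement.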

Asynchronous batch transforms can accelerate training by decreasing the draw time at the expense of a slightly longer batch step. If the GPU isn’t fully saturated (usually 95% compute use or less), this is a net gain in training performance. async_tfms=True pairs well with ProgressiveResize, as the GPU is almost never saturated when training on images smaller than full size. When the GPU is near or fully saturated, asynchronous batch transforms usually result in a wash in training time.
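
The idea of running batch transforms before the batch is drawn can be sketched with a background thread and a bounded queue. This is a CPU-only stand-in written for illustration: async_batches is a hypothetical name, and the Loader’s actual mechanism for overlapping GPU batch transforms with training is not described here.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def async_batches(
    batches: Iterable[T],
    batch_tfm: Callable[[T], U],
    batches_ahead: int = 2,
) -> Iterator[U]:
    """Hypothetical sketch: transform batches in the background, ahead of the consumer."""
    # The bounded queue limits how far ahead the worker may run.
    q: queue.Queue = queue.Queue(maxsize=batches_ahead)

    def worker() -> None:
        try:
            for batch in batches:
                # The transform runs here, before the consumer asks for the batch.
                q.put(("ok", batch_tfm(batch)))
            q.put(("done", None))
        except Exception as exc:
            # Forward failures to the consumer instead of hanging it.
            q.put(("error", exc))

    threading.Thread(target=worker, daemon=True).start()
    while True:
        kind, payload = q.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload
```

When the consumer (the training step) is slower than the transform, each draw returns an already transformed batch immediately; when the transform is the bottleneck, the consumer waits just as it would synchronously, which matches the "wash" described above.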



 Loader.one_batch (batches_ahead:bool=False)

Return one processed batch of input(s) and target(s), optionally loading batches_ahead



 DataLoaderMixin.show_batch (b:Optional[Tuple[torch.Tensor,...]]=None,
                             max_n:int=9, ctxs=None, show:bool=True,
                             unique:bool=False, **kwargs)

Show max_n input(s) and target(s) from the batch.

| Parameter | Type | Default | Details |
|---|---|---|---|
| b | Tuple[Tensor, ...] \| None | None | Batch to show. If None, calls one_batch |
| max_n | int | 9 | Maximum number of items to show |
| ctxs | NoneType | None | List of ctx objects to show data. Could be matplotlib axis, DataFrame, etc. |
| show | bool | True | If False, return decoded batch instead of showing |
| unique | bool | False | Whether to show only one |



 DataLoaderMixin.show_results (b, out, max_n:int=9, ctxs=None,
                               show:bool=True, **kwargs)

Show max_n results with input(s), target(s) and prediction(s).

| Parameter | Type | Default | Details |
|---|---|---|---|
| b | | | Batch to show results for |
| out | | | Predicted output from model for the batch |
| max_n | int | 9 | Maximum number of items to show |
| ctxs | NoneType | None | List of ctx objects to show data. Could be matplotlib axis, DataFrame, etc. |
| show | bool | True | If False, return decoded batch instead of showing |



 DataLoaderMixin.to (device:Union[int,str,torch.device])

Sets self.device=device.



 DataLoaderMixin.n_inp ()

Number of elements in a batch for model input



 DataLoaderMixin.decode (b:Tuple[torch.Tensor,...])

Decode batch b



 DataLoaderMixin.decode_batch (b:Tuple[torch.Tensor,...], max_n:int=9)

Decode up to max_n input(s) from batch b