FFCV Loader

fastxtend’s fastai+FFCV Integrated DataLoader

fastxtend’s Loader adds fastai features to FFCV’s Loader, including one_batch, show_batch, show_results, and support for GPU batch transforms.

Loader

 Loader (fname:str|Path, batch_size:int, num_workers:int=-1,
         os_cache:bool=True, order:ORDER_TYPE=<OrderOption.SEQUENTIAL: 1>,
         distributed:bool=False, seed:int|None=None,
         indices:Sequence[int]|None=None,
         pipelines:Mapping[str,Sequence[Operation|nn.Module]]={},
         custom_fields:Mapping[str,Field]={}, drop_last:bool|None=None,
         batches_ahead:int=2, recompile:bool=False,
         device:str|int|torch.device|None=None, async_tfms:bool=False,
         n_inp:int|None=None, split_idx:int|None=None, do_setup:bool=True,
         **kwargs)

FFCV Loader with fastai Transformed DataLoader TfmdDL batch transforms

	Type	Default	Details
fname	str \| Path		Path to the location of the dataset (FFCV beton format)
batch_size	int		Batch size
num_workers	int	-1	Number of CPU cores to use in parallel (default: All available up to 16)
os_cache	bool	True	Leverage the OS for caching. Beneficial when there is enough memory to cache the dataset
order	ORDER_TYPE	OrderOption.SEQUENTIAL	Dataset traversal order, one of: `SEQUENTIAL`, `RANDOM`, `QUASI_RANDOM`
distributed	bool	False	Emulates the behavior of PyTorch’s DistributedSampler for distributed training
seed	int \| None	None	Random seed for batch ordering
indices	Sequence[int] \| None	None	Subset dataset by returning only these indices
pipelines	Mapping[str, Sequence[Operation \| nn.Module]]	{}	Dictionary defining for each field the sequence of Decoders and transforms to apply
custom_fields	Mapping[str, Field]	{}	Dictionary informing `Loader` of the types associated to fields that are using a custom type
drop_last	bool \| None	None	Drop non-full batch in each epoch. Defaults to False if order is `SEQUENTIAL`, True otherwise
batches_ahead	int	2	Number of batches prepared in advance; balances latency and memory
recompile	bool	False	Recompile at every epoch. Required if FFCV augmentations change during training
device	str \| int \| torch.device \| None	None	Device to place batch. Defaults to fastai’s `default_device`
async_tfms	bool	False	Asynchronously run `batch_tfms` before batch is drawn.
n_inp	int \| None	None	Number of inputs to the model. Defaults to pipelines length minus 1
split_idx	int \| None	None	Apply batch transform(s) to training (0) or validation (1) set. Defaults to valid if order is `SEQUENTIAL`
do_setup	bool	True	Run `setup()` for batch transform(s)
kwargs

Important Loader arguments:

order: Controls how much memory is used for dataset caching and whether the dataset is randomly shuffled. Can be one of RANDOM, QUASI_RANDOM, or SEQUENTIAL. See the note below for more details. Defaults to SEQUENTIAL, which is unrandomized.
os_cache: By default, FFCV will attempt to cache the entire dataset into RAM using the operating system’s caching. This can be changed by setting os_cache=False or setting the environment variable ‘FFCV_DEFAULT_CACHE_PROCESS’ to “True” or “1”. If os_cache=False then order must be set to QUASI_RANDOM for the training Loader.
num_workers: If not set, will use all CPU cores up to 16 by default.
batches_ahead: Controls the number of batches ahead the Loader works. Increasing uses more RAM, both CPU and GPU. Defaults to 2.
n_inp: Controls which inputs to pass to the model. By default, set to number of pipelines minus 1.
drop_last: Whether to drop the last partial batch. By default, will set to True if order is RANDOM or QUASI_RANDOM, False if SEQUENTIAL.
device: The device to place the processed batches of data on. Defaults to fastai.torch_core.default_device if not set.
async_tfms: Asynchronously apply batch_tfms before the batch is drawn. Can accelerate training if GPU compute isn’t fully saturated (95% or less) or if only using IntToFloatTensor and Normalize.
split_idx: This tells the fastai batch transforms what dataset they are operating on. By default will use 0 (train) if order is RANDOM or QUASI_RANDOM, 1 (valid) if SEQUENTIAL.
distributed: For distributed training on multiple GPUs. Emulates the behavior of PyTorch’s DistributedSampler. QUASI_RANDOM is unavailable with distributed training.
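
The order-dependent defaults described above can be sketched in plain Python. This is only an illustration of the documented behavior, not fastxtend's code: `Order` is a hypothetical stand-in for FFCV's `OrderOption`, and `resolve_defaults` and its `cpu_count` argument are made-up names for this example.

```python
import os
from enum import Enum, auto


class Order(Enum):
    """Hypothetical stand-in for FFCV's OrderOption."""
    SEQUENTIAL = auto()
    RANDOM = auto()
    QUASI_RANDOM = auto()


def resolve_defaults(order, drop_last=None, split_idx=None,
                     num_workers=-1, cpu_count=None):
    """Fill in unset arguments following the defaults documented above."""
    shuffled = order in (Order.RANDOM, Order.QUASI_RANDOM)
    if drop_last is None:
        # Drop the last partial batch only for shuffled (training) orders.
        drop_last = shuffled
    if split_idx is None:
        # 0 = training set, 1 = validation set.
        split_idx = 0 if shuffled else 1
    if num_workers < 1:
        # "Not set" means all CPU cores, capped at 16.
        cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        num_workers = min(cpus, 16)
    return {"drop_last": drop_last, "split_idx": split_idx,
            "num_workers": num_workers}


print(resolve_defaults(Order.RANDOM, cpu_count=32))
print(resolve_defaults(Order.SEQUENTIAL, cpu_count=4))
```

Passing an explicit value for any argument overrides the order-based default, which mirrors how the `None` defaults in the signature are meant to be read.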

Note: Order Memory Usage

Each order option requires differing amounts of system memory.

RANDOM caches the entire dataset in memory for fast random sampling. RANDOM uses the most memory.
QUASI_RANDOM caches a subset of the dataset at a time in memory and randomly samples from the subset. Use when the entire dataset cannot fit into memory.
SEQUENTIAL requires the least memory. It loads a few samples ahead of time. As the name suggests, it is not random, and is primarily for validation.
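
To make the QUASI_RANDOM idea concrete, here is a rough sketch of sampling randomly from one subset at a time. It is not FFCV's actual algorithm; `quasi_random_order` and `chunk_size` are hypothetical names, and the chunking scheme is an assumption chosen only to show why memory use stays bounded.

```python
import random


def quasi_random_order(n_samples, chunk_size, seed=None):
    """Visit chunks in random order and shuffle the samples inside each chunk.

    Only one chunk needs to be resident in memory at a time, at the cost of
    a less thorough shuffle than a full random permutation.
    """
    rng = random.Random(seed)
    chunks = [list(range(start, min(start + chunk_size, n_samples)))
              for start in range(0, n_samples, chunk_size)]
    rng.shuffle(chunks)          # random chunk order
    order = []
    for chunk in chunks:
        rng.shuffle(chunk)       # random order within the cached chunk
        order.extend(chunk)
    return order


print(quasi_random_order(12, chunk_size=4, seed=0))
```

Every sample is still visited exactly once per epoch; the trade-off is that neighbouring samples in the output always come from the same chunk.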

Asynchronous batch transforms can accelerate training by decreasing the draw time at the expense of a slightly longer batch step. If the GPU isn’t fully saturated, usually 95% or less compute use, this will be a net gain in training performance. async_tfms=True pairs well with ProgressiveResize, as the GPU is almost never saturated when training on smaller than full-sized images. When near or fully saturated, asynchronous batch transforms usually result in a wash in training time.
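
The overlap that asynchronous batch transforms rely on can be illustrated with a CPU-only analogy. The sketch below is an assumption-laden toy, not how fastxtend implements `async_tfms`: `async_batches` is a hypothetical helper that transforms the next batch in a background thread while the caller is still working on the current one.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_batches(batches, tfm):
    """Transform each batch only when it is drawn."""
    for batch in batches:
        yield tfm(batch)


def async_batches(batches, tfm):
    """Start transforming the next batch before the current one is consumed."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for batch in batches:
            upcoming = pool.submit(tfm, batch)   # runs in the background
            if pending is not None:
                yield pending.result()           # already (or nearly) done
            pending = upcoming
        if pending is not None:
            yield pending.result()


def double(batch):
    return [x * 2 for x in batch]


data = [[1, 2], [3, 4], [5, 6]]
print(list(async_batches(data, double)))
```

Both generators yield the same batches in the same order. The asynchronous version only helps when there is spare capacity to run the transform alongside the training step, which matches the saturation caveat above.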

Loader.one_batch

 Loader.one_batch (batches_ahead:bool=False)

Return one processed batch of input(s) and target(s), optionally loading batches_ahead

DataLoaderMixin.show_batch

 DataLoaderMixin.show_batch (b:Optional[Tuple[torch.Tensor,...]]=None,
                             max_n:int=9, ctxs=None, show:bool=True,
                             unique:bool=False, **kwargs)

Show max_n input(s) and target(s) from the batch.

	Type	Default	Details
b	Tuple[Tensor, …] \| None	None	Batch to show. If None calls `one_batch`
max_n	int	9	Maximum number of items to show
ctxs	NoneType	None	List of `ctx` objects to show data. Could be matplotlib axis, DataFrame etc
show	bool	True	If False, return decoded batch instead of showing
unique	bool	False	Whether to show only one
kwargs

DataLoaderMixin.show_results

 DataLoaderMixin.show_results (b, out, max_n:int=9, ctxs=None,
                               show:bool=True, **kwargs)

Show max_n results with input(s), target(s) and prediction(s).

	Type	Default	Details
b			Batch to show results for
out			Predicted output from model for the batch
max_n	int	9	Maximum number of items to show
ctxs	NoneType	None	List of `ctx` objects to show data. Could be matplotlib axis, DataFrame etc
show	bool	True	If False, return decoded batch instead of showing
kwargs

DataLoaderMixin.to

 DataLoaderMixin.to (device:Union[int,str,torch.device])

Sets self.device=device.

DataLoaderMixin.n_inp

 DataLoaderMixin.n_inp ()

Number of elements in a batch for model input
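
As a small illustration of what n_inp means in practice, the sketch below splits a batch tuple into model inputs and targets. `split_batch` is a hypothetical helper, and it assumes the batch has one element per pipeline, so the default of "pipelines length minus 1" becomes "batch length minus 1".

```python
def split_batch(batch, n_inp=None):
    """Split a batch tuple into (inputs, targets) at position n_inp."""
    if n_inp is None:
        # Assumed default: everything except the last element is an input.
        n_inp = len(batch) - 1
    return batch[:n_inp], batch[n_inp:]


inputs, targets = split_batch(("image", "label"))
print(inputs, targets)

inputs, targets = split_batch(("image", "label", "bbox"), n_inp=1)
print(inputs, targets)
```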

DataLoaderMixin.decode

 DataLoaderMixin.decode (b:Tuple[torch.Tensor,...])

Decode batch b

DataLoaderMixin.decode_batch

 DataLoaderMixin.decode_batch (b:Tuple[torch.Tensor,...], max_n:int=9)

Decode up to max_n input(s) from batch b
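
The max_n limit can be pictured as turning a batch of stacked fields into at most max_n per-item samples. The sketch below uses plain lists in place of tensors and skips the actual decoding through the pipelines; `to_samples` is a hypothetical name used only for this example.

```python
def to_samples(batch, max_n=9):
    """Turn a batch of per-field sequences into at most max_n (x, y, ...) samples."""
    truncated = (field[:max_n] for field in batch)
    return list(zip(*truncated))


batch = ([10, 20, 30], ["cat", "dog", "bird"])
print(to_samples(batch, max_n=2))  # [(10, 'cat'), (20, 'dog')]
```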