FFCV Loader

fastxtend’s `Loader` adds fastai features to FFCV’s Loader, including `one_batch`, `show_batch`, `show_results`, and support for GPU batch transforms, to name a few.
Loader
Loader (fname:str|Path, batch_size:int, num_workers:int=-1, os_cache:bool=True, order:ORDER_TYPE=<OrderOption.SEQUENTIAL: 1>, distributed:bool=False, seed:int|None=None, indices:Sequence[int]|None=None, pipelines:Mapping[str,Sequence[Operation|nn.Module]]={}, custom_fields:Mapping[str,Field]={}, drop_last:bool|None=None, batches_ahead:int=2, recompile:bool=False, device:str|int|torch.device|None=None, async_tfms:bool=False, n_inp:int|None=None, split_idx:int|None=None, do_setup:bool=True, **kwargs)
FFCV Loader with support for fastai Transformed DataLoader (`TfmdDL`) batch transforms
| | Type | Default | Details |
|---|---|---|---|
| fname | str \| Path | | Path to the location of the dataset (FFCV beton format) |
| batch_size | int | | Batch size |
| num_workers | int | -1 | Number of CPU cores to use in parallel (default: all available up to 16) |
| os_cache | bool | True | Leverage the OS for caching. Beneficial when there is enough memory to cache the dataset |
| order | ORDER_TYPE | OrderOption.SEQUENTIAL | Dataset traversal order, one of: SEQUENTIAL, RANDOM, QUASI_RANDOM |
| distributed | bool | False | Emulates the behavior of PyTorch’s DistributedSampler for distributed training |
| seed | int \| None | None | Random seed for batch ordering |
| indices | Sequence[int] \| None | None | Subset the dataset by returning only these indices |
| pipelines | Mapping[str, Sequence[Operation \| nn.Module]] | {} | Dictionary defining for each field the sequence of decoders and transforms to apply |
| custom_fields | Mapping[str, Field] | {} | Dictionary informing the Loader of the types associated with fields that use a custom type |
| drop_last | bool \| None | None | Drop the non-full batch in each epoch. Defaults to True if order is RANDOM or QUASI_RANDOM, False if SEQUENTIAL |
| batches_ahead | int | 2 | Number of batches prepared in advance; balances latency and memory |
| recompile | bool | False | Recompile at every epoch. Required if FFCV augmentations change during training |
| device | str \| int \| torch.device \| None | None | Device to place the batch on. Defaults to fastai’s default_device |
| async_tfms | bool | False | Asynchronously run batch_tfms before the batch is drawn |
| n_inp | int \| None | None | Number of inputs to the model. Defaults to the number of pipelines minus 1 |
| split_idx | int \| None | None | Apply batch transform(s) to the training (0) or validation (1) set. Defaults to 0 (train) if order is RANDOM or QUASI_RANDOM, 1 (valid) if SEQUENTIAL |
| do_setup | bool | True | Run setup() for batch transform(s) |
| kwargs | | | |
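For example, a `Loader` can be constructed by pointing it at a beton file and defining a pipeline per field. This is a minimal sketch using FFCV’s built-in decoders and transforms; the file name, field names, and the fastxtend import path are illustrative assumptions, not the only valid setup.

```python
from ffcv.fields.decoders import IntDecoder, CenterCropRGBImageDecoder
from ffcv.transforms import ToTensor, ToTorchImage, Squeeze
from fastxtend.ffcv.loader import Loader, OrderOption  # import path may differ by fastxtend version

# 'imagenette.beton' with 'image' and 'label' fields is a hypothetical dataset
loader = Loader(
    'imagenette.beton',
    batch_size=64,
    order=OrderOption.RANDOM,  # shuffle; caches the whole dataset in RAM
    pipelines={
        'image': [CenterCropRGBImageDecoder((224, 224), ratio=1.0),
                  ToTensor(), ToTorchImage()],
        'label': [IntDecoder(), ToTensor(), Squeeze()],
    },
)
```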
Important `Loader` arguments:

- `order`: Controls how much memory is used for dataset caching and whether the dataset is randomly shuffled. Can be one of `RANDOM`, `QUASI_RANDOM`, or `SEQUENTIAL`. See the note below for more details. Defaults to `SEQUENTIAL`, which is unrandomized.
- `os_cache`: By default, FFCV will attempt to cache the entire dataset into RAM using the operating system’s caching. This can be changed by setting `os_cache=False` or setting the environment variable `FFCV_DEFAULT_CACHE_PROCESS` to “True” or “1”. If `os_cache=False`, then `order` must be set to `QUASI_RANDOM` for the training `Loader`.
- `num_workers`: If not set, will use all CPU cores up to 16 by default.
- `batches_ahead`: Controls how many batches the `Loader` prepares in advance. Increasing it uses more RAM, both CPU and GPU. Defaults to 2.
- `n_inp`: Controls how many inputs are passed to the model. By default, set to the number of pipelines minus 1.
- `drop_last`: Whether to drop the last partial batch. By default, set to True if `order` is `RANDOM` or `QUASI_RANDOM`, False if `SEQUENTIAL`.
- `device`: The device to place the processed batches of data on. Defaults to `fastai.torch_core.default_device` if not set.
- `async_tfms`: Asynchronously apply `batch_tfms` before the batch is drawn. Can accelerate training if GPU compute isn’t fully saturated (95% or less) or if only using `IntToFloatTensor` and `Normalize`.
- `split_idx`: Tells the fastai batch transforms which dataset they are operating on. By default, set to 0 (train) if `order` is `RANDOM` or `QUASI_RANDOM`, or 1 (valid) if `SEQUENTIAL`.
- `distributed`: For distributed training on multiple GPUs. Emulates the behavior of PyTorch’s `DistributedSampler`. `QUASI_RANDOM` is unavailable with distributed training.
Each `order` option requires a differing amount of system memory:

- `RANDOM` caches the entire dataset in memory for fast random sampling. `RANDOM` uses the most memory.
- `QUASI_RANDOM` caches a subset of the dataset at a time in memory and randomly samples from the subset. Use it when the entire dataset cannot fit into memory.
- `SEQUENTIAL` requires the least memory. It loads a few samples ahead of time. As the name suggests, it is not random and is primarily for validation.
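As a sketch of how `order` interacts with these defaults (reusing a hypothetical `pipelines` dictionary like the one in the earlier example):

```python
# Training: RANDOM shuffles and caches the whole dataset; drop_last and
# split_idx then default to True and 0 (train) respectively
train_loader = Loader('train.beton', batch_size=64,
                      order=OrderOption.RANDOM, pipelines=pipelines)

# Validation: SEQUENTIAL streams in order; drop_last and split_idx
# then default to False and 1 (valid) respectively
valid_loader = Loader('valid.beton', batch_size=64,
                      order=OrderOption.SEQUENTIAL, pipelines=pipelines)

# If the dataset cannot fit in RAM, disable OS caching; the training
# Loader must then use QUASI_RANDOM
train_loader = Loader('train.beton', batch_size=64, os_cache=False,
                      order=OrderOption.QUASI_RANDOM, pipelines=pipelines)
```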
Asynchronous batch transforms can accelerate training by decreasing the batch draw time at the expense of a slightly longer batch step. If the GPU isn’t fully saturated, usually at 95 percent or less compute usage, this will be a net gain in training performance. `async_tfms=True` pairs well with `ProgressiveResize`, as the GPU is almost never saturated when training on smaller than full-sized images. When the GPU is near or fully saturated, asynchronous batch transforms usually result in a wash in training time.
Loader.one_batch
Loader.one_batch (batches_ahead:bool=False)
Return one processed batch of input(s) and target(s), optionally loading `batches_ahead`
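Assuming the hypothetical image loader from above, a quick smoke test might look like:

```python
xb, yb = loader.one_batch()
print(xb.shape, yb.shape)  # e.g. torch.Size([64, 3, 224, 224]) torch.Size([64])
```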
DataLoaderMixin.show_batch
DataLoaderMixin.show_batch (b:Optional[Tuple[torch.Tensor,...]]=None, max_n:int=9, ctxs=None, show:bool=True, unique:bool=False, **kwargs)
Show `max_n` input(s) and target(s) from the batch.
| | Type | Default | Details |
|---|---|---|---|
| b | Tuple[Tensor, …] \| None | None | Batch to show. If None, calls `one_batch` |
| max_n | int | 9 | Maximum number of items to show |
| ctxs | NoneType | None | List of ctx objects to show data. Could be a matplotlib axis, DataFrame, etc. |
| show | bool | True | If False, return decoded batch instead of showing |
| unique | bool | False | Whether to show only one item |
| kwargs | | | |
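For instance, with the hypothetical image loader above:

```python
loader.show_batch(max_n=4)               # display four decoded samples
decoded = loader.show_batch(show=False)  # or return the decoded batch instead
```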
DataLoaderMixin.show_results
DataLoaderMixin.show_results (b, out, max_n:int=9, ctxs=None, show:bool=True, **kwargs)
Show `max_n` results with input(s), target(s) and prediction(s).
| | Type | Default | Details |
|---|---|---|---|
| b | | | Batch to show results for |
| out | | | Predicted output from the model for the batch |
| max_n | int | 9 | Maximum number of items to show |
| ctxs | NoneType | None | List of ctx objects to show data. Could be a matplotlib axis, DataFrame, etc. |
| show | bool | True | If False, return decoded batch instead of showing |
| kwargs | | | |
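A usage sketch, where `model` is a hypothetical trained classifier:

```python
xb, yb = loader.one_batch()
preds = model(xb)  # run the model on one batch of inputs
loader.show_results((xb, yb), preds, max_n=4)
```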
DataLoaderMixin.to
DataLoaderMixin.to (device:Union[int,str,torch.device])
Sets `self.device = device`.
DataLoaderMixin.n_inp
DataLoaderMixin.n_inp ()
Number of elements in a batch for model input
DataLoaderMixin.decode
DataLoaderMixin.decode (b:Tuple[torch.Tensor,...])
Decode batch `b`
DataLoaderMixin.decode_batch
DataLoaderMixin.decode_batch (b:Tuple[torch.Tensor,...], max_n:int=9)
Decode up to `max_n` input(s) from batch `b`
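A short sketch of these utilities together, again with the hypothetical loader:

```python
loader.to('cuda:0')                      # place processed batches on the first GPU
b = loader.one_batch()
full = loader.decode(b)                  # decode the entire batch
items = loader.decode_batch(b, max_n=4)  # decode up to four individual samples
```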