# FFCV Loader

fastxtend's `Loader` adds fastai features to FFCV's `Loader`, including `one_batch`, `show_batch`, `show_results`, and support for GPU batch transforms, to name a few.
## Loader

```python
Loader (fname:str|Path, batch_size:int, num_workers:int=-1,
        os_cache:bool=True, order:ORDER_TYPE=OrderOption.SEQUENTIAL,
        distributed:bool=False, seed:int|None=None,
        indices:Sequence[int]|None=None,
        pipelines:Mapping[str,Sequence[Operation|nn.Module]]={},
        custom_fields:Mapping[str,Field]={}, drop_last:bool|None=None,
        batches_ahead:int=3, recompile:bool=False,
        device:str|int|torch.device|None=None, n_inp:int|None=None,
        split_idx:int|None=None, do_setup:bool=True, **kwargs)
```

FFCV `Loader` with fastai Transformed DataLoader `TfmdDL` batch transforms.
| | Type | Default | Details |
|---|---|---|---|
| fname | str \| Path | | Path to the location of the dataset (FFCV beton format) |
| batch_size | int | | Batch size |
| num_workers | int | -1 | Number of CPU cores to use in parallel (default: all available, up to 16) |
| os_cache | bool | True | Leverage the OS for caching. Beneficial when there is enough memory to cache the dataset |
| order | ORDER_TYPE | OrderOption.SEQUENTIAL | Dataset traversal order, one of: `SEQUENTIAL`, `RANDOM`, `QUASI_RANDOM` |
| distributed | bool | False | Emulates the behavior of PyTorch's `DistributedSampler` for distributed training |
| seed | int \| None | None | Random seed for batch ordering |
| indices | Sequence[int] \| None | None | Subset dataset by returning only these indices |
| pipelines | Mapping[str, Sequence[Operation \| nn.Module]] | {} | Dictionary defining, for each field, the sequence of decoders and transforms to apply |
| custom_fields | Mapping[str, Field] | {} | Dictionary informing `Loader` of the types associated with fields that use a custom type |
| drop_last | bool \| None | None | Drop the last non-full batch in each epoch. Defaults to True if `order` is `RANDOM` or `QUASI_RANDOM`, False if `SEQUENTIAL` |
| batches_ahead | int | 3 | Number of batches prepared in advance; balances latency and memory |
| recompile | bool | False | Recompile at every epoch. Required if FFCV augmentations change during training |
| device | str \| int \| torch.device \| None | None | Device to place batch. Defaults to fastai's `default_device` |
| n_inp | int \| None | None | Number of inputs to the model. Defaults to pipelines length minus 1 |
| split_idx | int \| None | None | Apply batch transform(s) to training (0) or validation (1) set. Defaults to 0 (train) if `order` is `RANDOM` or `QUASI_RANDOM`, 1 (valid) if `SEQUENTIAL` |
| do_setup | bool | True | Run `setup()` for batch transform(s) |
| kwargs | | | |
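As an illustration, here is a minimal sketch of constructing a `Loader` from a beton file written with `'image'` and `'label'` fields. The file path, field names, and the `fastxtend.ffcv.loader` import location are assumptions for this example; the pipeline contents depend on how the dataset was written:

```python
from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
from ffcv.loader import OrderOption
from ffcv.transforms import Squeeze, ToTensor, ToTorchImage
from fastxtend.ffcv.loader import Loader  # assumed import path

# One pipeline per field: a decoder followed by FFCV operations in sequence.
# fastxtend also accepts fastai-style nn.Module batch transforms here.
loader = Loader(
    'train.beton',             # hypothetical path to an FFCV beton dataset
    batch_size=64,
    order=OrderOption.RANDOM,  # shuffled; caches the whole dataset in RAM
    pipelines={
        'image': [SimpleRGBImageDecoder(), ToTensor(), ToTorchImage()],
        'label': [IntDecoder(), ToTensor(), Squeeze()],
    },
)
```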
Important `Loader` arguments:

- `order`: Controls how much memory is used for dataset caching and whether the dataset is randomly shuffled. Can be one of `RANDOM`, `QUASI_RANDOM`, or `SEQUENTIAL`. See the note below for more details. Defaults to `SEQUENTIAL`, which is unrandomized.
- `os_cache`: By default, FFCV will attempt to cache the entire dataset into RAM using the operating system's caching. This can be changed by setting `os_cache=False` or setting the environment variable `FFCV_DEFAULT_CACHE_PROCESS` to "True" or "1". If `os_cache=False`, then `order` must be set to `QUASI_RANDOM` for the training `Loader`.
- `num_workers`: If not set, will use all CPU cores, up to 16 by default.
- `batches_ahead`: Controls the number of batches the `Loader` prepares in advance. Increasing it uses more RAM, both CPU and GPU. Defaults to 3.
- `n_inp`: Controls how many inputs to pass to the model. Defaults to the number of pipelines minus 1.
- `drop_last`: Whether to drop the last partial batch. Defaults to True if `order` is `RANDOM` or `QUASI_RANDOM`, False if `SEQUENTIAL`.
- `device`: The device to place the processed batches of data on. Defaults to `fastai.torch_core.default_device` if not set.
- `split_idx`: Tells the fastai batch transforms which dataset they are operating on. Defaults to 0 (train) if `order` is `RANDOM` or `QUASI_RANDOM`, or 1 (valid) if `SEQUENTIAL`.
- `distributed`: For distributed training on multiple GPUs. Emulates the behavior of PyTorch's `DistributedSampler`. `QUASI_RANDOM` is unavailable with distributed training.
Each `order` option requires a different amount of system memory:

- `RANDOM` caches the entire dataset in memory for fast random sampling, and uses the most memory.
- `QUASI_RANDOM` caches a subset of the dataset at a time in memory and randomly samples from that subset. Use it when the entire dataset cannot fit into memory.
- `SEQUENTIAL` requires the least memory. It loads a few samples ahead of time. As the name suggests, it is not random, and is primarily for validation.
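A sketch of how these options typically pair up across a train/valid split, reusing the hypothetical `pipelines` dictionary from the earlier example. `QUASI_RANDOM` here assumes the training set is too large to cache in RAM:

```python
from ffcv.loader import OrderOption

# Training: random traversal with subset caching, since os_cache=False
# requires QUASI_RANDOM. drop_last and split_idx take their training
# defaults (True and 0) from the non-sequential order.
train_loader = Loader('train.beton', batch_size=64, os_cache=False,
                      order=OrderOption.QUASI_RANDOM, pipelines=pipelines)

# Validation: unshuffled, lowest memory use. drop_last defaults to False
# and split_idx to 1 (valid) with a SEQUENTIAL order.
valid_loader = Loader('valid.beton', batch_size=64,
                      order=OrderOption.SEQUENTIAL, pipelines=pipelines)
```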
### Loader.one_batch

```python
Loader.one_batch ()
```

Return one processed batch of input(s) and target(s).
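For example, to pull a single batch for inspection, continuing the hypothetical image/label `loader` above:

```python
# one_batch returns a fully processed (decoded and transformed) batch
xb, yb = loader.one_batch()
print(xb.shape, yb.shape)
```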
### Loader.show_batch

```python
Loader.show_batch (b=None, max_n:int=9, ctxs=None, show:bool=True,
                   unique:bool=False, **kwargs)
```

Show `max_n` input(s) and target(s) from the batch.

| | Type | Default | Details |
|---|---|---|---|
| b | NoneType | None | Batch to show |
| max_n | int | 9 | Maximum number of items to show |
| ctxs | NoneType | None | List of `ctx` objects to show data. Could be a matplotlib axis, DataFrame, etc. |
| show | bool | True | Whether to display data |
| unique | bool | False | Whether to show only one item |
| kwargs | | | |
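A typical call, assuming the loader's pipelines produce images fastai knows how to display:

```python
# Decode and plot up to four samples from one batch
loader.show_batch(max_n=4)
```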
### Loader.show_results

```python
Loader.show_results (b, out, max_n:int=9, ctxs=None, show:bool=True,
                     **kwargs)
```

Show `max_n` results with input(s), target(s) and prediction(s).

| | Type | Default | Details |
|---|---|---|---|
| b | | | Batch to show results for |
| out | | | Predicted output from model for the batch |
| max_n | int | 9 | Maximum number of items to show |
| ctxs | NoneType | None | List of `ctx` objects to show data. Could be a matplotlib axis, DataFrame, etc. |
| show | bool | True | Whether to display data |
| kwargs | | | |
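A sketch of pairing `show_results` with a model's predictions. Here `model` is a hypothetical callable, and `n_inp` is assumed to be accessible as a property, as in fastai's `TfmdDL`:

```python
# Run the model on one batch, then plot targets next to predictions
b = loader.one_batch()
preds = model(*b[:loader.n_inp])  # first n_inp elements are model inputs
loader.show_results(b, preds, max_n=4)
```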
### Loader.to

```python
Loader.to (device:Union[int,str,torch.device])
```

Sets `self.device=device`.
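For example, with an illustrative device string:

```python
loader.to('cuda:0')  # subsequent batches are placed on the first CUDA device
```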
### Loader.n_inp

```python
Loader.n_inp ()
```

Number of elements in a batch for model input.
### Loader.decode

```python
Loader.decode (b)
```

Decode batch `b`.

### Loader.decode_batch

```python
Loader.decode_batch (b, max_n:int=9)
```

Decode up to `max_n` input(s) from batch `b`.
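For example, to recover human-readable items from a processed batch (a sketch, continuing the hypothetical `loader` above):

```python
# Reverse the batch transforms to get displayable inputs and targets
b = loader.one_batch()
decoded = loader.decode_batch(b, max_n=4)
```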