Learning Rate Finder
When finished running, fastai's learning rate finder restores only the model weights and optimizer to their initial state.
By default, fastxtend's learning rate finder additionally restores the dataloader and random state to their initial state, so running `Learner.lr_find` has no effect on model training.
LRFinder
LRFinder (start_lr=1e-07, end_lr=10, num_it=100, stop_div=True, restore_state=True)
Training with exponentially growing learning rate
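The docstring above only says the learning rate grows exponentially; the conventional learning rate finder interpolates between `start_lr` and `end_lr` in equal multiplicative steps over `num_it` iterations. A minimal sketch of that schedule, assuming this endpoint handling (the helper `exp_lr` is illustrative, not the library's internal scheduler):

```python
def exp_lr(start_lr, end_lr, num_it):
    """Yield num_it learning rates growing exponentially from start_lr to end_lr."""
    # At step i, lr = start_lr * (end_lr/start_lr) ** (i / (num_it - 1)),
    # i.e. equal multiplicative steps between the two endpoints.
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_it - 1)) for i in range(num_it)]

# With the default arguments, the schedule sweeps 1e-7 up to 10 in 100 steps.
lrs = exp_lr(1e-7, 10, 100)
```

The exact step placement in the library may differ slightly (for example, whether the final iteration lands exactly on `end_lr`), but the multiplicative growth is the defining property.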
LRFinder.before_fit
LRFinder.before_fit ()
Initialize container for hyper-parameters and save the model & optimizer, optionally saving dataloader & random state
LRFinder.before_batch
LRFinder.before_batch ()
Set the proper hyper-parameters in the optimizer
LRFinder.after_batch
LRFinder.after_batch ()
Record hyper-parameters of this batch and potentially stop training
LRFinder.before_validate
LRFinder.before_validate ()
Skip the validation part of training
LRFinder.after_fit
LRFinder.after_fit ()
Save the hyper-parameters in the recorder if there is one and load the original model & optimizer, optionally restoring dataloader & random state
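The save/restore pattern that `before_fit` and `after_fit` implement can be illustrated with a small standalone sketch. The `StateSnapshot` class below is hypothetical, using Python's `random` module as a stand-in for the framework's RNG and a plain dict as a stand-in for model weights:

```python
import copy
import random

class StateSnapshot:
    """Illustrative sketch of the LRFinder save/restore pattern (not library code)."""
    def __init__(self, model_weights):
        # Analogue of before_fit: snapshot the weights and the current random state.
        self.weights = copy.deepcopy(model_weights)
        self.rng_state = random.getstate()

    def restore(self):
        # Analogue of after_fit: rewind the random state and return the original
        # weights, so the mock training run leaves no trace.
        random.setstate(self.rng_state)
        return copy.deepcopy(self.weights)

weights = {"layer1": [0.1, 0.2]}
snap = StateSnapshot(weights)

# A mock "lr_find" run mutates the weights and advances the RNG...
first_draw = random.random()
weights["layer1"][0] += first_draw

# ...and restoring rewinds both: identical weights, identical next draw.
weights = snap.restore()
next_draw = random.random()
```

In the real callback the snapshot covers the model, optimizer, and (with `restore_state=True`) the dataloader and random state, but the shape of the mechanism is the same.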
Learner.lr_find
Learner.lr_find (start_lr=1e-07, end_lr=10, num_it=100, stop_div=True, show_plot=True, suggest_funcs=<function valley>, restore_state=True)
Launch a mock training to find a good learning rate and return suggestions based on `suggest_funcs` as a named tuple.

Use `restore_state` to reset the dataloaders and random state after running. Without `restore_state`, running `lr_find` advances both the random state and the DataLoaders, behaving the same way as fastai's `lr_find`. This means the following two code blocks will produce different training output.
```python
with no_random():
    dls = get_dls()
    learn = Learner(dls, xresnet18(n_out=dls.c))

with no_random():
    learn.lr_find(restore_state=False)
    learn.fit_one_cycle(2, 3e-3)
```
```python
with no_random():
    dls = get_dls()
    learn = Learner(dls, xresnet18(n_out=dls.c))

with no_random():
    learn.fit_one_cycle(2, 3e-3)
```
While the default of `restore_state=True` prevents this from occurring, it has the potential downside of showing less variance in learning rate results, since every call to `lr_find` will run over the same first `num_it` items using the same random state.
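The reduced-variance caveat follows directly from determinism: with the random state restored, every call draws the same items in the same order. A toy demonstration of this, with Python's `random` module standing in for the training RNG and `mock_lr_find` as a hypothetical stand-in for the real method:

```python
import random

def mock_lr_find(items, num_it, rng_state=None, restore_state=True):
    """Return the first num_it 'batches' drawn under the given RNG state."""
    if restore_state and rng_state is not None:
        random.setstate(rng_state)  # same state -> same draw order every call
    return [random.choice(items) for _ in range(num_it)]

items = list(range(100))
state = random.getstate()

# Restoring the state makes repeated calls see identical items...
run1 = mock_lr_find(items, 10, state)
run2 = mock_lr_find(items, 10, state)

# ...while skipping the restore lets the RNG advance between calls.
run3 = mock_lr_find(items, 10, state, restore_state=False)
```

If that repeatability is undesirable, passing `restore_state=False` recovers fastai's behavior, at the cost of `lr_find` advancing the random state and DataLoaders as described above.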