A duplicate of fast.ai's `lr_find`, except it restores the dataloader and random state by default.

class LRFinder[source]

LRFinder(start_lr=1e-07, end_lr=10, num_it=100, stop_div=True, restore_state=True) :: ParamScheduler

Training with exponentially growing learning rate
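The schedule multiplies the learning rate by a constant factor each step, interpolating exponentially from `start_lr` to `end_lr` over `num_it` iterations. A minimal sketch of the math (the function name is hypothetical, not part of the library):

```python
def exponential_lr(pct, start_lr=1e-7, end_lr=10):
    # pct runs from 0 to 1 over the mock training run;
    # at 0 this returns start_lr, at 1 it returns end_lr
    return start_lr * (end_lr / start_lr) ** pct

# With num_it=100, the lr grows by a factor of (10 / 1e-7) ** (1 / 100) per step
lrs = [exponential_lr(i / 100) for i in range(100)]
```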

LRFinder.before_fit[source]

LRFinder.before_fit()

Initialize container for hyper-parameters and save the model & optimizer, optionally saving dataloader & random state
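fastai's `LRFinder` checkpoints the model and optimizer to a temporary file; the optional random-state snapshot could be captured along these lines (a hypothetical sketch, not fastxtend's exact code):

```python
import random
import numpy as np
import torch

# Hypothetical sketch: snapshot every RNG so it can be
# restored in after_fit when restore_state=True
saved_state = {
    'python': random.getstate(),
    'numpy':  np.random.get_state(),
    'torch':  torch.get_rng_state(),
}
if torch.cuda.is_available():
    saved_state['cuda'] = torch.cuda.get_rng_state_all()
```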

LRFinder.before_batch[source]

LRFinder.before_batch()

Set the proper hyper-parameters in the optimizer
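In plain PyTorch terms this amounts to writing the scheduled value into the optimizer before each step. A sketch under that assumption (fastai's `Optimizer` stores hyper-parameters in `hypers` rather than `param_groups`; `exponential_lr` is from the sketch above):

```python
def set_lr(optimizer, train_iter, num_it=100):
    # Apply the exponential schedule at the current iteration,
    # using the plain PyTorch param_groups idiom
    for group in optimizer.param_groups:
        group['lr'] = exponential_lr(train_iter / num_it)
```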

LRFinder.after_batch[source]

LRFinder.after_batch()

Record hyper-parameters of this batch and potentially stop training
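The divergence check is roughly the following; the factor of 4 follows fastai's implementation, and the exception class here is a stand-in for fastai's control-flow exception:

```python
class CancelFitException(Exception):
    "Stand-in for fastai's exception that ends training early"

def after_batch_check(smooth_loss, best_loss, train_iter, num_it=100, stop_div=True):
    # Stop if the smoothed loss diverges (4x the best loss seen, per fastai)
    if stop_div and smooth_loss > 4 * best_loss:
        raise CancelFitException()
    # Also stop once num_it batches have been recorded
    if train_iter >= num_it:
        raise CancelFitException()
```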

LRFinder.before_validate[source]

LRFinder.before_validate()

Skip the validation part of training
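Skipping is done with a control-flow exception, in the spirit of fastai's callback cancellation system (the class here is a stand-in):

```python
class CancelValidException(Exception):
    "Stand-in for fastai's exception that skips the validation loop"

def before_validate():
    # A learning-rate search only needs training batches,
    # so the validation pass is skipped outright
    raise CancelValidException()
```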

LRFinder.after_fit[source]

LRFinder.after_fit()

Save the hyper-parameters in the recorder if there is one and load the original model & optimizer, optionally restoring dataloader & random state
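Restoration mirrors the snapshot taken in `before_fit`; continuing that hypothetical sketch:

```python
import random
import numpy as np
import torch

def restore_random_state(saved_state):
    # Put every RNG back where it was before the mock training ran
    random.setstate(saved_state['python'])
    np.random.set_state(saved_state['numpy'])
    torch.set_rng_state(saved_state['torch'])
    if 'cuda' in saved_state:
        torch.cuda.set_rng_state_all(saved_state['cuda'])
```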

lr_find

Learner.lr_find[source]

Learner.lr_find(start_lr=1e-07, end_lr=10, num_it=100, stop_div=True, show_plot=True, suggest_funcs=valley, restore_state=True)

Launch a mock training to find a good learning rate and return suggestions based on `suggest_funcs` as a named tuple. Use `restore_state` to reset dataloaders and random state after running.
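For example, assuming `learn` is an existing `Learner` (`valley` and `slide` are suggestion functions shipped with fastai):

```python
from fastai.callback.schedule import valley, slide

# Returns a named tuple with one field per suggestion function
suggestions = learn.lr_find(suggest_funcs=(valley, slide))
print(f"valley: {suggestions.valley:.2e}, slide: {suggestions.slide:.2e}")
```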

Without `restore_state`, running `lr_find` advances both the random state and the dataloaders, behaving the same way as fast.ai's `lr_find`. This means the following two code blocks:

```python
with no_random():
    dls = get_dls()
    learn = Learner(dls, xresnet18(n_out=dls.c))

with no_random():
    learn.lr_find(restore_state=False)
    learn.fit_one_cycle(2, 3e-3)
```

and

```python
with no_random():
    dls = get_dls()
    learn = Learner(dls, xresnet18(n_out=dls.c))

with no_random():
    learn.fit_one_cycle(2, 3e-3)
```

will produce different training output.

While the default of `restore_state=True` prevents this from occurring, it has a potential downside: less variance in learning rate results, since every call to `lr_find` runs over the same first `num_it` items using the same random state. Without `no_random` set, most of the remaining variation appears to come from CUDA not running in deterministic mode.
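One consequence worth noting: with `restore_state=True`, back-to-back calls should produce identical suggestions, since each call replays the same batches under the same random state. A quick illustration (assuming `learn` from above):

```python
# Each call replays the same first num_it batches with the same RNG state,
# so the suggested learning rates should match exactly
first = learn.lr_find(show_plot=False)
second = learn.lr_find(show_plot=False)
assert first.valley == second.valley
```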