Fine-tuning Weight Decay

Optimizers with fine-tuning weight decay from Katherine Crowson’s AdamWFineTune.

FineTuneOpt adds an additional optional weight decay ft_wd towards the starting value, to prevent overfitting to the new dataset during fine-tuning. This version uses fastai splitters to apply the fine-tuning weight decay only to the pre-trained model body, not the new head.

All fastai optimizers are replicated here with the suffix FT to indicate they are FineTuneOpt.

Early experimental results suggest AdamFT without weight decay might be equivalent to AdamW in vision fine-tuning performance.


source

FineTuneOpt

 FineTuneOpt (params:Tensor, cbs:list, train_bn:bool=True,
              wd_ft_head:bool=False, **defaults)

Modification of the base optimizer class for the fastai library, updating params with cbs

In combination with the fine_tune_wd callback, adds optional weight decay ft_wd towards the starting value, to prevent overfitting to the new dataset during fine-tuning.

By default, will not apply to the fine-tuning head, just the pretrained body.

From: https://gist.github.com/crowsonkb/f646976de8033b371ea17cb9b1c1561f

|  | Type | Default | Details |
|---|---|---|---|
| params | Tensor |  | Parameters and hyper parameters |
| cbs | list |  | Optimizer callbacks |
| train_bn | bool | True | Batch normalization is always trained |
| wd_ft_head | bool | False | Apply fine-tuning weight decay to model head |
| defaults |  |  |  |
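To illustrate the wd_ft_head split, here is a minimal dependency-free sketch (not the library's code; the function and dict keys are hypothetical) of applying the fine-tuning decay only to body parameter groups while skipping the head by default:

```python
# Hypothetical sketch of the wd_ft_head behavior: each parameter group stores
# its starting values ('orig') and whether it belongs to the new head.
def ft_wd_step(groups, lr, ft_wd, wd_ft_head=False):
    """Pull each parameter toward its stored starting value.

    groups: list of dicts like {'params': [...], 'orig': [...], 'is_head': bool}
    """
    for g in groups:
        if g['is_head'] and not wd_ft_head:
            continue  # by default the new head gets no fine-tuning decay
        g['params'] = [p - lr * ft_wd * (p - o)
                       for p, o in zip(g['params'], g['orig'])]
    return groups
```

With wd_ft_head=False, only the body group moves toward its starting values; the head group is left untouched.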

source

fine_tune_wd

 fine_tune_wd (p, lr, ft_wd, orig_p=None, do_wd=True, **kwargs)

Weight decay p towards the starting value orig_p
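The update can be sketched as a decoupled weight-decay step whose attractor is orig_p rather than zero. The scalar functions below are illustrative, not the library implementation:

```python
def fine_tune_wd_step(p, lr, ft_wd, orig_p):
    # Decoupled decay toward the pretrained value: p <- p - lr * ft_wd * (p - orig_p)
    return p - lr * ft_wd * (p - orig_p)

def standard_wd_step(p, lr, wd):
    # Ordinary decoupled weight decay shrinks toward zero: p <- p - lr * wd * p
    return p - lr * wd * p
```

A parameter that has not moved from its starting value (p == orig_p) receives no fine-tuning decay, whereas standard weight decay would still shrink it toward zero.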

Optimizers


source

SGDFT

 SGDFT (params, lr, mom=0.0, wd=0.0, ft_wd=0.0, decouple_wd=True,
        wd_ft_head=False)

A FineTuneOpt for SGD with lr, mom and params


source

RMSPropFT

 RMSPropFT (params, lr, sqr_mom=0.99, mom=0.0, wd=0.0, ft_wd=0.0,
            decouple_wd=True, wd_ft_head=False)

A FineTuneOpt for RMSProp with lr, sqr_mom, mom and params


source

AdamFT

 AdamFT (params, lr, mom=0.9, sqr_mom=0.99, eps=1e-05, wd=0.01, ft_wd=0.0,
         decouple_wd=True, wd_ft_head=False)

A FineTuneOpt for Adam with lr, mom, sqr_mom, eps and params


source

RAdamFT

 RAdamFT (params, lr, mom=0.9, sqr_mom=0.99, eps=1e-05, wd=0.0, ft_wd=0.0,
          beta=0.0, decouple_wd=True, wd_ft_head=False)

A FineTuneOpt for RAdam with lr, mom, sqr_mom, eps and params


source

QHAdamFT

 QHAdamFT (params, lr, mom=0.999, sqr_mom=0.999, nu_1=0.7, nu_2=1.0,
           eps=1e-08, wd=0.0, ft_wd=0.0, decouple_wd=True,
           wd_ft_head=False)

A FineTuneOpt for QHAdam with lr, mom, sqr_mom, nus, eps and params


source

LarcFT

 LarcFT (params, lr, mom=0.9, clip=True, trust_coeff=0.02, eps=1e-08,
         wd=0.0, ft_wd=0.0, decouple_wd=True, wd_ft_head=False)

A FineTuneOpt for LARC with lr, mom, eps and params


source

LambFT

 LambFT (params, lr, mom=0.9, sqr_mom=0.99, eps=1e-05, wd=0.0, ft_wd=0.0,
         decouple_wd=True, wd_ft_head=False)

A FineTuneOpt for LAMB with lr, mom, sqr_mom, eps and params


source

LookaheadFT

 LookaheadFT (opt, k=6, alpha=0.5)

Wrap a FineTuneOpt opt in a Lookahead optimizer
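The Lookahead scheme itself is simple: an inner "fast" optimizer takes k steps, then the "slow" weights are interpolated toward the fast weights by alpha and the fast weights restart from there. A toy scalar simulation (illustrative only, using plain SGD as the inner optimizer):

```python
def lookahead_run(grad_fn, lr=0.1, k=6, alpha=0.5, steps=12, w0=1.0):
    """Toy scalar Lookahead: fast weights step with SGD; every k steps the
    slow weights move toward the fast weights and the fast weights reset."""
    slow = fast = w0
    for i in range(1, steps + 1):
        fast -= lr * grad_fn(fast)          # inner optimizer step
        if i % k == 0:
            slow += alpha * (fast - slow)   # slow-weight interpolation
            fast = slow                     # fast weights restart from slow
    return slow
```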


source

rangerFT

 rangerFT (p, lr, mom=0.95, wd=0.01, ft_wd=0.0, eps=1e-06, sqr_mom=0.99,
           beta=0.0, decouple_wd=True, wd_ft_head=False)

Convenience method for LookaheadFT with RAdamFT