8-Bit Optimizers

bitsandbytes 8-bit optimizers with full fastai compatibility

bitsandbytes 8-bit optimizers can reduce optimizer memory usage by up to 75% compared to 32-bit optimizers.

While it is possible to use bitsandbytes optimizers¹ with fastai via fastai.optimizer.OptimWrapper, this doesn’t provide compatibility with all fastai optimizer features. fastxtend adds full fastai compatibility to bitsandbytes 8-bit optimizers, including per-parameter weight decay, automatic weight decay exclusion for normalization and bias terms, and discriminative learning rate support.

Note: 8-bit Optimizer Usage

While 8-bit optimizer support is defined and documented here, these optimizers are integrated into, and intended to be used through, fastxtend’s fused fastai optimizers for SGD, Adam, LARS, and LAMB, and through fastxtend’s Lion optimizer, as shown below.

To use 8-bit optimizers, install bitsandbytes on a machine with a CUDA device:

pip install bitsandbytes

then import fastxtend optimizers after importing fastai:

from fastxtend.vision.all import *
# or just import fastxtend optimizers
from fastxtend.optimizer.all import *

opt_func = adam(eightbit=True)
Learner(..., opt_func=opt_func)
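
Because these optimizers are fully fastai compatible, features such as discriminative learning rates work as usual. A minimal sketch, assuming dls is an existing DataLoaders:

learn = vision_learner(dls, resnet18, opt_func=adam(eightbit=True))
# passing a slice applies discriminative learning rates across fastai's parameter groups
learn.fit_one_cycle(5, lr_max=slice(1e-5, 1e-3))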

If training NLP models, you may need to replace the PyTorch embedding layer with the bitsandbytes layer: torch.nn.Embedding(...) -> bnb.nn.Embedding(...), as sketched below.
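
For example, a minimal sketch of the swap, where TinyLM is a hypothetical model:

import torch.nn as nn
import bitsandbytes as bnb

class TinyLM(nn.Module):
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        # was: nn.Embedding(vocab_size, dim)
        self.embed = bnb.nn.Embedding(vocab_size, dim)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        return self.head(self.embed(x))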

Check out the bitsandbytes readme for more details on using 8-bit optimizers.

bitsandbytes calls torch.cuda.synchronize after each optimizer step. This prevents the next optimizer step from starting until the current step finishes, which can increase optimizer wall-clock time.

fastxtend adds sync_each_step=False as an argument to all 8-bit optimizers, disabling the per-step torch.cuda.synchronize. Set sync_each_step=True to match the upstream bitsandbytes behavior.
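
For example, to restore the per-step synchronization when using an 8-bit optimizer class directly, a sketch using functools.partial to fill in arguments before Learner passes params and lr:

from functools import partial

# match upstream bitsandbytes behavior by synchronizing after every optimizer step
opt_func = partial(AdamW8bitOptimizer, sync_each_step=True)
Learner(..., opt_func=opt_func)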

fastai and bitsandbytes Compatibility


source

EightBitFastaiAdapter

 EightBitFastaiAdapter ()

Base for adding fastai optimizer functionality to EightBit Optimizers


source

EightBitCommon

 EightBitCommon ()

Common changes to EightBit Optimizers


source

EightBit1StateOptimizer

 EightBit1StateOptimizer (optimizer_name, params, lr=0.001, mom=0.9,
                          sqr_mom=0.0, eps=1e-08, wd=0.0, optim_bits=8,
                          args=None, min_8bit_size=4096,
                          percentile_clipping=100, block_wise=True,
                          max_unorm=0.0, skip_zeros=False, is_paged=False,
                          sync_each_step=False)

Adds fastai optimizer functionality & compatibility to Optimizer1State


source

EightBit2StateOptimizer

 EightBit2StateOptimizer (optimizer_name, params, lr=0.001, mom=0.9,
                          sqr_mom=0.999, eps=1e-08, wd=0.0, optim_bits=8,
                          args=None, min_8bit_size=4096,
                          percentile_clipping=100, block_wise=True,
                          max_unorm=0.0, skip_zeros=False, is_paged=False,
                          sync_each_step=False)

Adds fastai optimizer functionality & compatibility to Optimizer2State

8-bit Optimizers


source

SGD8bitOptimizer

 SGD8bitOptimizer (params, lr, mom, wd=0, args=None, min_8bit_size=4096,
                   percentile_clipping=100, block_wise=True,
                   sync_each_step=False)

A fastai-compatible bitsandbytes 8-bit SGD optimizer
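
A minimal usage sketch: since mom has no default here, supply it via functools.partial and let Learner pass params and lr:

from functools import partial

Learner(..., opt_func=partial(SGD8bitOptimizer, mom=0.9))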


source

RMSProp8bitOptimizer

 RMSProp8bitOptimizer (params, lr=0.01, sqr_mom=0.99, eps=1e-08, wd=0,
                       args=None, min_8bit_size=4096,
                       percentile_clipping=100, block_wise=True,
                       sync_each_step=False)

A fastai-compatible bitsandbytes 8-bit RMSProp optimizer


source

AdamW8bitOptimizer

 AdamW8bitOptimizer (params, lr=0.001, mom=0.9, sqr_mom=0.99, eps=1e-08,
                     wd=0.01, args=None, min_8bit_size=4096,
                     percentile_clipping=100, block_wise=True,
                     is_paged=False, sync_each_step=False)

A fastai-compatible bitsandbytes 8-bit AdamW optimizer
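
Setting is_paged=True enables the bitsandbytes paged optimizer, which can page optimizer state out of GPU memory to help avoid out-of-memory errors during memory spikes. A sketch:

from functools import partial

Learner(..., opt_func=partial(AdamW8bitOptimizer, is_paged=True))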


source

LARS8bitOptimizer

 LARS8bitOptimizer (params, lr, mom=0, wd=0, args=None,
                    min_8bit_size=4096, percentile_clipping=100,
                    trust_coeff=0.02, sync_each_step=False)

A fastai-compatible bitsandbytes 8-bit LARS optimizer


source

LAMB8bitOptimizer

 LAMB8bitOptimizer (params, lr=0.001, mom=0.9, sqr_mom=0.999, eps=1e-08,
                    wd=0, args=None, min_8bit_size=4096,
                    percentile_clipping=100, block_wise=False,
                    sync_each_step=False)

A fastai-compatible bitsandbytes 8-bit LAMB optimizer


source

Lion8bitOptimizer

 Lion8bitOptimizer (params, lr=0.0001, beta1=0.9, beta2=0.99, wd=0,
                    args=None, min_8bit_size=4096,
                    percentile_clipping=100, block_wise=True,
                    is_paged=False, sync_each_step=False)

A fastai-compatible bitsandbytes 8-bit Lion optimizer
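
A minimal usage sketch; per the Lion paper’s guidance, Lion is typically run with a lower learning rate and higher weight decay than AdamW, and the values below are illustrative, not tuned:

from functools import partial

Learner(..., opt_func=partial(Lion8bitOptimizer, lr=1e-4, wd=0.1))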

Footnotes

  1. Or any PyTorch-compatible optimizer.