A backwards compatible reimplementation of fastai metrics to increase usability and flexibility.

All fastxtend metrics are classes which inherit from fastai's Metric and run on Learner via a modified Recorder callback.

There are three main metric types: AvgMetricX, AccumMetricX, and AvgSmoothMetricX. These correspond one-to-one with fastai's AvgMetric, AccumMetric, and AvgSmoothMetric.

To jump to the fastxtend metrics reference, click here.

Using a Metric

To use the accuracy metric, or any of the fastxtend metrics detailed below, create a Learner as usual (or a task-specific learner such as vision_learner, text_classifier_learner, etc.) and add the metric(s) to the metrics argument:

from fastai.vision.all import *
from fastxtend.vision.all import *

Learner(..., metrics=Accuracy())

Fastxtend metrics can be mixed with fastai metrics:

Learner(..., metrics=[accuracy, Accuracy()])

Fastxtend metrics can be logged during training, validation, or both by setting the log_metric argument to LogMetric.Train, LogMetric.Valid, or LogMetric.Both. The sole exception is AvgSmoothMetricX which only logs during training.

To log a fastxtend metric during training pass LogMetric.Train to log_metric:

Learner(..., metrics=Accuracy(log_metric=LogMetric.Train))

Non-scikit-learn metrics can have the metric type set via the metric_type argument to one of MetricType.Avg, MetricType.Accum, or MetricType.Smooth, corresponding to AvgMetricX, AccumMetricX, and AvgSmoothMetricX, respectively.

To log a smooth metric on the training set and normal metric on the valid set:

Learner(..., 
        metrics=[Accuracy(log_metric=LogMetric.Train, metric_type=MetricType.Smooth), 
                 Accuracy()])

Fastxtend metrics also support custom names via the name argument:

Learner(..., metrics=Accuracy(name='metric_name'))

which will result in Accuracy logging under "metric_name" instead of the default "accuracy".

If a fastxtend metric is logged with multiple MetricTypes, the fastxtend Recorder will automatically deduplicate the metric names, unless the metric's name argument is set, in which case fastxtend will not deduplicate any metric names.
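
For example, setting name explicitly logs each metric under the exact name given, skipping deduplication (a sketch; the automatically deduplicated names depend on the Recorder):

Learner(..., 
        metrics=[Accuracy(log_metric=LogMetric.Train, metric_type=MetricType.Smooth, name='smooth_accuracy'), 
                 Accuracy(name='valid_accuracy')])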

Creating a Metric

AvgMetricX, AccumMetricX, and AvgSmoothMetricX all require func, a functional implementation of the metric. The signature of func should be inp,targ (where inp are the predictions of the model and targ the corresponding labels).

Fastxtend metrics can be logged during training, validation, or both by setting the log_metric argument to LogMetric.Train, LogMetric.Valid, or LogMetric.Both. The sole exception is AvgSmoothMetricX which only computes during training.

AvgMetricX, AccumMetricX, and AvgSmoothMetricX will automatically recognize any of func's unique arguments and pass them through to func.

An example of creating a fastxtend metric from a functional implementation:

def example_accuracy(inp, targ):
    return (inp == targ).float().mean()

def ExampleAccuracy(dim_argmax=-1, log_metric=LogMetric.Valid, **kwargs):
    return AvgMetricX(example_accuracy, dim_argmax=dim_argmax, log_metric=log_metric, **kwargs)

Alternatively, use the func_to_metric convenience method to create the metric:

def ExampleAccuracy(axis=-1, log_metric=LogMetric.Valid, **kwargs):
    return func_to_metric(example_accuracy, MetricType.Avg, True, axis=axis, log_metric=log_metric, **kwargs)

It is also possible to inherit directly from MetricX to create a fastxtend metric.

class ExampleAccuracy(MetricX):
    def __init__(self, dim_argmax=-1, log_metric=LogMetric.Valid, **kwargs):
        super().__init__(dim_argmax=dim_argmax, log_metric=log_metric, **kwargs)

    def reset(self): self.preds,self.targs = [],[]

    def accumulate(self, learn):
        super().accumulate(learn)
        self.preds.append(learn.to_detach(self.pred))
        self.targs.append(learn.to_detach(self.targ))

    @property
    def value(self):
        if len(self.preds) == 0: return
        preds,targs = torch.cat(self.preds),torch.cat(self.targs)
        return (preds == targs).float().mean()
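
However it is created, the resulting metric is passed to Learner like any built-in metric:

Learner(..., metrics=ExampleAccuracy())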

Additional Metrics Functionality

MetricX, and classes which inherit from MetricX such as AvgMetricX, AccumMetricX, and AvgSmoothMetricX, have optional helper functionality in MetricX.accumulate to assist in developing metrics.

For single-label classification problems, predictions need to be transformed with a softmax then an argmax before being compared to the targets. Since a softmax doesn't change the order of the numbers, it is enough to just apply the argmax. Pass along dim_argmax to have this done by MetricX (usually -1 will work pretty well). If the metric implementation requires probabilities and not predictions, use softmax=True.

For multi-label classification problems, or if targets are one-hot encoded, predictions may need to pass through a sigmoid (if it wasn't included in the model) and then be compared to a given threshold (to decide between 0 and 1). This is done by MetricX when passing sigmoid=True and/or a value for thresh.
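
A minimal sketch of both cases, reusing the example_accuracy function from above (the multi-label variant assumes an element-wise comparison makes sense for the wrapped function):

# Single-label: argmax predictions over the last dimension before comparing to targets
AvgMetricX(example_accuracy, dim_argmax=-1)

# Multi-label: apply a sigmoid to predictions and threshold them at 0.5
AvgMetricX(example_accuracy, activation=ActivationType.Sigmoid, thresh=0.5)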

AvgMetricX, AccumMetricX, and AvgSmoothMetricX have two additional arguments to assist in creating metrics: to_np and invert_arg.

For example, if using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with to_np=True. Also, scikit-learn metrics adopt the convention y_true, y_preds, which is the opposite of fastai's, so pass invert_arg=True to have AvgMetricX, AccumMetricX, and AvgSmoothMetricX do the inversion. Alternatively, use the skm_to_fastxtend convenience method to handle sklearn.metrics automatically.
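
A sketch of wrapping a scikit-learn metric by hand with these arguments:

import sklearn.metrics as skm

# Convert tensors to numpy arrays and swap the arguments into
# scikit-learn's (y_true, y_pred) order before calling the function
AccumMetricX(skm.balanced_accuracy_score, to_np=True, invert_arg=True, dim_argmax=-1)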

LogMetric[source]

Enum = [Train, Valid, Both]

An enumeration.

MetricType[source]

Enum = [Avg, Accum, Smooth]

An enumeration.

ActivationType[source]

Enum = [No, Sigmoid, Softmax, BinarySoftmax]

An enumeration.

class MetricX[source]

MetricX(dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None) :: Metric

Blueprint for defining an extended metric with accumulate

For single-label classification problems, predictions need to be transformed with a softmax then an argmax before being compared to the targets. Since a softmax doesn't change the order of the numbers, it is enough to just apply the argmax. Pass along dim_argmax to have this done by MetricX (usually -1 will work pretty well). If the metric implementation requires probabilities and not predictions, use softmax=True.

For multi-label classification problems, or if targets are one-hot encoded, predictions may need to pass through a sigmoid (if it wasn't included in the model) and then be compared to a given threshold (to decide between 0 and 1). This is done by MetricX when passing sigmoid=True and/or a value for thresh.

Metrics can be simple averages (like accuracy), but sometimes their computation is a little more complex and can't be averaged over batches (like precision or recall), which is why we need a special AccumMetricX class for them. For simple functions that can be computed as averages over batches, use the class AvgMetricX; otherwise you'll need to implement the following methods.

MetricX.reset[source]

MetricX.reset()

Reset inner state to prepare for new computation

MetricX.accumulate[source]

MetricX.accumulate(learn)

Store targs and preds from learn, using activation function and argmax as appropriate

MetricX.value[source]

The value of the metric

MetricX.name[source]

Name of the Metric, camel-cased and with Metric removed. Or custom name if provided

class AvgMetricX[source]

AvgMetricX(func, to_np=False, invert_arg=False, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None) :: MetricX

Average the values of func taking into account potential different batch sizes

func is applied to each batch of predictions/targets and then averaged when the value attribute is asked for. The signature of func should be inp,targ (where inp are the predictions of the model and targ the corresponding labels).

If using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with to_np=True. Also, scikit-learn metrics adopt the convention y_true, y_preds, which is the opposite of fastai's, so pass invert_arg=True to have AvgMetricX, AccumMetricX, and AvgSmoothMetricX do the inversion. Alternatively, use the skm_to_fastxtend convenience method to handle sklearn.metrics automatically.

By default, fastxtend's scikit-learn metrics use AccumMetricX.

class AccumMetricX[source]

AccumMetricX(func, to_np=False, invert_arg=False, flatten=True, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None) :: MetricX

Stores predictions and targets on CPU in accumulate to perform final calculations with func.

func is only applied to the accumulated predictions/targets when the value attribute is asked for (so at the end of a validation/training phase, in use with Learner and its Recorder). The signature of func should be inp,targ (where inp are the predictions of the model and targ the corresponding labels).

If using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with to_np=True. Also, scikit-learn metrics adopt the convention y_true, y_preds, which is the opposite of fastai's, so pass invert_arg=True to have AvgMetricX, AccumMetricX, and AvgSmoothMetricX do the inversion. Alternatively, use the skm_to_fastxtend convenience method to handle sklearn.metrics automatically.

By default, fastxtend's scikit-learn metrics use AccumMetricX.

class AvgSmoothMetricX[source]

AvgSmoothMetricX(func, beta=0.98, to_np=False, invert_arg=False, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, name=None) :: MetricX

Smooth average the values of func (exponentially weighted with beta). Only computed on training set.

func is applied to each batch of predictions/targets and the result is exponentially smoothed (weighted with beta) when the value attribute is asked for. The signature of func should be inp,targ (where inp are the predictions of the model and targ the corresponding labels).

If using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with to_np=True. Also, scikit-learn metrics adopt the convention y_true, y_preds, which is the opposite of fastai's, so pass invert_arg=True to have AvgMetricX, AccumMetricX, and AvgSmoothMetricX do the inversion. Alternatively, use the skm_to_fastxtend convenience method to handle sklearn.metrics automatically.

class AvgLossX[source]

AvgLossX(dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None) :: MetricX

Average the losses taking into account potential different batch sizes

class AvgSmoothLossX[source]

AvgSmoothLossX(beta=0.98) :: MetricX

Smooth average of the losses (exponentially weighted with beta)

class ValueMetricX[source]

ValueMetricX(func, name=None, log_metric=None) :: MetricX

Use to include a pre-calculated metric value (for instance calculated in a Callback) and returned by func
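
A minimal sketch of ValueMetricX, assuming func is a no-argument callable returning a value computed elsewhere (for instance, stored by a Callback); the names below are hypothetical:

# Hypothetical store for a value computed by a Callback during training
precomputed = {'value': 0.0}

def read_precomputed():
    return precomputed['value']

Learner(..., metrics=ValueMetricX(read_precomputed, name='precomputed_value'))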

Recorder

Patch Recorder to use fastxtend metrics.

Metrics

Custom Metric Creation

func_to_metric[source]

func_to_metric(func, metric_type, is_class, thresh=None, axis=-1, activation=None, log_metric=<LogMetric.Valid: 2>, dim_argmax=None, name=None)

Convert func metric to a fastai metric

This is the quickest way to use a functional metric as a fastxtend metric.

metric_type is one of MetricType.Avg, MetricType.Accum, or MetricType.Smooth which set the metric to use AvgMetricX, AccumMetricX, or AvgSmoothMetricX, respectively.

is_class indicates if you are in a classification problem or not. In this case:

  • leaving thresh to None indicates it's a single-label classification problem and predictions will pass through an argmax over axis before being compared to the targets
  • setting a value for thresh indicates it's a multi-label classification problem and predictions will pass through a sigmoid (can be deactivated with sigmoid=False) and be compared to thresh before being compared to the targets

If is_class=False, it indicates you are in a regression problem, and predictions are compared to the targets without being modified. In all cases, kwargs are extra keyword arguments passed to func.
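
For instance, a multi-label variant of the earlier example could be built by setting thresh (a sketch, mirroring the ExampleAccuracy factory above):

def ExampleAccuracyMulti(thresh=0.5, log_metric=LogMetric.Valid, **kwargs):
    return func_to_metric(example_accuracy, MetricType.Avg, True, thresh=thresh, log_metric=log_metric, **kwargs)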

skm_to_fastxtend[source]

skm_to_fastxtend(func, is_class=True, thresh=None, axis=-1, activation=None, log_metric=<LogMetric.Valid: 2>, dim_argmax=None, name=None)

Convert func from sklearn.metrics to a fastai metric

This is the quickest way to use a scikit-learn metric using fastxtend metrics. It is the same as func_to_metric except it defaults to using AccumMetricX.
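
For example, a metric equivalent to the built-in BalancedAccuracy could be sketched as:

import sklearn.metrics as skm

# Wraps the scikit-learn function in an AccumMetricX, converting and reordering arguments automatically
example_balanced_accuracy = skm_to_fastxtend(skm.balanced_accuracy_score)
Learner(..., metrics=example_balanced_accuracy)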

Single-label classification

Accuracy[source]

Accuracy(axis=-1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Compute accuracy with targ when pred is bs * n_classes

ErrorRate[source]

ErrorRate(axis=-1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Compute 1 - accuracy with targ when pred is bs * n_classes

TopKAccuracy[source]

TopKAccuracy(k=5, axis=-1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Computes the Top-k accuracy (targ is in the top k predictions of inp)

APScoreBinary[source]

APScoreBinary(axis=-1, average='macro', pos_label=1, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Average Precision for single-label binary classification problems

See the scikit-learn documentation for more details.

BalancedAccuracy[source]

BalancedAccuracy(axis=-1, sample_weight=None, adjusted=False, log_metric=<LogMetric.Valid: 2>, **kwargs)

Balanced Accuracy for single-label binary classification problems

See the scikit-learn documentation for more details.

BrierScore[source]

BrierScore(axis=-1, sample_weight=None, pos_label=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Brier score for single-label classification problems

See the scikit-learn documentation for more details.

CohenKappa[source]

CohenKappa(axis=-1, labels=None, weights=None, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Cohen kappa for single-label classification problems

See the scikit-learn documentation for more details.

F1Score[source]

F1Score(axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

F1 score for single-label classification problems

See the scikit-learn documentation for more details.
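
For a multiclass problem, change average from its binary default; the keyword is passed through to scikit-learn:

Learner(..., metrics=F1Score(average='macro'))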

FBeta[source]

FBeta(beta, axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

FBeta score with beta for single-label classification problems

See the scikit-learn documentation for more details.

HammingLoss[source]

HammingLoss(axis=-1, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Hamming loss for single-label classification problems

See the scikit-learn documentation for more details.

Jaccard[source]

Jaccard(axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Jaccard score for single-label classification problems

See the scikit-learn documentation for more details.

Precision[source]

Precision(axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Precision for single-label classification problems

See the scikit-learn documentation for more details.

Recall[source]

Recall(axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Recall for single-label classification problems

See the scikit-learn documentation for more details.

RocAuc[source]

RocAuc(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='ovr', log_metric=<LogMetric.Valid: 2>, **kwargs)

Area Under the Receiver Operating Characteristic Curve for single-label multiclass classification problems

See the scikit-learn documentation for more details.

RocAucBinary[source]

RocAucBinary(axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', log_metric=<LogMetric.Valid: 2>, **kwargs)

Area Under the Receiver Operating Characteristic Curve for single-label binary classification problems

See the scikit-learn documentation for more details.

MatthewsCorrCoef[source]

MatthewsCorrCoef(sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Matthews correlation coefficient for single-label classification problems

See the scikit-learn documentation for more details.

Multi-label classification

AccuracyMulti[source]

AccuracyMulti(thresh=0.5, sigmoid=True, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Compute accuracy when inp and targ are the same size.
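
For example, multi-label accuracy with a custom decision threshold (sigmoid=True applies a sigmoid to the predictions first):

Learner(..., metrics=AccuracyMulti(thresh=0.4))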

APScoreMulti[source]

APScoreMulti(sigmoid=True, average='macro', pos_label=1, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Average Precision for multi-label classification problems

See the scikit-learn documentation for more details.

BrierScoreMulti[source]

BrierScoreMulti(thresh=0.5, sigmoid=True, sample_weight=None, pos_label=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Brier score for multi-label classification problems

See the scikit-learn documentation for more details.

F1ScoreMulti[source]

F1ScoreMulti(thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

F1 score for multi-label classification problems

See the scikit-learn documentation for more details.

FBetaMulti[source]

FBetaMulti(beta, thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

FBeta score with beta for multi-label classification problems

See the scikit-learn documentation for more details.

HammingLossMulti[source]

HammingLossMulti(thresh=0.5, sigmoid=True, labels=None, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Hamming loss for multi-label classification problems

See the scikit-learn documentation for more details.

JaccardMulti[source]

JaccardMulti(thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Jaccard score for multi-label classification problems

See the scikit-learn documentation for more details.

MatthewsCorrCoefMulti[source]

MatthewsCorrCoefMulti(thresh=0.5, sigmoid=True, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Matthews correlation coefficient for multi-label classification problems

See the scikit-learn documentation for more details.

PrecisionMulti[source]

PrecisionMulti(thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Precision for multi-label classification problems

See the scikit-learn documentation for more details.

RecallMulti[source]

RecallMulti(thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Recall for multi-label classification problems

See the scikit-learn documentation for more details.

RocAucMulti[source]

RocAucMulti(sigmoid=True, average='macro', sample_weight=None, max_fpr=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Area Under the Receiver Operating Characteristic Curve for multi-label binary classification problems

See the scikit-learn documentation for more details.

Regression

MSE[source]

MSE(metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Mean squared error between inp and targ.

RMSE[source]

RMSE(log_metric=<LogMetric.Valid: 2>, **kwargs)

Root mean squared error between inp and targ.

MAE[source]

MAE(metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Mean absolute error between inp and targ.
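
The regression metrics are used like any other fastxtend metric, for example:

Learner(..., metrics=[MSE(), RMSE(), MAE()])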

MSLE[source]

MSLE(metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Mean squared logarithmic error between inp and targ.

ExpRMSE[source]

ExpRMSE(log_metric=<LogMetric.Valid: 2>, **kwargs)

Root mean square percentage error of the exponential of predictions and targets

ExplainedVariance[source]

ExplainedVariance(sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Explained variance between predictions and targets

See the scikit-learn documentation for more details.

R2Score[source]

R2Score(sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

R2 score between predictions and targets

See the scikit-learn documentation for more details.

PearsonCorrCoef[source]

PearsonCorrCoef(dim_argmax=None, log_metric=<LogMetric.Valid: 2>, **kwargs)

Pearson correlation coefficient for regression problem

See the scipy documentation for more details.

SpearmanCorrCoef[source]

SpearmanCorrCoef(dim_argmax=None, axis=0, nan_policy='propagate', log_metric=<LogMetric.Valid: 2>, **kwargs)

Spearman correlation coefficient for regression problem

See the scipy documentation for more details.

Segmentation

ForegroundAcc[source]

ForegroundAcc(bkg_idx=0, axis=1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)

Computes non-background accuracy for multiclass segmentation

class Dice[source]

Dice(axis=1, log_metric=<LogMetric.Valid: 2>, **kwargs) :: MetricX

Dice coefficient metric for binary target in segmentation

class DiceMulti[source]

DiceMulti(axis=1, log_metric=<LogMetric.Valid: 2>, **kwargs) :: MetricX

Averaged Dice metric (Macro F1) for multiclass target in segmentation

The DiceMulti method implements the "Averaged F1: arithmetic mean over harmonic means" described in this publication: https://arxiv.org/pdf/1911.03347.pdf
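
A sketch of using the segmentation metrics with fastai's unet_learner (dls is assumed to be a segmentation DataLoaders):

# Non-background accuracy plus multiclass Dice on a segmentation task
unet_learner(dls, resnet34, metrics=[ForegroundAcc(), DiceMulti()])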

class JaccardCoeff[source]

JaccardCoeff(axis=1, log_metric=<LogMetric.Valid: 2>, **kwargs) :: Dice

Implementation of the Jaccard coefficient that is lighter in RAM

NLP

class CorpusBLEUMetric[source]

CorpusBLEUMetric(vocab_sz=5000, axis=-1, log_metric=<LogMetric.Valid: 2>, name='CorpusBLEU', **kwargs) :: MetricX

BLEU Metric calculated over the validation corpus

The BLEU metric was introduced in this article to come up with a way to evaluate the performance of translation models. It's based on the precision of n-grams in your prediction compared to your target. See the fastai NLP course BLEU notebook for a more detailed description of BLEU.

The smoothing used in the precision calculation is the same as in SacreBLEU, which in turn is "method 3" from the Chen & Cherry, 2014 paper.
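
A sketch of using CorpusBLEUMetric, assuming vocab_sz is set to the size of the target vocabulary:

Learner(..., metrics=CorpusBLEUMetric(vocab_sz=5000))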

class Perplexity[source]

Perplexity(dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None) :: AvgLossX

Perplexity (exponential of cross-entropy loss) for Language Models

class LossMetric[source]

LossMetric(func, to_np=False, invert_arg=False, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None) :: AvgMetricX

Create a metric from loss_func.attr named nm

LossMetrics[source]

LossMetrics(attrs, nms=None)

List of LossMetric for each of attrs and nms
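
A sketch with a hypothetical composite loss that stores its component losses as attributes, which LossMetrics then logs:

class CombinedLoss(Module):
    # Hypothetical loss storing its components as attributes for LossMetrics to read
    def forward(self, pred, targ):
        self.mse = F.mse_loss(pred, targ)
        self.mae = F.l1_loss(pred, targ)
        return self.mse + self.mae

Learner(..., loss_func=CombinedLoss(), metrics=LossMetrics(['mse', 'mae']))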