# Metrics Extended

fastxtend’s Metrics Extended is an enhancement of fastai metrics and is backward compatible with them. You can mix and match fastxtend and fastai metrics in the same `Learner`.

fastxtend metrics add the following features to fastai metrics:

- fastxtend metrics can independently log on train, valid, or both train and valid
- All fastxtend metrics can use the activation support of fastai’s `AccumMetric`, inherited from `MetricX`
- fastxtend metrics add `AvgSmoothMetric`, a metric version of `AvgSmoothLoss`

There are three main metric types: `AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX`. These correspond one-to-one with fastai’s `AvgMetric`, `AccumMetric`, and `AvgSmoothLoss`. fastxtend metrics inherit from fastai’s `Metric` and run on `Learner` via a modified `Recorder` callback.

For the fastxtend metrics reference, see the Metrics section below.

## Using a Metric

To use the accuracy metric, or any of the fastxtend metrics detailed below, create a `Learner` as normal (or a task-specific learner such as `vision_learner`, `text_classifier_learner`, etc.) and pass the metric(s) to the `metrics` argument:

```
from fastai.vision.all import *
from fastxtend.vision.all import *
Learner(..., metrics=Accuracy())
```

Fastxtend metrics can be mixed with fastai metrics:

`Learner(..., metrics=[accuracy, Accuracy()])`

Fastxtend metrics can be logged during training, validation, or both by setting the `log_metric` argument to `LogMetric.Train`, `LogMetric.Valid`, or `LogMetric.Both`. The sole exception is `AvgSmoothMetricX`, which only logs during training.

To log a fastxtend metric during training, pass `LogMetric.Train` to `log_metric`:

`Learner(..., metrics=Accuracy(log_metric=LogMetric.Train))`

Non-scikit-learn metrics can have their metric type set via the `metric_type` argument to one of `MetricType.Avg`, `MetricType.Accum`, or `MetricType.Smooth`, corresponding to `AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX`, respectively.

To log a smooth metric on the training set and normal metric on the valid set:

```
Learner(..., metrics=[Accuracy(log_metric=LogMetric.Train, metric_type=MetricType.Smooth),
                      Accuracy()])
```

Fastxtend metrics also support custom names via the `name` argument:

`Learner(..., metrics=Accuracy(name='metric_name'))`

This logs `Accuracy` under “metric_name” instead of the default “accuracy”.

If a fastxtend metric is logged with multiple `MetricType`s, the fastxtend `Recorder` will automatically deduplicate the metric names, unless the metric’s `name` argument is set, in which case fastxtend will not deduplicate any metric names.

## Creating a Metric

`AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX` all require `func`, which is a functional implementation of the metric. The signature of `func` should be `inp,targ` (where `inp` are the predictions of the model and `targ` the corresponding labels).

Fastxtend metrics can be logged during training, validation, or both by setting the `log_metric` argument to `LogMetric.Train`, `LogMetric.Valid`, or `LogMetric.Both`. The sole exception is `AvgSmoothMetricX`, which only computes during training.

`AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX` automatically recognize any arguments unique to `func` and pass them through to `func`.

An example of creating a fastxtend metric from a functional implementation:

```
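# Functional metric: fraction of predictions that match the targets
def example_accuracy(inp, targ):
    return (inp == targ).float().mean()

# Wrap it in AvgMetricX; dim_argmax turns model outputs into class predictions
def ExampleAccuracy(dim_argmax=-1, log_metric=LogMetric.Valid, **kwargs):
    return AvgMetricX(example_accuracy, dim_argmax=dim_argmax, log_metric=log_metric, **kwargs)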
```

Alternatively, use the `func_to_metric` convenience method to create the metric:

```
def ExampleAccuracy(axis=-1, log_metric=LogMetric.Valid, **kwargs):
    return func_to_metric(example_accuracy, MetricType.Avg, True, axis=axis, log_metric=log_metric, **kwargs)
```

It is also possible to inherit directly from `MetricX` to create a fastxtend metric.

```
class ExampleAccuracy(MetricX):
    def __init__(self, dim_argmax=-1, log_metric=LogMetric.Valid, **kwargs):
        super().__init__(dim_argmax=dim_argmax, log_metric=log_metric, **kwargs)

    def reset(self): self.preds,self.targs = [],[]

    def accumulate(self, learn):
        # MetricX.accumulate stores self.pred and self.targ, applying activation and argmax as appropriate
        super().accumulate(learn)
        self.preds.append(learn.to_detach(self.pred))
        self.targs.append(learn.to_detach(self.targ))

    @property
    def value(self):
        if len(self.preds) == 0: return
        preds,targs = torch.cat(self.preds),torch.cat(self.targs)
        return (preds == targs).float().mean()
```

## Additional Metrics Functionality

`MetricX`, and classes which inherit from `MetricX` such as `AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX`, have optional helper functionality in `MetricX.accumulate` to assist in developing metrics.

For single-label classification problems, predictions need to be transformed with a softmax and then an argmax before being compared to the targets. Since a softmax doesn’t change the order of the numbers, applying the argmax alone is enough. Pass `dim_argmax` to have `MetricX` do this (usually -1 works well). If the metric implementation requires probabilities rather than predictions, use `softmax=True`.
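The claim that softmax leaves the argmax unchanged can be checked with a small standalone sketch. It uses only the Python standard library rather than fastxtend or PyTorch, and the helper names `softmax` and `argmax` are illustrative, not part of any library API.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

logits = [1.2, -0.3, 3.1, 0.5]   # raw model outputs for one sample
probs = softmax(logits)          # probabilities that sum to 1

# Softmax is monotonic, so both routes pick the same class.
pred_from_logits = argmax(logits)
pred_from_probs = argmax(probs)
```

Because both predictions agree, a metric that only needs class predictions can skip the softmax, while a metric that needs probabilities cannot.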

For classification problems with multiple labels, or if targets are one-hot encoded, predictions may need to pass through a sigmoid (if it wasn’t included in the model) and then be compared to a given threshold (to decide between 0 and 1). `MetricX` does this when passed `sigmoid=True` and/or a value for `thresh`.
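As a rough illustration of the sigmoid-then-threshold step, here is a standard-library sketch for a single sample with four labels. The helper `binarize` is hypothetical, and using a strict `>` comparison against the threshold is an assumption made for this example, not a statement about how `MetricX` compares values.

```python
import math

def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binarize(logits, thresh=0.5):
    """Hypothetical helper: sigmoid each logit, then threshold to 0 or 1."""
    return [1 if sigmoid(x) > thresh else 0 for x in logits]

logits = [2.0, -1.0, 0.3, -3.5]   # one logit per label
targets = [1, 0, 0, 0]            # multi-label (one-hot style) targets

preds = binarize(logits, thresh=0.5)
accuracy = sum(p == t for p, t in zip(preds, targets)) / len(targets)
```

Raising `thresh` makes the positive decision stricter; with `thresh=0.6` the third label (probability about 0.57) would flip to 0.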

`AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX` have two additional arguments to assist in creating metrics: `to_np` and `invert_arg`.

For example, if using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with `to_np=True`. Also, scikit-learn metrics adopt the convention `y_true`, `y_preds`, which is the opposite of fastai’s, so pass `invert_arg=True` to make `AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX` do the inversion. Alternatively, use the `skm_to_fastxtend` convenience method to handle sklearn.metrics automatically.
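To make the argument-order issue concrete, the sketch below wraps a function written in the scikit-learn order so it can be called in the fastai order. Both `sk_style_precision` and `invert_args` are hypothetical stand-ins written for this example; they are not the fastxtend implementation of `invert_arg`.

```python
def sk_style_precision(y_true, y_pred):
    """Hypothetical metric using the scikit-learn (y_true, y_pred) order."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pred_pos = sum(1 for p in y_pred if p == 1)
    return true_pos / pred_pos if pred_pos else 0.0

def invert_args(func):
    """Wrap `func` so it accepts the fastai (inp, targ) order instead."""
    def wrapped(inp, targ, **kwargs):
        return func(targ, inp, **kwargs)
    return wrapped

preds = [1, 1, 0, 1]
targs = [1, 0, 0, 0]

fastai_style_precision = invert_args(sk_style_precision)
correct = fastai_style_precision(preds, targs)   # arguments swapped for us
swapped = sk_style_precision(preds, targs)       # wrong order, wrong answer
```

Precision is not symmetric in its arguments, so passing predictions and targets in the wrong order silently changes the result rather than raising an error.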

### LogMetric

`LogMetric (value, names=None, module=None, qualname=None, type=None, start=1)`

All logging types for `MetricX`

### MetricType

`MetricType (value, names=None, module=None, qualname=None, type=None, start=1)`

All types of `MetricX`

### ActivationType

`ActivationType (value, names=None, module=None, qualname=None, type=None, start=1)`

All activation classes for `MetricX`

### MetricX

`MetricX (dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None)`

Blueprint for defining an extended metric with accumulate

For single-label classification problems, predictions need to be transformed with a softmax and then an argmax before being compared to the targets. Since a softmax doesn’t change the order of the numbers, applying the argmax alone is enough. Pass `dim_argmax` to have `MetricX` do this (usually -1 works well). If the metric implementation requires probabilities rather than predictions, use `softmax=True`.

For classification problems with multiple labels, or if targets are one-hot encoded, predictions may need to pass through a sigmoid (if it wasn’t included in the model) and then be compared to a given threshold (to decide between 0 and 1). `MetricX` does this when passed `sigmoid=True` and/or a value for `thresh`.

Metrics can be simple averages (like accuracy), but sometimes their computation is more complex and can’t be averaged over batches (like precision or recall), which is why there is a special `AccumMetricX` class for them. For simple functions that can be computed as averages over batches, use the class `AvgMetricX`; otherwise you’ll need to implement the following methods.

### MetricX.reset

`MetricX.reset ()`

Reset inner state to prepare for new computation

### MetricX.accumulate

`MetricX.accumulate (learn)`

Store targs and preds from `learn`, using activation function and argmax as appropriate

### MetricX.value

`MetricX.value ()`

The value of the metric

### MetricX.name

`MetricX.name ()`

Name of the `Metric`, camel-cased and with Metric removed, or the custom name if provided

### AvgMetricX

`AvgMetricX (func, to_np=False, invert_arg=False, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None)`

Average the values of `func`, taking into account potential different batch sizes

`func` is applied to each batch of predictions/targets and then averaged when the `value` attribute is asked for. The signature of `func` should be `inp,targ` (where `inp` are the predictions of the model and `targ` the corresponding labels).
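The effect of taking batch sizes into account can be seen in a small standard-library sketch. `BatchWeightedAverage` is a hypothetical helper written for this example; it follows the reset/accumulate/value pattern described on this page but is not the `AvgMetricX` source.

```python
class BatchWeightedAverage:
    """Hypothetical helper: average per-batch values weighted by batch size."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.total, self.count = 0.0, 0

    def accumulate(self, batch_value, batch_size):
        # Weight each batch's value by the number of samples it covers.
        self.total += batch_value * batch_size
        self.count += batch_size

    @property
    def value(self):
        return self.total / self.count if self.count else None

# (per-batch accuracy, batch size); the last batch is smaller than the first.
batch_results = [(0.75, 8), (0.5, 2)]

avg = BatchWeightedAverage()
for batch_value, batch_size in batch_results:
    avg.accumulate(batch_value, batch_size)

weighted = avg.value
naive = sum(v for v, _ in batch_results) / len(batch_results)
```

The weighted result (0.7) matches what accuracy over all ten samples would give, while the naive mean of batch values (0.625) over-weights the small final batch.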

If using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with `to_np=True`. Also, scikit-learn metrics adopt the convention `y_true`, `y_preds`, which is the opposite of fastai’s, so pass `invert_arg=True` to make `AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX` do the inversion. Alternatively, use the `skm_to_fastxtend` convenience method to handle sklearn.metrics automatically.

By default, fastxtend’s scikit-learn metrics use `AccumMetricX`.

### AccumMetricX

`AccumMetricX (func, to_np=False, invert_arg=False, flatten=True, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None)`

Stores predictions and targets on CPU in accumulate to perform final calculations with `func`.

`func` is only applied to the accumulated predictions/targets when the `value` attribute is asked for (so at the end of a validation/training phase, in use with `Learner` and its `Recorder`). The signature of `func` should be `inp,targ` (where `inp` are the predictions of the model and `targ` the corresponding labels).
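A short standard-library sketch shows why a metric such as precision has to be computed on accumulated predictions rather than averaged over batches. The `precision` function and the two toy batches are illustrative assumptions, not fastxtend code.

```python
def precision(preds, targs):
    """True positives divided by predicted positives (0.0 if none predicted)."""
    true_pos = sum(1 for p, t in zip(preds, targs) if p == 1 and t == 1)
    pred_pos = sum(1 for p in preds if p == 1)
    return true_pos / pred_pos if pred_pos else 0.0

# Two batches of (predictions, targets) with equal batch sizes.
batches = [
    ([1, 1, 1, 1], [1, 1, 1, 0]),   # precision 3/4
    ([1, 0, 0, 0], [0, 0, 0, 0]),   # precision 0/1
]

# Averaging the per-batch values.
per_batch_mean = sum(precision(p, t) for p, t in batches) / len(batches)

# Accumulating everything first, then applying the function once.
all_preds = [x for p, _ in batches for x in p]
all_targs = [x for _, t in batches for x in t]
accumulated = precision(all_preds, all_targs)
```

Even with equal batch sizes the two numbers disagree (0.375 versus 0.6), because each batch has a different number of predicted positives in its denominator.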

If using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with `to_np=True`. Also, scikit-learn metrics adopt the convention `y_true`, `y_preds`, which is the opposite of fastai’s, so pass `invert_arg=True` to make `AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX` do the inversion. Alternatively, use the `skm_to_fastxtend` convenience method to handle sklearn.metrics automatically.

By default, fastxtend’s scikit-learn metrics use `AccumMetricX`.

### AvgSmoothMetricX

`AvgSmoothMetricX (func, beta=0.98, to_np=False, invert_arg=False, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, name=None)`

Smooth average the values of `func` (exponentially weighted with `beta`). Only computed on training set.
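For intuition about what exponential weighting with `beta` means, here is a standard-library sketch of a bias-corrected exponential moving average. The class `SmoothAverage` is hypothetical, and the bias-correction step is an assumption modelled on the common debiased moving-average formulation; the fastxtend source is the authority on the exact computation.

```python
class SmoothAverage:
    """Hypothetical bias-corrected exponential moving average of batch values."""
    def __init__(self, beta=0.98):
        self.beta = beta
        self.reset()

    def reset(self):
        self.count, self.val = 0, 0.0

    def accumulate(self, batch_value):
        self.count += 1
        # Keep `beta` of the running value and mix in (1 - beta) of the new one.
        self.val = self.beta * self.val + (1 - self.beta) * batch_value

    @property
    def value(self):
        # Divide out the bias caused by starting the running value at zero.
        return self.val / (1 - self.beta ** self.count)

smooth = SmoothAverage(beta=0.98)
smooth.accumulate(0.4)
after_first = smooth.value        # equals the first batch value

for batch_value in [0.5, 0.6, 0.7]:
    smooth.accumulate(batch_value)
after_four = smooth.value         # lies between the smallest and largest inputs
```

A larger `beta` gives a smoother curve that reacts more slowly to new batches; the bias correction keeps early values from being pulled toward zero.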

`func` is only applied to the accumulated predictions/targets when the `value` attribute is asked for (so at the end of a validation/training phase, in use with `Learner` and its `Recorder`). The signature of `func` should be `inp,targ` (where `inp` are the predictions of the model and `targ` the corresponding labels).

If using a functional metric from sklearn.metrics, predictions and labels will need to be converted to numpy arrays with `to_np=True`. Also, scikit-learn metrics adopt the convention `y_true`, `y_preds`, which is the opposite of fastai’s, so pass `invert_arg=True` to make `AvgMetricX`, `AccumMetricX`, and `AvgSmoothMetricX` do the inversion. Alternatively, use the `skm_to_fastxtend` convenience method to handle sklearn.metrics automatically.

### AvgLossX

`AvgLossX (dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None)`

Average the losses taking into account potential different batch sizes

### AvgSmoothLossX

`AvgSmoothLossX (beta=0.98)`

Smooth average of the losses (exponentially weighted with `beta`)

### ValueMetricX

`ValueMetricX (func, name=None, log_metric=None)`

Use to include a pre-calculated metric value (for instance calculated in a `Callback`) and returned by `func`

## Metrics

### Custom Metric Creation

fastxtend provides two convenience methods for creating custom metrics from functions: `func_to_metric` and `skm_to_fastxtend`.

### func_to_metric

`func_to_metric (func, metric_type, is_class, thresh=None, axis=-1, activation=None, log_metric=<LogMetric.Valid: 2>, dim_argmax=None, name=None)`

Convert `func` metric to a fastai metric

This is the quickest way to use a functional metric as a fastxtend metric.

`metric_type` is one of `MetricType.Avg`, `MetricType.Accum`, or `MetricType.Smooth`, which set the metric to use `AvgMetricX`, `AccumMetricX`, or `AvgSmoothMetricX`, respectively.

`is_class` indicates whether this is a classification problem. If it is:

- leaving `thresh` as `None` indicates a single-label classification problem, and predictions will pass through an argmax over `axis` before being compared to the targets
- setting a value for `thresh` indicates a multi-label classification problem, and predictions will pass through a sigmoid (can be deactivated with `sigmoid=False`) and be compared to `thresh` before being compared to the targets

If `is_class=False`, it indicates a regression problem, and predictions are compared to the targets without being modified. In all cases, `kwargs` are extra keyword arguments passed to `func`.

### skm_to_fastxtend

`skm_to_fastxtend (func, is_class=True, thresh=None, axis=-1, activation=None, log_metric=<LogMetric.Valid: 2>, dim_argmax=None, name=None)`

Convert `func` from sklearn.metrics to a fastai metric

This is the quickest way to use a scikit-learn metric with fastxtend metrics. It is the same as `func_to_metric` except it defaults to using `AccumMetricX`.

## Single-label classification

### Accuracy

`Accuracy (axis=-1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Compute accuracy with `targ` when `pred` is bs * n_classes

### ErrorRate

`ErrorRate (axis=-1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Compute 1 - accuracy with `targ` when `pred` is bs * n_classes

### TopKAccuracy

`TopKAccuracy (k=5, axis=-1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Computes the Top-k accuracy (`targ` is in the top `k` predictions of `inp`)
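Top-k accuracy as described above can be sketched in plain Python. The function `top_k_accuracy` and the toy scores are illustrative only; they are not the `TopKAccuracy` implementation, which operates on tensors.

```python
def top_k_accuracy(scores, targs, k=5):
    """Fraction of samples whose target is among the k highest-scoring classes."""
    hits = 0
    for row, targ in zip(scores, targs):
        # Class indices sorted by score, highest first, truncated to k entries.
        top_k = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        hits += targ in top_k
    return hits / len(targs)

# Three samples, three classes each.
scores = [
    [0.1, 0.6, 0.3],
    [0.5, 0.2, 0.3],
    [0.2, 0.3, 0.5],
]
targs = [2, 1, 2]

top1 = top_k_accuracy(scores, targs, k=1)
top2 = top_k_accuracy(scores, targs, k=2)
```

With `k=1` this reduces to ordinary accuracy, and the score can only stay the same or increase as `k` grows.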

### APScoreBinary

`APScoreBinary (axis=-1, average='macro', pos_label=1, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Average Precision for single-label binary classification problems

See the scikit-learn documentation for more details.

### BalancedAccuracy

`BalancedAccuracy (axis=-1, sample_weight=None, adjusted=False, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Balanced Accuracy for single-label binary classification problems

See the scikit-learn documentation for more details.

### BrierScore

`BrierScore (axis=-1, sample_weight=None, pos_label=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Brier score for single-label classification problems

See the scikit-learn documentation for more details.

### CohenKappa

`CohenKappa (axis=-1, labels=None, weights=None, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Cohen kappa for single-label classification problems

See the scikit-learn documentation for more details.

### F1Score

`F1Score (axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

F1 score for single-label classification problems

See the scikit-learn documentation for more details.

### FBeta

`FBeta (beta, axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

FBeta score with `beta` for single-label classification problems

See the scikit-learn documentation for more details.

### HammingLoss

`HammingLoss (axis=-1, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Hamming loss for single-label classification problems

See the scikit-learn documentation for more details.

### Jaccard

`Jaccard (axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Jaccard score for single-label classification problems

See the scikit-learn documentation for more details.

### Precision

`Precision (axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Precision for single-label classification problems

See the scikit-learn documentation for more details.

### Recall

`Recall (axis=-1, labels=None, pos_label=1, average='binary', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Recall for single-label classification problems

See the scikit-learn documentation for more details.

### RocAuc

`RocAuc (axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='ovr', log_metric=<LogMetric.Valid: 2>, **kwargs)`

Area Under the Receiver Operating Characteristic Curve for single-label multiclass classification problems

See the scikit-learn documentation for more details.

### RocAucBinary

`RocAucBinary (axis=-1, average='macro', sample_weight=None, max_fpr=None, multi_class='raise', log_metric=<LogMetric.Valid: 2>, **kwargs)`

Area Under the Receiver Operating Characteristic Curve for single-label binary classification problems

See the scikit-learn documentation for more details.

### MatthewsCorrCoef

`MatthewsCorrCoef (sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Matthews correlation coefficient for single-label classification problems

See the scikit-learn documentation for more details.

## Multi-label classification

### AccuracyMulti

`AccuracyMulti (thresh=0.5, sigmoid=True, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Compute accuracy when `inp` and `targ` are the same size.

### APScoreMulti

`APScoreMulti (sigmoid=True, average='macro', pos_label=1, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Average Precision for multi-label classification problems

See the scikit-learn documentation for more details.

### BrierScoreMulti

`BrierScoreMulti (thresh=0.5, sigmoid=True, sample_weight=None, pos_label=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Brier score for multi-label classification problems

See the scikit-learn documentation for more details.

### F1ScoreMulti

`F1ScoreMulti (thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

F1 score for multi-label classification problems

See the scikit-learn documentation for more details.

### FBetaMulti

`FBetaMulti (beta, thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

FBeta score with `beta` for multi-label classification problems

See the scikit-learn documentation for more details.

### HammingLossMulti

`HammingLossMulti (thresh=0.5, sigmoid=True, labels=None, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Hamming loss for multi-label classification problems

See the scikit-learn documentation for more details.

### JaccardMulti

`JaccardMulti (thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Jaccard score for multi-label classification problems

See the scikit-learn documentation for more details.

### MatthewsCorrCoefMulti

`MatthewsCorrCoefMulti (thresh=0.5, sigmoid=True, sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Matthews correlation coefficient for multi-label classification problems

See the scikit-learn documentation for more details.

### PrecisionMulti

`PrecisionMulti (thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Precision for multi-label classification problems

See the scikit-learn documentation for more details.

### RecallMulti

`RecallMulti (thresh=0.5, sigmoid=True, labels=None, pos_label=1, average='macro', sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Recall for multi-label classification problems

See the scikit-learn documentation for more details.

### RocAucMulti

`RocAucMulti (sigmoid=True, average='macro', sample_weight=None, max_fpr=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Area Under the Receiver Operating Characteristic Curve for multi-label binary classification problems

See the scikit-learn documentation for more details.

## Regression

### MSE

`MSE (metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Mean squared error between `inp` and `targ`.

### RMSE

`RMSE (log_metric=<LogMetric.Valid: 2>, **kwargs)`

Root mean squared error between `inp` and `targ`.

### MAE

`MAE (metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Mean absolute error between `inp` and `targ`.

### MSLE

`MSLE (metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Mean squared logarithmic error between `inp` and `targ`.
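The four error metrics above differ only in how each prediction error is transformed before averaging. The standard-library sketch below spells that out; the function names are illustrative, and using `log(1 + x)` for the logarithmic error is an assumption based on the usual definition of MSLE.

```python
import math

def mse(preds, targs):
    """Mean of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(preds, targs)) / len(targs)

def rmse(preds, targs):
    """Square root of the mean squared error."""
    return math.sqrt(mse(preds, targs))

def mae(preds, targs):
    """Mean of absolute differences."""
    return sum(abs(p - t) for p, t in zip(preds, targs)) / len(targs)

def msle(preds, targs):
    """Mean squared error computed on log(1 + x) of both inputs."""
    return mse([math.log1p(p) for p in preds], [math.log1p(t) for t in targs])

preds = [2.0, 4.0]
targs = [1.0, 6.0]

results = {
    "mse": mse(preds, targs),    # (1 + 4) / 2
    "rmse": rmse(preds, targs),  # sqrt(2.5)
    "mae": mae(preds, targs),    # (1 + 2) / 2
    "msle": msle(preds, targs),
}
```

RMSE is reported in the same units as the targets, which is often why it is preferred over MSE for reading results, while MSLE penalizes relative rather than absolute differences.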

### ExpRMSE

`ExpRMSE (log_metric=<LogMetric.Valid: 2>, **kwargs)`

Root mean square percentage error of the exponential of predictions and targets

### ExplainedVariance

`ExplainedVariance (sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Explained variance between predictions and targets

See the scikit-learn documentation for more details.

### R2Score

`R2Score (sample_weight=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

R2 score between predictions and targets

See the scikit-learn documentation for more details.

### PearsonCorrCoef

`PearsonCorrCoef (dim_argmax=None, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Pearson correlation coefficient for regression problem

See the scipy documentation for more details.

### SpearmanCorrCoef

`SpearmanCorrCoef (dim_argmax=None, axis=0, nan_policy='propagate', log_metric=<LogMetric.Valid: 2>, **kwargs)`

Spearman correlation coefficient for regression problem

See the scipy documentation for more details.

## Segmentation

### ForegroundAcc

`ForegroundAcc (bkg_idx=0, axis=1, metric_type=<MetricType.Avg: 1>, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Computes non-background accuracy for multiclass segmentation
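
As a rough picture of non-background accuracy, the sketch below scores only the pixels whose target is not the background class. It works on flat lists of class indices for brevity, and `foreground_accuracy` is a hypothetical helper rather than the `ForegroundAcc` implementation, which takes model outputs and applies an argmax over `axis`.

```python
def foreground_accuracy(pred_classes, targ_classes, bkg_idx=0):
    """Accuracy over positions whose target is not the background class."""
    pairs = [(p, t) for p, t in zip(pred_classes, targ_classes) if t != bkg_idx]
    if not pairs:
        return None  # no foreground pixels to score
    return sum(p == t for p, t in pairs) / len(pairs)

# Flattened per-pixel class indices; class 0 is background.
pred_classes = [0, 1, 2, 0, 1]
targ_classes = [0, 1, 1, 2, 0]

fg_acc = foreground_accuracy(pred_classes, targ_classes, bkg_idx=0)
plain_acc = sum(p == t for p, t in zip(pred_classes, targ_classes)) / len(targ_classes)
```

Ignoring background pixels keeps a large, easy background region from inflating the score.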

### Dice

`Dice (axis=1, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Dice coefficient metric for binary target in segmentation
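
For reference, the Dice coefficient for binary masks is twice the overlap divided by the total number of positive pixels in both masks. The sketch below computes it on flat lists of 0/1 values; `dice_binary` is an illustrative helper, not the `Dice` class.

```python
def dice_binary(preds, targs):
    """Dice coefficient: 2 * |intersection| / (|preds| + |targs|)."""
    intersection = sum(p * t for p, t in zip(preds, targs))
    total = sum(preds) + sum(targs)
    return 2 * intersection / total if total else None

# Flattened binary masks (1 = foreground).
preds = [1, 1, 0, 0, 1]
targs = [1, 0, 0, 1, 1]

dice = dice_binary(preds, targs)   # 2 * 2 / (3 + 3)
```

A value of 1.0 means the masks match exactly, and 0.0 means they share no foreground pixels.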

### DiceMulti

`DiceMulti (axis=1, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Averaged Dice metric (Macro F1) for multiclass target in segmentation

The DiceMulti method implements the “Averaged F1: arithmetic mean over harmonic means” described in this publication: https://arxiv.org/pdf/1911.03347.pdf
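
The phrase “arithmetic mean over harmonic means” can be unpacked with a small sketch: compute a Dice (F1) score per class, then take the plain mean across classes. `dice_multi` is a hypothetical helper on flat lists of class indices. Whether the real metric includes the background class, or how it treats classes absent from both masks, is not shown here; skipping absent classes is an assumption of this example.

```python
def dice_multi(pred_classes, targ_classes, n_classes):
    """Macro-averaged Dice: per-class Dice scores, then an arithmetic mean."""
    scores = []
    for c in range(n_classes):
        pred_c = [int(p == c) for p in pred_classes]
        targ_c = [int(t == c) for t in targ_classes]
        total = sum(pred_c) + sum(targ_c)
        if total == 0:
            continue  # assumption: skip classes absent from both masks
        intersection = sum(p * t for p, t in zip(pred_c, targ_c))
        scores.append(2 * intersection / total)
    return sum(scores) / len(scores) if scores else None

pred_classes = [0, 1, 1, 2, 2, 2]
targ_classes = [0, 1, 2, 2, 2, 1]

# Per-class Dice: class 0 -> 1.0, class 1 -> 0.5, class 2 -> 2/3
macro_dice = dice_multi(pred_classes, targ_classes, n_classes=3)
```

Because every class contributes equally to the mean, rare classes influence the score as much as common ones.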

### JaccardCoeff

`JaccardCoeff (axis=1, log_metric=<LogMetric.Valid: 2>, **kwargs)`

Implementation of the Jaccard coefficient that is lighter in RAM

## NLP

### CorpusBLEUMetric

`CorpusBLEUMetric (vocab_sz=5000, axis=-1, log_metric=<LogMetric.Valid: 2>, name='CorpusBLEU', **kwargs)`

BLEU Metric calculated over the validation corpus

The BLEU metric was introduced in this article to come up with a way to evaluate the performance of translation models. It’s based on the precision of n-grams in your prediction compared to your target. See the fastai NLP course BLEU notebook for a more detailed description of BLEU.

The smoothing used in the precision calculation is the same as in SacreBLEU, which in turn is “method 3” from the Chen & Cherry, 2014 paper.

### Perplexity

`Perplexity (dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None)`

Perplexity (exponential of cross-entropy loss) for Language Models
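
The relationship between cross-entropy and perplexity can be worked through by hand. In the sketch below, `target_probs` holds the probability a hypothetical language model assigned to each correct token; the numbers are chosen so the arithmetic is easy to follow.

```python
import math

# Probability the model assigned to the correct token at each position.
target_probs = [0.5, 0.25, 0.125]

# Cross-entropy is the mean negative log-likelihood of the correct tokens.
cross_entropy = -sum(math.log(p) for p in target_probs) / len(target_probs)

# Perplexity is the exponential of that mean.
perplexity = math.exp(cross_entropy)
```

Here the mean negative log-likelihood is `2 * ln(2)`, so the perplexity is 4: on average the model is as uncertain as if it were choosing uniformly among four tokens.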

### LossMetric

`LossMetric (func, to_np=False, invert_arg=False, dim_argmax=None, activation=<ActivationType.No: 1>, thresh=None, log_metric=None, name=None)`

Create a metric from `loss_func.attr` named `nm`

### LossMetrics

`LossMetrics (attrs, nms=None)`

List of `LossMetric` for each of `attrs` and `nms`

## Logging

Metrics Extended is compatible with logging to Weights and Biases and TensorBoard using fastai’s `WandbCallback` and `TensorBoardCallback`.