Metrics
Metrics is deprecated in 🤗 Datasets. To learn more about how to use metrics, take a look at the library 🤗 Evaluate! In addition to metrics, you can find more tools for evaluating models and datasets.
Metrics are important for evaluating a model's predictions. In the tutorial, you learned how to compute a metric over an entire evaluation set. You have also seen how to load a metric.
This guide will show you how to:
Add predictions and references.
Compute metrics using different methods.
Write your own metric loading script.
Add predictions and references
When you want to add model predictions and references to a Metric instance, you have two options:
Metric.add() adds a single prediction and reference.
Metric.add_batch() adds a batch of predictions and references.
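For instance, Metric.add() records one prediction/reference pair per call. A minimal sketch, assuming a loaded metric object and a single model output:

>>> metric.add(prediction=model_prediction, reference=gold_reference)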
Use Metric.add_batch() by passing it your model predictions, and the references the model predictions should be evaluated against:
>>> import datasets
>>> metric = datasets.load_metric('my_metric')
>>> for model_inputs, gold_references in evaluation_dataset:
...     model_predictions = model(model_inputs)
...     metric.add_batch(predictions=model_predictions, references=gold_references)
>>> final_score = metric.compute()

Metrics accepts various input formats (Python lists, NumPy arrays, PyTorch tensors, etc.) and converts them to an appropriate format for storage and computation.
Compute scores
The most straightforward way to calculate a metric is to call Metric.compute(). But some metrics have additional arguments that allow you to modify the metric's behavior.
Let's load the SacreBLEU metric, and compute it with a different smoothing method.
Load the SacreBLEU metric:
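A minimal sketch, using the identifier the metric has on the Hub:

>>> import datasets
>>> metric = datasets.load_metric('sacrebleu')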
Inspect the different argument methods for computing the metric:
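One way to do this is to print the metric's inputs_description, which documents the accepted arguments and their defaults:

>>> print(metric.inputs_description)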
Compute the metric with the floor method, and a different smooth_value:
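A sketch, assuming predictions and references were already added with Metric.add_batch() as shown earlier (smooth_method and smooth_value are arguments specific to SacreBLEU):

>>> score = metric.compute(smooth_method='floor', smooth_value=0.2)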
Custom metric loading script
Write a metric loading script to use your own custom metric (or one that is not on the Hub). Then you can load it as usual with load_metric().
To help you get started, open the SQuAD metric loading script and follow along.
Get jump started with our metric loading script template!
Add metric attributes
Start by adding some information about your metric in Metric._info(). The most important attributes you should specify are:
MetricInfo.description provides a brief description of your metric.
MetricInfo.citation contains a BibTeX citation for the metric.
MetricInfo.inputs_description describes the expected inputs and outputs. It may also provide an example usage of the metric.
MetricInfo.features defines the name and type of the predictions and references.
After you've filled out all these fields in the template, it should look like the following example from the SQuAD metric script:
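A sketch abridged from the SQuAD metric script; _DESCRIPTION, _CITATION, and _KWARGS_DESCRIPTION are module-level strings defined earlier in the script:

class Squad(datasets.Metric):
    def _info(self):
        return datasets.MetricInfo(
            description=_DESCRIPTION,
            citation=_CITATION,
            inputs_description=_KWARGS_DESCRIPTION,
            # SQuAD expects an id and text for each prediction, and an id
            # plus a list of possible answers for each reference.
            features=datasets.Features(
                {
                    "predictions": {"id": datasets.Value("string"), "prediction_text": datasets.Value("string")},
                    "references": {
                        "id": datasets.Value("string"),
                        "answers": datasets.features.Sequence(
                            {"text": datasets.Value("string"), "answer_start": datasets.Value("int32")}
                        ),
                    },
                }
            ),
            codebase_urls=["https://rajpurkar.github.io/SQuAD-explorer/"],
            reference_urls=["https://rajpurkar.github.io/SQuAD-explorer/"],
        )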
Download metric files
If your metric needs to download or retrieve local files, you will need to use the Metric._download_and_prepare() method. For this example, let's examine the BLEURT metric loading script.
Provide a dictionary of URLs that point to the metric files:
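In the BLEURT script this is a module-level dictionary mapping checkpoint names to their download URLs; an abridged sketch:

CHECKPOINT_URLS = {
    "bleurt-tiny-128": "https://storage.googleapis.com/bleurt-oss/bleurt-tiny-128.zip",
    "bleurt-tiny-512": "https://storage.googleapis.com/bleurt-oss/bleurt-tiny-512.zip",
    "bleurt-base-128": "https://storage.googleapis.com/bleurt-oss/bleurt-base-128.zip",
    "bleurt-base-512": "https://storage.googleapis.com/bleurt-oss/bleurt-base-512.zip",
    "bleurt-large-128": "https://storage.googleapis.com/bleurt-oss/bleurt-large-128.zip",
    "bleurt-large-512": "https://storage.googleapis.com/bleurt-oss/bleurt-large-512.zip",
}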
If the files are stored locally, provide a dictionary of path(s) instead of URLs.
Metric._download_and_prepare() will take the URLs and download the metric files specified:
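A sketch based on the BLEURT script, assuming the CHECKPOINT_URLS dictionary above, a module-level logger, and the bleurt scoring library (from bleurt import score):

def _download_and_prepare(self, dl_manager):
    # Fall back to a default checkpoint when no configuration is given.
    if self.config_name == "default":
        logger.warning("Using default BLEURT-Base checkpoint for sequence maximum length 128.")
        self.config_name = "bleurt-base-128"
    if self.config_name not in CHECKPOINT_URLS:
        raise KeyError(f"{self.config_name} model not found. Choose one of {list(CHECKPOINT_URLS)}.")
    # Download and extract the checkpoint, then load the scorer from it.
    model_path = dl_manager.download_and_extract(CHECKPOINT_URLS[self.config_name])
    self.scorer = score.BleurtScorer(os.path.join(model_path, self.config_name))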
Compute score
Metric._compute() provides the actual instructions for how to compute a metric given the predictions and references. Now let's take a look at the GLUE metric loading script.
Provide the functions for Metric._compute() to calculate your metric:
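A sketch of the helper functions from the GLUE script; they rely on scipy and scikit-learn, imported at the top of the script:

from scipy.stats import pearsonr, spearmanr
from sklearn.metrics import f1_score


def simple_accuracy(preds, labels):
    return float((preds == labels).mean())


def acc_and_f1(preds, labels):
    acc = simple_accuracy(preds, labels)
    f1 = float(f1_score(y_true=labels, y_pred=preds))
    return {"accuracy": acc, "f1": f1}


def pearson_and_spearman(preds, labels):
    pearson_corr = float(pearsonr(preds, labels)[0])
    spearman_corr = float(spearmanr(preds, labels)[0])
    return {"pearson": pearson_corr, "spearmanr": spearman_corr}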
Create Metric._compute() with instructions for what metric to calculate for each configuration:
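A sketch of _compute dispatching on the configuration name, abridged from the GLUE script (matthews_corrcoef comes from sklearn.metrics):

def _compute(self, predictions, references):
    if self.config_name == "cola":
        return {"matthews_correlation": matthews_corrcoef(references, predictions)}
    elif self.config_name == "stsb":
        return pearson_and_spearman(predictions, references)
    elif self.config_name in ["mrpc", "qqp"]:
        return acc_and_f1(predictions, references)
    elif self.config_name in ["sst2", "mnli", "mnli_mismatched", "mnli_matched", "qnli", "rte", "wnli", "hans"]:
        return {"accuracy": simple_accuracy(predictions, references)}
    else:
        raise KeyError("You should supply a configuration name selected among the GLUE tasks.")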
Test
Once you're finished writing your metric loading script, try to load it locally:
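A sketch, with a placeholder path pointing to your script:

>>> from datasets import load_metric
>>> metric = load_metric('PATH/TO/MY/SCRIPT.py')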