Releases: Lightning-AI/torchmetrics
Minor patch release
[1.4.3] - 2024-10-10
Fixed
- Fixed Pearson correlation metric changing its input tensors (#2765)
- Fixed bug in `PESQ` metric where `NoUtterancesError` prevented calculating on a batch of data (#2753)
- Fixed corner case in `MatthewsCorrCoef` (#2743)
Key Contributors
@Borda, @SkafteNicki, @veera-puthiran-14082
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.4.2...v1.4.3
Minor patch release
[1.4.2] - 2024-09-12
Added
- Re-adding `Chrf` implementation (#2701)
Fixed
- Fixed wrong aggregation in `segmentation.MeanIoU` (#2698)
- Fixed handling zero division error in binary IoU (Jaccard index) calculation (#2726)
- Corrected the padding related calculation errors in SSIM (#2721)
- Fixed compatibility of audio domain with new `scipy` (#2733)
- Fixed how `prefix`/`postfix` works in `MultitaskWrapper` (#2722)
- Fixed flakiness in tests related to `torch.unique` with `dim=None` (#2650)
Key Contributors
@Borda, @petertheprocess, @rittik9, @SkafteNicki, @vkinakh
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.4.1...v1.4.2
Minor patch release
[1.4.1] - 2024-08-02
Changed
- Calculate the text color of `ConfusionMatrix` plot based on luminance (#2590)
- Updated `_safe_divide` to allow `Accuracy` to run on the GPU (#2640)
- Improved error messages for intersection detection metrics on wrong user input (#2577)
Removed
- Dropped `Chrf` implementation due to licensing issues with the upstream package (#2668)
Fixed
- Fixed bug in `MetricCollection` when using compute groups and `compute` is called more than once (#2571)
- Fixed class order of `panoptic_quality(..., return_per_class=True)` output (#2548)
- Fixed `BootstrapWrapper` not being reset correctly (#2574)
- Fixed integration between `ClasswiseWrapper` and `MetricCollection` with custom `_filter_kwargs` method (#2575)
- Fixed BertScore calculation: pred target misalignment (#2347)
- Fixed `_cumsum` helper function in multi-gpu (#2636)
- Fixed bug in `MeanAveragePrecision.coco_to_tm` (#2588)
- Fixed missed f-strings in exceptions/warnings (#2667)
Key Contributors
@Borda, @gxy-gxy, @i-aki-y, @ndrwrbgs, @relativityhd, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.4.0...v1.4.1
Minor dependency correction
Full Changelog: v1.4.0...v1.4.0.post0
Metrics for segmentation
In TorchMetrics v1.4, we are happy to introduce a new domain of metrics to the library: segmentation metrics. Segmentation metrics are used to evaluate how well segmentation algorithms perform, e.g., algorithms that take in an image and decide, pixel by pixel, what kind of object each pixel belongs to. These kinds of algorithms are necessary in applications such as self-driving cars. Segmentation metrics are closely related to classification metrics, but for now, in TorchMetrics, they expect the input to be formatted differently; see the documentation for more info. For now, `MeanIoU`
and `GeneralizedDiceScore`
have been added to the subpackage, with many more to follow in upcoming releases of TorchMetrics. We are happy to receive any feedback on metrics to add in the future or on the user interface for the new segmentation metrics.
TorchMetrics v1.4 also adds new metrics to the classification and image subpackages and has multiple bug fixes and other quality-of-life improvements. We refer to the changelog for the complete list of changes.
[1.4.0] - 2024-05-03
Added
- Added `SensitivityAtSpecificity` metric to classification subpackage (#2217)
- Added `QualityWithNoReference` metric to image subpackage (#2288)
- Added new segmentation metrics `MeanIoU` and `GeneralizedDiceScore` in the new segmentation subpackage
- Added support for calculating segmentation quality and recognition quality in `PanopticQuality` metric (#2381)
- Added `pretty-errors` for improving error prints (#2431)
- Added support for `torch.float` weighted networks for FID and KID calculations (#2483)
- Added `zero_division` argument to selected classification metrics (#2198)
Changed
- Made `__getattr__` and `__setattr__` of `ClasswiseWrapper` more general (#2424)
Fixed
- Fix getitem for metric collection when prefix/postfix is set (#2430)
- Fixed axis names with Precision-Recall curve (#2462)
- Fixed list synchronization with partly empty lists (#2468)
- Fixed memory leak in metrics using list states (#2492)
- Fixed bug in computation of `ERGAS` metric (#2498)
- Fixed `BootStrapper` wrapper not working when a `kwargs` argument is provided (#2503)
- Fixed warnings being suppressed in `MeanAveragePrecision` when requested (#2501)
- Fixed corner-case in `binary_average_precision` when only negative samples are provided (#2507)
Key Contributors
@baskrahmer, @Borda, @ChristophReich1996, @daniel-code, @furkan-celik, @i-aki-y, @jlcsilva, @NielsRogge, @oguz-hanoglu, @SkafteNicki, @ywchan2005
New Contributors
- @eamonn-zh made their first contribution in #2345
- @nsmlzl made their first contribution in #2346
- @fschlatt made their first contribution in #2364
- @JonasVerbickas made their first contribution in #2358
- @AtomicVar made their first contribution in #2391
- @JDongian made their first contribution in #2400
- @daniel-code made their first contribution in #2390
- @baskrahmer made their first contribution in #2457
- @ChristophReich1996 made their first contribution in #2381
- @lukazso made their first contribution in #2491
- @S-aiueo32 made their first contribution in #2499
- @dominicgkerr made their first contribution in #2493
- @Shoumik-Gandre made their first contribution in #2482
- @randombenj made their first contribution in #2511
- @NielsRogge made their first contribution in #1236
- @i-aki-y made their first contribution in #2198
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Full Changelog: v1.3.0...v1.4.0
Minor patch release
[1.3.2] - 2024-03-18
Fixed
- Fixed negative variance estimates in certain image metrics (#2378)
- Fixed dtype being changed by deepspeed for certain regression metrics (#2379)
- Fixed plotting of metric collection when prefix/postfix is set (#2429)
- Fixed bug when `top_k>1` and `average="macro"` for classification metrics (#2423)
- Fixed case where label prediction tensors in classification metrics were not validated correctly (#2427)
- Fixed how AUC scores are calculated in `PrecisionRecallCurve.plot` methods (#2437)
Full Changelog: v1.3.1...v1.3.2
Key Contributors
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor patch release
[1.3.1] - 2024-02-12
Fixed
- Fixed how backprop is handled in `LPIPS` metric (#2326)
- Fixed `MultitaskWrapper` not being able to be logged in lightning when using metric collections (#2349)
- Fixed high memory consumption in `Perplexity` metric (#2346)
- Fixed cached network in `FeatureShare` not being moved to the correct device (#2348)
- Fixed naming of statistics in `MeanAveragePrecision` with custom max det thresholds (#2367)
- Fixed custom aggregation in retrieval metrics (#2364)
- Fixed initialization of aggregation metrics with default floating type (#2366)
- Fixed plotting of confusion matrices (#2358)
Full Changelog: v1.3.0...v1.3.1
Key Contributors
@Borda, @fschlatt, @JonasVerbickas, @nsmlzl, @SkafteNicki
If we forgot someone due to not matching commit email with GitHub account, let us know :]
Minor release patch
Full Changelog: v1.3.0...v1.3.0.post0
New Image metrics & wrappers
TorchMetrics v1.3 is out now! This release introduces seven new metrics in the different subdomains of TorchMetrics and adds some nice features to already established metrics. In this blog post, we present the new metrics with short code samples.
We are happy to see the continued adoption of TorchMetrics in over 19,000 GitHub projects, and we are proud to report that we have passed 1,800 GitHub stars.
New metrics
The retrieval domain has received one new metric in this release: `RetrievalAUROC`. This metric calculates the Area Under the Receiver Operating Characteristic curve for document retrieval data. It is similar to the standard `AUROC` metric from classification but also supports the additional `indexes` argument that all retrieval metrics support.
from torch import tensor
from torchmetrics.retrieval import RetrievalAUROC
indexes = tensor([0, 0, 0, 1, 1, 1, 1])
preds = tensor([0.2, 0.3, 0.5, 0.1, 0.3, 0.5, 0.2])
target = tensor([False, False, True, False, True, False, True])
r_auroc = RetrievalAUROC()
r_auroc(preds, target, indexes=indexes)
# tensor(0.7500)
The image subdomain is receiving two new metrics in v1.3, which brings the total number of image-specific metrics in TorchMetrics to 21! As with other metrics, these two new metrics work by comparing a predicted image tensor to a ground truth image, but they focus on different properties for their metric calculation.
- The first metric is `SpatialCorrelationCoefficient`. As the name indicates, this metric focuses on how well the spatial structure of the predicted image correlates with the ground truth image.
import torch
from torchmetrics.image import SpatialCorrelationCoefficient as SCC
torch.manual_seed(42)
preds = torch.randn([32, 3, 64, 64])
target = torch.randn([32, 3, 64, 64])
scc = SCC()
scc(preds, target)
# tensor(0.0023)
- The second metric is `SpatialDistortionIndex`, which compares the spatial structure of the images and is especially useful for evaluating multispectral images.
import torch
from torchmetrics.image import SpatialDistortionIndex
preds = torch.rand([16, 3, 32, 32])
target = {
    'ms': torch.rand([16, 3, 16, 16]),
    'pan': torch.rand([16, 3, 32, 32]),
}
sdi = SpatialDistortionIndex()
sdi(preds, target)
# tensor(0.0090)
A new wrapper metric called `FeatureShare` has also been added. It can be seen as a specialized version of `MetricCollection` for metrics that use a neural network as part of their metric calculation. For example, `FrechetInceptionDistance`, `InceptionScore`, and `KernelInceptionDistance` all, by default, use an Inception network for their metric calculations. When these metrics were combined inside a `MetricCollection`, the underlying neural network was still called three times, which is redundant and wastes resources. In principle, it should be possible to call it only once and then propagate the value to all metrics, which is exactly what the `FeatureShare` wrapper does.
import torch
from torchmetrics.wrappers import FeatureShare
from torchmetrics import MetricCollection
from torchmetrics.image import FrechetInceptionDistance, KernelInceptionDistance
def fs_wrapper():
fs = FeatureShare([FrechetInceptionDistance(), KernelInceptionDistance(subset_size=10, subsets=2)])
fs.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=True)
fs.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=False)
fs.compute()
def mc_wrapper():
mc = MetricCollection([FrechetInceptionDistance(), KernelInceptionDistance(subset_size=10, subsets=2)])
mc.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=True)
mc.update(torch.randint(255, (50, 3, 64, 64), dtype=torch.uint8), real=False)
mc.compute()
# let's compare (using the IPython %timeit magic)
%timeit fs_wrapper()
# 8.38 s ± 564 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit mc_wrapper()
# 13.8 s ± 232 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This will most likely be significantly faster than the alternative metric collection, as shown in the code example.
Improved features
In v1.2, several new arguments were added to the `MeanAveragePrecision` metric from the detection package. This metric has seen a further small improvement: the argument `extended_summary=True` now also returns confidence scores. The confidence scores are the scores assigned by the model indicating how confident it is that a given predicted bounding box belongs to a certain class.
import torch
from torchmetrics.detection import MeanAveragePrecision
# enable extended summary
map_metric = MeanAveragePrecision(extended_summary=True)
preds = [
{
"boxes": torch.tensor([[0.5, 0.5, 1, 1]]),
"scores": torch.tensor([1.0]),
"labels": torch.tensor([0]),
}
]
target = [
{"boxes": torch.tensor([[0, 0, 1, 1]]), "labels": torch.tensor([0])}
]
map_metric.update(preds, target)
result = map_metric.compute()
# new confidence score can be found in the "score" key
confidence_scores = result["scores"]
# in this case confidence_scores will have shape (10, 101, 1, 4, 3)
# because
# * We are by default evaluating for 10 different IoU thresholds
# * We evaluate the PR-curve based on 101 linearly spaced locations
# * We only have 1 class (see the labels tensor)
# * There are 4 area sizes we evaluate on (small, medium, large and all)
# * By default `max_detection_thresholds=[1,10,100]` meaning we evaluate for 3 values
From v1.3, all retrieval metrics support an argument called `aggregation` that determines how the metric should be aggregated over different documents. The supported options are `"mean"`, `"median"`, `"max"`, and `"min"`, with the default value being `"mean"`, which is fully backward compatible with earlier versions of TorchMetrics.
from torch import tensor
from torchmetrics.retrieval import RetrievalHitRate
indexes = tensor([0, 0, 0, 1, 1, 1, 1])
preds = tensor([0.2, 0.3, 0.5, 0.1, 0.3, 0.5, 0.2])
target = tensor([True, False, False, False, True, False, True])
hr2 = RetrievalHitRate(aggregation="max")
hr2(preds, target, indexes=indexes)
# tensor(1.000)
Finally, the `SacreBLEU` metric from the text domain now supports even more tokenizers: `"ja-mecab"`, `"ko-mecab"`, `"flores101"`, and `"flores200"`.
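For reference, here is a minimal sketch of how a tokenizer is selected via the `tokenize` argument of `SacreBLEUScore` (the example strings are made up, and the mecab- and flores-based tokenizers may require extra dependencies to be installed):
from torchmetrics.text import SacreBLEUScore
preds = ["hello there general kenobi"]
target = [["hello there general kenobi", "hello there!"]]
# swap "13a" for e.g. "ja-mecab" when scoring Japanese text
sacre_bleu = SacreBLEUScore(tokenize="13a")
sacre_bleu(preds, target)
# tensor(1.) since the prediction exactly matches the first reference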
Changes and bugfixes
Users should be aware that, from v1.3, TorchMetrics only supports PyTorch v1.10 and up (previously v1.8). We always try to provide support for PyTorch releases for up to two years.
There have been several bug fixes related to numerical stability in several metrics. For this reason, we always recommend that users use the most recent version of Torchmetrics for the best experience.
Thank you!
As always, we offer a big thank you to all of our community members for their contributions and feedback. Please open an issue in the repo if you have any recommendations for the next metrics we should tackle.
If you want to ask a question or join us in expanding TorchMetrics, please join our Discord server, where you can get guidance in the `#torchmetrics` channel.
🔥 Check out the documentation and code! 🚀
[1.3.0] - 2024-01-10
Added
- Added more tokenizers for `SacreBLEU` metric (#2068)
- Added support for logging `MultiTaskWrapper` directly with lightning's `log_dict` method (#2213)
- Added `FeatureShare` wrapper to share submodules containing feature extractors between metrics (#2120)
- Added new metrics to the image domain, including `SpatialCorrelationCoefficient` and `SpatialDistortionIndex`
- Added `average` argument to multiclass versions of `PrecisionRecallCurve` and `ROC` (#2084)
- Added confidence scores when `extended_summary=True` in `MeanAveragePrecision` (#2212)
- Added `RetrievalAUROC` metric (#2251)
- Added `aggregate` argument to retrieval metrics (#2220)
- Added utility functions in `segmentation.utils` for future segmentation metrics (#2105)
Changed
- Changed minimum supported Pytorch version from 1.8 to 1.10 (#2145)
- Changed x-/y-axis order for `PrecisionRecallCurve` to be consistent with scikit-learn (#2183)
Deprecated
- Deprecated `metric._update_called` (#2141)
- Deprecated `specicity_at_sensitivity` in favour of `specificity_at_sensitivity` (#2199)
Fixed
- Fixed support for half precision + CPU in metrics requiring topk operator (#2252)
- Fixed warning incorrectly being raised in `Running` metrics (#2256)
- Fixed integration with custom feature extractor in `FID` metric (#2277)
Full Changelog: v1.2.0...v1.3.0
Key Contributors
@Borda, @HoseinAkbarzadeh, @matsumotosan, @miskf...
Lazy imports
[1.2.1] - 2023-11-30
Added
- Added error if `NoTrainInceptionV3` is being initialized without `torch-fidelity` being installed (#2143)
- Added support for PyTorch `v2.1` (#2142)
Changed
- Change default state of `SpectralAngleMapper` and `UniversalImageQualityIndex` to be tensors (#2089)
- Use `arange` and repeat for deterministic bincount (#2184)
Removed
- Removed unused `lpips` third-party package as dependency of `LearnedPerceptualImagePatchSimilarity` metric (#2230)
Fixed
- Fixed numerical stability bug in `LearnedPerceptualImagePatchSimilarity` metric (#2144)
- Fixed numerical stability issue in `UniversalImageQualityIndex` metric (#2222)
- Fixed incompatibility for `MeanAveragePrecision` with `pycocotools` backend when too few `max_detection_thresholds` are provided (#2219)
- Fixed support for half precision in Perplexity metric (#2235)
- Fixed device and dtype for `LearnedPerceptualImagePatchSimilarity` functional metric (#2234)
- Fixed bug in `Metric._reduce_states(...)` when using `dist_sync_fn="cat"` (#2226)
- Fixed bug in `CosineSimilarity` where 2d input was expected but 1d input was given (#2241)
- Fixed bug in `MetricCollection` when using compute groups and `compute` is called more than once (#2211)
Full Changelog: v1.2.0...v1.2.1
Key Contributors
@Borda, @jankng, @kyle-dorman, @SkafteNicki, @tanguymagne
If we forgot someone due to not matching commit email with GitHub account, let us know :]