| Type: | Package |
| Title: | Assessment of Predictions for an Ordinal Response |
| Version: | 0.1.1 |
| Description: | Produces several metrics to assess the prediction of ordinal categories based on the estimated probability distribution for each unit of analysis produced by any model returning a matrix with these probabilities. |
| License: | GPL-2 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| NeedsCompilation: | no |
| Packaged: | 2025-11-03 14:44:32 UTC; JEU |
| Author: | Javier Espinosa-Brito [aut, cre] |
| Maintainer: | Javier Espinosa-Brito <[email protected]> |
| Repository: | CRAN |
| Date/Publication: | 2025-11-04 00:00:10 UTC |
Normalized Ordinal Prediction Agreement (NOPA)
Description
Compute the Normalized Ordinal Prediction Agreement (NOPA) metric, a performance measure for models with ordinal-scaled response variables that output estimated probability distributions (EPDs) instead of predicted labels.
This function assesses the predictive quality of a model for an ordinal response by aggregating the predicted probability mass as a function of the level of disagreement with respect to the observed category. It provides a normalized and interpretable score between 0 and 1, where 1 indicates perfect agreement and 0 represents the worst possible prediction.
NOPA compares the estimated probability distribution produced by a model for
each unit of analysis against the observed ordinal response of the same unit. The
maximum disagreement is k-1, where k is the number of ordinal categories
of the response variable, and the
minimum disagreement is 0. It then aggregates the disagreements of all units of analysis into a single measure.
The function internally computes:
- OPD: Ordinal Prediction Disagreement, the average level of disagreement between the predicted and observed categories.
- w: The worst possible OPD given the dataset, representing the maximum disagreement achievable.
- NOPA: The normalized agreement metric, defined as 1 - OPD / w.
- OPDempDist, OPDur, NOPAempDist, NOPAur: Reference values for empirical and uniform-random baselines to contextualize model performance assessment provided by OPD and NOPA.
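To make the quantities listed above concrete, the following is a minimal sketch and not the package's implementation. It assumes that the disagreement between category j and the observed category y_i is |j - y_i|, that OPD averages the probability-weighted disagreement over units, and that the worst prediction puts all mass on the category farthest from y_i. The name `nopa_sketch` is hypothetical.

```r
# Hypothetical helper illustrating OPD, w and NOPA under the stated assumptions.
nopa_sketch <- function(predMat, obsVect) {
  k <- ncol(predMat)
  # dis[i, j] = |j - y_i|: disagreement of category j with the observed one
  dis <- abs(outer(obsVect, seq_len(k), "-"))
  # Probability-weighted disagreement per unit, averaged over units
  opd <- mean(rowSums(predMat * dis))
  # Assumed worst case: all mass on the category farthest from y_i
  w <- mean(pmax(obsVect - 1, k - obsVect))
  c(OPD = opd, w = w, NOPA = 1 - opd / w)
}

obs <- c(1, 3, 5)
perfect <- diag(5)[obs, ]         # all mass on the observed category
worst <- diag(5)[c(5, 1, 1), ]    # all mass on a farthest category
nopa_sketch(perfect, obs)
nopa_sketch(worst, obs)
```

Under these assumptions, a prediction that puts all mass on the observed category scores 1 and one that puts all mass on a farthest category scores 0, matching the range described above.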
Usage
nopa(predMat, obsVect)
Arguments
predMat |
A numeric matrix with |
obsVect |
A numeric or integer vector of observed categories, with values from 1 to |
Value
A list containing:
predMat |
Input matrix of predicted probabilities.
obsVect |
Input vector of observed categories.
disagreementsObs |
A matrix with k columns (number of ordinal categories of the response variable) and n rows. Each row shows the level of disagreement of each ordinal category with respect to the observed one for the same unit of analysis.
rearrangedProbObs |
Matrix of probabilities aggregated by level of disagreement.
meanDistObs |
Mean aggregated disagreement profile.
OPD |
Observed Ordinal Prediction Disagreement.
w |
OPD for the worst prediction possible (maximum disagreement).
NOPA |
Normalized Ordinal Prediction Agreement (main metric).
OPDempDist |
A version of a reference point for OPD. It considers an ordinal prediction disagreement measure for the case where the estimated probability distribution for the k categories of the ordinal response follows the same distribution as the empirical one.
OPDur |
A version of a reference point for OPD. It considers an ordinal prediction disagreement measure for the case where the observed response variable has its own empirical distribution and the estimated probability distribution for the k categories of the ordinal response follows a uniform distribution.
NOPAempDist |
A version of a reference point for NOPA. It considers a normalized ordinal prediction agreement measure for the case where the estimated probability distribution for the k categories of the ordinal response follows the same distribution as the empirical one.
NOPAur |
A version of a reference point for NOPA. It considers a normalized ordinal prediction agreement measure for the case where the estimated probability distribution for the k categories of the ordinal response follows a uniform distribution.
References
Javier
See Also
ordPredArgmax,
ordPredRandom,
opdRef
Examples
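# Simulate 20 estimated probability distributions over k = 5 categories
EPD <- t(apply(matrix(runif(100), ncol = 5), 1, function(y) y / sum(y)))
# Each row should sum to 1 (compare with a tolerance rather than ==)
all.equal(rowSums(EPD), rep(1, nrow(EPD)))
ordResponse <- sample(1:5, 20, replace = TRUE)
nopa(predMat = EPD, obsVect = ordResponse)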
OPD Reference Points: Empirical vs Uniform Baselines
Description
Computes two reference values for the Ordinal Prediction Disagreement (OPD):
(i) the expected OPD when the predicted label \hat Y follows the *same*
empirical distribution as Y; and (ii) the expected OPD when
\hat Y is *uniform* over the k ordered categories while Y
retains its empirical distribution. These values are useful as dataset-specific
anchors for interpreting raw OPD and for constructing normalized benchmarks.
Usage
opdRef(p)
Arguments
p |
A probability vector of length |
Details
Let p=(p_1,\dots,p_k) denote the empirical distribution of Y.
The function returns two scalars:
- OPDempDist: \mathbb{E}|\,\hat Y-Y\,| when \hat Y\sim p independently of Y\sim p.
- OPDur: \mathbb{E}|\,\hat Y-Y\,| when \hat Y\sim \mathrm{Unif}\{1,\dots,k\} independently of Y\sim p.
Both are computed via the disagreement-level decomposition
\mathbb{E}|\,\hat Y-Y\,|
= \sum_{d=0}^{k-1} d \;\mathbb{P}(|\hat Y-Y|=d),
where, for the uniform case,
\mathrm{OPD}_{UR}=\frac{1}{k}\sum_{d=0}^{k-1}
d\Big[\mathbb{P}\{Y\le k-d\}-\mathbb{P}\{Y\le d\} + \mathbb{P}\{Y\ge d+1\}\Big],
which is the discrete-\{1,\dots,k\} version of the expression shown in the manuscript.
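As an illustration of the two expectations defined above, the sketch below evaluates \mathbb{E}|\hat Y-Y| directly as a double sum over all pairs of categories, using only the stated independence of \hat Y and Y. It is not the package's code, and the name `opd_ref_sketch` is hypothetical. Whether `opdRef` returns exactly these values is an assumption that should be checked against the installed package.

```r
# Hypothetical helper: E|Yhat - Y| computed directly from the definitions.
opd_ref_sketch <- function(p) {
  k <- length(p)
  # d[i, j] = |i - j| for all pairs of categories
  d <- abs(outer(seq_len(k), seq_len(k), "-"))
  u <- rep(1 / k, k)  # uniform distribution over the k categories
  c(
    OPDempDist = sum(outer(p, p) * d),  # Yhat ~ p, independent of Y ~ p
    OPDur      = sum(outer(u, p) * d)   # Yhat ~ Unif{1..k}, independent of Y ~ p
  )
}

p <- c(0.10, 0.20, 0.40, 0.20, 0.10)
opd_ref_sketch(p)
```

The double sum costs O(k^2), which is negligible for the small number of categories typical of ordinal responses.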
Value
A named numeric vector of length two:
c(OPDempDist = ..., OPDur = ...).
See Also
nopa,
ordPredArgmax,
ordPredRandom
Examples
# Example with k = 5 categories and an empirical distribution p:
p <- c(0.10, 0.20, 0.40, 0.20, 0.10)
opdRef(p)
Argmax Mapping from an Estimated Probability Distribution (EPD) to a Predicted Class
Description
Deterministically maps each row of an estimated probability distribution (EPD) matrix to a single predicted class by taking the index of the maximum probability. Rows are normalized to sum to one (within tolerance). Ties can be broken by first, last, or at random among maximizers.
Usage
ordPredArgmax(P, tie_break = c("first", "random", "last"), tol = 1e-12)
Arguments
P |
A numeric matrix of size |
tie_break |
Character string indicating how to break ties among
equal maxima. One of |
tol |
Numeric tolerance used for (i) row-sum checks and (ii) equality
when identifying ties among maximum probabilities. Defaults to |
Details
The function normalizes each row of P to sum to one (within
tol). Rows with (near) zero total probability trigger an error.
If multiple classes achieve the same (within tol) maximum probability,
the returned class depends on tie_break:
- "first": smallest index among maximizers (default).
- "last": largest index among maximizers.
- "random": one index sampled uniformly from the set of maximizers.
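The tie-breaking rules above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the package's source. The name `argmax_sketch` is hypothetical, and treating NA as 0 is an assumption taken from the comment in the example for this function.

```r
# Hypothetical helper illustrating argmax with tolerance-based tie-breaking.
argmax_sketch <- function(P, tie_break = c("first", "random", "last"),
                          tol = 1e-12) {
  tie_break <- match.arg(tie_break)
  P[is.na(P)] <- 0                      # assumption: NA treated as 0
  rs <- rowSums(P)
  if (any(rs <= tol)) stop("row with (near) zero total probability")
  P <- P / rs                           # divides row i by rs[i]
  vapply(seq_len(nrow(P)), function(i) {
    # All indices whose probability is within tol of the row maximum
    idx <- which(P[i, ] >= max(P[i, ]) - tol)
    switch(tie_break,
      first  = idx[1],
      last   = idx[length(idx)],
      random = idx[sample.int(length(idx), 1)]
    )
  }, integer(1))
}

P <- rbind(
  c(0.05, 0.10, 0.25, 0.60),
  c(0.40, 0.40, 0.10, 0.10),  # tie between classes 1 and 2
  c(NA,   0.20, 0.80, 0.00)
)
argmax_sketch(P, tie_break = "first")
argmax_sketch(P, tie_break = "last")
```

The "random" branch indexes with `sample.int()` rather than calling `sample(idx, 1)`, because `sample()` on a single number x draws from 1:x instead of returning x.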
Value
An integer vector of length n with the predicted class indices
in \{1,\ldots,k\} for each row of P.
Examples
P <- rbind(
  c(0.05, 0.10, 0.25, 0.60),
  c(0.40, 0.40, 0.10, 0.10),  # tie between classes 1 and 2
  c(NA,   0.20, 0.80, 0.00)   # NA treated as 0
)
ordPredArgmax(P)                      # default tie_break = "first"
ordPredArgmax(P, tie_break = "last")
Randomized Mapping from an Estimated Probability Distribution (EPD) to a Predicted Class
Description
Stochastically maps each row of an estimated probability distribution (EPD)
matrix to a single predicted class by drawing one sample from the row's
categorical distribution. Rows are normalized to sum to one (within tolerance),
and the cut-points method is used with intervals (c_{i,j-1}, c_{i,j}],
ensuring z_i=1 maps to class k.
Usage
ordPredRandom(P, z = NULL, tol = 1e-12)
Arguments
P |
A numeric matrix of size |
z |
Optional numeric vector of length |
tol |
Numeric tolerance used for row-sum checks and for guarding against
underflow when normalizing. Defaults to |
Details
The mapping follows the cumulative cut-points
c_{i,0}=0, c_{i,j}=\sum_{\ell=1}^j \hat\pi_{i\ell} for
j=1,\ldots,k, and assigns class j whenever
c_{i,j-1} < z_i \le c_{i,j}. When z is supplied, values are
clipped to (0,1] to respect interval boundaries. Rows with (near) zero
total probability trigger an error.
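The cut-points rule described above can be sketched as follows. This is a minimal illustration, not the package's implementation. The name `random_sketch` is hypothetical, and the lower clipping bound `.Machine$double.eps` is an assumed way of keeping z inside (0,1].

```r
# Hypothetical helper illustrating the cumulative cut-points mapping.
random_sketch <- function(P, z = NULL, tol = 1e-12) {
  rs <- rowSums(P)
  if (any(rs <= tol)) stop("row with (near) zero total probability")
  P <- P / rs                                  # normalize each row to sum to 1
  n <- nrow(P)
  k <- ncol(P)
  if (is.null(z)) z <- runif(n)                # one uniform draw per row
  z <- pmin(pmax(z, .Machine$double.eps), 1)   # clip to (0, 1] (assumed bound)
  C <- t(apply(P, 1, cumsum))                  # C[i, j] = c_{i,j}
  C[, k] <- 1                                  # guard against rounding in the last cut-point
  # Class j is the first index with z_i <= c_{i,j}, i.e. c_{i,j-1} < z_i <= c_{i,j}
  vapply(seq_len(n), function(i) which(z[i] <= C[i, ])[1], integer(1))
}

P <- rbind(
  c(0.05, 0.10, 0.25, 0.60),
  c(0.40, 0.40, 0.10, 0.10),
  c(0.00, 0.20, 0.80, 0.00)
)
random_sketch(P, z = c(0.2, 0.85, 1.0))
```

Because the intervals are closed on the right, a category with zero probability has an empty interval and is never selected in this sketch.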
Value
An integer vector of length n with the predicted class indices
in \{1,\ldots,k\} for each row of P.
Examples
set.seed(1)
P <- rbind(
  c(0.05, 0.10, 0.25, 0.60),
  c(0.40, 0.40, 0.10, 0.10),
  c(0.00, 0.20, 0.80, 0.00)
)
# Stochastic draws from each row's EPD
ordPredRandom(P)
# Reproducible draws using provided uniforms
z <- c(0.2, 0.85, 1.0)
ordPredRandom(P, z = z)