Title: | Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers |
---|---|
Description: | Two classifiers for open set recognition and novelty detection based on extreme value theory. The first classifier is based on the generalized Pareto distribution (GPD) and the second classifier is based on the generalized extreme value (GEV) distribution. For details, see Vignotto, E., & Engelke, S. (2018) <arXiv:1808.09902>. |
Authors: | Edoardo Vignotto [aut, cre] |
Maintainer: | Edoardo Vignotto <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2025-02-13 04:12:00 UTC |
Source: | https://github.com/cran/evtclass |
This function is used to evaluate a test set for a pre-trained GEV classifier. It can be used to perform open set classification based on the generalized extreme value distribution.
gevcTest(train, test, pre, prob = TRUE, alpha)
train | a data matrix containing the train data. Class labels should not be included. |
test | a data matrix containing the test data. |
pre | a numeric vector of parameters obtained with the function gevcTrain. |
prob | logical indicating whether p-values should be returned. |
alpha | threshold to be used if prob is equal to FALSE. |
For details on the method and parameters see Vignotto and Engelke (2018).
If prob is equal to TRUE, a vector containing the p-values for each point is returned. A high p-value results in the classification of the corresponding test point as known, since this hypothesis cannot be rejected. If the p-value is small, the corresponding test point is classified as unknown. If prob is equal to FALSE, a vector of predicted values is returned.
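The decision rule described above can be sketched in base R. The p-values below are made up for illustration, and the labels "known" and "unknown" are an assumed encoding; the format of the predicted values actually returned by the package is not specified here.

```r
# Hypothetical p-values, as might be returned with prob = TRUE (illustrative only)
p_values <- c(0.62, 0.004, 0.31, 0.0009, 0.08)

# Threshold chosen by the user; 0.01 is only an example value
alpha <- 0.01

# A small p-value rejects the "known" hypothesis, so the point is flagged as unknown.
# The strict comparison and the labels are assumptions, not the package's output format.
decision <- ifelse(p_values < alpha, "unknown", "known")

print(data.frame(p_values, decision))
```

Working from p-values rather than hard predictions keeps the choice of threshold open until after the test set has been scored.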
Edoardo Vignotto
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
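library(evtclass)  # provides LETTER, gevcTrain and gevcTest
# Use the first 15000 rows for training; drop the label column from the test set
trainset <- LETTER[1:15000,]
testset <- LETTER[-(1:15000), -1]
# Treat class 1 as the only known class
knowns <- trainset[trainset$class==1, -1]
gevClassifier <- gevcTrain(train = knowns)
predicted <- gevcTest(train = knowns, test = testset, pre = gevClassifier)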
This function is used to train a GEV classifier. It can be used to perform open set classification based on the generalized extreme value distribution.
gevcTrain(train)
train | a data matrix containing the train data. Class labels should not be included. |
For details on the method and parameters see Vignotto and Engelke (2018).
A numeric vector of two elements containing the estimated parameters of the fitted reversed Weibull distribution.
Data are not scaled internally; any preprocessing has to be done externally.
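Because scaling is left to the user, one possible preprocessing step is to standardize the training data and reuse the same constants for the test data. The sketch below uses synthetic matrices and base R only; standardization is an assumed choice for illustration, not one prescribed by the package.

```r
set.seed(1)

# Synthetic stand-ins for the train and test matrices (illustrative only)
train_raw <- matrix(rnorm(200, mean = 5, sd = 3), ncol = 4)
test_raw  <- matrix(rnorm(80,  mean = 5, sd = 3), ncol = 4)

# Standardize the training data and keep the centering and scaling constants
train_scaled <- scale(train_raw)
centers <- attr(train_scaled, "scaled:center")
scales  <- attr(train_scaled, "scaled:scale")

# Apply the training constants to the test data so both sets share one scale
test_scaled <- scale(test_raw, center = centers, scale = scales)

# train_scaled and test_scaled would then be supplied as `train` and `test`
```

Reusing the training constants avoids leaking information from the test set into the preprocessing step.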
Edoardo Vignotto
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
library(evtclass)  # provides LETTER and gevcTrain
trainset <- LETTER[1:15000,]
knowns <- trainset[trainset$class==1, -1]  # class 1 as the known class, labels dropped
gevClassifier <- gevcTrain(train = knowns)
This function is used to evaluate a test set for a pre-trained GPD classifier. It can be used to perform open set classification based on the generalized Pareto distribution.
gpdcTest(train, test, pre, prob = TRUE, alpha = 0.01)
train | a data matrix containing the train data. Class labels should not be included. |
test | a data matrix containing the test data. |
pre | a list obtained with the function gpdcTrain. |
prob | logical indicating whether p-values should be returned. |
alpha | threshold to be used if prob is equal to FALSE. |
For details on the method and parameters see Vignotto and Engelke (2018).
If prob is equal to TRUE, a vector containing the p-values for each point is returned. A high p-value results in the classification of the corresponding test point as known, since this hypothesis cannot be rejected. If the p-value is small, the corresponding test point is classified as unknown. If prob is equal to FALSE, a vector of predicted values is returned.
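To see how the choice of alpha affects the outcome, the following base R sketch counts how many points would be flagged as unknown at several thresholds. The p-values are hypothetical, and the strict comparison against alpha is an assumption about the decision rule.

```r
# Hypothetical p-values for six test points (illustrative only)
p_values <- c(0.50, 0.03, 0.007, 0.20, 0.0004, 0.012)

# Candidate thresholds; 0.01 is the default shown in the usage of gpdcTest
alphas <- c(0.001, 0.01, 0.05)

# Count how many points fall below each threshold
n_unknown <- sapply(alphas, function(a) sum(p_values < a))
names(n_unknown) <- alphas

print(n_unknown)
```

A larger alpha can only flag more points as unknown, so the counts are non-decreasing in alpha.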
Edoardo Vignotto
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
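library(evtclass)  # provides LETTER, gpdcTrain and gpdcTest
# Use the first 15000 rows for training; drop the label column from the test set
trainset <- LETTER[1:15000,]
testset <- LETTER[-(1:15000), -1]
# Treat class 1 as the only known class
knowns <- trainset[trainset$class==1, -1]
gpdClassifier <- gpdcTrain(train = knowns, k = 10)
predicted <- gpdcTest(train = knowns, test = testset, pre = gpdClassifier)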
This function is used to train a GPD classifier. It can be used to perform open set classification based on the generalized Pareto distribution.
gpdcTrain(train, k)
train | a data matrix containing the train data. Class labels should not be included. |
k | the number of upper order statistics to be used. |
For details on the method and parameters see Vignotto and Engelke (2018).
A list of three elements.
pshapes | the estimated rescaled shape parameters for each point in the training dataset. |
balls | the estimated radius for each point in the training dataset. |
k | the number of upper order statistics used. |
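The structure of the returned list can be illustrated with a mock object. The element names follow the description above; the values are made up, and the assumption that pshapes and balls are plain numeric vectors with one entry per training point is mine, not confirmed by the package.

```r
set.seed(3)

# Number of training points in this mock example (illustrative only)
n_train <- 5

# Mock object mimicking the documented structure; values are made up
gpd_fit <- list(
  pshapes = runif(n_train),  # one rescaled shape parameter per training point
  balls   = runif(n_train),  # one radius per training point
  k       = 3                # number of upper order statistics used
)

# Inspect the structure and access elements by name
str(gpd_fit)
print(gpd_fit$k)
```

An object of this kind is what would be passed as the pre argument when evaluating a test set.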
Data are not scaled internally; any preprocessing has to be done externally.
Edoardo Vignotto
Vignotto, E., & Engelke, S. (2018). Extreme Value Theory for Open Set Classification - GPD and GEV Classifiers. arXiv preprint arXiv:1808.09902.
library(evtclass)  # provides LETTER and gpdcTrain
trainset <- LETTER[1:15000,]
knowns <- trainset[trainset$class==1, -1]  # class 1 as the known class, labels dropped
gpdClassifier <- gpdcTrain(train = knowns, k = 10)
A dataset containing 16 features extracted from 20000 handwritten characters.
LETTER
A data frame with 20000 rows and 17 variables:
class labels
first extracted feature
second extracted feature
third extracted feature
4th extracted feature
5th extracted feature
6th extracted feature
7th extracted feature
8th extracted feature
9th extracted feature
10th extracted feature
11th extracted feature
12th extracted feature
13th extracted feature
14th extracted feature
15th extracted feature
16th extracted feature
https://archive.ics.uci.edu/ml/datasets/letter+recognition/
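The indexing used in the examples for this dataset can be tried on a small stand-in data frame. The sketch below assumes only what is described above: a class label in the first column, named class as in the examples, followed by feature columns. The feature column names f1 to f3 and all values are hypothetical.

```r
set.seed(7)

# Toy stand-in for LETTER: labels first, then features (names are hypothetical)
toy <- data.frame(
  class = sample(1:3, 20, replace = TRUE),
  f1 = rnorm(20),
  f2 = rnorm(20),
  f3 = rnorm(20)
)

n_train <- 15

# First n_train rows form the training set, labels kept for filtering
trainset <- toy[1:n_train, ]

# Remaining rows form the test set; -1 drops the label column
testset <- toy[-(1:n_train), -1]

# Rows of class 1 are treated as the known class; the label column is dropped
knowns <- trainset[trainset$class == 1, -1]

str(knowns)
```

Dropping the first column matters because the training and test functions expect data without class labels.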