README.Rmd

---
output: github_document
bibliography: vignettes/bibliography.bib
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
library(ggplot2)
library(outlierensembles)
library(dbscan)
```

# outlierensembles
<img src='man/figures/logo.png' align="right" height="138" />

 <!-- badges: start -->
  [![R-CMD-check](https://github.com/sevvandi/outlierensembles/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/sevvandi/outlierensembles/actions/workflows/R-CMD-check.yaml)
  <!-- badges: end -->

**outlierensembles** provides a collection of outlier/anomaly detection ensembles. Given the anomaly scores of different anomaly detection methods, the following ensemble techniques can be used to construct an ensemble score:

1. Item Response Theory based ensemble discussed in @kandanaarachchiirtensemble
2. Greedy ensemble discussed in @Schubert2012
3. Inverse Cluster Weighted Averaging (ICWA) method discussed in @Chiang2017
4. Using Maximum scores discussed in @Aggarwal2015
5. Using a threshold sum discussed in @Aggarwal2015
6. Using the mean as the ensemble score


## Installation

You can install the released version of outlierensembles from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("outlierensembles")
```

And the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("sevvandi/outlierensembles")
```
## Example
We use methods from dbscan R package as to find anomalies. You can use any anomaly detection method you want to build the ensemble.  First, we construct the IRT ensemble. The colors show the ensemble scores. 
```{r example1}
faithfulu <- scale(faithful)
# Using different parameters of lof for anomaly detection
y1 <- dbscan::lof(faithfulu, minPts = 5)
y2 <- dbscan::lof(faithfulu, minPts = 10)
y3 <- dbscan::lof(faithfulu, minPts = 20)
knnobj <- dbscan::kNN(faithfulu, k = 20)
# Using different KNN distances as anomaly scores
y4 <- knnobj$dist[ ,5]
y5 <- knnobj$dist[ ,10]
y6 <- knnobj$dist[ ,20]
# Dense points are less anomalous. Points in less dense areas are more anomalous. Hence 1 - pointdensity is used.
y7 <- 1 - dbscan::pointdensity(faithfulu, eps = 1, type="gaussian")
y8 <- 1 - dbscan::pointdensity(faithfulu, eps = 2, type = "gaussian")
y9 <- 1 - dbscan::pointdensity(faithfulu, eps = 0.5, type = "gaussian")


Y <- cbind.data.frame(y1, y2, y3, y4, y5, y6, y7, y8, y9)
ens1 <- irt_ensemble(Y)
df <- cbind.data.frame(faithful, ens1$scores)
colnames(df)[3] <- "IRT"
ggplot(df, aes(eruptions, waiting)) + geom_point(aes(color=IRT))  +  scale_color_gradient(low="yellow", high="red") 
```

Then we do the greedy ensemble. 
```{r example2}
ens2 <- greedy_ensemble(Y)
df <- cbind.data.frame(faithful, ens2$scores)
colnames(df)[3] <- "Greedy"
ggplot(df, aes(eruptions, waiting)) + geom_point(aes(color=Greedy)) +  scale_color_gradient(low="yellow", high="red") 
```


We do the ICWA ensemble next.
```{r example3}
ens3 <- icwa_ensemble(Y)
df <- cbind.data.frame(faithful, ens3)
colnames(df)[3] <- "ICWA"
ggplot(df, aes(eruptions, waiting)) + geom_point(aes(color=ICWA)) +  scale_color_gradient(low="yellow", high="red") 
```

Next, we use the maximum scores to build the ensemble.
```{r example4}
ens4 <- max_ensemble(Y)
df <- cbind.data.frame(faithful, ens4)
colnames(df)[3] <- "Max"
ggplot(df, aes(eruptions, waiting)) + geom_point(aes(color=Max)) +  scale_color_gradient(low="yellow", high="red") 
```

Then, we use the a threshold sum to construct the ensemble.
```{r example5}
ens5 <- threshold_ensemble(Y)
df <- cbind.data.frame(faithful, ens5)
colnames(df)[3] <- "Threshold"
ggplot(df, aes(eruptions, waiting)) + geom_point(aes(color=Threshold)) +  scale_color_gradient(low="yellow", high="red") 
```

Finally, we use the mean values as the ensemble score. 
```{r example6}
ens6 <- average_ensemble(Y)
df <- cbind.data.frame(faithful, ens6)
colnames(df)[3] <- "Average"
ggplot(df, aes(eruptions, waiting)) + geom_point(aes(color=Average)) +  scale_color_gradient(low="yellow", high="red") 
```

## References