Description
After studying the performance and data profiles, I think we can generalize their definition and implement something that is extensible and not so focused on mathematical programming.
This proposal is a brain dump of what I am thinking and may not reflect the final product.
cc. @rgaiacs @lrsantos11
## Generalization of perprof
Suppose we have a set of algorithms, indexed by $a$, and a set of problems, indexed by $p$.
The performance profile defines these quantities:

- $c_{ap} \in (0,\infty) \cup \{\infty\}$: cost of algorithm $a$ to find a good enough solution to problem $p$.
- $\displaystyle c^+_p = \min_a c_{ap}$: the lowest cost to solve $p$.
- $r_{ap} = \dfrac{c_{ap}}{c^+_p}$: the relative cost of solving problem $p$. Use $r_{ap} = \infty$ if $c^+_p = \infty$.
The performance profile is the cumulative distribution of the relative cost:

$$P_a(t) = \frac{\#\{p : r_{ap} \le t\}}{\#\{p\}}.$$

Above, $\#S$ is the number of elements of the set $S$, so $P_a(t)$ is the fraction of problems that algorithm $a$ solves with a relative cost of at most $t$.
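To make the definition concrete, here is a minimal sketch of computing a performance profile from a cost matrix. This is an illustration, not the proposed API; the function name and the problems-by-algorithms array layout are my assumptions.

```python
import numpy as np

def performance_profile(costs: np.ndarray, taus: np.ndarray) -> np.ndarray:
    """Cumulative distribution of relative costs.

    costs: shape (n_problems, n_algorithms); np.inf marks failures.
    Returns shape (len(taus), n_algorithms).
    """
    best = costs.min(axis=1, keepdims=True)  # c^+_p per problem
    # r_{ap} = c_{ap} / c^+_p; infinite when no algorithm solved p.
    ratios = np.where(np.isfinite(best), costs / best, np.inf)
    # Fraction of problems with relative cost <= tau, per algorithm.
    return np.array([(ratios <= t).mean(axis=0) for t in taus])

costs = np.array([[1.0, 2.0],
                  [3.0, 1.5],
                  [np.inf, 4.0]])  # algorithm 1 fails on problem 3
profile = performance_profile(costs, np.array([1.0, 2.0, 3.0]))
```

For instance, `profile[0]` gives, for each algorithm, the fraction of problems on which it was the cheapest ($r_{ap} \le 1$).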
The data profile has two parts. The first is to change the definition of the relative cost by defining

- $d_p \in (0,\infty)$: difficulty of solving problem $p$. Higher is more difficult.
- $r_{ap} = \dfrac{c_{ap}}{d_p}$: the relative cost of solving problem $p$.
In other words, the first thing that the data profile changes is that the cost is relative to the problem, not to the fastest algorithm.
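Under this change, computing the relative cost is a plain elementwise division of costs by difficulties. A tiny illustration (the problems-by-algorithms array layout is an assumption, not part of the proposal):

```python
import numpy as np

# Costs c_{ap}: one row per problem, one column per algorithm.
costs = np.array([[10.0, 20.0],
                  [30.0, 15.0]])
# Difficulty d_p of each problem.
difficulty = np.array([5.0, 3.0])

# r_{ap} = c_{ap} / d_p: cost in units of the problem's difficulty,
# independent of how the other algorithms performed.
relative = costs / difficulty[:, None]
# relative == [[2.0, 4.0], [10.0, 5.0]]
```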
The problem is that this measure alone is not enough for algorithms that don't know if they solved the problem.
The second part of the data profile deals with this situation.
For example, a black-box optimization algorithm will know that it is improving, but it will never be sure that it found the solution.
In these cases, it is acceptable to compare the solutions found by the algorithms, declare the best one the goal, and use a quality threshold to decide whether the remaining algorithms have found an acceptable solution.
For instance, in optimization we aim to minimize the objective function $f$.
Assume that we have, for each algorithm, the list of best objective values per iteration per problem, $f_{ap}^0 \ge f_{ap}^1 \ge \cdots \ge f_{ap}^N$ (notice that they all start from the same point, so $f_{ap}^0$ is the same for every $a$).
The best solution will be $f^{\min}_p = \min_a f_{ap}^N$, and a relative measure per iteration based on this best would be

$$q_{ap}^i = \frac{f_{ap}^0 - f_{ap}^i}{f_{ap}^0 - f^{\min}_p}.$$

This measure ranges from 0% to 100%.
We can then say that an algorithm has found an acceptable solution if there is some iteration $i$ at which the relative measure reaches $1 - \tau$, where $\tau \in (0,1)$ is the quality threshold.
Now, the data profile finalizes by redefining

- $c_{ap}$: the cost of reaching the first acceptable iteration $i$, if it exists, or $\infty$ otherwise.
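A minimal sketch of this redefinition for one algorithm/problem pair, assuming the acceptance test sketched above (the relative measure reaching $1 - \tau$); the function name and signature are illustrative, not the proposed API:

```python
import numpy as np

def cost_to_acceptable(f_history, costs, f_min, tau=0.1):
    """Cost c_{ap} of the first iterate whose relative measure
    q^i = (f^0 - f^i) / (f^0 - f_min) reaches 1 - tau,
    or infinity if no iterate is acceptable."""
    f = np.asarray(f_history, dtype=float)
    c = np.asarray(costs, dtype=float)
    if f[0] == f_min:  # already started at the best value
        return c[0]
    q = (f[0] - f) / (f[0] - f_min)
    hits = np.nonzero(q >= 1.0 - tau)[0]
    return c[hits[0]] if hits.size else np.inf

# Objective values per iteration and cumulative function evaluations.
f_hist = [10.0, 5.0, 2.0, 1.0]
evals = [1, 2, 3, 4]
print(cost_to_acceptable(f_hist, evals, f_min=1.0, tau=0.1))  # prints 4.0
print(cost_to_acceptable(f_hist, evals, f_min=1.0, tau=0.5))  # prints 2.0
```

Tightening `tau` demands a higher-quality solution, so the reported cost grows (or becomes infinite).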
In the original data profile paper, the quality was the objective value as we defined above, and the cost was the number of function evaluations.
The difficulty $d_p$ was $n_p + 1$, where $n_p$ is the number of variables of problem $p$, which is the number of function evaluations needed to estimate a simplex gradient.
So, the relative cost of solving the problem is measured in units of simplex gradients.
Under this condition, the probability of solving a problem with a relative cost up to $\alpha$ is the fraction of problems the algorithm solves using at most $\alpha$ simplex-gradient estimates.
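Putting these pieces together, a data profile in the spirit of the original paper could be computed as below. This is a sketch, assuming difficulty $d_p = n_p + 1$ and a problems-by-algorithms cost matrix; it is not the proposed API.

```python
import numpy as np

def data_profile(costs, n_vars, alphas):
    """Fraction of problems solved within alpha simplex gradients.

    costs:  (n_problems, n_algorithms) function evaluations needed to
            reach an acceptable solution; np.inf marks failures.
    n_vars: number of variables n_p of each problem; d_p = n_p + 1.
    """
    difficulty = np.asarray(n_vars, dtype=float) + 1.0
    ratios = costs / difficulty[:, None]  # cost in simplex gradients
    return np.array([(ratios <= a).mean(axis=0) for a in alphas])

costs = np.array([[6.0, 12.0],
                  [8.0, np.inf]])  # algorithm 2 fails on problem 2
curve = data_profile(costs, n_vars=[2, 3], alphas=[1.0, 2.0, 4.0])
```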
## Revamp proposal
- I want to implement the structures to handle the performance and data profiles described above, and possibly others, such as Sven Leyffer's extended profile.
- I want to be able to experiment with new profiles easily.
- I want to be able to run it from Python directly
- I want a lightweight application
- I want to have extendable plotting backends (preferably keeping it lightweight)
- I want to have the CLI separate from the package
- I want a web version that I can use for presentations, where I can show other performance metrics and use dials to control different parameters (such as the threshold quality)
## Technical details
This is a very early draft.
### Example
```python
from typing import List, Union

import perprof as pp
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# From a DataFrame
pp.performance_profile(
    results: pd.DataFrame,
    columns: Union[str, List[str]] = "all",  # Which columns are solvers
)
pp.data_profile(
    results: pd.DataFrame,
    difficulty: str,  # Which column is the difficulty
    columns="all",  # Which other columns are solvers
)

# From vectors
pp.performance_profile(solver1: np.ndarray, solver2: np.ndarray, ...)
pp.data_profile(difficulty, solver1, solver2, ...)

# From dictionaries
pp.performance_profile({'solver 1': [...], 'solver 2': [...]})
pp.data_profile({'difficulty': [...], 'solver 1': [...], 'solver 2': [...]})
pp.data_profile({
    'difficulty': [...],
    'solver 1': {
        'problem 1': [...],  # Cost to achieve the desired quality
        'problem 2': [...],
        ...
    },
})
pp.data_profile(
    {
        'difficulty': [...],
        'solver 1': {
            'problem 1': {'quality': [...], 'cost': [...]},
            'problem 2': {'quality': [...], 'cost': [...]},
            ...
        },
    },
    quality_threshold=...,
)
```
### Internals

#### Classes
- abstract AlgorithmData: stores the information of a given algorithm. By default, it contains the cost per problem.
  - AlgorithmDataForQualityOverCost
- abstract RelativeCost
  - RelativeCostVSBest
  - RelativeCostVSDifficulty
  - RelativeCostVSBestExtended
- abstract Backend: implements plotting of the cumulative relative cost.
  - MatplotlibBackend
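A minimal sketch of how the RelativeCost part of this hierarchy might look; the method names and signatures are my assumptions, not settled design:

```python
from abc import ABC, abstractmethod

import numpy as np

class RelativeCost(ABC):
    """Turns raw costs c_{ap} into relative costs r_{ap}."""

    @abstractmethod
    def compute(self, costs: np.ndarray) -> np.ndarray:
        """costs: one row per problem, one column per algorithm."""

class RelativeCostVSBest(RelativeCost):
    """Performance profile: r_{ap} = c_{ap} / min_a c_{ap}."""

    def compute(self, costs):
        best = costs.min(axis=1, keepdims=True)
        return np.where(np.isfinite(best), costs / best, np.inf)

class RelativeCostVSDifficulty(RelativeCost):
    """Data profile: r_{ap} = c_{ap} / d_p."""

    def __init__(self, difficulty):
        self.difficulty = np.asarray(difficulty, dtype=float)

    def compute(self, costs):
        return costs / self.difficulty[:, None]

costs = np.array([[2.0, 4.0],
                  [3.0, 6.0]])
r_best = RelativeCostVSBest().compute(costs)
r_diff = RelativeCostVSDifficulty([2.0, 3.0]).compute(costs)
```

Keeping the relative-cost rule behind one abstract interface is what lets the cumulative-distribution and plotting code stay identical across profile types.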