This project fits multivariate distributions using a copula approach. To capture complex dependencies between the random variables, we extend the classical model with a constant parameter to a stochastic model in which the parameter is modeled as an Ornstein-Uhlenbeck process. The idea is based on the discrete-time model suggested in Liesenfeld and Richard, Univariate and multivariate stochastic volatility models (2003).
For this model we introduce a likelihood function as an expectation over all states of the random process and use Monte Carlo estimates of that expectation when solving the parameter estimation problem. The approach is discussed in detail in our article (a link will be available once it is published). For a brief description and code examples see below. The Frank, Gumbel, Clayton and Joe copulas are implemented in the code. A goodness-of-fit (GoF) metric based on the Rosenblatt transform is also available.
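Schematically (this is our shorthand, not the exact formulas of the article), the likelihood of pseudo observations $u_t$ under a latent parameter process $\lambda_t$ is an expectation over the process states, which is replaced by an average over $K$ simulated trajectories:

$$L(\alpha) = \mathbb{E}\left[\,\prod_{t=1}^{T} c\big(u_t;\, \lambda_t\big)\right] \approx \frac{1}{K}\sum_{k=1}^{K} \prod_{t=1}^{T} c\big(u_t;\, \lambda_t^{(k)}\big),$$

where $c$ is the copula density and $\alpha$ collects the parameters of the latent process (cf. the `latent_process_tr` argument below, which sets $K$).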
The stochastic models are used here to calculate the VaR and CVaR metrics and to build a CVaR-optimized portfolio. The calculation is based on the paper Rockafellar and Uryasev, Optimization of conditional value-at-risk (2000).
This project was made during the MSc program "Financial mathematics and financial technologies" at Sirius University, Sochi, Russia, 2023-2024.
Installation can be done using pip:

```shell
git clone https://github.com/AANovokhatskiy/pyscarcopula
cd pyscarcopula
pip install .
```
According to Sklar's theorem, a joint distribution can be constructed from marginal distributions using a special function called a copula:

$$F(x_1, \ldots, x_d) = C\big(F_1(x_1), \ldots, F_d(x_d)\big).$$

Here we consider only the class of single-parameter copulas known as Archimedean. Such a copula is defined through its generator $\varphi$ as

$$C(u_1, \ldots, u_d) = \varphi^{[-1]}\big(\varphi(u_1) + \cdots + \varphi(u_d)\big).$$
Here we use the following copulas (standard generators; $\lambda$ denotes the copula parameter):

Copula | Generator $\varphi(t)$ | Parameter range
---|---|---
Gumbel | $(-\ln t)^{\lambda}$ | $\lambda \in [1, \infty)$
Frank | $-\ln\dfrac{e^{-\lambda t} - 1}{e^{-\lambda} - 1}$ | $\lambda \in \mathbb{R} \setminus \{0\}$
Joe | $-\ln\big(1 - (1 - t)^{\lambda}\big)$ | $\lambda \in [1, \infty)$
Clayton | $\dfrac{1}{\lambda}\big(t^{-\lambda} - 1\big)$ | $\lambda \in (0, \infty)$
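As a quick illustration of the Archimedean construction, the Gumbel copula can be evaluated directly from its generator. This is a standalone numpy sketch; the function names are ours, not the package API:

```python
import numpy as np

# Standard Gumbel generator and its inverse (theta >= 1).
def gumbel_gen(t, theta):
    return (-np.log(t)) ** theta

def gumbel_gen_inv(s, theta):
    return np.exp(-s ** (1.0 / theta))

def gumbel_cdf(u, theta):
    # Archimedean construction: C(u1,...,ud) = phi^{-1}(phi(u1) + ... + phi(ud))
    u = np.asarray(u, dtype=float)
    return gumbel_gen_inv(np.sum(gumbel_gen(u, theta), axis=-1), theta)

# theta = 1 reduces to the independence copula, so C(0.5, 0.5) = 0.25
print(gumbel_cdf([0.5, 0.5], theta=1.0))
```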
The classical approach implies that the copula parameter is constant in time,

$$\lambda_t = \mathrm{const},$$

whereas in the stochastic model the latent process follows an Ornstein-Uhlenbeck SDE,

$$d\lambda_t = \theta(\mu - \lambda_t)\,dt + \nu\,dW_t,$$

and the copula parameter is obtained from it through a suitable transform.
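The latent OU dynamics can be simulated with a simple Euler-Maruyama scheme. This is a standalone sketch with illustrative parameter values, not the package's sampler:

```python
import numpy as np

# Euler-Maruyama simulation of d(lambda_t) = theta*(mu - lambda_t) dt + nu dW_t.
# theta, mu, nu are illustrative values, not fitted parameters.
rng = np.random.default_rng(333)
theta, mu, nu = 2.0, 1.5, 0.3   # mean-reversion speed, long-run mean, volatility
T, n = 1.0, 500
dt = T / n
lam = np.empty(n + 1)
lam[0] = mu
dw = rng.normal(0.0, np.sqrt(dt), size=n)  # Wiener increments (cf. the dwt argument of fit)
for k in range(n):
    lam[k + 1] = lam[k] + theta * (mu - lam[k]) * dt + nu * dw[k]
```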
This approach for the discrete case was proposed by Liesenfeld and Richard, Univariate and Multivariate Stochastic Volatility Models: Estimation and Diagnostics (2003) and used in Hafner and Manner, Dynamic stochastic copula models: estimation, inference and applications (2012). Here we develop this approach in the continuous case.
For copula GoF we use the Rosenblatt transform. For a given dataset of multivariate pseudo observations, the Rosenblatt transform of the data under the fitted copula is uniformly distributed (see details in Hering and Hofert, Goodness-of-fit Tests for Archimedean Copulas in High Dimensions (2015)). This fact can be checked using various methods; we use the scipy implementation of the Cramér-von Mises test.
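As an illustration, a uniformity check of this kind can be run with scipy directly. Here synthetic uniform data stands in for an actual Rosenblatt-transformed sample:

```python
import numpy as np
from scipy.stats import cramervonmises

# Under a correct copula fit the Rosenblatt-transformed sample should be U(0, 1).
rng = np.random.default_rng(0)
transformed = rng.uniform(size=500)  # stand-in for real Rosenblatt transform output

res = cramervonmises(transformed, 'uniform')
print(res.statistic, res.pvalue)  # a large p-value means no evidence against uniformity
```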
Let $f(w, r)$ be the loss of a portfolio with weights $w$ and asset returns $r$, and for a significance level $\gamma$ define

$$F_\gamma(w, \alpha) = \alpha + \frac{1}{1 - \gamma}\,\mathbb{E}\big[(f(w, r) - \alpha)^{+}\big].$$

Rockafellar and Uryasev, in Optimization of conditional value-at-risk (2000), proved that

$$\mathrm{CVaR}_\gamma(w) = \min_{\alpha} F_\gamma(w, \alpha),$$

and that $\mathrm{VaR}_\gamma(w)$ is a minimizer of $F_\gamma(w, \alpha)$ over $\alpha$. Here we solve these minimization problems numerically using Monte-Carlo estimates of the function $F_\gamma(w, \alpha)$.
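The Monte-Carlo version of this representation is straightforward to sketch in numpy (synthetic normal losses; all names are illustrative, not the package's implementation):

```python
import numpy as np

# Rockafellar-Uryasev representation, sample-average form:
# CVaR_gamma = min_alpha  alpha + mean((loss - alpha)^+) / (1 - gamma),
# with the minimizing alpha equal to VaR at level gamma.
rng = np.random.default_rng(1)
losses = rng.normal(0.0, 0.02, size=100_000)  # simulated portfolio losses
gamma = 0.95

def ru_objective(alpha, losses, gamma):
    return alpha + np.mean(np.maximum(losses - alpha, 0.0)) / (1.0 - gamma)

var = np.quantile(losses, gamma)          # minimizer of the objective
cvar = ru_objective(var, losses, gamma)   # minimum value = CVaR
tail_mean = losses[losses >= var].mean()  # direct tail average for comparison
```

For a large sample the value `cvar` agrees with the empirical mean of losses beyond VaR, which is the usual sanity check for this construction.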
```python
import pandas as pd
import numpy as np

crypto_prices = pd.read_csv("data/crypto_prices.csv", index_col=0, sep = ';')
```
```python
from pyscarcopula import GumbelCopula, FrankCopula, JoeCopula, ClaytonCopula
from pyscarcopula import VineCopula, GaussianCopula, StudentCopula

copula = GumbelCopula(dim = 2)
```

Available copula classes: GumbelCopula, JoeCopula, ClaytonCopula, FrankCopula, VineCopula, GaussianCopula, StudentCopula.
For an initialized copula it is possible to show the cdf as a sympy expression (Archimedean copulas only):

```python
copula.sp_cdf()
```

which outputs the symbolic cdf expression. It is also possible to call sp_pdf() to show the pdf expression, but the answer would be much more complex.
Stochastic methods are also available for VineCopula() through the method parameter (mle by default).
```python
tickers = ['BTC-USD', 'ETH-USD']
returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:501]
returns = returns_pd.values

fit_result = copula.fit(data = returns, method = 'scar-m-ou', seed = 333, to_pobs = True)
fit_result
```
Function fit description:

- data -- initial dataset: log-returns or pseudo observations (see the to_pobs description). Type Numpy Array. Required parameter.
- method -- calculation method. Type Literal. Available methods: mle, scar-p-ou, scar-m-ou, scar-p-ld. Required parameter.
  - mle -- classic method with constant parameter
  - scar-p-ou -- stochastic model with an Ornstein-Uhlenbeck process as the parameter. latent_process_tr = 10000 is a good choice here
  - scar-m-ou -- stochastic model with an Ornstein-Uhlenbeck process as the parameter and an importance sampling Monte-Carlo technique. latent_process_tr = 500 is a good choice here
  - scar-p-ld -- stochastic model with a process with logistic distribution transition density (experimental). latent_process_tr = 10000 is a good choice here
- alpha0 -- starting point for the optimization problem. Type Numpy Array. Optional parameter.
- tol -- stop criterion (gradient norm). Type float. Optional parameter. Default value: $10^{-5}$ for mle, $10^{-2}$ for the other methods.
- to_pobs -- transform data to pseudo observations. Type bool. Optional parameter. Default value False.
- latent_process_tr -- number of latent process trajectories (ignored for mle). Type int. Optional parameter. Default value 500.
- M_iterations -- number of importance sampling steps (only for scar-m-ou). Type int. Optional parameter. Default value 3.
- seed -- random state. Type int. Optional parameter. Default value None. If None then every run of the program is unique. This parameter is ignored if dwt is set explicitly.
- dwt -- Wiener process values. Type Numpy Array. Optional parameter. Default value None. If None then the function generates the dataset automatically.
- stationary -- stochastic model with stationary transition density. Type bool. Default value False. Optional parameter.
- print_path -- if True, shows all output of the optimization process (useful for debugging). Type bool. Default value False. Optional parameter.
The fit result is mostly scipy.optimize.minimize output:

```
message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
success: True
status: 0
fun: 219.33843671302438
x: [ 9.619e-01 2.346e+00 1.451e+00]
nit: 6
jac: [-2.045e-03 5.006e-03 2.361e-03]
nfev: 28
njev: 7
hess_inv: <3x3 LbfgsInvHessProduct with dtype=float64>
name: Gumbel copula
method: scar-m-ou
latent_process_tr: 500
stationary: False
M_iterations: 3
```

where fun is the log-likelihood value and x is the parameter set.
Goodness of fit for the copula using the Rosenblatt transform:

```python
from pyscarcopula.stattests import gof_test

gof_test(copula, returns, fit_result, to_pobs=True)
```

with output

```
CramerVonMisesResult(statistic=0.19260862172251336, pvalue=0.28221824469771717)
```
It is possible to get a sample of pseudo observations from the copula:

```python
# sampling from copula with constant parameter
copula.get_sample(size = 1000, r = 1.2)
```

The output is an array of shape (N, dim). The copula parameter r should be set manually (an array of size N is also supported for sampling with a time-dependent parameter).
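For reference, pseudo observations themselves are usually obtained by a rank transform. A minimal sketch (the package's to_pobs preprocessing may differ in details):

```python
import numpy as np
from scipy.stats import rankdata

# Rank transform u_ij = rank(x_ij) / (n + 1), applied column-wise,
# mapping each margin into (0, 1) without assuming its distribution.
def to_pseudo_obs(x):
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    return rankdata(x, axis=0) / (n + 1)

x = np.array([[0.1, -0.2], [0.5, 0.0], [-0.3, 0.4]])
u = to_pseudo_obs(x)  # values strictly inside (0, 1)
```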
Sampling from the random process state is also available:

```python
# sampling from copula with time-dependent parameter
from pyscarcopula.sampler.sampler_ou import stationary_state_ou

size = 2000
random_process_state = copula.transform(stationary_state_ou(fit_result.x, size))
copula.get_sample(size = size, r = random_process_state)
```
```python
# consider the multivariate case
tickers = ['BTC-USD', 'ETH-USD', 'BNB-USD', 'ADA-USD', 'XRP-USD', 'DOGE-USD']
returns_pd = np.log(crypto_prices[tickers] / crypto_prices[tickers].shift(1))[1:]
returns = returns_pd.values

copula = StudentCopula()

from pyscarcopula.metrics import risk_metrics

gamma = [0.95]
window_len = 250
latent_process_tr = 500
MC_iterations = [int(10**5)]
M_iterations = 5
method = 'mle'
marginals_method = 'johnsonsu'

count_instruments = len(tickers)
portfolio_weight = np.ones(count_instruments) / count_instruments

result = risk_metrics(copula,
                      returns,
                      window_len,
                      gamma,
                      MC_iterations,
                      marginals_method = marginals_method,
                      latent_process_type = method,
                      optimize_portfolio = False,
                      portfolio_weight = portfolio_weight,
                      latent_process_tr = latent_process_tr,
                      M_iterations = M_iterations,
                      )
```
Function risk_metrics description:

- copula -- object of class ArchimedeanCopula (or an inherited class). Type ArchimedeanCopula. Required parameter.
- data -- log-return dataset. Type Numpy Array. Required parameter.
- window_len -- window length. Type int. Required parameter. To use all available data, set it to the length of the data.
- gamma -- significance level. Type float or Numpy Array. If a Numpy Array, the calculation is made for every element of the array (this avoids repeated calculations). Typical values are $0.95$, $0.97$, $0.99$ or the array $[0.95, 0.97, 0.99]$. Default value $0.95$. Optional parameter.
- MC_iterations -- number of Monte-Carlo iterations used for risk metrics calculations. Type int or Numpy Array. If a Numpy Array, the calculation is made for every element of the array. Possible values are, for example, $10^4$, $10^5$, $10^6$ and so on. Default value $10^5$. Optional parameter.
- marginals_params_method -- method of marginal distribution fit. Type Literal. Available methods: normal, hyperbolic, stable, logistic, johnsonsu, laplace. Default value johnsonsu. Optional parameter.
- latent_process_type -- type of stochastic process used as the copula parameter. Type Literal. Available methods: mle, scar-p-ou, scar-m-ou, scar-p-ld. Default value mle. Optional parameter.
- optimize_portfolio -- whether to search for CVaR-optimal weights. Type bool. Optional parameter. Default value True.
- portfolio_weight -- portfolio weights. Type Numpy Array. Optional parameter. If optimize_portfolio = True, this value is ignored. Default value: equal-weighted investment portfolio.
- kwargs -- keyword arguments for the copula fit (see the fit method description). Optional parameter.
Calculated values can be extracted as follows:

```python
var = result[0.95][1000000]['var']
cvar = result[0.95][1000000]['cvar']
portfolio_weight = result[0.95][1000000]['weight']
```
Plot the CVaR metrics:

```python
from pyscarcopula.metrics import cvar_emp_window
from matplotlib import pyplot as plt
import matplotlib.ticker as plticker

pd_var_95 = pd.Series(data = -result[gamma[0]][MC_iterations[0]]['var'], index=returns_pd.index).shift(1)
pd_cvar_95 = pd.Series(data = -result[gamma[0]][MC_iterations[0]]['cvar'], index=returns_pd.index).shift(1)
weight = result[gamma[0]][MC_iterations[0]]['weight']

n = 1
m = 1
i1 = window_len
i2 = len(returns) - 1

fig, ax = plt.subplots(n, m, figsize=(10, 6))
loc = plticker.MultipleLocator(base=127.0)

daily_returns = ((np.exp(returns_pd) - 1) * weight).sum(axis=1)
cvar_emp = cvar_emp_window(daily_returns.values, 1 - gamma[0], window_len)

ax.plot(np.clip(daily_returns, -0.2, 0.2)[i1:i2], label = 'Portfolio log return')
ax.plot(cvar_emp[i1:i2], label = 'Empirical CVaR', linestyle='dashed', color = 'gray')
ax.plot(pd_cvar_95[i1:i2], label = f'{method} {marginals_method} CVaR 95%')
ax.set_title('Daily returns', fontsize = 14)
ax.xaxis.set_major_locator(loc)
ax.set_xlabel('Date', fontsize = 12, loc = 'center')
ax.set_ylabel('Log return', fontsize = 12, loc = 'center')
ax.tick_params(axis='x', labelrotation = 15, labelsize = 12)
ax.tick_params(axis='y', labelsize = 12)
ax.grid(True)
ax.legend(fontsize=12, loc = 'upper right')
```
Examples of usage of this code can be found in the example.ipynb notebook.