Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DoF computing mistake in function verifyHomogeneity #142

Open
drobelin opened this issue Nov 13, 2017 · 3 comments
Open

DoF computing mistake in function verifyHomogeneity #142

drobelin opened this issue Nov 13, 2017 · 3 comments
Assignees

Comments

@drobelin
Copy link

Hi,
Seems there is a mistake in the calculation of the number of degree of freedom of the "verifyHomogeneity" test.

Running the example shows 35 DoF instead of 30 following the documentation ("an_introduction_to_markovchain_package.pdf, p.35), as the considered Markov chains have 6 states.

>  verifyHomogeneity(kullback)
 Testing homogeneity of DTMC underlying input list 
 ChiSq statistic is 275.9963 d.o.f are 35 corresponding p-value is 0 
 $statistic
 [1] 275.9963
 
 $dof
 [1] 35
 
 $pvalue
 [1] 0
@drobelin
Copy link
Author

May I open a new ticket, but it seems there is a bug in the calculation of "verifyHomogeneity" test. I made some simulations in order to evaluate the distribution of the test-statistic under null hypothesis. Below, I will find R code which compute by simulation the empirical distribution of the statistics given by verifyHomogeneity test, and a log-likelihood ratio.
I think that the formula p.35 of "an_introduction_to_markovchain_package.pdf" needs a correction too.

require(markovchain)

nbsimu = 10000               # Nb of simulation
statesNames=c("A","B")  # 2 states : A , B
n=c(100,200)                  # Respective length of both simulated markov chains
transitionMatrix1 = matrix(c(0.7,0.3,0.1,0.9),byrow=TRUE, nrow=2, dimnames=list(statesNames,statesNames))
transitionMatrix2 = matrix(c(0.7,0.3,0.1,0.9),byrow=TRUE, nrow=2, dimnames=list(statesNames,statesNames))

res = matrix(nrow = nbsimu, ncol = 5)   #To store results of each iteration
for(i in 1:nbsimu) {
  M1 = new("markovchain", transitionMatrix=transitionMatrix1) 
  M2 = new("markovchain", transitionMatrix=transitionMatrix2) 
  m1 = markovchainSequence(n = n[1], markovchain = M1)
  m2 = markovchainSequence(n = n[2], markovchain = M2)
  m = c(m1,NA,m2)  #useful to compute model under null hypothesis
  m1.cpt = unclass(table(m1,c(m1[-1],NA))) #class "matrix"
  m2.cpt = unclass(table(m2,c(m2[-1],NA))) #class "matrix"
  m1f = markovchainFit(m1,confint=FALSE)
  m2f = markovchainFit(m2,confint=FALSE)
  mf = markovchainFit(m)
  vh = verifyHomogeneity(list(m1.cpt,m2.cpt), verbose = FALSE)
  res[i,] = c(m1f$logLikelihood, m2f$logLikelihood, mf$logLikelihood, vh$statistic, vh$pvalue)
}
res = as.data.frame(res)
names(res) <- c("L1", "L2", "L","vhStat","vhPvalue")
# Correcting DoF following above comment (2 instead of 3)
res$vhPvalueCorr <- pchisq(res$vhStat, df = 2, lower.tail = FALSE)
#Log-likelihood ratio statistic
res$F = 2*((res$L1+res$L2)-res$L)
res$pvalue = pchisq(res$F,df = 2,lower.tail = FALSE)

# Under null hypothesis, p-value should follow a uniform(0,1) distribution
plot(hist(res$vhPvalue))
plot(hist(res$vhPvalueCorr))
plot(hist(res$pvalue))
plot(density(res$F))
plot(density(res$vhStat))

ks.test(res$F,"pchisq",df=2)
ks.test(res$vhStat,"pchisq",df=3)
ks.test(res$vhStat,"pchisq",df=3)
mean(res$F) ; var(res$F)
mean(res$vhStat) ; var(res$vhStat)

@spedygiorgio
Copy link
Owner

spedygiorgio commented Nov 23, 2017 via email

@xabiben
Copy link

xabiben commented Oct 17, 2018

So far I have seen, the degrees of freedom are calculated folowing the formula written in the "Kullbak et al 1962" publication (see table 9.1). However, I do not have enough knowlegde to judge if this is the proper way of calculating the dof´s.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants