Skip to content

Goodness of fit for complex MRDS model #96

@erex

Description

@erex
Member

Email from user indicated following error when using ddf.gof()

Mark-recapture component: 
Capture History 10 
[0,50] (50,100] (100,200] (200,300] Total 
Observed 131 118 160 81 490 
Expected 132 109 178 70 489 
Chisquare 0 1 2 2 4 
Capture History 01 
[0,50] (50,100] (100,200] (200,300] Total 
Observed 136 112 179 54 481 
Expected 134 112 178 58 482 
Chisquare 0 0 0 0 0 
Capture History 11 
[0,50] (50,100] (100,200] (200,300] Total 
Observed 126 78 128 21 353 
Expected 127 87 112 28 353 
Chisquare 0 1 2 2 5 
Total chi-square = 17.214 P = NaN with -32 degrees of freedom 
Warning message: In pchisq(chisq.1 + chisq.2, 3 * nc - length(model$par) - 1) : NaNs produced

As shown, data were binned with 4 intervals, fitted mrmodel had 29 estimated parameters:

> summary(MRDS.6ii7)
Summary for io.fi object
Number of observations : 1324
Number seen by primary : 843
Number seen by secondary : 834
Number seen by both : 353
AIC : 2705.31
Conditional detection function parameters:
                 estimate         se
(Intercept)        -1.0796625270 0.277260949
obsnameBP           1.1046744569 0.361828611
obsnameBW           0.5433139316 0.394464395
obsnameCF           1.1961084816 0.382167121
obsnameCS           1.1934733554 0.396189048
obsnameDH           0.7954464197 0.343565299
obsnameEC           1.1975425396 0.395574213
obsnameMG           0.5859840295 0.362524121
obsnameMH           2.1259248539 0.409690893
obsnamePL           1.5807107511 0.425884100
obsnamePOB          0.5848155343 0.401523354
obsnameRL           2.5931041668 0.495230694
obsnameRT           1.7839252838 0.421544945
obsnameSD           1.4995302978 0.439472284
distance           -0.0036643313 0.002600023
obsnameBP:distance -0.0099988195 0.003096288
obsnameBW:distance  0.0078620272 0.003474921
obsnameCF:distance -0.0067997640 0.003229027
obsnameCS:distance -0.0007569504 0.003252560
obsnameDH:distance -0.0017379043 0.002718869
obsnameEC:distance -0.0030372291 0.003149064
obsnameMG:distance  0.0046375639 0.003172420
obsnameMH:distance -0.0114362985 0.003412340
obsnamePL:distance -0.0102762333 0.003554344
obsnamePOB:distance 0.0068440041 0.003242920
obsnameRL:distance -0.0089213506 0.003706149
obsnameRT:distance -0.0079749867 0.003662896
obsnameSD:distance  0.0011171377 0.004304027
distance:size       0.0008886445 0.000333803

Solution should be to have ddf.gof test the number of parameters against the number of bins and not attempt a test if there are insufficient remaining degrees of freedom.

Activity

added
triageNew items to look at and decide what to do
on Nov 20, 2024
lenthomas

lenthomas commented on Nov 25, 2024

@lenthomas
Member

My first suggestion was: "I suggest to have a look at the Distance for Windows message in this case and reproduce the same kind of output."
After discussion, it seems clear this is a poor suggestion since DistWin calls mrds to do its calculations so this won't produce any useful insights - sorry, my bad!

added this to the 3.0.1 milestone on Jan 20, 2025
LHMarshall

LHMarshall commented on Jan 20, 2025

@LHMarshall
Member

This seems to be an issue only related to mr models... I will try now simulate some double observer data.

# Replicate the issue
# Simulate data

library(dsims)

covs <- list()
covs$size  <- list(list(distribution = "poisson", lambda = 25))
covs$height <- list(list(distribution = "normal", mean = 165, sd = 5))
covs$sex   <- data.frame(level = c("male", "female"),
                         prob = c(0.5, 0.5))
covs$col   <- data.frame(level = c("blue", "red"),
                         prob = c(0.4, 0.6))

# Define the population description (this time using the density to determine
# the population size)
popdesc <- make.population.description(covariates = covs,
                                       N = 250)

cov.param <- list()
cov.param$size <- c(log(1.02))
cov.param$height <- c(log(1.005))
cov.param$sex <- data.frame(level = c("male", "female"),
                            param = c(log(1.5), 0))
cov.param$col <- data.frame(level = c("blue", "red"),
                            param = c(log(1.5), 0))


# define the detecability
detect <- make.detectability(key.function = "hn",
                             scale.param = 5,
                             cov.param = cov.param,
                             truncation = 50)

plot(detect, popdesc)

simulation <- make.simulation(reps = 1,
                              population.description = popdesc,
                              detectability = detect)

# run an example survey to check the setup
survey <- run.survey(simulation)
plot(survey)

# Single observer analysis
analysis <- make.ds.analysis(dfmodel = list(~size+height+sex+col),
                             key = "hr",
                             truncation = 50,
                             cutpoints = c(0,10,25,50),
                             er.var = "R2",
                             criteria = "AIC")

# Binned analysis
fit <- analyse.data(analysis = analysis,
                    data.obj = survey@dist.data)

# Goodness of Fit
mrds::ddf.gof(fit)
# Goodness of fit results for ddf object
# 
# Chi-square tests
# [0,10] (10,25] (25,50]   Total
# Observed  51.000  58.000  42.000 151.000
# Expected  30.200  45.303  75.505 151.008
# Chisquare 14.326   3.559  14.868  32.752
# 
# No degrees of freedom for test

Image

LHMarshall

LHMarshall commented on Jan 21, 2025

@LHMarshall
Member

Following on from above code to generate a second set of observations

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Second set of observations

survey2 <- run.survey(survey, make.region())

ddata1 <- survey@dist.data
ddata2 <- survey2@dist.data

head(ddata1)
head(ddata2)

# Add observer and detected columns
ddata1$observer <- 1
ddata2$observer <- 2
ddata1$detected <- 1
ddata2$detected <- 1

# Combine the two datasets
tot.data <- rbind(ddata1, ddata2)

# Check which objects only have 1 entry (all need 2)
index <- which(table(tot.data$individual) == 1)
index <- as.numeric(names(index))

# Add rows for animals that were detected by one observer but not the other
tmp <- tot.data[tot.data$individual %in% index,]

# These animals were only detected by one observer
tmp$detected <- 0
new.observer <- tmp$observer
new.observer <- ifelse(new.observer==1,2,1)
tmp$observer <- new.observer

tot.data <- rbind(tot.data, tmp)

# Order the data - by observer
index <- order(tot.data$observer)
tot.data <- tot.data[index,]

# Order by individual
index <- order(tot.data$individual)
tot.data <- tot.data[index,]

# Remove object column
tot.data <- tot.data[,-1]
# Rename individual column to object
names(tot.data)[1] <- "object"

tot.data[1:25,]

# Add distend distbegin to the data
tot.data <- Distance:::create_bins(tot.data, c(0,10,25,50))


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# MRDS analysis

fit.mr <- ddf(data = tot.data,
              method = "io",
              dsmodel = ~mcds(key="hr", formula = ~size+height+sex),
              mrmodel = ~glm(formula = ~size+height+sex+col),
              meta.data = list(binned = TRUE, breaks = c(0,10,25,50)))

ddf.gof(fit.mr)

# Goodness of fit results for ddf object
# 
# Chi-square tests
# 
# Distance sampling component:
#   [0,10] (10,25] (25,50]  Total
# Observed  51.000  65.000  70.000 186.00
# Expected  54.268  60.678  71.055 186.00
# Chisquare  0.197   0.308   0.016   0.52
# 
# No degrees of freedom for test
# 
# Mark-recapture component:
#   Capture History 10
# [0,10] (10,25] (25,50] Total
# Observed       1       8      18    27
# Expected       8      11      12    31
# Chisquare      6       1       4    11
# Capture History 01
# [0,10] (10,25] (25,50] Total
# Observed       0       7      28    35
# Expected       8      11      12    31
# Chisquare      8       1      23    33
# Capture History 11
# [0,10] (10,25] (25,50] Total
# Observed      50      50      24   124
# Expected      34      43      47   124
# Chisquare      7       1      11    19
# 
# MR total chi-square = 63.24  P = 1.7764e-15 with 1 degrees of freedom
# 
# 
# Total chi-square = 63.761  P = NaN with -2 degrees of freedom
# Warning message:
#   In pchisq(chisq.1 + chisq.2, 3 * nc - length(model$par) - 1) :
#   NaNs produced
LHMarshall

LHMarshall commented on Jan 22, 2025

@LHMarshall
Member

This needs more investigation... why does it think there is 1 degree of freedom for the MR model?

added and removed
triageNew items to look at and decide what to do
on Jan 22, 2025
LHMarshall

LHMarshall commented on Jan 22, 2025

@LHMarshall
Member

@lenthomas @erex

Do either of you know where these calculations for degrees of freedom come from?
Also should there be a distance sampling component output for the io.fi model?

For the io model there is 1 degree of freedom for the MR model as the calculation is:

# nc is the number of bins (3 bins)

df.2 <- 2*nc-length(model$mr$par)

fit.mr$mr$par
# (Intercept)         size       height      sexmale       colred 
#-3.929788952 -0.006962552  0.034717349 -0.123153471 -0.243855659 

For an io.fi model the calculation for degrees of freedom is 3*3-3-1 = 5
(there was an error when trying to fit with factors with only 2 levels so had to remove sex & col)

Also isn't it strange that there is still a Chisquare table output for the distance sampling component?

# degrees of freedom calculation for total chi-square
df=3*nc-length(model$par)-1

fit.mr.fi <- ddf(data = tot.data,
                 method = "io.fi",
                 mrmodel = ~glm(formula = ~size+height),
                 meta.data = list(binned = TRUE, breaks = c(0,10,25,50)))

ddf.gof(fit.mr.fi)

# Goodness of fit results for ddf object
# 
# Chi-square tests
# 
# Distance sampling component:
#   [0,10] (10,25] (25,50]   Total
# Observed  51.000  65.000  70.000 186.000
# Expected  37.200  55.800  93.000 186.000
# Chisquare  5.119   1.517   5.688  12.324
# 
# No degrees of freedom for test
# 
# Mark-recapture component:
#   Capture History 10
# [0,10] (10,25] (25,50] Total
# Observed       1       8      18    27
# Expected       8      11      12    31
# Chisquare      7       1       3    11
# Capture History 01
# [0,10] (10,25] (25,50] Total
# Observed       0       7      28    35
# Expected       8      11      12    31
# Chisquare      8       1      23    33
# Capture History 11
# [0,10] (10,25] (25,50] Total
# Observed      50      50      24   124
# Expected      34      43      47   124
# Chisquare      7       1      11    19
# 
# 
# Total chi-square = 75.189  P = 8.5487e-15 with 5 degrees of freedom

erex

erex commented on Jan 27, 2025

@erex
MemberAuthor

@LHMarshall Best description of GOF tests is on p.5 of David's 2006 Biometrics paper

borchers et al(2006).pdf

David might be the source of answers for MRDS GOF questions.

For the io configuration that threw the error for the user to generate this issue in 2023, I think the problem lies with the overly ambitious return statement that ends gof.io.R. The return statement carries out a call to pchisq without checking whether the degrees of freedom argument is legitimate.

return(list(chi1=list(observed=observed.count.1,

Breaking down that return and checking that degrees of freedom is positive and gracefully reporting what happens if inadmissible, seems the simplest solution.

added a commit that references this issue on Jan 27, 2025
LHMarshall

LHMarshall commented on Jan 27, 2025

@LHMarshall
Member

@erex yes, I'd already fixed that (sorry hadn't yet pushed as I was still constructing a test for the test suite) but was just checking the df calculations as I didn't uderstand them. I think this looks like an issue for other MR models also but my test dataset was failing for the io.fi model as it had factors with 2 covariate levels. I also noticed that the io.fi model gave 2 chi quare outputs the first still referring to the 'Distance Sampling Component' this seemed strange as there is no ds component for fi models I thought?

@lenthomas should the 'distance sampling component' in the chi square output be removed for fi models?

LHMarshall

LHMarshall commented on Jan 27, 2025

@LHMarshall
Member

In print.ddf.gof line 33-35 it prints out Distance sampling component heading if chi2 is not NULL

if(!is.null(gof$chisquare$chi2)){
      cat("\nDistance sampling component:\n")
    }

Firstly, I can't currently see a situation where this would ever be NULL as there always seems to be a chi2 component. If there are insufficient degrees of freedom then the $df and $p components are NA. Hmm ok it is NULL if it is a gof test for a ds model... ok so I guess in that case there is no header.

Secondly, the code following this does not print out the chi2 component but it prints out the chi1 table: lines 37:44

print(chitable(gof$chisquare$chi1$observed, gof$chisquare$chi1$expected, digits))

    if(!is.na(gof$chisquare$chi1$p)){
      cat(paste("\nP = ", format(gof$chisquare$chi1$p, digits=5),
                " with ", gof$chisquare$chi1$df," degrees of freedom\n",sep=""))
    }else{
      cat("\nNo degrees of freedom for test\n")
    }
lenthomas

lenthomas commented on Jan 28, 2025

@lenthomas
Member

@lenthomas should the 'distance sampling component' in the chi square output be removed for fi models?

Although the distance sampling component is not part of the likelihood in fi models, there is no reason why we cannot check the fit of a component we didn't fit to. For example, in mark recapture models, it's common to do goodness of fit tests to look for a trap-shyness effect even if that wasn't part of the model fit -- if the GOF test fails then one goes ahead and fits a trap shyness model. Here I would think if the distance sampling component means the fi GOF fails then that's a spur to fit a pi model.

However, I suggest double checking with David Borchers for his thoughts.

lenthomas

lenthomas commented on May 7, 2025

@lenthomas
Member

Just to clarify, the thing I suggested checking with David is how to calculate the degrees of freedom for the Distance sampling component of the test when it's a full independence model. I can see how the test statistic is constructed, but not easily how to calculate the degrees of freedom for the test.

modified the milestone: 3.0.1 on Jul 1, 2025
LHMarshall

LHMarshall commented on Jul 2, 2025

@LHMarshall
Member

Add the table to the help, suggested location ddf.gof

The table below is currently being checked, an incinsistency is highlighted

Image

added a commit that references this issue on Jul 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @lenthomas@erex@LHMarshall

      Issue actions

        Goodness of fit for complex MRDS model · Issue #96 · DistanceDevelopment/mrds