Skip to content

Computing speed slows down as iteration increases #1775

@xanalee

Description

@xanalee

Hi~

I am currently doing GLMM estimation on 1000 simulated datasets by brms and evaluating the computational cost. However, I find the compute time becomes longer and longer as interation increases. The two figures below show the computational slowdown observed when I running the experiments locally and on ALICE HPC platform, respectively.

Image
Image

Here are the versions of R and brms I used:

  • Local environment: R version 4.4.3, brms version 2.22.0
  • ALICE HPC platform: R version 4.4.0, brms version 2.22.0

And I am pretty sure the simulated datasets are homogeneous since they are generated by Monte Carlo simulation. The super smooth curves make me guess there might be accumulation of internal variables or uncleaned caches. Could you please help me to figure out the issue?

I give my script here for reference.

library(brms)

path = ''  # edit as appropriate

n_cores = 4
n_threads = 2

sim_brms = function(data_ls, num_sub, num_time, num_sim, resp_dist){
  out_path = paste0(path, 'brms_', 'n', num_sub, '_t', num_time, '_sim', num_sim, '_', resp_dist, '/')
  if (!dir.exists(out_path)) {
    dir.create(out_path, recursive = TRUE, showWarnings = FALSE)
  }
  for (i in 1:num_sim){
    # Monitor
    cat(paste('brms_i: ', i, '\n'))
    
    data = data_ls[[i]]
    
    # Define helper functions
    check_convergence = function(model, n_term) {
      if (inherits(model, "try-error")) return("error")
      Rhat = rhat(model)
      if (any(abs(Rhat[1:n_term] - 1) >= 0.1)) return("official_fail")
      "good"
    }
    
    # Specify response distribution
    if (resp_dist == 'binomial'){
      fami = bernoulli(link = 'logit')
    }
    if (resp_dist == 'Poisson'){
      fami = poisson(link = 'log')
    }
    
    # Fit full model on full dataset
    compute_time = system.time({
      brms_model_full = try(brm(y_full ~ time + group + time:group + (1|id),
                                family = fami,
                                data = data,
                                threads = threading(n_threads),
                                cores = n_cores,
                                seed = 123
      ))})[['elapsed']]
    conv_status = check_convergence(brms_model_full, 5)
    out = c('compute_time' = compute_time, 'converge_rate' = conv_status)
    save(out, file = paste0(out_path, i, '.RData'))
    gc()
  }
}

n_sim = 1000
n_sub = 20
n_time = 2
re_dist = 'binomial'
cat(paste0('n', n_sub, '_t', n_time, '_', re_dist, '\n'))
load(paste0(path, 'data/n', n_sub, '_t', n_time, '_sim', n_sim, '_', re_dist,  '_rdi.RData'))
sim_brms(data_ls = simulate_data_ls, num_sub = n_sub, num_time = n_time, num_sim = n_sim, resp_dist = re_dist)

Thank you in advance and look forward for a reply.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions