Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug fix] avoid duplicate creation of rrr/sss variables for models with multiple components of the same period #29

Open
RWParsons opened this issue Nov 5, 2024 · 2 comments

Comments

@RWParsons
Copy link
Member

Motivated by @MilaSMayor #27

When there is a model with multiple components with the same period (i.e in the main formula), there are multiple rrr/sss variables created and when the model is fit, it is rank deficient.

reprex:


test_data <- structure(list(row = c(
  4790, 18351, 8537, 20294, 9658, 4846,
  20603, 5223, 2067, 9061, 20751, 13979, 5870, 12182, 967, 2546,
  3189, 3220, 13284, 1002, 13459, 12946, 2946, 3601, 13539, 10268,
  13002, 17665, 7646, 5958, 10857, 20276, 16712, 1694, 216, 12010,
  10686, 12147, 15580, 20189, 9939, 9503, 101, 10948, 5775, 7025,
  8109, 15669, 9830, 4564, 425, 20181, 16576, 15469, 2783, 10341,
  16893, 20882, 18252, 20306, 203, 11359, 1225, 16813, 9084, 17500,
  3147, 298, 5264, 5396, 565, 5883, 15172, 533, 11996, 12597, 11396,
  10913, 4000, 5098, 455, 1614, 13075, 14176, 10470, 10993, 16228,
  430, 15060, 17038, 9419, 16489, 5773, 8590, 640, 19471, 18603,
  12875, 15504, 57
), new_ID = c(
  12, 36, 21, 41, 22, 12, 41, 13,
  5, 22, 41, 30, 16, 26, 3, 6, 7, 7, 28, 3, 28, 28, 7, 9, 28, 23,
  28, 35, 19, 16, 24, 41, 34, 5, 2, 26, 24, 26, 32, 40, 23, 22,
  1, 25, 16, 18, 20, 32, 23, 12, 2, 40, 33, 32, 7, 24, 34, 41,
  36, 41, 2, 25, 3, 34, 22, 35, 7, 2, 13, 13, 2, 16, 31, 2, 26,
  27, 25, 24, 10, 12, 2, 4, 28, 30, 24, 25, 33, 2, 31, 34, 22,
  33, 16, 21, 2, 38, 36, 27, 32, 1
), P1 = c(
  1, 0, 1, 0, 0, 0, 1,
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1,
  0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 1, 1, 0, 0
), hour = c(
  9, 5, 4, 4, 22, 17, 1, 19,
  4, 20, 5, 15, 16, 16, 17, 18, 0, 10, 9, 4, 16, 6, 21, 3, 0, 14,
  14, 1, 10, 9, 2, 10, 6, 13, 10, 12, 15, 5, 17, 19, 21, 11, 20,
  23, 17, 22, 10, 19, 8, 23, 3, 11, 14, 1, 6, 17, 22, 23, 13, 16,
  21, 18, 11, 11, 19, 19, 5, 20, 13, 4, 23, 5, 11, 15, 22, 13,
  8, 11, 4, 14, 9, 16, 16, 20, 3, 9, 0, 8, 19, 23, 19, 23, 15,
  22, 2, 0, 17, 0, 13, 3
), P2 = c(
  0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
  0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0,
  0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1,
  1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1,
  1, 0, 0, 0, 0, 0
), original_data = c(
  81.1505152399503, 25.290871802584,
  105.638906673811, 37.6492569611346, 75.2487861724966, 65.6833160668856,
  9.26783774456766, 37.5826334335925, 132.174068026666, 57.505395243653,
  13.5776684159436, 34.1192965220782, 40.4604603007995, 55.9040534073833,
  49.6843789034094, 47.5811345810795, 105.434826734125, 149.723681235355,
  128.755907667873, 84.8629469676206, 101.093905272917, 79.7384167936468,
  109.690049660256, 47.0894715665344, 134.577092010154, 119.630895698006,
  86.3222531011461, 111.629047174657, 57.0934238631868, 69.6968295542609,
  21.2675152551682, 21.4871946212554, 202.0797786341, 159.959221228979,
  104.101782114698, 85.2241982528729, 55.7058656015178, 105.785198530154,
  135.107579179321, 36.5356093367404, 106.377426214008, 89.5726564511067,
  84.7104602516007, 39.5980040753834, 24.9267309607656, 135.335264866478,
  123.325858027693, 133.09574834453, 45.911723466512, 50.527642877713,
  132.443340274967, 86.2973786263588, 102.03972044404, 82.341153412036,
  119.671509809086, 92.6547343881397, 111.477879383805, 18.8626183172769,
  25.4766095900354, 24.9683664722985, 125.682732431718, 40.7463158296111,
  65.4705467057566, 106.495240867716, 114.097125665045, 66.3119368814677,
  103.421642414608, 133.292151951722, 75.4031906956303, 42.7837715366637,
  169.545418885058, 36.3449779611802, 89.3729830186157, 111.991893048835,
  87.5508149602019, 78.5903719775091, 58.4495859588738, 41.1225123752,
  22.5018080277647, 64.7797948532742, 154.513873104168, 51.1921505575342,
  129.457825425795, 31.1606685505752, 181.918933013, 33.0354243947179,
  47.6738355795763, 110.168010508212, 70.7558731891458, 278.711410318731,
  93.2632025310014, 44.1055460201902, 81.2456787545665, 144.904199653632,
  77.0182663030833, 63.2390713860117, 31.8291538681657, 116.205553374686,
  175.390477503616, 60.5171426584574
)), row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"))

cglmm(
  original_data ~ P2 * P1 +
    amp_acro(time_col = "hour", n_components = 1, group = c("P2"), period = c(24)) +
    amp_acro(time_col = "hour", n_components = 1, group = "P1", period = 24) +
    (1 | new_ID),
  REML = TRUE,
  data = send |> sample_n(100)
)

cglmm(
  original_data ~ P2 * P1 +
    amp_acro(time_col = "hour", n_components = 2, group = c("P2", "P1"), period = c(24, 24)) +
    (1 | new_ID),
  REML = TRUE,
  data = send |> sample_n(100)
)

Both of these should work but neither do. They work if you change the period of one of the components to be 23 but this shouldn't be necessary.

They can also be fixed by changing the formula (start of data_processor()) to use main_rrr1 and main_sss1 for both interaction terms rather instead of having one of the groups interact with a main_[rrr/sss]2.

@MilaSMayor
Copy link

MilaSMayor commented Nov 5, 2024

Many thanks Rex. But I continue missing something. If I want to run this model:
Y = P1*P2 + (1|new_id)
Applying cosinor mixed model, I would expect all the terms in the formula as you can see below.
Y ~ P1 + P2 + P1*P2 + P1*rrr + P1*sss + P2*rrr + P2*sss + P1*P2*rrr + P1*P2*sss + (1|new_id)

As well, both P1 and P2 are factors. Many thanks again.

Best,

Mila

@RWParsons
Copy link
Member Author

Ahh yeah I see. I've made a model here which includes all groupings by creating a new variable as the combination of P1 and P2.

d <- read.delim("TestSend.txt") |> as_tibble()

cnames <- names(d) |>
  str_split("\\.") |>
  unlist()

send <- d |>
  separate_wider_delim(cols = 1, delim = " ", names = c("row", cnames), too_many = "drop") |>
  mutate(
    across(everything(), as.numeric),
    p1_2 = paste0(P1, P2)
  )

m <- cglmm(
  original_data ~ P2 * P1 +
    amp_acro(time_col = "hour", n_components = 1, group = c("p1_2"), period = c(24)) +
    (1 | new_ID),
  REML = TRUE,
  data = send
)

milas-model2.zip

This should give you all the combinations that you want but the parameters of the model are a bit more annoying to interpret than if they were done as an interaction. You should be able to do this manually using the results for the different groupings in summary (summary(m)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

When branches are created from issues, their pull requests are automatically linked.

2 participants