Deadlock detected when training VQ-GAN #13

@ericli2333

I ran the program with jax[cuda12] and hit a deadlock when training VQ-GAN with the command from README.md.

Here is the strace output:

futex(0x752a68, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12617286, tv_nsec=874604467}, FUTEX_BITSET_MATCH_ANY) = 0
futex(0x752a70, FUTEX_WAKE_PRIVATE, 1)  = 0
futex(0x752a6c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12617286, tv_nsec=878546772}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x752a70, FUTEX_WAKE_PRIVATE, 1)  = 0
futex(0x752a68, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12617286, tv_nsec=883652458}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x752a70, FUTEX_WAKE_PRIVATE, 1)  = 0
futex(0x752a68, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12617286, tv_nsec=888816934}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x752a70, FUTEX_WAKE_PRIVATE, 1)  = 0
futex(0x752a6c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12617286, tv_nsec=903110693}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
futex(0x752a70, FUTEX_WAKE_PRIVATE, 1)  = 0
futex(0x752a6c, FUTEX_WAIT_BITSET_PRIVATE, 0, {tv_sec=12617286, tv_nsec=908186919}, FUTEX_BITSET_MATCH_ANY) = -1 (errno 18446744073709551414)
+++ killed by SIGKILL +++

Can anyone tell me how to train this model?
