dropping nearly all documents dfm2stm #291

@michellerh330

Hello. My code runs fine for several subsets of my data, but for one subset (newspapers from the UK) I keep getting this warning:

Warning: There were 10 warnings in mutate().
The first warning was:
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)
ℹ Run dplyr::last_dplyr_warnings() to see the 9 remaining warnings.

Here is my code:

conflicts_prefer(stopwords::stopwords)
toks <-
  tokens(text_corp2, remove_numbers = TRUE, remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_wordstem() %>%
  tokens_remove(c(stopwords("en", source = "stopwords-iso")))

dfm_uk <- dfm(toks) %>%
  dfm_trim(min_docfreq = 0.01, docfreq_type = "prop")
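
One way to check where the empty documents come from is to count tokens per document after trimming. `dfm_trim()` only removes features, so any document made up entirely of removed features stays in the dfm with zero tokens. Below is a minimal sketch, assuming that is what happens here (not confirmed). The toy corpus and the 0.5 threshold are made up for illustration; on the real data the same calls would be applied to `dfm_uk`.

```r
library(quanteda)

# Toy corpus standing in for text_corp2 (hypothetical data, not from this issue).
toy_corp <- corpus(c(
  d1 = "parliament votes on the budget",
  d2 = "parliament debates the budget",
  d3 = "parliament budget vote delayed",
  d4 = "zebra xylophone quasar"
))

# Same trimming step as above, with a larger threshold to suit four documents.
toy_dfm <- dfm_trim(dfm(tokens(toy_corp)),
                    min_docfreq = 0.5, docfreq_type = "prop")

# Documents with zero remaining tokens are the ones reported as "empty".
tokens_left <- ntoken(toy_dfm)
n_empty     <- sum(tokens_left == 0)
empty_docs  <- docnames(toy_dfm)[tokens_left == 0]

# Optionally drop them explicitly before fitting, so nothing is removed silently.
toy_dfm_nonempty <- dfm_subset(toy_dfm, ntoken(toy_dfm) > 0)
```

Running `ntoken()` on the untrimmed `dfm(toks)` as well would show whether the documents are already empty after tokenization and stopword removal, or only become empty after `dfm_trim()`.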

The warning occurs here:

many_models <- data_frame(K = c(20, 30, 40, 50, 60)) %>%
  mutate(topic_model = future_map(K, ~stm(dfm_uk, K = .,
                                          verbose = FALSE, seed = 2461)))

dplyr::last_dplyr_warnings()
[[1]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)


Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[2]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning:
! UNRELIABLE VALUE: Future (‘’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[3]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[4]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning:
! UNRELIABLE VALUE: Future (‘’) unexpectedly generated random numbers without specifying argument 'seed'. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'seed=NULL', or set option 'future.rng.onMisuse' to "ignore".

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

[[5]]
<warning/rlang_warning>
Warning in mutate():
ℹ In argument: topic_model = future_map(K, ~stm(dfm_uk, K = ., verbose = FALSE, seed = 2461)).
Caused by warning in dfm2stm():
! Dropped 21,185 empty document(s)

Backtrace:

  1. ├─data_frame(K = c(20, 30, 40, 50, 60)) %>% ...
  2. ├─dplyr::mutate(...)
  3. └─dplyr:::mutate.data.frame(...)

... with 5 more warnings.
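
The "UNRELIABLE VALUE" warnings are separate from the dropped documents. As the message says, they come from random numbers being generated inside a future without a seed set on the future itself. With furrr, that seed is passed through `.options = furrr_options(seed = ...)`. Below is a minimal sketch in which `runif(1)` is a cheap stand-in for the `stm()` call; the helper name `draw_once` is hypothetical.

```r
library(furrr)   # also attaches the future package
library(tibble)
library(dplyr)

future::plan("sequential")  # swap in a parallel plan such as multisession as needed

# Stand-in for the model-fitting step: each call draws one random number.
# In the pipeline above, the mapped function would be ~stm(dfm_uk, K = ., ...).
draw_once <- function(seed) {
  tibble(K = c(20, 30, 40)) %>%
    mutate(draw = future_map_dbl(K, ~ runif(1),
                                 .options = furrr_options(seed = seed)))
}

# Two runs with the same integer seed should give the same draws.
run_a <- draw_once(2461L)
run_b <- draw_once(2461L)
same_draws <- identical(run_a$draw, run_b$draw)
```

Applied to the code in this issue, this would presumably mean adding `.options = furrr_options(seed = TRUE)` (or an integer seed) to the `future_map()` call, next to the `seed = 2461` already passed to `stm()`.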
