The chunksize argument specified in ldaseqmodel is not passed to ldamodel, so the latter defaults to chunksize = 2000

Looking into ldaseqmodel.py, I see that the specified chunksize is not passed to the LdaModel constructor:

"if corpus is not None and time_slice is not None:
    self.max_doc_len = max(len(line) for line in corpus)

    if initialize == 'gensim':
        lda_model = ldamodel.LdaModel(
            corpus, id2word=self.id2word, num_topics=self.num_topics,
            passes=passes, alpha=self.alphas, random_state=random_state,
            dtype=np.float64
        )"

This may lead to suboptimal topics, because the default chunksize = 2000 can be too small for corpora that contain many documents.
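To illustrate the change I have in mind, here is a minimal sketch that uses stand-in code rather than gensim itself. `StubLda`, `init_current` and `init_proposed` are hypothetical names invented for this example, and the wrapper defaults are placeholders. Only the default of 2000 comes from the behaviour described above.

```python
from dataclasses import dataclass

DEFAULT_CHUNKSIZE = 2000  # the inner model's default, as described in this report


@dataclass
class StubLda:
    """Hypothetical stand-in for ldamodel.LdaModel; it only records its arguments."""
    num_topics: int
    passes: int = 1
    chunksize: int = DEFAULT_CHUNKSIZE


def init_current(num_topics, passes=10, chunksize=100):
    # Mirrors the quoted snippet: chunksize is accepted but never forwarded,
    # so the inner model silently falls back to its own default.
    return StubLda(num_topics=num_topics, passes=passes)


def init_proposed(num_topics, passes=10, chunksize=100):
    # Proposed change: forward the caller's chunksize to the inner model.
    return StubLda(num_topics=num_topics, passes=passes, chunksize=chunksize)


current = init_current(num_topics=5, chunksize=10000)
proposed = init_proposed(num_topics=5, chunksize=10000)
print(current.chunksize)   # 2000: the caller's value was dropped
print(proposed.chunksize)  # 10000: the caller's value was forwarded
```

In ldaseqmodel.py itself, the equivalent change would presumably be adding chunksize=chunksize to the LdaModel call quoted above, assuming the constructor's chunksize argument is in scope at that point.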

Could this be fixed in the next release?

Great package, thanks so much for sharing it and all of the work that has gone into it.

Issue #3472
