The chunksize variable specified in ldaseqmodel is not passed to ldamodel, so the latter defaults to chunksize = 2000 #3472

Open
@mspezio

Looking at ldaseqmodel.py, I see that the chunksize specified there is not passed on to ldamodel.LdaModel:

    if corpus is not None and time_slice is not None:
        self.max_doc_len = max(len(line) for line in corpus)

        if initialize == 'gensim':
            lda_model = ldamodel.LdaModel(
                corpus, id2word=self.id2word, num_topics=self.num_topics,
                passes=passes, alpha=self.alphas, random_state=random_state,
                dtype=np.float64
            )

This may produce suboptimal topics, because the inner LdaModel always falls back to its default chunksize = 2000, which can be too small for corpora with many documents.
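To make the mechanism concrete, here is a minimal, self-contained sketch of the forwarding problem and one possible fix. It does not use gensim itself: `InnerModel`, `OuterCurrent`, and `OuterProposed` are hypothetical stand-ins for `ldamodel.LdaModel` and the sequential model's constructor, and the only detail taken from the real code is the inner default of 2000.

```python
from dataclasses import dataclass

# Default chunk size of the inner model, as stated in this issue.
INNER_DEFAULT_CHUNKSIZE = 2000


@dataclass
class InnerModel:
    """Hypothetical stand-in for ldamodel.LdaModel; it only records its settings."""
    num_topics: int
    chunksize: int = INNER_DEFAULT_CHUNKSIZE


class OuterCurrent:
    """Stand-in for the behaviour described above: chunksize is accepted
    by the outer constructor but never forwarded to the inner model."""

    def __init__(self, num_topics, chunksize):
        self.chunksize = chunksize
        # chunksize is not passed here, so the inner default applies.
        self.inner = InnerModel(num_topics=num_topics)


class OuterProposed:
    """Stand-in for the requested behaviour: chunksize is forwarded."""

    def __init__(self, num_topics, chunksize):
        self.chunksize = chunksize
        # The only change: forward the caller's value to the inner model.
        self.inner = InnerModel(num_topics=num_topics, chunksize=chunksize)


current = OuterCurrent(num_topics=10, chunksize=50000)
proposed = OuterProposed(num_topics=10, chunksize=50000)

print(current.inner.chunksize)   # inner default, caller's value ignored
print(proposed.inner.chunksize)  # caller's value
```

If the real constructor follows the same pattern, the equivalent change would presumably be to add `chunksize=chunksize` to the `ldamodel.LdaModel(...)` call quoted above.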

Could this be fixed in the next release?

Great package, thanks so much for sharing it and all of the work that has gone into it.
