Amazon Topic Modeling

This project analyses products on Amazon by deriving topics from customer reviews. Each review is analysed with the Latent Dirichlet Allocation (LDA) algorithm, which assigns the text to a topic. The most important keywords (semantic and syntactic) are used to derive each topic cluster.

The dataset is created by web scraping, using utils.py, from the Electronics category. The following preprocessing steps were performed:

Remove punctuation, URLs (http/https links), and extra whitespace.
Lowercase the words.
Words that have fewer than 3 characters are removed.
Remove stopwords.
Apply Contraction handling (expand contractions such as "don't").
Apply Stemming.
Apply Lemmatization.
Apply Tokenization: Split the sentences into words to prepare for LDA.
Build Bigrams: a sequence of two adjacent elements from a string of tokens.
Convert each list of words into the bag-of-words (BoW) format.
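
The steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the code in utils.py or topic_modelling.py: the function names, the tiny stopword list, the example reviews, and the `min_count` threshold are all assumptions made for the example. Contraction handling, stemming, and lemmatization are left out because they need an external NLP library, and bigrams are appended as extra `a_b` tokens, which is a simplification of what phrase-detection libraries do.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "and", "this", "that", "with", "for", "are", "was", "not", "but", "very"}

def clean(text):
    """Lowercase, strip URLs and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove http/https links
    text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split into words, dropping stopwords and words shorter than 3 characters."""
    return [w for w in clean(text).split() if len(w) >= 3 and w not in STOPWORDS]

def add_bigrams(docs, min_count=2):
    """Append 'a_b' tokens for adjacent pairs seen at least min_count times in the corpus."""
    pair_counts = Counter(p for doc in docs for p in zip(doc, doc[1:]))
    frequent = {p for p, c in pair_counts.items() if c >= min_count}
    return [doc + ["_".join(p) for p in zip(doc, doc[1:]) if p in frequent] for doc in docs]

def to_bow(docs):
    """Build a vocabulary and map each document to sorted (token_id, count) pairs."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    bow = [sorted(Counter(vocab[t] for t in doc).items()) for doc in docs]
    return vocab, bow

# Made-up reviews, used only to exercise the pipeline.
reviews = [
    "The battery life is great! See https://example.com for specs.",
    "Battery life was not great, but the screen is sharp.",
]
docs = add_bigrams([tokenize(r) for r in reviews])
vocab, bow = to_bow(docs)
```

Each entry of `bow` is a list of integer `(token_id, count)` pairs, which is the shape of input LDA implementations typically expect.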

About using TF-IDF before LDA

LDA only needs bag-of-words vectors; a TF-IDF corpus is not needed for LDA modelling. This follows from the 2003 paper "Latent Dirichlet Allocation" by Blei et al., which introduced LDA. The algorithm is a probabilistic generative model of words: it assumes each word occurrence is drawn from a multinomial distribution, so its input is integer word counts. A TF-IDF weight such as 0.6 is not a count, and it doesn't make sense to say that 0.6 of a word was generated from some distribution.
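
To make the difference concrete, the sketch below computes both representations for a toy corpus. The documents are made up, and the TF-IDF formula used here (term frequency divided by document length, times the natural log of N over document frequency) is only one common variant, chosen as an assumption for illustration.

```python
import math
from collections import Counter

# Made-up, already tokenized documents.
docs = [
    ["battery", "life", "great", "battery"],
    ["screen", "sharp", "battery"],
    ["screen", "cracked"],
]

def tfidf(docs):
    """One common TF-IDF variant: (count / doc length) * ln(N / document frequency)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weighted

# Bag-of-words: integer counts, i.e. how many times each word was "drawn".
bow = [Counter(doc) for doc in docs]

# TF-IDF: real-valued weights, which no longer count word occurrences.
weights = tfidf(docs)

print(bow[0]["battery"])      # an integer count
print(weights[0]["battery"])  # a fractional weight
```

Under the multinomial assumption, the count for "battery" in the first document reads as "this word was generated twice", whereas the fractional TF-IDF weight has no such interpretation.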

Repository files (latest commit: Apatsi, "Update README.md", cf3c48f, Jan 23, 2022; 24 commits)

Name	Last commit message	Last commit date
model	1.01 commit	Dec 7, 2021
.gitattributes	Initial commit	Dec 7, 2021
.gitignore	Update .gitignore	Dec 11, 2021
LDA_Visualization.html	First commit	Dec 7, 2021
LICENSE	Initial commit	Dec 7, 2021
README.md	Update README.md	Jan 23, 2022
requirements.txt	Add dependencies info	Dec 11, 2021
topic_modelling.py	Add offset hyperparameter tuning	Dec 12, 2021
utils.py	Add offset hyperparameter tuning	Dec 12, 2021
