GitHub - diem-ai/datascience-projects: Machine Learning and NLP projects

Text Classification
- Analyse the data and create new features
- Transform the text data into Term Frequency-Inverse Document Frequency (TF-IDF) vectors, select the best features with f_classif, and fit a Naive Bayes classifier on the transformed data
- Transform the text data into word embeddings, select the best features with f_classif, and fit a Naive Bayes classifier on the transformed data
- Word Embedding + Bayes improves accuracy by 8 percentage points, from 86% (baseline) to 94%, while TF-IDF + Bayes improves accuracy by 5 percentage points, from 86% to 91%
- Text Classification with TF-IDF, Word Embedding and Naive Bayes
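
The TF-IDF variant of the steps above can be sketched as a scikit-learn pipeline. This is a minimal illustration and not the repository's code: the toy corpus, the choice of `MultinomialNB` as the Naive Bayes variant, and `k=7` are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus invented for illustration (not the repository's dataset).
texts = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free prize today",
    "lunch meeting at noon tomorrow",
    "please review the project report",
    "see you at the meeting tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

# TF-IDF vectors -> keep the k features with the highest ANOVA F-score
# -> multinomial Naive Bayes. k=7 is an arbitrary choice for this toy corpus.
model = make_pipeline(
    TfidfVectorizer(),
    SelectKBest(score_func=f_classif, k=7),
    MultinomialNB(),
)
model.fit(texts, labels)

predictions = model.predict(["free prize now", "project meeting tomorrow"])
print(predictions)
```

Wrapping the three steps in one pipeline keeps feature selection inside any later cross-validation loop, so the selected vocabulary is chosen from training folds only.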

Wine Clustering
- Cluster wines by color from their chemical properties, using KMeans as a feature-engineering step for classification
  Explore how hyperparameter tuning and cross-validation contribute to the improvement
  KMeans - Clustering Method Part 1
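
One way to use KMeans as feature engineering is to append each sample's cluster id as an extra column before training a classifier. The sketch below assumes synthetic data in place of the wine chemistry measurements, and `LogisticRegression` is a stand-in classifier chosen for illustration; neither comes from the repository.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for chemical properties: two well-separated groups.
red = rng.normal(loc=0.0, scale=0.5, size=(50, 3))
white = rng.normal(loc=3.0, scale=0.5, size=(50, 3))
X = np.vstack([red, white])
y = np.array([0] * 50 + [1] * 50)  # hypothetical color labels

# Unsupervised step: assign every sample to one of two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Feature engineering: append the cluster id as an additional column.
X_aug = np.column_stack([X, cluster_ids])

# 5-fold cross-validated accuracy of the classifier on the augmented features.
scores = cross_val_score(LogisticRegression(), X_aug, y, cv=5)
mean_accuracy = scores.mean()
print(f"mean CV accuracy: {mean_accuracy:.3f}")
```

For brevity the clusters here are fitted on the full dataset before cross-validation; a stricter setup would fit KMeans inside each training fold.
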
Customer Segmentation
- Explore the variation among customers and predict their segmentation through their channel products and spending
  Dimensionality Reduction with Principal Component Analysis
Movie Recommendation
- Recommend movies to users using collaborative filtering and content-based techniques
- Collaborative filtering: the recommendation list is generated from the items most similar to those a user has already rated
- Content-based model: recommend movies with similar content: genres, actors, actresses, crew
- Techniques: Data Cleaning, Data Visualization, NLP
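
The item-based collaborative filtering idea above can be sketched in plain Python. The ratings, the cosine similarity measure, and the helper names (`item_vectors`, `cosine`, `recommend`) are hypothetical choices for illustration; the repository may compute similarity differently.

```python
import math

# Toy ratings invented for illustration: user -> {movie: rating}.
ratings = {
    "alice": {"Alien": 5, "Aliens": 5, "Amelie": 1},
    "bob": {"Alien": 4, "Aliens": 5, "Amelie": 2},
    "carol": {"Alien": 1, "Aliens": 2, "Amelie": 5, "Chocolat": 5},
    "dave": {"Alien": 5, "Chocolat": 1},
}


def item_vectors(ratings):
    """Invert the user -> movie mapping into movie -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for movie, rating in user_ratings.items():
            items.setdefault(movie, {})[user] = rating
    return items


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=2):
    """Rank unrated movies by similarity-weighted average of the user's ratings."""
    items = item_vectors(ratings)
    rated = ratings[user]
    scores = {}
    for candidate, vector in items.items():
        if candidate in rated:
            continue
        weighted_sum = 0.0
        similarity_sum = 0.0
        for movie, rating in rated.items():
            similarity = cosine(vector, items[movie])
            weighted_sum += similarity * rating
            similarity_sum += similarity
        if similarity_sum > 0:
            scores[candidate] = weighted_sum / similarity_sum
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("dave", ratings))
```
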
Sentiment Mining
- Explore a Logistic Regression classifier on positive/negative/neutral product reviews
- Extract topics from customer reviews with Latent Dirichlet Allocation (LDA)
- Generate positive/negative/neutral reviews by implementing a Markov chain text generator
- Techniques: Data Visualization, Data Cleaning, Classification, Topic Modeling
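
A Markov chain text generator of the kind mentioned above can be sketched with the standard library alone. This is a first-order, word-level chain under assumed toy reviews; the function names `build_chain` and `generate` are hypothetical, and the repository's generator may use a different order or tokenization.

```python
import random
from collections import defaultdict

# Toy positive reviews invented for illustration.
positive_reviews = [
    "great product and great price",
    "great quality and fast delivery",
    "fast delivery and great service",
]


def build_chain(texts):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate(chain, start, max_words=10, seed=None):
    """Walk the chain from `start`, sampling each next word at random."""
    rng = random.Random(seed)
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by another
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


chain = build_chain(positive_reviews)
text = generate(chain, "great", max_words=8, seed=1)
print(text)
```

Training one chain per sentiment class (positive, negative, neutral) would give one generator per class, since each chain only reproduces transitions seen in its own reviews.
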
Portfolio Investment Optimization
- Collect 5 years of historical data for 20 S&P 500 stocks
- Use Principal Component Analysis from Sklearn to derive the eigenvector features of the stocks' covariance matrix
- Calculate the weights of each portfolio from the PCA components
- Compute the Sharpe ratio, annual return and annual volatility of each portfolio
- Techniques: Data Scraping, Data Visualization, Principal Component Analysis
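
The steps above can be sketched as follows. Everything specific here is an assumption made for illustration: the returns are synthetic, weights are obtained by dividing each component by the sum of its absolute values, the risk-free rate is taken as zero, and a year is taken as 252 trading days. The repository may normalize and annualize differently.

```python
import numpy as np
from sklearn.decomposition import PCA

TRADING_DAYS = 252  # assumed number of trading days per year

rng = np.random.default_rng(42)
n_days, n_stocks = 500, 5

# Synthetic daily returns: a shared market factor plus stock-specific noise.
market = rng.normal(0.0005, 0.01, size=(n_days, 1))
returns = market + rng.normal(0.0, 0.005, size=(n_days, n_stocks))

# PCA centers the data; each row of components_ is an eigenvector of the
# covariance matrix of the returns.
pca = PCA(n_components=3)
pca.fit(returns)


def portfolio_stats(component, returns):
    """Turn one principal component into weights and annualized statistics."""
    # Assumed normalization: absolute weights sum to 1 (signs are kept).
    weights = component / np.abs(component).sum()
    daily = returns @ weights
    annual_return = daily.mean() * TRADING_DAYS
    annual_volatility = daily.std(ddof=1) * np.sqrt(TRADING_DAYS)
    sharpe = annual_return / annual_volatility  # risk-free rate assumed 0
    return weights, annual_return, annual_volatility, sharpe


stats = [portfolio_stats(component, returns) for component in pca.components_]
for i, (weights, annual_return, annual_volatility, sharpe) in enumerate(stats):
    print(f"PC{i + 1}: return={annual_return:.3f} "
          f"volatility={annual_volatility:.3f} sharpe={sharpe:.2f}")
```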

Folders and files (history: 177 commits)
- customer_segement
- movie-recommender
- sentiment_mining
- spam_classification
- stock_analysis
- twitter_bot
- wine_clustering
- LICENSE
- README.md
- blogger.png
