The rise of social media has also drawn users to video-sharing sites such as YouTube, where they express opinions and sentiments about the videos they watch.
YouTube had 2.6 billion users worldwide as of 2022 (Statista, 2022), making it the second-most popular social network; only Facebook has more active users.
The objective is to automatically recognize and categorize the opinions expressed in YouTube comments in order to determine the overall sentiment of users.
Sentiment analysis helps data analysts within large enterprises gauge public opinion, conduct nuanced market research, monitor brand and product reputation, and understand customer experiences.
The project covers:
- Sentiment analysis of YouTube comments
- Exploratory data analysis of positive sentences
- Exploratory data analysis of negative sentences
- The dataset contains comments on up to 200 videos listed each day in YouTube's US trending category.
- The comments file has the following columns:
  - video_id
  - comment_text
  - likes
  - replies
- Polarity determination using TextBlob
- WordCloud representation of sentiments
- Removal of stop words
- Emoji analysis
- TextBlob's sentiment property is used to compute the polarity of each comment.
- Polarity values range from -1 to +1.
- Word clouds are generated for the positive and negative sentences.
- To generate a word cloud, all comments are concatenated into a single string with Python's str.join method.
- A polarity of 1 indicates a positive sentence, and a polarity of -1 indicates a negative sentence.
- As the image below shows, the highlighted words add no value to the sentiment.
- Stop words are therefore removed using the STOPWORDS set provided by the wordcloud library, which already contains common English stop words; without this step, the word cloud is dominated by uninformative words and gives an inaccurate picture.
- Emojis are extracted from the comments for analysis.
- The frequency of each emoji is calculated with Counter from Python's collections module.
- Counter's most_common(10) method returns the 10 most frequently used emojis.
- The emojis and their frequencies are separated into two lists so they can be plotted with Plotly.
- Jupyter Notebook is used as the IDE.
- Among the Python libraries, Pandas is used for data handling and preprocessing, and NumPy for mathematical operations.
- Plotly, Seaborn, and Matplotlib are used for visualization.
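A minimal sketch of loading the comments file with Pandas, assuming it is a CSV with the four columns listed above. The real file name is not given here, so a small in-memory sample stands in for it. Skipping malformed rows (on_bad_lines="skip", available in pandas 1.3 and later) and dropping rows with an empty comment_text are assumptions about how to clean scraped data, not necessarily what the notebook does; load_comments() is a hypothetical helper name.

```python
import io

import pandas as pd

# Columns of the comments file, as listed in the dataset description.
COMMENT_COLUMNS = ["video_id", "comment_text", "likes", "replies"]


def load_comments(source) -> pd.DataFrame:
    """Read the comments CSV and keep only rows that have comment text.

    `source` may be a file path or a file-like object.
    """
    df = pd.read_csv(
        source,
        usecols=COMMENT_COLUMNS,
        dtype={"comment_text": str},  # keep comments as text even if numeric
        on_bad_lines="skip",          # assumption: skip malformed rows
    )
    # A comment with no text cannot be scored, so drop it.
    return df.dropna(subset=["comment_text"]).reset_index(drop=True)


# Hypothetical sample standing in for the real comments file.
sample = io.StringIO(
    "video_id,comment_text,likes,replies\n"
    "abc123,Great video!,4,0\n"
    "abc123,,1,0\n"
    "xyz789,Not my favourite,0,2\n"
)
comments = load_comments(sample)
print(comments.shape)  # (2, 4)
```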
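The emoji steps (extract, count with Counter, take most_common(10), then separate emojis from frequencies) can be sketched with the standard library alone. Emoji detection here uses a regular expression over a few common emoji Unicode blocks; that character range is an assumption (the notebook may use a dedicated package instead), and multi-code-point emojis such as skin-tone or ZWJ sequences are counted one code point at a time. extract_emojis() and top_emojis() are hypothetical names.

```python
import re
from collections import Counter

# Assumed emoji ranges: common pictograph and symbol blocks, not every emoji.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # miscellaneous symbols and pictographs
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F680-\U0001F6FF"  # transport and map symbols
    "\U0001F900-\U0001F9FF"  # supplemental symbols and pictographs
    "\U0001FA70-\U0001FAFF"  # symbols and pictographs extended-A
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def extract_emojis(text):
    """Return every emoji code point found in text, in order."""
    return EMOJI_PATTERN.findall(text)


def top_emojis(comments, n=10):
    """Count emojis across all comments and return the n most common."""
    counts = Counter()
    for text in comments:
        counts.update(extract_emojis(text))
    return counts.most_common(n)


comments = ["Love it 😂😂", "🔥 so good 😂", "no emoji here"]
top = top_emojis(comments)
# Separate emojis and frequencies into two lists for plotting.
emojis = [emoji for emoji, _ in top]
freqs = [count for _, count in top]
print(emojis, freqs)  # ['😂', '🔥'] [3, 1]
```

The two lists can then be passed to Plotly, for example plotly.express.bar(x=emojis, y=freqs).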
For more details, see the attached Jupyter Notebook.