README - Cloud Assignment 3
Names: Digvijay Singh (ds3161), Kuber Kaul (kk2872)
1. Google App Engine:
We built our application with Google App Engine's Python SDK and deployed it on Google App Engine, using the services it offers. The application is live at: cloud-app-3.appspot.com
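This README ships no code, but a minimal App Engine request handler under the Python 2.7 SDK would look roughly like the sketch below. webapp2 is the framework bundled with that SDK; the handler name, route, and form fields are illustrative assumptions, not our actual module layout.

    import webapp2

    class MainPage(webapp2.RequestHandler):
        # Hypothetical front page: a form asking for location and category.
        def get(self):
            self.response.headers['Content-Type'] = 'text/html'
            self.response.write('<form action="/buzz">'
                                '<input name="location">'
                                '<input name="category">'
                                '<input type="submit" value="Search">'
                                '</form>')

    # app.yaml routes incoming URLs to this WSGI application.
    app = webapp2.WSGIApplication([('/', MainPage)], debug=True)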
2. Twitter API:
We used the Twitter API to fetch the following fields and store them in the database, as can be seen on Google App Engine:
1. username
2. handle
3. profile picture
4. timeline
5. tweet id
6. tweet text
* As the assignment suggested, we used the Google Geocoding API. We used the pygeocoder wrapper around Google's geocoder to get latitude and longitude for any location the user provides. Going beyond what the assignment suggested, we accept full addresses, not just state names.
* We use Twitter's search GET API plus our own logic to fetch the tweets and the metadata about each tweet, as sketched below.
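A minimal sketch of these two steps, assuming pygeocoder's Geocoder.geocode interface; the search URL shows the general shape of a geocoded Twitter search request from that API era, and its endpoint and parameters are assumptions for illustration, not our exact request.

    import urllib
    from pygeocoder import Geocoder

    def geocode_address(address):
        # pygeocoder returns a result whose .coordinates is (lat, lng).
        return Geocoder.geocode(address).coordinates

    def build_search_url(query, lat, lng, radius='25mi'):
        # Geocoded tweet search: query terms plus a lat,lng,radius filter.
        params = urllib.urlencode({
            'q': query,
            'geocode': '%f,%f,%s' % (lat, lng, radius),
        })
        return 'https://search.twitter.com/search.json?' + params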
3. Google DataStore:
We used Google's NoSQL Datastore service to cache the tweets. We created a DataStore class with data members such as: id, text, from_user, from_user_name, created_at, location (sketched below). For each tweet in the Twitter response, we extract these fields and store them in an instance of the DataStore class. Later, when we receive the same query (judged by matching location and category), we return the tweets stored in the datastore rather than sending a new request to the Twitter API.
* To make repeated requests look fresh, we pull the data from the datastore in random order, creating something new from the same data. The buzz and the sentiment remain the same.
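A minimal sketch of such a model under the App Engine Python SDK's db API; the property types are assumptions, and tweet_id stands in for the "id" member above (a bare "id" would shadow the entity's built-in key).

    from google.appengine.ext import db

    class DataStore(db.Model):
        # One entity per cached tweet; fields follow the members listed above.
        tweet_id = db.StringProperty()
        text = db.StringProperty(multiline=True)
        from_user = db.StringProperty()
        from_user_name = db.StringProperty()
        created_at = db.StringProperty()
        location = db.StringProperty()

    def cached_tweets(location, limit=100):
        # Answer a repeated query from the cache instead of calling Twitter.
        return DataStore.all().filter('location =', location).fetch(limit)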
4. Buzz Extraction:
Our buzz extraction module extracts the top 10 buzz words from the tweets based on our logic, which comprises three general parts (sketched after this list):
1. Stemming of data in tweets
2. Removal of stop words from tweets
3. Extraction of the 10 most frequent words from those that remain.
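A minimal sketch of that pipeline, assuming NLTK's PorterStemmer and a small inline stop-word list; the README does not name the libraries we actually used, and the stop-word set here is an illustrative subset.

    from collections import Counter
    from nltk.stem.porter import PorterStemmer

    STOP_WORDS = set(['the', 'a', 'an', 'and', 'or', 'to', 'of',
                      'in', 'on', 'is', 'it', 'rt'])  # illustrative subset
    stemmer = PorterStemmer()

    def extract_buzz(tweets, top_n=10):
        counts = Counter()
        for tweet in tweets:
            for word in tweet.lower().split():
                word = word.strip('.,!?#@')
                if word and word not in STOP_WORDS:
                    counts[stemmer.stem(word)] += 1  # stem, then count
        return [word for word, _ in counts.most_common(top_n)]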
5. Sentiment Analysis:
Our sentiment analysis uses the word list provided by the professor (AFINN: http://neuro.imm.dtu.dk/wiki/AFINN) and follows a bag-of-words approach to extract sentiment from the tweets (a sketch follows the notes below). In our own paper for the Search Engine Technology class, we showed that bag of words/Naive Bayes works better than SVM, KNN, and decision trees (which we also considered) on small amounts of data, which is the case here.
* Naive Bayes reached 82% accuracy, comparable to, if not higher than, the rest.
* As before, we preprocess the tweets with the stemmer module and stop-word removal.
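A minimal sketch of bag-of-words scoring against the AFINN list; the file name AFINN-111.txt and the tab-separated "word<TAB>score" format follow the published AFINN distribution, and the thresholds here are assumptions, not our exact code.

    def load_afinn(path='AFINN-111.txt'):
        # Each line of the AFINN file is "word<TAB>valence score".
        scores = {}
        with open(path) as f:
            for line in f:
                word, score = line.strip().rsplit('\t', 1)
                scores[word] = int(score)
        return scores

    def tweet_sentiment(tweet, afinn):
        # Bag of words: sum the valence of every known word in the tweet.
        total = sum(afinn.get(w, 0) for w in tweet.lower().split())
        if total > 0:
            return 'positive'
        if total < 0:
            return 'negative'
        return 'neutral'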
6. What we display:
We display the top 10 buzzes, each followed by the tweets related to that buzz, and at the end we categorize the sentiment of the buzz. The columns include the username, handle, profile image, and the tweets that contributed to the buzz.
Our application's display is laid out as follows:
______________________________________________________
Buzz:
______________________________________________________
Tweet
Metadata
______________________________________________________
Tweet
Metadata
______________________________________________________
Tweet
Metadata
______________________________________________________
.....
______________________________________________________
Sentiment:
______________________________________________________