Switched to TwitterAPI
geduldig committed Jun 14, 2013
1 parent fea9a42 commit 642374d
Showing 11 changed files with 101 additions and 298 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
*.cache

*.py[cod]

# C extensions
4 changes: 3 additions & 1 deletion CHANGES.txt
@@ -10,4 +10,6 @@ v1.0.0, 30 Jan 2013 -- Uploaded to github.

v1.1.0, 12 Feb 2013 -- Replaced TwitterAPI with puttytat for Twitter requests.

v1.1.1, 19 Feb 2013 -- Geocoder uses viewport instead of bounds.
v1.1.1, 19 Feb 2013 -- Geocoder uses viewport instead of bounds.

v2.0.0, 14 Jun 2013 -- Switch to TwitterAPI and renamed to TwitterGeoPics
9 changes: 9 additions & 0 deletions MANIFEST
@@ -0,0 +1,9 @@
# file GENERATED by distutils, do NOT edit
CHANGES.txt
setup.py
twittergeo/Geocoder.py
twittergeo/SearchGeo.py
twittergeo/SearchPics.py
twittergeo/StreamGeo.py
twittergeo/StreamPics.py
twittergeo/__init__.py
64 changes: 1 addition & 63 deletions README.md
@@ -1,69 +1,7 @@
# TwitterGeo #
# TwitterGeoPics #

_Scripts for geocoding tweets and for downloading images embedded in tweets._

### Getting Location and Embedded Images... ###

TwitterGeo contains command line scripts for geocoding tweets and extracting embedded images from tweets from twitter.com. The scripts take one or more search words as command line arguments. The scripts download old tweets using Twitter's REST API and download new tweets using Twitter's Streaming API.

About 1% or 2% of tweets contain latitude and longitude. Of those tweets that do not contain coordinate data, about 60% have the user's profile location, a descriptive text field that may or may not be accurate. Using Google's Maps API, we can geocode these tweets, which locates about half of all tweets, a portion of which are suspect.

Use the location option to restrict searches to a geographic location. Twitter returns tweets that either contain coordinates in the location region or tweets from users whose profile location is in the specified region.

Google does not require authentication, but it does enforce a limit of about 2,500 requests per day and about 10 requests per second.

The Twitter API requires OAuth credentials which you can get by creating an application on dev.twitter.com. Once you have your OAuth secrets and keys, copy them into puttytat/credentials.txt. Alternatively, specify the credentials file on the command line.

Twitter restricts searching old tweets to within roughly the past week. Twitter also places a bandwidth limit on searching current tweets, but you will notice this only when you are searching for a popular word. When this limit is reached, the total number of skipped tweets is printed and the connection is maintained.

### Features ###

*The following modules run as command line scripts and write tweets to the console.*

***SearchGeo***

Prints old tweets and their location information and coordinates when possible.

***StreamGeo***

Prints new tweets and their location information and coordinates when possible.

***SearchPics***

Prints old tweets, their coordinates and URLs of any embedded photos. To download the photos use the -photo_dir option. To get tweets only from a specific geographic region use the -location option.

***StreamPics***

Prints new tweets, their coordinates and URLs of any embedded photos. To download the photos use the -photo_dir option. To get tweets only from a specific geographic region use the -location option.

*This is a utility module.*

***Geocoder***

A wrapper for the pygeocoder package. It adds throttling to respect Google's daily quota and rate limit. It also provides a caching mechanism for storing geocode lookups to a text file. The caching is only partially effective because users can enter their location in any format. There are also some Twitter-specific methods.
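
The throttling and caching described above can be sketched roughly as follows. This is a minimal illustration and not the package's actual implementation: the names `Throttle` and `cached_lookup` are hypothetical, a plain dict stands in for the persistent cache, and lower-casing the place name as the cache key is an assumption made here to show why free-form locations limit cache hits.

```python
import time


class Throttle(object):
    """Hypothetical helper: keep calls at least 1/max_per_second apart."""

    def __init__(self, max_per_second, clock=time.time, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock      # injectable so the sketch can be tested
        self.sleep = sleep
        self.last_call = None   # time of the previous call, if any

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def cached_lookup(cache, place, lookup):
    """Return lookup(place), reusing an earlier result for the same key.

    Normalizing the key (strip + lower-case) is an assumption; it only
    catches trivial variations, so 'NYC' and 'New York' still miss.
    """
    key = place.strip().lower()
    if key not in cache:
        cache[key] = lookup(place)
    return cache[key]
```

A caller would invoke `Throttle(10).wait()` before each geocode request to stay near the 10 requests per second mentioned above, and pass any dict-like object as `cache`.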

### Installation ###


1. On a command line, type:

pip install twittergeo

2. Either copy your OAuth consumer secret and key and your access token secret and key into puttytat/credentials.txt, or copy them into another file which you will specify on the command line. See credentials.txt for the expected file format.

3. Run a script with the '-m' option, for example:

python -m twittergeo.StreamGeo zzz
python -m twittergeo.StreamGeo zzz -oauth ./my_credentials.txt

### External Dependencies ###

This package uses the following external packages.

* puttytat - for downloading tweets
* pygeocoder - for geo-referencing using Google's Maps service
* fridge - for caching latitudes and longitudes in a persistent dict

### Contributors ###

Jonas Geduldig
79 changes: 16 additions & 63 deletions twittergeo/Geocoder.py → TwitterGeoPics/Geocoder.py
@@ -5,7 +5,6 @@
import datetime
import fridge
import math
import os
import pygeocoder
import socket
import time
@@ -26,12 +25,9 @@ class Geocoder:
def __init__(self, cache_file=None):
"""Zero counters and open cache file.
Parameters
----------
cache_file : str
cache_file :
File path for cache file. File will get opened for append or created if not found.
If cache_file is not supplied, the default file will be used.
"""
self.count_request = 0 # total number of geocode requests
self.count_request_ok = 0 # total number of successful geocode requests
@@ -47,17 +43,13 @@ def __init__(self, cache_file=None):
self.last_exec = None # time updated at each geocode request

if cache_file is None:
#path = os.path.dirname(__file__)
#cache_file = os.path.join(path, DEFAULT_CACHE_FILE)
cache_file = DEFAULT_CACHE_FILE

# cache is a persistent dict with place address as key and lat/lng and count as value
self.cache = fridge.Fridge(cache_file)

def _throttle(self):
"""Wait an interval to not exceed rate limit. Called before each geocode request.
"""
"""Wait an interval to not exceed rate limit. Called before each geocode request."""
if self.retry_count == 1:
# increase the throttle to respect rate limit
self.retry_count = 2
@@ -76,12 +68,9 @@ def _throttle(self):
def _should_retry(self):
"""Handle an OVER QUERY LIMIT exception. Called when GeocodeError is thrown.
Return
------
retry : boolean
Return : boolean
True means wait 2 seconds, increase the throttle, and retry the request.
False means stop making geocode requests because daily limit was exceeded.
"""
if not self.quota_exceeded:
if self.retry_count == 0:
@@ -101,23 +90,14 @@ def _should_retry(self):
def geocode(self, place):
"""Returns Google's geocode data for a place.
Parameters
----------
place : str
An address or partial address in any format.
place : An address or partial address in any format.
Return
------
geocode data : dict
Keys and values are from Google's JSON data.
Return : dict
Geocode from Google's JSON data.
Raises
------
pygeocoder.GeocoderError
Quota exceeded, indecipherable address, etc.
Exception
Socket errors.
Raises :
pygeocoder.GeocoderError : Quota exceeded, indecipherable address, etc.
Exception : Socket errors.
"""
self._throttle()
try:
@@ -163,24 +143,16 @@ def address_to_latlng(self, place):
def geocode_tweet(self, status):
"""Returns an address and coordinates associated with a tweet.
Parameters
----------
status : dict
Keys and values of a tweet (i.e. a Twitter status).
Return
------
place : str
Return : (str, float, float)
An address or part of an address from either the tweeter's Twitter profile
or from reverse geocoding coordinates associated with the tweet.
latitude, longitude : float
Coordinates either associated with the tweet or from geocoding the
location in the tweeter's Twitter profile.
Raises
------
See Geocoder.geocode() documentation.
Raises: See Geocoder.geocode() documentation.
"""
# start off with the location in the user's profile (it may be empty)
place = status['user']['location']
@@ -232,24 +204,15 @@ def get_region_box(self, place):
The size of bounding box that Google returns depends on whether the place is
an address, a town or a country.
Parameters
----------
place : str
An address or partial address in any format. Google will try anything.
Return
------
latitude, longitude : float
Return : floatx6
The place's coordinates.
latitude, longitude : float
The place's SW coordinates.
latitude, longitude : float
The place's NE coordinates.
Raises
------
See Geocoder.geocode() documentation.
Raises : See Geocoder.geocode() documentation.
"""
results = self.geocode(place)
geometry = results.raw[0]['geometry']
@@ -265,32 +228,22 @@ def get_region_circle(self, place):
The motivation for this method is Twitter's Search API's 'geocode'
parameter.
Parameters
----------
place : str
An address or partial address in any format.
Return
------
latitude, longitude : float
Return : float, float, str
The place's coordinates.
radius : str
Half the distance spanning the corners of the place's bounding box in kilometers.
Raises
------
See Geocoder.geocode() documentation.
Raises : See Geocoder.geocode() documentation.
"""
latC, lngC, latSW, lngSW, latNE, lngNE = self.get_region_box(place)
D = self.distance(latSW, lngSW, latNE, lngNE)
return latC, lngC, D/2

@classmethod
def distance(cls, lat1, lng1, lat2, lng2):
"""Calculates the distance between two points on a sphere
"""
"""Calculates the distance between two points on a sphere."""
# Haversine distance formula
lat1, lng1 = math.radians(lat1), math.radians(lng1)
lat2, lng2 = math.radians(lat2), math.radians(lng2)
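
For reference, a self-contained sketch of the Haversine calculation and of the half-diagonal radius used by `get_region_circle` might look like the following. The remainder of the original method is not shown in this diff, so this is an illustration rather than the author's code; the function names and the 6371 km mean Earth radius are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points in degrees."""
    lat1, lng1 = math.radians(lat1), math.radians(lng1)
    lat2, lng2 = math.radians(lat2), math.radians(lng2)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    # Haversine formula: a is the squared half-chord length
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def region_radius_km(lat_sw, lng_sw, lat_ne, lng_ne):
    """Half the SW-to-NE diagonal of a bounding box, in kilometers."""
    return haversine_km(lat_sw, lng_sw, lat_ne, lng_ne) / 2
```

The half-diagonal gives a circle that covers the whole bounding box, which suits a search radius but overshoots the box near its edges.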
50 changes: 13 additions & 37 deletions twittergeo/StreamGeo.py → TwitterGeoPics/GetNewGeo.py
@@ -1,24 +1,3 @@
"""
REQUIRED: PASTE YOUR TWITTER OAUTH CREDENTIALS INTO puttytat/credentials.txt
OR USE -oauth OPTION TO USE A DIFFERENT FILE CONTAINING THE CREDENTIALS.
Downloads real-time tweets. You must supply either one or both of the -words and
-location options. Prints the tweet text and location information, including
latitude and longitude from Google's Map service.
Use the -words option to get tweets that contain any of the words that are passed
as arguments on the command line.
Use the -location option to get tweets from a geographical region. Location is
determined only from geocode in the tweet. Use -location ALL to get all geocoded
tweets from any location.
The script calls Twitter's Streaming API which is bandwidth limited. If you
exceed the rate limit, Twitter sends a message with the total number of tweets
skipped during the current connection. This number is printed, and the connection
remains open.
"""

__author__ = "Jonas Geduldig"
__date__ = "December 20, 2012"
__license__ = "MIT"
@@ -29,17 +8,14 @@

import argparse
import Geocoder
import puttytat
import urllib
from TwitterAPI import TwitterAPI, TwitterOAuth


OAUTH = None
GEO = Geocoder.Geocoder()


def parse_tweet(status, region):
"""Print tweet, location and geocode
"""
"""Print tweet, location and geocode."""
try:
geocode = GEO.geocode_tweet(status)
print '\n%s: %s' % (status['user']['screen_name'], status['text'])
@@ -51,10 +27,8 @@ def parse_tweet(status, region):
raise


def stream_tweets(list, region):
"""Get tweets containing any words in 'list' or that have location or coordinates in 'region'
"""
def stream_tweets(api, list, region):
"""Get tweets containing any words in 'list' or that have location or coordinates in 'region'."""
params = {}
if list is not None:
words = ','.join(list)
@@ -63,10 +37,11 @@ def stream_tweets(list, region):
params['locations'] = '%f,%f,%f,%f' % region
print 'REGION', region
while True:
tw = puttytat.TwitterStream(OAUTH)
try:
api.request('statuses/filter', params)
iter = api.get_iterator()
while True:
for item in tw.request('statuses/filter', params):
for item in iter:
if 'text' in item:
parse_tweet(item, region)
elif 'disconnect' in item:
@@ -77,16 +52,17 @@ def stream_tweets(list, region):


if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Get real-time tweet stream.')
parser.add_argument('-oauth', metavar='FILENAME', type=str, help='read OAuth credentials from file')
parser = argparse.ArgumentParser(description='Get real-time tweet stream with geocode.')
parser.add_argument('-location', type=str, help='limit tweets to a place; use ALL to get all geocoded tweets')
parser.add_argument('-oauth', metavar='FILENAME', type=str, help='read OAuth credentials from file')
parser.add_argument('-words', metavar='W', type=str, nargs='+', help='word(s) to track')
args = parser.parse_args()

if args.words is None and args.location is None:
sys.exit('You must use either -words or -location or both.')

OAUTH = puttytat.TwitterOauth.read_file(args.oauth)
oauth = TwitterOAuth.read_file(args.oauth)
api = TwitterAPI(oauth.consumer_key, oauth.consumer_secret, oauth.access_token_key, oauth.access_token_secret)

if args.location:
if args.location.lower() == 'all':
@@ -99,7 +75,7 @@ def stream_tweets(list, region):
region = None

try:
stream_tweets(args.words, region)
stream_tweets(api, args.words, region)
except KeyboardInterrupt:
print>>sys.stderr, '\nTerminated by user'
