This blog post gives an intro to connecting to the Twitter Streaming API using the Python library Tweepy. There are two Python routines: one to connect to Twitter and download relevant tweets to a file, and a second to post-process the file using Pandas and create some graphs using Matplotlib.
None of this required the tweets to be written to a file first, so I adapted the scripts to do everything in real time rather than post-processing, and Pandas wasn't required.
The script needs to connect to the Pocket account using previously retrieved credentials. It uses pocket, another Python library for connecting to the Pocket API; an example of its use is here. My code for connecting to Pocket looks like this:

    import pocket

    with open('pocket_api_key.txt') as fileHandle:
        (pckt_consumer_key, pckt_access_token, redirect_uri) = \
            [item.strip('\n') for item in fileHandle.readlines()]

    if __name__ == '__main__':
        # Pocket authentication
        pocket_instance = pocket.Pocket(pckt_consumer_key, pckt_access_token)

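The credentials file is assumed to hold three values, one per line: consumer key, access token and redirect URI. As a rough sketch, the unpacking could be made a little more forgiving of blank lines and trailing whitespace. The helper name `load_credentials` is hypothetical and not part of the pocket library.

```python
def load_credentials(path):
    """Return (consumer_key, access_token, redirect_uri) read from `path`.

    Assumes one value per line, in that order; blank lines are ignored.
    """
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if len(lines) < 3:
        raise ValueError('expected 3 values in %s, found %d' % (path, len(lines)))
    consumer_key, access_token, redirect_uri = lines[:3]
    return consumer_key, access_token, redirect_uri
```

Calling `load_credentials('pocket_api_key.txt')` would then give the same three values as the unpacking above, but with a clearer error if the file is incomplete.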
Next, I connect to the Twitter Streaming API using Tweepy's filter method, which returns only tweets that contain certain pre-defined keywords. This is straight from the original post.
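As a minimal sketch of the callback side of such a connection, the class below is a plain-Python stand-in for a stream listener. The class name `TweetHandler` is hypothetical; the method names `on_data` and `on_error`, and the convention that returning `False` asks for a disconnect, are assumed to follow Tweepy 3.x's listener callbacks and may differ in other Tweepy releases. Only two failure cases are covered: input the callback cannot parse, and HTTP 420 (rate limiting).

```python
import json


class TweetHandler(object):
    """Hypothetical listener; method names mirror Tweepy 3.x callbacks."""

    def __init__(self, respond):
        self.respond = respond  # function called with each raw JSON string
        self.problems = []      # record of anything that went wrong

    def on_data(self, data):
        """Hand one raw tweet to the callback without letting bad input stop the stream."""
        try:
            self.respond(data)
        except (ValueError, KeyError) as err:
            self.problems.append(repr(err))
        return True  # anything other than False keeps the stream open

    def on_error(self, status_code):
        """Record the HTTP status; give up only when rate limited."""
        self.problems.append(status_code)
        return status_code != 420  # False on 420 so the caller disconnects


handler = TweetHandler(lambda data: print(json.loads(data)['text']))
handler.on_data('{"text": "hello"}')  # prints hello
handler.on_data('not json')           # recorded in handler.problems, stream continues
```

In real use the handler would be passed to (or merged into) whichever Tweepy stream class the installed version provides, with `myrespond` as the callback.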
(Clearly the error-handling needs finishing.) Each tweet is processed by the function myrespond(), which looks for tweets that also contain the word 'tutorial', and extracts the link if there is one.

    import re
    import json


    def getdata(tweet, key):
        """Get dictionary value given key, or '' if the key is missing."""
        return tweet.get(key, '')


    def word_in_text(word, text):
        """Search for word in text string, ignoring case."""
        return re.search(re.escape(word.lower()), text.lower()) is not None


    def extract_link(text):
        """Extract link from tweet or return null string."""
        regex = r'https?://[^\s<>"]+|www\.[^\s<>"]+'
        match = re.search(regex, text)
        if match:
            return match.group()
        return ''


    def myrespond(data):
        """Respond to relevant tweet."""
        tweet = json.loads(data)
        lang = getdata(tweet, 'lang')
        if lang == 'en':
            text = getdata(tweet, 'text')
            link = extract_link(text)
            if word_in_text('tutorial', text) and link != '':
                print(text)
                print(link)
                # pocket_instance is the module-level object created earlier
                print(pocket_instance.add(url=link))
                print("-----------------------------------------------------------")

The myrespond() function uses the json library to turn the JSON-format data containing the tweet into a Python dictionary, then another function, getdata(), to access individual data items. If the language is English, the text of the tweet is extracted and the re library is used to find the link. Finally, the Pocket API is used to add the link to Pocket.
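To try the filtering steps without touching Pocket or Twitter, the same logic can be folded into one function that just returns the link. This is a sketch only: `find_tutorial_link` and the sample payload are made up for illustration, and the payload contains only the two fields the code reads, `lang` and `text`.

```python
import json
import re

LINK_REGEX = re.compile(r'https?://[^\s<>"]+|www\.[^\s<>"]+')


def find_tutorial_link(data):
    """Return the first link in an English tweet mentioning 'tutorial', else ''."""
    tweet = json.loads(data)
    if tweet.get('lang') != 'en':
        return ''
    text = tweet.get('text', '')
    if 'tutorial' not in text.lower():
        return ''
    match = LINK_REGEX.search(text)
    return match.group() if match else ''


# Made-up payload for illustration
sample = json.dumps({'lang': 'en',
                     'text': 'A pandas tutorial: https://example.com/pandas'})
print(find_tutorial_link(sample))  # prints https://example.com/pandas
```

Keeping the detection separate from the call to Pocket makes it easy to check the regular expression against a handful of saved tweets before letting the script add anything.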
There's probably some sophistication that could be added to this to ensure better quality links are identified; I could, for example, use my own tweets and blog posts as a reference corpus, and test the similarity of tweets to the text in my corpus; and then only add links that have a similarity score above some threshold.
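One simple way the similarity idea might be sketched is a bag-of-words cosine similarity between the tweet and the reference corpus, using only the standard library. The function names and the threshold of 0.2 are arbitrary assumptions for illustration; a real version would probably want stop-word removal or TF-IDF weighting.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lower-cased alphanumeric tokens in the text."""
    return Counter(re.findall(r'[a-z0-9]+', text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values())) *
            math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def is_relevant(tweet_text, corpus_text, threshold=0.2):
    """True if the tweet is at least `threshold` similar to the corpus (threshold is a guess)."""
    score = cosine_similarity(bag_of_words(tweet_text), bag_of_words(corpus_text))
    return score >= threshold
```

The link would then only be added to Pocket when `is_relevant(text, corpus_text)` is true, where `corpus_text` is the concatenated text of the reference tweets and blog posts.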