.. _streaming_how_to:
.. _Twitter Streaming API Documentation:
.. _Twitter Streaming API Connecting Documentation:
.. _Twitter Response Codes Documentation:

Streaming With Tweepy
Tweepy makes it easier to use the Twitter Streaming API by handling authentication,
connection, creating and destroying the session, reading incoming messages,
and partially routing messages.

This page aims to help you get started using Twitter streams with Tweepy
by offering a first walk through. Some features of Tweepy streaming are
not covered here; see the Tweepy source code for those.

API authorization is required to access Twitter streams. 
Follow the :ref:`auth_tutorial` if you need help with authentication. 

The Twitter Streaming API is used to download Twitter messages in real
time. It is useful for obtaining a high volume of tweets, or for
creating a live feed using a site stream or user stream.
See the `Twitter Streaming API Documentation`_.

The Streaming API is quite different from the REST API: the
REST API is used to *pull* data from Twitter, whereas the Streaming API
*pushes* messages to a persistent session. This allows the Streaming
API to download more data in real time than could be done using the
REST API.

In Tweepy, an instance of **tweepy.Stream** establishes a streaming
session and routes messages to a **StreamListener** instance. The
**on_data** method of a stream listener receives all messages and
calls functions according to the message type. The default
**StreamListener** can classify most common Twitter messages and
route them to appropriately named methods, but these methods are
only stubs.
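
The routing idea can be pictured with a small stand-alone sketch. It does not use Tweepy: ``SketchListener``, ``PrintingListener``, and the key checks inside ``on_data`` are illustrative assumptions, not Tweepy's actual implementation.

```python
import json


class SketchListener:
    """Illustrative stand-in for a stream listener; not Tweepy's class."""

    def on_data(self, raw_data):
        # Decode one raw message and route it by the keys it contains.
        data = json.loads(raw_data)
        if "text" in data:
            return self.on_status(data)
        if "delete" in data:
            return self.on_delete(data["delete"])
        return self.on_unknown(data)

    # Stub handlers: a subclass overrides only the ones it needs.
    def on_status(self, status):
        return None

    def on_delete(self, notice):
        return None

    def on_unknown(self, data):
        return None


class PrintingListener(SketchListener):
    def on_status(self, status):
        # Only status messages get custom handling in this subclass.
        print(status["text"])
        return True


listener = PrintingListener()
listener.on_data('{"text": "hello from the stream"}')  # prints the text
listener.on_data('{"delete": {"status": {"id": 1}}}')  # handled by a stub
```

Because the base class only provides stubs, nothing happens for a message type until a subclass overrides the matching method.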

Therefore, using the Streaming API takes three steps.

1. Create a class inheriting from **StreamListener**.

2. Using that class, create a **Stream** object.

3. Connect to the Twitter API using the **Stream**.

Step 1: Creating a **StreamListener**
This simple stream listener prints status text.
The **on_data** method of Tweepy's **StreamListener** conveniently passes
data from statuses to the **on_status** method.
Create class **MyStreamListener** inheriting from **StreamListener**
and overriding **on_status**.::

  import tweepy

  # Override tweepy.StreamListener to add logic to on_status
  class MyStreamListener(tweepy.StreamListener):

      def on_status(self, status):
          # Print the text of each incoming status
          print(status.text)

Step 2: Creating a **Stream**
We need an API object to stream. See :ref:`auth_tutorial` to learn how to get one.
Once we have an API object and a stream listener we can create our stream object.::

  myStreamListener = MyStreamListener()
  myStream = tweepy.Stream(auth=api.auth, listener=myStreamListener)

Step 3: Starting a Stream
A number of Twitter streams are available through Tweepy. Most use cases
call for filter, the user_stream, or the sitestream.
For more information on the capabilities and limitations of the different
streams see `Twitter Streaming API Documentation`_.

In this example we will use **filter** to stream all tweets containing
the word *python*. The **track** parameter is an array of search terms to stream. ::

  myStream.filter(track=['python'])

**filter** can also stream tweets from specific users. The **follow** parameter is an array of user IDs.

An easy way to find a single ID is to use one of the many conversion websites: search for 'what is my twitter ID'.
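
A follow-based filter might be prepared as in the sketch below. The helper ``as_follow_list`` is hypothetical (it is not part of Tweepy) and the IDs are placeholders. The sketch assumes IDs should be passed as strings, which avoids type problems if they are later joined into a comma-separated request parameter.

```python
def as_follow_list(user_ids):
    """Return the IDs as a list of strings, whatever mix of int/str was given."""
    return [str(user_id) for user_id in user_ids]


# Placeholder IDs for illustration only; substitute real account IDs.
follow_ids = as_follow_list([1234567890, "9876543210"])
print(follow_ids)

# With a stream built as in Step 2, the call would look like:
# myStream.filter(follow=follow_ids)
```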

A Few More Pointers

Async Streaming
Streams do not terminate unless the connection is closed, blocking the thread.
Tweepy offers a convenient **is_async** parameter on **filter** so the stream will run on a new
thread. (Before Tweepy 3.7 this parameter was named **async**, which became a reserved word in Python 3.7.) For example ::

  myStream.filter(track=['python'], is_async=True)
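
To see why a separate thread helps, here is a minimal sketch using only the standard library. ``blocking_stream`` is a hypothetical stand-in for a blocking stream loop and is not Tweepy code. It shows the general pattern of moving a blocking loop onto a worker thread so the main thread stays free.

```python
import threading
import time


def blocking_stream(stop_event, received):
    # Stand-in for a stream loop that blocks until it is told to stop.
    while not stop_event.is_set():
        received.append("message")
        time.sleep(0.01)


stop_event = threading.Event()
received = []

# Run the blocking loop on a worker thread instead of the main thread.
worker = threading.Thread(
    target=blocking_stream, args=(stop_event, received), daemon=True
)
worker.start()

# The main thread is free to do other work while messages arrive.
time.sleep(0.05)

# Signal the loop to finish and wait for the worker to exit.
stop_event.set()
worker.join(timeout=1)
print(len(received), "messages received while the main thread kept running")
```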

Handling Errors
When using Twitter's streaming API, be careful of rate limiting. If a client exceeds a limited number of attempts to connect to the streaming API
in a window of time, it will receive error 420. The amount of time a client has to wait after receiving error 420
increases exponentially each time it makes a failed attempt.
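
The exponential growth of the wait can be illustrated with a short calculation. The starting delay of 60 seconds and the doubling factor are assumptions chosen for illustration; consult the Twitter documentation for the values that actually apply. ``backoff_delays`` is a hypothetical helper, not a Tweepy function.

```python
def backoff_delays(initial=60.0, factor=2.0, attempts=5):
    """Return the wait (in seconds) before each of `attempts` reconnects."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        delay *= factor  # each failed attempt multiplies the next wait
    return delays


print(backoff_delays())  # [60.0, 120.0, 240.0, 480.0, 960.0]
```

Under these assumptions, five consecutive failures already add up to about half an hour of waiting, which is why disconnecting on error 420 is often the safer choice.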

Tweepy's **StreamListener** passes error codes to an **on_error** stub. The
default implementation returns **False** for all codes, but we can override it
to allow Tweepy to reconnect for some or all codes, using the backoff
strategies recommended in the `Twitter Streaming API Connecting
Documentation`_. ::

  class MyStreamListener(tweepy.StreamListener):
      def on_error(self, status_code):
          if status_code == 420:
              #returning False in on_error disconnects the stream
              return False

          # returning non-False reconnects the stream, with backoff.

For more information on error codes from the Twitter API see `Twitter Response Codes Documentation`_.