Joel Grus – Polyglot Twitter Bot, Part 1: Node.js

[The first in an (at least) 6-part series; all code is on GitHub, as always.]

Node.js
Node.js + AWS Lambda
Python 2.7 + AWS Lambda
Purescript
Purescript + AWS Lambda
Bonus: Purescript + Twitter Streaming

Like most of you, I've long dreamed of making a Twitter bot. And also like most of you, I've been doing a lot of Node.js recently. So I thought I'd take the first stab at writing my Twitter bot in Node. (Also, this will lay the groundwork for doing it in Purescript later.)

In particular, I wanted to create the @make_greatagain bot, which looks for tweets containing "MAKE ___ GREAT AGAIN" constructions and retweets them. (It skips tweets containing "MAKE AMERICA GREAT AGAIN", though; I'm looking for riffs on the original, not the original itself.)

To start with, you should probably have Node installed. (I'll wait.) Then create a directory, and initialize a new project:

mkdir twitter-bot-node
cd twitter-bot-node
npm init

Just accept all the default options for npm init; I don't know what they mean either.

Now, if we're going to talk to Twitter, we should install the Node Twitter module.

npm install twitter --save

At this point you should create a Twitter account for your bot and get its credentials. After creating the account and logging in, go to apps.twitter.com and click on "Create New App". Give it a name and a description, and accept the terms of service. Then go to the "Keys and Access Tokens" tab and click "Create My Access Token". You should now have a consumer key, a consumer secret, an access token, and an access token secret. We need those, but KEEP THEM SECRET.

Now, we're ready to create our index.js. We start by loading the Twitter library and initializing it with our credentials:

var Twitter = require('twitter');
var client = new Twitter({
  consumer_key: "...",
  consumer_secret: "...",
  access_token_key: "...",
  access_token_secret: "..."
});

NOTE: if you are committing this code to GitHub, DO NOT CHECK IN THE CREDENTIALS. One approach is to stick them in credentials.js, like

module.exports = {
  consumer_key: "...",
  consumer_secret: "...",
  access_token_key: "...",
  access_token_secret: "..."
};

and then in index.js just do

var credentials = require('./credentials');
var client = new Twitter(credentials);

and then make sure to add credentials.js to your .gitignore.
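Another common way to keep the keys out of your repo is to read them from environment variables instead of a file. Here's a minimal sketch of that alternative; the variable names (TWITTER_CONSUMER_KEY and friends) and the credentialsFromEnv helper are made up for illustration, so use whatever names you like.

```javascript
// Maps each field the Twitter client expects to a (hypothetical) environment
// variable name. Rename these to suit your setup.
var ENV_NAMES = {
  consumer_key: "TWITTER_CONSUMER_KEY",
  consumer_secret: "TWITTER_CONSUMER_SECRET",
  access_token_key: "TWITTER_ACCESS_TOKEN_KEY",
  access_token_secret: "TWITTER_ACCESS_TOKEN_SECRET"
};

// Builds a credentials object from an environment-like object (e.g. process.env).
function credentialsFromEnv(env) {
  var credentials = {};
  var missing = [];
  Object.keys(ENV_NAMES).forEach(function(field) {
    var value = env[ENV_NAMES[field]];
    if (value) {
      credentials[field] = value;
    } else {
      missing.push(ENV_NAMES[field]);
    }
  });
  // Fail fast with a readable message instead of a confusing auth error later.
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  return credentials;
}
```

With that in place, index.js would do new Twitter(credentialsFromEnv(process.env)), and there's no secrets file to forget to add to .gitignore.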

Now, we want to find Tweets of the given form. For my example, that's

var query = 'make "great again" -america -filter:retweets';
var rgx = /make .* great again/i;

(Hopefully, your Twitter bot will do something different.)

The query is the actual query we'll send to Twitter. It looks for Tweets that contain the word "make" and the exact phrase "great again" but not "america", and it ignores retweets. Since that search could (in theory) return irrelevant tweets (e.g. "great again doesn't make sense"), there's also a regex that we'll use as a client-side check: it only accepts "make" and "great again" in that order, with something in between.
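Here's a quick way to check what the client-side regex does and doesn't let through. The sample texts are made up. Note that the regex on its own happily accepts "MAKE AMERICA GREAT AGAIN"; it's the -america in the query that keeps those out.

```javascript
// The same client-side check as above.
var rgx = /make .* great again/i;

// Made-up tweet texts, each paired with whether the regex accepts it.
var samples = [
  ["Make Mondays Great Again", true],                 // a riff: accepted
  ["great again doesn't make sense", false],          // right words, wrong order
  ["let's MAKE the office plants GREAT AGAIN", true], // case-insensitive, multi-word gap
  ["make great again", false],                        // nothing between "make" and "great"
  ["MAKE AMERICA GREAT AGAIN", true]                  // only the query's -america filters this
];

samples.forEach(function(sample) {
  var text = sample[0];
  console.log((rgx.test(text) ? "keep: " : "skip: ") + text);
});
```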

Now, the Node model is asynchronous, which means we need to program with callbacks. That is, to search, we need to do something like

client.get('search/tweets', {q: "node.js"}, function(err, tweets, response) {
  if (err || !tweets.statuses) {
    console.log(err);
  } else {
    tweets.statuses.forEach(function(tweet) {
      console.log(tweet.user.screen_name + " " + tweet.text);
    });
  }
});

This code will kick off a search for "node.js" and then immediately go on to whatever code comes next. Meanwhile, whenever the search returns, the provided callback will be called, either logging the error or printing out the returned tweets.
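If "immediately go on to whatever code comes next" feels slippery, this sketch makes it concrete without touching Twitter. fakeClient is a hypothetical stand-in with the same (path, params, callback) shape as client.get; it answers from a timer instead of the network, the way a real HTTP request would arrive later.

```javascript
// Hypothetical stand-in for the Twitter client: same call shape as client.get,
// but the "search results" arrive 100 ms later via a timer.
var fakeClient = {
  get: function(path, params, callback) {
    setTimeout(function() {
      var tweets = {statuses: [{user: {screen_name: "someone"}, text: "I love " + params.q}]};
      callback(null, tweets, {});
    }, 100);
  }
};

var events = [];

fakeClient.get('search/tweets', {q: "node.js"}, function(err, tweets, response) {
  // Runs later, once the "search" comes back.
  events.push("callback ran");
  tweets.statuses.forEach(function(tweet) {
    console.log(tweet.user.screen_name + " " + tweet.text);
  });
  console.log(events); // [ 'code after get ran', 'callback ran' ]
});

// Runs right away, before the callback above.
events.push("code after get ran");
console.log(events); // [ 'code after get ran' ]
```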

Now in our code we want the callback to retweet each of the returned tweets. However, if we try to retweet a tweet we've already retweeted, we'll get an error. This means we either need to keep track of all the tweets we've already retweeted or else handle those errors intelligently. The second is a lot easier.
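For contrast, here's roughly what the first option (keeping track ourselves) might look like. This is a sketch of the road not taken, not what the bot does: the in-memory Set forgets everything when the process restarts and grows for as long as the bot runs, which is part of why letting Twitter's error tell us is the easier path. shouldRetweet is a made-up helper.

```javascript
// Ids (id_str strings) of tweets we've already retweeted. In-memory only,
// so this is lost whenever the process restarts.
var retweetedIds = new Set();

// Returns true the first time it sees an id and false afterwards.
// A real version would probably record the id only after the retweet succeeds.
function shouldRetweet(tweetId) {
  if (retweetedIds.has(tweetId)) {
    return false;
  }
  retweetedIds.add(tweetId);
  return true;
}

// The same (made-up) id comes back from two searches in a row.
console.log(shouldRetweet("681596264880771073")); // true
console.log(shouldRetweet("681596264880771073")); // false
```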

In order to retweet, we just need to post the tweet id to the statuses/retweet endpoint. If you inspect the returned tweets, they have both an id field (which is a number) and an id_str field (which is a string). Tweet ids are 64-bit integers, but Javascript numbers are 64-bit floating point values that can only represent integers exactly up to 2^53, so Javascript mangles the numeric ids and we'll need to use the string version.
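You can watch the mangling happen without calling Twitter at all. The id below is made up, but like real tweet ids it's well past 2^53; JSON.parse rounds the numeric copy to the nearest value a Javascript number can hold, while the string copy comes through intact.

```javascript
// A made-up response body carrying the same id twice: as a number and as a string.
var body = '{"id": 681596264880771073, "id_str": "681596264880771073"}';
var tweet = JSON.parse(body);

// Beyond this, not every integer can be stored exactly in a Javascript number.
console.log(Number.MAX_SAFE_INTEGER);        // 9007199254740991
console.log(Number.isSafeInteger(tweet.id)); // false

// The numeric id got rounded, so it no longer matches the string version...
console.log(String(tweet.id) === tweet.id_str); // false
// ...and a neighboring id collapses onto the very same number.
console.log(Number("681596264880771074") === tweet.id); // true

// The string is untouched, which is why we build the retweet path from id_str.
console.log("statuses/retweet/" + tweet.id_str); // statuses/retweet/681596264880771073
```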

All of which results in a function that looks like

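// Runs a Twitter search for the specified `query` and retweets every result
// that also matches `rgx`.
function searchAndTweet(succeed, fail) {
  console.log("search and tweet");
  client.get('search/tweets', {q: query, count: 15}, function(err, tweets, response) {
    // If the search failed (or came back without statuses), report it and stop;
    // otherwise tweets.statuses.forEach below would throw.
    if (err || !tweets || !tweets.statuses) {
      fail(err || "no statuses in search response");
      return;
    }

    tweets.statuses.forEach(function(tweet) {
      // Make sure we match the regex.
      var match = tweet.text.match(rgx);
      if (match) {
        // Use id_str, not the numeric id (see above).
        var tweetId = tweet.id_str;
        client.post('statuses/retweet/' + tweetId, {}, function(err, retweet, response) {
          // Will return an error if we try to retweet a tweet that we've already
          // retweeted.
          console.log(err || retweet.text);
        });
      } else {
        // consider doing something for no match
      }
    });
    // Note: this fires once the retweets have been requested, not once they finish.
    succeed("success");
  });
}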

Why do we pass in the succeed and fail callbacks? That's a story for the next post. (Spoiler: it involves AWS Lambda.) In the meantime, you can just pass in console.log for both.
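Since searchAndTweet only talks to Twitter through client.get and client.post, you can exercise its logic offline by swapping in a stub. The sketch below is a hypothetical variant, searchAndRetweet, that takes the client, query, and regex as arguments instead of reading globals; makeStubClient and its canned statuses are made up for testing and are not part of the twitter module.

```javascript
// Hypothetical variant of searchAndTweet with its dependencies passed in.
function searchAndRetweet(client, query, rgx, succeed, fail) {
  client.get('search/tweets', {q: query, count: 15}, function(err, tweets, response) {
    if (err || !tweets || !tweets.statuses) {
      fail(err || "no statuses in search response");
      return;
    }
    tweets.statuses.forEach(function(tweet) {
      if (rgx.test(tweet.text)) {
        client.post('statuses/retweet/' + tweet.id_str, {}, function(err, retweet, response) {
          console.log(err || retweet.text);
        });
      }
    });
    succeed("success");
  });
}

// A stub with the same get/post shape as the real client. It answers
// synchronously with canned statuses and records every path it's asked to post.
function makeStubClient(statuses) {
  var posted = [];
  return {
    posted: posted,
    get: function(path, params, callback) {
      callback(null, {statuses: statuses}, {});
    },
    post: function(path, params, callback) {
      posted.push(path);
      callback(null, {text: "RT (stub)"}, {});
    }
  };
}

// Made-up search results: only the first one should get retweeted.
var stub = makeStubClient([
  {id_str: "1001", text: "Make Mondays Great Again"},
  {id_str: "1002", text: "great again doesn't make sense"}
]);

var outcome = null;
searchAndRetweet(stub, 'make "great again" -america -filter:retweets', /make .* great again/i,
  function(msg) { outcome = msg; },
  function(err) { outcome = "failed: " + err; });

console.log(stub.posted); // [ 'statuses/retweet/1001' ]
console.log(outcome);     // success
```

Because the stub answers synchronously, you can check the results right after the call; the real client, of course, answers later.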

Now, all that's left is to run your Twitter bot. We can use setInterval to make it run every 5 minutes:

setInterval(function() {
  searchAndTweet(console.log, console.log);
}, 5 * 60 * 1000);
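One thing to know about setInterval: it waits a full interval before the first call, so the bot above sits idle for five minutes after you start it. If you'd rather it search right away, a small helper (runNowAndEvery is a name made up here) can call the task once and then hand off to setInterval. It returns the timer, so you can still stop the bot with clearInterval.

```javascript
// Calls `task` immediately, then every `ms` milliseconds.
// Returns the interval handle so the caller can clearInterval() it later.
function runNowAndEvery(task, ms) {
  task();
  return setInterval(task, ms);
}

var FIVE_MINUTES = 5 * 60 * 1000; // 300000 ms

var runs = 0;
var timer = runNowAndEvery(function() {
  runs += 1;
  console.log("run #" + runs);
}, FIVE_MINUTES);

// Demo only: unref() lets Node exit even though the timer is still pending.
// Leave it out in the real bot, which should keep running.
timer.unref();

console.log(runs); // 1
```

In the bot itself, the task would be function() { searchAndTweet(console.log, console.log); }.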

And then if you simply

$ node index.js

your bot will start running. Of course, you probably don't want to keep it running locally on your computer all the time. We'll deal with that in the next post.

Posted on: 2015-12-29

Category: Code, Twitter, Node, Javascript, AWS, Make_GreatAgain