How to build your own Twitter Sentiment Analysis Tool

In this article we will show how you can build a simple Sentiment Analysis tool which classifies tweets as positive, negative or neutral by using the Twitter REST API v1.1 and the Datumbox API v1.0. Even though the examples are given in PHP, you can easily build your own tool in the programming language of your choice.

Update: The Datumbox Machine Learning Framework is now open-source and free to download. If you want to build a Sentiment Analysis classifier without hitting the API limitations, use the com.datumbox.applications.nlp.TextClassifier class.

You can find the complete PHP code of the Twitter Sentiment Analysis tool on Github.

Social Media Monitoring & Sentiment Analysis

Social Media Monitoring is one of the hottest topics nowadays. As more and more companies use Social Media Marketing to promote their brands, it has become necessary for them to evaluate the effectiveness of their campaigns.

Building a Social Media Monitoring tool requires at least 2 modules: one that evaluates how many people are influenced by the campaign and one that finds out what people think about the brand.

Evaluating the generated buzz is usually performed by using various KPIs such as the number of followers/friends, the number of likes/shares/RTs per post and more complex ones such as the engagement rate, the response rate and other composite metrics. Quantifying the buzz is usually straightforward and can be performed by using basic statistics.
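As an illustration of how simple such statistics can be, here is a minimal sketch of one composite metric. The definition of engagement rate used here (interactions divided by followers) is only one of several in use, and the function names are hypothetical.

```php
<?php
// Hypothetical helper: engagement rate of a single post, defined here as
// interactions (likes + shares + replies) divided by the follower count.
function engagementRate(int $likes, int $shares, int $replies, int $followers): float
{
    if ($followers <= 0) {
        return 0.0; // avoid division by zero for accounts without followers
    }
    return ($likes + $shares + $replies) / $followers;
}

// Mean engagement rate over a list of posts (each an associative array).
function averageEngagementRate(array $posts, int $followers): float
{
    if (count($posts) === 0) {
        return 0.0;
    }
    $total = 0.0;
    foreach ($posts as $post) {
        $total += engagementRate($post['likes'], $post['shares'], $post['replies'], $followers);
    }
    return $total / count($posts);
}

// Made-up sample numbers, for illustration only.
$posts = [
    ['likes' => 30, 'shares' => 15, 'replies' => 5],
    ['likes' => 10, 'shares' => 0, 'replies' => 0],
];
printf("%.2f%%\n", 100 * averageEngagementRate($posts, 1000)); // prints 3.00%
```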

On the other hand, evaluating the opinion of the users is not a trivial matter. It requires performing Sentiment Analysis, which is the task of automatically identifying the polarity, the subjectivity and the emotional state of a particular document or sentence. This in turn requires Machine Learning and Natural Language Processing techniques, and this is where most developers hit the wall when they try to build their own tools.

Thankfully Datumbox simplifies the process of using Machine Learning, since it offers several API functions which allow you to build custom Social Media Monitoring tools in no time. Some of the services available to API users are the Sentiment Analysis, the Twitter Sentiment Analysis and the Subjectivity Analysis API functions. In this article we will focus only on the Twitter Sentiment Analysis method; nevertheless, as you can easily find out, the rest of the functions work similarly.

Performing Sentiment Analysis on Twitter

Performing Sentiment Analysis on Twitter is trickier than doing it for long reviews. This is because tweets are very short (at most 140 characters) and usually contain slang, emoticons, hashtags and other Twitter-specific jargon. This is the reason why Datumbox offers a completely different classifier for performing Sentiment Analysis on Twitter.

Building the Sentiment Analysis tool

In order to build the Sentiment Analysis tool we need 2 things: first, to be able to connect to Twitter and search for tweets that contain a particular keyword; second, to evaluate the polarity (positive, negative or neutral) of the tweets based on their words. For the first task we will use the Twitter REST API v1.1 and for the second the Datumbox API v1.0.

To speed up the development we will use 2 classes: the great PHP-Twitter-API client written by Tim Whitlock and the Datumbox PHP-API-Client offered by our service. As you will soon find out, getting the tweets via the Twitter API is the most complicated task of this tutorial.
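To make the first task concrete, the sketch below builds a parameter array for the search/tweets endpoint. The keys q, lang, count and result_type are standard parameters of the Twitter REST API v1.1 search method; the helper name and the default values are assumptions made for illustration.

```php
<?php
// Builds the parameter array for the search/tweets endpoint of the
// Twitter REST API v1.1. The function name is hypothetical.
function buildSearchParams(string $keyword, int $count = 50): array
{
    return [
        'q'           => $keyword,                 // the search query
        'lang'        => 'en',                     // restrict the search to English tweets
        'count'       => max(1, min($count, 100)), // the API returns at most 100 tweets per page
        'result_type' => 'recent',                 // 'mixed', 'recent' or 'popular'
    ];
}

print_r(buildSearchParams('datumbox', 200)); // count is capped at 100
```

An array of this shape is what the tool passes on to the Twitter client as its search parameters.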

Create your own Twitter Application

Unfortunately Twitter has made it more complicated for developers to use their API. In order to search for particular tweets you must authenticate yourself by using the OAuth protocol. Fortunately Tim's API client takes care of most of the tasks and enables a fast and easy integration. Still, you are required to create a new Twitter application before using the library.

So go to the Twitter Applications Console, log in with your credentials, click on the “Create new Application” button and fill in the form to register a new app. Once it is created, select the application, go to the “Details” tab (the first tab) and click the “Create my access token” button at the bottom of the page. Once you do this, go to the “OAuth tool” tab and note down the values: Consumer Key, Consumer secret, Access token and Access token secret.

Get your Datumbox API key

To access the Datumbox API sign up for a free account and visit your API Credentials panel to get your API Key.
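One simple way to keep the five credentials together is a small configuration file. The sketch below is only an assumption about how such a file could look: the constant names are hypothetical and the values are placeholders that you must replace with your own keys.

```php
<?php
// config.php - placeholder values only; the constant names are hypothetical.
define('DATUMBOX_API_KEY', 'your-datumbox-api-key');
define('TWITTER_CONSUMER_KEY', 'your-consumer-key');
define('TWITTER_CONSUMER_SECRET', 'your-consumer-secret');
define('TWITTER_ACCESS_KEY', 'your-access-token');
define('TWITTER_ACCESS_SECRET', 'your-access-token-secret');
```

Keeping the keys in a separate file makes it easier to exclude them from version control.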

Developing the Twitter Sentiment Analysis class

All we need to do in order to develop the tool is write a TwitterSentimentAnalysis class which uses the Twitter and Datumbox API Clients to fetch the tweets and evaluate their polarity.

Below you can see the code along with the necessary comments.

class TwitterSentimentAnalysis {
    protected $datumbox_api_key; //Your Datumbox API Key (from your API Credentials panel)
    protected $consumer_key; //Your Twitter Consumer Key (from your Twitter application)
    protected $consumer_secret; //Your Twitter Consumer Secret (from your Twitter application)
    protected $access_key; //Your Twitter Access Key (from your Twitter application)
    protected $access_secret; //Your Twitter Access Secret (from your Twitter application)

    /**
    * The constructor of the class
    * @param string $datumbox_api_key   Your Datumbox API Key
    * @param string $consumer_key       Your Twitter Consumer Key
    * @param string $consumer_secret    Your Twitter Consumer Secret
    * @param string $access_key         Your Twitter Access Key
    * @param string $access_secret      Your Twitter Access Secret
    * @return TwitterSentimentAnalysis
    */
    public function __construct($datumbox_api_key, $consumer_key, $consumer_secret, $access_key, $access_secret){
        $this->datumbox_api_key=$datumbox_api_key; //store the keys for the later API calls
        $this->consumer_key=$consumer_key;
        $this->consumer_secret=$consumer_secret;
        $this->access_key=$access_key;
        $this->access_secret=$access_secret;
    }

    /**
    * This function fetches the list of tweets and evaluates their sentiment
    * @param array $twitterSearchParams The Twitter Search Parameters that are passed to the Twitter API (see the search/tweets documentation)
    * @return array
    */
    public function sentimentAnalysis($twitterSearchParams) {
        $tweets=$this->getTweets($twitterSearchParams); //first fetch the tweets
        return $this->findSentiment($tweets);
    }

    /**
    * Calls the search/tweets method of the Twitter API for particular Twitter Search Parameters and returns the list of tweets that match the search criteria.
    * @param mixed $twitterSearchParams The Twitter Search Parameters that are passed to the Twitter API (see the search/tweets documentation)
    * @return array $tweets
    */
    protected function getTweets($twitterSearchParams) {
        $Client = new TwitterApiClient(); //Use the TwitterAPIClient
        $Client->set_oauth ($this->consumer_key, $this->consumer_secret, $this->access_key, $this->access_secret);

        $tweets = $Client->call('search/tweets', $twitterSearchParams, 'GET' ); //call the service and get the list of tweets
        return $tweets;
    }

    protected function findSentiment($tweets) {
        $DatumboxAPI = new DatumboxAPI($this->datumbox_api_key); //initialize the DatumboxAPI client
        $results=array(); //stays empty if no tweet could be classified
        foreach($tweets['statuses'] as $tweet) { //foreach of the tweets that we received
            if(isset($tweet['metadata']['iso_language_code']) && $tweet['metadata']['iso_language_code']=='en') { //perform sentiment analysis only for the English Tweets
                $sentiment=$DatumboxAPI->TwitterSentimentAnalysis($tweet['text']); //call Datumbox service to get the sentiment
                if($sentiment!=false) { //if the sentiment is not false, the API call was successful.
                    $results[]=array( //add the tweet message in the results (a minimal selection of fields)
                        'id'=>$tweet['id_str'],
                        'text'=>$tweet['text'],
                        'sentiment'=>$sentiment,
                    );
                }
            }
        }
        return $results;
    }
}

What we do is pass the necessary keys for all the services to the constructor. Then, in the public sentimentAnalysis function, we first call the Twitter service to get the list of tweets which match our search parameters, and then we call the Datumbox service for each tweet to get its polarity.

This is it! You are ready to use this class to perform Sentiment Analysis on tweets and build your own Social Media Monitoring tool. You can find the complete PHP code of the Twitter Sentiment Analysis tool on Github.
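Once the results are available, a monitoring tool would typically aggregate them. The sketch below counts the tweets per polarity class; it assumes that each entry of the results array carries a 'sentiment' key holding 'positive', 'negative' or 'neutral'. The helper name and the sample data are hypothetical.

```php
<?php
// Counts how many results fall into each polarity class.
// Assumes each result is an array with a 'sentiment' key.
function summarizeSentiments(array $results): array
{
    $counts = ['positive' => 0, 'negative' => 0, 'neutral' => 0];
    foreach ($results as $result) {
        $label = $result['sentiment'] ?? null;
        if (is_string($label) && isset($counts[$label])) {
            $counts[$label]++; // unknown or missing labels are ignored
        }
    }
    return $counts;
}

// Made-up results in the assumed shape, for illustration only.
$results = [
    ['text' => 'I love this phone', 'sentiment' => 'positive'],
    ['text' => 'Worst service ever', 'sentiment' => 'negative'],
    ['text' => 'New store opens on Monday', 'sentiment' => 'neutral'],
    ['text' => 'Great support team', 'sentiment' => 'positive'],
];
print_r(summarizeSentiments($results));
```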

Extra: Detailed Information about the Twitter Sentiment Analysis Classifier

This part is optional for those of you who are interested in learning how Datumbox’s Twitter Sentiment Analysis works.

In order to detect the Sentiment of the tweets we used our Machine Learning framework to build a classifier capable of detecting Positive, Negative and Neutral tweets. Our training set consisted of 1.2 million tweets evenly distributed across the 3 categories. We tokenized the tweets by extracting their bigrams and by taking into account the URLs, the hashtags, the usernames and the emoticons.

In order to select the best features we tried several different algorithms and in the end we chose Mutual Information. Finally, after performing several tests with various models and configurations, we selected the Binarized Naïve Bayes as the best performing classifier for the particular problem (strangely enough Naïve Bayes beat SVM, Max Entropy and other classifiers which usually perform better than NB). To evaluate the results we used 10-fold cross-validation, and our best performing classifier achieves an accuracy of 83.26%.
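The tokenization step described above can be sketched roughly as follows. This is not the Datumbox tokenizer: the placeholder tokens (_url_, _user_, _smile_, _frown_), the small emoticon patterns and the function names are all assumptions chosen for illustration.

```php
<?php
// Replaces Twitter-specific elements with placeholder tokens and splits
// the tweet into lowercase word tokens. Hashtags are kept as '#tag'.
function tokenizeTweet(string $tweet): array
{
    $text = preg_replace('~https?://\S+~i', ' _url_ ', $tweet); // URLs
    $text = preg_replace('~@\w+~', ' _user_ ', $text);          // usernames
    $text = preg_replace('~[:;]-?[)D]~', ' _smile_ ', $text);   // :) ;) :-) :D
    $text = preg_replace('~:-?\(~', ' _frown_ ', $text);        // :( :-(
    $text = strtolower($text);
    // keep letters, digits, underscores, apostrophes and the # of hashtags
    $text = preg_replace("~[^a-z0-9_#']+~", ' ', $text);
    return preg_split('~\s+~', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Builds the list of bigrams (pairs of adjacent tokens).
function bigrams(array $tokens): array
{
    $result = [];
    for ($i = 0; $i + 1 < count($tokens); $i++) {
        $result[] = $tokens[$i] . ' ' . $tokens[$i + 1];
    }
    return $result;
}

$tokens = tokenizeTweet('@bob I love #PHP :) http://t.co/abc');
print_r($tokens);          // _user_, i, love, #php, _smile_, _url_
print_r(bigrams($tokens)); // "_user_ i", "i love", "love #php", ...
```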
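For readers unfamiliar with the binarized variant of Naïve Bayes, here is a minimal sketch of the idea: a multinomial Naïve Bayes in which every feature counts at most once per document. It is a toy written for this explanation, not the Datumbox implementation; the class name, the add-one smoothing and the tiny training set are assumptions.

```php
<?php
// Minimal binarized multinomial Naive Bayes with add-one smoothing.
class BinarizedNaiveBayes
{
    private $docCounts = [];     // class => number of training documents
    private $featureCounts = []; // class => [feature => documents containing it]
    private $totals = [];        // class => sum of the feature counts of that class
    private $vocabulary = [];    // feature => true

    public function train(array $features, string $label): void
    {
        if (!isset($this->docCounts[$label])) {
            $this->docCounts[$label] = 0;
            $this->featureCounts[$label] = [];
            $this->totals[$label] = 0;
        }
        $this->docCounts[$label]++;
        // Binarization: every feature counts at most once per document.
        foreach (array_unique($features) as $feature) {
            $this->featureCounts[$label][$feature] = ($this->featureCounts[$label][$feature] ?? 0) + 1;
            $this->totals[$label]++;
            $this->vocabulary[$feature] = true;
        }
    }

    public function predict(array $features): string
    {
        $totalDocs = array_sum($this->docCounts);
        $vocabSize = count($this->vocabulary);
        $bestLabel = '';
        $bestScore = -INF;
        foreach ($this->docCounts as $label => $docCount) {
            $score = log($docCount / $totalDocs); // log prior of the class
            foreach (array_unique($features) as $feature) {
                if (!isset($this->vocabulary[$feature])) {
                    continue; // ignore features never seen in training
                }
                $count = $this->featureCounts[$label][$feature] ?? 0;
                $score += log(($count + 1) / ($this->totals[$label] + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $bestLabel = $label;
            }
        }
        return $bestLabel;
    }
}

// Toy training data, for illustration only.
$nb = new BinarizedNaiveBayes();
$nb->train(['i', 'love', 'love', 'it'], 'positive');
$nb->train(['great', 'phone'], 'positive');
$nb->train(['i', 'hate', 'it'], 'negative');
$nb->train(['awful', 'service'], 'negative');
echo $nb->predict(['love', 'this', 'phone']), "\n"; // prints positive
```

In a real setting the features would be the selected bigrams and special tokens rather than single words, and the model would be evaluated with cross-validation as described above.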



My name is Vasilis Vryniotis. I'm a Data Scientist, Software Engineer, Statistics & Machine Learning enthusiast, co-founder of and author of Datumbox Machine Learning Framework.

Latest Comments
  1. Roberto

    Hi @bbriniotis,

    I’m working with classification in data stream and I’d like to know how do you deal with vocabulary changes, AKA Shift Drift and sentiment drifts?

    Thank you

    • Vasilis Vryniotis

      Hi @Roberto,

      So basically what you mean is how someone can deal with the fact that the sentiment orientation/score of some words might change over time. This usually occurs when you are performing sentiment analysis on news articles or on twitter. For example the sentiment orientation/score of the name of a particular Political Candidate can change after a scandal is revealed.

      Sentiment drifts are a difficult topic in sentiment analysis as you know. If you are working on a domain where you expect such drifts, you can try to regularly update the training dataset over time. This way your algorithm will adapt the scores. The problem with this is that it is very hard to find an annotated fresh dataset. In that case you could use ensemble learning to combine the output of 2 classifiers: One that will use the old large dataset and one that will use a smaller one with a fresher new dataset and different feature selection method.

      Unfortunately, given that I don’t know what type of analysis you are performing, I can’t provide you with a more targeted tip. In any case, don’t expect to find a silver bullet for this problem. Sentiment Analysis has at best 85% accuracy, so on its own it’s a very difficult problem, let alone when you start dealing with irony, sarcasm, sentiment drifts and so on.

      Hope this helps!

      • Roberto

        Hi @bbriniotis,

        I’m Msc in Computer Science which the topic of my thesis is Sentiment Analysis in data streams with concept drifts. I’ve been develop an algorithm that deal with the issues of concept drift. My approach is based on samples selection (training set formation) in order to provide fresh and relevant examples to base classifier. However, I still have an issue of labeling the samples. In this way, now I’m working in an algorithm which employs active learning insights to reduce the labeling effort. Thus, turning the algorithm applicable in a real scenario. I have a lot of ideas what I wanna develop.

        Thank you by the explanation,

        • Vasilis Vryniotis

          @Roberto, Sounds really interesting. If you publish your research, send us the link. :)

  2. himanshu

    Regarding the training set containing 1.2 mn tweets with equal number of positive, negative and neutral tweets. Can you explain a bit about how it was obtained(I guess number is large enough and hints an automated approach such looking at smiley or pos/neg hashtags etc)?
    Then the test set, where you got 83.26% accuracy, was that a subset of those 1.2 mn tweets or some separate hand labeled set of tweets ?

    • Vasilis Vryniotis

      Hi @himanshu,

      I have found a dataset which contained 800k tweets (positive vs negative) and then I collected another 400k tweets for the neutral class mostly from editorial and news twitter accounts. The accuracy was estimated by doing a 10 fold cross validation.

  3. Dough

    Hi @Vasilis,
    Do you have idea how I can possibly extract features from these sentiments ?

    • Vasilis Vryniotis

      Hey @Dough,

      Thanks for the comment. You can use various feature selection algorithms to extract features. Some of the methods that are widely used in Text Classification problems are the Chisquare and the Mutual Information methods.

  4. fabian

    hi, could put in a zip entire working example?

  5. su

    Hi @Vasilis,

    Thanks for great example!
    But I could not get it working. Can you check what’s wrong with what I did? your zip file to my linux box(CentOS6.4 with PHP5.3) ->
    2. just added my keys to config.php (anything else need to change?) ->
    3. ran index.php in my Firefox, input a keyword and then submitted query.
    But nothing happened and no any file generated.

    • Vasilis Vryniotis

      Hi Su,

      You probably have an issue with the Twitter API, because I tested the script and it works without problems. You must create a Twitter Application and add your keys to the config file as described in the article.

      If you still face problems, try debugging the code to see which API does not respond.

  6. rizki

    Hi, Vasilis

    Thanks for the example :)
    The following steps I did:
    1. Downloaded the source code
    2. added my keys to config.php
    3. Ran index.php in chrome, but I get an error message
    The error message is: Uncaught exception ‘TwitterApiException’ with message ‘error setting certificate verify locations
    Do you know what is wrong with the location of the certificate or other things that cause the error?
    Thank you

    • Vasilis Vryniotis

      Hi @rizki,

      It seems to me that the Twitter API client can’t access the certificate. Make sure it is in the right place and that it has read permissions.

      • amber


        I am trying to run this demo and Nothing shows up when I enter a search term. I have this installed in a sub folder in my servers www. For example: www/test/Zip file contents Any clues as to why?

        • Vasilis Vryniotis

          Hi Amber,

          The script is fully functional, but you need to set up the Twitter App properly and have a valid Datumbox API key. Check the configuration again and follow the steps found in the article. If this does not work for you, I’m afraid you have to check it with a debugger to see where it gets stuck.

  7. Ismahane

    Thank you, this is a very interesting article. I used the Tweepy Streaming API for extracting tweets and I would like to analyze my tweets using the Python language. Can I use Datumbox with Python?

    • Vasilis Vryniotis

      Hi Ismahane,

      Sure you can. Joel Hoskin was kind enough to build an implementation of datumbox API in Python and open source it. Check out our API page to download it:

      In any case, you can build your own implementation in a matter of minutes. The API is super simple, so you should not face any problems.

      Good luck!

  8. Neelima

    Hi Vasilis

    I am a student and as a part of my project i want to do sentiment analysis of the tweets that i collected using Twitter4j. Can we use datumbox with JAVA?


    • Vasilis Vryniotis

      Hi Neelima,

      Of course you can. Just sign up for a free key and write an API client.

      Good luck!

  9. Hardyman

    Hi Vasilis,

    Thanks for sharing valuable information. It was interesting and exciting experiences reading your blog.

  10. Gerald

    Hello Vasilis,

    Thanks for uploading the great tutorial.

    Could you please share your training set (consisted of 1.2 million tweets evenly distributed across the 3 categories)??

