How to build your own Twitter Sentiment Analysis Tool

In this article we will show how you can build a simple Sentiment Analysis tool that classifies tweets as positive, negative or neutral by using the Twitter REST API v1.1 and the Datumbox API v1.0. Even though the examples are given in PHP, you can very easily build your own tools in the programming language of your choice.

Update: The Datumbox Machine Learning Framework is now open-source and free to download. If you want to build a Sentiment Analysis classifier without hitting the API limitations, use the com.datumbox.applications.nlp.TextClassifier class.

You can find the complete PHP code of the Twitter Sentiment Analysis tool on Github.

Social Media Monitoring & Sentiment Analysis

Social Media Monitoring is one of the hottest topics nowadays. As more and more companies use Social Media Marketing to promote their brands, it has become necessary for them to evaluate the effectiveness of their campaigns.

Building a Social Media Monitoring tool requires at least 2 modules: one that evaluates how many people are influenced by the campaign and one that finds out what people think about the brand.

Evaluating the generated buzz is usually performed by using various KPIs such as the number of followers/friends, the number of likes/shares/RTs per post and more complex ones such as the engagement rate, the response rate and other composite metrics. Quantifying the buzz is usually straightforward and can be performed by using basic statistics.
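As a rough illustration of how simple such metrics can be, the sketch below computes an engagement rate as interactions divided by followers. This is only one of several definitions in use and is an assumption here, not a formula from this article; the function name engagementRate and its parameters are hypothetical.

```php
<?php
/**
 * Hypothetical helper: engagement rate of a single post.
 * ASSUMPTION: engagement rate = (likes + shares + replies) / followers.
 * Other definitions (for example per impression) are equally common.
 */
function engagementRate($likes, $shares, $replies, $followers) {
    if ($followers <= 0) {
        return 0.0; // avoid division by zero for accounts without followers
    }
    return ($likes + $shares + $replies) / $followers;
}

// Example: 30 likes, 15 shares and 5 replies on an account with 1000 followers.
printf("%.1f%%\n", 100 * engagementRate(30, 15, 5, 1000));
```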

On the other hand, evaluating the opinion of the users is not a trivial matter. It requires performing Sentiment Analysis, which is the task of automatically identifying the polarity, the subjectivity and the emotional state of a particular document or sentence. This in turn requires Machine Learning and Natural Language Processing techniques, and this is where most developers hit a wall when they try to build their own tools.

Thankfully Datumbox simplifies the process of using Machine Learning, since it offers several API functions which allow you to build custom Social Media Monitoring tools in no time. Some of the services that are available to the API users are the Sentiment Analysis, the Twitter Sentiment Analysis and the Subjectivity Analysis API functions. In this article we will focus only on the Twitter Sentiment Analysis method; nevertheless, as you can easily find out, the rest of the functions work similarly.

Performing Sentiment Analysis on Twitter

Performing Sentiment Analysis on Twitter is trickier than doing it for long reviews. This is because tweets are very short (limited to 140 characters at the time of writing) and usually contain slang, emoticons, hashtags and other Twitter-specific jargon. This is the reason why Datumbox offers a completely different classifier for performing Sentiment Analysis on Twitter.

Building the Sentiment Analysis tool

In order to build the Sentiment Analysis tool we need 2 things: first, to connect to Twitter and search for tweets that contain a particular keyword; second, to evaluate the polarity (positive, negative or neutral) of the tweets based on their words. For the first task we will use the Twitter REST API v1.1 and for the second the Datumbox API v1.0.

To speed up the development we will use 2 classes: the great PHP-Twitter-API client written by Tim Whitlock and the Datumbox PHP-API-Client offered by our service. As you will soon find out, getting the tweets via the Twitter API is the most complicated task of this tutorial.

Create your own Twitter Application

Unfortunately Twitter has made it more complicated for developers to use their API. In order to search for particular tweets you must authenticate yourself by using the OAuth protocol. Fortunately Tim's API client takes care of most of the tasks and enables a fast and easy integration. Still, you are required to create a new Twitter application before using the library.

So go to the Twitter Applications Console, log in with your credentials, click the “Create new Application” button and fill in the form to register a new app. Once it is created, select the application, go to the “Details” tab (the first tab) and click the “Create my access token” button at the bottom of the page. Once you do this, go to the “OAuth tool” tab and note down the values: Consumer Key, Consumer secret, Access token and Access token secret.

Get your Datumbox API key

To access the Datumbox API, sign up for a free account and visit your API Credentials panel to get your API Key.
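If you prefer to see what happens under the hood, or you want to port the call to another language, the sketch below shows roughly what a raw request to the Twitter Sentiment Analysis function could look like. The endpoint URL, the api_key and text parameter names and the shape of the JSON reply are assumptions based on how the API is commonly described, so check them against the API documentation. Both function names are hypothetical and are not part of the Datumbox PHP-API-Client.

```php
<?php
/**
 * Hypothetical helper: extracts the predicted label from a JSON reply.
 * ASSUMPTION: a successful reply looks like
 * {"output":{"status":1,"result":"positive"}}.
 */
function parseDatumboxReply($json) {
    $reply = json_decode($json, true);
    if (is_array($reply)
        && isset($reply['output']['status'], $reply['output']['result'])
        && $reply['output']['status'] == 1) {
        return $reply['output']['result'];
    }
    return false; // malformed reply or an error reported by the service
}

/**
 * Hypothetical helper: sends one tweet to the service with cURL.
 * ASSUMPTION: the endpoint URL and the api_key/text POST parameters.
 */
function datumboxTwitterSentiment($apiKey, $text) {
    $ch = curl_init('http://api.datumbox.com/1.0/TwitterSentimentAnalysis.json');
    curl_setopt_array($ch, array(
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => http_build_query(array('api_key' => $apiKey, 'text' => $text)),
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT => 10,
    ));
    $body = curl_exec($ch);
    return ($body === false) ? false : parseDatumboxReply($body);
}
```

Keeping the parsing separate from the HTTP call makes it possible to check the reply handling without touching the network.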

Developing the Twitter Sentiment Analysis class

All we need to do in order to develop the tool is write a TwitterSentimentAnalysis class which uses the Twitter and Datumbox API Clients to fetch the tweets and evaluate their polarity.

Below you can see the code along with the necessary comments.

<?php
class TwitterSentimentAnalysis {
    
    protected $datumbox_api_key; //Your Datumbox API Key. Get it from https://www.datumbox.com/apikeys/view/
    
    protected $consumer_key; //Your Twitter Consumer Key. Get it from https://dev.twitter.com/apps
    protected $consumer_secret; //Your Twitter Consumer Secret. Get it from https://dev.twitter.com/apps
    protected $access_key; //Your Twitter Access Key. Get it from https://dev.twitter.com/apps
    protected $access_secret; //Your Twitter Access Secret. Get it from https://dev.twitter.com/apps
    
    /**
    * The constructor of the class
    * 
    * @param string $datumbox_api_key   Your Datumbox API Key
    * @param string $consumer_key       Your Twitter Consumer Key
    * @param string $consumer_secret    Your Twitter Consumer Secret
    * @param string $access_key         Your Twitter Access Key
    * @param string $access_secret      Your Twitter Access Secret
    * 
    * @return TwitterSentimentAnalysis  
    */
    public function __construct($datumbox_api_key, $consumer_key, $consumer_secret, $access_key, $access_secret){
        $this->datumbox_api_key=$datumbox_api_key;
        
        $this->consumer_key=$consumer_key;
        $this->consumer_secret=$consumer_secret;
        $this->access_key=$access_key;
        $this->access_secret=$access_secret;
    }
    
    /**
    * This function fetches the list of tweets and evaluates their sentiment
    * 
    * @param array $twitterSearchParams The Twitter Search Parameters that are passed to Twitter API. Read more here https://dev.twitter.com/docs/api/1.1/get/search/tweets
    * 
    * @return array
    */
    public function sentimentAnalysis($twitterSearchParams) {
        $tweets=$this->getTweets($twitterSearchParams);
        
        return $this->findSentiment($tweets);
    }
    
    /**
    * Calls the Search/tweets method of the Twitter API for particular Twitter Search Parameters and returns the list of tweets that match the search criteria.
    * 
    * @param mixed $twitterSearchParams The Twitter Search Parameters that are passed to Twitter API. Read more here https://dev.twitter.com/docs/api/1.1/get/search/tweets
    * 
    * @return array $tweets
    */
    protected function getTweets($twitterSearchParams) {
        $Client = new TwitterApiClient(); //Use the TwitterAPIClient
        $Client->set_oauth ($this->consumer_key, $this->consumer_secret, $this->access_key, $this->access_secret);

        $tweets = $Client->call('search/tweets', $twitterSearchParams, 'GET' ); //call the service and get the list of tweets
        
        unset($Client);
        
        return $tweets;
    }
    
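    /**
    * Calls the Datumbox Twitter Sentiment Analysis service for each English tweet and collects the results.
    * 
    * @param array $tweets The response of the Twitter search/tweets call
    * 
    * @return array The list of tweets along with their sentiment
    */
    protected function findSentiment($tweets) {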
        $DatumboxAPI = new DatumboxAPI($this->datumbox_api_key); //initialize the DatumboxAPI client
        
        $results=array();
        foreach($tweets['statuses'] as $tweet) { //foreach of the tweets that we received
            if(isset($tweet['metadata']['iso_language_code']) && $tweet['metadata']['iso_language_code']=='en') { //perform sentiment analysis only for the English Tweets
                $sentiment=$DatumboxAPI->TwitterSentimentAnalysis($tweet['text']); //call Datumbox service to get the sentiment
                
                if($sentiment!=false) { //if the sentiment is not false, the API call was successful.
                    $results[]=array( //add the tweet message in the results
                        'id'=>$tweet['id_str'],
                        'user'=>$tweet['user']['name'],
                        'text'=>$tweet['text'],
                        'url'=>'https://twitter.com/'.$tweet['user']['screen_name'].'/status/'.$tweet['id_str'], //tweet URLs use the screen_name (handle), not the display name
                        
                        'sentiment'=>$sentiment,
                    );
                }
            }
            
        }
        
        unset($tweets);
        unset($DatumboxAPI);
        
        return $results;
    }
}
?>

What we do is pass the necessary keys for all the services to the constructor. Then, in the public sentimentAnalysis function, we first call the Twitter service to get the list of tweets which match our search parameters, and then we call the Datumbox service for each tweet to get its polarity.
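A minimal usage sketch could look like the following. The search parameters q, lang and count belong to the search/tweets method of the Twitter REST API; the placeholder keys and the summarizeSentiment helper are hypothetical and not part of the original code, and the helper assumes that the sentiment value is one of the strings positive, negative or neutral.

```php
<?php
// Hypothetical helper (not part of the original class): counts the tweets per category.
// ASSUMPTION: each result row carries 'sentiment' as 'positive', 'negative' or 'neutral'.
function summarizeSentiment(array $results) {
    $summary = array('positive' => 0, 'negative' => 0, 'neutral' => 0);
    foreach ($results as $row) {
        $label = isset($row['sentiment']) ? $row['sentiment'] : null;
        if (is_string($label) && isset($summary[$label])) {
            $summary[$label]++; // unknown labels are ignored
        }
    }
    return $summary;
}

// Search for up to 10 English tweets that mention the keyword.
$twitterSearchParams = array('q' => 'datumbox', 'lang' => 'en', 'count' => 10);

// Runs only when the class above and its two API clients have been loaded.
if (class_exists('TwitterSentimentAnalysis')) {
    $analyzer = new TwitterSentimentAnalysis(
        'YOUR_DATUMBOX_API_KEY',
        'YOUR_CONSUMER_KEY',
        'YOUR_CONSUMER_SECRET',
        'YOUR_ACCESS_TOKEN',
        'YOUR_ACCESS_TOKEN_SECRET'
    );
    $results = $analyzer->sentimentAnalysis($twitterSearchParams);
    print_r(summarizeSentiment($results));
}
```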

This is it! You are ready to use this class to perform Sentiment Analysis on tweets and build your own Social Media Monitoring tool. You can find the complete PHP code of the Twitter Sentiment Analysis tool on Github.

Extra: Detailed Information about the Twitter Sentiment Analysis Classifier

This part is optional for those of you who are interested in learning how Datumbox’s Twitter Sentiment Analysis works.

In order to detect the Sentiment of the tweets we used our Machine Learning framework to build a classifier capable of detecting Positive, Negative and Neutral tweets. Our training set consisted of 1.2 million tweets evenly distributed across the 3 categories. We tokenized the tweets by extracting their bigrams and by taking into account the URLs, the hashtags, the usernames and the emoticons.
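To give a feel for this kind of tokenization, here is a small sketch. It is not the actual Datumbox implementation: the placeholder tokens _URL_ and _USER_, the emoticon pattern and both function names are assumptions made for illustration only.

```php
<?php
/**
 * Hypothetical helper: splits a tweet on whitespace and normalizes its tokens.
 * ASSUMPTION: URLs and usernames are replaced by placeholders, while
 * hashtags and a few simple emoticons are kept as features.
 */
function normalizeTweetTokens($text) {
    $tokens = preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $normalized = array();
    foreach ($tokens as $token) {
        if (preg_match('#^https?://#i', $token)) {
            $normalized[] = '_URL_';
        } elseif (preg_match('/^@\w+/', $token)) {
            $normalized[] = '_USER_';
        } elseif (preg_match('/^#\w+/', $token)) {
            $normalized[] = strtolower($token);
        } elseif (preg_match('/^[:;=][-\']?[)(DPp]$/', $token)) {
            $normalized[] = $token; // emoticons such as :) ;-) :D
        } else {
            $word = strtolower(trim($token, ".,!?\"'()"));
            if ($word !== '') {
                $normalized[] = $word;
            }
        }
    }
    return $normalized;
}

/** Hypothetical helper: builds the bigrams (pairs of consecutive tokens). */
function extractBigrams(array $tokens) {
    $bigrams = array();
    for ($i = 0; $i + 1 < count($tokens); $i++) {
        $bigrams[] = $tokens[$i] . ' ' . $tokens[$i + 1];
    }
    return $bigrams;
}

print_r(extractBigrams(normalizeTweetTokens('I love @datumbox :) http://t.co/abc #NLP')));
```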

In order to select the best features we tried several different algorithms and in the end we chose Mutual Information. Finally, after performing several tests with various models and configurations, we selected the Binarized Naïve Bayes as the best performing classifier for the particular problem (strangely enough Naïve Bayes beat SVM, Max Entropy and other classifiers which usually perform better than NB). To evaluate the results we used 10-fold cross-validation, and our best performing classifier achieves an accuracy of 83.26%.
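For readers unfamiliar with the term, a Binarized Naïve Bayes is usually understood as a Multinomial Naïve Bayes in which every feature is counted at most once per document. The toy sketch below follows that reading with add-one smoothing; it is an assumption about the general technique, not the Datumbox code, and the class name is hypothetical.

```php
<?php
/** Toy sketch of a binarized multinomial Naive Bayes (hypothetical class). */
class BinarizedNaiveBayesSketch {
    protected $docCount = array();      // label => number of training documents
    protected $featureCount = array();  // label => feature => documents containing it
    protected $totalFeatures = array(); // label => sum of the binarized counts
    protected $vocabulary = array();    // set of all features seen in training

    public function train(array $tokens, $label) {
        if (!isset($this->docCount[$label])) {
            $this->docCount[$label] = 0;
            $this->featureCount[$label] = array();
            $this->totalFeatures[$label] = 0;
        }
        $this->docCount[$label]++;
        // array_unique binarizes: repeated tokens count once per document
        foreach (array_unique($tokens) as $token) {
            if (!isset($this->featureCount[$label][$token])) {
                $this->featureCount[$label][$token] = 0;
            }
            $this->featureCount[$label][$token]++;
            $this->totalFeatures[$label]++;
            $this->vocabulary[$token] = true;
        }
    }

    public function predict(array $tokens) {
        $totalDocs = array_sum($this->docCount);
        $vocabularySize = count($this->vocabulary);
        $best = null;
        $bestScore = -INF;
        foreach ($this->docCount as $label => $docs) {
            $score = log($docs / $totalDocs); // log prior of the class
            foreach (array_unique($tokens) as $token) {
                $count = isset($this->featureCount[$label][$token])
                    ? $this->featureCount[$label][$token] : 0;
                // add-one smoothed log likelihood of the feature
                $score += log(($count + 1) / ($this->totalFeatures[$label] + $vocabularySize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = $label;
            }
        }
        return $best; // null when the model has not been trained
    }
}

$model = new BinarizedNaiveBayesSketch();
$model->train(array('love', 'this'), 'positive');
$model->train(array('great', 'love'), 'positive');
$model->train(array('hate', 'this'), 'negative');
$model->train(array('awful', 'hate', 'hate'), 'negative');
echo $model->predict(array('love', 'it')), "\n";
```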


About 

My name is Vasilis Vryniotis. I'm a Machine Learning Engineer and a Data Scientist.

Latest Comments
  1. Vasilis Vryniotis

    Hi @Roberto,

    So basically what you mean is how someone can deal with the fact that the sentiment orientation/score of some words might change over time. This usually occurs when you are performing sentiment analysis on news articles or on twitter. For example the sentiment orientation/score of the name of a particular Political Candidate can change after a scandal is revealed.

    Sentiment drifts are a difficult topic in sentiment analysis as you know. If you are working on a domain where you expect such drifts, you can try to regularly update the training dataset over time. This way your algorithm will adapt the scores. The problem with this is that it is very hard to find an annotated fresh dataset. In that case you could use ensemble learning to combine the output of 2 classifiers: One that will use the old large dataset and one that will use a smaller one with a fresher new dataset and different feature selection method.

    Unfortunately, given that I don’t know what type of analysis you are performing, I can’t provide you with a more targeted tip. In any case, don’t expect to find a silver bullet for this problem. Sentiment Analysis has at its best 85% accuracy, so on its own it’s a very difficult problem, let alone when you start dealing with irony, sarcasm, sentiment drifts and so on.

    Hope this helps!

    • Vasilis Vryniotis

      @Roberto, Sounds really interesting. If you publish your research, send us the link. 🙂

  2. Vasilis Vryniotis

    Hi @himanshu,

    I found a dataset which contained 800k tweets (positive vs negative) and then I collected another 400k tweets for the neutral class, mostly from editorial and news Twitter accounts. The accuracy was estimated by doing a 10-fold cross-validation.

  3. Vasilis Vryniotis

    Hey @Dough,

    Thanks for the comment. You can use various feature selection algorithms to extract features. Some of the methods that are widely used in Text Classification problems are the Chi-square and the Mutual Information methods.

  4. Vasilis Vryniotis

    Hi Fabian,

    I already did. I uploaded it to Github!

    🙂

  5. Vasilis Vryniotis

    Hi Su,

    You probably have an issue with the Twitter API, because I tested the script and it works without problems. You must create a Twitter Application and add your keys in the config file as described in the article.

    If you still face problems, try debugging the code to see which API does not respond.

  6. Vasilis Vryniotis

    Hi @rizki,

    It seems to me that the Twitter API client can’t access the certificate. Make sure it is in the right place and that it has read privileges.

    • Vasilis Vryniotis

      Hi Amber,

      The script is fully functional, but you need to properly set up the Twitter App and have a valid Datumbox API key. Check the configuration again and follow the steps found in the article. If this does not work for you, I’m afraid you will have to run it in a debugger to see where it gets stuck.

  7. Vasilis Vryniotis

    Hi Ismahane,

    Sure you can. Joel Hoskin was kind enough to build an implementation of datumbox API in Python and open source it. Check out our API page to download it:
    https://www.datumbox.com/machine-learning-api/

    In any case you can build your own implementation in a matter of minutes. The API is super simple so you should not face any problems.

    Good luck!

  8. Vasilis Vryniotis

    Hi Neelima,

    Of course you can; just sign up for a free key and write an API client.

    Good luck!

  9. Vasilis Vryniotis

    I would love to, Gerald, but it is against the Twitter policy… 🙁

  10. Isabella

    Dear Vasilis,
    your project is really interesting.

    I am wondering if it is possible to modify the code to retrieve the sentiment of Italian tweets (and likewise tweets in other languages) using different training sets, or whether the tool/Datumbox API works only with English texts.

    Thank you.
    Best regards,
    Isabella

    • Vasilis Vryniotis

      Hi Isabella,

      You don’t need to modify the code. You just provide training examples in the language of your choice. If you have the annotated data you can train the models. 🙂

      Cheers,
      Vasilis

  11. Martín

    Dear Vasilis,

    First of all, yours is an interesting project!!
    I don’t understand when you say “we selected the Binarized Naïve Bayes as the best performing classifier for the particular problem”, given that you are classifying into 3 different categories (positive, negative and neutral)?

    I wait for your comments.

    Thanks in advance, regards,

    Martín.

  12. Scott

    Hi Vasilis,

    Do you have any documentation you can share regarding the accuracy/testing of the Twitter Sentimental Analysis tool?
    I am using Datumbox for academic research and would like to know more about your testing of the tool, the level of accuracy and what words are contained within datasets to judge their sentimental value.

    Thank you for your help.

  13. john ope

    I am new to sentiment analysis, it is a fascinating topic and i want to build a tool of my own using NetBeans java. I would love a video tutorial on how to approach building this tool using java.
    i await your guidance.

    thanks in advance

  14. ranjith ramachandran

    Dear Vasilis,

    First of all, Thanks for sharing your interesting project!!
    Do you have similar projects for other networks like Facebook, Instagram, etc.?

  15. pragati

    hey can anyone tell me about how to train classifier using maximum entropy algorithm . i want programs in java.

  16. Robert Sebudandi

    Very interesting project, and it works well on my side. The issue is: if I need it for a different language, how can I get access to the dictionary to add that language's words and also train with that language's datasets? The tutorial just shows us how to connect using the available dictionary.

    Thank you

  17. srinivas raghav

    I started following a course to implement sentiment analysis, where I installed various software like git, VirtualBox, a text editor and Vagrant. Apparently the course didn't show how to analyse tweets and sentiment. It taught us how to fetch tweets and hashtags, how to filter them and how to count tweets.
    We built it using Storm.
    Please help us in completing the sentiment analysis part.

