In this article we will show how you can build a simple Sentiment Analysis tool that classifies tweets as positive, negative or neutral by using the Twitter REST API v1.1 and the Datumbox API v1.0. Even though the examples are given in PHP, you can easily build your own tools in the programming language of your choice.
Update: The Datumbox Machine Learning Framework is now open-source and free to download. If you want to build a Sentiment Analysis classifier without hitting the API limitations, use the com.datumbox.applications.nlp.TextClassifier class.
You can find the complete PHP code of the Twitter Sentiment Analysis tool on Github.
Social Media Monitoring is one of the hottest topics nowadays. As more and more companies use Social Media Marketing to promote their brands, it has become necessary for them to be able to evaluate the effectiveness of their campaigns.
Building a Social Media Monitoring tool requires at least 2 modules: one that evaluates how many people are influenced by the campaign and one that finds out what people think about the brand.
Evaluating the generated buzz is usually performed by using various KPIs such as the number of followers/friends, the number of likes/shares/RTs per post and more complex ones such as the engagement rate, the response rate and other composite metrics. Quantifying the buzz is usually straightforward and can be performed by using basic statistics.
On the other hand, evaluating the opinion of the users is not a trivial matter. It requires performing Sentiment Analysis, which is the task of automatically identifying the polarity, the subjectivity and the emotional states of a particular document or sentence. This in turn requires Machine Learning and Natural Language Processing techniques, and this is where most developers hit a wall when they try to build their own tools.
Thankfully, Datumbox simplifies the process of using Machine Learning by offering several API functions that allow you to build custom Social Media Monitoring tools in no time. Some of the services available to API users are the Sentiment Analysis, the Twitter Sentiment Analysis and the Subjectivity Analysis API functions. In this article we will focus only on the Twitter Sentiment Analysis method; nevertheless, as you can easily find out, the rest of the functions work similarly.
Performing Sentiment Analysis on Twitter is trickier than doing it for long reviews. This is because tweets are very short (only about 140 characters) and usually contain slang, emoticons, hashtags and other Twitter-specific jargon. This is why Datumbox offers a completely different classifier for performing Sentiment Analysis on Twitter.
In order to build the Sentiment Analysis tool we will need 2 things: first, to be able to connect to Twitter and search for tweets that contain a particular keyword; second, to evaluate the polarity (positive, negative or neutral) of the tweets based on their words. For the first task we will use the Twitter REST API v1.1 and for the second the Datumbox API v1.0.
To speed up the development we will use 2 classes: the great PHP-Twitter-API client written by Tim Whitlock and the Datumbox PHP-API-Client offered by our service. As you will soon find out, getting the tweets via the Twitter API is the most complicated task of this tutorial.
Unfortunately, Twitter has made it more complicated for developers to use its API. In order to search for particular tweets you must authenticate yourself by using the OAuth protocol. Fortunately, Tim's API client takes care of most of the tasks and enables a fast and easy integration. Still, you are required to create a new Twitter application before using the library.
So go to the Twitter Applications Console, log in with your credentials, click on the “Create new Application” button and fill in the form to register a new app. Once it is created, select the application, go to the “Details” tab (the first tab) and click the “Create my access token” button at the bottom of the page. Once you have done this, go to the “OAuth tool” tab and note down the values: Consumer Key, Consumer secret, Access token and Access token secret.
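As a rough sketch of what the search step takes as input, the helper below assembles the parameter array that would be passed to the search/tweets method. The function name buildSearchParams is hypothetical and is not part of either client library. The keys q, lang, count and result_type are parameters of the Twitter Search API 1.1, and the values chosen here are only illustrative.

```php
<?php
/**
 * Hypothetical helper (not part of the original tool): builds the parameter
 * array for the Twitter search/tweets method.
 */
function buildSearchParams(string $keyword, int $count = 50): array
{
    $keyword = trim($keyword);
    if ($keyword === '') {
        throw new InvalidArgumentException('The search keyword must not be empty.');
    }

    return array(
        'q'           => $keyword,                 // the search query
        'lang'        => 'en',                     // ask only for English tweets
        'count'       => max(1, min($count, 100)), // search/tweets returns at most 100 tweets per call
        'result_type' => 'recent',                 // most recent tweets (other values: mixed, popular)
    );
}

// Example: the array that would be handed to the sentiment analysis class.
$params = buildSearchParams('datumbox', 20);
print_r($params);
```

Clamping count keeps a careless caller from requesting more tweets per call than the search method returns.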
To access the Datumbox API, sign up for a free account and visit your API Credentials panel to get your API Key.
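For readers who want to write their own client in another language, the sketch below shows roughly what a call to the Twitter Sentiment Analysis function involves. It is an illustration under assumptions, not the official client: the endpoint URL, the api_key and text field names and the shape of the JSON reply are assumed here and should be checked against the Datumbox API documentation. The function names are hypothetical.

```php
<?php
// Assumed endpoint layout; verify it against the Datumbox API documentation.
const DATUMBOX_ENDPOINT = 'http://api.datumbox.com/1.0/TwitterSentimentAnalysis.json';

/** Builds the URL-encoded POST body (assumed fields: api_key and text). */
function buildDatumboxRequest(string $apiKey, string $text): string
{
    return http_build_query(array('api_key' => $apiKey, 'text' => $text));
}

/**
 * Parses a JSON reply of the assumed form {"output":{"status":1,"result":"positive"}}.
 * Returns the label, or false when the reply is malformed or reports a failure.
 */
function parseDatumboxResponse(string $json)
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['output']['status'], $data['output']['result'])) {
        return false;
    }
    if ((int) $data['output']['status'] !== 1) {
        return false;
    }
    return $data['output']['result'];
}

/** Sends the request over HTTP; returns the label, or false on any failure. */
function classifyTweet(string $apiKey, string $text)
{
    $context = stream_context_create(array('http' => array(
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => buildDatumboxRequest($apiKey, $text),
        'timeout' => 10,
    )));
    $reply = @file_get_contents(DATUMBOX_ENDPOINT, false, $context);
    return $reply === false ? false : parseDatumboxResponse($reply);
}

// Offline demonstration with hand-written sample replies (no network access needed).
var_dump(parseDatumboxResponse('{"output":{"status":1,"result":"positive"}}'));
var_dump(parseDatumboxResponse('{"output":{"status":0}}'));
```

Keeping request building and response parsing separate from the HTTP call makes both easy to test without an API key.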
All we need to do in order to develop the tool is write a TwitterSentimentAnalysis class which uses the Twitter and Datumbox API Clients to fetch the tweets and evaluate their polarity.
Below you can see the code along with the necessary comments.
<?php
// The Twitter API client and the Datumbox API client classes mentioned above must be loaded before this class is used.
class TwitterSentimentAnalysis {

    protected $datumbox_api_key; //Your Datumbox API Key. Get it from https://www.datumbox.com/apikeys/view/

    protected $consumer_key; //Your Twitter Consumer Key. Get it from https://dev.twitter.com/apps
    protected $consumer_secret; //Your Twitter Consumer Secret. Get it from https://dev.twitter.com/apps
    protected $access_key; //Your Twitter Access Key. Get it from https://dev.twitter.com/apps
    protected $access_secret; //Your Twitter Access Secret. Get it from https://dev.twitter.com/apps

    /**
    * The constructor of the class
    *
    * @param string $datumbox_api_key Your Datumbox API Key
    * @param string $consumer_key Your Twitter Consumer Key
    * @param string $consumer_secret Your Twitter Consumer Secret
    * @param string $access_key Your Twitter Access Key
    * @param string $access_secret Your Twitter Access Secret
    *
    * @return TwitterSentimentAnalysis
    */
    public function __construct($datumbox_api_key, $consumer_key, $consumer_secret, $access_key, $access_secret){
        $this->datumbox_api_key=$datumbox_api_key;

        $this->consumer_key=$consumer_key;
        $this->consumer_secret=$consumer_secret;
        $this->access_key=$access_key;
        $this->access_secret=$access_secret;
    }

    /**
    * This function fetches the list of tweets and evaluates their sentiment
    *
    * @param array $twitterSearchParams The Twitter Search Parameters that are passed to Twitter API. Read more here https://dev.twitter.com/docs/api/1.1/get/search/tweets
    *
    * @return array
    */
    public function sentimentAnalysis($twitterSearchParams) {
        $tweets=$this->getTweets($twitterSearchParams);

        return $this->findSentiment($tweets);
    }

    /**
    * Calls the Search/tweets method of the Twitter API for particular Twitter Search Parameters and returns the list of tweets that match the search criteria.
    *
    * @param mixed $twitterSearchParams The Twitter Search Parameters that are passed to Twitter API. Read more here https://dev.twitter.com/docs/api/1.1/get/search/tweets
    *
    * @return array $tweets
    */
    protected function getTweets($twitterSearchParams) {
        $Client = new TwitterApiClient(); //Use the TwitterAPIClient
        $Client->set_oauth ($this->consumer_key, $this->consumer_secret, $this->access_key, $this->access_secret);

        $tweets = $Client->call('search/tweets', $twitterSearchParams, 'GET' ); //call the service and get the list of tweets

        unset($Client);

        return $tweets;
    }

    /**
    * Calls the Datumbox service for each English tweet and collects the results.
    *
    * @param array $tweets The response of the Search/tweets method
    *
    * @return array
    */
    protected function findSentiment($tweets) {
        $DatumboxAPI = new DatumboxAPI($this->datumbox_api_key); //initialize the DatumboxAPI client

        $results=array();
        foreach($tweets['statuses'] as $tweet) { //foreach of the tweets that we received
            if(isset($tweet['metadata']['iso_language_code']) && $tweet['metadata']['iso_language_code']=='en') { //perform sentiment analysis only for the English Tweets
                $sentiment=$DatumboxAPI->TwitterSentimentAnalysis($tweet['text']); //call Datumbox service to get the sentiment

                if($sentiment!=false) { //if the sentiment is not false, the API call was successful.
                    $results[]=array( //add the tweet message in the results
                        'id'=>$tweet['id_str'],
                        'user'=>$tweet['user']['name'],
                        'text'=>$tweet['text'],
                        'url'=>'https://twitter.com/'.$tweet['user']['screen_name'].'/status/'.$tweet['id_str'], //the URL needs the screen name (handle), not the display name
                        'sentiment'=>$sentiment,
                    );
                }
            }
        }

        unset($tweets);
        unset($DatumboxAPI);

        return $results;
    }
}
?>
We pass the necessary keys for all the services to the constructor. Then, in the public sentimentAnalysis function, we first call the Twitter service to get the list of tweets that match our search parameters, and then we call the Datumbox service for each tweet to get its polarity.
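Once the results come back, a natural next step is to aggregate them. The sketch below counts how many tweets fall into each class. The summarizeSentiment helper is hypothetical, the sample rows are made up so the example runs without API access, and it assumes the Datumbox client returns the labels positive, negative and neutral as strings.

```php
<?php
/**
 * Hypothetical helper: counts the tweets per sentiment class and computes
 * their percentage shares. Expects the array shape built by findSentiment().
 */
function summarizeSentiment(array $results): array
{
    $counts = array('positive' => 0, 'negative' => 0, 'neutral' => 0);
    foreach ($results as $row) {
        $label = strtolower((string) $row['sentiment']);
        if (isset($counts[$label])) {
            $counts[$label]++;
        }
    }

    $total = array_sum($counts);
    $percent = array();
    foreach ($counts as $label => $n) {
        $percent[$label] = $total > 0 ? round(100 * $n / $total, 1) : 0.0;
    }

    return array('counts' => $counts, 'percent' => $percent);
}

// With real keys the input would come from the class described above:
//   $analysis = new TwitterSentimentAnalysis($datumboxKey, $consumerKey, $consumerSecret, $accessKey, $accessSecret);
//   $results  = $analysis->sentimentAnalysis(array('q' => 'datumbox', 'lang' => 'en', 'count' => 50));
// Made-up sample rows are used here instead.
$results = array(
    array('id' => '1', 'user' => 'Alice', 'text' => 'Love it!',        'url' => '', 'sentiment' => 'positive'),
    array('id' => '2', 'user' => 'Bob',   'text' => 'Works well.',     'url' => '', 'sentiment' => 'positive'),
    array('id' => '3', 'user' => 'Carol', 'text' => 'Keeps crashing.', 'url' => '', 'sentiment' => 'negative'),
    array('id' => '4', 'user' => 'Dave',  'text' => 'Version 2 is out', 'url' => '', 'sentiment' => 'neutral'),
);

$summary = summarizeSentiment($results);
print_r($summary);
```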
That's it! You are now ready to use this class to perform Sentiment Analysis on tweets and build your own Social Media Monitoring tool.
This part is optional for those of you who are interested in learning how Datumbox’s Twitter Sentiment Analysis works.
In order to detect the sentiment of the tweets we used our Machine Learning framework to build a classifier capable of detecting Positive, Negative and Neutral tweets. Our training set consisted of 1.2 million tweets evenly distributed across the 3 categories. We tokenized the tweets by extracting their bigrams and by taking into account the URLs, the hashtags, the usernames and the emoticons.
In order to select the best features we tried several different algorithms and in the end chose Mutual Information. Finally, after performing several tests with various models and configurations, we selected Binarized Naïve Bayes as the best performing classifier for this particular problem (strangely enough, Naïve Bayes beat SVM, Max Entropy and other classifiers which usually perform better than NB). To evaluate the results we used 10-fold cross-validation; our best performing classifier achieves an accuracy of 83.26%.
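To make the tokenization step more concrete, here is a minimal sketch of one way to do it. It is not the tokenizer used by Datumbox, whose details are not given here. The _URL_ and _USER_ placeholder tokens, the small emoticon set and both function names are assumptions made for illustration.

```php
<?php
/**
 * Illustrative tokenizer: lowercases a tweet, maps URLs and usernames to
 * placeholder tokens, and keeps hashtags and a few common emoticons.
 */
function tokenizeTweet(string $tweet): array
{
    $text = strtolower($tweet);
    $text = preg_replace('~https?://\S+~', ' _URL_ ', $text); // every link becomes one token
    $text = preg_replace('~@\w+~', ' _USER_ ', $text);        // every mention becomes one token

    // Match, in order: placeholders, hashtags, simple emoticons, plain words.
    preg_match_all('~_URL_|_USER_|#\w+|[:;=]-?[()dp]|[a-z0-9\']+~', $text, $matches);

    return $matches[0];
}

/** Joins each pair of neighbouring tokens into a bigram. */
function bigrams(array $tokens): array
{
    $result = array();
    for ($i = 0; $i + 1 < count($tokens); $i++) {
        $result[] = $tokens[$i] . ' ' . $tokens[$i + 1];
    }
    return $result;
}

$tokens = tokenizeTweet('@bob I love #PHP :) http://example.com/x');
print_r($tokens);
print_r(bigrams($tokens));
```

Collapsing links and mentions into placeholders keeps the vocabulary small while still recording that the tweet contained one.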
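For readers unfamiliar with the model, the sketch below shows the textbook idea behind Binarized Naïve Bayes: a multinomial Naïve Bayes in which each feature is counted at most once per document. It is a toy illustration with add-one smoothing and made-up training data, not the Datumbox implementation. The class and method names are hypothetical.

```php
<?php
/**
 * Toy Binarized (multinomial) Naive Bayes: every feature is counted at most
 * once per document, and Laplace (add-one) smoothing is applied.
 */
class BinarizedNaiveBayes
{
    private $docCount = array();     // class => number of training documents
    private $featureCount = array(); // class => feature => documents containing it
    private $totalCount = array();   // class => sum of the feature counts
    private $vocabulary = array();   // set of all features seen during training

    public function train(array $features, string $class): void
    {
        if (!isset($this->docCount[$class])) {
            $this->docCount[$class] = 0;
            $this->featureCount[$class] = array();
            $this->totalCount[$class] = 0;
        }
        $this->docCount[$class]++;
        foreach (array_unique($features) as $f) { // binarization: drop repeats
            $this->featureCount[$class][$f] = ($this->featureCount[$class][$f] ?? 0) + 1;
            $this->totalCount[$class]++;
            $this->vocabulary[$f] = true;
        }
    }

    public function predict(array $features): string
    {
        $totalDocs = array_sum($this->docCount);
        $vocabSize = count($this->vocabulary);
        $best = '';
        $bestScore = -INF;
        foreach ($this->docCount as $class => $n) {
            $score = log($n / $totalDocs); // log prior of the class
            foreach (array_unique($features) as $f) {
                if (!isset($this->vocabulary[$f])) {
                    continue; // ignore features never seen in training
                }
                $count = $this->featureCount[$class][$f] ?? 0;
                $score += log(($count + 1) / ($this->totalCount[$class] + $vocabSize));
            }
            if ($score > $bestScore) {
                $bestScore = $score;
                $best = (string) $class;
            }
        }
        return $best;
    }
}

// Made-up training examples, for illustration only.
$classifier = new BinarizedNaiveBayes();
$classifier->train(array('love', 'it'), 'positive');
$classifier->train(array('great', 'phone'), 'positive');
$classifier->train(array('hate', 'it'), 'negative');
$classifier->train(array('terrible', 'phone'), 'negative');
$classifier->train(array('new', 'phone', 'released'), 'neutral');

echo $classifier->predict(array('love', 'phone')), "\n";
echo $classifier->predict(array('hate', 'it', 'hate')), "\n";
```

In a real setup the feature arrays would be the bigrams kept by the feature selection step, and accuracy would be estimated with cross-validation rather than on a handful of examples.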
2013-2024 © Datumbox. All Rights Reserved.
Hi @Roberto,
So basically what you mean is how someone can deal with the fact that the sentiment orientation/score of some words might change over time. This usually occurs when you are performing sentiment analysis on news articles or on twitter. For example the sentiment orientation/score of the name of a particular Political Candidate can change after a scandal is revealed.
Sentiment drifts are a difficult topic in sentiment analysis, as you know. If you are working on a domain where you expect such drifts, you can try to update the training dataset regularly. This way your algorithm will adapt the scores. The problem is that it is very hard to find a fresh annotated dataset. In that case you could use ensemble learning to combine the output of 2 classifiers: one that uses the old, large dataset and one that uses a smaller, fresher dataset and a different feature selection method.
Unfortunately, given that I don’t know what type of analysis you are performing, I can’t provide you with a more targeted tip. In any case, don’t expect to find a silver bullet for this problem. Sentiment Analysis reaches about 85% accuracy at best, so it is a very difficult problem on its own, let alone when you start dealing with irony, sarcasm, sentiment drifts and so on.
Hope this helps!
@Roberto, Sounds really interesting. If you publish your research, send us the link. 🙂
Hi @himanshu,
I found a dataset that contained 800k tweets (positive vs negative) and then collected another 400k tweets for the neutral class, mostly from editorial and news Twitter accounts. The accuracy was estimated with 10-fold cross-validation.
Hey @Dough,
Thanks for the comment. You can use various feature selection algorithms to extract features. Some of the methods that are widely used in Text Classification problems are the Chi-square and the Mutual Information methods.
Hi Fabian,
I already did. I uploaded it to GitHub!
🙂
Hi Su,
You probably have an issue with the Twitter API, because I tested the script and it works without problems. You must create a Twitter Application and add your keys to the config file as described in the article.
If you still face problems, try debugging the code to see which API does not respond.
Hi @rizki,
It seems to me that the Twitter API client can’t access the certificate. Make sure it is in the right place and that it has read permissions.
Hi Amber,
The script is fully functional, but you need to set up the Twitter App properly and have a valid Datumbox API key. Check the configuration again and follow the steps found in the article. If this does not work for you, I’m afraid you will have to step through it with a debugger to see where it gets stuck.
Hi Ismahane,
Sure you can. Joel Hoskin was kind enough to build an implementation of the Datumbox API in Python and open source it. Check out our API page to download it:
https://www.datumbox.com/machine-learning-api/
In any case, you can build your own implementation in a matter of minutes. The API is super simple, so you should not face any problems.
Good luck!
Hi Neelima,
Of course you can. Just sign up for a free key and write an API client.
Good luck!
I would love to, Gerald, but it is against the Twitter policy… 🙁
Dear Vasilis,
your project is really interesting.
I am wondering whether it is possible to modify the code to retrieve the sentiment of Italian tweets (and equally of tweets in other languages) by using different training sets, or whether the tool/Datumbox API works only with English texts.
Thank you.
Best regards,
Isabella
Hi Isabella,
You don’t need to modify the code. You just provide training examples in the language of your choice. If you have the annotated data you can train the models. 🙂
Cheers,
Vasilis
Dear Vasilis,
First of all, yours is an interesting project!
I don’t understand what you mean by “we selected the Binarized Naïve Bayes as the best performing classifier for the particular problem”, given that you are classifying into 3 different categories (positive, negative and neutral).
I look forward to your comments.
Thanks in advance, regards,
Martín.
Hi Vasilis,
Do you have any documentation you can share regarding the accuracy/testing of the Twitter Sentiment Analysis tool?
I am using Datumbox for academic research and would like to know more about your testing of the tool, the level of accuracy and which words are contained in the datasets used to judge sentiment.
Thank you for your help.
I am new to sentiment analysis. It is a fascinating topic and I want to build a tool of my own using Java in NetBeans. I would love a video tutorial on how to approach building this tool in Java.
I await your guidance.
Thanks in advance.
Dear Vasilis,
First of all, thanks for sharing your interesting project!
Do you have similar projects for other networks like Facebook, Instagram, etc.?
Hey, can anyone tell me how to train a classifier using the maximum entropy algorithm? I want programs in Java.
Very interesting project, and it works well on my side. The issue is: if I need it for a different language, how can I get access to the dictionary to add that language's words and also train with that language's datasets? The tutorial just shows us how to connect using the available dictionary.
Thank you
I started following a course on implementing sentiment analysis, for which I installed various software such as Git, VirtualBox, a text editor and Vagrant. Apparently the course didn't show how to analyse tweets and their sentiment. It taught us how to collect tweets and hashtags, how to filter them and how to count tweets.
We built it using Storm.
Please help us complete the sentiment analysis part.