Sentiment Analysis (detecting a document’s polarity, subjectivity and emotional state) is a difficult problem, and several times I have bumped into unexpected and interesting results. One of the strangest things I found is that, although the neutral class can improve classification accuracy under specific conditions, most researchers ignore it.
During the research for my MSc thesis, I came across some interesting properties of the Neutral class and the Max Entropy classifier. My supervisors and I originally discussed submitting an article about my findings, but due to lack of time I decided not to. In this article I present a light version of part of that research and discuss the importance of the Neutral class in sentiment analysis.
A classifier similar to the one described below is used by Datumbox’s Sentiment Analysis service and powers our API.
Neutral class is usually ignored
In sentiment analysis, neutrality is handled in various ways, depending on the technique being used. In lexicon-based techniques the neutrality score of the words is taken into account in order to either detect neutral opinions (Ding and Liu, 2008) or filter them out and enable algorithms to focus on words with positive and negative sentiment (Taboada et al., 2010).
On the other hand, when statistical techniques are used, the way that neutrals are handled differs significantly. Some researchers consider the objective (neutral) sentences of a text less informative, so they filter them out and focus only on the subjective statements in order to improve the binary classification (Bo Pang and Lillian Lee, 2002). In other cases they use hierarchical classification, where neutrality is determined first and sentiment polarity second (Wilson et al., 2005).
Finally, in most academic papers on sentiment analysis that use statistical approaches, researchers tend to ignore the neutral category under the assumption that neutral texts lie near the boundary of the binary classifier. Moreover, it is assumed that there is less to learn from neutral texts compared to those with clear positive or negative sentiment.
Neutral class is important
Koppel and Schler (2006) showed in their research that both of the above assumptions are false. They suggested that, as in every polarity problem, three categories must be identified (positive, negative and neutral) and that the introduction of the neutral category can even improve the overall accuracy. Their work focused primarily on SVMs, and they used geometric properties to improve the accuracy of their three binary classifiers.
An Intuitive Explanation of why Neutral class must not be ignored
An intuitive explanation of why neutral class is important is the following: Not all things are black and white and not all sentences have a sentiment. How would you classify the sentence “the weather is hot”? Is it positive or negative? It certainly does not provide any clue about whether the author likes this or not. The neutral class should not be considered as a state between positive and negative but as a separate class that denotes the lack of sentiment.
Also keep in mind that when you use only 2 classes you basically force the features/words to be classified as either positive or negative, leaving no room for neutrality. By doing so, we can run into overfitting and become vulnerable to situations where, due to randomness, a particular neutral word occurs more often in positive or in negative examples. Professors Koppel and Schler published a paper about this called “The Importance of Neutral Examples for Learning Sentiment”, which I highly recommend reading.
The Algorithmic Configurations that I used
In my thesis I studied the problem closely and wrote several classifiers, including 3 Naïve Bayes variations (Multinomial, Binarized and Bernoulli), Max Entropy, SVMs, Softmax Regression, Adaboost and more. In the early stages of the research I tested all of them and, based on the initial results, eliminated some of them using several criteria: the overall accuracy, the variation across different datasets, the training and evaluation speed, the ability to process large amounts of data, and the amount of CPU and RAM they use. The selected classifiers were Max Entropy, Multinomial Naïve Bayes, Binarized Naïve Bayes and Softmax Regression.
My target was to build a classifier capable of detecting sentiment in multiple domains. Thus the training dataset that I used was built by combining a large number of well-known datasets on various topics. The final training dataset was balanced and had an equal number of examples in each class.
Originally I tested several feature selection algorithms, including Chi-square and Mutual Information, with different numbers of selected features. In the end I selected Chi-square because it provided better results for most classifiers. Below I present the results for algorithms that use the top 3000 bigrams extracted from 30000 examples.
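To make the feature selection step more concrete, here is a minimal sketch of Chi-square scoring for bigrams. It is not the code from my thesis: the function names (bigrams, chi_square_scores, select_top_features), the whitespace tokenization and the choice to keep the maximum score across classes are all assumptions made for illustration.

```python
from collections import Counter


def bigrams(text):
    """Return the set of word bigrams in a text (naive whitespace tokenizer)."""
    tokens = text.lower().split()
    return {" ".join(pair) for pair in zip(tokens, tokens[1:])}


def chi_square_scores(docs, labels):
    """Score each bigram by its largest chi-square statistic over the classes.

    Assumption: presence/absence of a bigram per document, and the maximum
    over classes as the final score (averaging is another common choice).
    """
    n = len(docs)
    class_counts = Counter(labels)
    feature_counts = Counter()
    joint_counts = Counter()
    for text, label in zip(docs, labels):
        for feat in bigrams(text):
            feature_counts[feat] += 1
            joint_counts[(feat, label)] += 1

    scores = {}
    for feat, n_feat in feature_counts.items():
        best = 0.0
        for cls, n_cls in class_counts.items():
            a = joint_counts[(feat, cls)]  # has bigram, in class
            b = n_feat - a                 # has bigram, other classes
            c = n_cls - a                  # lacks bigram, in class
            d = n - a - b - c              # lacks bigram, other classes
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[feat] = best
    return scores


def select_top_features(docs, labels, k):
    """Keep the k highest-scoring bigrams (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

With the setup described above, k would be 3000 and labels would contain two or three distinct values depending on whether the neutral class is included.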
During my research I tested various scenarios, such as including and not including the neutral class, using a single 3-class classifier vs 3 binary classifiers, and more. In this article I will focus only on the results of the 3-class vs the 2-class classifiers, because the single 3-class classifier outperformed the multiple binary classifiers.
Finally, to evaluate the performance of the classifiers I used the average accuracy, which I estimated with 10-fold cross validation.
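For readers who have not used it before, the evaluation procedure can be sketched as follows. This is an illustrative sketch only: k_fold_indices and cross_validated_accuracy are hypothetical helper names, train_fn and predict_fn stand in for whichever classifier is being evaluated, and the code assumes there are at least k examples.

```python
import random


def k_fold_indices(n_examples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold, so fold sizes
    # differ by at most one example.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(examples, labels, train_fn, predict_fn, k=10):
    """Average the accuracy obtained on each held-out fold."""
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(examples), k):
        model = train_fn([examples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, examples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Each example is used for testing exactly once and for training in the other k - 1 rounds, so the reported figure is the mean of k accuracy measurements.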
Presenting the Results
Below we compare the average accuracy of 4 classifiers with and without the neutral class:
Figure: Average accuracy of 3-class and binary classification for 4 different classifiers
The Max Entropy classifier provided the best results for multi-domain sentiment analysis, so below I focus on it. Based on the above chart, Max Entropy is the only classifier that benefits, even if marginally, from the introduction of the neutral class. Moreover, we can clearly see that Max Entropy outperforms all the other classifiers in both the 3-class and the binary setting.
To check that the improvement in overall accuracy can be reproduced, I applied the same approach to various well-known datasets. Below I present the results of the 3-class Max Entropy classifier versus the binary Max Entropy classifier for 3 commonly used datasets.
Figure: Accuracy of Max Entropy for 3-Class and Binary classification
The binary classifier was trained using only the pos/neg dataset, while the 3-class classifier was trained using both the pos/neg and the Neutral datasets. In all cases we used the same optimal configuration of the Max Entropy classifier, as described in section 6.5 of the thesis. To test the accuracy we again used 10-fold cross validation, training the classifier on 90% of the pos/neg dataset and testing it on the remaining 10%. As we can clearly see, in all cases the accuracy of the 3-class classifier is better than that of the binary classifier.
Discussing the results
My findings can be summarized as follows: if we want to build a Max Entropy classifier that detects positive and negative texts, we can achieve better accuracy if we train it to detect positive, negative and neutral documents. The accuracy of the 3-class classifier will be higher than that of the two-class classifier even if we evaluate only on positive and negative documents.
We should also note that in all of the above cases the testing data did not contain any neutral examples. Thus, even though only a very small percentage of documents (1.5-3% depending on the dataset) were falsely classified as neutral, all of them counted as misclassifications. As a result, if we know beforehand that no neutral documents exist in our testing dataset, we can arbitrarily map the neutral predictions to either positive or negative and gain an additional 1-1.5% of accuracy.
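The remapping trick is simple enough to show on a toy example. The labels and predictions below are made up purely to illustrate the mechanics (they are not results from my experiments), and accuracy and remap_neutral are hypothetical helper names.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def remap_neutral(predictions, fallback="positive"):
    """Replace every 'neutral' prediction with an arbitrary fallback class."""
    return [fallback if p == "neutral" else p for p in predictions]


# Toy test set without neutral documents (illustrative data only).
gold = ["positive"] * 4 + ["negative"] * 4
predicted = ["positive", "positive", "positive", "neutral",
             "negative", "negative", "neutral", "positive"]

strict = accuracy(predicted, gold)                   # 0.625
remapped = accuracy(remap_neutral(predicted), gold)  # 0.75
```

When the test set contains no neutral documents, every neutral prediction is already an error, so remapping can only keep the accuracy the same or raise it.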
Explaining the Results
As we saw earlier, Max Entropy is the only classifier that benefits from the introduction of the neutral category. This is because the Max Entropy principle states that our model should take into account only the important features, be constrained to the expected probabilities of those features, and otherwise be the closest-to-uniform model that satisfies those constraints. In simple words, the Max Entropy principle tells us to avoid any extrapolations/false assumptions and focus only on the things that we do know.
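For readers who prefer formulas, the principle can be written in the standard textbook form for conditional Max Entropy models. The notation here is my own choice for this article: x is a document, y a class, f_i(x, y) a feature function, \tilde{p} an empirical distribution taken from the training data and \lambda_i the learned weight of feature i.

```latex
% Pick the conditional model with the highest entropy ...
\[
p^{*} = \arg\max_{p} \; -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)
\]
% ... among those that reproduce the observed expectation of every feature
% and are properly normalized.
\[
\sum_{x,y} \tilde{p}(x)\, p(y \mid x)\, f_i(x,y) = \sum_{x,y} \tilde{p}(x,y)\, f_i(x,y)
\quad \text{for all } i,
\qquad
\sum_{y} p(y \mid x) = 1 \quad \text{for all } x
\]
% The solution has an exponential (log-linear) form.
\[
p^{*}(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x,y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x,y') \Big)
\]
```

In the 3-class setting y ranges over positive, negative and neutral, so the normalizer Z(x) sums over three classes instead of two.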
The introduction of the neutral category is fully aligned with the above principle. Within all documents there are words that do not have any sentiment orientation. By using binary classifiers we force the algorithm to put those neutral words into either the positive or the negative category. If instead we choose to use a 3-class classifier, we avoid these false assumptions and extrapolations and thus reduce overfitting. Obviously this is a property that applies only when we use a single 3-class classifier; it does not work with multiple binary classifiers.
Another reason why the neutral category can improve accuracy is that it enables the feature selection algorithm to select features of higher quality. Again, this property applies only to 3-class classifiers.
Also, as we know, not all documents are either positive or negative; neutral documents also exist. Koppel and Schler (2006) mentioned in their research that Neutral is not a state between positive and negative. Neutral is the lack of sentiment, and this category must be detected in sentiment analysis. In their work they used the geometric properties of Support Vector Machines to help their classifiers separate the positives from the negatives. This idea also applies to Maximum Entropy when we use a 3-class classifier: as mentioned earlier, the feature selection algorithm selects better features and the Maximum Entropy algorithm assigns better weights to them, making it easier to separate the positive from the negative documents.
I tried to keep the explanation as simple as possible without leaving out important information about the methodology. If you have any questions or objections concerning the method that I used, leave your message below. I’ll be happy to provide more information or learn from what you have to suggest! 🙂