How to build an Intelligent Antispam WordPress Plugin

In this article we will see how to build a WordPress plugin that uses Machine Learning to block spam, adult or even negative comments on a blog. The plugin is compatible with WordPress 3.6 or higher and uses version 1.0 of the Datumbox API. Even though the article focuses on the development of a WordPress plugin, the same Datumbox API calls can be used to protect any type of online community from spam, offensive or inappropriate content. Read on to see how this is achieved.

Update: The Datumbox Machine Learning Framework is now open-source and free to download. If you want to build an Anti-Spam classifier without hitting the API limitations, use the com.datumbox.applications.nlp.TextClassifier class.

You can download the complete code of the Machine Learning Antispam WordPress Plugin from WordPress or GitHub.

The target of the WordPress plugin

Our target is to build a plugin that runs every time someone submits a new comment. It should let the blog owner select which types of content to block. To make the plugin intelligent, we will use 3 of the available Datumbox API functions: Spam Detection, Adult Content Detection and Sentiment Analysis.

Installing the Plugin

Installing the plugin is super easy:

  1. Just download it, extract the zip file and move the contained “machine-learning-antispam” folder into your WordPress plugins folder.
  2. Go to your Admin Area, click on the Plugins menu and Activate the plugin.
  3. Finally go to the left menu and select Settings > Machine Learning Antispam. All you need to do is add your Datumbox API key and select the types of comments that you wish to filter (spam, adult or negative).

Using the plugin requires a Datumbox API key. You can get one for free by signing up for a Datumbox account. Once you register, go to your API Credentials area, copy your API key and paste it into the configuration page mentioned above.

Building the Machine Learning Antispam Plugin

First of all we create a folder called “machine-learning-antispam”, which will contain all the files of our plugin. Next we download the PHP Datumbox API client and copy its DatumboxAPI.php file into that folder. The DatumboxAPI class gives us a simple interface for calling the Datumbox API. As you will see later, incorporating the Machine Learning functions is the easiest part of this tutorial, because the Datumbox API is simple to use and comes with ready-made API clients in several languages.
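To make the client interface concrete, here is a minimal sketch of how the three calls are used. The real DatumboxAPI class sends the text to the Datumbox service and needs a valid API key; the FakeDatumboxAPI class below is a hypothetical stand-in with hard-coded keyword rules, written only so that the snippet runs offline. The method names and the labels "spam", "adult" and "negative" are the ones the plugin relies on later in this article; the other labels ("nospam", "noadult", "positive") are assumptions made for illustration.

```php
<?php
// Hypothetical stand-in for the DatumboxAPI client so the example runs
// without network access. The keyword rules are NOT how Datumbox works.
class FakeDatumboxAPI {
    private $apiKey;

    public function __construct($apiKey) {
        $this->apiKey = $apiKey; // the real client sends this key with each request
    }

    public function SpamDetection($text) {
        return (stripos($text, 'buy cheap') !== false) ? 'spam' : 'nospam';
    }

    public function AdultContentDetection($text) {
        return (stripos($text, 'xxx') !== false) ? 'adult' : 'noadult';
    }

    public function SentimentAnalysis($text) {
        return (stripos($text, 'terrible') !== false) ? 'negative' : 'positive';
    }
}

// With the real client, this line would be: new DatumboxAPI($apiKey)
$client  = new FakeDatumboxAPI('your-api-key');
$comment = 'Buy cheap watches now!';

// Each call takes the comment text and returns a label as a string.
$spamLabel      = $client->SpamDetection($comment);
$adultLabel     = $client->AdultContentDetection($comment);
$sentimentLabel = $client->SentimentAnalysis($comment);

echo $spamLabel, ' / ', $adultLabel, ' / ', $sentimentLabel, "\n";
```

In the plugin itself, swapping the fake class for the real one is the only change needed, since the calling code only compares the returned label against a string.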

The second step is to create an “options.php” file that contains all the configuration functions and admin pages required for managing the plugin. This is where we place the code that adds our plugin to the Settings menu and prints the configuration page. To learn more about this, I strongly recommend reading the official WordPress guide “Creating Options Pages”. Here is the code of the options.php file:

<?php
if (!function_exists('add_action')) {
    die();
}

add_action('admin_menu', 'machinelearningantispam_admin_menu');

function machinelearningantispam_admin_menu() {
    add_submenu_page('options-general.php', __('Machine Learning Antispam'), __('Machine Learning Antispam'), 'manage_options', 'machine-learning-antispam-config', 'machinelearningantispam_conf_page');
    
    //call register settings function
    add_action( 'admin_init', 'machinelearningantispam_settings' );
}

function machinelearningantispam_settings() {
    register_setting( 'machinelearningantispam-settings-group', 'datumbox_api_key');
    register_setting( 'machinelearningantispam-settings-group', 'machinelearningantispam_filterspam');
    register_setting( 'machinelearningantispam-settings-group', 'machinelearningantispam_filteradult');
    register_setting( 'machinelearningantispam-settings-group', 'machinelearningantispam_filternegative');
}

function machinelearningantispam_conf_page() {    
    ?>
    <div class="wrap">
    <h2><?php echo __('Machine Learning Antispam'); ?></h2>

    <?php
        if(get_option('datumbox_api_key')=='') {
    ?>
        <p><b><?php echo __('In order to use this plugin you must have a Datumbox API key. Sign up for a Free Datumbox Account:'); ?></b></p>
        <button onclick="window.location='https://www.datumbox.com/users/register/';" class="button button-primary"><?php echo __('Register Now'); ?></button>
        <br/>
        <br/>
        <hr/><br/>
    <?php
        }
    ?>
    
    <form method="post" action="options.php">
        <?php settings_fields( 'machinelearningantispam-settings-group' ); ?>
        <?php //do_settings( 'machinelearningantispam-settings-group' ); ?>
        <table class="form-table">
            <tr valign="top">
            <th scope="row"><?php echo __('Datumbox API Key'); ?></th>
            <td><input type="text" name="datumbox_api_key" value="<?php echo esc_attr(get_option('datumbox_api_key')); ?>" /></td>
            </tr>
            <tr valign="top">
            <th scope="row"><?php echo __('Filter Spam Comments'); ?></th>
            <td><input type="checkbox" name="machinelearningantispam_filterspam" value="1" <?php echo (get_option('machinelearningantispam_filterspam'))?'checked="checked"':''; ?> /></td>
            </tr>
            <tr valign="top">
            <th scope="row"><?php echo __('Filter Adult Comments'); ?></th>
            <td><input type="checkbox" name="machinelearningantispam_filteradult" value="1" <?php echo (get_option('machinelearningantispam_filteradult'))?'checked="checked"':''; ?> /></td>
            </tr>
            <tr valign="top">
            <th scope="row"><?php echo __('Filter Negative Comments'); ?></th>
            <td><input type="checkbox" name="machinelearningantispam_filternegative" value="1" <?php echo (get_option('machinelearningantispam_filternegative'))?'checked="checked"':''; ?> /></td>
            </tr>
        </table>
        
        <?php submit_button(); ?>

    </form>
    </div>
    <?php 
} 
?>
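
The settings above are stored exactly as they are submitted. As an optional hardening step, register_setting() accepts a sanitize callback as its third argument. The two helpers below are a sketch of what such callbacks could look like; their names are hypothetical and not part of the original plugin, and the assumption that an API key never contains whitespace should be checked against your own key.

```php
<?php
// Hypothetical helper: normalises a checkbox option to '1' (checked)
// or '' (unchecked), whatever value was actually submitted.
function machinelearningantispam_sanitize_checkbox($value) {
    return ($value === '1' || $value === 1 || $value === true) ? '1' : '';
}

// Hypothetical helper: removes all whitespace from the API key, which guards
// against copy-paste accidents. Assumes keys never contain whitespace.
function machinelearningantispam_sanitize_api_key($value) {
    return preg_replace('/\s+/', '', (string) $value);
}

// Inside WordPress, the callbacks would be passed as the third argument, e.g.:
// register_setting('machinelearningantispam-settings-group', 'datumbox_api_key',
//                  'machinelearningantispam_sanitize_api_key');

var_dump(machinelearningantispam_sanitize_checkbox('1'));
var_dump(machinelearningantispam_sanitize_api_key("  abc 123\n"));
```

The helpers are plain PHP functions, so they can be tried outside WordPress before wiring them into the plugin.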

In the third step, we develop the core file of our plugin. We create a file called machine-learning-antispam.php and place in it the machinelearningantispam_check_comment() function, which runs every time a new comment is submitted. This function checks the options and calls the Datumbox API services to determine whether the comment is spam, adult or negative. If the Datumbox service classifies the comment as spam or adult, the comment is marked as “spam”; if it turns out to be negative, it is marked as “pending”. Here is the code of the file:

<?php
/**
* Plugin Name: Machine Learning Antispam
* Plugin URI: https://www.datumbox.com
* Description: This WordPress Plugin uses Machine Learning to detect spam and adult content comments and mark them as spam. Additionally it allows you to filter negative comments and keep them pending for approval.
* Version: 1.0
* Author: Vasilis Vryniotis
* Author URI: https://www.datumbox.com
* License: GPL2
*/

if (!function_exists('add_action')) {
    die(); //block direct web requests
}
require_once(dirname( __FILE__ ).'/DatumboxAPI.php'); //require the DatumboxAPI client to easily call Datumbox API

if (is_admin()) { //if admin include the admin specific functions
    require_once(dirname( __FILE__ ).'/options.php');
}

function machinelearningantispam_get_key() {
    return get_option('datumbox_api_key'); //return the api key of datumbox
}

function machinelearningantispam_call_datumbox($commentText,$type_of_check) {
    $apiKey=machinelearningantispam_get_key(); //fetch the API key
    if($apiKey==false || $apiKey=='') {
        return true; //don't block the comment if the plugin is not well configured
    }
    
    $DatumboxAPI = new DatumboxAPI($apiKey); //initialize DatumboxAPI Client
    
    if($type_of_check=='spam') {
        $response=$DatumboxAPI->SpamDetection($commentText); //Call Spam Detection service
        
        if($response=='spam') { //if spam return false
            return false;
        }
    }
    else if($type_of_check=='adult') {
        $response=$DatumboxAPI->AdultContentDetection($commentText); //Call Adult Content Detection service
        
        if($response=='adult') { //if adult return false
            return false;
        }
    }
    else if($type_of_check=='negative') {
        $response=$DatumboxAPI->SentimentAnalysis($commentText); //Call Sentiment Analysis service
        
        if($response=='negative') { //if negative return false
            return false;
        }
    }
    
    unset($DatumboxAPI);
    
    return true;
}

function machinelearningantispam_check_comment($commentdata) {
    
    if(get_option('machinelearningantispam_filterspam') && machinelearningantispam_call_datumbox($commentdata['comment_content'],'spam')==false) {
        //if Spam filtering is on and the Datumbox Service considers it spam then mark it as spam
        add_filter('pre_comment_approved', 'machinelearningantispam_result_spam');
    }
    else if(get_option('machinelearningantispam_filteradult') && machinelearningantispam_call_datumbox($commentdata['comment_content'],'adult')==false) {
        //if Adult filtering is on and the Datumbox Service considers it adult then mark it as spam
        add_filter('pre_comment_approved', 'machinelearningantispam_result_spam');
    }
    else if(get_option('machinelearningantispam_filternegative') && machinelearningantispam_call_datumbox($commentdata['comment_content'],'negative')==false) {
        //if Negative filtering is on and the Datumbox Service considers it negative then mark it as pending
        add_filter('pre_comment_approved', 'machinelearningantispam_result_pending');
    }
    
    return $commentdata;
}

function machinelearningantispam_result_spam() {
    return 'spam';
}

function machinelearningantispam_result_pending() {
    return 0;
}

add_filter( 'preprocess_comment' , 'machinelearningantispam_check_comment' ); //preprocess_comment is a filter hook

?>

As we can see above, the 2 main functions of the plugin are machinelearningantispam_call_datumbox() and machinelearningantispam_check_comment(). The first uses the Datumbox PHP API client to call the API functions and returns false when the comment should be blocked. The second checks which filters (spam, adult, negative) are enabled in the settings and calls the API for each enabled one in turn, stopping at the first check that flags the comment. If the API flags the comment as inappropriate, we set its status to spam or pending.
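
The decision logic can also be written as a single function with no WordPress calls, which makes it easy to try out with a stub classifier. The sketch below is an illustration only: the function name machinelearningantispam_decide_status() and the stub are hypothetical, and the classifier follows the same convention as machinelearningantispam_call_datumbox() (true means the comment passes, false means it should be blocked).

```php
<?php
// Hypothetical helper mirroring the if / else-if chain of
// machinelearningantispam_check_comment(), without any WordPress calls.
// $enabled:  array such as array('spam' => true, 'negative' => true)
// $classify: callable($text, $type) returning false when the text is flagged
// Returns 'spam', 0 (pending) or null (leave the comment status untouched).
function machinelearningantispam_decide_status($text, array $enabled, $classify) {
    if (!empty($enabled['spam']) && $classify($text, 'spam') === false) {
        return 'spam';
    }
    if (!empty($enabled['adult']) && $classify($text, 'adult') === false) {
        return 'spam'; // adult comments are also marked as spam
    }
    if (!empty($enabled['negative']) && $classify($text, 'negative') === false) {
        return 0; // 0 keeps the comment pending for approval
    }
    return null;
}

// Stub classifier for the demo: it flags only the 'negative' check.
$stub = function ($text, $type) {
    return $type !== 'negative';
};

$status = machinelearningantispam_decide_status(
    'I hate this post',
    array('spam' => true, 'negative' => true),
    $stub
);
var_dump($status); // int(0)
```

Because the checks run in order and stop at the first hit, a comment that is both spam and negative ends up as spam rather than pending, and a disabled filter never triggers an API call.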

That’s it! You now have a plugin which is able to fight spam with the power of Machine Learning!

About 

My name is Vasilis Vryniotis. I'm a Machine Learning Engineer and a Data Scientist.

