How to install and use the Datumbox Machine Learning Framework

In this guide we are going to discuss how to install and use the Datumbox Machine Learning framework in your Java projects. Since almost all of the code is written in Java, using it is as simple as including it as a dependency in your Java project. Nevertheless, a couple of classes (DataEnvelopmentAnalysis and LPSolver) use […]

New open-source Machine Learning Framework written in Java

I am happy to announce that the Datumbox Machine Learning Framework is now open-sourced under GPL 3.0 and you can download its code from GitHub! What is this Framework? The Datumbox Machine Learning Framework is an open-source framework written in Java which enables the rapid development of Machine Learning models and Statistical applications. It […]

Clustering with Dirichlet Process Mixture Model in Java

In the previous articles we discussed in detail the Dirichlet Process Mixture Models and how they can be used in cluster analysis. In this article we will present a Java implementation of two different DPMM models: the Dirichlet Multivariate Normal Mixture Model which can be used to cluster Gaussian data and the Dirichlet-Multinomial Mixture Model […]

Clustering documents and gaussian data with Dirichlet Process Mixture Models

This article is the fifth part of the tutorial on Clustering with DPMM. In the previous posts we covered in detail the theoretical background of the method and we described its mathematical representations and ways to construct it. In this post we will try to link theory with practice by introducing two models […]

The Dirichlet Process Mixture Model

This blog post is the fourth part of the series on Clustering with Dirichlet Process Mixture Models. In previous articles we discussed the Finite Dirichlet Mixture Models and we took the limit of their model for infinite k clusters which led us to the introduction of Dirichlet Processes. As we saw, our target is to […]

The Dirichlet Process, the Chinese Restaurant Process and other representations

This article is the third part of the series on Clustering with Dirichlet Process Mixture Models. Last time we defined the Finite Mixture Model based on the Dirichlet Distribution and we posed questions on how we can make this particular model infinite. We briefly discussed the idea of taking the limit of the model when […]

Finite Mixture Model based on Dirichlet Distribution

This blog post is the second part of an article series on Dirichlet Process mixture models. In the previous article we had an overview of several Cluster Analysis techniques and we discussed some of the problems and limitations that arise from using them. Moreover, we briefly presented the Dirichlet Process Mixture Models, we talked about why they […]

Overview of Cluster Analysis and Dirichlet Process Mixture Models

In the ISO research project for my MSc in Machine Learning at Imperial College London, I focused on the problem of Cluster Analysis by using Dirichlet Process Mixture Models. DPMMs are a “fully-Bayesian” unsupervised learning technique which, unlike other Cluster Analysis methods, does not require us to predefine the total number of clusters within […]

Using Artificial Intelligence to solve the 2048 Game (JAVA code)

By now most of you have heard of or played the 2048 game by Gabriele Cirulli. It’s a simple but highly addictive board game which requires you to combine the numbers of the cells in order to reach the number 2048. As expected, the difficulty of the game increases as more cells are filled with high values. Personally […]

Measuring the Social Media Popularity of Pages with DEA in JAVA

In the previous article we discussed the Data Envelopment Analysis technique and we saw how it can be used as an effective non-parametric ranking algorithm. In this blog post we will develop an implementation of Data Envelopment Analysis in JAVA and we will use it to evaluate the Social Media Popularity of […]

Data Envelopment Analysis Tutorial

Data Envelopment Analysis, also known as DEA, is a non-parametric method for performing frontier analysis. It uses linear programming to estimate the efficiency of multiple decision-making units and it is commonly used in production, management and economics. The technique was first proposed by Charnes, Cooper and Rhodes in 1978 and since then it has become a […]

How to build your own Facebook Sentiment Analysis Tool

In this article we will discuss how you can easily build a simple Facebook Sentiment Analysis tool capable of classifying public posts (both from users and from pages) as positive, negative or neutral. We are going to use Facebook’s Graph API Search and the Datumbox API 1.0v. Similar to the Twitter Sentiment Analysis tool that […]

Developing a Naive Bayes Text Classifier in JAVA

In previous articles we discussed the theoretical background of the Naive Bayes Text Classifier and the importance of using Feature Selection techniques in Text Classification. In this article, we are going to put everything together and build a simple implementation of the Naive Bayes text classification algorithm in JAVA. The code of the classifier is […]

Using Feature Selection Methods in Text Classification

In text classification, feature selection is the process of selecting a specific subset of the terms of the training set and using only them in the classification algorithm. The feature selection process takes place before the training of the classifier. Update: The Datumbox Machine Learning Framework is now open-source and free to download. Check […]

Permanent bans to duplicate account owners

It came to our attention that more and more users try to create multiple accounts on the Datumbox service in order to generate additional API calls per day, despite the fact that it is strictly forbidden by our Terms of Use. Even though creating multiple accounts will not give additional calls to the service, this […]

Using Datumbox API with Ruby & Node.js and other featured Projects

Since the introduction of the Datumbox service, several software engineers and researchers have used our API in order to develop innovative new applications. In the past we have featured many developers who chose to open-source their projects and contribute their code to the community. In this article I am happy to feature the newest and most interesting projects […]

Machine Learning Tutorial: The Multinomial Logistic Regression (Softmax Regression)

In the previous two machine learning tutorials, we examined the Naive Bayes and the Max Entropy classifiers. In this tutorial we will discuss the Multinomial Logistic Regression also known as Softmax Regression. Implementing Multinomial Logistic Regression in a conventional programming language such as C++, PHP or JAVA can be fairly straightforward despite the fact that […]

Machine Learning Tutorial: The Max Entropy Text Classifier

In this tutorial we will discuss the Maximum Entropy text classifier, also known as the MaxEnt classifier. The Max Entropy classifier is a discriminative classifier commonly used in Natural Language Processing, Speech and Information Retrieval problems. Implementing Max Entropy in a standard programming language such as JAVA, C++ or PHP is non-trivial primarily due to the […]

Tuning the learning rate in Gradient Descent

In most Supervised Machine Learning problems we need to define a model and estimate its parameters based on a training dataset. A popular and easy-to-use technique to calculate those parameters is to minimize the model’s error with Gradient Descent. Gradient Descent estimates the weights of the model over many iterations by minimizing a cost function […]

Coding Brain Neurons by using Hodgkin-Huxley model

Understanding how the human brain works is a topic of active research and several scientists from various fields publish numerous papers every year. Why is it important? Because knowing how our brain works will enable us to understand how we operate and think, and perhaps enable us to build truly intelligent machines in the future. The first […]

Machine Learning Tutorial: The Naive Bayes Text Classifier

In this tutorial we will discuss the Naive Bayes text classifier. Naive Bayes is one of the simplest classifiers that one can use because of the simple mathematics involved and because it is easy to code in every standard programming language including PHP, C#, JAVA etc. Update: The Datumbox […]

Using Datumbox API with Python and R languages

The Datumbox API can be used from any modern programming language that can make web requests. Our Machine Learning API can easily be integrated within minutes because it uses REST and JSON technologies and because all the requests are authenticated simply by passing your API Key. To test the API all you need […]

The importance of Neutral Class in Sentiment Analysis

Sentiment Analysis (detecting a document’s polarity, subjectivity and emotional states) is a difficult problem, and several times I have bumped into unexpected and interesting results. One of the strangest things that I found is that, despite the fact that the neutral class can improve classification accuracy under specific conditions, it is often ignored by most researchers. During […]

How to build an Intelligent Antispam WordPress Plugin

In this article we will see how we can build a WordPress plugin which uses Machine Learning to block spam, adult or even negative comments from our blog. The plugin is compatible with WordPress 3.6v or higher and uses Datumbox API 1.0v. Even though this article discusses the development of a WordPress plugin, we should […]

10 Tips for Sentiment Analysis projects

In my Thesis project for the MSc in Statistics I focused on the problem of Sentiment Analysis. Sentiment Analysis is an application of Natural Language Processing which targets the identification of the sentiment (positive vs negative vs neutral), the subjectivity (objective vs subjective) and the emotional states of the document. I worked on […]

How to build your own Twitter Sentiment Analysis Tool

In this article we will show how you can build a simple Sentiment Analysis tool which classifies tweets as positive, negative or neutral by using the Twitter REST API 1.1v and the Datumbox API 1.0v. Even though the examples will be given in PHP, you can very easily build your own tools in the computer […]

What is Machine Learning?

Machine Learning is a fascinating era. It is when Computer Science joins forces with Statistical Science and magical things pop up. Why is that? Because by applying knowledge from both fields you are able to analyze a large amount of information, detect patterns, predict future outcomes and extract knowledge. If I am to give a […]

Extending CakePHP’s CacheHelper to use Cache Engines

CakePHP is an MVC PHP framework which can make your life easier and your development several times faster. Despite the fact that it is considered a relatively slow framework, it comes with a large number of Cache Engines (FileCache, ApcCache, Wincache, XcacheEngine, MemcacheEngine and RedisEngine) which can help you improve the speed of your website or PHP […]
