Drilling into Spark’s ALS Recommendation algorithm

The ALS algorithm, introduced by Hu et al., is a very popular technique used in Recommender System problems, especially when we have implicit datasets (for example clicks, likes, etc.). It can handle large volumes of data reasonably well and we can find many good implementations in various Machine Learning frameworks. Spark includes the algorithm in […]

Getting the GPU usage of NVIDIA cards with the Linux dstat tool

dstat is an awesome little tool which allows you to get resource statistics for your Linux box. It has a modular architecture which allows you to develop additional plugins, and it’s easy to use. Recently I was profiling a Deep Learning pipeline developed with Keras and TensorFlow and I needed detailed statistics about the […]

Datumbox Machine Learning Framework version 0.8.0 released

Datumbox Framework v0.8.0 is out and packs several powerful features! This version brings new Preprocessing, Feature Selection and Model Selection algorithms, new powerful Storage Engines that give better control over how the Models and the Dataframes are saved/loaded, several pre-trained Machine Learning models and lots of memory & speed improvements. Download it now from GitHub […]

Datumbox Machine Learning Framework 0.7.0 Released

I am really excited to announce that, after several months of development, the new version of Datumbox is out! The 0.7.0 version brings multi-threading support, fast disk-based training for datasets that don’t fit in memory, several algorithmic enhancements and better architecture. Download it now from Github or Maven Central Repository. What is new? The focus […]

Datumbox Machine Learning Framework 0.6.0 Released

The new version of Datumbox Machine Learning Framework has been released! Download it now from Github or Maven Central Repository. What is new? The main focus of version 0.6.0 is to extend the Framework to handle Large Data, improve the code architecture and the public APIs, simplify data parsing, enhance the documentation and move to […]

How to install and use the Datumbox Machine Learning Framework

In this guide we are going to discuss how to install and use the Datumbox Machine Learning framework in your Java projects. Since almost all of the code is written in Java, using it is as simple as including it as a dependency in your Java project. Nevertheless, a couple of classes (DataEnvelopmentAnalysis and LPSolver) use […]

New open-source Machine Learning Framework written in Java

I am happy to announce that the Datumbox Machine Learning Framework is now open sourced under GPL 3.0 and you can download its code from Github! What is this Framework? The Datumbox Machine Learning Framework is an open-source framework written in Java which enables the rapid development of Machine Learning models and Statistical applications. It […]

Clustering with Dirichlet Process Mixture Model in Java

In the previous articles we discussed in detail the Dirichlet Process Mixture Models and how they can be used in cluster analysis. In this article we will present a Java implementation of two different DPMM models: the Dirichlet Multivariate Normal Mixture Model which can be used to cluster Gaussian data and the Dirichlet-Multinomial Mixture Model […]

Using Artificial Intelligence to solve the 2048 Game (JAVA code)

By now most of you have heard of or played the 2048 game by Gabriele Cirulli. It’s a simple but highly addictive board game which requires you to combine the numbers of the cells in order to reach the number 2048. As expected, the difficulty of the game increases as more cells are filled with high values. Personally […]

Measuring the Social Media Popularity of Pages with DEA in JAVA

In the previous article we discussed the Data Envelopment Analysis technique and saw how it can be used as an effective non-parametric ranking algorithm. In this blog post we will develop an implementation of Data Envelopment Analysis in Java and use it to evaluate the Social Media Popularity of […]

How to build your own Facebook Sentiment Analysis Tool

In this article we will discuss how you can easily build a simple Facebook Sentiment Analysis tool capable of classifying public posts (both from users and from pages) as positive, negative or neutral. We are going to use Facebook’s Graph API Search and the Datumbox API 1.0v. Similar to the Twitter Sentiment Analysis tool that […]

Developing a Naive Bayes Text Classifier in JAVA

In previous articles we discussed the theoretical background of the Naive Bayes Text Classifier and the importance of using Feature Selection techniques in Text Classification. In this article, we are going to put everything together and build a simple implementation of the Naive Bayes text classification algorithm in Java. The code of the classifier is […]

Coding Brain Neurons by using Hodgkin-Huxley model

Understanding how the human brain works is a topic of active research, and scientists from various fields publish numerous papers on it every year. Why is it important? Because knowing how our brain works will enable us to understand how we operate and think, and perhaps enable us to build truly intelligent machines in the future. The first […]

How to build an Intelligent Antispam WordPress Plugin

In this article we will see how we can build a WordPress plugin which uses Machine Learning to block spam, adult or even negative comments from our blog. The plugin is compatible with WordPress 3.6v or higher and uses Datumbox API 1.0v. Even though this article discusses the development of a WordPress plugin, we should […]

How to build your own Twitter Sentiment Analysis Tool

In this article we will show how you can build a simple Sentiment Analysis tool which classifies tweets as positive, negative or neutral by using the Twitter REST API 1.1v and the Datumbox API 1.0v. Even though the examples will be given in PHP, you can very easily build your own tools in the computer […]

Extending CakePHP’s CacheHelper to use Cache Engines

CakePHP is an MVC PHP framework which can make your life easier and your development several times faster. Despite the fact that it is considered a relatively slow framework, it comes with a large number of Cache Engines (FileCache, ApcCache, Wincache, XcacheEngine, MemcacheEngine and RedisEngine) which can help you improve the speed of your website or PHP […]
