Datumbox Machine Learning Framework v0.8.1 released

Datumbox v0.8.1 has been released! Download it now from GitHub or the Maven Central Repository.

What is new?

The main focus of version 0.8.1 is to resolve various bugs, update the dependencies and improve the code architecture of the framework. Here are the details:

  • Dependencies:
    • Updated the Maven Compiler, Nexus Staging and Surefire plugins and the SLF4J and Logback Classic libraries to the latest stable versions.
  • Code Improvements & Bug Fixes:
    • FlatDataCollection:
      • The copyCollection2DoubleArray() and copyCollection2Array() methods have been removed.
      • It now implements the Collection interface instead of Iterable.
    • Descriptives:
      • New count() method returns the number of non-null elements.
      • All methods can now handle null values. Null values are treated as missing and are excluded from the calculations.
    • TextClassifier:
      • The pipeline steps of the Text Classifier have changed to Feature Selection, Numerical Standardization and Modeling.
      • The Categorical Encoding step is no longer executed as the Text Extractor already encodes the words as numeric values.
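
To illustrate the null-handling rule described above for Descriptives, here is a minimal sketch in plain Java. The NullAwareStats class and its count() and mean() methods are hypothetical names written for illustration only; they are not the framework's actual code or signatures. The sketch assumes that "missing" simply means the value is skipped in both the sum and the element count.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper for illustration; NOT part of the Datumbox API.
class NullAwareStats {

    // Returns the number of non-null elements.
    static int count(Collection<Double> values) {
        int n = 0;
        for (Double v : values) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    // Returns the arithmetic mean of the non-null elements.
    // Nulls are treated as missing: they add nothing to the sum
    // and are not counted in the denominator.
    static double mean(Collection<Double> values) {
        double sum = 0.0;
        int n = 0;
        for (Double v : values) {
            if (v != null) {
                sum += v;
                n++;
            }
        }
        if (n == 0) {
            throw new IllegalArgumentException("No non-null values.");
        }
        return sum / n;
    }

    public static void main(String[] args) {
        Collection<Double> data = Arrays.asList(2.0, null, 4.0, null, 6.0);
        // prints "count=3 mean=4.0"
        System.out.println("count=" + count(data) + " mean=" + mean(data));
    }
}
```

Note that the mean is computed over three values rather than five, which is the practical difference between ignoring nulls and treating them as zeros.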

Acknowledgements

Many thanks to Jose Luis for his help in detecting and reproducing some of the patched bugs. As always, many thanks to my friend and colleague Eleftherios Bampaletakis for his invaluable feedback.

Don’t forget to clone the code of Datumbox Framework v0.8.1 from GitHub, check out the Code Examples and download the pre-trained Machine Learning models from Datumbox Zoo. I am looking forward to your comments and suggestions.

About 

My name is Vasilis Vryniotis. I'm a Data Scientist, a Software Engineer, the author of the Datumbox Machine Learning Framework and a proud geek.

Latest Comments
  1. Timothy

    Hey,

    Is there a way to update the model while our app is being used by users? That way, I think we can improve the models together.

    Thanks,
    Timothy

    • Vasilis Vryniotis

      Unfortunately, the framework does not currently have an algorithm for Online Learning. Typically, such models in production environments are updated in a mini-batch fashion, so you never actually update your model on the spot with every observation. Datumbox does not stop you from developing or extending an algorithm that does Online Learning. It’s just that it does not include one out of the box.
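
The mini-batch idea mentioned in the reply can be sketched in plain Java as follows. This is only an illustration under stated assumptions: MiniBatchBuffer is a hypothetical class, not something Datumbox provides, and the retrain callback stands in for whatever retraining routine an application would supply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper for illustration; NOT part of Datumbox.
// It collects observations and hands them to a retraining callback
// only once a full batch is available.
class MiniBatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> retrain; // assumed user-supplied routine
    private final List<T> pending = new ArrayList<>();

    MiniBatchBuffer(int batchSize, Consumer<List<T>> retrain) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.retrain = retrain;
    }

    // Queues one observation; triggers the callback when the batch is full.
    synchronized void add(T observation) {
        pending.add(observation);
        if (pending.size() >= batchSize) {
            List<T> batch = new ArrayList<>(pending);
            pending.clear();
            retrain.accept(batch);
        }
    }

    // Number of observations waiting for the next batch.
    synchronized int pendingCount() {
        return pending.size();
    }
}
```

With a batch size of 3, for example, seven incoming observations would trigger the callback twice and leave one observation pending until the next batch fills up.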

