8 Enterprise Class Open Source Tools for Data and Text Mining

October 15, 2015

Here are some enterprise-class open source tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

1. RapidMiner/RapidAnalytics

RapidMiner provides a powerful, easy-to-use, intuitive graphical user interface for designing analytic processes. RapidMiner claims to be “the world-leading open-source system for data and text mining.” RapidAnalytics is the server version of the product.

2. Mahout

This Apache project offers algorithms for clustering, classification, and batch-based collaborative filtering that run on top of Hadoop. The project’s goal is to build scalable machine learning libraries. Mahout’s newer algorithms include many implementations built for speed on Mahout-Samsara.
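To make “collaborative filtering” concrete, here is a minimal in-memory sketch of item-based collaborative filtering with cosine similarity. It is plain Java and does not use Mahout’s API: the class `ItemSimilarity` and its methods are hypothetical names, the rating matrix is assumed to fit in memory, and a real Mahout job would distribute this work across a cluster instead.

```java
import java.util.*;

/** Hypothetical sketch of item-based collaborative filtering (not Mahout's API). */
final class ItemSimilarity {
    // ratings[u][i] is user u's rating of item i; 0 means "not rated".
    // Cosine similarity between the rating columns of two items.
    static double cosine(double[][] ratings, int itemA, int itemB) {
        double dot = 0, normA = 0, normB = 0;
        for (double[] user : ratings) {
            dot += user[itemA] * user[itemB];
            normA += user[itemA] * user[itemA];
            normB += user[itemB] * user[itemB];
        }
        if (normA == 0 || normB == 0) return 0; // an item nobody rated has no similarity signal
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Predicts a user's rating for a target item as the similarity-weighted
    // average of that user's ratings on the other items they have rated.
    static double predict(double[][] ratings, int user, int target) {
        double num = 0, den = 0;
        for (int j = 0; j < ratings[user].length; j++) {
            if (j == target || ratings[user][j] == 0) continue; // skip the target and unrated items
            double sim = cosine(ratings, target, j);
            num += sim * ratings[user][j];
            den += Math.abs(sim);
        }
        return den == 0 ? 0 : num / den; // 0 when no rated item is similar to the target
    }
}
```

For example, with `ratings = {{5, 5, 0}, {4, 4, 4}}`, `predict(ratings, 0, 2)` estimates how user 0 would rate item 2 from that user’s ratings of items 0 and 1. Computing all item-item similarities is the expensive step, which is why a library like Mahout runs it as a distributed batch job.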

3. Orange

Orange offers open source data visualization and data analysis for novices and experts alike, built around interactive workflows and a large toolbox. It provides a wide variety of visualizations, plus more than 100 widgets.

4. Weka

Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

5. jHepWork

jHepWork is an environment for scientific computation, data analysis, and data visualization aimed at scientists, engineers, and students. The program is fully multiplatform (100% Java) and is integrated with the Jython (Python) scripting language. The project has since moved to ScaVis.

6. KEEL
KEEL stands for “Knowledge Extraction based on Evolutionary Learning,” and it aims to help users assess evolutionary algorithms for data mining problems such as regression, classification, clustering, and pattern mining. It includes a large collection of existing algorithms against which new algorithms can be compared.

7. SPMF
It is a Java-based data mining framework primarily focused on sequential pattern mining, though it now also includes tools for association rule mining, sequential rule mining, and frequent itemset mining. It offers implementations of 109 data mining algorithms.
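As an illustration of frequent itemset mining and association rules, the sketch below implements the classic Apriori approach: count the support of candidate itemsets level by level, join frequent k-itemsets into (k+1)-candidates, and prune any candidate with an infrequent subset. It is a teaching sketch in plain Java (9 or later, for `Set.of`), not this framework’s API; `AprioriSketch`, `frequentItemsets`, and `confidence` are hypothetical names, and support is assumed to be an absolute transaction count.

```java
import java.util.*;

/** Hypothetical Apriori-style sketch (not the framework's own API). */
final class AprioriSketch {
    // Returns every itemset contained in at least minSupport transactions, mapped to its support count.
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> transactions, int minSupport) {
        Map<Set<String>, Integer> result = new HashMap<>();
        // Level-1 candidates: every single item that appears in some transaction.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> t : transactions)
            for (String item : t) candidates.add(Set.of(item));
        while (!candidates.isEmpty()) {
            // Count the support of each candidate and keep the frequent ones.
            Set<Set<String>> frequent = new HashSet<>();
            for (Set<String> c : candidates) {
                int count = 0;
                for (Set<String> t : transactions) if (t.containsAll(c)) count++;
                if (count >= minSupport) { frequent.add(c); result.put(c, count); }
            }
            // Join step: merge pairs of frequent k-itemsets into (k+1)-itemset candidates.
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> a : frequent)
                for (Set<String> b : frequent) {
                    Set<String> union = new HashSet<>(a);
                    union.addAll(b);
                    if (union.size() == a.size() + 1 && allSubsetsFrequent(union, frequent))
                        next.add(union);
                }
            candidates = next;
        }
        return result;
    }

    // Prune step (Apriori property): every k-subset of a (k+1)-candidate must itself be frequent.
    private static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequent) {
        for (String item : candidate) {
            Set<String> subset = new HashSet<>(candidate);
            subset.remove(item);
            if (!frequent.contains(subset)) return false;
        }
        return true;
    }

    // Confidence of the rule antecedent -> consequent: support(antecedent and consequent) / support(antecedent).
    static double confidence(Map<Set<String>, Integer> supports, Set<String> antecedent, Set<String> consequent) {
        Set<String> union = new HashSet<>(antecedent);
        union.addAll(consequent);
        Integer both = supports.get(union), a = supports.get(antecedent);
        if (both == null || a == null) return 0.0; // rule is not backed by frequent itemsets
        return (double) both / a;
    }
}
```

With the four baskets `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, and `{milk}` and a minimum support of 2, the frequent itemsets are the three single items plus `{bread, milk}` and `{bread, butter}`; the rule milk → bread then has confidence 2/3. Production libraries use far more efficient algorithms, but the support and confidence definitions are the same.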

8. Rattle

Rattle, a graphical user interface for data mining in R, presents statistical and visual summaries of data, transforms data into forms that can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.
