8 Enterprise Class Open Source Tools for Data and Text Mining

Here are some open source, enterprise-level data and text mining tools for data pre-processing, classification, regression, clustering, association rules, and visualization.

Rajkumar Maurya

15 Oct 2015 06:29 IST




1. RapidMiner/RapidAnalytics

RapidMiner offers a powerful, easy-to-use and intuitive graphical user interface for designing analytic processes. It claims to be "the world-leading open-source system for data and text mining." RapidAnalytics is the server version of the product.


2. Mahout


This Apache project offers algorithms for clustering, classification and batch-based collaborative filtering that run on top of Hadoop. The project's goal is to build scalable machine learning libraries. Mahout's algorithms include many new implementations built for speed on Mahout-Samsara.
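For readers new to collaborative filtering, the idea can be sketched in a few lines of plain Python. This is not Mahout's API and it does not run on Hadoop; the function names, the toy ratings and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two items, each given as a {user: rating} dict."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by a similarity-weighted average
    of the ratings the user gave to other items (item-based filtering)."""
    # Invert {user: {item: rating}} into {item: {user: rating}}.
    by_item = {}
    for u, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[u] = r

    seen = ratings[user]
    scores = {}
    for candidate in by_item:
        if candidate in seen:
            continue
        num = den = 0.0
        for item, r in seen.items():
            sim = cosine_similarity(by_item[candidate], by_item[item])
            num += sim * r
            den += sim
        if den > 0:
            scores[candidate] = num / den
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical toy data: three users rating three items on a 1-5 scale.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 5, "C": 4},
    "carol": {"A": 1, "C": 5},
}
print(recommend(ratings, "alice"))
```

A library such as Mahout exists precisely because this kind of pairwise comparison becomes expensive once the ratings no longer fit on one machine.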


3. Orange


Orange is an open source data visualization and data analysis tool for novices and experts alike. It supports interactive workflows and offers a wide variety of visualizations, plus a toolbox of more than 100 widgets.


4. Weka


Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.


5. jHepWork


jHepWork is an environment for scientific computation, data analysis and data visualization for scientists, engineers, and students. The program is fully multiplatform (100% Java) and integrated with the Jython (Python) scripting language. The project has since moved to ScaVis.


6. KEEL


KEEL stands for "Knowledge Extraction based on Evolutionary Learning," and it aims to help users assess evolutionary algorithms for data mining problems like regression, classification, clustering and pattern mining. It includes a large collection of existing algorithms against which new algorithms can be compared.


7. SPMF


SPMF is a Java-based data mining framework that focuses primarily on sequential pattern mining but now also includes tools for association rule mining, sequential rule mining and frequent itemset mining. It offers implementations of 109 data mining algorithms.
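To make terms like frequent itemset and association rule concrete, here is a minimal Python sketch on a toy shopping-basket dataset. It does not use SPMF, which is a Java library; the function names, the transactions and the thresholds are illustrative assumptions, and the brute-force enumeration stands in for the optimized algorithms a real framework provides.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset that appears in at least
    min_support (a fraction) of the transactions. Brute force, toy data only."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                result[cset] = support
                found_any = True
        if not found_any:
            # No frequent itemset of this size means none of a larger size.
            break
    return result

def association_rules(itemsets, min_confidence):
    """Return (antecedent, consequent, confidence) tuples, where
    confidence = support(antecedent and consequent) / support(antecedent)."""
    rules = []
    for itemset, support in itemsets.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                confidence = support / itemsets[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules

# Hypothetical toy baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

itemsets = frequent_itemsets(transactions, min_support=0.5)
for antecedent, consequent, confidence in association_rules(itemsets, min_confidence=0.9):
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

With these baskets, the only rule that clears the 0.9 confidence threshold is that baskets containing butter also contain bread.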


8. Rattle

Rattle presents statistical and visual summaries of data, transforms data so that it can be readily modelled, builds both unsupervised and supervised models from the data, presents the performance of models graphically, and scores new datasets.


