15 Open Source Artificial Intelligence Tools

February 13, 2017

One of the hottest areas in technology right now is artificial intelligence (AI). Big companies like IBM, Google, Microsoft, Facebook and Amazon are investing heavily in R&D to take AI to the next level. Even Samsung acquired a start-up last year to roll out its own AI assistant, Bixby. Given the level of interest, here are 15 open source tools for building the next generation of AI algorithms.


Caffe

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is managed by the Berkeley Vision and Learning Center (BVLC), and companies like NVIDIA and Amazon have made grants to support its development. Its expressive architecture encourages application and innovation: models and optimization are defined by configuration without hard-coding, and you can switch between CPU and GPU by setting a single flag, so you can train on a GPU machine and then deploy to commodity clusters or mobile devices.
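
To make "defined by configuration without hard-coding" concrete, here is a minimal plain-Python sketch of the general idea. It does not use Caffe or its configuration format; the layer names (`scale`, `shift`, `relu`), the `LAYER_TYPES` registry and the `build_model` helper are all hypothetical and exist only for illustration.

```python
# Toy illustration of a model described by data instead of code.
# NOT Caffe's API or file format: every name here is a made-up assumption.

def scale(factor):
    # Multiply every input value by a constant factor.
    return lambda values: [v * factor for v in values]


def shift(offset):
    # Add a constant offset to every input value.
    return lambda values: [v + offset for v in values]


def relu():
    # Clamp negative values to zero.
    return lambda values: [max(0.0, v) for v in values]


# Hypothetical registry mapping a layer type name to its factory.
LAYER_TYPES = {"scale": scale, "shift": shift, "relu": relu}


def build_model(config):
    """Turn a list of layer descriptions into one callable."""
    layers = [
        LAYER_TYPES[spec["type"]](**spec.get("params", {}))
        for spec in config["layers"]
    ]

    def forward(values):
        # Apply each configured layer in order.
        for layer in layers:
            values = layer(values)
        return values

    return forward


# The "model" is plain data; changing it requires no code changes.
config = {
    "layers": [
        {"type": "scale", "params": {"factor": 2.0}},
        {"type": "shift", "params": {"offset": -3.0}},
        {"type": "relu"},
    ]
}

model = build_model(config)
output = model([1.0, 2.0, 3.0])
print(output)
```

Swapping or reordering layers only means editing `config`, which is the property the paragraph above attributes to Caffe's configuration-driven design.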

Microsoft Cognitive Toolkit

The Microsoft Cognitive Toolkit, previously known as CNTK, is a deep learning toolkit for harnessing the intelligence within massive datasets. It emphasizes scaling, speed, and accuracy, with commercial-grade quality and compatibility with the programming languages and algorithms you already use. It is designed to perform well whether it is running on a system with only CPUs, a single GPU, multiple GPUs, or multiple machines with multiple GPUs.


Deeplearning4j

Deeplearning4j (DL4J) bills itself as the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Skymind is its commercial support arm.

Distributed Machine Learning Toolkit

Distributed machine learning has become more important than ever in the big data era. In recent years especially, practice has shown that more training data and bigger models tend to yield better accuracy in a variety of applications. Microsoft's Distributed Machine Learning Toolkit (DMTK) consists of three key components: the DMTK framework, the LightLDA topic model algorithm, and the Distributed (Multisense) Word Embedding algorithm.


H2O

H2O was written from scratch in Java and integrates with popular open source products like Apache Hadoop and Spark, giving users the flexibility to solve their most challenging data problems. You can set up and get started quickly using either H2O's web-based Flow graphical user interface or familiar programming environments like R, Python, Java, Scala, and JSON, as well as through its APIs. Models can be visually inspected during training, a feature H2O describes as unique.


Mahout

Mahout is an open source machine learning framework. It offers three major features: a simple and extensible programming environment and framework for building scalable algorithms, premade algorithms for tools like Spark and H2O, and a vector-math experimentation environment called Samsara. Companies using Mahout include Adobe, Accenture, Foursquare, Intel, LinkedIn, Twitter, Yahoo and many others. Professional support is available through third parties listed on the website.


MLlib

MLlib is Apache Spark's machine learning library. It fits into Spark's APIs and interoperates with NumPy in Python and with R libraries (as of Spark 1.5). You can use any Hadoop data source (e.g. HDFS, HBase, or local files), making it easy to plug into Hadoop workflows. It includes a host of machine learning algorithms and utilities for classification, regression, decision trees, recommendation, clustering, topic modeling, feature transformations, model evaluation, ML pipeline construction, ML persistence, survival analysis, frequent itemset and sequential pattern mining, distributed linear algebra, and statistics.


NuPIC

NuPIC is an open source artificial intelligence project based on a theory called Hierarchical Temporal Memory, or HTM. HTM is not a deep learning or conventional machine learning technology. It is a machine intelligence framework based strictly on neuroscience, specifically the physiology and interaction of pyramidal neurons in the neocortex of the mammalian brain. Essentially, HTM is an attempt to create a computer system modeled after the human neocortex. The goal is to create machines that “approach or exceed human level performance for many cognitive tasks.”


OpenNN

OpenNN is an open source class library written in C++ that implements neural networks, a main area of machine learning research. Its main advantage is high performance: it is developed in C++ for better memory management and higher processing speed, and it implements CPU parallelization by means of OpenMP and GPU acceleration with CUDA.


OpenCyc

The OpenCyc Platform is a gateway to the full power of Cyc, which Cycorp describes as the world's largest and most complete general knowledge base and commonsense reasoning engine. OpenCyc contains hundreds of thousands of Cyc terms organized in a carefully designed ontology. Cycorp offers this ontology at no cost and encourages you to use and extend it rather than starting your own from scratch. OpenCyc can be used as the basis of a wide variety of intelligent applications such as rich domain modeling, semantic data integration, text understanding, domain-specific expert systems and game AIs.

Oryx 2

Oryx 2 is a realization of the lambda architecture built on Apache Spark and Apache Kafka, but with specialization for real-time large-scale machine learning. It is a framework for building applications but also includes packaged, end-to-end applications for collaborative filtering, classification, regression, and clustering.
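
For readers unfamiliar with the lambda architecture, the following plain-Python sketch shows its usual three layers: a batch layer that recomputes a model from all historical data, a speed layer that tracks recent events incrementally, and a serving layer that merges the two to answer queries. It does not use Oryx 2, Spark, or Kafka; the class names and the simple event-counting "model" are assumptions chosen only to keep the example small.

```python
from collections import Counter

# Toy lambda architecture. NOT Oryx 2's API: all names are illustrative.


class BatchLayer:
    """Recomputes a full model from all historical events (slow, complete)."""

    def __init__(self):
        self.history = []

    def append(self, events):
        self.history.extend(events)

    def build_model(self):
        # The "model" here is just a count of how often each item occurred.
        return Counter(self.history)


class SpeedLayer:
    """Keeps incremental counts for events seen since the last batch run."""

    def __init__(self):
        self.recent = Counter()

    def update(self, event):
        self.recent[event] += 1

    def reset(self):
        self.recent.clear()


class ServingLayer:
    """Answers queries by merging the batch model with recent updates."""

    def __init__(self):
        self.batch_model = Counter()

    def load(self, model):
        self.batch_model = model

    def count(self, item, speed):
        return self.batch_model[item] + speed.recent[item]


batch, speed, serving = BatchLayer(), SpeedLayer(), ServingLayer()

# Historical events are processed in bulk by the batch layer.
batch.append(["a", "b", "a"])
serving.load(batch.build_model())

# New events reach the speed layer immediately and are also stored
# so that the next batch run will include them.
for event in ["a", "c"]:
    speed.update(event)
    batch.append([event])

before_rebuild = serving.count("a", speed)

# The next batch run absorbs the recent events; the speed layer resets.
serving.load(batch.build_model())
speed.reset()
after_rebuild = serving.count("a", speed)

print(before_rebuild, after_rebuild)
```

The query result is the same before and after the batch rebuild, which is the point of the design: the speed layer covers the gap between batch runs so that answers stay current.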


PredictionIO

Apache PredictionIO (incubating) is an open source machine learning server built on top of a state-of-the-art open source stack. It lets developers and data scientists create predictive engines for any machine learning task and deploy them as web services that respond to dynamic queries in real time.


SystemML

Apache SystemML aims to provide an optimal workplace for machine learning using big data. It can be run on top of Apache Spark, where it automatically scales your data, line by line, determining whether your code should be run on the driver or on an Apache Spark cluster.


TensorFlow

TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.
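
The data flow graph idea can be sketched in a few lines of plain Python. This is not TensorFlow's API: the `Placeholder` and `Op` classes and the `elementwise` helper are hypothetical names, and flat lists stand in for tensors. It only shows nodes acting as operations while values travel along the edges between them.

```python
import operator

# Toy data flow graph. NOT TensorFlow's API: all names are illustrative.


class Placeholder:
    """A graph input whose value is supplied when the graph is evaluated."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class Op:
    """A node that applies an operation to the values on its incoming edges."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs  # upstream nodes, i.e. the incoming edges

    def evaluate(self, feed):
        # Pull values along each incoming edge, then apply the operation.
        args = [node.evaluate(feed) for node in self.inputs]
        return self.fn(*args)


def elementwise(fn):
    """Lift a scalar function to equal-length lists (stand-in 1-D tensors)."""
    return lambda a, b: [fn(x, y) for x, y in zip(a, b)]


# Build the graph for x * w + b. Nothing is computed at this point.
x = Placeholder("x")
w = Placeholder("w")
b = Placeholder("b")
product = Op(elementwise(operator.mul), x, w)
total = Op(elementwise(operator.add), product, b)

# Evaluate the graph by feeding concrete values to the placeholders.
result = total.evaluate({
    "x": [1.0, 2.0, 3.0],
    "w": [0.5, 0.5, 0.5],
    "b": [1.0, 1.0, 1.0],
})
print(result)
```

Separating graph construction from evaluation is what lets a framework decide later where each operation should run, for example on a CPU or a GPU.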


Torch

Torch is a scientific computing framework with wide support for machine learning algorithms that puts GPUs first. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation. Torch comes with a large ecosystem of community-driven packages in machine learning, computer vision, signal processing, parallel processing, image, video, audio, and networking, among others, and builds on top of the Lua community.
