Advertisment

How to Choose An Open Source Database

author-image
PCQ Bureau
New Update


Advertisment

Advertisment

Sufyan bin Uzayr, Freelance Writer, Graphic Artist, Photographer, www.sufyan.co.nr

If you are an application developer or a database administrator, you already know the importance of Relational Database Management Systems. RDMSs store data in a reliable and flexible manner and facilitate its easy retrieval - and of course, they can be manipulated using Structured Query Language (SQL).

However, when it comes to database software, the choices are many - PostgreSQL and MySQL are the two most popular open source databases, but they're not the only choices. There are innumerable others to choose from as well. While having choices does indeed feel good, deciding which one to go for can be a difficult task amidst such 'problem of plenty'.

Advertisment

Snapshot

Price: Database admins

Applies to: Free

USP: Deciding among multiple database solutions

Related articles: Roadblocks to Open Source Adoption - http://ld2.in/41n


Search engine keywords: database, mysql, postgresql, nosql, rdbms

Further more, choosing a database is a decision with long-lasting effects - you need to get it right the first time itself, because once you decide on a database, going back and hopping on to another will not be an easy task.

Without further ado, we shall now take a look at the factors that need to be borne in mind before and during choosing an open source database solution.

Advertisment



Things to consider before choosing a database

"A database is a database is a database" - that is one notion you shouldn't have in mind while deciding a database. Each database has its own share of merits and demerits and neither can fit in the other's role. Sure, at the end of the day they all do the same thing - store and retrieve data. However, the mode of operation for each database solution differs from the other, at least in minute terms. For instance, all vehicles do the same task - run on roads (or elsewhere, depends). However, you won't buy a Ferrari to transport your commercial luggage, will you? Similarly, databases too come with different architecture and models, each with its own share of advantages and disadvantages.



Think of the present, keep an eye on the future

Advertisment

This is by far the most important criterion that one needs to consider while choosing a database. To keep it short, you need to consider your present requirements - security, scalability, ease of use, robustness, etc. However, apart from the present requirements, you should also think of the requirements that might arise in the future as your organization grows. It is always a good idea to select a database that has a good scope for flexibility so that as your organization progresses, your database solution is ready to meet the new challenges.



Mind the budget

While this point clearly seems obvious, a reminder doesn't hurt: while choosing an open source database solution, you need to bear your budget in mind. Most open source databases are free (or almost free), however, finding the staff and technical expertise required to handle each might vary in terms of costs. This, along with cost of maintenance, needs to be considered before opting for an open source database.

Advertisment



Things to consider while choosing a database

Simply put, a database needs to store and retrieve data. In the world of databases, data integrity is expressed under the following heads (handy acronym - ACID):

  • Atomicity: Each query should complete, with little scope for incomplete states
  • Consistency: The database should remain in a legal state after a transaction
  • Isolation: Each transaction should occur independently of the other
  • Durability: Transactions should persist over time (perhaps be stored on a filesystem).
Advertisment



Features and goodies

Most databases come with a huge set of features and choosing the ideal one that meets your needs can be confusing. Further more, the feature set of present day databases is so vast that more often than not, most features remain under-utilized. From an SME's point of view, features such as scalability, extensions and ease of use are indispensable.



Licensing and support

At this stage, you also need to be wary of the fine print. Most of the popular databases used today are open source, and they offer an equally wonderful service as their proprietary counterparts.

However, support tends to vary at times and this is where the 'community' element comes in handy. For instance, if you decide to go for PostgreSQL, and ever run into any problem, you can be sure to find help among the thousands of articles on the Internet, apart from PostgreSQL itself. In short, you're in safe hands! This applies with most of the well known databases. The level of support at hand is directly proportional to the popularity of the database.



To SQL or not to SQL, that is the question!

Before we move on to check the main options that are available for us, it is worth while to mention NoSQL. We all are aware of the pros and cons of SQL databases, and in all likelihood, majority of us must be using the same. NoSQL, on the other hand, is an alternative database technology that differs from traditional RDBMS models in the fact that it does not essentially require fixed table schemas. Usage of NoSQL in web development and gaming/networking is growing as we speak.

Most enterprises use traditional RDBMSs for storing data, though NoSQL databases show a better tenacity of handling large chunks of data and thus, the chances of migration are high. However, on the downside, NoSQL databases generally suffer from poor co-existence model with SQL options, non-adherence to ACID, etc. As a result, while NoSQL seems to be blessing for SMEs looking to manage large amounts of data, the trade-off is in terms of ACID.



'Fork' it?

Of late, there has been a flood of opinion pieces across the internet and print tech media about forks of popular database solutions being the ideal option for enterprises. The justification for the same is the fact that such forks of major databases offer additional features such as alternative storage engines and plugins that can further help enterprises get the maximum out of their database deployment. Does this hold true?

Well, for large enterprises, yes it does! However, from an SME's perspective, opting for such forks simply means choosing a rather lesser-known database. This in turn results in additional headaches in terms of less documentation or support and deployment options.

More importantly, such forks generally come with a lot of additional features - many of which are not required for an SME. For example, MariaDB is a fork of MySQL with drop-in replacement functionality. It includes alternate storage engines (such as XtraDB and PBXT), server optimizations and patches. Now, in all likelihood, a small- to medium-sized enterprise is not going to require XtraDB or PBXT and InnoDB will just fit the bill. As a result, opting for MariaDB here hardly makes sense.



And... a new approach!

So thus far, we've seen that neither forked projects, nor NoSQL offer a totalitarian alternative to SQL. As a result, several alternative databases have sprung up, collectively clubbed as NewSQL or ScalableSQL.

To begin with, the 'new' in NewSQL refers to vendors, not SQL. NewSQL overall refers to a set of high performance scalable databases. All such databases have SQL at thei backbone, full ACID-compliance, concurrency control coupled with scaled-out architecture resulting in higher per-node performance. NewSQL databases fall under multiple categories, the chief among the databases being Drizzle and NuoDB.



MySQL

MySQL claims to be "the world's most popular open source database". It forms part of LAMP, the reputed acronym in web development, along with Linux, Apache and PHP. Clearly, it is the backbone of some of the web's major entities such as WordPress, Drupal and phpBB. If that isn't good enough, MySQL powers the likes of Wikipedia, Facebook, Twitter and Slashdot. Therefore, if you opt for MySQL, you're surely going to keep good company!

In its initial stages, MySQL was designed to be a web server back-end with no support for the ACID features mentioned above. However, of late, it has shifted to full compliance with ACID in the form of InnoDB.

MySQL ownership has also seen its fair share of hiccups. It was acquired by Sun Microsystems in 2008, which was in turn acquired by Oracle. From an SME's perspective, Oracle's part means that the license becomes extra complicated - MySQL is available in several versions - some free, some paid. The core code of MySQL is GPL, though commercial licenses are also available.

Perhaps the main forte of MySQL lies in its speedy operation. MyISAM storage engine is known for its no-nonsense and fast operation for read-only databases. Thus, MySQL can be summed up as the web storage back-end that is ideal for performing fast reads and multiple operations. Do bear in mind that for sensitive data, your first choice of engine should be InnoDB.



PostgreSQL

PostgreSQL is another very popular database solution that, in contrast to MySQL, claims to be "the world's most advanced open source database". It is a community driven open source project licensed under a liberal BSD/MIT variant. However, unlike most other community driven projects, PostgreSQL offers only one version instead of multiple ones.

PostgreSQL is renowned to be a rock-solid, well engineered and reliable database. It is backed by extensive documentation, full ACID-compliance and, if need be, commercial support is provided via independent vendors. PostgreSQL serves the likes of Reddit, Skype and Disqus.

In stark contrast to MySQL, PostgreSQL is an integrated database server with a single storage engine. It comes with several methods to manage high availability, load balancing and replication.



CUBRID

CUBRID is an open source RDBMS optimized for web applications. It can handle large chunks of data and generate concurrent requests. It is coded in C.

For an enterprise, CUBRID comes loaded with high availability, sync/async/semi-sync replication, online and/or incremental backup facility, and optimization features that save cost overheads in terms of deployment. If your enterprise requires multiple concurrent users and availability is a concern, CUBRID should be a worthy consideration. Further more, CUBRID is also preferred by Defence departments and intelligence services of many nations (such as South Korea), and this adds to its secure credentials, thereby making it ideal for enterprises to store sensitive data such as customer credentials.



Other contenders worth looking at

Let us now take a look at some other noteworthy options in terms of database solutions.



Apache Derby

Apache Derby is an RDBMS option implemented in JAVA and available under the Apache License. If you need JAVA or JDBC, Apache Derby should be considered. Its prime advantage is the super-small size - mere 2.6 MB for the base engine and the embedded JDBC driver.



LucidDB

LucidDB caters especially to data warehousing and business intelligence. It supports column store, bitmap indexing, hash/join aggregation and page level multiversioning. It is highly scalable and you can implement it using a single Windows/Linux server. However, since it caters specifically to data warehousing and business intelligence, most SMEs will either find it not fitting the overall picture or, even worse, too narrow to suit their interests.



SQLite

SQLite is a software library with a transactional SQL database engine. For an enterprise, opting for SQLite makes sense if data is to be deployed and speedy retrieval is a concern.



SmallSQL

SmallSQL is an open source database engine licensed under LGPL. It is coded entirely in JAVA and that adds to its 'nimble' factor. On the downside, it lacks a network interface and user management (you cannot share the database between multiple applications at the same time).



Drizzle

Drizzle is a database optimized for the cloud and web. It is totally compatible with ACID, multi-CPU architecture, parallel concurrency, and comes with numerous APIs (some of which are custom coded for SME usage).

Drizzle is open source and easily customizable, though at the most basic level, you should consider it only if you require massive or heavy concurrency.



Ingres Database

Ingres Database is a NoSQL based database system which, though commercially supported, is open source. It is one of the few ACID-compliant NoSQL databases and is fully transactional. If you are considering NoSQL as an option, Ingres Database should be on your list. For all other practical reasons, it offers nothing new for SMEs.

With that, we come to the end of this round-up regarding open source databases. If you have deployed or are using any of these databases in an enterprise environment, feel free to share your thoughts at pcquest@cybermedia.co.in

Advertisment