
Performance Tuning a MySQL Database

PCQ Bureau

Your database table seems well-indexed and error-proof, yet a simple query on it takes ages to complete. Or maybe your web app runs smoothly in the dev environment but slows to a crawl in production. If you are a database admin, chances are you have already encountered such situations at some stage or the other. Therefore, in this series of articles, we shall be looking at debugging, myth-busting and handling certain common (and uncommon) MySQL issues. In this first part, we begin with some simple and easily implementable tips and tricks.

Applies To: Database admins

USP: Powerful tips and tricks to optimize your MySQL DB's performance

Related articles: Streamlining queries to MySQL - http://ld2.in/36x; New Backup Features in MySQL 6.0 - http://ld2.in/36y

Storage engine woes

If your application uses transactions, you should consider using InnoDB, as it comes with full ACID compliance. However, if you do not require transactions, it would be wiser to stick to MyISAM, the default storage engine. Also, do not try to sail on two boats, er...sorry, storage engines. Consider this: within a single transaction, some tables use InnoDB while the rest are on MyISAM. The outcome? If the transaction is rolled back, only the InnoDB tables are brought back to their original state; the MyISAM tables keep their changes, since MyISAM ignores transactions altogether. Needless to say, this leads to inconsistency across the database. However, there exists a simple way to enjoy both the flavours! Most MySQL distributions nowadays ship with InnoDB compiled and linked in, and even if yours doesn't, you can download InnoDB separately and use it as a plugin. Simple, eh?
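If you're not sure which engine a given table uses, or you want to move a table over so that everything touched by a transaction runs on InnoDB, the following statements can help. This is a minimal sketch; the 'orders' table name is purely illustrative:

-- List every table in the current database along with its storage engine
SHOW TABLE STATUS;

-- Convert a hypothetical 'orders' table to InnoDB so it can take part
-- in transactions along with the other InnoDB tables
ALTER TABLE orders ENGINE = InnoDB;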

Counting issues

If your table employs a storage engine that supports transactions (such as InnoDB), you shouldn't rely on COUNT(*) to find out the total number of rows in the table. The reason is that on a production-class database, COUNT(*) will at best return an approximate value, as at any given time some transactions will still be running. Such a result will obviously lead to bugs if put to use.

The default storage engine for MySQL is MyISAM, which does not support transactions. However, engines such as InnoDB are favored over MyISAM, as the latter has the (notorious) distinction of not being the most fault-tolerant storage engine. This, in fact, busts the myth that MySQL is faster than PostgreSQL: COUNT(*) returns results quickly in MySQL only when operating under MyISAM. If the storage engine is changed to InnoDB, COUNT(*) takes about the same amount of time as it does in PostgreSQL.
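If all you need is a running total of rows in an InnoDB table, one common workaround is to maintain the count yourself instead of firing COUNT(*) on every request. The sketch below assumes a hypothetical 'employees' table and an illustrative 'row_counter' summary table; neither is part of MySQL itself:

-- Illustrative summary table holding one row per counted table
CREATE TABLE row_counter (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  INT UNSIGNED NOT NULL DEFAULT 0
) ENGINE = InnoDB;

-- Do the INSERT and the counter update inside the same transaction,
-- so the stored count stays consistent with the actual data
START TRANSACTION;
INSERT INTO employees (full_name) VALUES ('John Doe');
UPDATE row_counter SET row_count = row_count + 1
    WHERE table_name = 'employees';
COMMIT;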

Test queries...repeatedly

The major headache with queries is not that, no matter how careful one is, something or the other is bound to be left out and cause a bug later on. Rather, the problem is the timing at which the bug surfaces, which in most cases is after the application/database has gone live. There is no sure-shot strategy to counter this, except for the test runs that you must perform on your application/database. No database query should be approved unless it has been tested against sample data running into thousands of records.
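For instance, a simple stored procedure can pump thousands of dummy rows into a table before you sign off on a query. This is only a sketch; the 'test_orders' table and its columns are hypothetical and should be replaced with your own schema (DELIMITER is a mysql client command, needed so the procedure body can be defined in one go):

-- Generate an arbitrary number of dummy rows for query testing
DELIMITER //
CREATE PROCEDURE populate_test_orders(IN how_many INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    WHILE i < how_many DO
        INSERT INTO test_orders (customer_id, amount)
        VALUES (FLOOR(1 + RAND() * 1000), ROUND(RAND() * 500, 2));
        SET i = i + 1;
    END WHILE;
END //
DELIMITER ;

-- Create, say, 10,000 rows and then time your queries against them
CALL populate_test_orders(10000);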

Countering table scans

More often than not, if MySQL (or any relational database, for that matter) has to search for a particular record in a table, a full table scan is used. And more often than not, the easiest cure is to add indexes, since full table scans result in poor performance. However, as we shall see in subsequent issues, this does not come without its share of pitfalls.
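One handy way to spot such full scans before they hurt you in production is MySQL's slow query log, which can also record queries that don't use an index at all. A sketch, assuming a reasonably recent MySQL version and the privileges needed to change global variables:

-- Log slow queries, including those not using any index
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Optionally check where the log is being written
SHOW VARIABLES LIKE 'slow_query_log_file';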

Using the 'EXPLAIN' query

EXPLAIN is an excellent command when it comes to debugging, so let us explore it in depth.

First, let us create a sample table:

CREATE TABLE `awesome_pcq` (
    `emp_id` INT(10) NOT NULL DEFAULT '0',
    `full_name` VARCHAR(100) NOT NULL,
    `email_id` VARCHAR(100) NOT NULL,
    `password` VARCHAR(50) NOT NULL,
    `deleted` TINYINT(4) NOT NULL,
    PRIMARY KEY (`emp_id`)
) ENGINE = InnoDB
  COLLATE = 'utf8_general_ci'
  ROW_FORMAT = DEFAULT;

The table is self-explanatory, with five columns, the last one, 'deleted', being a Boolean flag to mark whether an account is active or has been deleted. Next, you may populate this table with sample records (say, 100 employee records). As you can see, the primary key lies on 'emp_id'. So, using the email address and password fields, we can easily create a query to validate or deny a login attempt, as follows:

SELECT COUNT(*) FROM awesome_pcq
WHERE email_id = 'blahblah' AND password = 'blahblah' AND deleted = 0;

Oops! I've already told you to avoid using COUNT(*). Let me rectify:

SELECT emp_id FROM awesome_pcq
WHERE email_id = 'blahblah' AND password = 'blahblah' AND deleted = 0;

Now, let us introspect. In the first instance, we queried to locate and return the number of rows where 'email_id' and 'password' were equal to the given values. In the second case, we did the same, but instead asked for the value of 'emp_id' for all the rows that satisfied the given criterion. So, which query is the more expensive of the two?

As it turns out, both of them are equally expensive, database-killing queries, because unintentionally we are asking for a full table scan in each case. To understand this better, execute the following:

EXPLAIN SELECT emp_id FROM awesome_pcq
WHERE email_id = 'blahblah' AND password = 'blahblah' AND deleted = 0;

In the output, concentrate on the second-last column, 'rows'. Assuming that we had populated the table with 100 records, it will show 100 in the first row, which is the number of rows MySQL needs to scan in order to evaluate the result of this query. What does this show? Yes, a full table scan (read: resource hog). To overcome this evil, we need to add indexes.
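For our 100-row table, the EXPLAIN output would look roughly like the following (the values shown are illustrative; the tell-tale signs are 'type' showing ALL, i.e. a full table scan, and 'rows' showing 100):

+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table       | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | awesome_pcq | ALL  | NULL          | NULL | NULL    | NULL |  100 | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+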

Adding Indexes

First things first: it's a bad idea to create an index for every second problem that you might encounter. Excessive indexing leads to slower write performance and hogs resources. Before going any further, let us create a sample index on our example:

ALTER TABLE `awesome_pcq` ADD INDEX `LoginValidate` (`email_id`);

Next, run the query again:

EXPLAIN SELECT emp_id FROM awesome_pcq
WHERE email_id = 'blahblah' AND password = 'blahblah' AND deleted = 0;

Now notice the value of 'rows'. Instead of 100, it should now say 1. Thus, MySQL is now scanning only one row in order to give you the output of this query, thanks to the index created earlier. You might notice that the index covers only the email address field, while the query filters on other fields too. This shows that MySQL first performs a cross-check to see whether any of the columns specified in the WHERE clause has an index defined for it, and if so, uses it accordingly.

However, it isn't the case that every lookup will be reduced to a single row. If, for instance, the indexed field is not unique (such as employee names, which can have identical values in two rows), there will be multiple candidate rows left even after indexing. Still, that is far better than a full table scan. Also, the order of columns specified in the WHERE clause does not play a role in the process. If, for instance, in the above query you reverse the order of the fields so that the email address comes last, MySQL will still iterate on the basis of the indexed column.

Well, with indexing at your fingertips, you must have noticed how to avoid numerous full table scans and gain better results. Still, there is a long way to go! In the coming parts of the series, we shall be further exploring indexing as well as busting some of the myths associated with MySQL databases. Till then, happy querying!
