Advertisment

Optimize Your MySQL Database

author-image
PCQ Bureau
New Update

MySQL is a pretty fast database and serves most requirements for a decent

sized Web application. But the dragon starts raising its head, when the number

of records grow. In this series we will look into various ways of optimizing the

MySQL database so as to tame the dragon from time to time.

Advertisment

The simplest optimization in MySQL is to choose correct data types, depending

on the record set stored in the table. Many MySQL database designers fail to do

this, resulting in database driven websites failing miserably. In this article,

we will cover the most common data types used in MySQL and how to use built-in

MySQL functionalities to figure out the most optimal data type for a column.

Size does matter



A standard practice among database designers is to define primary key as an
auto-increment INTEGER. By default MySQL will set this as Unsigned INT. This

means you can store 4,294,967,295 records in the given table. Sounds good from a

scalability point of view but if you are designing the table for a small website

ask yourself a question, is your website going to get 4,294,967,295 users ? Of

course we wish your website gets these many users but most websites do not have

so many users.

Thus choosing Unsigned INT as userID might not be a good choice. For small to

medium sized websites choosing MEDIUIMINT as userID will be the bast choice.

Advertisment

When compared to INT, MEDIUMINT consumes approximately 25% less space. A

column with INT type consumes 4 bytes whereas a column with MEDIUMINT type will

consume 3 bytes.

Direct Hit!
Applies To:

Database designers



USP: Learn various ways of optimizing
MySQL database



Primary Link: None


Keywords: Optimizing MySQL Database

MySQL has several specialized data types which require lesser memory and

space. Smaller the data type size, faster is the record look up and faster the

query execution. Specially the primary index of a table should be as small as

possible to make row identification faster.

Advertisment

Of course when more users start joining your website you can always alter the

table to go to next higher data type. MySQL manual is a good source to know size

of different data types and identify the most suitable, smallest data type for a

column. Storage requirements for each data type can be found here http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

.



Rule to remember is to start small and grow the field type as record set
increase in the table.

NULL is not NULL



NULL columns require additional one bit per column. Setting column as NOT NULL
can make everything faster. Specially when your query involves subqueries, MySQL

will perform an additional search for rows that contain NULL. This search can be

avoided when there are no NULLs in the table. You should definitely use NULL if

your application requires them but do not set columns as NULLable by default.

Using a constant value which falls outside domain range for the column instead

of NULL can speed up the look ups.

Date and time



Setting record creation date and record update date to DATETIME is a favorite
among database designer. Most people are not aware that DATETIME is among the

costliest data types in terms of space. DATETIME consumes 8 bytes of space. Date

and time types should be avoided unless you need to compare dates in your quries.

Advertisment

If you have to store dates for display purpose only, storing Unix time stamp

as a MEDIUMINT can surve the purpose. All programing languages provide mechanism

for creating current Unix time stamp and converting Unix time stamp back to date

format. So, 1223890532 will result in faster look up compared to 2008-10-13

05:35:32.

CHAR or VARCHAR ?



CHAR is fixed width and VARCHAR is variable length. But VARCHAR requires

extra byte to store the record. If length of VARCHAR is 255 or less it requires

one byte but if the length of VARCHAR is greater than 255 bytes it requires two

bytes. A VARCHAR is store as one-byte or two-byte length prefix plus data.

The length prefix contains the number of bytes in the record. A field

declared as VARCHAR(5) may take 6 bytes instead of 5 bytes as expected.

Advertisment

Finding the correct data type



MySQL comes with a built-in procedure to analyze a query and suggests the

optimal data types for each column. This procedure is called "Procedure Analyse".

Procedure analyse will analyze the existing data in a table and try to guess

the optimal data type based on the distinct values per column present in the

table.

The syntax for this procedure is following;

Advertisment

SELECT col1, col2 FROM table1 PROCEDURE

ANALYSE(>);

Procedure Analyze expects two optional parameters max_elements and max_memory.

max_memory is maximum memory that shall be allocated on per column basis while

trying to find all distinct values for a column.

max_elements defaults to a value of 256 and is used to identify the maximum

number of distinct values analyse will use per column. Distinct values are used

to identify if ENUM is a



suitable data type for a given column.

Advertisment

To test procedure analyse we created a table called user_master here is the

description of the table

mysql> desc user_master ;



+-----------------+---------------------+---


| Field | Type | Null | Key | Default | Extra |


+-----------------+---------------------+---


| user_id | int(11) | NO | PRI | | auto_increment |


| first_name | varchar(20) | YES | | NULL | |


| last_name | varchar(20) | YES | | NULL | |


+-----------------+----------------------+--


3 rows in set (0.00 sec)






We loaded the table with 16 users but ensured that first_name of the users is

"Pcquest" or "Preview" or "Test". We also ensured every user has unique

last_name. Following command was used to analyze the table and identify the

optimal data type.

mysql> select user_id,first_name,last_name from

user_master procedure analyse(10,50)\G



********* Row 1*****************


Field_name: mydb.user_master.user_id


Min_value: 1


Max_value: 16


Min_length: 1


Max_length: 2


Empties_or_zeros: 0


Nulls: 0


Avg_value_or_avg_length: 8.5000


Std: 4.6098


Optimal_fieldtype: TINYINT(2) UNSIGNED NOT NULL


**************Row 2********** ***


Field_name: mydb.user_master.first_name


Min_value: Pcquest


Max_value: Test


Min_length: 4


Max_length: 7


Empties_or_zeros: 0


Nulls: 0


Avg_value_or_avg_length: 4.1875


Std: NULL


Optimal_fieldtype: ENUM('Pcquest','Preview','Test') NOT NULL


*************Row 3************ Field_name: mydb.user_master.last_name


Min_value: Usr1


Max_value: User101


Min_length: 4


Max_length: 7


Empties_or_zeros: 0


Nulls: 0


Avg_value_or_avg_length: 4.1875


Std: NULL


Optimal_fieldtype: VARCHAR(7) NOT NULL






























As you can notice the optimal data type suggested for user_id is TINYINT, as

for only 16 users INT type is an overkill. TINYINT will consume 1 byte where as

INT will consume 4 bytes, a saving of 75% of space and memory.

Procedure Analyse at times reports optimal data type as ENUM where you least

expect. In the example above first_name is suggested to be ENUM as procedure

analyse will only see 3 distinct values in 16 records. As procedure analyse

works on existing records in the table and distinct records in the current data

set, ENUM will the most optimal data type for first_name.



By trying different values to procedure analyse, more suggestions about optimal
data type can be obtained. You may have to play with settings till procedure

analyse does not return ENUM data type when it is not appropriate.

On a fairly large record set, results of procedure analyse are very useful.

Vivek Khurana

Advertisment