Gone are the days when we used to write applications for the desktop which
would be used by only one user at a time. Almost all applications that we write
today are multi-user systems which would be used by many users concurrently. And
today we have sophisticated frameworks and application infrastructures that let
the application developer focus on solving the business problem at hand, where
the frameworks and infrastructures take care of concurrency. For any data-driven
application, if you use a database server like Microsoft SQL Server, the server
handles the database concurrency issues for you, whereas you, the developer,
focuses on your core task of solving your client's business problem.
Direct Hit! |
Applies To: Advanced database managers USP: Learn five transaction isolation levels in MS SQL Server 2005 & 2008 Primary Link:None Keywords: Database concurrency |
You will not have to write code to manage database level concurrency in most
projects you would be involved in. SQL Server would figure out how to let
multiple users access the same piece of data without them overwriting each
other's changes. However, there would be times when you would need better
control over how the server manages concurrency and then you would need to
provide guidelines to the server as to how you want it to manage concurrency for
you. For example, when you start tuning your database server for performance,
you will realize performance tuning is actually a tight rope walk. And more
often than not, you gain performance at the expense of concurrency and you gain
concurrency at the expense of performance. Managing database concurrency is
quite an involved topic in itself. For the remainder of the article, I would
focus on one of the most important aspects of managing concurrency for Microsoft
SQL Server, configuring the transaction isolation level. I will explain the term
as we go along. I am running my code samples on a Microsoft SQL Server 2008
database, but most of what I am discussing would work on the older versions
also.
Prepare the data
Let us first get some sample data ready that we will later try to modify
concurrently.
CREATE TABLE Employee
(
empid int primary key IDENTITY(1, 1),
empname nvarchar(20),
city nvarchar(15)
)
INSERT INTO Employee(empname, city)
VALUES('Ashish', 'Mumbai')
INSERT INTO Employee(empname, city)
VALUES('Asha', 'Pune')
To keep matters simple, we will not have a front end with multiple users who
try to modify data simultaneously. Instead, I will open up multiple query
Windows in SQL Server Management Studio to simulate multiple users.
Start Concurrent updates
In one query window, start updating the Employee table you just created
inside a transaction.
BEGIN TRANSACTION
UPDATE Employee
SET city = 'Delhi'
WHERE empid = 1
Do not commit or rollback the transaction right away. Instead, while this
connection is still open, start another query window by clicking on the 'New
Query' button on the toolbar. You will normally find this button on the top left
corner of Management Studio. In the new query window, let us now try to access
the same row which is under the update transaction in the old query window:
SELECT empid, city FROM Employee
WHERE empid = 1
You will see that this activitywill get blocked. This query will complete
only when you finish the transaction in the first query editor window:
ROLLBACK TRANSACTION
Management Studio neatly tabs both the query windows for easy navigation. You
could alternately have committed the transaction. Either way, the query in the
second query window will be unblocked and will proceed to completion.
In this case, the second query window waited for the first query window to
complete the transaction before reading the data the first query window was
trying to modify. This is the default behavior with SQL Server.
Configuring Concurrent Updates
What if you are in a hurry and want to read the data even before it is
committed or rolled back in the previous transaction. You would need to state
your intent then explicitly. Let us repeat the entire exercise then. This is
what you do in the first query window, as before:
BEGIN TRANSACTION
UPDATE Employee
SET city = 'Delhi'
WHERE empid = 1
This time, in the second query window, you specify that you want to read the
data before it has been committed.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT empid, city FROM Employee
WHERE empid = 1
You will now find your query responds immediately. Now the second query is
not blocked by the first query by setting the transaction isolation level to
READ UNCOMMITTED. Reading the result of an uncommitted transaction of a
different connection, as we have done now, is called as dirty read. What did we
gain? The job is done faster, because we did not have to wait for the first
query window to complete its transaction. What did we lose? The transaction
could be committed or rolled back. If it is committed, data as you read in the
second query window is the final data. If it is rolled back, data read in the
second query window would be different after the rollback.
Now please rollback the first transaction.
ROLLBACK TRANSACTION
Execute the query in the second query window again.The result is different
because the transaction was rolled back.
If you recollect, before we set the transaction isolation level, the default
behavior was to wait till the other transaction completes. This is known as the
transaction isolation level READ COMMITTED. Instead of relying on the default
behavior, you can explicitly set the transaction isolation level to READ
COMMITTED.Let us repeat the concurrent updates that way then. This is what you
do in the first query editor window:
BEGIN TRANSACTION
UPDATE Employee
SET city = 'Delhi'
WHERE empid = 1
And this is what you do in the second query editor window:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT empid, city FROM Employee
WHERE empid = 1
As expected, this query is blocked. Let us roll back the first transaction.
ROLLBACK TRANSACTION
This will unblock the second query.
SNAPSHOT
Remember what had happened when we set the transaction isolation level to
READ UNCOMMITTED? We were getting the data after the update, even if it was not
committed. What if we would want the data before the update? We would set the
transaction isolation level to SNAPSHOT. You would first need to configure the
database to allow SNAPHOT isolation. You can execute the following statement in
any of the query windows. Only take care not to run this inside a transaction
boundary.
ALTER DATABASE payroll
SET ALLOW_SNAPSHOT_ISOLATION ON
My database name is payroll; substitute whatever is the name of your
database. Now let me attempt the same transaction in the first query window.
BEGIN TRANSACTION
UPDATE Employee
SET city = 'Delhi'
WHERE empid = 1
Set the transaction isolation level to SNAPSHOT in the second query window.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SELECT empid, city FROM Employee
WHERE empid = 1
The query runs without blocking and shows the data as it was before the
modification. But if you choose the commit the transaction in the first query
window and read the data again, you will get the new values.
COMMIT TRANSACTION
Either in the case of your transaction level being READ UNCOMMITED or SNAPSHOT,
you say you want to read the data before the other transaction has finished
modification. You can take the choice of whether you want to see the data as it
was before the modification or after the modification, there is still no
guarantee the data will remain the same after the transaction completes, because
it may either commit or rollback.
REPEATABLE READ
Let us now run the query in the second window also as a transaction. We will
set the transaction isolation level back to the default — READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRANSACTION
SELECT empid, city FROM Employee
WHERE empid = 1
Run a transaction in the first query window.
BEGIN TRANSACTION
UPDATE Employee
SET city = 'Mumbai'
WHERE empid = 1
COMMIT TRANSACTION
Come back and execute the SELECT statement in the transaction of the second
query window.
SELECT empid, city FROM Employee
WHERE empid = 1
You will get the value after the modification.
What we notice is within the transaction of the second query window, the same
SELECT statement is giving two different results at two different points in
time. We may however want to make the read repeatable, that is, we want to be
assured that within the same transaction if I read the same piece of data, the
results would be the same, i.e., repeatable.
For that we need to set the transaction isolation level of the second query
window to REPEATABLE READ.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT empid, city FROM Employee
WHERE empid = 1
Now run the update in the first query window. Now the first query is blocked
by the second query to make sure its read is repeatable. Complete the second
transaction so that the first query proceeds to completion.
COMMIT TRANSACTION
SERIALIZABLE
Start the transaction of the second query window again. Let us now see all
rows instead of only one row.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRANSACTION
SELECT empid, empname, city FROM Employee
We now know the first query window would be blocked if it tries to do an
update. However, let us try something different.
BEGIN TRANSACTION
INSERT INTO Employee
VALUES('Atul', 'Bangalore')
COMMIT TRANSACTION
This would proceed to completion. The transaction in the second query window is
not complete. Let us run the SELECT statement again.
SELECT empid, empname, city FROM Employee
The same query is returning an extra row in the same transaction. This extra
row is what is called a phantom row. Let us complete the second transaction.
COMMIT TRANSACTION
If we do not want phantom rows to appear within the same transaction, we
would set the transaction isolation level to SERIALIZABLE.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT empid, empname, city FROM Employee
Insert a different row in the first query window.
BEGIN TRANSACTION
INSERT INTO Employee
VALUES('Avinash', 'Bangalore')
COMMIT TRANSACTION
The INSERT would now be blocked by the second query window.
Complete the second transaction.
COMMIT TRANSACTION
The first query will complete now.
Summary
We learnt five transaction isolation levels in Microsoft SQL Server 2005 and
2008 — READ UNCOMMITTED, SNAPSHOT, READ COMMITTED, REPEATABLE READ and
SERIALIZABLE. SNAPSHOT isolation level was introduced in SQL 2005, but the other
four would work on SQL Server 2000 also.
In principle, the higher the concurrency (SERIALIZABLE would be the highest),
the lower the performance because other users would be blocked for a long
duration. And higher the performance, lower the concurrency (READ UNCOMMITTED
would be the lowest). You would need to set the right balance between the two,
depending on your priorities for the specific application you are developing.
Amaresh Patnaik